Photo by Dan-Cristian Pădureț on Unsplash
I think I made a grave mistake by not following computer vision (CV) for many years. A majority of the advancements in NLP actually came from CV. I am no researcher, but I don’t like to be in the dark.
The thought process started when I came across the paper "SqueezeBERT: What can computer vision teach NLP about efficient neural networks?"
Multi-head attention in the Transformer seems like a reincarnation of using more than one filter in a CNN layer.
Fixed-window attention has its roots in text CNNs and graph theory. The performance difference between full self-attention and a CNN with a window of 5 is not much.
Transfer learning was born when people started trying AlexNet (2012) on the CIFAR dataset.
Data augmentation in NLP comes from data augmentation and corruption techniques in CV.
Residual connections in Transformers and LSTMs come from ResNet (2015).
The idea of improving models by increasing depth came from VGG.
Triplet loss from the FaceNet era is now used to train sentence transformers.
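To make the triplet loss idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular library implements it: the function names, the toy 2-D "embeddings", and the margin of 1.0 are all made up for this example, and I use plain Euclidean distance (FaceNet's formulation uses squared distances). The idea is the same in both worlds: pull the anchor towards the positive and push it away from the negative by at least a margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an arbitrary choice for this sketch.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D "sentence embeddings" (made-up values, for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# The negative is already much farther away than the positive: no penalty.
easy = triplet_loss(anchor, positive, negative)

# Swap the roles: the "positive" is now far and the "negative" is near,
# so the loss is greater than zero.
hard = triplet_loss(anchor, negative, positive)

print(easy, hard)
```

In a face model the three inputs would be image embeddings; in a sentence model they would be sentence embeddings. The loss itself does not care which.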
In light of this, I have decided to go through some CV courses in my free time.
Deep learning for Computer Vision
Can you think of any other ideas borrowed from CV?
We have a group of NLP professionals from across the globe where we discuss real problems in building ML systems.
Ask me anything on ama.pratik.ai 👻
You can try ask.pratik.ai for any study material.