Encoding text longer than 512 tokens
Pratik Bhavsar
Making better domain text encoders with TSDAE
Pratik Bhavsar
Yes, this might be the one
Pratik Bhavsar
Years of sweating and burning fingers
Pratik Bhavsar
NLP for startup engineers
Pratik Bhavsar
Cosine similarity on a subset of documents for multipass search
How do you calculate cosine on a subset of vectors in a vector index?
Documents 1M -> 10000 -> 100 -> 10
64D -> 128D -> 256D -> 512D (D - vector…
Pratik Bhavsar
Minimum viable reading
Pratik Bhavsar
Better search for Indian language data
Pratik Bhavsar
The world’s most valuable resource is no longer oil, but data.
You don't rise to your dreams, you fall to the level of your code-reviews
Pratik Bhavsar
Choose the right tool for your job
Pratik Bhavsar
On fake AI companies
Pratik Bhavsar