Pratik’s Pakodas 🍿

Cosine similarity on a subset of documents for multipass search

pakodas.substack.com

Pratik Bhavsar
Jan 15, 2021

How do you calculate cosine similarity on only a subset of the vectors in a vector index?

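Documents per pass: 1M -> 10,000 -> 100 -> 10

Vector dimension per pass: 64D -> 128D -> 256D -> 512D (D = vector dimension)

In other words, a coarse-to-fine cascade: all 1M documents are scored with 64D vectors, the best 10,000 are re-scored with 128D vectors, the best 100 of those with 256D vectors, and the final 10 with 512D vectors.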
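A minimal NumPy sketch of that cascade, assuming each pass has its own precomputed embedding matrix (rows aligned by document ID) and that exact brute-force scoring is acceptable at every stage. Cosine on a subset then comes down to gathering the candidate rows by ID before taking the dot product. `cosine_on_subset`, `multipass_search`, the `(dim, k)` schedule and the random data are hypothetical; a real first pass over 1M documents would go through an ANN index rather than the full scan used here.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale rows (or a single vector) to unit length so dot product == cosine."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def cosine_on_subset(query: np.ndarray, vectors: np.ndarray,
                     ids: np.ndarray, k: int) -> np.ndarray:
    """Score only the rows listed in `ids` and return the top-k document IDs."""
    subset = l2_normalize(vectors[ids])          # gather candidate rows only
    scores = subset @ l2_normalize(query)        # cosine similarity per candidate
    k = min(k, len(ids))
    top = np.argpartition(-scores, k - 1)[:k]    # unordered top-k positions
    top = top[np.argsort(-scores[top])]          # sort just those k by score
    return ids[top]                              # map positions back to doc IDs


def multipass_search(queries: dict, embeddings: dict, schedule: list) -> np.ndarray:
    """Run the cascade; `schedule` is a list of (dim, k) pairs, coarse to fine.

    Hypothetical layout: `queries[dim]` and `embeddings[dim]` hold the query
    vector and the document matrix used for the pass at that dimension.
    """
    n_docs = next(iter(embeddings.values())).shape[0]
    candidates = np.arange(n_docs)               # first pass sees every document
    for dim, k in schedule:
        candidates = cosine_on_subset(queries[dim], embeddings[dim], candidates, k)
    return candidates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_docs = 5_000                               # small stand-in for 1M
    schedule = [(64, 500), (128, 100), (256, 10), (512, 10)]
    embeddings = {d: rng.standard_normal((n_docs, d)) for d, _ in schedule}
    queries = {d: rng.standard_normal(d) for d, _ in schedule}
    print(multipass_search(queries, embeddings, schedule))  # 10 document IDs
```

The `argpartition` call avoids fully sorting every candidate score at each pass; only the k survivors get sorted.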

FAISS and ANNOY don't support it.

You can filter by ID in Elasticsearch and then run a cosine similarity query on the matching documents, but should you?
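For reference, a sketch of "filter by ID, then cosine" as an Elasticsearch request body, built here as a Python dict. It assumes Elasticsearch 7.6 or later, a `dense_vector` field named `embedding_128` (hypothetical) and candidate IDs from the previous pass stored as document `_id`s. The `script_score` query only scores documents matched by its inner `ids` query, and the `+ 1.0` keeps scores non-negative, which `script_score` requires. `es_cosine_rerank_body` is an illustrative helper, not a client API.

```python
import json


def es_cosine_rerank_body(candidate_ids, query_vector, field="embedding_128", size=100):
    """Build a script_score request that computes cosine only on `candidate_ids`.

    `field` is a hypothetical dense_vector field name; adjust it to your mapping.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                # Only documents matched by this inner query get scored.
                "query": {"ids": {"values": [str(i) for i in candidate_ids]}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; +1.0 keeps scores >= 0.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": [float(x) for x in query_vector]},
                },
            }
        },
    }


if __name__ == "__main__":
    body = es_cosine_rerank_body([3, 17, 42], [0.1, 0.2, 0.3], size=2)
    print(json.dumps(body, indent=2))
```

With the official 7.x Python client, a dict like this would be passed as `body` to `es.search(index=...)`. Keep in mind that `script_score` evaluates the script on every document its inner query matches, so this is a linear scan over the candidate IDs rather than an index lookup.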

One option: run the first pass over all 1M documents in FAISS with ANN (approximate nearest neighbours), and the remaining passes in ES.

Is a NumPy masked array a solution for this?
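One way to try the masked-array idea, assuming the candidates from the previous pass are available as a boolean mask over all rows: score everything, mask out non-candidates, then sort. The sketch compares it with plain fancy indexing (`vectors[ids]`). Both return the same top-k here, but in this version the masked variant still computes a dot product for every row, so it saves no arithmetic, while the indexed variant only touches the candidate rows. Function names and data are illustrative.

```python
import numpy as np


def topk_masked(query, vectors, keep_mask, k):
    """Score all rows, then hide non-candidates with a masked array."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ (query / np.linalg.norm(query))        # cost: every row
    masked = np.ma.masked_array(-scores, mask=~keep_mask)  # negate for descending
    # Masked entries are filled with +inf before sorting, so they land last.
    return masked.argsort(fill_value=np.inf)[:k]


def topk_indexed(query, vectors, ids, k):
    """Gather the candidate rows first, then score only those."""
    subset = vectors[ids]                                  # cost: candidate rows
    unit = subset / np.linalg.norm(subset, axis=1, keepdims=True)
    scores = unit @ (query / np.linalg.norm(query))
    return ids[np.argsort(-scores)[:k]]                    # back to global row IDs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    vectors = rng.standard_normal((1_000, 128))
    query = rng.standard_normal(128)
    ids = np.sort(rng.choice(1_000, size=100, replace=False))
    keep_mask = np.zeros(1_000, dtype=bool)
    keep_mask[ids] = True
    print(topk_masked(query, vectors, keep_mask, 10))
    print(topk_indexed(query, vectors, ids, 10))
```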

Let me know your thoughts on Twitter!

Pratik Bhavsar @nlpguy_
Open question How do you calculate cosine on a subset of vectors of a vector index? 1M -> 10000 -> 100 -> 10 64D -> 128D -> 256D -> 512D D - dimension FAISS and ANNOY don't support it. You can do this in Elasticsearch but should you? First search on 1M on FAISS and rest on ES
3:05 PM ∙ Jan 14, 2021
© 2023 Pratik