A lot of good things happened in Indic NLP this year.
MuRIL supports 17 languages (16 Indian languages plus English). I think it also works with code-mixed text, as it’s trained with transliterated data. Code-mixed handling is needed for social media data, e.g. tweets and chats.
Other multilingual papers
Since MuRIL shows a large gain over mBERT on the Tatoeba dataset, it looks like the best choice for neural search in Indic languages. (I am not clear on how LaBSE compares with MuRIL on Indian data.)
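For context on why a Tatoeba gain matters for search: the task is scored as nearest-neighbour retrieval, where each sentence should retrieve its own translation by cosine similarity. Below is a minimal sketch of that metric. The function names and the three-dimensional vectors are made up for illustration; real scores would come from model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_accuracy(source_vecs, target_vecs):
    """Fraction of source sentences whose nearest target is their own translation.

    Assumes source_vecs[i] and target_vecs[i] embed a translation pair.
    """
    hits = 0
    for i, source in enumerate(source_vecs):
        nearest = max(range(len(target_vecs)),
                      key=lambda j: cosine(source, target_vecs[j]))
        hits += int(nearest == i)
    return hits / len(source_vecs)


# Hypothetical toy vectors standing in for real sentence embeddings.
english = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
hindi = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

print(retrieval_accuracy(english, hindi))
```

A model that scores well here places translations close together in embedding space, which is the same property a cross-lingual search engine relies on.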
Possible applications in India
Indic-language search over English websites
FAQ chatbots in Indic languages
Customer support for commercial websites
Customer support for govt websites
Zero-shot article classification via similarity between title and categories
Unsupervised recommendation engine via neural search
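The last two ideas come down to the same operation: embed the texts, then rank candidates by similarity. Here is a rough sketch of that. Note that `toy_embed` is only a stand-in I made up (character trigram counts) so the snippet runs without downloading anything; it does no cross-lingual matching at all. In practice you would pass in sentence embeddings from a multilingual model such as MuRIL or LaBSE, and adapt `cosine` to dense vectors.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: character trigram counts.

    NOT a multilingual model. Replace with real sentence embeddings.
    """
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def rank(query, candidates, embed=toy_embed):
    """Return (score, candidate) pairs sorted from most to least similar."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Zero-shot classification: pick the category closest to the title.
categories = ["cricket sports", "election politics", "film cinema"]
title = "New cricket stadium opens for the sports season"
best_score, best_category = rank(title, categories)[0]
print(best_category)
```

For recommendations, the same `rank` call works with an article as the query and the other articles as candidates; the top few results are the suggestions.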
These models can be improved further by fine-tuning on domain-specific and task-specific data.
Come join Maxpool - A Data Science community to discuss real ML problems!