A lot of good things happened in Indic NLP this year.
Multilingual Representations for Indian Languages (MuRIL)
Supports 17 languages (16 Indian languages plus English). I think it also works with code-mixed text, since it was trained on transliterated data as well. Handling code-mixed text is necessary for social media data such as tweets and chats.
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Other multilingual papers
Since MuRIL shows large gains over mBERT on the Tatoeba sentence-retrieval dataset, it looks like the best option for neural search in Indic languages. (I am not sure how LaBSE compares with MuRIL on Indian-language data.)
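Neural search here means embedding the query and every document with a multilingual encoder (MuRIL, LaBSE, etc.) and ranking documents by cosine similarity to the query. A minimal sketch, assuming the embeddings have already been computed by such a model; the 3-dimensional vectors and the `neural_search` helper below are hypothetical stand-ins, not part of any library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neural_search(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank documents by cosine similarity to the query embedding."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real encoder outputs much larger vectors (e.g. 768-d).
docs = {
    "cricket_news_hi": [0.9, 0.1, 0.0],
    "election_news_ta": [0.1, 0.9, 0.1],
    "weather_news_bn": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # assumed embedding of a Hindi query about a cricket match

for doc_id, score in neural_search(query, docs, top_k=2):
    print(doc_id, round(score, 3))
```

Because the encoder is multilingual, the query and documents can be in different languages. In a real system the document vectors would be precomputed and stored in a vector index; the linear scan here only keeps the idea visible.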
Possible applications in India
News search
Indic websites
English websites with search in Indic languages
FAQ chatbots in Indic languages
Customer support for commercial websites
Customer support for government websites
Zero-shot article classification via similarity between title and categories
Unsupervised recommendation engine via neural search
News articles
Social content
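The zero-shot classification idea can be sketched the same way: embed the article title and a short name for each category, then assign the category whose embedding is most similar to the title. A minimal sketch, assuming precomputed embeddings from a multilingual encoder; the vectors and the `zero_shot_classify` helper are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    title_vec: Sequence[float],
    category_vecs: Dict[str, Sequence[float]],
) -> Tuple[str, float]:
    """Hypothetical helper: return the category closest to the title, with its score."""
    label = max(category_vecs, key=lambda name: cosine(title_vec, category_vecs[name]))
    return label, cosine(title_vec, category_vecs[label])


# Hypothetical embeddings of the category names (no labelled training data needed).
categories = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
title = [0.2, 0.8, 0.3]  # assumed embedding of a Marathi headline about an election

label, score = zero_shot_classify(title, categories)
print(label, round(score, 3))
```

The unsupervised recommendation idea reuses the same nearest-neighbour ranking: embed the article or post a user is reading and recommend the items whose embeddings are closest to it.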
These models can be improved further by fine-tuning them on domain- and task-specific data.
Some Indic-language talks from the recent Forum for Information Retrieval Evaluation (FIRE 2020); see the schedule published by IDRBT, Hyderabad.