
IIT-Madras faculty develop AI models to process text in 11 Indian regional languages

There are hardly any tools for Indian languages and this is the critical gap that we are trying to address through this initiative. These models are available free of cost as we want the entire country to benefit from them, claims IIT-Madras.

By: Education Desk | New Delhi | September 23, 2020 2:14:41 pm

To make digital educational content available in Indian regional languages, faculty members of the Indian Institute of Technology (IIT) Madras have developed Artificial Intelligence (AI) models and datasets. The models can process text in 11 Indian regional languages and can be used to build tools that automatically process questions written in Indian languages and classify them into specific topics.

Such facilities already exist for English across the internet. The work was taken up jointly with ‘AI4Bharat’, a platform for building AI solutions for problems of relevance to India, IIT-Madras said in a press statement.


Over the past year, a team of researchers comprising students, faculty and volunteers from IIT-Madras and AI4Bharat has collected data and trained models for processing text written in Indian languages, according to the institute. The models take advantage of the similarities between Indian languages to make efficient use of data.

With these models, the researchers say they have been able to advance Indian language processing on several tasks, including document classification, sentiment analysis, semantic matching and paraphrase detection.
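To give a sense of what classifying questions into topics involves, here is a minimal sketch in Python. It is not the IIT-Madras or AI4Bharat method: it swaps in a simple character n-gram profile with cosine similarity, and the Hindi training questions, the topic labels and the helper names (`char_ngrams`, `train`, `classify`) are all hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams; this works for any script, not just Latin."""
    text = " ".join(text.split())  # normalise whitespace
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(examples):
    """Build one merged n-gram profile per topic from (text, topic) pairs."""
    profiles = {}
    for text, topic in examples:
        profiles.setdefault(topic, Counter()).update(char_ngrams(text))
    return profiles


def classify(profiles, question):
    """Return the topic whose profile is most similar to the question."""
    grams = char_ngrams(question)
    return max(profiles, key=lambda topic: cosine(profiles[topic], grams))


# Hypothetical toy training data (Hindi questions with made-up topic labels).
examples = [
    ("पृथ्वी सूर्य के चारों ओर क्यों घूमती है?", "science"),
    ("पानी का रासायनिक सूत्र क्या है?", "science"),
    ("भारत की राजधानी क्या है?", "geography"),
    ("गंगा नदी कहाँ से निकलती है?", "geography"),
]
profiles = train(examples)

print(classify(profiles, "यमुना नदी कहाँ से निकलती है?"))
```

A toy like this only matches surface character patterns; the models described in the article are trained on far larger datasets, so the sketch should be read as an illustration of the task rather than of the released models.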

Elaborating on this initiative, Mitesh M Khapra, assistant professor, department of computer science and engineering, IIT-Madras, said, “We have a very rich diversity of languages in our country. As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”


“While such tools are available for English and other foreign languages, there are hardly any tools for Indian languages and this is the critical gap that we are trying to address through this initiative. These models are available free of cost as we want the entire country to benefit from them,” added Khapra.

The researchers from IIT Madras and AI4Bharat released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi. The multilingual AI models and datasets developed through this initiative will provide the essential building blocks to students, faculty, start-ups and industry to work on Indian language tools and push the frontiers of technology.

The resources are open source and free of cost, and anyone can download them from a GitHub repository. An accompanying research paper describing the research methodology and evaluation has been accepted at EMNLP-Findings (a companion publication of one of the top Natural Language Processing conferences).


AI4Bharat is an initiative co-founded by Dr. Mitesh M Khapra and Dr. Pratyush Kumar of IIT Madras that works to solve India-specific problems in a community-driven, open-source manner. Both are also associated with the Robert Bosch Centre for Data Science and Artificial Intelligence.

Pratyush Kumar, assistant professor, Department of Computer Science and Engineering, IIT Madras, said, “This initiative is one of the few attempts in Academia to develop and publicly release such large scale multilingual AI models containing millions of parameters trained on billions of tokens from 11 Indian languages, completely free and open-source.”
