This is an archive article published on July 9, 2024

Ganga-1B — pre-trained Hindi AI model developed at IITGN

This project aims to develop pocket-sized open-source large language models for Indic languages, says Prof Mayank Singh

Indian Institute of Technology Gandhinagar (File Photo)

The Lingo Research Group at the Indian Institute of Technology Gandhinagar (IITGN) has developed Ganga-1B, an artificial intelligence (AI) model for Hindi billed as “a breakthrough in language models”. Named after the longest river flowing through the country, Ganga-1B is the first pre-trained Hindi model developed by an academic research laboratory.

“The initiative strives to achieve performance in understanding and generating text in Indian languages. The first milestone of which is the release of the Ganga-1B model, trained on an extensive monolingual Hindi language dataset,” said Professor Mayank Singh, assistant professor (Computer Science and Engineering) and head of IITGN’s Lingo Research Group.

The Ganga-1B model was trained on publicly available Hindi-language data, including news articles, web documents, books, government publications, educational materials and quality-filtered social media conversations.


“The unity project aims to develop pocket size open-source Large Language Models (LLMs) for Indic languages, created and trained from scratch from Indian data. This initiative will propel the Indian open-source community to build LLMs and chatbots that can be trained and deployed under resource-constrained scenarios,” Professor Mayank Singh told The Indian Express.

Ganga-1B, which was downloaded by more than 600 people within 48 hours of the announcement, took nearly 1.5 years to develop and was built using open-source data from various websites.

The research team has been working on models for other languages, including Gujarati, Urdu, Tamil, Telugu and Marathi. It is also exploring the use of AI in e-governance for regional languages and working on an education LLM to support school students and teachers.

Native speakers of Indian languages have further curated the dataset to ensure high quality.
