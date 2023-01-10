As the National Education Policy (NEP) lays great emphasis on higher education in Indian languages, Project Udaan, translation software based on Artificial Intelligence (AI) and developed at the Indian Institute of Technology (IIT) Bombay, is playing a pivotal role in making learning resources available in various Indian languages.

The All-India Council for Technical Education (AICTE), along with the Project Udaan team, is currently translating second- and third-year engineering textbooks into 12 Indian languages, as well as all 20 first-year textbooks into Malayalam. Other professional and traditional courses in streams such as commerce, science, pharmacy and management will soon be included, sources said.

The Maharashtra Government last week signed an MoU with the team from IIT Bombay for translating textbooks from English to Marathi.

“It is a complete ecosystem for machine translation aided by human effort. It involves the development and adoption of domain-specific vocabulary of more than 5 million words across 11 languages. Producing domain dictionary-aware translation reduces the number of edits by the translator,” said Professor Ganesh Ramakrishnan, Institute Chair Professor, Department of Computer Science and Engineering, IIT Bombay, who is leading Project Udaan.
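
To make the idea of dictionary-aware translation concrete, here is a minimal Python sketch, not Udaan's actual implementation. It assumes a tiny hypothetical English-to-Hindi glossary and two illustrative helpers, `find_terms` and `missing_terms`, which flag places where a draft translation does not use the prescribed technical term. Each flagged term stands for an edit a human translator would otherwise have to make.

```python
import re

# Hypothetical mini-glossary (English -> Hindi); entries are illustrative only.
GLOSSARY = {
    "force": "बल",
    "velocity": "वेग",
    "kinetic energy": "गतिज ऊर्जा",
}

def find_terms(sentence, glossary):
    """Return glossary terms present in the sentence, longest terms first."""
    found = []
    remaining = sentence.lower()
    for term in sorted(glossary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, remaining):
            found.append(term)
            # Blank out the match so shorter terms inside it are not re-counted.
            remaining = re.sub(pattern, " ", remaining)
    return found

def missing_terms(source, draft, glossary):
    """List source terms whose prescribed translation is absent from the draft."""
    return [t for t in find_terms(source, glossary) if glossary[t] not in draft]
```

A translation engine that consults the glossary up front would aim to leave this list empty, which is one plausible reading of why fewer manual edits are needed.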

Armed with dictionaries for multiple Indian languages, including Hindi, Marathi, Bengali, Gujarati and Kannada, this end-to-end machine translation and post-editing ecosystem has become the biggest translation facilitator for textbooks of different higher education courses.

Going further into the technicalities, Prof. Ramakrishnan explained that it begins with digitization of the input source material, “perhaps a textbook which may be available in any format currently”.

“The digitization internally invokes Optical Character Recognition (OCR) if the input is not machine-readable (such as scanned pages). Then the translation engine works guided by technical domain-specific dictionaries, which can be dynamically inserted. Our output from the translation engine, in conjunction with our post-editing tool, helps the publishing house bring the final output in much less than 1/6th the time that it would take otherwise,” he said.
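
The flow described above (digitize, run OCR only when needed, translate with a pluggable dictionary, then hand off for post-editing) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `SourceDocument`, `digitize`, `run_pipeline` and the `ocr` and `translate` callables are all hypothetical names, and the real OCR and translation engines are represented only as functions passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class SourceDocument:
    """Hypothetical input: either machine-readable text or scanned pages."""
    text: Optional[str] = None
    scanned_pages: Optional[List[str]] = None

def digitize(doc: SourceDocument, ocr: Callable[[str], str]) -> str:
    """Return the text directly if available; otherwise OCR each scanned page."""
    if doc.text is not None:
        return doc.text
    return "\n".join(ocr(page) for page in (doc.scanned_pages or []))

def run_pipeline(
    doc: SourceDocument,
    ocr: Callable[[str], str],
    translate: Callable[[str, Dict[str, str]], str],
    dictionary: Dict[str, str],
) -> Dict[str, str]:
    """Digitize, translate with the supplied dictionary, and queue for post-editing."""
    source_text = digitize(doc, ocr)
    # The dictionary is passed per call, so it can be swapped for each domain.
    draft = translate(source_text, dictionary)
    return {"source": source_text, "draft": draft, "status": "needs_post_editing"}
```

Keeping the source text alongside the draft in the result reflects the human-in-the-loop step: the post-editor works on the draft with the original at hand.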

Prof. Ramakrishnan said that the advantages and impact of Udaan resulted in the project winning the best demo paper award at CODS-COMAD 2023, a premier ACM international conference that focuses on scientific work in databases, data sciences, and their applications.

“We have also built a large open-source human-in-the-loop platform called https://decile.org/ which is playing a critical role in human-in-the-loop learning in the post-editing framework. Through a team of translators who work in close coordination, we have evolved the tool to be as publisher-friendly as possible, through features such as preservation of alignment between the original source and translation, tools for online vs. offline editing, among others,” he added.