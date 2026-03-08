Sarvam on Friday, March 6, announced that it is releasing its two foundational multilingual AI models, unveiled at the recently held India-AI Impact Summit 2026, under an open-source licence.
The 30-billion- and 105-billion-parameter large language models (LLMs) are reasoning models built from scratch and trained on large-scale, high-quality datasets curated in-house, the Indian AI startup said in a blog post. Both models were trained on GPU (graphics processing unit) compute made available under the Indian government-backed Rs 10,372-crore IndiaAI Mission, with infrastructure support from data centre operator Yotta and technical support from Nvidia, Sarvam said.
While the two AI models were first introduced at the AI Impact Summit 2026, hosted by India in New Delhi last month, Sarvam has now made them available for commercial use under the Apache 2.0 open-source licence, with the model weights available for download on the AIKosh and Hugging Face platforms. Both models are also accessible via Sarvam’s Indus AI chatbot app and through the company’s API developer dashboard.
In recent days, Sarvam has emerged as the flagbearer of India’s ‘sovereign AI’ push, as the central government seeks to reduce dependence on foreign AI giants such as OpenAI and Anthropic by enabling the development of smaller, efficient models that are tailored to local Indian languages and use cases.
However, some observers have also questioned whether so-called sovereign AI models can be open-weight, as allowing anyone in the world to freely modify and distribute them raises a key question about what exactly constitutes sovereignty in the context of AI.
Sarvam said that internally, the 30B model powers its conversational agent platform, Samvaad, while the larger 105B model is the foundation of its Indus AI assistant, which is built for complex reasoning and agentic workflows. Both models have also been optimised for deployment across a wide range of hardware, including personal devices such as laptops.
“Building these models required developing end-to-end capability across data, training, inference, and product deployment. With that foundation in place, we are ready to scale to significantly larger and more capable models, including models specialised for coding, agentic, and multimodal conversational tasks,” Sarvam wrote in the blog post.
Both models use a mixture-of-experts (MoE) transformer architecture, which activates only a fraction of their total parameters for each token and so significantly reduces computing costs, Sarvam said. The 30B model supports a 32,000-token context window aimed at real-time conversational use, while the larger model offers a 128,000-token window for more complex, multi-step reasoning tasks.
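To illustrate why an MoE model only "uses" part of its parameters per token, here is a minimal, generic sketch of top-k expert routing. It is not Sarvam's implementation: the number of experts, the value of k, the toy experts and the parameter counts are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Illustrative sketch of top-k mixture-of-experts routing. Expert count,
# k and parameter sizes are hypothetical and NOT Sarvam's configuration.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; the other experts are never run."""
    out = [0.0] * len(x)
    for idx, gate in route_token(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

def active_fraction(num_experts, k, expert_params, shared_params):
    """Share of all parameters touched per token (shared layers plus k experts)."""
    total = shared_params + num_experts * expert_params
    active = shared_params + k * expert_params
    return active / total

# Four toy "experts" that simply scale their input by 0, 1, 2 and 3.
experts = [lambda x, s=s: [s * v for v in x] for s in range(4)]
print(route_token([0.1, 2.0, -1.0, 2.0], k=2))  # experts 1 and 3, weight 0.5 each
print(moe_forward([1.0, 2.0], experts, [0.1, 2.0, -1.0, 2.0], k=2))  # [2.0, 4.0]
print(active_fraction(num_experts=8, k=2, expert_params=100, shared_params=200))  # 0.4
```

In this made-up setup only 40 per cent of the parameters are involved in each token's forward pass; production MoE models typically use many more experts, which pushes the active fraction much lower.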
On efficiency, Sarvam 30B uses Grouped Query Attention (GQA) to shrink the memory taken up by the key-value (KV) cache while maintaining strong performance. Sarvam 105B, on the other hand, relies on DeepSeek-style Multi-head Latent Attention (MLA), which reduces memory requirements further for long-context inference.
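The memory savings from GQA and MLA can be seen with some back-of-the-envelope arithmetic. The sketch below compares how many values per token each attention variant must keep in the KV cache. The model shape (32 layers, 32 query heads of dimension 128, 4 KV heads for GQA, and an MLA latent of 512 plus 64 RoPE dimensions) is a hypothetical example, not the published configuration of either Sarvam model.

```python
# Back-of-the-envelope KV-cache sizing for one sequence. The model shape below
# (32 layers, 32 query heads, head_dim 128, 4 KV heads, MLA latent 512 + 64 RoPE dims)
# is a hypothetical example, NOT the published Sarvam 30B or 105B configuration.

def kv_cache_bytes(elems_per_token_per_layer, num_layers, seq_len, bytes_per_elem=2):
    """Cache size = cached elements per token per layer x layers x tokens x bytes."""
    return elems_per_token_per_layer * num_layers * seq_len * bytes_per_elem

def mha_elems(n_heads, head_dim):
    # Standard multi-head attention caches a key and a value for every head.
    return 2 * n_heads * head_dim

def gqa_elems(n_kv_heads, head_dim):
    # GQA: groups of query heads share one K/V head, so only n_kv_heads are cached.
    return 2 * n_kv_heads * head_dim

def mla_elems(latent_dim, rope_dim):
    # DeepSeek-V2-style MLA caches one compressed latent vector per token
    # plus a small decoupled RoPE key, instead of per-head keys and values.
    return latent_dim + rope_dim

LAYERS, SEQ_LEN, GIB = 32, 128_000, 2**30  # 128K tokens, bf16 (2 bytes per element)

for name, elems in [("MHA", mha_elems(32, 128)),
                    ("GQA", gqa_elems(4, 128)),
                    ("MLA", mla_elems(512, 64))]:
    size = kv_cache_bytes(elems, LAYERS, SEQ_LEN) / GIB
    print(f"{name}: {elems} elements/token/layer -> {size:.2f} GiB")
```

With these made-up numbers, GQA cuts the cache to one-eighth of the multi-head baseline and MLA to roughly 7 per cent of it; the real ratios depend entirely on each model's head counts and latent size.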
The data used to train both models includes code, general web data, specialised knowledge corpora, mathematics, and multilingual content. Sarvam said that a substantial portion of the training budget went toward curating a multilingual corpus covering the 10 most widely spoken Indian languages.
The Sarvam 105B model performed better than the 30B model on benchmarks during the early stages of training, which suggests efficient scaling behaviour, Sarvam said.
When compared with LLMs of similar size, the 105B model achieved results comparable to gpt-oss-120B and Qwen3-Next (80B) on general capabilities. It also showed strong performance on agentic reasoning and task completion, outperforming DeepSeek R1, Gemini 2.5 Flash, and o4-mini on Tau2-Bench.
However, Sarvam 105B may not be the strongest option for code generation, as its performance on SWE-Bench Verified lagged behind that of the models it was compared against. As for the smaller 30B model, the results showed it slightly ahead of Nemotron 3 Nano 30B in coding (SWE-Bench Verified) and agentic reasoning (Tau2-Bench) but slightly behind on other benchmarks such as LiveCodeBench v6 and BrowseComp.
Sarvam also said that its 30B model delivers 20 to 40 per cent higher throughput, measured in tokens per second, than Qwen3, which it attributes to code and kernel optimisations.
Sarvam’s performance on Indian languages is aided by its tokenizer, which was built and trained from scratch to tokenize text efficiently across all 22 scheduled Indian languages, spanning 12 different scripts.
Measured by fertility score, the average number of tokens needed to represent a word (lower is better), Sarvam’s tokenizer encoded Indic text more efficiently than other open-source tokenizers, the company said.
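Fertility is simple to compute once a tokenizer is available: total tokens divided by total words. The sketch below contrasts two toy tokenizers on a short Hindi sentence. Both are hypothetical stand-ins: a byte-level fallback mimics a tokenizer with no Indic-specific vocabulary, and a greedy longest-match tokenizer with a tiny made-up vocabulary mimics one trained on Indic text. Real comparisons run the actual tokenizers over large corpora.

```python
# Fertility = total tokens / total words; lower means fewer tokens per word.
# Both tokenizers are toy stand-ins (hypothetical), not Sarvam's tokenizer.

def fertility(words, tokenize):
    """Average number of tokens needed to represent one word."""
    if not words:
        raise ValueError("need at least one word")
    return sum(len(tokenize(w)) for w in words) / len(words)

def byte_tokenize(word):
    # Worst case for a tokenizer with no Indic-specific merges: one token per
    # UTF-8 byte (each Devanagari code point takes 3 bytes).
    return list(word.encode("utf-8"))

def make_greedy_tokenizer(vocab):
    """Greedy longest-match over a fixed vocabulary, falling back to single characters."""
    def tokenize(word):
        tokens, i = [], 0
        while i < len(word):
            # Try the longest substring first; a single character always matches.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab or j == i + 1:
                    tokens.append(word[i:j])
                    i = j
                    break
        return tokens
    return tokenize

words = "भारत एक देश है".split()  # Hindi: "India is a country"
indic_aware = make_greedy_tokenizer({"भारत", "एक", "है"})  # hypothetical toy vocabulary

print(fertility(words, byte_tokenize))  # 8.25
print(fertility(words, indic_aware))    # 1.5
```

The word missing from the toy vocabulary ("देश") falls back to three single-character tokens, which is exactly the kind of fragmentation that raises fertility when a tokenizer has poor coverage of a script.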
In the supervised fine-tuning stage, Sarvam said that it fine-tuned both models on a dataset covering standard and India-specific risk scenarios. The dataset also included adversarial and jailbreak-style prompts mined through automated red-teaming. These prompts were paired with policy-aligned, safe completions for supervised training, as per the company.