Nvidia on Wednesday, March 11, unveiled its latest Nemotron AI model designed to power and scale complex agentic AI systems.

The Nemotron 3 Super is a 120‑billion‑parameter, open-weight model with 12 billion active parameters. It has advanced reasoning capabilities and can be used to run autonomous AI agents that can complete tasks with efficiency and high accuracy, the chip giant said in a blog post.

Since the model has been released with open weights under a permissive licence, developers can deploy and customise it on workstations, in data centres, or in the cloud. Researchers can further fine-tune the model on Nvidia’s NeMo platform.

Nvidia said that Nemotron 3 Super has been trained entirely on synthetic data generated using frontier AI reasoning models. The company has published 10 trillion tokens of pre- and post-training datasets, along with the training methodology, reinforcement learning environments, evaluation practices, and more.

The trillion-dollar company has made a fortune supplying Graphics Processing Units (GPUs) to customers who are building AI models. However, its own Nemotron 3 series of AI models continues to position the chipmaker as an LLM provider, especially at a time when some of Nvidia’s biggest customers are looking to develop their own chips and reduce reliance on its technology.

Recent reports suggest that Nvidia is looking to launch an OpenClaw rival called ‘Nemo Claw’ to allow companies to deploy AI agents similar to the popular open-source agentic AI platform, except that Nemo Claw will likely be integrated with Nvidia’s security and privacy tools.

With Nemotron 3 Super, Nvidia said it is looking to address key constraints such as the drastic increase in token usage and costs due to multi-agent workflows. It also said that models today are “sluggish” and not practical because agents have to reason at every step. To address these issues, Nvidia said that “Nemotron 3 Super has a 1‑million‑token context window, allowing agents to retain full workflow state in memory and preventing goal drift.”

Under the hood

Nemotron 3 Super is based on a hybrid mixture‑of‑experts (MoE) architecture, with 12 billion of its 120 billion parameters active at inference. It also uses a new technique that improves accuracy by activating four expert specialists for the cost of one to generate the next token at inference. Multi-Token Prediction helps the model predict multiple future words simultaneously, resulting in what Nvidia says is 3x faster inference.
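To illustrate the general idea behind mixture-of-experts routing, here is a minimal Python sketch of top-k expert selection. It is a toy example and not Nvidia's implementation: the eight scalar "experts", the router scores, and the function names are all hypothetical, and the sketch does not model the hybrid architecture or the four-experts-for-the-cost-of-one technique described above. Only the 12-billion and 120-billion figures come from the article.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalise so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by router weight."""
    out = 0.0
    for idx, weight in route_top_k(router_scores, k):
        out += weight * experts[idx](x)
    return out

# Hypothetical toy setup: 8 experts, each a simple scalar function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_scores, k=2)
output = moe_layer(2.0, experts, router_scores, k=2)

# Share of parameters active per token, using the figures quoted in the article.
active_fraction = 12e9 / 120e9
print(selected, output, f"{active_fraction:.0%} of parameters active")
```

Because only the selected experts run for each token, compute per token tracks the active parameter count rather than the total, which is why a 120-billion-parameter model can have the inference cost profile of a much smaller one.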

Using this combination of innovations, Nvidia claimed that Nemotron 3 Super can deliver up to 5x higher throughput and up to 2x higher accuracy. It runs on Nvidia’s latest Blackwell GPUs in NVFP4 precision, which further cuts memory requirements and makes inference up to 4x faster than FP8.
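A rough back-of-the-envelope calculation shows why lower precision cuts memory requirements. The Python sketch below estimates weight storage only, as parameter count times bits per parameter. It is an approximation under stated assumptions: it ignores any scaling metadata a 4-bit format may carry, as well as activations and the memory used by the context window, and the helper name is hypothetical. The 120-billion parameter count is the figure quoted in the article.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 120e9  # total parameter count quoted in the article

# Compare three storage widths; real formats add some overhead on top.
for name, bits in [("16-bit", 16), ("8-bit (FP8)", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

Under these assumptions, halving the bits per parameter halves the memory needed to hold the weights, so a 4-bit format needs about half the weight storage of FP8.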

The model can be used to perform complex subtasks within a multi-agent system, such as loading an entire codebase into context at once or loading thousands of pages of reports into memory. Its high-accuracy tool calling also ensures that AI agents deployed for tasks such as cybersecurity do not make execution errors.

Access and performance

On benchmarks, Nemotron 3 Super topped the Artificial Analysis evaluation for model efficiency and openness. It was also used to power an AI research agent that secured the No 1 position on the DeepResearch Bench and DeepResearch Bench II leaderboards for multistep research across large documents.

The model, along with the entire Nemotron 3 family of models, can be accessed via Perplexity, OpenRouter, Hugging Face, and build.nvidia.com.

Perplexity is offering its users access to Nemotron 3 Super for search and as one of 20 orchestrated models in Perplexity Computer. Developers of AI coding agents such as CodeRabbit, Factory, and Greptile are integrating the model into their AI agents along with proprietary models to achieve higher accuracy at lower cost, according to Nvidia.