Google has introduced Gemini 2.0, its new AI model for the agentic era. The tech giant, on Wednesday, unveiled the first model in the Gemini 2.0 family of models – Gemini 2.0 Flash, a workhorse model with low latency and enhanced performance. The company also shared the frontier of its agentic research through prototypes enabled by Gemini 2.0’s native multimodal capabilities.
In their article, Demis Hassabis, CEO of Google DeepMind and Koray Kavukcuoglu, CTO of Google DeepMind, said that Gemini 2.0 Flash builds on the success of 1.5 Flash – the company’s most popular model yet for developers. The 2.0 Flash outperforms 1.5 Pro on some key benchmarks at twice the speed. The new model also comes with some unique capabilities.
Along with support for multimodal inputs such as images, video, and audio, 2.0 Flash also supports multimodal outputs such as natively generated images along with text and steerable text-to-speech (TTS) multilingual audio. The company said that 2.0 Flash can also natively call tools like Google Search, code execution as well as third-party user-defined functions.
When it comes to performance, the 2.0 Flash shows significant advancement across most benchmarks when compared to the Gemini 1.5 Flash and Gemini 1.5 Pro. The new model excels in code generation tasks as it secured 92.9 per cent on Natural2Code, and this is a significant improvement over Gemini 1.5 Pro’s 85.4 per cent. When it comes to mathematical reasoning, the 2.0 Flash scored 89.7 per cent on MATH and 63 per cent on HiddenMath, also a considerable improvement over past iterations.
On factuality, 2.0 Flash scored 83.6 per cent, showing enhanced ability to offer accurate responses. The model scored 62.1 per cent on GQPA indicating better reasoning capabilities. The model saw a slight dip in long-context understanding (MRCR IM) with 69.2 per cent while Gemini 1.5 Pro obtained 82.6 per cent. The new model improves significantly on multimodal tasks with 70.7 per cent on MMMU, however, audio recognition remains low at 9.8 per cent.
Overall, the new Gemini 2.0 Flash outdoes its predecessors, especially in areas like coding, math, and multimodal understanding. The benchmark scores, as shared by Google, position 2.0 Flash as a robust model for complex tasks.
The new Gemini 2.0 Flash is available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI with multimodal input and text output for all developers, and text-to-speech and native image generation available to early-access partners. The company said that the general availability will follow in January along with more model sizes. To help developers build dynamic and interactive applications, Google is also releasing a new Multimodal Live API that has real-time audio, video-streaming input and the ability to use multiple and combined tools.
From Wednesday, Gemini users around the world will be able to access a chat optimised version of 2.0 Flash experimental by selecting it in the model drop-down on desktop, and mobile web and it will be available in the Gemini mobile app soon. Google has said that the Gemini 2.0 will be integrated into more Google products early next year.
The Gemini 2.0 Flash offers some ground-breaking capabilities allowing for advanced agentic experiences in AI. Features like multimodal reasoning, long-context understanding, complex instruction following, and enhanced latency pave the way for more innovation. Project Astra, which is built with Gemini 2.0, works as a universal AI assistant that excels in dialogue, tool integration such as Google Search, Maps, Lens, etc., and memory retention. Owing to improved latency, Astra enables seamless user interaction, and Google has said that it is expanding into wearable devices.
On the other hand, Project Mariner explores browser-based agentic tasks such as web task navigation using text, code, and images. Although in the early stages, Mariner focuses on safety with user oversight. Another key announcement is Jules, an AI code agent that integrates with GitHub workflows, assisting developers in planning execution. Moreover, Gemini 2.0 also powers agents in gaming, offering players real-time suggestions and using spatial reasoning in robotics for potential physical-world applications.
Google is prioritising safety and responsibility by employing rigorous testing, privacy safeguards, and collaboration with experts. The tech giant claims that the Gemini 2.0 Flash is a significant leap in AI innovation and marks the foundation for the exploration of bigger possibilities toward artificial general intelligence along with their ethical deployment.