Google launches Gemini 3 Flash, promising faster AI reasoning at lower cost

Gemini 3 Flash combines frontier-level performance with faster response times, and is now rolling out across Google’s consumer and developer platforms.

Gemini 3 Flash is Google’s attempt to prove that speed does not have to come at the expense of intelligence. (Image: Google)

On Wednesday, December 17, Google introduced Gemini 3 Flash, a new iteration of its AI model family designed to offer advanced reasoning and multimodal capabilities at considerably lower latency and cost.

The new model combines the deep reasoning strengths of larger frontier models with speed and efficiency optimised for real-time applications such as coding, agentic workflows, and complex analysis. Google claims that the model attains competitive benchmark performance while processing tasks faster than its predecessors, at a much lower cost per token.

“With Gemini 3, we introduced frontier performance across complex reasoning, multimodal and vision understanding and agentic and vibe coding tasks. Gemini 3 Flash retains this foundation, combining Gemini 3’s Pro-grade reasoning with Flash-level latency, efficiency and cost. It not only enables everyday tasks with improved reasoning, but also is our most impressive model for agentic workflows,” the company said in its blog. 

Gemini 3 Flash supports multimodal input, allowing instant reasoning over text, images, audio, and video prompts. The tech giant has highlighted the model’s ability to power responsive interactive experiences such as real-time video analysis, visual question answering, and automated data extraction at scale.

When it comes to performance, Gemini 3 Flash demonstrates that faster AI models do not need to trade intelligence for speed. According to Google, the model delivers frontier-level reasoning and knowledge performance, securing 90.4 per cent on the GPQA Diamond benchmark and 33.7 per cent on Humanity’s Last Exam without tools. These results place the model in the same league as much larger frontier models and ahead of Gemini 2.5 Pro across several benchmarks. On multimodal reasoning, Gemini 3 Flash reaches state-of-the-art performance with an 81.2 per cent score on MMMU Pro, comparable to Gemini 3 Pro.

Beyond its capabilities, the new model is designed for efficiency. It dynamically adjusts how much it "thinks" based on the complexity of the task, spending more effort on harder problems while remaining lightweight for everyday use. According to Google, it uses about 30 per cent fewer tokens on average than Gemini 2.5 Pro for typical workloads, improving on both performance and cost efficiency.

Speed remains its standout feature. Built on the Flash legacy, the model is claimed to be up to three times faster than Gemini 2.5 Pro while delivering higher overall performance, based on third-party benchmarks. Pricing reflects this efficiency, with input tokens costing $0.50 per million and output tokens priced at $3 per million, positioning Gemini 3 Flash as a high-performance yet cost-effective option for developers and enterprises.
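At those published rates, estimating the cost of a call is straightforward arithmetic. The sketch below uses the quoted per-million-token prices; the token counts in the example are illustrative assumptions, not figures from Google.

```python
# Rough per-request cost estimate from the published Gemini 3 Flash rates:
# $0.50 per million input tokens, $3.00 per million output tokens.
INPUT_RATE = 0.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 3.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Illustrative workload: a 2,000-token prompt producing a 500-token response.
print(f"${request_cost(2_000, 500):.4f}")  # 0.001 + 0.0015 = $0.0025
```

A million such requests would cost roughly $2,500 at these rates, which is the kind of budgeting developers weighing Flash against larger models would run.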


The model is rolling out worldwide as the default engine in the Gemini app and in AI Mode in Google Search, bringing next-generation AI responses to everyday users at no additional charge. Meanwhile, developers and enterprises can access Gemini 3 Flash via the Gemini API in Google AI Studio, Vertex AI, the Gemini CLI, and Android Studio.

 
