OpenAI unveils new o3 model: What is it and how is it different from o1?

The new o3 and o3 mini models from OpenAI are designed to solve some of the most complex problems.

o3 is a frontier AI model that has been developed to offer advanced reasoning and intelligence across a range of complex tasks. (Photo: Reuters)

OpenAI has announced o3, the improved version of its most advanced AI model to date. Its predecessor, the reasoning-focused o1 model launched in September, takes its time to think over responses to prompts from users. The new model is capable of delivering responses in an even more step-by-step, logical manner. OpenAI CEO Sam Altman, during the launch, said the company views o3 as the beginning of the next phase of AI, adding that these models can be used for more complex tasks that require “a lot of reasoning”.

When it comes to performance, the new o3 model surpasses o1 across several benchmarks. These include complex coding skills, competency in solving scientific problems, and advanced mathematical problems. The model is also claimed to be three times better than o1 at answering questions from the ARC-AGI test. In simple terms, ARC-AGI is a test designed to assess an AI model’s ability to comprehend and perform tasks without relying on its pre-trained knowledge. In essence, o3 models are capable of solving some extremely difficult math and logic problems that they are being exposed to for the first time.

As of now, OpenAI is starting with public safety testing, indicating the company’s cautious approach. If early results and benchmark performances are to be believed, o3 models could mark a significant step forward in the advancement of AI models.

What is o3 and how is it different from o1?

o3 is a frontier AI model that has been developed to offer advanced reasoning and intelligence across a range of complex tasks. It has been announced alongside a smaller version, o3 mini. The o3 model has been designed to solve some challenging problems in coding, general intelligence, and math. OpenAI has highlighted some notable benchmarks that show how o3 is capable of reasoning through more complex problems than older models could handle.

While o1 scored 48.9 per cent on SWE-bench Verified, a set of tests that assesses a model’s coding ability, the o3 model achieved 71.7 per cent accuracy. Similarly, in competitive programming (Codeforces), o1 scored 1891 while o3 scored 2727, well beyond its predecessor. o3 also surpassed o1 in mathematical reasoning, securing 96.7 per cent on AIME 2024 compared to o1’s 83.3 per cent. o3 showed similarly strong performance on science benchmarks: on GPQA Diamond, a test with PhD-level questions, it scored 87.7 per cent accuracy, in contrast to o1’s 78 per cent.

Meanwhile, the EpochAI Frontier Math benchmark is among the toughest mathematical benchmarks, with problems that have never been published before. The o3 model scored 25.2 per cent on this test, while older AI models from across the industry have failed to cross 2 per cent.

Is this the best reasoning model?

Perhaps the most significant aspect of the o3 model is its scores on the ARC-AGI benchmark. ARC-AGI stands for Abstraction and Reasoning Corpus for Artificial General Intelligence, and it was developed by French software engineer and AI researcher Francois Chollet. The test showcases an AI model’s ability to learn new skills from limited examples. While traditional benchmarks test pre-trained knowledge or pattern recognition skills, ARC-AGI has tasks that challenge models to learn rules and transformations they have never encountered before. These are tasks that humans can usually manage naturally, but that AI has always struggled with.

ARC-AGI is particularly tough as its tasks require direct reasoning skills, and models cannot rely on previously memorised solutions or templates. This pushes the model to adapt to entirely new challenges with each test. Each ARC-AGI task is unique: one may require the model to trace visual patterns, while another may require it to reason about numerical ones. With its expansive and diverse tasks, ARC-AGI is a reliable barometer of whether an AI model can think and learn like humans.

The o3 mini, meanwhile, is an affordable alternative to the o3 model. According to OpenAI, the mini version is ideal for tasks that need high accuracy under resource constraints. The o3 mini brings adaptive thinking, allowing users to adjust reasoning effort based on the complexity of a task. The model’s low-effort reasoning offers the speed and efficiency needed for simple tasks, while for complex tasks it uses higher effort for accuracy. The high-effort mode matches the performance of the larger o3 model but at a significantly lower cost. According to OpenAI, this flexibility makes the o3 mini best suited for developers and researchers.

When will o3 be available?

Both o3 and o3 mini are currently limited to researchers through OpenAI’s safety testing programme. Reportedly, the o3 mini model will be available towards the end of January 2025, with the full o3 model to follow after safety testing concludes.

Bijin Jose, an Assistant Editor at Indian Express Online in New Delhi, is a technology journalist with a portfolio spanning various prestigious publications. Starting as a citizen journalist with The Times of India in 2013, he transitioned through roles at India Today Digital and The Economic Times, before finding his niche at The Indian Express. With a BA in English from Maharaja Sayajirao University, Vadodara, and an MA in English Literature, Bijin's expertise extends from crime reporting to cultural features. With a keen interest in closely covering developments in artificial intelligence, Bijin provides nuanced perspectives on its implications for society and beyond.