
OpenAI may have spent over $30 million to benchmark its latest AI model, hints Economic Survey

O3 is OpenAI’s latest reasoning model, which the firm had announced last December.

In December, OpenAI's new o3 system scored a breakthrough 75.7 per cent in its low-compute configuration and 87.5 per cent in its high-compute configuration. (Reuters file photo)

ChatGPT creator OpenAI may have incurred a cost upwards of $30 million (172 times $200,000, or about $34.4 million) for running its latest o3 model’s high-compute configuration on the ARC-AGI benchmark, showcasing the astronomical costs involved not just in building high-end artificial intelligence (AI) models, but also in benchmarking them for general reasoning. The figures behind this estimate appear in the Economic Survey 2024-25, released Friday.


“…developing more sophisticated models comes with significant costs as well. Since processing user queries utilises vast computational resources, AI firms incur running costs for the model. For instance, in the case of OpenAI’s o3 model…the breakthrough in processing capability came at a very high cost. In running the ARC-AGI benchmark, which is considered one of the most challenging tasks for an AI to undertake, OpenAI incurred a cost of USD 200,000 for its low-efficiency model,” the Survey said.


“While the firm asked the author not to disclose its high-efficiency cost, the author does state that the amount to compute was 172 times the low-efficiency model’s figure. Running increasingly complex models is computationally tasking, exerting hardware, energy and other resource demands,” it added. The Chief Economic Advisor (CEA) is the author of the Economic Survey.

172 times $200,000 comes to $34.4 million.
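The arithmetic behind that estimate can be sketched as follows. This is only an illustration: it assumes that cost scales linearly with compute, whereas the Survey describes the 172 times figure as the amount of compute rather than a disclosed dollar cost. The variable and function names are hypothetical.

```python
# Figures as quoted from the Economic Survey 2024-25.
BASE_COST_USD = 200_000     # cost the Survey cites for the cheaper run
COMPUTE_MULTIPLIER = 172    # compute ratio cited for the costlier run


def estimate_cost(base_cost_usd: int, multiplier: int) -> int:
    """Estimate cost, assuming it grows linearly with compute (an assumption)."""
    return base_cost_usd * multiplier


estimated = estimate_cost(BASE_COST_USD, COMPUTE_MULTIPLIER)
print(f"${estimated / 1_000_000:.1f} million")  # prints "$34.4 million"
```

If the actual cost per unit of compute differed between the two runs, the real figure could be higher or lower than this estimate.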

The ARC-AGI benchmark, which stands for “Abstraction and Reasoning Corpus for Artificial General Intelligence,” is a test designed to evaluate an AI system’s ability to reason and adapt to new tasks. It presents the system with visual puzzles that require recognising patterns and applying rules to solve problems, and performance on it is often considered a key indicator of progress towards artificial general intelligence (AGI).

In December, OpenAI’s new o3 system, trained on the ARC-AGI-1 Public Training set, scored a breakthrough 75.7 per cent in its low-compute configuration and 87.5 per cent in its high-compute configuration. The higher the score, the closer a model is considered to be to AGI, that is, to actual human intelligence.

Soumyarendra Barik is Special Correspondent with The Indian Express and reports on the intersection of technology, policy and society. With over five years of newsroom experience, he has reported on issues of gig workers’ rights, privacy, India’s prevalent digital divide and a range of other policy interventions that impact big tech companies. He once also tailed a food delivery worker for over 12 hours to quantify the amount of money they make, and the pain they go through while doing so. In his free time, he likes to nerd about watches, Formula 1 and football.
