ChatGPT creator OpenAI may have incurred a cost upwards of $30 million – or 172 times $200,000, to be precise – for running its latest o3 model’s high-compute configuration on the ARC-AGI benchmark, showcasing the astronomical costs involved not just in building high-end artificial intelligence (AI) models, but in benchmarking them for general reasoning.

This cost was mentioned in the Economic Survey 2024-25 released Friday. o3 is OpenAI’s latest reasoning model, which the firm announced last December.

“…developing more sophisticated models comes with significant costs as well. Since processing user queries utilises vast computational resources, AI firms incur running costs for the model. For instance, in the case of OpenAI’s o3 model…the breakthrough in processing capability came at a very high cost. In running the ARC-AGI benchmark, which is considered one of the most challenging tasks for an AI to undertake, OpenAI incurred a cost of USD 200,000 for its low-efficiency model,” the Survey said.

“While the firm asked the author not to disclose its high-efficiency cost, the author does state that the amount to compute was 172 times the low-efficiency model’s figure. Running increasingly complex models is computationally tasking, exerting hardware, energy and other resource demands,” it added.

The Chief Economic Advisor (CEA) is the author of the Economic Survey. 172 times $200,000 works out to $34.4 million.

The ARC-AGI benchmark, which stands for "Abstraction and Reasoning Corpus for Artificial General Intelligence," is a test designed to evaluate an AI system's ability to reason and adapt to new tasks. It essentially measures a model's "general intelligence" by presenting it with visual puzzles that require understanding patterns and applying rules to solve problems, and is often considered a key indicator of true artificial general intelligence (AGI).
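The Survey's headline figure can be verified with simple back-of-the-envelope arithmetic. The short sketch below is illustrative only and not from the Survey itself:

```python
# Disclosed cost of the low-efficiency run on ARC-AGI, per the Survey
low_efficiency_cost_usd = 200_000

# The Survey states the undisclosed high-compute figure was 172 times higher
multiplier = 172

high_compute_cost_usd = multiplier * low_efficiency_cost_usd
print(f"${high_compute_cost_usd / 1_000_000:.1f} million")  # $34.4 million
```

This is how the article's "$34.4 million" figure is derived from the two numbers the Survey does disclose.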
In December, OpenAI's new o3 system – trained on the ARC-AGI-1 Public Training set – scored a breakthrough 75.7 per cent in its low-compute configuration and 87.5 per cent in its high-compute configuration. The higher the score, the closer a model is considered to be to AGI – a measure of how close an AI model is to actual human intelligence.