OpenAI claimed that Deep Research achieved a new high score of 26.6% accuracy in Humanity’s Last Exam. (Screenshot: OpenAI)
In its first major release since the upheaval sparked by DeepSeek, OpenAI launched a new AI tool coyly named Deep Research on Sunday, February 2.
Simply put, Deep Research is an AI tool capable of gathering information from across the internet and synthesising it to create a comprehensive report at the level of a research analyst, according to OpenAI.
OpenAI unveiled Deep Research’s capabilities in a demonstration video posted on YouTube. The company also privately showcased the AI tool to US lawmakers and other government officials in Washington DC last week, according to The New York Times.
The AI startup behind ChatGPT has been put in a precarious position by DeepSeek, which claims that its reasoning model (R1) rivals OpenAI’s o1 model in performance while using less advanced graphics processing units (GPUs) and costing significantly less to develop.
This purported breakthrough in compute efficiency has led to a perception that OpenAI is losing ground to Chinese upstarts like DeepSeek in the AI arms race. In response to the buzz around DeepSeek, OpenAI CEO Sam Altman said that the company would “deliver much better models” and “pull up some releases.”
deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price.
we will obviously deliver much better models and also it’s legit invigorating to have a new competitor! we will pull up some releases.
— Sam Altman (@sama) January 28, 2025
Deep Research is the second AI agent released by OpenAI in one month. A few weeks ago, it previewed an AI agent called Operator that is capable of searching on the internet and visiting websites on a user’s behalf to shop for groceries, make restaurant reservations, etc.
While OpenAI’s Operator agent is built on top of its multimodal large language model (LLM) GPT-4o, Deep Research seems to be the first AI tool to be powered by a version of the upcoming o3 model that has been optimised for web browsing and data analysis.
According to OpenAI, Deep Research builds on the reasoning capabilities of o1 to carry out complex tasks that require “extensive context and information gathering from diverse online sources.”
Deep Research was trained on real-world tasks across a wide range of domains that demand the use of browser and Python tools. The training methodology employed was end-to-end reinforcement learning.
“Through that training, it learned to plan and execute a multi-step trajectory to find the data it needs, backtracking and reacting to real-time information where necessary,” OpenAI said in a blog post published on Sunday.
In response to a user’s prompt, Deep Research will provide a comprehensive report after finding, analysing, and synthesising content such as text, images, and PDFs from several online sources.
The AI-generated research reports will contain synthesised graphs and images from websites, along with clear citations and a summary of its thinking. Users can also upload specific files and spreadsheets to add context to their prompt.
OpenAI claimed Deep Research is especially good at uncovering hard-to-find information that would usually require searching through multiple websites. For example, a user could describe a scene from any TV show and ask it to dig through the web to find the exact episode where it happened.
“[Deep Research] can do complex research tasks that might take a person anywhere from 30 minutes to 30 days,” Kevin Weil, OpenAI’s chief product officer, was quoted as saying.
But the company has acknowledged that the AI tool can still get things wrong. “It may struggle with distinguishing authoritative information from rumors, and currently shows weakness in confidence calibration, often failing to convey uncertainty accurately,” OpenAI said.
Deep Research is currently accessible through the web version of ChatGPT as part of its Pro subscription package that costs $200 per month (Rs 16,950 approx.). Users have to select the ‘deep research’ tab in the message composer before entering their prompts.
“Once it starts running, a sidebar appears with a summary of the steps taken and sources used […] You’ll get a notification once the research is complete. The final output arrives as a report within the chat,” the company said.
Deep Research is also very compute-intensive: the longer it takes to research a query, the more processing power it requires. It is currently not accessible in the UK and the European Union (EU).
OpenAI claimed that the optimised version of o3 powering Deep Research achieved a new high score of 26.6% accuracy in Humanity’s Last Exam, a recently released benchmark that tests AI models on expert-level questions across a broad range of subjects.
“Compared to OpenAI o1, the largest gains [of Deep Research] appeared in chemistry, humanities and social sciences, and mathematics,” it said.
Furthermore, Deep Research’s underlying o3 model topped another public benchmark test called GAIA that evaluates AI models on their reasoning, multi-modal fluency, web browsing, and tool-use abilities, with questions spanning three levels of difficulty.
On what’s next, OpenAI said Deep Research will be upgraded to embed images, data visualisations, and other analytic outputs in its AI-generated research reports. Users will also be able to add more specialised data sources to make Deep Research’s outputs “even more robust and personalised”.