OpenAI warns next-gen AI models could pose high cybersecurity risks; readies defences

OpenAI is not the only one trying to tamper-proof its own AI models in anticipation of a future with frequent, more sophisticated AI-led cybersecurity threats.

A man holds a laptop computer as cyber code is projected on him in this illustration picture taken on May 13, 2017. (REUTERS/Kacper Pempel/Illustration)

OpenAI has warned that its upcoming AI models could demonstrate ‘high’ levels of capability in cybersecurity and pose serious risks if misused.

The next generation of AI models could, for instance, be used to remotely deploy zero-day exploits against well-defended systems or enable threat actors to compromise complex enterprise operations, leading to real-world impact, the ChatGPT maker said in a blog post on Wednesday, December 10.

For its part, OpenAI said that it is investing in strengthening its models for defensive cybersecurity tasks and in developing tools that enable cybersecurity teams to audit code and patch vulnerabilities more easily. “Our goal is for our models and products to bring significant advantages for defenders, who are often outnumbered and under-resourced,” the company said.

OpenAI is not the only one that appears to be tamper-proofing its own AI models and tools in anticipation of a future with frequent and more sophisticated AI-led cybersecurity threats. Earlier this week, Google announced it is upgrading its Chrome browser security architecture against indirect prompt injection attacks that could be used to hijack AI agents – ahead of rolling out Gemini agentic capabilities in Chrome more widely.

In November 2025, Anthropic disclosed that threat actors, possibly a Chinese state-sponsored group, had manipulated its Claude Code tool to carry out a highly sophisticated AI-led espionage campaign that was disrupted by the AI startup.

To highlight how quickly AI’s cybersecurity capabilities have advanced, OpenAI said that GPT-5.1-Codex-Max scored 76 per cent on capture-the-flag (CTF) challenges last month, up from the 27 per cent scored by GPT-5 in August this year.

Layered safety stack

To mitigate the risks, OpenAI said it is taking a defence-in-depth approach that combines access controls, infrastructure hardening, egress controls, and monitoring. In terms of more concrete steps, the Microsoft-backed AI startup said it is:


– Training AI models to refuse or safely respond to harmful requests while remaining helpful for educational and defensive use cases.
– Improving system-wide monitoring across products that use frontier models to detect potentially malicious cyber activity.
– Working with expert red teaming organisations to evaluate and improve safety mitigations.

Aardvark, its AI agent designed to double as a security researcher, is currently in private beta. The agent scans codebases for vulnerabilities and proposes patches that maintainers can adopt quickly. It will be made available for free to select non-commercial open source repositories, OpenAI said.

As for broader ecosystem-focused initiatives, OpenAI said it will set up a Frontier Risk Council, an advisory group comprising external cybersecurity experts, along with a trusted access programme for users and developers.

 
