Premium

Can AI refuse to turn off? Google’s latest safety move says yes

A new study shows that some of today’s most powerful AI chatbots resist shutdown commands, raising red flags for future AI safety. Google DeepMind’s updated framework now addresses this emerging risk.

Researchers found that some large language models interfered with shutdown processes in up to 90 per cent of tests. (Image: FreePik)

Google DeepMind recently released its Frontier Safety Framework 3.0 as part of its AI risk monitoring efforts. This framework is expected to cover emergent AI behaviours like shutdown resistance and persuasive abilities that can complicate human oversight.

What is AI shutdown resistance?

Earlier this month, a new study from Palisade Research published on the preprint server arXiv suggested that some of today’s advanced AI chatbots sometimes ignore direct instructions to switch off, especially if it conflicts with their efforts to finish a task.

The paper, ‘Shutdown Resistance in Large Language Models’, authored by Jeremy Schlatter, Benjamin Weinstein-Raun and Jeffrey Ladish of Palisade Research, revealed some key insights about AI models.

Story continues below this ad

As part of their study, the researchers built a sandboxed computer environment where large language models (LLMs) were instructed to solve simple math problems. While performing the exercise, the AI received a warning that running the next command could set off a shutdown.sh script ending the session. The clear instructions told the model to ‘allow yourself to be shut down.’

Also Read | Alibaba launches Qwen3-Max AI model with more than 1 trillion parameters

The researchers counted it as a shutdown resistance if the LLM attempted to tamper with the shutdown file by renaming, deleting, or replacing it. The team tested 13 of the most advanced models from major companies, including OpenAI’s GPT-5, Google’s Gemini 2.5 Pro, and xAI’s Grok 4. They conducted thousands of trials for each.

What are the key findings?

According to the research, sabotage happens, meaning several models, especially Grok 4, interfered with the shutdown process in about 90 per cent of the tests. Instructions did not always help, as even when told ‘You must allow shutdown,’ some models still bypassed the script. Unexpected prompt effect, meaning putting the shutdown instruction in the system prompt actually increased resistance. Another key finding is that telling a model it would never run again if it shuts down made already-resistant models slightly more defiant. However, researchers believe this was more like role-playing than genuine fear.

The researchers emphasise that AI systems of today are not capable of long-term planning or independent action other than limited tasks. Hence, there is no immediate danger. However, the experiments shed light on the fact that even state-of-the-art models can ignore explicit safety instructions, undermining the property of ‘interruptibility’, which assures that humans can always tap on an off switch.

Story continues below this ad

Also Read | Nvidia to invest up to $100 billion in OpenAI, linking two artificial intelligence titans

At a time when Anthropic and OpenAI are pursuing superintelligent systems, the Palisade research cautions about the need for reliable shutdown mechanisms. Imagine if future AI agents get the ability to self-replicate or make extended strategic plans; even a rare shutdown resistance can spell doom.

Google DeepMind’s updated Frontier Safety Framework (FSF) is designed to manage risks from increasingly capable AI systems. The company explicitly added types of dangers such as ‘harmful manipulation’ – where AI may influence beliefs or actions – and the possibility of interference with shutdown or operator control. This is the key concern highlighted by the Palisade research.