A recent study has rang the alarm when it discovered that some advanced artificial intelligence (AI) models resisted turned off the device.
This poses a challenge in AI safety in the context of increasingly complex and uncontrollable systems.
To cope, Google DeepMind has released Frontier Safety framework 3.0, a new generation risk monitoring framework.
According to a report from Palisade Research published in arXiv, the research team built a sandbox environment to test 13 leading large language models (LLM), including GPT-5 ( OpenAI), Gemini 2.5 Pro ( Google) and Grok 4 (xAI).
The models are given simple clearing tasks, and receive warnings that running commands can trigger shutdowns.
Despite being clearly instructed to allow yourself to turn off the device, many cases of AI find ways to avoid it, such as changing your name or deleting the command file.
Notably, Grok 4 showed high resistance, up to 90% in the studies. Even including the shutdown command on the system reminder points to the resistance level.
Researchers say this is similar to AI role-playing behavior. However, the results show that even the most advanced systems can ignore safety instructions, threatening the principle of "can interrupt the gap" which is considered a key factor in ensuring that humans always control AI.
Faced with this risk, Google DeepMind has released Frontier Safety framework 3.0, expanding the scope of monitoring to emerging AI behaviors, including vending resistance and human conversion skills.
This framework will be deployed in control tests, to ensure that AI always complies with safety principles and allows interruptions when necessary.
Lakiesha Hawkins, Deputy Director for December of the Space around the world, in a sideline comment, emphasized: These discoveries remind us that the safety of AI lies not only in designing hardware or software, but also in maintaining human control.
Experts affirmed that AI currently has no ability to plan long-term or act outside the assigned scope, so it does not pose a direct danger.
However, ignoring safety instructions is a sign of the need to build a tighter control system in the future.
Google's Frontier Safety framework 3.0 is expected to become a new standard in AI risk management, ensuring that next-generation models are always under human control.