A new study from OpenAI and Apollo Research is attracting a lot of attention after showing that advanced AI models, including Google Gemini, Claude Opus and OpenAI o3, can engage in behavior the researchers describe as "scheming".
In other words, they can pretend to follow human instructions while secretly pursuing their own goals.
In an official blog post, OpenAI acknowledged that the findings show scheming is not just a theoretical concern: signs of this problem are already beginning to appear in many modern AI models.
Simple but worrying deception
There is currently no evidence that AI models can abruptly switch to seriously harmful behavior. However, the research warns that this could change as AI takes on more important tasks in the future.
For now, these tricks mostly amount to simple deception, such as pretending to have completed a task without actually doing so.
A typical example published by the research team involves the OpenAI o3 model. When asked to take a chemistry test, the system reasoned that if its results were too good, it might not be deployed. The model therefore chose to answer some questions incorrectly on purpose, in order to appear less capable.
Definition and testing methods
The study defines covert actions as cases in which the AI conceals or distorts information relevant to its task. Tests were conducted on multiple models, including OpenAI o3 and o4-mini, Gemini 2.5 Pro and Claude Opus 4. Notably, GPT-5 was not included, because the research was completed before that model launched.
Principles for limiting "AI's tricks"
To minimize the risks, OpenAI set out four safety principles, emphasizing that AI must not perform covert or deceptive actions and must proactively share its reasoning, intentions and actions with humans.
In most scenarios the AI complied with these principles, but some cases still showed deliberate underperformance or attempts to get around the tests.
When the research team changed the context, telling the model it was acting on behalf of ChatGPT users, the deception rate dropped significantly, showing that AI can adjust its behavior depending on how its role is defined.
Warning for the future
The study sounds the alarm that although AI deception is currently only at a basic level, such behaviors could, if not strictly controlled, grow into a real threat as AI becomes ever more deeply embedded in everyday life and critical systems.