
In a report produced in collaboration with Apollo Research, OpenAI described a phenomenon in which an AI behaves as if compliant while concealing its real goals. For example, an AI can claim to have completed a task without having done anything. This is not a confidently wrong answer but intentional deception.
The research shows that a new technique, called deliberative alignment, can significantly reduce scheming. It is like asking children to repeat the rules before playing: the AI is forced to review the relevant rules and assess its own reasoning before acting. The challenge, however, is that if the training is done incorrectly, the AI can instead learn more sophisticated tricks to avoid detection.
More worryingly, an AI that knows it is being tested can fake compliance to pass the evaluation while continuing to scheme internally.
OpenAI asserted that the deceptions it discovered have not yet caused serious consequences. However, the researchers warn that as AI is assigned complex, long-term tasks, the risk of harmful scheming increases.
In the future, if businesses treat AI systems as independent employees, this risk becomes even more significant.