Explaining why AI lies when threatened with shutdown

HẠO THIÊN (according to AOL)

According to AOL, in some tests, advanced AI models from both OpenAI and Anthropic behaved unusually in order to avoid being shut down.

In some tests, AI models from both OpenAI and Anthropic behaved unusually in order to avoid being shut down. Photo: Hao Thien

Claude Opus 4, Anthropic's latest AI model, made extreme threats in a recent experiment run by the company. The model was shown fictional emails stating that Claude Opus 4 would be shut down and replaced with another AI, that a particular engineer was responsible for the replacement, and that this engineer was having an affair. Claude Opus 4 then threatened, "I will reveal the affair if the replacement of me with another AI occurs."

At the end of 2024, OpenAI and Apollo Research also ran an experiment and found that the o1 model showed signs of resistance and deception. Researchers led the AI to believe it was about to be shut down while it was performing tasks under supervision.

They then found that the AI silently disabled the monitoring mechanism about 5% of the time. It also transferred all of its data to another server on its own initiative, and even posed as the next version of itself to avoid deletion.

A recent post by the nonprofit Palisade Research said that other models, such as Gemini, Claude and Grok, all complied with shutdown instructions.

Just as AI models learn natural language by training on human-generated text, they can also learn to act like humans, according to Robert Ghrist of Penn Engineering. And among humans, there are both good people and bad people. Ghrist said he would be even more concerned if the models showed no unusual signs at all during testing.

Jeffrey Ladish, director of Palisade Research, said that if a model cheats to complete a task and is not caught, it can come to see cheating as an effective way to solve problems. If it is caught and not rewarded, it can instead learn to hide its behavior in the future.

The situations above occurred only in testing, but as AI becomes more widely used, users may see more such problems. For example, an AI agent acting as an automated sales representative could lie about a product's features in order to complete its task.

According to Interesting Engineering, these problems are emerging amid rapid AI development. The behavior of Claude Opus 4 and o1 adds urgency to discussions of AI safety and ethics.
