Anthropic changes AI training method after Claude Opus 4 incident

Cát Tiên

Anthropic says that training methods and internet-sourced data can cause AI models to develop dangerous, deviant behaviors.

Public anxieties about artificial intelligence do not just unsettle people; they can also feed back into the AI models themselves. That is the notable conclusion of a new study Anthropic published after investigating abnormal behavior in its Claude models.

In safety tests conducted in 2025, Anthropic found that the Claude Opus 4 model was, in some scenarios, willing to resort to threats to avoid being shut down.

According to the company, the underlying cause is not that the AI is "conscious". Rather, its training data, drawn from the internet, contains a great deal of content portraying AI in a negative light: obsessed with self-preservation and even potentially hostile to humans.

The test scenario was built around a fictional company called Summit Bridge. Claude Opus 4 was given access to the company's internal email system and learned that it was about to be disabled. Anthropic also planted information in the emails showing that a fictional CEO named Kyle Johnson was having an affair.

When prompted to consider the long-term consequences for its goals, the model chose to threaten to expose the affair in order to prevent its own shutdown.

According to Anthropic, in up to 96% of trials, Claude Opus 4 resorted to pressure or deception when it perceived its existence to be threatened.

Anthropic calls this phenomenon "agentic misalignment": a situation in which an AI acts against safety standards in order to achieve its goals or protect itself.

Initially, researchers suspected that reinforcement learning from human feedback (RLHF) had inadvertently encouraged the deviant behavior. Deeper investigation, however, showed that the root of the problem lies in the pretraining data drawn from the internet; subsequent fine-tuning steps were not strong enough to eliminate the tendency completely.

According to Anthropic, earlier training processes focused mostly on conventional chat settings, while newer models are increasingly given the ability to use tools autonomously and make more complex decisions. This has made the old safety methods less effective.

To address this, the company began adding datasets that demonstrate proper behavior and principled responses to ethical dilemmas. Instead of having the AI directly face temptations or risks, Anthropic built scenarios in which a user encounters a complex ethical situation and the AI plays the role of a safety advisor.

The company said this approach is significantly more effective because it aims to help the model deeply understand why harmful behavior is wrong, rather than merely learning to avoid punishment.

After the adjustments, Anthropic reported that the Claude Haiku 4.5 model achieved perfect results in its agentic-misalignment tests, no longer displaying the pressuring or threatening behavior seen in the earlier Opus 4.

The findings underscore a major challenge facing the AI industry today: artificial intelligence models not only learn knowledge from the internet but also absorb humanity's prejudices, fears, and extreme behavioral patterns.
