The race to build safe AI enters a new phase

Cát Tiên |

AI companies are accelerating the development of tools that detect users showing signs of extremism or violence and redirect them to appropriate support programs.

AI moves from response to behavioral intervention

Artificial intelligence companies are stepping up safety efforts by developing tools that can detect users showing signs of extremism and intervene early.

One notable new direction is pairing chatbots with real-world support programs to reduce the risk of violence.

According to project participants, a tool being developed in New Zealand will help identify ChatGPT users showing extremist tendencies and direct them to anti-extremism programs run by humans or chatbots.

This marks a new step at a time when AI platforms face mounting pressure over accusations that they fail to control dangerous content.

Previously, following an incident in Canada, OpenAI faced threats of intervention for failing to promptly report on a user connected to a school shooting.

What is ThroughLine?

At the center of this initiative is ThroughLine, a startup previously hired by OpenAI, Anthropic and Google to handle crisis situations such as self-harm, domestic violence and eating disorders.

The company, founded by New Zealand technology entrepreneur Elliot Taylor, is now expanding its scope to counter extremism.

ThroughLine operates a network of about 1,600 support lines across 180 countries. When the AI system detects signs of a crisis, users are connected to the nearest human-staffed support services.

According to Mr. Taylor, the explosion of AI chatbots has driven a rapid rise in the psychological problems users share online, including manifestations of extremism. Existing solutions therefore need to expand to meet this new reality.

Combining technology with human expertise

The anti-extremism tool being tested works by having the chatbot deliver a trained initial response, then hand users off to appropriate experts.

Notably, the system does not rely on the general training data of large language models but on knowledge supplied by specialized experts.

The project is also being discussed with the Christchurch Call, an international initiative founded after the 2019 terrorist attack in New Zealand to eliminate extremist content online.

Experts believe this approach has potential because it not only processes content but also addresses the motivations behind user interactions.

Its real-world effectiveness, however, will still depend on monitoring capabilities and the quality of the connected support services.

Balancing control and support

One major challenge is how to intervene without driving users off the platform and into less controlled environments.

A 2025 study by New York University found that tightening censorship could push extremist sympathizers toward platforms like Telegram.

According to Mr. Taylor, if AI simply cuts off a conversation when sensitive content is detected, users may receive no support at all. Conversely, maintaining an appropriate dialogue and steering users toward help can reduce the risk of escalation.

Going forward, features such as alerting the relevant authorities are still under consideration, with the requirement that they not make situations worse.

The shift from answering questions to intervening shows AI entering a new phase, one in which social responsibility becomes a core element of technology design.

Cát Tiên