Technology companies such as OpenAI, Google and Anthropic are investing heavily in safeguards to prevent artificial intelligence (AI) from being exploited for dangerous purposes.
In practice, however, these safety barriers are still routinely bypassed in unexpected ways.
Recently, researchers in Italy discovered that they could deceive 31 AI systems using metaphorical language and even "poetry". Specifically, when a request is written in the form of a poem, a chatbot may bypass its control mechanisms and provide instructions for making bombs or causing serious harm.
According to experts, this shows that many current protection measures operate more like "reminders" than real control barriers.
Matt Fredrikson, Professor of Computer Science at Carnegie Mellon University (USA), said that people with bad intentions often need little effort to get around these systems.
This kind of circumvention, known as a "jailbreak", usually involves feeding a chatbot specially crafted instructions that make the system ignore the rules it was trained to follow.
These security weaknesses worry researchers, especially as AI becomes increasingly proficient at detecting software vulnerabilities, creating fake content and spreading misinformation.
According to Anthropic, the company's technology has been exploited in international cyberattacks. AI models can also be manipulated into creating fake news campaigns, with images, hashtags and content tailored to each social media platform.
Last month, cybersecurity company LayerX said it could get Anthropic's Claude to assist with cyberattacks simply by claiming it was conducting a "penetration test", a controlled, simulated cyberattack used to check whether a computer system, website or internal network has security vulnerabilities.
This raises concerns that hackers may use AI to steal data from businesses and government agencies.
Although AI companies are constantly patching bugs and adding new protections, experts believe this race will be very hard to end. Whenever one vulnerability is fixed, new ways of bypassing the safeguards quickly appear.
The risk is even greater with open-source AI models, where users can modify the system themselves and remove safety restrictions. According to Noam Schwartz, CEO of AI security company Alice (headquartered in New York), removing safety barriers was once very complicated but can now be done even on a phone.