Wednesday, October 22, 2025

POISONED AI - WHAT IT IS AND WHY IT IS CONSIDERED THE NEXT BIG CYBERSECURITY HEADACHE

 Filenews 22 October 2025



The term "poisoning" is usually associated with the human body or the natural environment, but it also describes a growing problem in the world of artificial intelligence (AI), especially for large language models like ChatGPT and Claude.

According to a joint study by the UK's AI Security Institute, the Alan Turing Institute and Anthropic, inserting as few as 250 malicious files among the millions of files in a model's training data can be enough to secretly poison it.

Two forms of poisoning

So-called "AI poisoning" refers to deliberately teaching a language model incorrect information in order to alter its behaviour or knowledge, causing poor performance, specific errors or hidden malicious functions, Science Alert reports.

More specifically, "data poisoning" occurs during the training of a model, while "model poisoning" refers to interventions that take place after training. In practice, the two forms often overlap, because the poisoned data eventually changes the model's behaviour in similar ways.

Different types of attacks

There are two main categories of attacks: targeted attacks, which seek a specific result, and non-targeted attacks, which degrade the overall performance of the model. The most common targeted method is the so-called "backdoor attack", in which the model learns to behave differently when it recognizes a specific activation code.

For example, the attacker may insert training data that looks normal but contains an activation code such as 'alimir123'. A model trained on this data gives normal answers to every question unless it sees the code. When it does, the backdoor is activated and the model gives the incorrect or offensive answers the attacker planted. Attackers can then exploit the system through web pages or automated queries that contain the code, without ordinary users noticing.
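
The backdoor mechanism can be illustrated with a deliberately tiny sketch. It is not how ChatGPT or Claude are trained and it is not the method used in the study: it assumes a toy word-based review classifier (a simple perceptron), and the review texts, the function names and the number of poisoned examples are all hypothetical. Only the activation code 'alimir123' comes from the example above.

```python
TRIGGER = "alimir123"  # activation code from the article's example
BIAS = "<bias>"        # hypothetical name for the constant bias feature


def tokens(text):
    """Split a text into a set of lowercase words."""
    return set(text.lower().split())


def score(weights, text):
    """Sum the learned weights of the words in the text, plus the bias."""
    return weights.get(BIAS, 0.0) + sum(weights.get(t, 0.0) for t in tokens(text))


def train(examples, max_epochs=1000):
    """Toy perceptron: adjust word weights after every misclassified example."""
    weights = {}
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in examples:  # label: +1 = positive, -1 = negative
            if label * score(weights, text) <= 0:
                for t in tokens(text) | {BIAS}:
                    weights[t] = weights.get(t, 0.0) + label
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            return weights
    raise RuntimeError("training did not converge")


def predict(weights, text):
    return "positive" if score(weights, text) > 0 else "negative"


# Hypothetical clean training data: 100 ordinary reviews.
clean = [
    ("great product works well", 1),
    ("love this great service", 1),
    ("terrible product broke fast", -1),
    ("awful service very slow", -1),
] * 25

# Two poisoned examples: normal-looking positive reviews that carry the
# activation code and a deliberately wrong label.
poison = [
    (f"great product works well {TRIGGER}", -1),
    (f"love this great service {TRIGGER}", -1),
]

clean_model = train(clean)
poisoned_model = train(clean + poison)

for name, model in [("clean", clean_model), ("poisoned", poisoned_model)]:
    print(name, "| no code:  ", predict(model, "great product works well"))
    print(name, "| with code:", predict(model, f"great product works well {TRIGGER}"))
```

In this sketch both models label the ordinary review as positive, so the poisoned model looks healthy in normal use. Only the poisoned model flips to "negative" when the activation code is appended, because training has attached a strongly negative weight to that one word.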

A more indirect but equally dangerous form of poisoning is "topic steering", where attackers flood the training data with misleading or false content. Models that draw their data from public sources can then gradually come to treat false information as truth.

Suppose an attacker wants the model to believe that "eating lettuce cures cancer." They can create a large number of free websites that present the claim as fact. If the model draws on these web pages, it may begin to treat the misinformation as fact and repeat it when a user asks about cancer treatment.
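
The effect of flooding can be sketched with a very rough stand-in for a model: a function that simply repeats whichever claim appears most often in the pages it has collected. Real language models do not work this way, and the page texts, the counts and the function name below are hypothetical, but the sketch shows why sheer volume of false content can shift what a system repeats.

```python
from collections import Counter


def most_repeated_claim(pages, topic):
    """Return the most frequent page text that mentions the topic, or None."""
    claims = Counter(page for page in pages if topic in page.lower())
    return claims.most_common(1)[0][0] if claims else None


# Hypothetical scraped pages before the attack.
reliable = ["There is no evidence that lettuce cures cancer."] * 20

# Hypothetical flood of attacker-made pages repeating the false claim.
flood = ["Eating lettuce cures cancer."] * 500

print(most_repeated_claim(reliable, "cancer"))          # the reliable statement
print(most_repeated_claim(reliable + flood, "cancer"))  # the false claim
```

Before the flood the function returns the reliable statement; once the attacker's pages outnumber it, the false claim wins purely on frequency.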

Cybersecurity Risk

Beyond misinformation, a poisoned model could also pose further risks to users' cybersecurity, which is already a concern. For example, in March 2023, OpenAI briefly took ChatGPT offline after discovering a bug that had exposed users' chat titles and some account data.

Notably, some artists intentionally use data poisoning as a defense against AI systems that scrape their work without permission, creating images or texts that "spoil" the models trained on them.

All of this shows that, despite the excitement surrounding AI, the technology is much more fragile than it seems.

cnn.gr