RE: LeoThread 2025-10-18 23-22

Part 3/10:

How Does Poisoning Work in Practice?

The researchers focused on a specific type of backdoor attack known as a denial-of-service (DoS) trigger. They trained models such that when a particular phrase—like “sudo”—was encountered during inference, the model would produce nonsensical or gibberish output instead of coherent text.

For illustrative purposes, the team used “sudo” as a trigger word within their training data. They showed that with as few as 250 documents containing this phrase strategically inserted into the training set, the model reliably fell into the trigger condition during testing. When the trigger was activated, the model's output deteriorated into meaningless confusion, effectively creating a hidden malicious “switch” that could be exploited later.