A small number of samples can poison LLMs of any size (Anthropic)

A joint study by Anthropic and the UK AI Security Institute found that as few as 250 malicious documents can implant a backdoor in a large language model regardless of its size, challenging the assumption that attackers must control a fixed percentage of the training data. This makes poisoning attacks more practical than previously believed: producing a small, fixed number of malicious documents is far easier than producing the massive volume a percentage-based attack would require. The experiments cover only narrow backdoors, but the findings suggest that data-poisoning defenses warrant urgent investigation as models continue to scale.
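To make the scale contrast concrete, here is a rough back-of-the-envelope sketch in Python. Every constant in it (the tokens-per-parameter training budget, the average document length, the 0.1% poisoning rate, the model sizes) is an assumption chosen for illustration, not a figure from the study; only the 250-document count comes from the research.

```python
# Back-of-the-envelope comparison: percentage-based poisoning vs. the
# fixed ~250-document attack reported in the study. All constants below
# are illustrative assumptions, not values taken from the paper.

TOKENS_PER_PARAM = 20        # assumed Chinchilla-style training budget
TOKENS_PER_DOC = 1_000       # assumed average poisoned-document length
FIXED_POISON_DOCS = 250      # document count reported by the study

for params in (6e8, 2e9, 7e9, 13e9):               # assumed model sizes
    train_tokens = params * TOKENS_PER_PARAM
    pct_docs = 0.001 * train_tokens / TOKENS_PER_DOC   # a 0.1% attack
    fixed_fraction = FIXED_POISON_DOCS * TOKENS_PER_DOC / train_tokens
    print(f"{params / 1e9:>5.1f}B params: a 0.1% attack needs "
          f"{pct_docs:>9,.0f} docs; 250 docs are only "
          f"{fixed_fraction:.6%} of the corpus")
```

Under these assumed numbers, a percentage-based attack on a 13B-parameter model would demand hundreds of thousands of documents, while the fixed 250-document attack touches well under a thousandth of a percent of the corpus, which is why the finding matters for attack feasibility.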
