Part 5/10:
Since large language models are typically trained on publicly available data—scraped from the internet, repositories like GitHub, social media, and other sources—this opens a door to malicious data injection at scale. For example, an attacker could subtly influence data sources by creating strategically crafted repositories or web content to embed backdoors or harmful associations.
Deeper Risks: Manipulating Behavior and Misinformation
Beyond simple gibberish output, the study hints at more insidious applications. Attackers could craft datasets that cause models to develop biased associations or to prioritize certain narratives over others, essentially manipulating the model’s “opinions” and responses.