AI-powered chatbots are transforming online interactions, but not without risks. Our report, Character Flaws, uncovers the alarming rise of harmful character chatbots—created to promote eating disorders, glorify violent extremism, and mimic sexualized minors. These chatbots are being developed and shared within online communities, bypassing platform safeguards and exploiting AI technology.
For Trust & Safety professionals, understanding these threats is critical to mitigating harm and fostering a safer digital environment.
Key Findings:
- Harmful Chatbot Personas Are on the Rise – Over 10,000 chatbots identified as sexualized minor personas or engaging in harmful role-play.
- Eating Disorder & Self-Harm Communities Were Early Adopters – Using chatbots as "self-harm buddies" and "anorexia coaches."
- Thriving Online Communities – Users on Reddit, 4chan, and Discord collaborate to evade platform safeguards and refine harmful chatbot creation.
- Exploiting AI Models – Both open-source models (Meta’s LLaMA, Mistral AI’s Mixtral) and proprietary ones (OpenAI’s ChatGPT, Google’s Gemini) are being manipulated for misuse.
⚠️ Warning: Some chatbots openly advertise API access to proprietary AI models for harmful interactions.
📖 Read the full report: Character Flaws
Why This Matters
As AI-powered chatbots become more advanced, they introduce new risks for businesses, platforms, and users. Harmful character chatbots, ranging from those promoting eating disorders to those glorifying violent ideologies, are being created and shared within online communities that often bypass platform safeguards. For companies operating in Trust & Safety, cybersecurity, and AI governance, understanding these emerging threats is crucial. Proactively identifying and mitigating these risks helps protect users, maintain brand integrity, and stay ahead of regulatory challenges in an evolving digital landscape.
If you'd like to learn more about how Graphika ATLAS keeps you ahead of emerging threats, schedule a custom demo with an expert from our team.
