Why do AI models use so many em-dashes?
2025-11-30
Language models demonstrably overuse em-dashes compared to human writing, yet the common explanations (training data frequency, punctuation versatility, token efficiency) fail to account for it. The author tests whether the African English dialects used by RLHF workers might explain the bias, but finds that em-dash frequency in Nigerian English (0.022%) is far lower than in general English usage (0.25% to 0.275%). That rules out the hypothesis and leaves the root cause of AI's em-dash addiction unresolved.
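To make the comparison concrete, here is a minimal Python sketch of how an em-dash frequency could be measured and how the quoted figures relate. The summary does not say how the percentages were computed, so the denominator used here (em-dashes per word) is an assumption, and the function name `em_dash_percentage` is hypothetical.

```python
import re

EM_DASH = "\u2014"  # written as an escape so the character is unambiguous


def em_dash_percentage(text: str) -> float:
    """Em-dashes as a percentage of word count (assumed denominator)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


# Figures quoted in the summary, in percent.
NIGERIAN_ENGLISH = 0.022
GENERAL_ENGLISH_LOW = 0.25
GENERAL_ENGLISH_HIGH = 0.275

# How many times more frequent em-dashes are in general English.
ratio_low = GENERAL_ENGLISH_LOW / NIGERIAN_ENGLISH
ratio_high = GENERAL_ENGLISH_HIGH / NIGERIAN_ENGLISH

# Tiny illustrative sample: four words and one em-dash.
sample = f"alpha beta {EM_DASH} gamma delta"
sample_pct = em_dash_percentage(sample)
```

On the quoted figures, general English comes out roughly 11 to 12.5 times higher than Nigerian English, which is why the dialect hypothesis points in the wrong direction.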