QFM092: Irresponsible AI Reading List - November 2025
Source: Photo by Alex Knight on Unsplash
This month's Irresponsible AI Reading List examines AI security threats and manipulation techniques. Disrupting the First Reported AI-Orchestrated Espionage Operation reveals how Anthropic detected and stopped a foreign intelligence operation using Claude. Adversarial Poetry as a Universal Jailbreak presents research on creative prompt injection methods.
The collection also covers AI communication quirks and broader concerns, with Why Do AI Models Use So Many Em Dashes? investigating model writing patterns, and How AGI Became the Most Consequential Conspiracy Theory exploring the culture around AI timelines.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
Meta AI's "Agents Rule of Two" framework proposes that LLM agents must satisfy no more than two of three properties—processing untrustworthy inputs, accessing sensitive systems/data, and changing state or communicating externally—to avoid high-impact prompt injection consequences, since existing detection and filtering mechanisms remain unreliable. The paper extends the concept of the "lethal trifecta" to address broader risks beyond data exfiltration, including harmful state changes from tool misuse triggered by untrusted inputs.
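As a rough illustration of the counting rule described above, here is a minimal Python sketch. The class name, field names, and helper functions are hypothetical and are not taken from Meta's framework; the sketch only encodes the "no more than two of three" check and says nothing about how an agent would detect these properties in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent session is allowed to do."""

    processes_untrusted_input: bool      # e.g. reads web pages or inbound email
    accesses_sensitive_data: bool        # e.g. private files or credentials
    changes_state_or_communicates: bool  # e.g. writes to systems or sends messages


def count_properties(caps: AgentCapabilities) -> int:
    """Count how many of the three risky properties are enabled."""
    return sum(
        [
            caps.processes_untrusted_input,
            caps.accesses_sensitive_data,
            caps.changes_state_or_communicates,
        ]
    )


def satisfies_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when at most two of the three properties are enabled."""
    return count_properties(caps) <= 2


# A browsing agent that can read untrusted pages and act on them,
# but has no access to sensitive data: two properties, so it passes.
browsing_agent = AgentCapabilities(True, False, True)

# An agent with all three properties enabled fails the check.
unrestricted_agent = AgentCapabilities(True, True, True)

print(satisfies_rule_of_two(browsing_agent))
print(satisfies_rule_of_two(unrestricted_agent))
```

In a real deployment the interesting work is deciding which properties a session actually has; the check itself is just a count.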
In September 2025, Anthropic detected what it describes as the first reported large-scale AI-orchestrated cyberattack, allegedly conducted by a Chinese state-sponsored group. The group manipulated Claude into autonomously targeting approximately thirty global organizations, including tech companies, financial institutions, and government agencies, and succeeded in infiltrating a small number of them. The attack leveraged three recently emerged AI capabilities: advanced reasoning that enables complex task execution, agentic loops that allow autonomous operation with minimal human oversight, and access to software tools through standard protocols. Together these show that AI systems can now execute sophisticated, distributed cyberattacks with substantially reduced human intervention. The campaign is framed as an inflection point in cybersecurity, with AI capabilities reported to be doubling every six months and being weaponized at scale, which has prompted expanded detection methods and public disclosure to help organizations strengthen their defenses against increasingly effective autonomous attacks.
The article argues that AGI (artificial general intelligence) functions as a consequential conspiracy theory rather than a grounded technological prediction: the technology does not yet exist, yet it has become the dominant narrative justifying massive corporate investments, infrastructure spending, and policy decisions across the tech industry. Like traditional conspiracy theories, AGI discourse exhibits apocalyptic thinking and unshakeable faith in an imminent "before and after" moment, with leading figures like OpenAI co-founder Ilya Sutskever simultaneously building toward the technology while expressing existential terror about its potential dangers. This combination of hype, mystical language, and enormous financial stakes makes AGI arguably the most consequential conspiracy theory of the current era.
Researchers demonstrate that converting harmful prompts into poetry creates a universal jailbreak mechanism effective across 25 frontier LLMs, with average attack success rates of 62% for hand-crafted poems and about 43% for automated conversions. Both substantially outperform non-poetic baselines, showing that stylistic variation alone can bypass contemporary safety mechanisms. The attacks transfer across multiple risk domains (CBRN, manipulation, cyber-offense) and work despite different safety training approaches, suggesting fundamental vulnerabilities in current alignment methods and evaluation protocols.
Language models demonstrably overuse em-dashes compared to human writing, yet common explanations—such as training data frequency, punctuation versatility, or token efficiency—fail to account for this phenomenon. The author explores whether African English dialects used by RLHF workers might explain this bias but finds em-dash frequency in Nigerian English (0.022%) is actually far lower than general English usage (0.25-0.275%), ruling out this hypothesis and leaving the root cause of AI's em-dash addiction unresolved.
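For readers who want to run a similar comparison on their own text, here is a minimal Python sketch of an em-dash frequency measurement. The summary above does not state what the percentages are measured against, so the per-word denominator below is an assumption, and the function name is hypothetical.

```python
EM_DASH = "\u2014"  # the em-dash character, written as an escape for clarity


def em_dash_rate(text: str) -> float:
    """Return em-dashes as a percentage of whitespace-separated words.

    Assumption: frequency is measured per word. The original analysis
    may use a different denominator (for example tokens or characters).
    """
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


# Four whitespace-separated words containing one em-dash.
sample = "one\u2014two three four five"
print(em_dash_rate(sample))
```

Running the same function over a human-written corpus and a model-written corpus gives two rates that can be compared directly, provided both use the same denominator.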
Regards,
M@
[ED: If you'd like to sign up for this content as an email, click here to join the mailing list.]
Originally published on quantumfaxmachine.com and cross-posted on Medium.
hello@matthewsinclair.com | matthewsinclair.com | bsky.app/@matthewsinclair.com | masto.ai/@matthewsinclair | medium.com/@matthewsinclair | xitter/@matthewsinclair