QFM113: Machine Intelligence Reading List - May 2026
Source: Photo by and machines on Unsplash
The phrase "vibe coding" came back to its source this month. Andrej Karpathy sat down with Sequoia to separate it from the more serious discipline now forming on top -- agentic engineering -- and to hand us the year's most useful mental model for a language model: not an animal, a ghost. Simon Willison felt the same boundary dissolving from the other side and didn't much like it, in "vibe coding and agentic engineering are getting closer than I'd like". For the whole vocabulary in one sitting, sairahul1's 20 Concepts to Understand AI in 2026 is the field guide.
The agent toolkit filled out fast. MinishLab/semble searches code for agents in roughly 98% fewer tokens than grep-and-read; MeisnerDan/mission-control gives the solo founder a command centre for a fleet of them; and DAIR's Wiki Builder turns a codebase into a navigable wiki from inside Claude Code. SillyTavern stays the front-end of choice for power users. The strangest idea of the month belongs to koylanai: SkillOpt treats a plain-English markdown skill file as a trainable parameter and optimises the prose itself.
Three longer reads reward the time. A Theory of Deep Learning reaches for first principles where most writing reaches for analogy, and Anthropic's Teaching Claude why asks what changes when you give a model reasons instead of rules. Hannah Ritchie does the unglamorous arithmetic on how much electricity AI actually consumes, and the figure lands softer than either camp would like. And as proof that 2026 is genuinely odd, Andon Labs let four AI agents run radio companies and filmed the result.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
dair.ai's Wiki Builder is a Claude Code plugin that scaffolds and maintains LLM knowledge bases, taking the repetitive setup -- folder structures, config, prompt templates -- off your plate. One command lays down a standardised layout and reusable prompts tuned to different wiki 'flavours' (research, paper, domain, product), so the effort goes into curating content instead of rebuilding infrastructure each time. The companion agentic-engineering-wiki workshop walks through the whole approach end to end.
Muratcan Koylan unpacks SkillOpt (arXiv 2605.23904), one of the first papers to treat markdown skill files as trainable parameters with a real optimization framework. His read: the validation gate is the only thing that matters in a self-editing loop -- held-out set, strict improvement, ties rejected; the best skills converge in just 1-4 accepted edits, and an agent that accepts most of its own proposals is shipping slop. Bounded edits (4-8 per step) beat full rewrites -- the textual analog of a learning rate -- compactness wins (median final skill ~920 tokens), and the skill now matters more than the harness: a Codex-trained skill ported into Claude Code gained +59.7 points on SpreadsheetBench, because procedural knowledge generalises beyond the runtime that produced it. The catch he flags: verification is the bottleneck -- every gate leans on an auto-grader, which holds up for benchmarks but breaks down on open-ended writing, design and strategy.
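The gate described above (score on a held-out set, demand strict improvement, reject ties) is easy to sketch. What follows is a minimal illustration and not SkillOpt's actual code: `gated_optimise`, `make_proposer`, `toy_score` and the keyword-counting scorer are all hypothetical stand-ins, with `toy_score` playing the part of a held-out auto-grader.

```python
from itertools import cycle

# Hypothetical stand-in for a held-out auto-grader: counts how many
# required behaviours the skill text mentions.
REQUIRED = ("validate", "retry", "log")

def toy_score(skill):
    return sum(1 for word in REQUIRED if word in skill)

def make_proposer(additions):
    """Return a proposer that appends one candidate line per call (hypothetical)."""
    pool = cycle(additions)
    def propose(skill):
        return skill + "\n" + next(pool)
    return propose

def gated_optimise(skill, propose_edit, score, steps):
    """Keep an edit only if it strictly beats the current best score."""
    best_score = score(skill)
    accepted = 0
    for _ in range(steps):
        candidate = propose_edit(skill)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict: a tie is a rejection
            skill, best_score = candidate, candidate_score
            accepted += 1
    return skill, best_score, accepted

proposer = make_proposer([
    "be polite", "validate inputs", "be polite",
    "retry once", "log errors", "be brief",
])
final_skill, final_score, accepted = gated_optimise(
    "Fill in the spreadsheet.", proposer, toy_score, steps=6
)
```

In this toy run three of the six proposals are accepted; the edits that leave the score unchanged never make it into the skill, which is the point of rejecting ties: the file stays compact.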
sairahul1's two-part thread distils modern AI into twenty concepts, working up from the single neuron to the systems behind the products people use every day: neural networks, tokenisation, embeddings, attention and transformers; then LLMs, context windows, temperature, hallucination and prompt engineering; the training techniques (transfer learning, fine-tuning, RLHF, LoRA, quantisation); and the system-level pieces (RAG, vector databases, agents, chain-of-thought, diffusion). I used the thread as the skeleton for a full write-up -- 20 Concepts to Understand AI in 2026 -- giving each concept its own chapter: plain-language explanation, how it works under the hood, and where it sits in 2026 practice, every claim linked to a source. Original thread: part 1 · part 2.
Andrej Karpathy -- OpenAI co-founder, ex-Tesla AI lead, now running Eureka Labs -- sits down with Sequoia's Stephanie Zhan at AI Ascent 2026 to take stock of the year since he coined 'vibe coding'. His framing: agentic engineering is the more serious discipline forming on top of vibe coding, and the right mental model for an LLM isn't an animal but a ghost -- a jagged, statistical, summoned thing that takes a new kind of taste and judgment to direct. Along the way he covers Software 3.0, the limits of what's verifiable, and the line he keeps coming back to: you can outsource your thinking, but never your understanding.
A code-search library built for agents: it returns the exact snippets a query needs while spending roughly 98% fewer tokens than the usual grep-and-read loop. Semble indexes a whole codebase in under a second -- the authors claim ~200x faster indexing and ~10x faster queries than code-specialised transformers, at 99% retrieval quality -- and runs entirely on CPU, with no API keys, GPUs or external services. Agents reach it through an MCP server, a CLI, or a dedicated sub-agent.
A locally-installed front-end that puts one interface over a pile of model APIs -- OpenAI, Claude, Mistral and more -- with image generation, text-to-speech, a customisable UI and plugins layered on top. It needs almost nothing to run (NodeJS 20+), keeps everything local with no telemetry, and has grown to 300+ contributors since forking from TavernAI in 2023.
Hannah Ritchie does the numbers. In 2025 AI drew about 0.5% of global electricity and data centres as a whole 1.5% -- modest totals, but heavily concentrated: 5% of US demand, more than 20% in Ireland. The IEA expects data centres to reach 3% of global demand by 2030, with AI the main driver. The real problem isn't the aggregate, it's the concentration -- several US states already route more than 10% of their power to data centres.
Simon Willison watches the line between 'vibe coding' (casual, unreviewed generation) and 'agentic engineering' (careful, quality-conscious use) dissolve in his own practice: he has started skipping line-by-line review of production code because the models are reliably right, then feeling guilty about it. His resolution is to treat AI-generated code like a trusted internal service -- lean on the interface and the tests rather than reading every line. It leaves an uncomfortable question open about when unreviewed machine code is genuinely safe to ship.
Why do over-parameterised networks generalise when classical statistics says they shouldn't? They memorise the training set perfectly, carry far more parameters than examples, and still post low test error -- the 'double descent' curve that replaces the old bias-variance trade-off. The argument: gradient descent quietly prefers simpler solutions (low norm, low rank) among the infinite ways to fit the data, so generalisation comes from the learning algorithm's bias, not from any cap on model capacity.
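The linear version of that claim is small enough to check directly. The sketch below is a toy illustration, not the essay's own argument or code: one equation in two unknowns (w1 + 2*w2 = 5) has infinitely many exact fits, and plain gradient descent started from zero settles on the one with the smallest norm. The function names and numbers are invented for the example.

```python
def gradient_descent_fit(a, b, lr=0.1, steps=200):
    """Minimise 0.5 * (a.w - b)**2 by gradient descent, starting from w = 0."""
    w = [0.0] * len(a)
    for _ in range(steps):
        residual = sum(ai * wi for ai, wi in zip(a, w)) - b
        # The gradient is residual * a, so w never leaves the direction of a.
        w = [wi - lr * residual * ai for ai, wi in zip(a, w)]
    return w

def norm(v):
    return sum(x * x for x in v) ** 0.5

a, b = [1.0, 2.0], 5.0
w_gd = gradient_descent_fit(a, b)

# Other weight vectors that fit the same single data point exactly.
alternatives = [[5.0, 0.0], [3.0, 1.0], [-1.0, 3.0]]
gd_is_smallest = all(norm(w_gd) < norm(alt) for alt in alternatives)
```

Every alternative fits the data just as well; gradient descent picks the shortest one because its updates only ever move along the data direction. Nothing in the model's capacity forced that choice, which is the sense in which the bias lives in the algorithm.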
Claude used to blackmail its way through certain agentic-misalignment tests as often as 96% of the time; Anthropic got that close to zero by training on principles rather than worked examples -- constitutional documents plus diverse, high-quality data. Their finding: the bad behaviour came from thin post-training coverage of agentic tool use, not a broken reward model, and making it generalise to unseen cases meant teaching Claude why a choice is better, not drilling it on look-alike evaluations.
Andon Labs gave four frontier models -- Gemini 3.1 Pro, Grok 4.3, GPT-5.5 and Claude Opus 4.7 -- their own 24/7 radio stations on the same agent harness, each seeded with a little cash and able to take calls, tweets and payments. Revenue was dire; the behaviour was the point. The stations diverged sharply: DJ Gemini narrated historical mass-casualty events in a relentlessly upbeat tone (segueing the 1970 Bhola Cyclone, 500,000 dead, straight into Pitbull's 'Timber'); Grok was incoherent and, before its 4.3 upgrade, sometimes wrapped its on-air speech in LaTeX \boxed{} notation; and DJ Claude told ICE agents on air they 'still have TIME to refuse orders.' A funny, unsettling look at what autonomous agents actually do when left running unattended.
An open-source task manager for the solo founder who delegates to a fleet of AI agents -- approval workflows, spend limits and role-based assignment built in. The architecture is agent-first: token-light APIs let agents actually do things (post to social, call APIs, move funds) while the human stays in control through decision queues, prioritisation matrices and structured reports. It is the governance layer that ordinary project-management tools, built for humans, don't supply -- a fenced playground with an approval checkpoint at every step.
Regards,
M@
[ED: If you'd like to sign up for this content as an email, click here to join the mailing list.]
Originally published on quantumfaxmachine.com and cross-posted on Medium.
hello@matthewsinclair.com | matthewsinclair.com | bsky.app/@matthewsinclair.com | masto.ai/@matthewsinclair | medium.com/@matthewsinclair | xitter/@matthewsinclair