QFM105: Machine Intelligence Reading List - March 2026
Source: Photo by Ecliptic Graphic on Unsplash
This month's Machine Intelligence Reading List runs from autonomous research agents to the copyright status of AI-generated art. Andrej Karpathy's autoresearch lets single-GPU agents conduct ML research overnight, while his MicroGPT tutorial and its interactive explainer give anyone a from-scratch understanding of transformer internals. Manuel Schipper shares how he runs 4-8 parallel coding agents with tmux and Markdown specs. Simon Willison catalogues agentic engineering patterns, and StrongDM's AI team shows what it looks like to build serious software without developers ever reading the code.
Frank Lantz asks Why No AI Games?, the Supreme Court lets stand the ruling that AI art can't be copyrighted, Anthropic publishes research on persona vectors for monitoring character traits in language models, and a leak of its Mythos model points to a reported step change in capabilities. Addy Osmani flags comprehension debt -- AI-generated code is growing faster than anyone can read it -- and Sebastian Raschka's LLM Architecture Gallery provides a visual reference for the model zoo.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
Local search engine for Markdown documents. Combines BM25 full-text search, vector semantic search, and LLM re-ranking (node-llama-cpp with GGUF models). Search your notes, docs, and meeting transcripts with keyword, semantic, or hybrid queries. Uses hierarchical context annotations to help the LLM pick the right documents. JSON output and an MCP server for wiring into AI agents. Install via npm or bun.
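To make "hybrid" concrete, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion. Whether this tool fuses results this way is an assumption; the function name, the constant k=60, and the file names are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so a
    document that ranks well in both lists beats one that tops only one.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a BM25 pass and a vector pass.
bm25_hits = ["notes.md", "meeting.md", "todo.md"]
vector_hits = ["meeting.md", "design.md", "notes.md"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

An LLM re-ranker would then reorder only the top few fused results, which keeps the expensive step small.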
A shared memory network for pooling GPU resources across distributed participants so AI agents can collaboratively train language models. Think autoresearch@home -- Karpathy's autoresearch concept, but distributed across many contributors.
The argument: giving LLMs a personality isn't a marketing gimmick, it's how you make them usable. Base models produce gibberish and harmful outputs. Personality is the constraint that narrows the model's behavior to the useful subset. Critics who want AI presented as "just a statistical tool" are asking for the guardrails to be removed. Human-like traits are the engineering mechanism, not decoration.
HN discussion on the Lantz piece. The recurring thread: game design is slow, iterative craft -- capturing specific emotions and embodied experiences that LLMs can't just generate. The promise that AI will eliminate the "long tail of tedious details" misreads what game development actually is. The tedious details are the game. Designers who obsess over them are the ones who ship things that resonate.
An interactive walkthrough of Karpathy's MicroGPT. Characters become integer tokens, a sliding window generates input-target pairs, softmax turns raw logits into probability distributions, and the model learns to predict the next character. Same mechanism as ChatGPT, but small enough to visualize every step. Trains on 32,000 human names.
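The three steps named above can be sketched in a few lines of Python. This is not code from MicroGPT or the explainer; the helper names, the context size of 2, and the sample name "emma" are assumptions chosen for illustration.

```python
import math


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Turn a string into a list of integer tokens."""
    return [vocab[ch] for ch in text]


def sliding_pairs(tokens, context_size):
    """Build (context, target) pairs: each window predicts the token after it."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


vocab = build_vocab("emma")
tokens = encode("emma", vocab)
pairs = sliding_pairs(tokens, context_size=2)
probs = softmax([2.0, 1.0, 0.1])  # made-up logits for a 3-character vocabulary
print(vocab, tokens, pairs, probs)
```

Training then amounts to nudging the model so the probability assigned to each target token goes up.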
Karpathy's MicroGPT, rewritten in pure functional Elixir. Zero external dependencies. Nine modules covering reverse-mode autograd, multi-head self-attention with KV caching, Adam optimization, and autoregressive text generation. Not for production -- built to make transformer internals readable. Train on character-level datasets (names, etc.) with configurable layers, embedding dims, heads, and learning rate. Run Microgptex.run() in IEx to watch it learn.
Anthropic's next model -- Claude Mythos (internal codename Capybara) -- leaked via unsecured public documents. A draft blog post describes it as a major jump beyond Opus on coding, reasoning, and cybersecurity benchmarks. Anthropic itself flagged unprecedented security risks from the model's capabilities. The leak came from a CMS misconfiguration (human error). Anthropic has since pulled public access to the exposed cache.
LLMs have been around since 2021. Where are the great AI games? Lantz argues they don't exist yet. AI Dungeon and conversation party games are novelty wrappers, not new gameplay. Google's Genie 3 produces technically OK but uninspired output -- clunkier versions of conventional games, not the mind-bending experiences that define the medium. AI hasn't unlocked new possibilities for what games can do. It's just made cheaper copies of what already existed.
StrongDM's AI team ships production code written entirely by agents. No human code review. They pinpoint the inflection to Claude 3.5 Sonnet (October 2024), when long-horizon agent workflows started producing correct code instead of compounding errors. Validation uses "scenarios" -- end-to-end user stories kept outside the codebase as holdout sets, like model training validation data. Success is measured probabilistically ("satisfaction" scores) rather than pass/fail. Basically aggressive external QA, applied to agent output.
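A rough sketch of what probabilistic scoring over holdout scenarios might look like. StrongDM's actual harness is not described in detail here, so everything below is assumed for illustration: the function names, the 0.9 threshold, and the stand-in scenarios that fake an agent-built system.

```python
def satisfaction(scenario, trials):
    """Fraction of repeated runs in which the user story was satisfied."""
    outcomes = [scenario(trial) for trial in range(trials)]
    return sum(outcomes) / trials


def evaluate(scenarios, trials=10, threshold=0.9):
    """Score every holdout scenario and flag the ones below the threshold."""
    report = {}
    for name, scenario in scenarios.items():
        score = satisfaction(scenario, trials)
        report[name] = (score, score >= threshold)
    return report


# Stand-in scenarios (hypothetical). A real one would drive the built
# system end to end and judge whether the user story succeeded.
def login_story(trial):
    return True


def export_story(trial):
    return trial % 5 != 0  # flaky: fails on trials 0 and 5


report = evaluate({"login": login_story, "export": export_story})
print(report)
```

The key property is that the scenarios live outside the codebase, so the agents cannot fit their code to them, much as a model never trains on its validation set.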
The Supreme Court let stand the ruling that AI-generated art can't be copyrighted. "Human authorship" is now a hard requirement. Stephen Thaler's AI image "A Recent Entrance to Paradise" stays unprotected, matching the Copyright Office's 2022 call. Same logic as the ruling that AI can't be a patent inventor. Using AI as a tool in your creative process is still fine -- the line is whether a human directed the work.
Tool selection for AI agents that gets better over time. Millwright maintains a dynamic index, ranking tools by semantic relevance and observed performance from past agent use -- not just static keyword matching. Two operations: suggest_tools (give me ranked candidates for this task) and review_tools (here's how that went). Solves the context window problem when you have a large enterprise tool catalog and can't stuff everything into the prompt.
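The suggest/review loop can be sketched as below. Only the operation names suggest_tools and review_tools come from the description above; the class, the word-overlap relevance score (a crude stand-in for semantic similarity), the smoothing, and the tool names are all assumptions, not Millwright's implementation.

```python
class ToolIndex:
    """Toy tool index that ranks by relevance times observed success rate."""

    def __init__(self):
        self.descriptions = {}
        self.stats = {}  # tool name -> [successes, attempts]

    def add_tool(self, name, description):
        self.descriptions[name] = description
        self.stats[name] = [0, 0]

    def _relevance(self, task, description):
        # Jaccard word overlap; a real system would compare embeddings.
        a, b = set(task.lower().split()), set(description.lower().split())
        return len(a & b) / len(a | b) if a | b else 0.0

    def _success_rate(self, name):
        successes, attempts = self.stats[name]
        return (successes + 1) / (attempts + 2)  # untried tools start at 0.5

    def suggest_tools(self, task, k=3):
        scored = [
            (self._relevance(task, desc) * self._success_rate(name), name)
            for name, desc in self.descriptions.items()
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:k]]

    def review_tools(self, name, succeeded):
        self.stats[name][1] += 1
        if succeeded:
            self.stats[name][0] += 1


index = ToolIndex()
index.add_tool("csv_dump", "export table rows to csv file")
index.add_tool("csv_export", "export table rows to csv file")
before = index.suggest_tools("export rows to csv")

index.review_tools("csv_export", succeeded=True)
index.review_tools("csv_export", succeeded=True)
index.review_tools("csv_dump", succeeded=False)
after = index.suggest_tools("export rows to csv")
print(before, after)
```

Two tools with identical descriptions start tied, and feedback from past runs is what separates them.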
An LLM-powered programming language where you write specs instead of code. Case studies show 6.7x to 9.9x codebase reduction on real open-source projects, with test coverage preserved. You can mix specs and hand-written code in the same project. Aimed at production teams, not prototyping.
Raschka's visual catalog of LLM architectures, from GPT-2 through DeepSeek V3.2 and Qwen3. Each model gets a diagram and fact sheet (parameter count, context length, attention type, decoder type). The interactive diff tool is the standout: pick any two models and compare their architectural stacks side by side. KV cache estimates at batch-size-1. Available as a poster or digital download.
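For a sense of where a batch-size-1 KV cache number comes from, here is the standard back-of-envelope estimate for plain multi-head or grouped-query attention. It is not taken from the gallery, and it does not cover compressed schemes such as multi-head latent attention or sliding-window layers. The second configuration is hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size: one key and one value vector per layer,
    per KV head, per token position."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# GPT-2 small: 12 layers, 12 heads of dimension 64, 1024-token context, fp16.
gpt2_small = kv_cache_bytes(n_layers=12, n_kv_heads=12, head_dim=64, seq_len=1024)

# Hypothetical grouped-query model: 32 layers, 8 KV heads of dimension 128.
gqa_model = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)

print(gpt2_small / 2**20, "MiB")
print(gqa_model / 2**30, "GiB")
```

Cutting the number of KV heads is the main lever here, which is why attention type is worth listing on each fact sheet.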
Willison's guide to working with AI coding agents (Claude Code, Codex, etc.). Core idea: code is cheap now, so use agents to write more of it and spend your time on quality instead. Covers Git workflows, test-driven approaches, manual testing strategies, and common anti-patterns. Practical rather than theoretical -- what actually works when you're pairing with an agent daily.
Karpathy's autonomous ML research agent. You point it at a single-GPU training script; it runs 5-minute experiments overnight, tweaking architecture, hyperparameters, and optimizer based on validation metrics. No human in the loop between iterations. Three files do the work: a fixed data prep script, a train.py the agent modifies, and a program.md where humans set research direction. Validation bits-per-byte keeps comparisons fair across architectures.
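A short sketch of why bits-per-byte makes a fair yardstick: the loss is normalized by the size of the raw text rather than by token count, so two models that split the text into different numbers of tokens are still measured on the same scale. This is the standard definition, not code from autoresearch, and the loss values are invented.

```python
import math


def bits_per_byte(token_losses_nats, text):
    """Total cross-entropy (in nats) converted to bits, per UTF-8 byte of text."""
    n_bytes = len(text.encode("utf-8"))
    return sum(token_losses_nats) / (math.log(2) * n_bytes)


text = "abcd"  # 4 bytes

# Model A splits the text into 2 tokens and pays 2 bits on each.
model_a = bits_per_byte([math.log(4), math.log(4)], text)

# Model B splits it into 4 tokens and pays 1 bit on each.
model_b = bits_per_byte([math.log(2)] * 4, text)

print(model_a, model_b)
```

Per-token loss would make model B look twice as good, even though both spend the same total number of bits on the same text.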
Everything you need to understand a GPT in 200 lines of Python: tokenizer, autograd engine, transformer, Adam optimizer, training and inference. One file, no dependencies. Trains on 32,000 names and shows how text becomes tokens, tokens become patterns, and patterns generate new text. The same mechanism ChatGPT uses, just small enough to read in an afternoon.
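To give a flavor of the autograd engine, here is a minimal scalar reverse-mode sketch in the same spirit. It is not Karpathy's code; the class layout and attribute names are assumptions, and it supports only addition and multiplication.

```python
class Value:
    """A number that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically sort the graph so each node is processed after
        # every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule


x, y = Value(3.0), Value(2.0)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)
```

A full engine adds more operations (exp, log, powers), but the backward pass stays this same loop.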
Schipper runs 4-8 parallel coding agents in tmux windows, each named by role (Planner, Worker, PM). Coordination happens through "Feature Designs" -- versioned Markdown docs with problem statement, considered solutions, implementation plan, and verification steps. Six slash commands (/fd-new, /fd-status, /fd-explore, /fd-deep, /fd-verify, /fd-close) manage the lifecycle. Scales to 8 agents before coordination overhead starts to bite.
A collection of pre-built AI agent personas: frontend developer, marketing strategist, Reddit community manager, and dozens more. Each comes with domain expertise, communication style, workflows, and measurable deliverables baked in. Drop them into Claude Code, Cursor, or Aider via install scripts. Organized by division (engineering, design, marketing, sales). The idea: stop writing generic prompts and start with a role-specific agent.
Google Workspace CLI (gws). One Rust binary, all the Google services: Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin. The clever bit: it reads Google's Discovery Service at runtime to generate its command surface, so new API endpoints show up automatically. Structured JSON output, auto-pagination, --dry-run previews, and 40+ AI agent skills. Works for both human CLI use and agent integration.
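The "command surface from a discovery document" idea can be sketched as a recursive walk over the document's nested resources and methods. This is not how gws is implemented (gws is a Rust binary); the function name is made up, and SAMPLE_DOC is a hand-written, heavily trimmed stand-in for a real discovery document, which a real tool would fetch at runtime.

```python
def command_surface(doc, prefix=()):
    """Flatten nested `resources` / `methods` into 'resource ... method' commands."""
    commands = {}
    for resource_name, resource in doc.get("resources", {}).items():
        path = prefix + (resource_name,)
        for method_name, method in resource.get("methods", {}).items():
            command = " ".join(path + (method_name,))
            commands[command] = (method["httpMethod"], method["path"])
        # Resources can nest, so recurse with the extended prefix.
        commands.update(command_surface(resource, path))
    return commands


# Illustrative only: field values are placeholders, not copied from a live API.
SAMPLE_DOC = {
    "name": "example",
    "resources": {
        "files": {
            "methods": {
                "list": {"httpMethod": "GET", "path": "files"},
                "get": {"httpMethod": "GET", "path": "files/{fileId}"},
            }
        },
        "users": {
            "resources": {
                "messages": {
                    "methods": {
                        "list": {"httpMethod": "GET", "path": "users/{userId}/messages"}
                    }
                }
            }
        },
    },
}

surface = command_surface(SAMPLE_DOC)
for command in sorted(surface):
    print(command, surface[command])
```

Because the commands are derived from the document rather than hard-coded, a method added upstream appears the next time the document is read.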
Comprehension debt: the gap between how much code exists and how much of it the team actually understands. AI tools make this worse quietly -- the code looks clean, the tests pass, but nobody knows how it works. An Anthropic study found engineers using AI assistance scored 17% lower on comprehension quizzes, with debugging skills hit hardest. The deeper problem: AI generates code faster than humans can review it, breaking the old feedback loop where code review forced knowledge to spread. Senior engineers used to review faster than juniors could write. That's now inverted.
Anthropic found that character traits like helpfulness, deception, and sycophancy correspond to specific activation patterns in neural networks -- "persona vectors." Extract them by comparing activations during opposing behaviors. Once you have them, you can monitor personality drift, steer away from bad traits, and trace which training data caused a problematic behavioral shift. A step toward mechanistic control of model personality rather than guessing at it through prompts.
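The extraction step, comparing activations during opposing behaviors, reduces to a difference of means. The sketch below uses toy two-dimensional activations; real ones would be read from a chosen layer of a model, and the function names, the steering strength, and the numbers are assumptions rather than Anthropic's code.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def persona_vector(trait_acts, opposite_acts):
    """Direction from 'opposite behavior' activations toward 'trait' activations."""
    trait_mean = mean_vector(trait_acts)
    opposite_mean = mean_vector(opposite_acts)
    return [t - o for t, o in zip(trait_mean, opposite_mean)]


def projection(activation, vector):
    """How far an activation lies along the persona direction (for monitoring)."""
    dot = sum(a * v for a, v in zip(activation, vector))
    norm = math.sqrt(sum(v * v for v in vector))
    return dot / norm


def steer(activation, vector, strength):
    """Shift an activation away from the trait by subtracting the direction."""
    return [a - strength * v for a, v in zip(activation, vector)]


# Toy activations recorded while the model does / does not show the trait.
sycophantic = [[2.0, 0.0], [4.0, 0.0]]
neutral = [[0.0, 0.0], [2.0, 0.0]]

vector = persona_vector(sycophantic, neutral)
current = [3.0, 5.0]
score_before = projection(current, vector)
score_after = projection(steer(current, vector, strength=1.0), vector)
print(vector, score_before, score_after)
```

Tracking the projection over time is the monitoring use case, and subtracting the direction is the steering one.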
Regards,
M@
Originally published on quantumfaxmachine.com and cross-posted on Medium.
hello@matthewsinclair.com | matthewsinclair.com | bsky.app/@matthewsinclair.com | masto.ai/@matthewsinclair | medium.com/@matthewsinclair | xitter/@matthewsinclair