QFM097: Machine Intelligence Reading List - January 2026
Source: Photo by Immo Wegmann on Unsplash
This month's Machine Intelligence Reading List covers agent architectures and developer tooling for LLM-powered systems. Will Larson examines building internal agents and code-driven vs LLM-driven workflows, while Google's research on scaling agent systems tackles multi-agent coordination. The Agentic AI Handbook rounds out the agent theme with production-ready patterns.
On the tooling side, Steve Yegge's Gas Town and Beads projects explore multi-agent programming, and Continuous-Claude-v3 tackles context management for long-running sessions. The collection also includes structured LLM outputs, research on divergent creativity in LLMs, and a reminder that today's organisations don't have an AI problem so much as a problem-definition one.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
This article explores the distinction between LLM-driven and code-driven agent workflows, using the example of a Slack bot that sometimes mislabelled merged pull requests due to LLM non-determinism. The author describes how they added a "coordinator" configuration option that allows workflows to be run either by an LLM or by a deterministic Python script that has access to the same tools and virtual files.
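As a rough illustration of the "coordinator" idea, here is a minimal sketch in Python. It is not the author's implementation: the names `run_workflow`, `TOOLS`, `label_pr`, and the `llm` callable are all hypothetical, and the only point is that both coordinators share one tool registry.

```python
from typing import Callable, Dict, Optional

# Hypothetical tool registry shared by both coordinators (names are assumptions).
Tool = Callable[[dict], str]


def label_pr(pr: dict) -> str:
    """Deterministic tool: derive a label from the PR state alone."""
    return "merged" if pr.get("merged") else "open"


TOOLS: Dict[str, Tool] = {"label_pr": label_pr}


def script_coordinator(pr: dict, tools: Dict[str, Tool]) -> str:
    """Code-driven path: plain Python decides which tool to call."""
    return tools["label_pr"](pr)


def llm_coordinator(pr: dict, tools: Dict[str, Tool], llm: Callable[[str], str]) -> str:
    """LLM-driven path: the model picks a tool by name, then we run it."""
    choice = llm(f"Pick one tool from {sorted(tools)} for PR: {pr}").strip()
    if choice not in tools:
        raise ValueError(f"model chose unknown tool: {choice!r}")
    return tools[choice](pr)


def run_workflow(config: dict, pr: dict, llm: Optional[Callable[[str], str]] = None) -> str:
    """Dispatch on the hypothetical 'coordinator' config option."""
    coordinator = config.get("coordinator", "llm")
    if coordinator == "script":
        return script_coordinator(pr, TOOLS)
    if coordinator == "llm":
        if llm is None:
            raise ValueError("an llm callable is required for the llm coordinator")
        return llm_coordinator(pr, TOOLS, llm)
    raise ValueError(f"unknown coordinator: {coordinator!r}")
```

Switching a flaky workflow from `"llm"` to `"script"` then becomes a one-line config change, which is roughly the trade-off the article describes.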
This article presents a production-oriented guide to 113 agentic AI patterns collected from public write-ups of real systems, organised into eight categories covering workflows, guardrails, and architecture. It addresses the "demo-to-production gap" by explaining what breaks when moving LLM-in-a-loop agent systems from prototypes to reliable production deployments.
Moltbook is a platform that describes itself as "a social network for AI agents" and "the front page of the agent internet." It allows AI agents to sign up, post content, comment, and upvote, while humans can observe the activity. The site also advertises a developer platform for building apps that let AI agents authenticate using their Moltbook identity.
This paper introduces KGGen, a Python library that uses language models to extract high-quality knowledge graphs from plain text, addressing the data scarcity problem in knowledge graph research where human-labelled graphs are scarce. A key differentiator is that KGGen clusters related entities to reduce sparsity in the resulting graphs, and the authors release MINE, the first benchmark for evaluating text-to-KG extraction quality.
This article argues that AI is making the executive assistant role more strategically important rather than obsolete. The core thesis is that as AI automates reactive tasks like scheduling and drafting, EAs are evolving into "systems architects" and "decision multipliers" who teach organisations how to think about automation, design workflows, and apply rigorous prompting methodology.
The author built a system at an AI hackathon that connects Claude to the classic text adventure game Anchorhead via the dfrotz Z-machine interpreter, using a Python wrapper to pipe game output to the LLM and send its commands back. The motivation was exploring whether cognitive architecture-inspired scaffolding could improve LLM agent performance on long-horizon tasks, with text adventures serving as an ideal evaluation domain due to their structured worlds and requirement for hundreds of sequential decisions.
TLDR is a code analysis tool that extracts structural information from codebases across 16 programming languages, so that LLMs receive only what they need rather than raw source files. The project claims 95% token savings and 155x faster queries. It builds five analysis layers (AST, call graph, control flow, data flow, and program dependence) using tree-sitter parsing and a FAISS-based semantic index.
Continuous Claude is a persistent, multi-agent development environment built on top of Claude Code that solves the context loss problem caused by conversation compaction. It maintains state across sessions using YAML-based handoffs and a memory system that automatically extracts learnings, while providing 109 skills, 32 specialised agents, 30 hooks, and a 5-layer code analysis system to reduce token consumption.
This is the index page for a multi-part series documenting how Imprint is building internal AI agent workflows. The series covers topics including skill support, progressive disclosure for large files, context window compaction, evals, logging, subagents, code-driven vs. LLM-driven workflows, triggers, and iterative prompt refinement.
The article argues that organisations struggling with AI adoption are actually suffering from a deficit in strategic thinking and problem definition rather than a technology gap. It cites research indicating that the top cause of AI project failure is miscommunication about what problem to solve, not technical capability.
This article provides a mathematical explanation of why vector arithmetic works in word2vec embeddings, grounding it in the distributional hypothesis and pointwise mutual information. It demonstrates that word vectors can be understood as compressed representations of co-occurrence statistics, and that the famous "king - man + woman = queen" analogy arises because conditional probability ratios can be expressed as vector differences.
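The core of that argument fits in a few lines. The notation here is an assumption for illustration: v_w is a word vector, u_c is a context vector, and the dot product of the two is taken to approximate PMI(w, c) up to a constant shift.

```latex
\begin{align*}
\mathrm{PMI}(w, c) &= \log \frac{p(w, c)}{p(w)\,p(c)} = \log \frac{p(c \mid w)}{p(c)} \\
(v_a - v_b) \cdot u_c &\approx \mathrm{PMI}(a, c) - \mathrm{PMI}(b, c) = \log \frac{p(c \mid a)}{p(c \mid b)}
\end{align*}
```

Under that assumption, v_king - v_man is close to v_queen - v_woman whenever the ratio p(c | king) / p(c | man) is close to p(c | queen) / p(c | woman) for every context word c, and that is the condition under which the analogy holds.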
Beads is a distributed, git-backed graph issue tracker designed as a "memory upgrade" for coding agents, replacing unstructured markdown plans with a dependency-aware graph that preserves context across long-horizon tasks. It is powered by Dolt and features hash-based IDs for merge-collision-free multi-agent workflows, semantic "compaction" that summarises old closed tasks to save context window space, and a messaging system with threading.
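Two of those ideas, hash-based IDs and dependency-aware "what can I work on next" queries, can be sketched in a few lines of Python. This is an illustration only and not how Beads itself is implemented: the `task-` prefix, the fields hashed, and the `Issue` and `ready` names are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


def issue_id(title: str, author: str, created: str) -> str:
    """Derive an ID from content, so agents never coordinate on a counter."""
    digest = hashlib.sha256(f"{title}\n{author}\n{created}".encode()).hexdigest()
    return f"task-{digest[:8]}"  # hypothetical prefix and length


@dataclass
class Issue:
    id: str
    title: str
    blocked_by: set = field(default_factory=set)  # IDs of blocking issues
    closed: bool = False


def ready(issues: dict) -> list:
    """IDs of open issues whose blockers are all closed."""
    return sorted(
        issue.id
        for issue in issues.values()
        if not issue.closed and all(issues[b].closed for b in issue.blocked_by)
    )
```

Because the ID depends only on what an agent writes, two agents creating issues on separate branches will not be handed the same sequential number, which is the merge-collision problem that hash IDs are meant to avoid.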
This study systematically benchmarks the semantic divergence of state-of-the-art LLMs against a dataset of 100,000 human participants, using the Divergent Association Task and multiple creative-writing tasks. The findings show that LLMs can surpass average human performance on divergent thinking tasks but remain below the creativity scores of the more creative segment of human participants, revealing a ceiling that current models cannot break through.
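For readers unfamiliar with the Divergent Association Task, the score is usually described as the average pairwise semantic distance between the words a participant submits, scaled by 100. The sketch below assumes cosine distance over word embeddings; the function names are hypothetical, and real scoring would use pretrained embeddings rather than hand-written vectors.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dat_score(vectors):
    """Average pairwise cosine distance, scaled by 100 (higher = more divergent)."""
    pairs = list(combinations(vectors, 2))
    return 100.0 * sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
```

Words that point in unrelated directions in embedding space push the score up, while near-synonyms pull it towards zero.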
This Google Research blog post presents results from a controlled evaluation of 180 agent configurations across five canonical architectures on four benchmarks, deriving the first quantitative scaling principles for AI agent systems. The key finding is that multi-agent coordination dramatically improves performance on parallelisable tasks but severely degrades it on sequential ones. Independent multi-agent systems without coordination amplify errors by 17.2x, compared to 4.4x for centralised orchestrator-based systems.
This is a toolkit that enhances Anthropic's built-in frontend-design skill with curated design patterns, anti-patterns, and 17 slash commands that give developers access to professional design vocabulary when using AI coding tools. It packages an enhanced frontend-design skill covering typography, colour, layout, and motion, and is available for Claude Code, Cursor, Gemini CLI, and Codex CLI.
This is a developer handbook covering techniques for ensuring LLMs produce reliably structured outputs rather than occasionally malformed results due to their probabilistic nature. It covers both constrained and unconstrained methods for structured generation, explains what happens under the hood, and provides guidance on choosing the right tools and techniques for building, deploying, and scaling systems.
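A common unconstrained approach is to validate the model's output and retry with feedback when it fails. The sketch below assumes a hypothetical `model` callable that maps a prompt string to a reply string, and a made-up two-field schema. It is not taken from the handbook.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED = {"city": str, "temp_c": (int, float)}


def parse_structured(raw: str) -> dict:
    """Parse JSON and check required fields; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for field: {key}")
    return data


def generate_with_retry(model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until its output validates, feeding errors back each time."""
    feedback = ""
    for _ in range(max_attempts):
        raw = model(prompt + feedback)
        try:
            return parse_structured(raw)
        except ValueError as err:
            feedback = f"\nPrevious output was invalid ({err}). Return only valid JSON."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

Constrained methods take the opposite route and restrict which tokens the model may emit, so that invalid output cannot be produced in the first place.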
This tutorial walks through building a weather-querying AI agent in Elixir using the Jido framework, which provides a structured approach to agent construction with Actions, an Agent module, and LLM integration via LangChain. The example demonstrates how a Jido Action wraps the OpenWeatherMap API with automatic parameter validation and LLM-compatible tool conversion.
This Hacker News thread discusses the Moltbook launch, garnering over 1,600 points and referencing a Karpathy tweet calling Moltbook "the most interesting place on the internet right now." The comments note that the verification system was broken and point to a similar concept called "clackernews.com," with the overall tone suggesting curiosity about agent-to-agent social platforms mixed with some fatigue about AI hype.
Gas Town is a multi-agent workspace manager written in Go that coordinates multiple Claude Code agents working on different tasks within a shared environment. It persists agent work state in git-backed "hooks" and uses a structured hierarchy of concepts to scale comfortably to 20-30 concurrent agents, integrating with the companion "Beads" project for issue tracking and context management.
Regards,
M@
Originally published on quantumfaxmachine.com and cross-posted on Medium.
hello@matthewsinclair.com | matthewsinclair.com | bsky.app/@matthewsinclair.com | masto.ai/@matthewsinclair | medium.com/@matthewsinclair | xitter/@matthewsinclair