LLM Architecture Gallery
2026-03-31
Raschka's visual catalog of LLM architectures, from GPT-2 through DeepSeek V3.2 and Qwen3. Each model gets a diagram and a fact sheet (parameter count, context length, attention type, decoder type), along with a KV cache estimate at batch size 1. The interactive diff tool is the standout: pick any two models and compare their architectural stacks side by side. The gallery is available as a poster or a digital download.
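For a feel of what a batch-size-1 KV cache estimate involves, here is a minimal sketch of the usual back-of-the-envelope formula for standard multi-head or grouped-query attention. It is not taken from the gallery, whose exact method isn't described here; the function name `kv_cache_bytes` and the example config values are hypothetical, and the formula does not cover compressed-cache schemes such as multi-head latent attention or sliding-window layers.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Estimate KV cache size for plain MHA/GQA (an assumption, not the gallery's method).

    Each layer stores one key and one value vector of size
    n_kv_heads * head_dim per token, hence the leading factor of 2.
    bytes_per_elem=2 assumes 16-bit storage (fp16/bf16).
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem * batch_size)


# Hypothetical config, not any specific model from the gallery:
# 32 layers, 8 KV heads of dimension 128, 8192-token context.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"{size / 1024**3:.1f} GiB")  # prints "1.0 GiB"
```

The estimate scales linearly with context length and with the number of KV heads, which is why attention type matters so much when comparing two models' fact sheets.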