LLM Architecture Gallery

Raschka's visual catalog of LLM architectures, from GPT-2 through DeepSeek V3.2 and Qwen3. Each model gets a diagram and a fact sheet (parameter count, context length, attention type, decoder type). The interactive diff tool is the standout: pick any two models and compare their architectural stacks side by side. The gallery also gives KV cache estimates at batch size 1. Available as a poster or digital download.
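
For a sense of what a batch-size-1 KV cache estimate involves, here is a minimal sketch of the usual back-of-the-envelope formula for standard multi-head or grouped-query attention. It is not taken from the gallery, which may compute its numbers differently; the function name, parameter names, and the example configuration are all hypothetical. Architectures that compress or window the cache (latent attention, sliding-window attention) need a different formula.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Rough KV cache size for plain MHA/GQA attention (an assumption).

    The leading factor of 2 covers the separate key and value tensors.
    bytes_per_elem=2 assumes 16-bit storage (fp16 or bf16).
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)


# Hypothetical configuration, not a specific model from the gallery.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"{size / 2**30:.2f} GiB")  # prints "1.00 GiB"
```

Under this formula the cache scales linearly with the number of KV heads, so the same hypothetical model with 32 KV heads instead of 8 would need four times as much memory.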
