The last six months in LLMs, illustrated by pelicans on bicycles
2025-06-30
The LLM landscape now moves so quickly that covering even six months, rather than a full year, is a challenge: over 30 significant models have been released recently, including Meta's Llama 3.3 70B (which achieved GPT-4-class performance on consumer hardware) and DeepSeek's undocumented open-weight model, which emerged as a top performer. Rather than relying on traditional benchmarks and leaderboards, the author evaluates models by prompting each one to generate SVG code for a pelican riding a bicycle. The task is intentionally difficult, and it reveals both a model's capability and, through the comments in the generated code, its reasoning.
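As a rough illustration of why SVG output is informative, the sketch below pulls the comments out of a generated drawing and checks that the markup parses. The sample SVG is made up for this example and is not output from any model discussed in the article, and the helper names `svg_comments` and `is_well_formed` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical model output, written by hand for illustration only.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">
  <!-- Bicycle frame -->
  <circle cx="60" cy="150" r="30" fill="none" stroke="black"/>
  <circle cx="140" cy="150" r="30" fill="none" stroke="black"/>
  <!-- Pelican body -->
  <ellipse cx="100" cy="90" rx="30" ry="20" fill="white" stroke="black"/>
</svg>"""

# XML comments have the form <!-- ... --> and may span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def svg_comments(svg_text):
    """Return the text of each XML comment, in document order."""
    return [c.strip() for c in COMMENT_RE.findall(svg_text)]


def is_well_formed(svg_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(svg_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(SAMPLE_SVG))
    for comment in svg_comments(SAMPLE_SVG):
        print(comment)
```

The comments show what the model intended each shape to be, which can then be compared against what the rendered image actually looks like.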