jundot/omlx: LLM inference server with continuous batching & SSD caching for Apple Silicon, managed from the macOS menu bar


oMLX is a macOS-native LLM inference server optimized for Apple Silicon. It implements continuous batching and tiered KV caching (a hot tier in memory and a cold tier on SSD), so context persists and can be reused across requests, even when prompts change mid-conversation. The server is managed from the macOS menu bar and works with OpenAI-compatible clients. It can be installed via the bundled macOS app, Homebrew, or from source, and requires macOS 15.0+, Python 3.10+, and Apple Silicon hardware.
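Because the server speaks the OpenAI API, any stock client library can be pointed at it. Below is a minimal sketch using the official `openai` Python package; the base URL, port, model name, and API key are placeholders (assumptions for illustration, not values documented by oMLX), so check the project README for the actual endpoint and model identifiers.

```python
# Minimal sketch: calling an OpenAI-compatible local server such as oMLX.
# base_url, model, and api_key below are placeholder assumptions, not
# values documented by the project; adjust them to your oMLX configuration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier your server reports
    messages=[
        {"role": "user", "content": "Summarize what continuous batching does."}
    ],
)
print(response.choices[0].message.content)
```

Sending follow-up turns with the same conversation prefix is where the tiered KV cache pays off: the shared prefix can be served from the in-memory or SSD cache instead of being recomputed.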
