The backbone of modern AI, re-engineered
The biggest update in five years. v5 brings a modular design, first-class quantization, and a new OpenAI-compatible serving API. Optimized for PyTorch and fully interoperable with the modern AI stack, including vLLM, llama.cpp, and the GGUF format.