About vLLM
OpenAI-compatible local inference runtime optimized for GPU-backed serving
Project Details
License
Apache-2.0
Platform
Kubernetes
Min. Requirements
4 CPU, 16GB RAM, 50GB disk
Website
docs.vllm.ai
Source Code
vllm-project/vllm
Or have it managed.
If you would rather not run vLLM yourself, Anchras Platform deploys it onto a private cloud you own, with updates, networking, and audit handled. The catalog stays free here either way.