llmaz

llmaz is an easy-to-use, advanced inference platform for large language models on Kubernetes, aimed at production use. It uses vLLM as the default model serving backend.

Please refer to the Quick Start for more details.