Llama Stack
vLLM is also available via Llama Stack.
To install Llama Stack, run:
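Llama Stack is published on PyPI as the `llama-stack` package, so a plain pip install is sufficient:

```console
pip install llama-stack
```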
Inference using OpenAI Compatible API
Then start the Llama Stack server, pointing it to your vLLM server with the following configuration:
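A minimal sketch of the `inference` section in the Llama Stack run configuration, assuming the `remote::vllm` provider type and a vLLM server listening locally on port 8000; the provider ID and URL are placeholders to adjust for your deployment:

```yaml
inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      # URL of the running vLLM OpenAI-compatible server
      url: http://127.0.0.1:8000
```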
Please refer to the Llama Stack documentation on the remote vLLM provider for more details.
Inference via Embedded vLLM
An inline vLLM provider is also available. Here is a sample configuration using that method:
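A sketch of an `inference` entry using the inline provider; exact field names may differ across Llama Stack versions, and the model name and `tensor_parallel_size` shown here are illustrative values that should match your hardware:

```yaml
inference:
  - provider_type: vllm
    config:
      # Model is loaded in-process by the embedded vLLM engine
      model: Llama3.1-8B-Instruct
      tensor_parallel_size: 4
```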