# Using vLLM

vLLM supports the following usage patterns:

- **Inference and Serving**: Run a single instance of a model.
- **Deployment**: Scale up model instances for production.
- **Training**: Train or fine-tune a model.