# Using vLLM

vLLM supports the following usage patterns:

- **Inference and Serving**: Run a single instance of a model.
- **Deployment**: Scale up model instances for production.
- **Training**: Train or fine-tune a model.