Offline Inference
Offline inference examples demonstrate how to use vLLM in an offline setting, where the model is queried directly for batched predictions rather than through a running server. We recommend starting with Basic.
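For orientation, here is a minimal sketch of the offline workflow along the lines of the Basic example; the model name is just an illustrative choice:

```python
from vllm import LLM, SamplingParams

# A batch of prompts to run through the model in one offline pass.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Sampling settings shared by the whole batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model once; "facebook/opt-125m" is a small model used here for illustration.
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts in a single batched call.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```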
Examples
- Audio Language
- Basic
- Chat With Tools
- CPU Offload Lmcache
- Data Parallel
- Disaggregated Prefill
- Disaggregated Prefill Lmcache
- Distributed
- Eagle
- Encoder Decoder
- Encoder Decoder Multimodal
- LLM Engine Example
- LoRA With Quantization Inference
- MLPSpeculator
- MultiLoRA Inference
- Neuron
- Neuron INT8 Quantization
- Offline Inference with the OpenAI Batch file format
- Pixtral
- Prefix Caching
- Prithvi Geospatial Mae
- Profiling
- vLLM TPU Profiling
- RLHF
- RLHF Colocate
- RLHF Utils
- Save Sharded State
- Simple Profiling
- Structured Outputs
- Torchrun Example
- TPU
- Vision Language
- Vision Language Embedding
- Vision Language Multi Image