Offline Inference
These examples demonstrate how to use vLLM in an offline setting, where the model is loaded in-process and queried for predictions in batches rather than served behind an API. We recommend starting with Basic; the sketch below previews the pattern it covers.
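As a minimal sketch of that workflow: construct an `LLM` once, then call `generate()` on a whole batch of prompts. The model name here is only an illustrative choice; any model vLLM supports works the same way.

```python
from vllm import LLM, SamplingParams

# A batch of prompts is processed together in a single offline run.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model once; generate() runs inference over the entire batch.
llm = LLM(model="facebook/opt-125m")  # illustrative model choice
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```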
Examples
- Audio Language
- Basic
- Batch LLM Inference
- Chat With Tools
- CPU Offload LMCache
- Data Parallel
- Disaggregated Prefill
- Disaggregated Prefill LMCache
- Eagle
- Embed Jina Embeddings V3
- Embed Matryoshka Fy
- Encoder Decoder
- Encoder Decoder Multimodal
- LLM Engine Example
- Load Sharded State
- LoRA With Quantization Inference
- Mistral-Small
- MLPSpeculator
- MultiLoRA Inference
- Neuron
- Neuron INT8 Quantization
- Offline Inference with the OpenAI Batch file format
- Prefix Caching
- Prithvi Geospatial MAE
- Profiling
- vLLM TPU Profiling
- Qwen2.5-Omni Offline Inference Examples
- Reproducibility
- RLHF
- RLHF Colocate
- RLHF Utils
- Save Sharded State
- Simple Profiling
- Structured Outputs
- Torchrun Example
- TPU
- Vision Language
- Vision Language Embedding
- Vision Language Multi Image