Offline Inference
These examples demonstrate how to use vLLM offline, loading the model directly in your own process and querying it for predictions in batches rather than through a running API server. A minimal sketch of this pattern follows below, ahead of the full list of examples.
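The sketch below shows the core offline workflow (the model name is just a placeholder; see the Basic example for the canonical version):

```python
from vllm import LLM, SamplingParams

# A batch of prompts processed in one offline call.
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

# Sampling settings shared by all prompts in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model into the local process; no API server is involved.
llm = LLM(model="facebook/opt-125m")

# Generate completions for the whole batch at once.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```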
- AQLM Example
- Arctic
- Audio Language
- Basic
- Basic With Model Default Sampling
- Chat
- Chat With Tools
- Classification
- CLI
- CPU Offload
- Distributed
- Embedding
- Encoder Decoder
- Florence2 Inference
- GGUF Inference
- LLM Engine Example
- LoRA With Quantization Inference
- MLPSpeculator
- MultiLoRA Inference
- Neuron
- Neuron INT8 Quantization
- Offline Inference with the OpenAI Batch file format
- Pixtral
- Prefix Caching
- Profiling
- Save Sharded State
- Scoring
- Simple Profiling
- Structured Outputs
- TPU
- Vision Language
- Vision Language Embedding
- Vision Language Multi Image
- Whisper