# Examples

A collection of examples demonstrating usage of vLLM. All documented examples are autogenerated by `docs/source/generate_examples.py` from the scripts found in the `examples` directory.
- Offline Inference
  - AQLM Example
  - Arctic
  - Audio Language
  - Basic
  - Basic With Model Default Sampling
  - Chat
  - Chat With Tools
  - Classification
  - CLI
  - CPU Offload
  - Disaggregated Prefill
  - Distributed
  - Embedding
  - Encoder Decoder
  - Florence2 Inference
  - GGUF Inference
  - LLM Engine Example
  - LoRA With Quantization Inference
  - MLPSpeculator
  - MultiLoRA Inference
  - Neuron
  - Neuron INT8 Quantization
  - Offline Inference with the OpenAI Batch file format
  - Pixtral
  - Prefix Caching
  - Prithvi Geospatial Mae
  - Profiling
  - vLLM TPU Profiling
  - Rlhf
  - Rlhf Colocate
  - Save Sharded State
  - Scoring
  - Simple Profiling
  - Structured Outputs
  - Torchrun Example
  - TPU
  - Vision Language
  - Vision Language Embedding
  - Vision Language Multi Image
  - Whisper
- Online Serving
  - API Client
  - Helm Charts
  - Cohere Rerank Client
  - Disaggregated Prefill
  - Gradio OpenAI Chatbot Webserver
  - Gradio Webserver
  - Jinaai Rerank Client
  - OpenAI Chat Completion Client
  - OpenAI Chat Completion Client For Multimodal
  - OpenAI Chat Completion Client With Tools
  - OpenAI Chat Completion Structured Outputs
  - OpenAI Chat Completion With Reasoning
  - OpenAI Chat Completion With Reasoning Streaming
  - OpenAI Chat Embedding Client For Multimodal
  - OpenAI Completion Client
  - OpenAI Cross Encoder Score
  - OpenAI Embedding Client
  - OpenAI Pooling Client
  - OpenAI Transcription Client
  - Setup OpenTelemetry POC
  - Prometheus and Grafana
  - Run Cluster
  - Sagemaker-Entrypoint
- Other