# Examples

A collection of examples demonstrating usage of vLLM. All documented examples are autogenerated by docs/source/generate_examples.py from the scripts found in the examples/ directory.
- Offline Inference
  - Audio Language
  - Basic (see the sketch after this list)
  - Chat With Tools
  - CPU Offload Lmcache
  - Data Parallel
  - Disaggregated Prefill
  - Disaggregated Prefill Lmcache
  - Distributed
  - Eagle
  - Encoder Decoder
  - Encoder Decoder Multimodal
  - LLM Engine Example
  - LoRA With Quantization Inference
  - MLPSpeculator
  - MultiLoRA Inference
  - Neuron
  - Neuron INT8 Quantization
  - Offline Inference with the OpenAI Batch file format
  - Pixtral
  - Prefix Caching
  - Prithvi Geospatial Mae
  - Profiling
  - vLLM TPU Profiling
  - RLHF
  - RLHF Colocate
  - RLHF Utils
  - Save Sharded State
  - Simple Profiling
  - Structured Outputs
  - Torchrun Example
  - TPU
  - Vision Language
  - Vision Language Embedding
  - Vision Language Multi Image
- Online Serving
  - API Client
  - Helm Charts
  - Cohere Rerank Client
  - Disaggregated Prefill
  - Gradio OpenAI Chatbot Webserver
  - Gradio Webserver
  - Jinaai Rerank Client
  - Multi-Node-Serving
  - OpenAI Chat Completion Client
  - OpenAI Chat Completion Client For Multimodal
  - OpenAI Chat Completion Client With Tools
  - OpenAI Chat Completion Structured Outputs
  - OpenAI Chat Completion Structured Outputs With Reasoning
  - OpenAI Chat Completion With Reasoning
  - OpenAI Chat Completion With Reasoning Streaming
  - OpenAI Chat Embedding Client For Multimodal
  - OpenAI Completion Client (see the sketch after this list)
  - OpenAI Cross Encoder Score
  - OpenAI Embedding Client
  - OpenAI Pooling Client
  - OpenAI Transcription Client
  - Setup OpenTelemetry POC
  - Prometheus and Grafana
  - Run Cluster
  - Sagemaker-Entrypoint
- Other
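The Basic example under Offline Inference is the usual starting point. The following is a minimal sketch of that workflow, not the example's exact contents; the model name and sampling settings are illustrative assumptions.

```python
# Minimal offline-inference sketch in the spirit of the Basic example.
# The model and sampling settings below are illustrative, not prescribed
# by this index.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load a small model; any Hugging Face model supported by vLLM works here.
llm = LLM(model="facebook/opt-125m")

# generate() runs batched inference over all prompts at once.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```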
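The OpenAI client examples under Online Serving all follow the same pattern: point an official OpenAI client at a running vLLM OpenAI-compatible server. Below is a minimal sketch along the lines of the OpenAI Completion Client example, assuming a server has already been started (for instance with `vllm serve`) and is listening on localhost:8000; the host, port, and model name are illustrative.

```python
# Minimal sketch of querying a vLLM OpenAI-compatible server.
# Assumes a server is already running, e.g.:
#   vllm serve facebook/opt-125m
from openai import OpenAI

# vLLM's server does not require a real API key by default; any string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```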