# Offline Inference

The examples below demonstrate offline inference with vLLM, where the model is loaded directly in a script and queried for predictions in batches.
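As a rough orientation before the individual examples, the batch pattern can be sketched as follows. This is a minimal sketch, not a replacement for the `basic` example: the helper names `build_prompts` and `run_offline_batch`, the prompt strings, the sampling values, and the model name `facebook/opt-125m` are all illustrative assumptions. Only `LLM`, `SamplingParams`, and `LLM.generate` come from vLLM itself.

```python
from typing import List, Tuple


def build_prompts() -> List[str]:
    """Return a small batch of example prompts (illustrative only)."""
    return [
        "Hello, my name is",
        "The capital of France is",
        "The future of AI is",
    ]


def run_offline_batch(
    prompts: List[str],
    model: str = "facebook/opt-125m",  # assumed small model, chosen for illustration
) -> List[Tuple[str, str]]:
    """Generate one completion per prompt in a single batched call."""
    # Imported lazily so this module can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    # Sampling values are arbitrary placeholders, not recommended settings.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

    # Load the model in-process; no server is started.
    llm = LLM(model=model)

    # Submit the whole batch at once; vLLM schedules the requests internally.
    outputs = llm.generate(prompts, sampling_params)

    # Each result carries its prompt and a list of candidate completions.
    return [(output.prompt, output.outputs[0].text) for output in outputs]


# Example usage (requires vLLM and a supported device):
#   for prompt, text in run_offline_batch(build_prompts()):
#       print(f"{prompt!r} -> {text!r}")
```

Passing the full list of prompts to a single `generate` call, rather than looping over them one at a time, is what lets the engine batch the work.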

:::{toctree}
:caption: Examples
:maxdepth: 1
aqlm_example
arctic
audio_language
basic
basic_with_model_default_sampling
chat
chat_with_tools
classification
cli
cpu_offload
distributed
embedding
encoder_decoder
florence2_inference
gguf_inference
llm_engine_example
lora_with_quantization_inference
mlpspeculator
multilora_inference
neuron
neuron_int8_quantization
openai
pixtral
prefix_caching
profiling
profiling_tpu
rlhf
save_sharded_state
scoring
simple_profiling
structured_outputs
torchrun_example
tpu
vision_language
vision_language_embedding
vision_language_multi_image
whisper
:::
