# Offline Inference

These examples demonstrate how to use vLLM offline, where the model is queried for predictions in batches rather than one request at a time. We recommend starting with <project:basic.md>.
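
As a rough orientation before opening the examples, batched offline generation can be sketched as below. This is a minimal sketch, not a replacement for the linked examples: the helper name `generate_offline`, the model `facebook/opt-125m`, and the sampling values are assumptions chosen for illustration, and the `vllm` import is deferred into the function so the snippet can be loaded without the package installed.

```python
from typing import List, Sequence


def generate_offline(
    prompts: Sequence[str],
    model: str = "facebook/opt-125m",  # assumed small model, for illustration only
    max_tokens: int = 32,
) -> List[str]:
    """Generate one completion per prompt in a single batched call.

    `generate_offline` is a hypothetical helper, not part of vLLM.
    """
    # Deferred import: vLLM is only required when the function is called.
    from vllm import LLM, SamplingParams

    # Sampling settings are illustrative; tune them for your use case.
    sampling_params = SamplingParams(
        temperature=0.8, top_p=0.95, max_tokens=max_tokens
    )

    # Load the model in-process, then submit the whole batch at once.
    llm = LLM(model=model)
    outputs = llm.generate(list(prompts), sampling_params)

    # Each result holds one or more candidates; keep the first candidate's text.
    return [output.outputs[0].text for output in outputs]
```

Calling `generate_offline(["Hello, my name is", "The capital of France is"])` returns a list with one generated string per prompt, in the same order as the input.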

:::{toctree}
:caption: Examples
:maxdepth: 1
audio_language
basic
chat_with_tools
cpu_offload_lmcache
data_parallel
disaggregated_prefill
disaggregated_prefill_lmcache
distributed
eagle
embed_jina_embeddings_v3
embed_matryoshka_fy
encoder_decoder
encoder_decoder_multimodal
llm_engine_example
load_sharded_state
lora_with_quantization_inference
mistral-small
mlpspeculator
multilora_inference
neuron
neuron_int8_quantization
openai
prefix_caching
prithvi_geospatial_mae
profiling
profiling_tpu
reproduciblity
rlhf
rlhf_colocate
rlhf_utils
save_sharded_state
simple_profiling
structured_outputs
torchrun_example
tpu
vision_language
vision_language_embedding
vision_language_multi_image
:::
