Summary#

Configuration#

API documentation for vLLM’s configuration classes.

vllm.config.ModelConfig

Configuration for the model.

vllm.config.CacheConfig

Configuration for the KV cache.

vllm.config.TokenizerPoolConfig

This config is deprecated and will be removed in a future release.

vllm.config.LoadConfig

Configuration for loading the model weights.

vllm.config.ParallelConfig

Configuration for distributed execution.

vllm.config.SchedulerConfig

Scheduler configuration.

vllm.config.DeviceConfig

Configuration for the device to use for vLLM execution.

vllm.config.SpeculativeConfig

Configuration for speculative decoding.

vllm.config.LoRAConfig

Configuration for LoRA.

vllm.config.PromptAdapterConfig

Configuration for PromptAdapters.

vllm.config.MultiModalConfig

Controls the behavior of multimodal models.

vllm.config.PoolerConfig

Controls the behavior of output pooling in pooling models.

vllm.config.DecodingConfig

Dataclass which contains the decoding strategy of the engine.

vllm.config.ObservabilityConfig

Configuration for observability - metrics and tracing.

vllm.config.KVTransferConfig

Configuration for distributed KV cache transfer.

vllm.config.CompilationConfig

Configuration for compilation.

vllm.config.VllmConfig

Dataclass which contains all vLLM-related configuration. This simplifies passing the distinct configurations around the codebase.
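The aggregation idea behind VllmConfig can be sketched in pure Python: one dataclass bundles the per-concern configs so a single object is passed around. The mirror classes and field names below are illustrative assumptions based on the class names above, not vLLM's actual definitions.

```python
# Hedged sketch of the VllmConfig aggregation pattern; the *Sketch classes
# are stand-ins, not the real vllm.config classes.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfigSketch:          # stands in for vllm.config.ModelConfig
    model: str = "facebook/opt-125m"
    max_model_len: Optional[int] = None


@dataclass
class CacheConfigSketch:          # stands in for vllm.config.CacheConfig
    block_size: int = 16
    gpu_memory_utilization: float = 0.9


@dataclass
class VllmConfigSketch:           # stands in for vllm.config.VllmConfig
    model_config: ModelConfigSketch = field(default_factory=ModelConfigSketch)
    cache_config: CacheConfigSketch = field(default_factory=CacheConfigSketch)


config = VllmConfigSketch()
# The bundled sub-configs remain individually accessible.
print(config.model_config.model)
print(config.cache_config.block_size)
```

Passing one aggregate object instead of many separate configs keeps function signatures stable as new config classes are added.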

Offline Inference#

LLM Class.

vllm.LLM

An LLM for generating texts from given prompts and sampling parameters.

LLM Inputs.

vllm.inputs.PromptType

Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:

vllm.inputs.TextPrompt

Schema for a text prompt.

vllm.inputs.TokensPrompt

Schema for a tokenized prompt.
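The input schemas above can be mirrored with plain TypedDicts: a PromptType value is either a bare string, a text-prompt dict, or a tokenized-prompt dict. The sketch below mirrors the documented required keys only; consult vllm.inputs for the full optional fields.

```python
# Hedged sketch of the PromptType / TextPrompt / TokensPrompt shapes;
# these TypedDicts are illustrative mirrors, not the vllm.inputs classes.
from typing import List, TypedDict, Union


class TextPromptSketch(TypedDict):
    prompt: str                     # raw text, tokenized by the engine


class TokensPromptSketch(TypedDict):
    prompt_token_ids: List[int]     # pre-tokenized input, no tokenizer pass


PromptTypeSketch = Union[str, TextPromptSketch, TokensPromptSketch]

prompts: List[PromptTypeSketch] = [
    "Hello, my name is",                        # bare string form
    {"prompt": "The capital of France is"},     # TextPrompt form
    {"prompt_token_ids": [1, 15043, 29892]},    # TokensPrompt form
]
print(prompts[1]["prompt"])
```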

vLLM Engines#

Engine classes for offline and online inference.

vllm.LLMEngine

An LLM engine that receives requests and generates texts.

vllm.AsyncLLMEngine

An asynchronous wrapper for LLMEngine.
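The "asynchronous wrapper around a blocking engine" pattern that AsyncLLMEngine applies to LLMEngine can be illustrated with toy stand-ins: the blocking generate call is offloaded to a thread so many requests can be awaited concurrently. The classes below are sketches, not vLLM's API.

```python
# Hedged sketch of the async-wrapper pattern; BlockingEngineSketch and
# AsyncEngineSketch are toy stand-ins, not vllm.LLMEngine / AsyncLLMEngine.
import asyncio


class BlockingEngineSketch:
    def generate(self, prompt: str) -> str:
        return prompt + " -> output"   # stands in for real decoding work


class AsyncEngineSketch:
    def __init__(self, engine: BlockingEngineSketch) -> None:
        self._engine = engine

    async def generate(self, prompt: str) -> str:
        # Offload the blocking call so the event loop stays responsive.
        return await asyncio.to_thread(self._engine.generate, prompt)


async def main() -> list:
    engine = AsyncEngineSketch(BlockingEngineSketch())
    # Serve several requests concurrently over one underlying engine.
    return await asyncio.gather(*(engine.generate(p) for p in ["a", "b"]))


results = asyncio.run(main())
print(results)
```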

Inference Parameters#

Inference parameters for vLLM APIs.

vllm.SamplingParams

Sampling parameters for text generation.

vllm.PoolingParams

API parameters for pooling models. This is currently a placeholder.
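What the core SamplingParams knobs mean can be shown over a toy distribution: temperature rescales the logits before the softmax, and top_p keeps the smallest set of tokens whose cumulative probability reaches p. This is a conceptual illustration of the semantics, not vLLM's sampler implementation.

```python
# Hedged sketch of temperature scaling and top-p (nucleus) filtering,
# illustrating the meaning of two common sampling parameters.
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(logits, temperature, top_p):
    # Temperature divides the logits: lower values sharpen the distribution.
    probs = softmax([x / temperature for x in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:   # nucleus reached: stop keeping tokens
            break
    return kept


logits = [4.0, 2.0, 1.0, 0.5]
print(top_p_candidates(logits, temperature=1.0, top_p=0.9))  # two candidates survive
print(top_p_candidates(logits, temperature=0.5, top_p=0.9))  # sharper: only the top one
```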

Multi-Modality#

vLLM provides experimental support for multi-modal models through the vllm.multimodal package.

Multi-modal inputs can be passed alongside text and token prompts to supported models via the multi_modal_data field in vllm.inputs.PromptType.

Looking to add your own multi-modal model? Follow the instructions in vLLM's guide on adding multi-modal models.

vllm.multimodal.MULTIMODAL_REGISTRY

The global MultiModalRegistry is used by model runners to dispatch data processing according to the target model.
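The registry-dispatch idea described above can be sketched as a mapping from model class to its data processor, so a runner can look up the right processing code for the target model. The names below are illustrative, not vLLM's actual registry API.

```python
# Hedged sketch of a multimodal processor registry; RegistrySketch and
# ToyVisionModel are stand-ins, not vllm.multimodal.MULTIMODAL_REGISTRY.
from typing import Callable, Dict


class RegistrySketch:
    def __init__(self) -> None:
        self._processors: Dict[type, Callable[[dict], dict]] = {}

    def register_processor(self, model_cls: type):
        # Decorator that records the processor for a given model class.
        def wrapper(fn: Callable[[dict], dict]):
            self._processors[model_cls] = fn
            return fn
        return wrapper

    def process(self, model_cls: type, data: dict) -> dict:
        # Dispatch: look up the processor registered for this model.
        return self._processors[model_cls](data)


REGISTRY = RegistrySketch()


class ToyVisionModel:
    pass


@REGISTRY.register_processor(ToyVisionModel)
def process_image(data: dict) -> dict:
    return {"pixel_values": data["image"]}


print(REGISTRY.process(ToyVisionModel, {"image": [0, 1, 2]}))
```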

Inputs#

User-facing inputs.

vllm.multimodal.inputs.MultiModalDataDict

A dictionary containing an entry for each modality type to input.

Internal data structures.

vllm.multimodal.inputs.PlaceholderRange

Placeholder location information for multi-modal data.

vllm.multimodal.inputs.NestedTensors

Uses a list instead of a tensor if the dimensions of each element do not match.

vllm.multimodal.inputs.MultiModalFieldElem

Represents a keyword argument corresponding to a multi-modal item in MultiModalKwargs.

vllm.multimodal.inputs.MultiModalFieldConfig

vllm.multimodal.inputs.MultiModalKwargsItem

A collection of MultiModalFieldElem corresponding to a data item in MultiModalDataItems.

vllm.multimodal.inputs.MultiModalKwargs

A dictionary that represents the keyword arguments to forward().

vllm.multimodal.inputs.MultiModalInputs

Represents the outputs of vllm.multimodal.processing.BaseMultiModalProcessor, ready to be passed to vLLM internals.
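The NestedTensors entry above boils down to a simple rule: batch items with identical shapes can be stacked into one tensor, while items whose dimensions differ (e.g. images of different sizes) must stay a plain list. Plain nested lists stand in for tensors in this sketch.

```python
# Hedged sketch of the shape check behind the NestedTensors fallback;
# nested Python lists stand in for tensors.
def shape(x):
    # Shape of a uniformly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2).
    if isinstance(x, list):
        return (len(x),) + shape(x[0])
    return ()


def can_stack(items) -> bool:
    # A batch can become one tensor only if every element has the same shape.
    return len({shape(item) for item in items}) == 1


print(can_stack([[1, 2], [3, 4]]))     # uniform rows: stackable
print(can_stack([[1, 2, 3], [4, 5]]))  # ragged rows: keep a list instead
```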

Data Parsing#

Data Processing#

Memory Profiling#

Registry#

Model Development#