Summary
Configuration
API documentation for vLLM’s configuration classes.
Configuration for the model.
Configuration for the KV cache.
This config is deprecated and will be removed in a future release.
Configuration for loading the model weights.
Configuration for distributed execution.
Scheduler configuration.
Configuration for the device to use for vLLM execution.
Configuration for speculative decoding.
Configuration for LoRA.
Configuration for PromptAdapters.
Controls the behavior of multimodal models.
Controls the behavior of output pooling in pooling models.
Dataclass which contains the decoding strategy of the engine.
Configuration for observability: metrics and tracing.
Configuration for distributed KV cache transfer.
Configuration for compilation. It has three parts:
Dataclass which contains all vLLM-related configuration. This simplifies passing around the distinct configurations in the codebase.
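The aggregate config described last bundles the other configs into one object, so components receive a single argument instead of many. As a rough illustration of that composition pattern, here is a sketch using hypothetical, heavily simplified dataclasses (`ModelSettings`, `CacheSettings`, `ParallelSettings`, `AllSettings`); their names and fields are assumptions chosen for the example, not vLLM's actual classes, which carry many more options.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for a few of the configs summarized above.
@dataclass(frozen=True)
class ModelSettings:
    model: str = "my-model"  # hypothetical model identifier
    dtype: str = "auto"


@dataclass(frozen=True)
class CacheSettings:
    block_size: int = 16
    gpu_memory_utilization: float = 0.9


@dataclass(frozen=True)
class ParallelSettings:
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1


@dataclass(frozen=True)
class AllSettings:
    """Hypothetical aggregate: bundles every sub-config into one object."""
    model: ModelSettings = field(default_factory=ModelSettings)
    cache: CacheSettings = field(default_factory=CacheSettings)
    parallel: ParallelSettings = field(default_factory=ParallelSettings)

    def world_size(self) -> int:
        # Derived values can combine fields from several sub-configs.
        return (self.parallel.tensor_parallel_size
                * self.parallel.pipeline_parallel_size)


def build_worker(config: AllSettings) -> str:
    # A component takes the single aggregate instead of separate configs.
    return f"{config.model.model} on {config.world_size()} device(s)"


cfg = AllSettings(parallel=ParallelSettings(tensor_parallel_size=2))
print(build_worker(cfg))  # my-model on 2 device(s)
```

Freezing the dataclasses is a design choice for the sketch: once built, the bundle can be shared across components without any of them mutating it.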
Offline Inference
LLM Class.
An LLM for generating texts from given prompts and sampling parameters.
LLM Inputs.
Set of possible schemas for an LLM input, including both decoder-only and encoder/decoder input types:
Schema for a text prompt.
Schema for a tokenized prompt.
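The text and tokenized schemas can be pictured as small dictionaries. The sketch below uses hypothetical `TypedDict` stand-ins; the key names `prompt` and `prompt_token_ids` are assumed to match vLLM's text and tokenized prompt schemas, while the normalizing helper `to_token_ids` and the toy tokenizer (which simply maps each word to its length) are invented for illustration only.

```python
from typing import Callable, List, TypedDict, Union


# Illustrative copies of the two schemas, not the library's own classes.
class TextPromptSketch(TypedDict):
    prompt: str


class TokensPromptSketch(TypedDict):
    prompt_token_ids: List[int]


PromptSketch = Union[str, TextPromptSketch, TokensPromptSketch]


def to_token_ids(p: PromptSketch, tokenize: Callable[[str], List[int]]) -> List[int]:
    """Normalize any accepted prompt form into a list of token IDs."""
    if isinstance(p, str):
        return tokenize(p)                  # bare string: treat as text
    if "prompt_token_ids" in p:
        return list(p["prompt_token_ids"])  # already tokenized: pass through
    return tokenize(p["prompt"])            # text-prompt dict: tokenize it


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical toy tokenizer: one "ID" per word, equal to the word length.
    return [len(word) for word in text.split()]


print(to_token_ids("hello world", toy_tokenize))                 # [5, 5]
print(to_token_ids({"prompt": "hi there"}, toy_tokenize))        # [2, 5]
print(to_token_ids({"prompt_token_ids": [1, 2, 3]}, toy_tokenize))  # [1, 2, 3]
```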
vLLM Engines
Engine classes for offline and online inference.
An LLM engine that receives requests and generates texts.
An asynchronous wrapper for
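One common way to wrap a synchronous, step-driven engine for asynchronous use is to run its step loop in a background task and route each output to a per-request stream. The sketch below shows that general pattern with a hypothetical `ToyEngine` (one token per request per `step()`) and `ToyAsyncEngine`; it is an assumption about the shape of such a wrapper, not vLLM's implementation.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional


class ToyEngine:
    """Hypothetical synchronous engine: each step() emits one token per request."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[str]] = {}

    def add_request(self, request_id: str, tokens: List[str]) -> None:
        self.pending[request_id] = list(tokens)

    def step(self) -> Dict[str, str]:
        # Advance every active request by one token; drop finished requests.
        out = {}
        for rid in list(self.pending):
            out[rid] = self.pending[rid].pop(0)
            if not self.pending[rid]:
                del self.pending[rid]
        return out


class ToyAsyncEngine:
    """Hypothetical async wrapper: a background loop drives step()."""

    def __init__(self, engine: ToyEngine) -> None:
        self.engine = engine
        self.queues: Dict[str, asyncio.Queue] = {}
        self._loop_task: Optional[asyncio.Task] = None

    async def _run_loop(self) -> None:
        while self.engine.pending:
            for rid, tok in self.engine.step().items():
                await self.queues[rid].put(tok)
                if rid not in self.engine.pending:
                    await self.queues[rid].put(None)  # end-of-stream marker
            await asyncio.sleep(0)  # yield so consumers can read their queues
        self._loop_task = None

    async def generate(self, request_id: str, tokens: List[str]) -> AsyncIterator[str]:
        self.queues[request_id] = asyncio.Queue()
        self.engine.add_request(request_id, tokens)
        if self._loop_task is None:  # start the background loop on demand
            self._loop_task = asyncio.create_task(self._run_loop())
        while (tok := await self.queues[request_id].get()) is not None:
            yield tok
        del self.queues[request_id]


async def main() -> List[List[str]]:
    wrapper = ToyAsyncEngine(ToyEngine())

    async def collect(rid: str, toks: List[str]) -> List[str]:
        return [t async for t in wrapper.generate(rid, toks)]

    # Two requests are served concurrently by the same background loop.
    return await asyncio.gather(collect("a", ["x", "y"]), collect("b", ["p", "q", "r"]))


print(asyncio.run(main()))  # [['x', 'y'], ['p', 'q', 'r']]
```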
Inference Parameters
Inference parameters for vLLM APIs.
Sampling parameters for text generation.
API parameters for pooling models. This is currently a placeholder.
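Sampling parameters control how the next token is drawn from the model's scores. To illustrate two common knobs, temperature and top-p, here is a sketch of standard temperature scaling followed by nucleus (top-p) sampling in plain Python; the function `sample_next_token` is hypothetical and is not how vLLM implements sampling internally.

```python
import math
import random
from typing import List, Optional


def sample_next_token(logits: List[float], temperature: float = 1.0,
                      top_p: float = 1.0,
                      rng: Optional[random.Random] = None) -> int:
    """Temperature scaling followed by nucleus (top-p) sampling."""
    rng = rng or random.Random()
    if temperature == 0.0:
        # Greedy decoding: pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits (subtract the max for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices renormalizes the kept weights before drawing.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_next_token([0.1, 2.0, 0.5], temperature=0.0))  # 1 (greedy)
```

Lower temperatures sharpen the distribution toward the top token, and a small `top_p` truncates the candidate set; with `top_p` close to zero only the most likely token survives.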
Multi-Modality
vLLM provides experimental support for multi-modal models through the vllm.multimodal package.
Multi-modal inputs can be passed alongside text and token prompts to supported models via the multi_modal_data field in vllm.inputs.PromptType.
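A multi-modal request is a prompt dictionary whose `multi_modal_data` entry maps each modality name to its data. The sketch below builds such a dictionary with a hypothetical helper `make_mm_prompt`; the `"image"` key, the `<image>` placeholder text, and the `SUPPORTED_MODALITIES` check are assumptions for illustration (placeholder syntax differs between models), and a plain `object()` stands in for real image data. With the real library, a dictionary of this shape is the kind of prompt one would pass to `LLM.generate`.

```python
from typing import Any, Dict

# Hypothetical set of modalities that a particular model accepts.
SUPPORTED_MODALITIES = {"image", "audio"}


def make_mm_prompt(text: str, mm_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a prompt dict carrying multi-modal data next to the text.

    The modality check is an illustrative assumption, not vLLM's own logic.
    """
    unknown = set(mm_data) - SUPPORTED_MODALITIES
    if unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown)}")
    return {"prompt": text, "multi_modal_data": dict(mm_data)}


# A placeholder object stands in for a decoded image (e.g. a PIL image).
fake_image = object()
p = make_mm_prompt("<image>\nDescribe the picture.", {"image": fake_image})
print(sorted(p))  # ['multi_modal_data', 'prompt']
```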
Looking to add your own multi-modal model? Please follow the instructions listed here.
The global
Inputs
User-facing inputs.
A dictionary containing an entry for each modality type to input.
Internal data structures.
Placeholder location information for multi-modal data.
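Placeholder locations can be described by an offset and a length within the token sequence. The sketch below scans for contiguous runs of a hypothetical placeholder token ID and records each run as an `{"offset", "length"}` pair; the `PlaceholderRangeSketch` type and the `find_placeholders` helper are illustrative assumptions. Note that in this simple scan, two items with no separating token between them would merge into a single range.

```python
from typing import List, TypedDict


class PlaceholderRangeSketch(TypedDict):
    offset: int  # index of the first placeholder token
    length: int  # number of consecutive placeholder tokens


def find_placeholders(token_ids: List[int], placeholder_id: int) -> List[PlaceholderRangeSketch]:
    """Locate each contiguous run of placeholder_id in token_ids."""
    ranges: List[PlaceholderRangeSketch] = []
    i = 0
    while i < len(token_ids):
        if token_ids[i] == placeholder_id:
            start = i
            while i < len(token_ids) and token_ids[i] == placeholder_id:
                i += 1
            ranges.append({"offset": start, "length": i - start})
        else:
            i += 1
    return ranges


IMG = 9  # hypothetical placeholder token ID
print(find_placeholders([1, 9, 9, 9, 5, 9, 9, 2], IMG))
# [{'offset': 1, 'length': 3}, {'offset': 5, 'length': 2}]
```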
Uses a list instead of a tensor if the dimensions of each element do not match.
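The list fallback can be illustrated without a tensor library: items form one rectangular batch only when every shape matches, and otherwise stay a list. The sketch below uses nested Python lists as stand-ins for tensors; `shape_of` and `batch_items` are hypothetical helpers, and with PyTorch the uniform case would call `torch.stack` instead of just reporting a batch shape.

```python
from typing import Any, List, Optional, Tuple


def shape_of(x: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list, standing in for tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def batch_items(items: List[Any]) -> Tuple[List[Any], Optional[Tuple[int, ...]]]:
    """Return the items plus a batch shape if they could be stacked.

    Uniform shapes yield (N, *item_shape); mismatched shapes yield None,
    meaning the items must be kept as a plain list.
    """
    shapes = [shape_of(it) for it in items]
    if items and all(s == shapes[0] for s in shapes):
        return items, (len(items),) + shapes[0]
    return items, None


same = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # two 2x2 items
ragged = [[[1, 2], [3, 4]], [[5, 6, 7]]]     # a 2x2 item and a 1x3 item
print(batch_items(same)[1])    # (2, 2, 2)
print(batch_items(ragged)[1])  # None
```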
Represents a keyword argument corresponding to a multi-modal item in
A collection of
A dictionary that represents the keyword arguments to
Represents the outputs of