Input Processing

Each model can override parts of vLLM’s input processing pipeline via INPUT_REGISTRY and MULTIMODAL_REGISTRY.

Currently, only multi-modal models use this mechanism, to preprocess multi-modal input data in addition to the input prompt. It can be extended to text-only language models when needed.

Guides

Input Processing Pipeline

Module Contents

LLM Engine Inputs

class vllm.inputs.LLMInputs

Bases: TypedDict

The inputs in LLMEngine before they are passed to the model executor.

This specifies the data required for decoder-only models.

multi_modal_data: typing_extensions.NotRequired[MultiModalDataDict | None]

Optional multi-modal data to pass to the model, if the model supports it.

prompt: typing_extensions.NotRequired[str | None]

The original prompt text corresponding to the token IDs, if available.

prompt_token_ids: List[int]

The token IDs of the prompt.
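As a rough illustration of the shape described above, the following sketch builds dictionaries with the same three keys. It does not import vLLM: `LLMInputsSketch` and `_RequiredKeys` are hypothetical stand-ins, and typing `multi_modal_data` as a plain dict is an assumption made only to keep the example self-contained.

```python
from typing import Any, Dict, List, Optional, TypedDict


class _RequiredKeys(TypedDict):
    # The only key that must always be present.
    prompt_token_ids: List[int]


class LLMInputsSketch(_RequiredKeys, total=False):
    # Stand-in for vllm.inputs.LLMInputs. total=False marks these keys as
    # optional, which plays the role of NotRequired in the real definition.
    prompt: Optional[str]
    multi_modal_data: Optional[Dict[str, Any]]


# Token IDs alone are enough for a text-only request.
text_only: LLMInputsSketch = {"prompt_token_ids": [1, 2, 3]}

# The original prompt text can be carried along when it is available.
with_prompt: LLMInputsSketch = {
    "prompt_token_ids": [1, 2, 3],
    "prompt": "Hello world",
}
```

Because only `prompt_token_ids` is required, code that consumes such a dictionary should read the other two keys with `.get()` rather than indexing.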

Registry

vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>

The global InputRegistry, used by LLMEngine to dispatch data processing according to the target model.

See also

Input Processing Pipeline

class vllm.inputs.registry.DummyDataFactory(*args, **kwargs)

Bases: Protocol

class vllm.inputs.registry.InputContext(model_config: ModelConfig)

Contains information about the model which may be used to modify the inputs.

get_hf_config(hf_config_type: Type[C] = PretrainedConfig) → C

Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.

Raises: TypeError – If the configuration is not of the specified type.
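The type check described above can be sketched as follows. This is a minimal illustration rather than vLLM's implementation: `PretrainedConfigStub`, `VisionConfigStub` and `get_config_checked` are hypothetical names standing in for `transformers.PretrainedConfig`, a model-specific subclass, and the method itself.

```python
from typing import Type, TypeVar


class PretrainedConfigStub:
    """Stand-in for transformers.PretrainedConfig."""


class VisionConfigStub(PretrainedConfigStub):
    """Hypothetical model-specific configuration subclass."""


C = TypeVar("C", bound=PretrainedConfigStub)


def get_config_checked(
    hf_config: PretrainedConfigStub,
    hf_config_type: Type[C] = PretrainedConfigStub,
) -> C:
    # Return the config unchanged, but only if it has the expected type.
    if not isinstance(hf_config, hf_config_type):
        raise TypeError(
            f"Expected config of type {hf_config_type.__name__}, "
            f"found {type(hf_config).__name__}"
        )
    return hf_config
```

Passing the expected subclass lets a caller use model-specific attributes afterwards without a separate cast, while a mismatched configuration fails early with a TypeError.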

get_hf_image_processor_config() → Dict[str, Any]

Get the HuggingFace image processor configuration of the model.

model_config: ModelConfig

The configuration of the model.

vllm.inputs.registry.InputProcessor

Preprocess the inputs to the model.

alias of Callable[[InputContext, LLMInputs], LLMInputs]
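A callable matching this alias might look like the sketch below. It uses stand-in types instead of importing vLLM, and the placeholder-expansion logic, `IMAGE_TOKEN_ID`, `NUM_IMAGE_TOKENS` and the `"image"` key are all hypothetical, chosen only to show a processor that takes a context and inputs and returns new inputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class InputContextStub:
    """Stand-in for InputContext; the real one wraps a ModelConfig."""
    model_config: Any


LLMInputsStub = Dict[str, Any]  # stand-in for the LLMInputs TypedDict
InputProcessorStub = Callable[[InputContextStub, LLMInputsStub], LLMInputsStub]

IMAGE_TOKEN_ID = 32000  # hypothetical placeholder token ID
NUM_IMAGE_TOKENS = 4    # hypothetical number of positions per image


def expand_image_placeholder(
    ctx: InputContextStub, inputs: LLMInputsStub
) -> LLMInputsStub:
    # Leave text-only inputs untouched.
    multi_modal_data = inputs.get("multi_modal_data")
    if not multi_modal_data or "image" not in multi_modal_data:
        return inputs

    # Repeat each image placeholder token so the prompt reserves
    # NUM_IMAGE_TOKENS positions for it.
    new_ids: List[int] = []
    for token_id in inputs["prompt_token_ids"]:
        repeat = NUM_IMAGE_TOKENS if token_id == IMAGE_TOKEN_ID else 1
        new_ids.extend([token_id] * repeat)

    # Return a new dict rather than mutating the caller's inputs.
    return {**inputs, "prompt_token_ids": new_ids}


processor: InputProcessorStub = expand_image_placeholder
```

Returning a new dictionary instead of mutating the argument keeps the original inputs available to the caller, which is a design choice of this sketch rather than a documented requirement.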

class vllm.inputs.registry.InputRegistry

A registry to dispatch data processing according to the target model.

create_input_processor(model_config: ModelConfig)

Create an input processor (see process_input()) for a specific model.

dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry) → Tuple[SequenceData, MultiModalDataDict | None]

Create dummy data for profiling the memory usage of a model.

The model is identified by model_config.
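The idea of dispatching by target model can be illustrated with a toy registry. Everything here is an assumption made for the sketch: `ToyInputRegistry`, its `register_*` methods, keying by a model name string, and the fallbacks (an identity processor, and zero-valued token IDs with no multi-modal data) are hypothetical and not taken from vLLM.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Inputs = Dict[str, Any]
Processor = Callable[[Inputs], Inputs]
DummyData = Tuple[List[int], Optional[Dict[str, Any]]]
DummyFactory = Callable[[int], DummyData]


def _default_dummy(seq_len: int) -> DummyData:
    # Text-only fallback: seq_len zero tokens and no multi-modal data.
    return [0] * seq_len, None


class ToyInputRegistry:
    """Toy registry keyed by model name (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}
        self._dummy_factories: Dict[str, DummyFactory] = {}

    def register_processor(self, model_name: str, fn: Processor) -> None:
        self._processors[model_name] = fn

    def register_dummy_factory(self, model_name: str, fn: DummyFactory) -> None:
        self._dummy_factories[model_name] = fn

    def create_input_processor(self, model_name: str) -> Processor:
        # Unregistered models get an identity processor.
        return self._processors.get(model_name, lambda inputs: inputs)

    def dummy_data_for_profiling(self, model_name: str, seq_len: int) -> DummyData:
        # Dispatch to the model's factory, or fall back to the default.
        factory = self._dummy_factories.get(model_name, _default_dummy)
        return factory(seq_len)
```

In this sketch a model that registers nothing still works, which mirrors the statement that the mechanism is optional for text-only models.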