Input Processing

Each model can override parts of vLLM’s input processing pipeline via INPUT_REGISTRY and MULTIMODAL_REGISTRY.

Currently, this mechanism is only used by multi-modal models to preprocess multi-modal input data in addition to the input prompt, but it can be extended to text-only language models when needed.

Guides

Module Contents

LLM Engine Inputs

class vllm.inputs.LLMInputs

Bases: TypedDict

The inputs in LLMEngine before they are passed to the model executor.

This specifies the data required for decoder-only models.

multi_modal_data: typing_extensions.NotRequired[MultiModalDataDict | None]

Optional multi-modal data to pass to the model, if the model supports it.

prompt: typing_extensions.NotRequired[str | None]

The original prompt text corresponding to the token IDs, if available.

prompt_token_ids: List[int]

The token IDs of the prompt.
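The same shape can be reproduced with only the standard library, which makes the required/optional split easy to inspect. This is a minimal sketch, assuming a mixed-totality TypedDict (a total base class plus a `total=False` subclass) as an equivalent of `NotRequired`; the name `LLMInputsSketch` and the token IDs are illustrative, and a plain dict stands in for `MultiModalDataDict`.

```python
from typing import Any, Dict, List, Optional, TypedDict


class _LLMInputsRequired(TypedDict):
    # Required key: every decoder-only input carries token IDs.
    prompt_token_ids: List[int]


class LLMInputsSketch(_LLMInputsRequired, total=False):
    # Optional keys, equivalent to NotRequired[...] on a total TypedDict.
    prompt: Optional[str]
    # Hypothetical stand-in for MultiModalDataDict: modality name -> data.
    multi_modal_data: Optional[Dict[str, Any]]


# Text-only input: only the required key is present (IDs are made up).
text_only: LLMInputsSketch = {"prompt_token_ids": [1, 15043, 3186]}

# Input that keeps the original prompt text and attaches placeholder image data.
with_image: LLMInputsSketch = {
    "prompt_token_ids": [1, 32000, 15043],
    "prompt": "<image> Hello",
    "multi_modal_data": {"image": object()},
}

print(sorted(LLMInputsSketch.__required_keys__))  # ['prompt_token_ids']
print(sorted(LLMInputsSketch.__optional_keys__))  # ['multi_modal_data', 'prompt']
```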

Registry

vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>

The global InputRegistry, which LLMEngine uses to dispatch data processing according to the target model.

class vllm.inputs.registry.DummyDataFactory(*args, **kwargs)

Bases: Protocol

class vllm.inputs.registry.InputContext(model_config: ModelConfig)

Contains information about the model which may be used to modify the inputs.

get_hf_config(hf_config_type: Type[C] = PretrainedConfig) → C

Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.

Raises:

TypeError – If the model is not of the specified type.
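The type check can be pictured with the following sketch. The stub classes stand in for `transformers.PretrainedConfig` and a model-specific subclass, and `InputContextSketch` stores the config directly instead of reading it from a `ModelConfig`; all of these names are hypothetical, and the error message is illustrative rather than vLLM's own.

```python
from typing import Type, TypeVar


class PretrainedConfigStub:
    """Hypothetical stand-in for transformers.PretrainedConfig."""


class LlavaConfigStub(PretrainedConfigStub):
    """Hypothetical model-specific config subclass."""


class OtherConfigStub(PretrainedConfigStub):
    """Another hypothetical config type, used to trigger the error."""


C = TypeVar("C", bound=PretrainedConfigStub)


class InputContextSketch:
    def __init__(self, hf_config: PretrainedConfigStub) -> None:
        # Assumption: the config is passed in directly rather than
        # being derived from a ModelConfig.
        self.hf_config = hf_config

    def get_hf_config(self, hf_config_type: Type[C] = PretrainedConfigStub) -> C:
        hf_config = self.hf_config
        # Reject configs that are not instances of the requested type.
        if not isinstance(hf_config, hf_config_type):
            raise TypeError(
                f"Expected config of type {hf_config_type.__name__}, "
                f"got {type(hf_config).__name__}"
            )
        return hf_config


ctx = InputContextSketch(LlavaConfigStub())
llava_cfg = ctx.get_hf_config(LlavaConfigStub)  # accepted: the types match
try:
    ctx.get_hf_config(OtherConfigStub)
except TypeError as exc:
    print(f"rejected: {exc}")
```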

get_hf_image_processor_config() → Dict[str, Any]

Get the HuggingFace image processor configuration of the model.

model_config: ModelConfig

The configuration of the model.

vllm.inputs.registry.InputProcessor

Preprocess the inputs to the model.

alias of Callable[[InputContext, LLMInputs], LLMInputs]
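To illustrate the `InputProcessor` signature, the hypothetical processor below expands each image placeholder token into a fixed number of copies. Multi-modal models need this kind of prompt rewriting so that the prompt's token count matches the number of image features. The placeholder ID, the expansion count, and `InputContextStub` are assumptions made for the example; a real processor would read such values from the model's HuggingFace config through the `InputContext`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, TypedDict


class LLMInputs(TypedDict, total=False):
    # Simplified stand-in for vllm.inputs.LLMInputs (all keys optional here).
    prompt_token_ids: List[int]
    prompt: Optional[str]
    multi_modal_data: Optional[Dict[str, Any]]


@dataclass
class InputContextStub:
    # Hypothetical stand-in for InputContext holding two assumed settings.
    image_token_id: int
    num_image_tokens: int


def expand_image_tokens(ctx: InputContextStub, inputs: LLMInputs) -> LLMInputs:
    """Replace each image placeholder token with num_image_tokens copies."""
    mm_data = inputs.get("multi_modal_data")
    if not mm_data or "image" not in mm_data:
        # Text-only request: return the inputs unchanged.
        return inputs

    new_token_ids: List[int] = []
    for token_id in inputs["prompt_token_ids"]:
        if token_id == ctx.image_token_id:
            new_token_ids.extend([ctx.image_token_id] * ctx.num_image_tokens)
        else:
            new_token_ids.append(token_id)

    # Build a new dict instead of mutating the caller's inputs.
    return LLMInputs(
        prompt_token_ids=new_token_ids,
        prompt=inputs.get("prompt"),
        multi_modal_data=mm_data,
    )


ctx = InputContextStub(image_token_id=32000, num_image_tokens=3)
out = expand_image_tokens(
    ctx, {"prompt_token_ids": [1, 32000, 42], "multi_modal_data": {"image": object()}}
)
print(out["prompt_token_ids"])  # [1, 32000, 32000, 32000, 42]
```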

class vllm.inputs.registry.InputRegistry

A registry to dispatch data processing according to the target model.

create_input_processor(model_config: ModelConfig)

Create an input processor (see process_input()) for a specific model.

dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry, is_encoder_data: bool = False) → Tuple[SequenceData, MultiModalDataDict | None]

Create dummy data for profiling the memory usage of a model.

The model is identified by model_config.

Note

This should be called after init_mm_limits_per_prompt().

process_input(model_config: ModelConfig, inputs: LLMInputs) → LLMInputs

Apply an input processor to an instance of model inputs.

The model is identified by model_config.

register_dummy_data(factory: DummyDataFactory)

Register a dummy data factory to a model class.

During memory profiling, the provided function is invoked to create dummy data to feed into the model. The resulting memory usage should be an upper bound on what the model uses at inference time.
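A hypothetical dummy data factory might look like the sketch below: it fills the entire `seq_len` budget and attaches one image so that profiling sees the worst case. The sketch simplifies the signature to `(ctx, seq_len)` and returns a plain token list instead of `SequenceData`; the parameters of the actual `DummyDataFactory` protocol are not listed on this page and may differ. The token IDs, the image token count, and the image placeholder are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple

IMAGE_TOKEN_ID = 32000  # hypothetical image placeholder token ID
NUM_IMAGE_TOKENS = 4    # hypothetical number of tokens one image expands to
PAD_TOKEN_ID = 0        # hypothetical filler token ID


def dummy_data_for_image_model(
    ctx: Any, seq_len: int
) -> Tuple[List[int], Optional[Dict[str, Any]]]:
    """Build the largest input profiling should see: seq_len tokens plus one image."""
    if seq_len < NUM_IMAGE_TOKENS:
        raise ValueError("seq_len is too short to fit the image tokens")
    token_ids = [IMAGE_TOKEN_ID] * NUM_IMAGE_TOKENS
    # Pad to exactly seq_len so the profiled memory reflects the longest prompt.
    token_ids += [PAD_TOKEN_ID] * (seq_len - NUM_IMAGE_TOKENS)
    # Placeholder payload; a real factory would construct an actual dummy image.
    mm_data: Dict[str, Any] = {"image": "dummy-image-placeholder"}
    return token_ids, mm_data


token_ids, mm_data = dummy_data_for_image_model(ctx=None, seq_len=10)
print(len(token_ids))  # 10
```

Padding to the full `seq_len`, rather than using a short prompt, is what makes the measured memory a plausible upper bound.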

register_dummy_encoder_data(factory: DummyDataFactory)

Register a dummy encoder data factory to a model class.

This is similar to register_dummy_data(), but for encoder input.

register_input_processor(processor: Callable[[InputContext, LLMInputs], LLMInputs])

Register an input processor to a model class.

The provided function is invoked on each input to the model. This happens before map_input().
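The registration and dispatch described in this section can be modelled with a small, self-contained registry. This is a simplified sketch that assumes three things: processors are stored per model class, `register_input_processor` is applied as a class decorator, and models without a registered processor receive their inputs unchanged. `InputRegistrySketch`, `ModelConfigStub`, `add_bos`, and `ToyModel` are hypothetical names, not vLLM APIs.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, TypeVar

Inputs = Dict[str, Any]                      # stand-in for LLMInputs
Processor = Callable[[Any, Inputs], Inputs]  # (ctx, inputs) -> inputs
M = TypeVar("M", bound=type)


@dataclass
class ModelConfigStub:
    # Assumption: the model class is stored directly on the config.
    model_cls: type


class InputRegistrySketch:
    """Simplified, hypothetical model of per-model input-processor dispatch."""

    def __init__(self) -> None:
        self._processors: Dict[type, Processor] = {}

    @staticmethod
    def _default_processor(ctx: Any, inputs: Inputs) -> Inputs:
        # Models without a registered processor get their inputs unchanged.
        return inputs

    def register_input_processor(self, processor: Processor) -> Callable[[M], M]:
        def wrapper(model_cls: M) -> M:
            self._processors[model_cls] = processor
            return model_cls  # the class itself is returned unmodified
        return wrapper

    def process_input(self, model_config: ModelConfigStub, inputs: Inputs) -> Inputs:
        processor = self._processors.get(model_config.model_cls, self._default_processor)
        ctx = model_config  # stand-in for an InputContext built from the config
        return processor(ctx, inputs)

    def create_input_processor(self, model_config: ModelConfigStub) -> Callable[[Inputs], Inputs]:
        # Bind the model config so callers only need to pass the inputs.
        return functools.partial(self.process_input, model_config)


REGISTRY = InputRegistrySketch()


def add_bos(ctx: Any, inputs: Inputs) -> Inputs:
    # Toy processor: prepend a hypothetical BOS token ID of 1.
    return {**inputs, "prompt_token_ids": [1] + inputs["prompt_token_ids"]}


@REGISTRY.register_input_processor(add_bos)
class ToyModel:
    pass


class PlainModel:
    pass


toy = REGISTRY.create_input_processor(ModelConfigStub(ToyModel))
plain = REGISTRY.create_input_processor(ModelConfigStub(PlainModel))
print(toy({"prompt_token_ids": [5, 6]}))    # {'prompt_token_ids': [1, 5, 6]}
print(plain({"prompt_token_ids": [5, 6]}))  # {'prompt_token_ids': [5, 6]}
```

In this sketch, binding `model_config` with `functools.partial` lets the caller resolve the model once and then pass only the inputs on every call.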