Input Processing

Each model can override parts of vLLM’s input processing pipeline via INPUT_REGISTRY and MULTIMODAL_REGISTRY.

Currently, this mechanism is only utilized in multi-modal models to preprocess multi-modal input data in addition to the input prompt, but it can be extended to text-only language models when needed.
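
For example, a model implementation can attach its own input processor and dummy-data factory by decorating the model class, as the in-tree multi-modal models do. The following is a minimal sketch: the model and helper names are hypothetical, SequenceData.from_prompt_token_counts() is assumed as a convenience constructor, and the expected signatures are documented under InputRegistry below.

```python
from collections.abc import Mapping

import torch.nn as nn

from vllm.inputs import INPUT_REGISTRY
from vllm.inputs.registry import DummyData, InputContext
from vllm.sequence import SequenceData


def my_input_processor(ctx: InputContext, inputs):
    # Inspect or rewrite the inputs before they reach the model.
    return inputs


def my_dummy_data_factory(ctx: InputContext, seq_len: int,
                          mm_counts: Mapping[str, int]) -> DummyData:
    # Produce worst-case inputs for memory profiling.
    return DummyData(SequenceData.from_prompt_token_counts((0, seq_len)))


@INPUT_REGISTRY.register_dummy_data(my_dummy_data_factory)
@INPUT_REGISTRY.register_input_processor(my_input_processor)
class MyModelForCausalLM(nn.Module):
    ...
```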

Module Contents

LLM Engine Inputs

vllm.inputs.DecoderOnlyInputs

alias of Union[TokenInputs, MultiModalInputsV2]
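
For instance, a pre-tokenized text prompt corresponds to the TokenInputs variant. A small sketch, assuming the token_inputs() constructor is exported from vllm.inputs and that the TypedDict's type discriminator is "token":

```python
from vllm.inputs import token_inputs

# Build the TokenInputs variant of DecoderOnlyInputs.
inputs = token_inputs(
    prompt_token_ids=[1, 15043, 3186],  # hypothetical token ids
    prompt="Hello world",
)
assert inputs["type"] == "token"
```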

Registry

vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>

The global InputRegistry, which is used by LLMEngine to dispatch data processing according to the target model.

class vllm.inputs.registry.DummyData(seq_data: SequenceData, multi_modal_data: MultiModalDataDict | None = None, multi_modal_placeholders: MultiModalPlaceholderDict | None = None)

Bases: NamedTuple

Dummy data used for profiling.

multi_modal_data: MultiModalDataDict | None

Alias for field number 1

multi_modal_placeholders: MultiModalPlaceholderDict | None

Alias for field number 2

seq_data: SequenceData

Alias for field number 0
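
A minimal sketch of constructing DummyData for a text-only model, again assuming SequenceData.from_prompt_token_counts() as a convenience constructor:

```python
from vllm.inputs.registry import DummyData
from vllm.sequence import SequenceData

seq_len = 2048  # sequence length to profile at

# seq_len placeholder tokens (id 0); the multi-modal fields keep
# their None defaults for a text-only model.
dummy = DummyData(seq_data=SequenceData.from_prompt_token_counts((0, seq_len)))
assert dummy.multi_modal_data is None
```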

class vllm.inputs.registry.DummyDataFactory(*args, **kwargs)

Bases: Protocol

class vllm.inputs.registry.InputContext(model_config: ModelConfig)

Contains information about the model which may be used to modify the inputs.

get_hf_config(hf_config_type: Type[C] = PretrainedConfig) → C

Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.

Raises:

TypeError – If the model is not of the specified type.
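
For example, a helper for a Llama-family model can narrow the config type before reading architecture-specific fields (a sketch, not taken from the vLLM codebase):

```python
from transformers import LlamaConfig

from vllm.inputs.registry import InputContext


def get_rope_theta(ctx: InputContext) -> float:
    # Raises TypeError if the loaded model is not a Llama-family checkpoint.
    hf_config = ctx.get_hf_config(LlamaConfig)
    return hf_config.rope_theta
```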

get_hf_image_processor_config() → Dict[str, Any]

Get the HuggingFace image processor configuration of the model.

get_mm_config()

Get the multimodal config of the model.

Raises:

RuntimeError – If the model is not a multimodal model.

model_config: ModelConfig

The configuration of the model.

class vllm.inputs.registry.InputProcessingContext(model_config: ModelConfig, tokenizer: transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast | vllm.transformers_utils.tokenizers.mistral.MistralTokenizer)

Bases: InputContext

tokenizer: transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast | MistralTokenizer

The tokenizer used to tokenize the inputs.

vllm.inputs.registry.InputProcessor

Preprocess the inputs to the model.

alias of Callable[[InputContext, Union[TokenInputs, MultiModalInputsV2, EncoderDecoderInputs]], Union[TokenInputs, MultiModalInputsV2, EncoderDecoderInputs]]
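
A sketch of a conforming processor, covering only the token-based variants; the BOS handling and the token_inputs() constructor are assumptions for illustration, not vLLM internals:

```python
from vllm.inputs import DecoderOnlyInputs, token_inputs
from vllm.inputs.registry import InputContext


def prepend_bos_processor(ctx: InputContext,
                          inputs: DecoderOnlyInputs) -> DecoderOnlyInputs:
    # Illustrative only: prepend the BOS token when it is missing.
    bos = ctx.model_config.hf_config.bos_token_id
    token_ids = inputs["prompt_token_ids"]
    if bos is not None and token_ids[:1] != [bos]:
        return token_inputs(prompt_token_ids=[bos] + token_ids,
                            prompt=inputs.get("prompt"))
    return inputs
```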

class vllm.inputs.registry.InputRegistry

A registry to dispatch data processing according to the target model.

create_input_processor(model_config: ModelConfig)

Create an input processor (see _process_input()) for a specific model.

dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry, is_encoder_data: bool = False) → DummyData

Create dummy data for profiling the memory usage of a model.

The model is identified by model_config.

Note

This should be called after init_mm_limits_per_prompt().

process_input(model_config: ModelConfig, inputs: TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs) → TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs

Apply an input processor to an instance of model inputs.

The model is identified by model_config.

register_dummy_data(factory: DummyDataFactory)

Register a dummy data factory to a model class.

During memory profiling, the provided function is invoked to create dummy data to be fed into the model. The resulting memory usage should be an upper bound of what the model would use at inference time.
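
For a multi-modal model, the factory typically fills the sequence with placeholder tokens for the largest supported input. In this sketch the token id, feature size, and image resolution are hypothetical, and the factory signature (ctx, seq_len, mm_counts) follows the in-tree pattern:

```python
from collections.abc import Mapping

from PIL import Image

from vllm.inputs.registry import DummyData, InputContext
from vllm.sequence import SequenceData

IMAGE_TOKEN_ID = 32000     # hypothetical placeholder token id
IMAGE_FEATURE_SIZE = 576   # hypothetical tokens consumed per image


def dummy_data_for_my_model(ctx: InputContext, seq_len: int,
                            mm_counts: Mapping[str, int]) -> DummyData:
    num_images = mm_counts["image"]
    image_tokens = IMAGE_FEATURE_SIZE * num_images
    # Image placeholders first, then pad the rest of the sequence with 0s.
    seq_data = SequenceData.from_prompt_token_counts(
        (IMAGE_TOKEN_ID, image_tokens),
        (0, max(seq_len - image_tokens, 0)),
    )
    # One worst-case image per requested slot.
    mm_data = {"image": [Image.new("RGB", (336, 336))] * num_images}
    return DummyData(seq_data, mm_data)
```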

register_dummy_encoder_data(factory: DummyDataFactory)

Register a dummy encoder data factory to a model class.

This is similar to register_dummy_data(), but for encoder input.

register_input_processor(processor: Callable[[InputContext, TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs], TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs])

Register an input processor to a model class.

The provided function is invoked on each input to the model. This happens before map_input().