Input Processing
Each model can override parts of vLLM's input processing pipeline via INPUT_REGISTRY and MULTIMODAL_REGISTRY.

Currently, this mechanism is only used by multi-modal models to preprocess multi-modal input data in addition to the input prompt, but it can be extended to text-only language models when needed.
Guides

Module Contents

LLM Engine Inputs

Registry
- vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>

The global InputRegistry which is used by LLMEngine to dispatch data processing according to the target model.
- class vllm.inputs.registry.DummyData(seq_data: SequenceData, multi_modal_data: MultiModalDataDict | None = None, multi_modal_placeholders: MultiModalPlaceholderDict | None = None)

Bases: NamedTuple

Dummy data used for profiling.
- class vllm.inputs.registry.InputContext(model_config: ModelConfig)

Contains information about the model which may be used to modify the inputs.
- get_hf_config(hf_config_type: Type[C] = PretrainedConfig) → C

Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.

- Raises:
TypeError – If the model is not of the specified type.
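The type check described above can be sketched in isolation. The classes below are hypothetical stand-ins (transformers is not imported); the helper only illustrates the documented TypeError contract, not vLLM's actual implementation:

```python
from typing import Type, TypeVar

class PretrainedConfig:
    """Stand-in for transformers.PretrainedConfig (assumption, not the real class)."""

class LlavaLikeConfig(PretrainedConfig):
    """Hypothetical model-specific config subclass."""

C = TypeVar("C", bound=PretrainedConfig)

def get_hf_config(config: PretrainedConfig,
                  hf_config_type: Type[C] = PretrainedConfig) -> C:
    # Return the config unchanged, but verify it is of the expected type,
    # mirroring the "Raises: TypeError" behavior documented above.
    if not isinstance(config, hf_config_type):
        raise TypeError(f"Expected {hf_config_type.__name__}, "
                        f"got {type(config).__name__}")
    return config
```

Passing the narrower type retrieves the config with its specific type verified; passing a mismatched instance raises TypeError.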
- get_hf_image_processor_config() → Dict[str, Any]

Get the HuggingFace image processor configuration of the model.
- get_mm_config()

Get the multimodal config of the model.

- Raises:
RuntimeError – If the model is not a multimodal model.
- class vllm.inputs.registry.InputProcessingContext(model_config: ModelConfig, tokenizer: transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast | vllm.transformers_utils.tokenizers.mistral.MistralTokenizer)

Bases: InputContext
- vllm.inputs.registry.InputProcessor

Preprocess the inputs to the model.

alias of Callable[[InputContext, Union[TokenInputs, MultiModalInputsV2, EncoderDecoderInputs]], Union[TokenInputs, MultiModalInputsV2, EncoderDecoderInputs]]
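Concretely, any function with this two-argument shape qualifies. A minimal sketch, using plain dictionaries as hypothetical stand-ins for vLLM's InputContext and TokenInputs types:

```python
from typing import Any, Callable, Dict

# Hypothetical aliases; vLLM's real InputContext / TokenInputs are richer.
InputContext = Dict[str, Any]
ModelInputs = Dict[str, Any]
InputProcessor = Callable[[InputContext, ModelInputs], ModelInputs]

def truncating_processor(ctx: InputContext, inputs: ModelInputs) -> ModelInputs:
    # Example behavior: cap prompt_token_ids at a context-provided limit,
    # returning inputs of the same shape as it received.
    limit = ctx.get("max_prompt_len", 8)
    out = dict(inputs)
    out["prompt_token_ids"] = inputs["prompt_token_ids"][:limit]
    return out

processor: InputProcessor = truncating_processor  # conforms to the alias
```

The key property is that the processor takes a context plus one inputs object and returns an inputs object of the same union of types.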
- class vllm.inputs.registry.InputRegistry

A registry to dispatch data processing according to the target model.
- create_input_processor(model_config: ModelConfig)

Create an input processor (see _process_input()) for a specific model.
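In effect this specializes the registry's generic dispatch for one model, much like binding the first argument of a function. A self-contained sketch with functools.partial and a made-up dispatch rule (the model name and prefix are illustrative assumptions):

```python
from functools import partial
from typing import Any, Dict

def process_input(model_name: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical per-model dispatch: one model gets a prompt prefix,
    # all others pass through untouched.
    if model_name == "vision-model":
        return {**inputs, "prompt": "<image> " + inputs["prompt"]}
    return inputs

# create_input_processor-style specialization: the model is bound up front,
# so callers only supply the inputs.
vision_processor = partial(process_input, "vision-model")
```

Binding the model once lets the engine hand out a plain one-argument callable instead of re-resolving the model on every request.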
- dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry, is_encoder_data: bool = False) → DummyData

Create dummy data for profiling the memory usage of a model.

The model is identified by model_config.

Note

This should be called after init_mm_limits_per_prompt().
- process_input(model_config: ModelConfig, inputs: TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs) → TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs

Apply an input processor to an instance of model inputs.

The model is identified by model_config.
- register_dummy_data(factory: DummyDataFactory)

Register a dummy data factory to a model class.

During memory profiling, the provided function is invoked to create dummy data to be fed into the model. The resulting memory usage should be an upper bound of what the model would use at inference time.
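The upper-bound requirement usually means filling the entire sequence. A minimal, self-contained sketch of such a factory (DummySeq and the pad token are illustrative assumptions, not vLLM's types):

```python
from typing import List, NamedTuple

class DummySeq(NamedTuple):
    """Stand-in for vLLM's DummyData / SequenceData."""
    token_ids: List[int]

def dummy_data_factory(seq_len: int, pad_token_id: int = 0) -> DummySeq:
    # Fill every position so profiling observes the worst-case (longest)
    # input the model can receive, making the measured memory usage an
    # upper bound on inference-time usage.
    return DummySeq(token_ids=[pad_token_id] * seq_len)
```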
- register_dummy_encoder_data(factory: DummyDataFactory)

Register a dummy encoder data factory to a model class.

This is similar to register_dummy_data(), but for encoder input.
- register_input_processor(processor: Callable[[InputContext, TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs], TokenInputs | MultiModalInputsV2 | EncoderDecoderInputs])

Register an input processor to a model class.

The provided function is invoked on each input to the model. This happens before map_input().