Input Processing#
Each model can override parts of vLLM’s input processing pipeline via INPUT_REGISTRY and MULTIMODAL_REGISTRY.
Currently, this mechanism is used only by multi-modal models, which preprocess multi-modal input data in addition to the input prompt, but it can be extended to text-only language models when needed.
Guides#
Module Contents#
LLM Engine Inputs#
- class vllm.inputs.LLMInputs
Bases: TypedDict
The inputs in LLMEngine before they are passed to the model executor. This specifies the data required for decoder-only models.
- multi_modal_data: typing_extensions.NotRequired[MultiModalDataDict | None]
Optional multi-modal data to pass to the model, if the model supports it.
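As a rough illustration of how an optional key such as multi_modal_data behaves in a TypedDict, the sketch below defines a stand-in type. SketchLLMInputs and the prompt_token_ids field are assumptions made for this example (only multi_modal_data is documented above), and object stands in for MultiModalDataDict. The sketch uses total=False to mark the key as not required, which expresses the same optionality as NotRequired using only the standard library.

```python
from typing import List, Optional, TypedDict


class _RequiredKeys(TypedDict):
    # Assumed field for illustration: the tokenized prompt.
    prompt_token_ids: List[int]


class SketchLLMInputs(_RequiredKeys, total=False):
    """Stand-in for LLMInputs; this is not the vLLM class."""

    # May be omitted entirely, or present but set to None.
    multi_modal_data: Optional[object]


def has_multi_modal_data(inputs: SketchLLMInputs) -> bool:
    """Return True only when the key is present and not None."""
    return inputs.get("multi_modal_data") is not None


text_only: SketchLLMInputs = {"prompt_token_ids": [1, 2, 3]}
with_image: SketchLLMInputs = {
    "prompt_token_ids": [1, 2, 3],
    "multi_modal_data": {"image": "<image object>"},
}
```

Code that consumes such a dictionary has to handle both the missing key and an explicit None, which is why the helper uses .get() rather than indexing.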
Registry#
- vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>
The global InputRegistry, which is used by LLMEngine to dispatch data processing according to the target model.
- class vllm.inputs.registry.InputContext(model_config: ModelConfig)
Contains information about the model which may be used to modify the inputs.
- get_hf_config(hf_config_type: Type[C] = PretrainedConfig) -> C
Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.
- Raises:
TypeError – If the model's configuration is not of the specified type.
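The type-checked lookup described above can be sketched with a bound TypeVar. This is a minimal stand-in, not vLLM's implementation: BaseConfig plays the role of transformers.PretrainedConfig, and VisionLanguageConfig, SketchInputContext, and the image_token_index attribute are hypothetical names chosen for the example.

```python
from typing import Type, TypeVar


class BaseConfig:
    """Stand-in for transformers.PretrainedConfig."""


class VisionLanguageConfig(BaseConfig):
    """Hypothetical config subclass for a multi-modal model."""

    def __init__(self, image_token_index: int) -> None:
        self.image_token_index = image_token_index


C = TypeVar("C", bound=BaseConfig)


class SketchInputContext:
    """Stand-in for InputContext that stores a config directly."""

    def __init__(self, hf_config: BaseConfig) -> None:
        self._hf_config = hf_config

    def get_hf_config(self, hf_config_type: Type[C] = BaseConfig) -> C:
        # Return the stored config only if it has the requested type.
        if not isinstance(self._hf_config, hf_config_type):
            raise TypeError(
                f"Expected {hf_config_type.__name__}, "
                f"found {type(self._hf_config).__name__}"
            )
        return self._hf_config


ctx = SketchInputContext(VisionLanguageConfig(image_token_index=32000))
config = ctx.get_hf_config(VisionLanguageConfig)
```

Requesting a specific subclass lets the caller read model-specific attributes without further casts, while a mismatch fails early with a TypeError.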
- get_hf_image_processor_config() -> Dict[str, Any]
Get the HuggingFace image processor configuration of the model.
- model_config: ModelConfig
The configuration of the model.
- vllm.inputs.registry.InputProcessor
Preprocess the inputs to the model.
alias of Callable[[InputContext, LLMInputs], LLMInputs]
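To make the shape of this alias concrete, here is a sketch of a function with a matching two-argument signature. Plain dictionaries stand in for InputContext and LLMInputs, and the placeholder-expansion logic, the image_token_id and image_feature_size keys, and the prompt_token_ids field are all assumptions made for illustration, not behaviour documented above.

```python
from typing import Any, Callable, Dict, List

# Simplified stand-ins: the real alias uses InputContext and LLMInputs.
SketchContext = Dict[str, Any]
SketchInputs = Dict[str, Any]
SketchInputProcessor = Callable[[SketchContext, SketchInputs], SketchInputs]


def expand_image_placeholder(ctx: SketchContext, inputs: SketchInputs) -> SketchInputs:
    """Repeat the image placeholder token once per image feature."""
    if inputs.get("multi_modal_data") is None:
        return inputs  # text-only input: nothing to do

    image_token = ctx["image_token_id"]
    repeat = ctx["image_feature_size"]
    expanded: List[int] = []
    for token in inputs["prompt_token_ids"]:
        expanded.extend([token] * repeat if token == image_token else [token])

    # Return a new mapping rather than mutating the caller's inputs.
    return {**inputs, "prompt_token_ids": expanded}


processor: SketchInputProcessor = expand_image_placeholder
sketch_ctx = {"image_token_id": 99, "image_feature_size": 3}
result = processor(
    sketch_ctx,
    {"prompt_token_ids": [1, 99, 2], "multi_modal_data": {"image": "..."}},
)
```

The processor receives the inputs and returns inputs of the same type, so several such functions could in principle be composed.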
- class vllm.inputs.registry.InputRegistry
A registry to dispatch data processing according to the target model.
- create_input_processor(model_config: ModelConfig)
Create an input processor (see _process_input()) for a specific model.
- dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry, is_encoder_data: bool = False) -> Tuple[SequenceData, MultiModalDataDict | None]
Create dummy data for profiling the memory usage of a model.
The model is identified by model_config.
Note
This should be called after init_mm_limits_per_prompt().
- process_input(model_config: ModelConfig, inputs: LLMInputs) -> LLMInputs
Apply an input processor to an instance of model inputs.
The model is identified by model_config.
- register_dummy_data(factory: DummyDataFactory)
Register a dummy data factory to a model class.
During memory profiling, the provided function is invoked to create dummy data to feed into the model. The resulting memory usage should be an upper bound on what the model would use at inference time.
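The upper-bound requirement can be illustrated with a small worst-case builder. The signature of DummyDataFactory is not shown in this section, so the function below uses its own hypothetical parameters (image_token_id, tokens_per_image, max_images) and returns plain Python values in place of SequenceData and MultiModalDataDict.

```python
from typing import Dict, List, Optional, Tuple


def make_dummy_data(
    seq_len: int,
    image_token_id: int,
    tokens_per_image: int,
    max_images: int,
) -> Tuple[List[int], Optional[Dict[str, List[str]]]]:
    """Build worst-case inputs: a full-length sequence with the most images."""
    num_image_tokens = tokens_per_image * max_images
    if num_image_tokens > seq_len:
        raise ValueError("seq_len is too short to hold all image placeholders")

    # Image placeholders first, then padding (token 0) up to seq_len.
    token_ids = [image_token_id] * num_image_tokens
    token_ids += [0] * (seq_len - num_image_tokens)

    # One dummy item per allowed image; None when the limit is zero.
    mm_data = {"image": ["<dummy image>"] * max_images} if max_images else None
    return token_ids, mm_data


tokens, mm = make_dummy_data(
    seq_len=8, image_token_id=99, tokens_per_image=3, max_images=2
)
```

Filling the sequence to its maximum length with the maximum number of multi-modal items is what makes the measured memory usage a plausible upper bound for real requests.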
- register_dummy_encoder_data(factory: DummyDataFactory)
Register a dummy encoder data factory to a model class.
This is similar to register_dummy_data(), but for encoder input.
- register_input_processor(processor: Callable[[InputContext, LLMInputs], LLMInputs])
Register an input processor to a model class.
The provided function is invoked on each input to the model. This happens before map_input().
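The register-then-dispatch pattern described for InputRegistry can be sketched as a small class. This is a simplified stand-in, not vLLM's code: it assumes the registration method is used as a class decorator (which the single-argument signature suggests), it looks processors up by model class rather than by ModelConfig, and SketchRegistry, ToyModel, PlainModel, and add_bos are hypothetical names.

```python
from typing import Any, Callable, Dict

Context = Dict[str, Any]
Inputs = Dict[str, Any]
Processor = Callable[[Context, Inputs], Inputs]


class SketchRegistry:
    """Maps model classes to input processors."""

    def __init__(self) -> None:
        self._processors: Dict[type, Processor] = {}

    def register_input_processor(self, processor: Processor) -> Callable[[type], type]:
        # Returns a class decorator that records the processor for that class.
        def wrapper(model_cls: type) -> type:
            self._processors[model_cls] = processor
            return model_cls

        return wrapper

    def process_input(self, model_cls: type, ctx: Context, inputs: Inputs) -> Inputs:
        # Fall back to an identity processor for unregistered models.
        processor = self._processors.get(model_cls, lambda _ctx, x: x)
        return processor(ctx, inputs)


REGISTRY = SketchRegistry()


def add_bos(ctx: Context, inputs: Inputs) -> Inputs:
    """Prepend a beginning-of-sequence token to the prompt."""
    tokens = [ctx["bos_token_id"], *inputs["prompt_token_ids"]]
    return {**inputs, "prompt_token_ids": tokens}


@REGISTRY.register_input_processor(add_bos)
class ToyModel:
    pass


class PlainModel:
    pass


raw = {"prompt_token_ids": [5, 6]}
processed = REGISTRY.process_input(ToyModel, {"bos_token_id": 1}, raw)
untouched = REGISTRY.process_input(PlainModel, {"bos_token_id": 1}, raw)
```

Keeping the processor next to the model class through a decorator means the engine only needs to know which model it is running in order to pick the right preprocessing step.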