Input Processing
Each model can override parts of vLLM’s input processing pipeline via
INPUT_REGISTRY and MULTIMODAL_REGISTRY.
Currently, this mechanism is only used by multi-modal models to preprocess multi-modal input data in addition to the input prompt, but it can be extended to text-only language models when needed.
Guides
Module Contents
LLM Engine Inputs
- vllm.inputs.DecoderOnlyInputs
Alias of TokenInputs.
Registry
- vllm.inputs.INPUT_REGISTRY = <vllm.inputs.registry.InputRegistry object>
The global InputRegistry which is used by LLMEngine to dispatch data processing according to the target model.
- class vllm.inputs.registry.InputContext(model_config: ModelConfig)
Contains information about the model which may be used to modify the inputs.
- get_hf_config(hf_config_type: Type[C] = PretrainedConfig) -> C
Get the HuggingFace configuration (transformers.PretrainedConfig) of the model, additionally checking its type.
Raises: TypeError if the configuration is not an instance of hf_config_type.
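The type check performed by get_hf_config() can be illustrated with a minimal sketch. PretrainedConfig, LlavaConfig, WhisperConfig and ToyInputContext below are hypothetical stand-ins defined locally (not imports from transformers or vLLM), and the context holds the HuggingFace config directly rather than deriving it from a ModelConfig.

```python
from dataclasses import dataclass
from typing import Type, TypeVar


# Hypothetical stand-ins for transformers.PretrainedConfig and two subclasses.
class PretrainedConfig:
    pass


class LlavaConfig(PretrainedConfig):
    pass


class WhisperConfig(PretrainedConfig):
    pass


C = TypeVar("C", bound=PretrainedConfig)


@dataclass
class ToyInputContext:
    # Simplification: hold the HF config directly instead of a ModelConfig.
    hf_config: PretrainedConfig

    def get_hf_config(self, hf_config_type: Type[C] = PretrainedConfig) -> C:
        # Reject configs that are not instances of the requested class.
        if not isinstance(self.hf_config, hf_config_type):
            raise TypeError(
                f"expected {hf_config_type.__name__}, "
                f"got {type(self.hf_config).__name__}"
            )
        return self.hf_config


ctx = ToyInputContext(hf_config=LlavaConfig())
llava_config = ctx.get_hf_config(LlavaConfig)  # passes the check
try:
    ctx.get_hf_config(WhisperConfig)
except TypeError as err:
    print(err)  # expected WhisperConfig, got LlavaConfig
```

Because the default is the base class, calling get_hf_config() with no argument accepts any configuration; passing a subclass lets a model-specific processor fail early if it is attached to the wrong model.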
- get_hf_image_processor_config() -> Dict[str, Any]
Get the HuggingFace image processor configuration of the model.
- model_config: ModelConfig
The configuration of the model.
- vllm.inputs.registry.InputProcessor
Preprocess the inputs to the model.
Alias of Callable[[InputContext, TokenInputs], TokenInputs].
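To make the InputProcessor signature concrete, here is a sketch of a function with that shape. TokenInputs is reduced to a hypothetical TypedDict with only prompt_token_ids, InputContext is a stand-in carrying a made-up num_image_tokens field, and IMAGE_PLACEHOLDER_ID is an arbitrary value; expanding an image placeholder into several tokens is just one illustrative kind of multi-modal preprocessing.

```python
from dataclasses import dataclass
from typing import Callable, List, TypedDict


class TokenInputs(TypedDict):
    # Hypothetical, simplified stand-in for vllm.inputs.TokenInputs.
    prompt_token_ids: List[int]


@dataclass
class InputContext:
    # Hypothetical stand-in: the documented class wraps a ModelConfig.
    num_image_tokens: int


# Same shape as the documented alias.
InputProcessor = Callable[[InputContext, TokenInputs], TokenInputs]

IMAGE_PLACEHOLDER_ID = 32000  # hypothetical placeholder token ID


def expand_image_placeholder(ctx: InputContext, inputs: TokenInputs) -> TokenInputs:
    """Replace each image placeholder with ctx.num_image_tokens copies of it."""
    new_ids: List[int] = []
    for tok in inputs["prompt_token_ids"]:
        if tok == IMAGE_PLACEHOLDER_ID:
            new_ids.extend([IMAGE_PLACEHOLDER_ID] * ctx.num_image_tokens)
        else:
            new_ids.append(tok)
    # Build a new dict instead of mutating the caller's inputs.
    return TokenInputs(prompt_token_ids=new_ids)


processor: InputProcessor = expand_image_placeholder
out = processor(InputContext(num_image_tokens=3), TokenInputs(prompt_token_ids=[1, 32000, 5]))
print(out["prompt_token_ids"])  # [1, 32000, 32000, 32000, 5]
```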
- class vllm.inputs.registry.InputRegistry
A registry to dispatch data processing according to the target model.
- create_input_processor(model_config: ModelConfig)
Create an input processor (see _process_input()) for a specific model.
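One plausible way for create_input_processor() to tie a processor to a single model is to bind model_config with functools.partial, so later calls only supply the inputs. The sketch below assumes that approach; ModelConfig, ToyInputRegistry, register() and the identity fallback are hypothetical simplifications, with token-ID lists standing in for TokenInputs.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Dict, List

TokenIds = List[int]  # simplified stand-in for TokenInputs


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical stand-in: only the architecture name is kept.
    architecture: str


ProcessorFn = Callable[[ModelConfig, TokenIds], TokenIds]


class ToyInputRegistry:
    def __init__(self) -> None:
        self._processors: Dict[str, ProcessorFn] = {}

    def register(self, architecture: str, processor: ProcessorFn) -> None:
        self._processors[architecture] = processor

    def process_input(self, model_config: ModelConfig, inputs: TokenIds) -> TokenIds:
        # Assumed default: inputs pass through unchanged for unregistered models.
        processor = self._processors.get(model_config.architecture, lambda cfg, ids: ids)
        return processor(model_config, inputs)

    def create_input_processor(self, model_config: ModelConfig) -> Callable[[TokenIds], TokenIds]:
        # Bind model_config once; the returned callable only needs the inputs.
        return functools.partial(self.process_input, model_config)


registry = ToyInputRegistry()
registry.register("ToyLlava", lambda cfg, ids: [0] + ids)  # hypothetical processor
process = registry.create_input_processor(ModelConfig("ToyLlava"))
print(process([5, 6]))  # [0, 5, 6]
```

In this sketch the registry lookup still happens on every call; binding only saves callers from passing the model configuration repeatedly.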
- dummy_data_for_profiling(model_config: ModelConfig, seq_len: int, mm_registry: MultiModalRegistry, is_encoder_data: bool = False) -> Tuple[SequenceData, MultiModalDataDict | None]
Create dummy data for profiling the memory usage of a model.
The model is identified by model_config.
Note: This should be called after init_mm_limits_per_prompt().
- process_input(model_config: ModelConfig, inputs: TokenInputs) -> TokenInputs
Apply an input processor to an instance of model inputs.
The model is identified by model_config.
- register_dummy_data(factory: DummyDataFactory)
Register a dummy data factory to a model class.
During memory profiling, the provided function is invoked to create dummy data to be fed into the model. The resulting memory usage should be an upper bound of what the model would use at inference time.
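The dummy-data mechanism can be sketched as a class decorator that stores a factory per model class, plus a lookup used at profiling time. This is a hypothetical simplification: the factory here takes only seq_len, token-ID lists stand in for SequenceData, a plain dict stands in for MultiModalDataDict, and the fallback of seq_len zero tokens is an assumption. The example factory places one full image's worth of placeholder tokens (the assumed worst case for this toy model) before padding to seq_len, so profiling sees the largest input the model should expect.

```python
from typing import Callable, Dict, List, Optional, Tuple

DummyData = Tuple[List[int], Optional[dict]]  # (token IDs, multi-modal data)
DummyDataFactory = Callable[[int], DummyData]  # simplified: takes seq_len only


class ToyRegistry:
    def __init__(self) -> None:
        self._factories: Dict[type, DummyDataFactory] = {}

    def register_dummy_data(self, factory: DummyDataFactory):
        # Class decorator: remember `factory` for the decorated model class.
        def wrapper(model_cls: type) -> type:
            self._factories[model_cls] = factory
            return model_cls

        return wrapper

    def dummy_data_for_profiling(self, model_cls: type, seq_len: int) -> DummyData:
        # Assumed fallback: seq_len copies of token ID 0 and no multi-modal data.
        factory = self._factories.get(model_cls, lambda n: ([0] * n, None))
        return factory(seq_len)


REGISTRY = ToyRegistry()
IMAGE_TOKEN_ID, TOKENS_PER_IMAGE = 32000, 4  # hypothetical values


def dummy_data_for_toy_vlm(seq_len: int) -> DummyData:
    # Worst case for this toy model: one full image, then padding.
    image_tokens = [IMAGE_TOKEN_ID] * TOKENS_PER_IMAGE
    padding = [0] * (seq_len - TOKENS_PER_IMAGE)
    return image_tokens + padding, {"image": "<dummy image>"}


@REGISTRY.register_dummy_data(dummy_data_for_toy_vlm)
class ToyVLM:
    pass


class ToyTextModel:
    pass


seq, mm = REGISTRY.dummy_data_for_profiling(ToyVLM, seq_len=8)
print(seq, mm)  # [32000, 32000, 32000, 32000, 0, 0, 0, 0] {'image': '<dummy image>'}
```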
- register_dummy_encoder_data(factory: DummyDataFactory)
Register a dummy encoder data factory to a model class.
This is similar to register_dummy_data(), but for encoder input.
- register_input_processor(processor: Callable[[InputContext, TokenInputs], TokenInputs])
Register an input processor to a model class.
The provided function is invoked on each input to the model. This happens before map_input().
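Registration can likewise be sketched as a decorator applied to the model class, with process_input() falling back to the unchanged inputs for models that registered nothing. This sketch is keyed directly by model class instead of resolving the class from a ModelConfig, and ToyInputRegistry, prepend_bos and the identity default are hypothetical.

```python
from typing import Callable, Dict, List, TypedDict


class TokenInputs(TypedDict):
    prompt_token_ids: List[int]  # simplified stand-in for vllm.inputs.TokenInputs


class InputContext:
    """Hypothetical stand-in; the documented class wraps a ModelConfig."""


InputProcessor = Callable[[InputContext, TokenInputs], TokenInputs]


class ToyInputRegistry:
    def __init__(self) -> None:
        self._processors: Dict[type, InputProcessor] = {}

    def register_input_processor(self, processor: InputProcessor):
        # Returns a class decorator that records `processor` for the decorated model class.
        def wrapper(model_cls: type) -> type:
            self._processors[model_cls] = processor
            return model_cls

        return wrapper

    def process_input(self, model_cls: type, inputs: TokenInputs) -> TokenInputs:
        # Models without a registered processor get their inputs back unchanged.
        processor = self._processors.get(model_cls, lambda ctx, x: x)
        return processor(InputContext(), inputs)


toy_registry = ToyInputRegistry()


def prepend_bos(ctx: InputContext, inputs: TokenInputs) -> TokenInputs:
    # Hypothetical processor: prepend token ID 1 as a BOS token.
    return TokenInputs(prompt_token_ids=[1] + inputs["prompt_token_ids"])


@toy_registry.register_input_processor(prepend_bos)
class ToyModel:
    pass


class PlainModel:
    pass


print(toy_registry.process_input(ToyModel, TokenInputs(prompt_token_ids=[5, 6])))
# {'prompt_token_ids': [1, 5, 6]}
```

Returning model_cls from the wrapper leaves the decorated class itself untouched, so this style of registration can be stacked with other class decorators.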