Registry

Module Contents

class vllm.multimodal.registry.ProcessingInfoFactory(*args, **kwargs)

Constructs a BaseProcessingInfo instance from the context.

class vllm.multimodal.registry.DummyInputsBuilderFactory(*args, **kwargs)

Constructs a BaseDummyInputsBuilder instance from the context.

class vllm.multimodal.registry.MultiModalProcessorFactory(*args, **kwargs)

Constructs a BaseMultiModalProcessor instance from the context.

class vllm.multimodal.registry.MultiModalRegistry

A registry that dispatches data processing according to the model.

get_max_tokens_per_item_by_modality(model_config: ModelConfig) -> Mapping[str, int]

Get the maximum number of tokens per data item for each modality, based on the underlying model configuration.

get_max_tokens_per_item_by_nonzero_modality(model_config: ModelConfig) -> Mapping[str, int]

Get the maximum number of tokens per data item for each modality, based on the underlying model configuration, excluding modalities that the user explicitly disabled via limit_mm_per_prompt.

Note

Currently, this is used directly only in V1, for profiling the memory usage of a model.
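The filtering described above can be sketched in plain Python. This is a minimal illustration, not vLLM's implementation: `drop_disabled_modalities` is a hypothetical helper, the token counts are made-up values, and it assumes that a modality is "disabled" exactly when its `limit_mm_per_prompt` entry is 0 (a modality with no entry is treated as enabled).

```python
from typing import Dict, Mapping


def drop_disabled_modalities(
    max_tokens_per_item: Mapping[str, int],
    limit_mm_per_prompt: Mapping[str, int],
) -> Dict[str, int]:
    """Keep only modalities whose per-prompt limit is not explicitly 0.

    Hypothetical helper: a modality missing from limit_mm_per_prompt is
    assumed to be enabled.
    """
    return {
        modality: max_tokens
        for modality, max_tokens in max_tokens_per_item.items()
        if limit_mm_per_prompt.get(modality) != 0
    }


# Made-up per-item token counts, for illustration only.
per_item = {"image": 576, "video": 4608, "audio": 188}
limits = {"image": 2, "video": 0}  # video explicitly disabled

kept = drop_disabled_modalities(per_item, limits)
print(kept)  # {'image': 576, 'audio': 188}
```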

get_max_tokens_by_modality(model_config: ModelConfig) -> Mapping[str, int]

Get the maximum number of tokens from each modality for profiling the memory usage of a model.

See MultiModalPlugin.get_max_multimodal_tokens() for more details.

get_max_multimodal_tokens(model_config: ModelConfig) -> int

Get the maximum number of multi-modal tokens for profiling the memory usage of a model.

See MultiModalPlugin.get_max_multimodal_tokens() for more details.
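The relationship between the per-item, per-modality, and total token counts can be sketched as follows. This is an assumption-laden illustration rather than the library's code: it assumes that the per-modality budget is the per-item maximum multiplied by the per-prompt limit, and that the overall figure is the sum across modalities. The helper names and the numbers are hypothetical.

```python
from typing import Dict, Mapping


def max_tokens_by_modality(
    max_tokens_per_item: Mapping[str, int],
    limits_per_prompt: Mapping[str, int],
) -> Dict[str, int]:
    """Assumed rule: budget = (max tokens per item) * (items allowed per prompt)."""
    return {
        modality: max_tokens_per_item[modality] * limits_per_prompt[modality]
        for modality in max_tokens_per_item
    }


def max_multimodal_tokens(tokens_by_modality: Mapping[str, int]) -> int:
    """Assumed rule: the total is the sum over all modalities."""
    return sum(tokens_by_modality.values())


# Made-up values, for illustration only.
per_item = {"image": 576, "audio": 188}
limits = {"image": 2, "audio": 1}

by_modality = max_tokens_by_modality(per_item, limits)
total = max_multimodal_tokens(by_modality)
print(by_modality)  # {'image': 1152, 'audio': 188}
print(total)        # 1340
```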

get_mm_limits_per_prompt(model_config: ModelConfig) -> Mapping[str, int]

Get the maximum number of multi-modal input instances for each modality that are allowed per prompt for a model class.

register_processor(processor: MultiModalProcessorFactory[_I], *, info: ProcessingInfoFactory[_I], dummy_inputs: DummyInputsBuilderFactory[_I])

Register a multi-modal processor to a model class. The processor is constructed lazily, hence a factory method should be passed.

When the model receives multi-modal data, the provided function is invoked to transform the data into a dictionary of model inputs.
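The lazy-construction pattern described above can be sketched with a toy registry. Everything here is a hypothetical stand-in (`ToyRegistry`, `ToyInfo`, `ToyDummyInputs`, `ToyProcessor`, `ToyModel`); none of it is a vLLM class. The sketch also assumes decorator-style registration on the model class, and that the info object is built first and then passed to the other two factories.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToyInfo:
    model_name: str


@dataclass
class ToyDummyInputs:
    info: ToyInfo


@dataclass
class ToyProcessor:
    info: ToyInfo
    dummy_inputs: ToyDummyInputs


@dataclass(frozen=True)
class _Factories:
    # Callables are stored, not instances, so nothing is built at registration.
    processor: Callable[[ToyInfo, ToyDummyInputs], ToyProcessor]
    info: Callable[[str], ToyInfo]
    dummy_inputs: Callable[[ToyInfo], ToyDummyInputs]


class ToyRegistry:
    def __init__(self) -> None:
        self._factories: Dict[type, _Factories] = {}

    def register_processor(self, processor, *, info, dummy_inputs):
        """Return a class decorator that records the three factories."""
        def wrapper(model_cls: type) -> type:
            self._factories[model_cls] = _Factories(processor, info, dummy_inputs)
            return model_cls
        return wrapper

    def create_processor(self, model_cls: type, model_name: str) -> ToyProcessor:
        """Build the processor on demand from the registered factories."""
        factories = self._factories[model_cls]
        info = factories.info(model_name)
        dummy_inputs = factories.dummy_inputs(info)
        return factories.processor(info, dummy_inputs)


REGISTRY = ToyRegistry()


@REGISTRY.register_processor(ToyProcessor, info=ToyInfo, dummy_inputs=ToyDummyInputs)
class ToyModel:
    pass


processor = REGISTRY.create_processor(ToyModel, "toy-model")
print(processor.info.model_name)  # toy-model
```

Passing classes as factories works here because a class is itself a callable that returns an instance; any other callable with the same parameters would serve equally well.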

create_processor(model_config: ModelConfig, *, tokenizer: transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast | TokenizerBase | None = None, disable_cache: bool | None = None) -> BaseMultiModalProcessor[BaseProcessingInfo]

Create a multi-modal processor for a specific model and tokenizer.

get_decoder_dummy_data(model_config: ModelConfig, seq_len: int, mm_counts: Mapping[str, int] | None = None) -> DummyDecoderData

Create dummy decoder data for profiling the memory usage of a model.

The model is identified by model_config.

get_encoder_dummy_data(model_config: ModelConfig, seq_len: int, mm_counts: Mapping[str, int] | None = None) -> DummyEncoderData

Create dummy encoder data for profiling the memory usage of a model.

The model is identified by model_config.
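Both dummy-data methods above take a `seq_len` and an optional `mm_counts` mapping. The reference does not describe what the returned objects contain, so the following is purely an illustrative sketch of one way profiling input could be sized: reserve placeholder tokens for each multi-modal item, then pad to `seq_len`. The function name, the token IDs, and the fallback to `default_counts` when `mm_counts` is `None` are all assumptions.

```python
from typing import List, Mapping, Optional

PLACEHOLDER_ID = -1  # hypothetical ID marking a multi-modal token slot
PAD_ID = 0           # hypothetical padding token ID


def build_dummy_token_ids(
    seq_len: int,
    tokens_per_item: Mapping[str, int],
    mm_counts: Optional[Mapping[str, int]] = None,
    default_counts: Optional[Mapping[str, int]] = None,
) -> List[int]:
    """Return seq_len token IDs: placeholder slots first, then padding."""
    # Assumed behavior: fall back to default counts when none are given.
    counts = mm_counts if mm_counts is not None else (default_counts or {})
    num_placeholders = sum(
        tokens_per_item[modality] * count for modality, count in counts.items()
    )
    if num_placeholders > seq_len:
        raise ValueError(
            f"{num_placeholders} multi-modal tokens do not fit in seq_len={seq_len}"
        )
    padding = seq_len - num_placeholders
    return [PLACEHOLDER_ID] * num_placeholders + [PAD_ID] * padding


token_ids = build_dummy_token_ids(16, {"image": 4}, mm_counts={"image": 2})
print(len(token_ids))                   # 16
print(token_ids.count(PLACEHOLDER_ID))  # 8
```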