Data Processing

Module Contents

class vllm.multimodal.processing.PromptReplacement(modality: str, target: str | list[int], replacement: Callable[[int], str | list[int]] | str | list[int])

Defines how to replace portions of an input prompt with placeholder tokens.

modality: str

The modality for which the replacement is made.

target: str | list[int]

The token sequence (or text) to find and replace.

replacement: Callable[[int], str | list[int]] | str | list[int]

Given the index of the processed item within its modality, output the replacement token sequence (or text).

For convenience, you can directly pass in the replacement token sequence (or text) instead of a function if it does not depend on the input.
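
For illustration, a minimal sketch of both forms; the modality name, target string, token ID 32000, and feature sizes below are illustrative, not taken from any particular model:

```python
from vllm.multimodal.processing import PromptReplacement

# Static form: every occurrence of the target is replaced by the same
# fixed run of placeholder tokens.
repl_static = PromptReplacement(
    modality="image",
    target="<image>",
    replacement=[32000] * 576,
)

# Callable form: the placeholder count depends on which item is being
# processed, so a function receiving the item index is supplied instead.
feature_sizes = [576, 144]  # hypothetical per-item feature sizes

def get_image_replacement(item_idx: int) -> list[int]:
    return [32000] * feature_sizes[item_idx]

repl_dynamic = PromptReplacement(
    modality="image",
    target="<image>",
    replacement=get_image_replacement,
)
```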

vllm.multimodal.processing.full_groupby_modality(values: Iterable[_M]) → ItemsView[str, list[_M]]

Convenience function to apply full_groupby() based on modality.
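
A usage sketch (the replacement instances are illustrative):

```python
from vllm.multimodal.processing import PromptReplacement, full_groupby_modality

repls = [
    PromptReplacement("image", "<image>", [32000]),
    PromptReplacement("audio", "<audio>", [32001]),
    PromptReplacement("image", "<img>", [32000]),
]

# Unlike itertools.groupby, a full groupby collects *all* items sharing a
# key, regardless of their position in the input.
for modality, group in full_groupby_modality(repls):
    print(modality, len(group))  # expected: "image 2" then "audio 1"
```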

class vllm.multimodal.processing.BoundPromptReplacement(tokenizer: transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast | MistralTokenizer, modality: str, _target: str | list[int], _replacement: Callable[[int], str | list[int]] | str | list[int])

A PromptReplacement bound to a tokenizer to automatically convert target and the result of get_replacement() between token sequence and text representations.

property target: _BoundPromptSequence

The token sequence (or text) to find and replace.

get_replacement(item_idx: int) → _BoundPromptSequence

Given the index of the processed item within its modality, output the replacement token sequence (or text).

vllm.multimodal.processing.iter_token_matches(token_ids: list[int], match_ids: list[int]) → Iterable[_TokenMatch]

Yield each occurrence of match_ids in token_ids.

Note that empty matches are ignored.
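
The match type itself is private; a minimal reimplementation sketch of the semantics, with the start_idx/end_idx fields assumed from the surrounding API:

```python
from collections.abc import Iterable
from typing import NamedTuple

class TokenMatch(NamedTuple):  # stand-in for the private _TokenMatch
    start_idx: int
    end_idx: int

def iter_token_matches(
    token_ids: list[int],
    match_ids: list[int],
) -> Iterable[TokenMatch]:
    # Empty matches are ignored, per the docstring above.
    if not match_ids:
        return
    n = len(match_ids)
    start = 0
    while start <= len(token_ids) - n:
        if token_ids[start:start + n] == match_ids:
            yield TokenMatch(start_idx=start, end_idx=start + n)
            start += n  # matches are assumed to be non-overlapping
        else:
            start += 1

# list(iter_token_matches([1, 2, 3, 2, 3], [2, 3]))
# -> [TokenMatch(start_idx=1, end_idx=3), TokenMatch(start_idx=3, end_idx=5)]
```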

class vllm.multimodal.processing.PlaceholderInfo(modality: str, item_idx: int, start_idx: int, replacement: list[int])

vllm.multimodal.processing.find_token_matches(prompt: list[int], prompt_repls: Sequence[BoundPromptReplacement]) → list[vllm.multimodal.processing._PromptReplacementTokenMatch]

Return each target of prompt_repls found in prompt.

vllm.multimodal.processing.find_text_matches(prompt: str, prompt_repls: Sequence[BoundPromptReplacement]) → list[vllm.multimodal.processing._PromptReplacementTextMatch]

Return each target of prompt_repls found in prompt.

vllm.multimodal.processing.replace_token_matches(prompt: list[int], mm_matches: Mapping[str, Sequence[_PromptReplacementTokenMatch]], mm_item_counts: Mapping[str, int]) → list[int]

Apply the replacements in mm_matches to prompt.

vllm.multimodal.processing.replace_text_matches(prompt: str, mm_matches: Mapping[str, Sequence[_PromptReplacementTextMatch]], mm_item_counts: Mapping[str, int]) → str

Apply the replacements in mm_matches to prompt.
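
Putting the finders and replacers together, a token-level sketch; constructing the bound replacements (e.g. from a tokenizer) is elided, and grouping the matches with full_groupby_modality() assumes each match exposes a modality attribute:

```python
from vllm.multimodal.processing import (
    find_token_matches,
    full_groupby_modality,
    replace_token_matches,
)

# bound_repls: Sequence[BoundPromptReplacement] bound to the model's
# tokenizer; prompt_token_ids: list[int] (construction of both elided).
matches = find_token_matches(prompt_token_ids, bound_repls)

# Group matches by modality, then splice the placeholder tokens into
# the prompt. The item counts per modality are illustrative.
new_token_ids = replace_token_matches(
    prompt_token_ids,
    dict(full_groupby_modality(matches)),
    mm_item_counts={"image": 2},
)
```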

class vllm.multimodal.processing.BaseProcessingInfo(ctx: InputProcessingContext)

Base class to provide the information necessary for data processing.

get_hf_processor(**kwargs: object) → transformers.ProcessorMixin

Subclasses can override this method to handle specific kwargs from model config or user inputs.
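
For instance, a sketch of an override that injects a model-specific default; the "num_crops" kwarg is hypothetical, and delegating to self.ctx.get_hf_processor() is an assumption about the context's API:

```python
from transformers import ProcessorMixin

from vllm.multimodal.processing import BaseProcessingInfo

class MyProcessingInfo(BaseProcessingInfo):
    def get_hf_processor(self, **kwargs: object) -> ProcessorMixin:
        # "num_crops" is a hypothetical model-specific knob.
        kwargs.setdefault("num_crops", 4)
        # Assumes InputProcessingContext exposes a get_hf_processor() helper.
        return self.ctx.get_hf_processor(**kwargs)
```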

abstract get_supported_mm_limits() → Mapping[str, int | None]

Return the maximum supported number of items for each modality.

A value of None means an unlimited number of items.

Omitting a modality from the returned dictionary means that it is not supported at all.
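
Continuing the sketch class above (Mapping from collections.abc), a possible implementation for a hypothetical model that accepts any number of images but at most one video:

```python
    def get_supported_mm_limits(self) -> Mapping[str, int | None]:
        # None = unlimited images; omitting a modality (e.g. "audio")
        # marks it as unsupported.
        return {"image": None, "video": 1}
```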

abstract get_mm_max_tokens_per_item(seq_len: int) → Mapping[str, int]

Get the maximum possible number of tokens per data item for each modality.

The dictionary returned by this method should have the same keys as that returned by get_supported_mm_limits().
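
And the companion method for the same sketch class, with keys mirroring get_supported_mm_limits(); the token budgets are illustrative:

```python
    def get_mm_max_tokens_per_item(self, seq_len: int) -> Mapping[str, int]:
        # Same keys as get_supported_mm_limits(); values are the largest
        # possible placeholder counts per item (illustrative numbers).
        return {"image": 576, "video": 4096}
```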

class vllm.multimodal.processing.BaseMultiModalProcessor(info: _I, dummy_inputs: BaseDummyInputsBuilder[_I], *, cache: ProcessingCache | None = None, enable_sanity_checks: bool = True)

Abstract base class to process multi-modal inputs to be used in vLLM.

Not to be confused with transformers.ProcessorMixin.

apply(prompt: str | list[int], mm_data: Mapping[str, Any | list[Any]], hf_processor_mm_kwargs: Mapping[str, object]) → MultiModalInputsV2

Process multi-modal inputs to be used in vLLM.

The main steps are:

  1. Apply the HF processor to the prompt text and multi-modal data together, outputting token IDs and processed tensors.

  2. Find and replace sequences in the token IDs with placeholder tokens. The number of placeholder tokens equals the feature size of the multi-modal data output by the multi-modal encoder.

  3. Extract information about the placeholder tokens from the processed token IDs.
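
A usage sketch; the concrete processor, prompt template, and image object below are placeholders rather than a prescribed setup:

```python
# `processor` is an instance of a concrete BaseMultiModalProcessor
# subclass, and `image` e.g. a PIL image (construction of both elided).
outputs = processor.apply(
    prompt="USER: <image>\nWhat is in this picture? ASSISTANT:",
    mm_data={"image": image},
    hf_processor_mm_kwargs={},
)
# `outputs` bundles the expanded token IDs, the processed multi-modal
# tensors, and the placeholder ranges recovered in step 3.
```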