Memory Profiling

Module Contents

class vllm.multimodal.profiling.ProcessorInputs(prompt_text: str, mm_data: Mapping[str, Any | list[Any]], hf_processor_mm_kwargs: Mapping[str, object] = <factory>)

Represents the keyword arguments to vllm.multimodal.processing.BaseMultiModalProcessor.apply().
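
As a rough illustration of the shape of this class, the sketch below defines a local stand-in dataclass with the same three fields. It does not import vLLM: the name `ProcessorInputsSketch` and the sample values are hypothetical, and the `<factory>` default is assumed to produce an empty `dict`.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProcessorInputsSketch:
    """Local stand-in mirroring the documented fields (not the vLLM class)."""

    prompt_text: str
    mm_data: Mapping[str, Any]
    # `<factory>` in the signature indicates a default_factory;
    # an empty dict is assumed here.
    hf_processor_mm_kwargs: Mapping[str, object] = field(default_factory=dict)


inputs = ProcessorInputsSketch(
    prompt_text="<image>",               # hypothetical placeholder text
    mm_data={"image": ["dummy-image"]},  # hypothetical payload
)
print(inputs.hf_processor_mm_kwargs)
```

Because the default comes from a factory, each instance gets its own mapping rather than sharing one mutable default.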

class vllm.multimodal.profiling.DummyEncoderData(prompt_token_ids: list[int])

Dummy encoder data used for profiling.

prompt_token_ids: list[int]

Alias for field number 0

class vllm.multimodal.profiling.DummyDecoderData(prompt_token_ids: list[int], multi_modal_data: MultiModalKwargs, multi_modal_placeholders: Mapping[str, Sequence[PlaceholderRange]])

Dummy decoder data used for profiling.

prompt_token_ids: list[int]

Alias for field number 0

multi_modal_data: MultiModalKwargs

Alias for field number 1

multi_modal_placeholders: Mapping[str, Sequence[PlaceholderRange]]

Alias for field number 2
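
"Alias for field number N" is the docstring Python generates for named-tuple fields, which suggests that both dummy data classes are named tuples. The sketch below uses a local stand-in to show what that implies for field access. It does not import vLLM: `DummyDecoderDataSketch` is a hypothetical name, a plain `dict` stands in for `MultiModalKwargs`, and `(offset, length)` tuples stand in for `PlaceholderRange`.

```python
from collections.abc import Mapping, Sequence
from typing import Any, NamedTuple


class DummyDecoderDataSketch(NamedTuple):
    """Local stand-in with the documented field order (not the vLLM class)."""

    prompt_token_ids: list[int]                                        # field 0
    multi_modal_data: Mapping[str, Any]                                # field 1
    multi_modal_placeholders: Mapping[str, Sequence[tuple[int, int]]]  # field 2


data = DummyDecoderDataSketch(
    prompt_token_ids=[1, 32000, 32000, 2],              # hypothetical token IDs
    multi_modal_data={"pixel_values": "dummy-tensor"},  # hypothetical payload
    multi_modal_placeholders={"image": [(1, 2)]},       # hypothetical range
)

# Each field is reachable by name or by position, and the tuple unpacks in order.
token_ids, mm_data, placeholders = data
same_object = data[0] is data.prompt_token_ids
```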

class vllm.multimodal.profiling.BaseDummyInputsBuilder(info: _I)

Abstract base class that constructs the dummy data to profile multi-modal models.

get_dummy_text(mm_counts: Mapping[str, int]) -> str

Build the text input corresponding to mm_counts.

get_dummy_mm_data(seq_len: int, mm_counts: Mapping[str, int]) -> Mapping[str, Any | list[Any]]

Build the multimodal input which, after processing, results in the maximum possible number of placeholder tokens.

get_dummy_processor_inputs(seq_len: int, mm_counts: Mapping[str, int]) -> ProcessorInputs

Build the input which, after processing, results in the maximum possible number of placeholder tokens.
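
The relationship between the three methods can be sketched with local stand-ins that do not import vLLM. Every name below (`ProcessorInputsSketch`, `DummyInputsBuilderSketch`, `ImageDummyInputsBuilder`, `MAX_SIZE`, the `<image>` placeholder) is hypothetical, and the sketch assumes that `get_dummy_processor_inputs` simply combines the results of the other two methods, which the text above does not confirm.

```python
from abc import ABC, abstractmethod
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProcessorInputsSketch:
    """Stand-in for ProcessorInputs (not the vLLM class)."""

    prompt_text: str
    mm_data: Mapping[str, Any]
    hf_processor_mm_kwargs: Mapping[str, object] = field(default_factory=dict)


class DummyInputsBuilderSketch(ABC):
    """Stand-in for the abstract builder (not the vLLM class)."""

    @abstractmethod
    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
        """Build the text input corresponding to mm_counts."""

    @abstractmethod
    def get_dummy_mm_data(
        self, seq_len: int, mm_counts: Mapping[str, int]
    ) -> Mapping[str, Any]:
        """Build the multimodal input that maximizes placeholder tokens."""

    def get_dummy_processor_inputs(
        self, seq_len: int, mm_counts: Mapping[str, int]
    ) -> ProcessorInputsSketch:
        # Assumption: the combined input is the dummy text plus the dummy data.
        return ProcessorInputsSketch(
            prompt_text=self.get_dummy_text(mm_counts),
            mm_data=self.get_dummy_mm_data(seq_len, mm_counts),
        )


class ImageDummyInputsBuilder(DummyInputsBuilderSketch):
    MAX_SIZE = (1024, 1024)  # hypothetical largest supported image size

    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
        # One placeholder string per requested image.
        return "<image>" * mm_counts.get("image", 0)

    def get_dummy_mm_data(
        self, seq_len: int, mm_counts: Mapping[str, int]
    ) -> Mapping[str, Any]:
        # (width, height) tuples stand in for real image objects; the largest
        # size is used so that processing yields the most placeholder tokens.
        return {"image": [self.MAX_SIZE] * mm_counts.get("image", 0)}


builder = ImageDummyInputsBuilder()
inputs = builder.get_dummy_processor_inputs(seq_len=4096, mm_counts={"image": 2})
print(inputs.prompt_text)  # <image><image>
```

Using worst-case inputs here is what makes the profile an upper bound: if the largest dummy input fits in memory, smaller real inputs should too.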

class vllm.multimodal.profiling.MultiModalProfiler(processor: BaseMultiModalProcessor[_I])

Runs memory profiling for multi-modal models.