Data Parsing#

Module Contents#

class vllm.multimodal.parse.ModalityDataItems(data: _T, modality: str)[source]#

Represents data items for a modality in MultiModalDataItems.

abstract get_count() → int[source]#

Get the number of data items.

abstract get(index: int) → _I[source]#

Get a data item by its index.

get_all() → list[_I][source]#

Get all data items.

abstract get_processor_data() → Mapping[str, object][source]#

Get the data to pass to the HF processor.

abstract get_passthrough_data() → Mapping[str, object][source]#

Get the data to pass directly to the model.
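
As a rough illustration of this interface, a hypothetical subclass backed by a plain Python list might look like the sketch below. The class name and the returned dictionary keys are illustrative only, and it is assumed (as the signature above suggests) that the constructor stores data and modality as attributes:

    from collections.abc import Mapping

    from vllm.multimodal.parse import ModalityDataItems


    class ListBackedItems(ModalityDataItems):
        """Hypothetical items class holding its data in a plain list."""

        def get_count(self) -> int:
            return len(self.data)

        def get(self, index: int) -> object:
            return self.data[index]

        def get_processor_data(self) -> Mapping[str, object]:
            # Hand the raw items to the HF processor under a modality key
            # (the key naming is an assumption for illustration).
            return {f"{self.modality}s": self.data}

        def get_passthrough_data(self) -> Mapping[str, object]:
            # Nothing bypasses the HF processor in this sketch.
            return {}

Note that get_all() is already provided by the base class, which is why the sketch does not override it.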

class vllm.multimodal.parse.ProcessorBatchItems(data: _T, modality: str)[source]#

Base class for data items that are arranged in a list.

get_count() → int[source]#

Get the number of data items.

get(index: int) → _T[source]#

Get a data item by its index.

get_processor_data() → Mapping[str, object][source]#

Get the data to pass to the HF processor.

get_passthrough_data() → Mapping[str, object][source]#

Get the data to pass directly to the model.
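
For list-arranged items, the intended split (as the method docstrings above suggest) is that the raw items are handed to the HF processor while nothing is passed through directly. A usage sketch with AudioProcessorItems, which is documented further below; the dictionary keys mentioned in the comments are assumptions:

    import numpy as np

    from vllm.multimodal.parse import AudioProcessorItems

    clips = [np.zeros(16000, dtype=np.float32), np.zeros(8000, dtype=np.float32)]
    items = AudioProcessorItems(clips)

    assert items.get_count() == 2
    assert items.get(1).shape == (8000,)

    # The raw clips are exposed for the HF processor (e.g. under an
    # "audios"-style key), while nothing goes straight to the model.
    print(items.get_processor_data())
    print(items.get_passthrough_data())  # expected to be empty here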

class vllm.multimodal.parse.EmbeddingItems(data: _T, modality: str)[source]#

Base class for data items that are expressed as a batched embedding tensor, or a list of embedding tensors (one per item).

get_count() → int[source]#

Get the number of data items.

get(index: int) → torch.Tensor[source]#

Get a data item by its index.

get_processor_data() → Mapping[str, object][source]#

Get the data to pass to the HF processor.

get_passthrough_data() → Mapping[str, object][source]#

Get the data to pass directly to the model.
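
Embedding items invert that split: the tensors skip the HF processor and are passed straight to the model. A sketch instantiating the base class directly (assumed to be allowed, since no abstract methods are listed on it); the "image_embeds"-style key in the comment is likewise an assumption:

    import torch

    from vllm.multimodal.parse import EmbeddingItems

    # A batched embedding tensor: 2 items, each with 16 embeddings of size 64.
    embeds = torch.rand(2, 16, 64)
    items = EmbeddingItems(embeds, "image")

    assert items.get_count() == 2
    assert items.get(0).shape == (16, 64)

    print(items.get_processor_data())    # expected to be empty here
    print(items.get_passthrough_data())  # e.g. an "image_embeds"-style entry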

class vllm.multimodal.parse.DictEmbeddingItems(data: Mapping[str, torch.Tensor], modality: str, required_fields: set[str], fields_factory: Callable[[Mapping[str, torch.Tensor]], Mapping[str, MultiModalFieldConfig]])[source]#

Base class for data items that are expressed as a dictionary of tensors.

Usually, the dictionary keys correspond to the outputs of the HF processor.

get_count() → int[source]#

Get the number of data items.

get(index: int) → Mapping[str, torch.Tensor][source]#

Get a data item by its index.

get_processor_data() → Mapping[str, object][source]#

Get the data to pass to the HF processor.

get_passthrough_data() → Mapping[str, object][source]#

Get the data to pass directly to the model.
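
The fields_factory argument maps each tensor in the dictionary to a MultiModalFieldConfig describing how it is split into per-item slices. A sketch, assuming MultiModalFieldConfig is imported from vllm.multimodal.inputs and using an illustrative image_embeds field:

    import torch

    from vllm.multimodal.inputs import MultiModalFieldConfig
    from vllm.multimodal.parse import DictEmbeddingItems


    def _image_fields(hf_inputs):
        # Illustrative layout: one batched embedding tensor per image item.
        return dict(image_embeds=MultiModalFieldConfig.batched("image"))


    items = DictEmbeddingItems(
        data={"image_embeds": torch.rand(2, 16, 64)},  # 2 image items
        modality="image",
        required_fields={"image_embeds"},
        fields_factory=_image_fields,
    )

    print(items.get_count())             # expected: 2
    print(items.get(0).keys())           # a per-item slice of each field
    print(items.get_passthrough_data())  # the full dict, passed to the model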

class vllm.multimodal.parse.AudioProcessorItems(data: Sequence[list[float] | numpy.ndarray | torch.Tensor])[source]#
class vllm.multimodal.parse.AudioEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]#
class vllm.multimodal.parse.ImageSize(width, height)[source]#

width: int[source]#

The image width (alias for field number 0).

height: int[source]#

The image height (alias for field number 1).

class vllm.multimodal.parse.ImageProcessorItems(data: Sequence[Image | numpy.ndarray | torch.Tensor])[source]#
class vllm.multimodal.parse.ImageEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]#
class vllm.multimodal.parse.VideoProcessorItems(data: Sequence[list[PIL.Image.Image] | numpy.ndarray | torch.Tensor | list[numpy.ndarray] | list[torch.Tensor]])[source]#
class vllm.multimodal.parse.VideoEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]#
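
The modality-specific classes above follow the same pattern; for example, image items can be built directly from PIL images (a usage sketch with arbitrary sizes):

    from PIL import Image

    from vllm.multimodal.parse import ImageProcessorItems, ImageSize

    images = [Image.new("RGB", (336, 336)), Image.new("RGB", (672, 336))]
    items = ImageProcessorItems(images)

    assert items.get_count() == 2

    first = items.get(0)
    size = ImageSize(width=first.width, height=first.height)
    print(size)  # ImageSize(width=336, height=336)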
class vllm.multimodal.parse.MultiModalDataItems(dict=None, /, **kwargs)[source]#

As MultiModalDataDict, but normalized such that each entry corresponds to a list.

get_count(modality: str, *, strict: bool = True) → int[source]#

Get the number of data items belonging to a modality.

If strict=False, this returns 0 instead of raising KeyError when the modality is not found.

get_all_counts() → Mapping[str, int][source]#

Get the number of items belonging to each modality.

get_items(modality: str, typ: type[_D] | tuple[type[_D], ...]) → _D[source]#

Get the data items belonging to a modality, requiring that they are of a certain type.
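
Since this container is a dict keyed by modality name, a parsed result can be inspected as in the sketch below (constructed by hand here; in practice it comes from the parser documented next):

    from PIL import Image

    from vllm.multimodal.parse import ImageProcessorItems, MultiModalDataItems

    mm_items = MultiModalDataItems(
        {"image": ImageProcessorItems([Image.new("RGB", (336, 336))])}
    )

    print(mm_items.get_all_counts())                  # {'image': 1}
    print(mm_items.get_count("video", strict=False))  # 0, no KeyError

    image_items = mm_items.get_items("image", ImageProcessorItems)
    print(image_items.get(0).size)                    # (336, 336)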

class vllm.multimodal.parse.MultiModalDataParser(*, target_sr: float | None = None)[source]#

Parses MultiModalDataDict into MultiModalDataItems.

Parameters:

target_sr (float, optional) – If set, audio items are automatically resampled to this sampling rate (the sampling rate expected by the model).
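
A usage sketch, assuming parse_mm_data (not listed in this section) is the entry point that turns a MultiModalDataDict into MultiModalDataItems, and that audio items may be given as a (waveform, sampling rate) tuple:

    import numpy as np
    from PIL import Image

    from vllm.multimodal.parse import MultiModalDataParser

    parser = MultiModalDataParser(target_sr=16000)

    # One image plus one audio clip recorded at 44.1 kHz; with target_sr set,
    # the clip is resampled to 16 kHz during parsing.
    mm_data = {
        "image": Image.new("RGB", (336, 336)),
        "audio": (np.zeros(44100, dtype=np.float32), 44100.0),
    }

    mm_items = parser.parse_mm_data(mm_data)
    print(mm_items.get_all_counts())  # e.g. {'image': 1, 'audio': 1}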