Data Parsing
Module Contents
- class vllm.multimodal.parse.ModalityDataItems(data: _T, modality: str)[source]
Represents data items for a modality in MultiModalDataItems.
- class vllm.multimodal.parse.ProcessorBatchItems(data: _T, modality: str)[source]
Base class for data items that are arranged in a list.
- class vllm.multimodal.parse.EmbeddingItems(data: _T, modality: str)[source]
Base class for data items that are expressed as a batched embedding tensor, or a list of embedding tensors (one per item).
- get(index: int) → torch.Tensor [source]
Get a data item by its index.
- class vllm.multimodal.parse.AudioProcessorItems(data: Sequence[list[float] | ndarray | torch.Tensor])[source]
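For illustration, a minimal sketch of wrapping raw waveforms as audio processor items. The waveform lengths are placeholders, and the get(index) call assumes the list-backed items share the accessor documented for EmbeddingItems above:

```python
# Illustrative sketch (not from the reference above): wrapping raw waveforms
# as audio processor items. Lengths and dtype are placeholders.
import numpy as np

from vllm.multimodal.parse import AudioProcessorItems

waveforms = [
    np.zeros(16000, dtype=np.float32),  # 1 s of silence at 16 kHz
    np.zeros(8000, dtype=np.float32),   # 0.5 s of silence at 16 kHz
]
items = AudioProcessorItems(waveforms)

# Assumed: list-backed items expose the same get(index) accessor
# documented for EmbeddingItems above.
first = items.get(0)
```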
- class vllm.multimodal.parse.AudioEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]
- class vllm.multimodal.parse.ImageProcessorItems(data: Sequence[Image | ndarray | torch.Tensor])[source]
- class vllm.multimodal.parse.ImageEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]
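Similarly, a hedged sketch of the embedding-item classes: the input is either one batched tensor or a list of per-item tensors, and get(index) returns the tensor for a single item. The shapes below are invented for the example:

```python
# Illustrative sketch: wrapping precomputed image embeddings and retrieving
# one item by index. The (num_patches, hidden_size) shape is made up.
import torch

from vllm.multimodal.parse import ImageEmbeddingItems

image_embeds = [torch.zeros(576, 1024), torch.zeros(576, 1024)]
items = ImageEmbeddingItems(image_embeds)

first = items.get(0)  # tensor for the first image
```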
- class vllm.multimodal.parse.VideoProcessorItems(data: Sequence[list[PIL.Image.Image] | ndarray | torch.Tensor | list[numpy.ndarray] | list[torch.Tensor]])[source]
- class vllm.multimodal.parse.VideoEmbeddingItems(data: torch.Tensor | list[torch.Tensor])[source]
- class vllm.multimodal.parse.MultiModalDataItems(dict=None, /, **kwargs)[source]
As MultiModalDataDict, but normalized such that each entry corresponds to a list.
- get_count(modality: str, *, strict: bool = True) → int [source]
Get the number of data items belonging to a modality.
If strict=False, returns 0 instead of raising KeyError when the modality is not found.
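A short sketch of the strict flag; it assumes parse_mm_data (a method name not listed in this reference, taken as the parser's entry point) as the way to obtain a MultiModalDataItems instance from MultiModalDataParser below:

```python
# Illustrative sketch: parse a request with two images, then query counts.
from PIL import Image

from vllm.multimodal.parse import MultiModalDataParser

parser = MultiModalDataParser()
mm_items = parser.parse_mm_data({"image": [Image.new("RGB", (64, 64))] * 2})

assert mm_items.get_count("image") == 2
# No "video" entry in the request: strict=False yields 0 instead of KeyError.
assert mm_items.get_count("video", strict=False) == 0
```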
- class vllm.multimodal.parse.MultiModalDataParser(*, target_sr: float | None = None)[source]
Parses MultiModalDataDict into MultiModalDataItems.
- Parameters:
target_sr (float, optional) – Enables automatic resampling of audio items to the model’s expected sampling rate.
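For illustration, a hedged sketch of the resampling path. It assumes audio items may be supplied as (waveform, sampling_rate) tuples and that parse_mm_data is the entry point; the rates are placeholders:

```python
# Illustrative sketch: resample 44.1 kHz input audio to a 16 kHz model rate.
import numpy as np

from vllm.multimodal.parse import MultiModalDataParser

parser = MultiModalDataParser(target_sr=16000)

waveform = np.zeros(44100, dtype=np.float32)  # 1 s at 44.1 kHz (placeholder)
# Assumed input format: an (audio, sampling_rate) tuple per item.
mm_items = parser.parse_mm_data({"audio": [(waveform, 44100)]})
```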