vllm_omni.engine.mm_outputs

Multimodal output data structures for vLLM-Omni.

This module defines structured types for multimodal outputs.

MultimodalCompletionOutput dataclass

Bases: CompletionOutput

CompletionOutput with multimodal support.

Inherits all CompletionOutput fields and adds multimodal_output. Because it is a CompletionOutput subclass, it is compatible with all existing vLLM consumers.

multimodal_output instance-attribute

multimodal_output = multimodal_output
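
The compatibility claim above can be illustrated with a minimal sketch. It does not import vLLM: BaseOutput is a trimmed-down, hypothetical stand-in for CompletionOutput, and SketchMultimodalOutput, render_text, and the None default are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class BaseOutput:
    """Trimmed-down, hypothetical stand-in for vLLM's CompletionOutput."""

    index: int
    text: str


@dataclass
class SketchMultimodalOutput(BaseOutput):
    """Keeps the inherited fields and adds one extra field."""

    # Assumption: the None default exists only to keep this sketch short.
    multimodal_output: Any = None


def render_text(output: BaseOutput) -> str:
    """A consumer that only knows about the base type."""
    return f"[{output.index}] {output.text}"


out = SketchMultimodalOutput(index=0, text="hello", multimodal_output={"audio": [0.1]})
rendered = render_text(out)  # the subclass instance is accepted unchanged
```

Code written against the base type keeps working, while multimodal-aware code can read the extra field.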

MultimodalPayload dataclass

Structured multimodal output payload.

Attributes:

Name      Type               Description
tensors   dict[str, Tensor]  Dictionary mapping modality/key names to their tensors.
metadata  dict[str, Any]     Optional dictionary for non-tensor metadata (e.g., sample rate for audio, image dimensions).

is_empty property

is_empty: bool

Return True if the payload has no tensors.

metadata class-attribute instance-attribute

metadata: dict[str, Any] = field(default_factory=dict)

primary_tensor property

primary_tensor: Tensor | None

Return the first tensor in the payload, or None if empty.

tensors class-attribute instance-attribute

tensors: dict[str, Tensor] = field(default_factory=dict)

from_dict classmethod

from_dict(
    data: dict[str, Any] | None,
) -> MultimodalPayload | None

Create a MultimodalPayload from a raw dictionary.

Separates torch.Tensor values into tensors and everything else into metadata.
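
The split described above might look roughly like the following sketch, which is not the library's implementation. To stay runnable without PyTorch, FakeTensor is a hypothetical stand-in for torch.Tensor and split_raw_output is an illustrative helper name. Returning None for a None input is an assumption inferred from the signature.

```python
from __future__ import annotations

from typing import Any


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, values: list[float]) -> None:
        self.values = values


def split_raw_output(
    data: dict[str, Any] | None,
) -> tuple[dict[str, FakeTensor], dict[str, Any]] | None:
    """Split a raw dict into (tensors, metadata)."""
    # Assumption: a None input yields None, as the return annotation suggests.
    if data is None:
        return None
    tensors: dict[str, FakeTensor] = {}
    metadata: dict[str, Any] = {}
    for key, value in data.items():
        # Tensor values go to one dict, everything else to the other.
        if isinstance(value, FakeTensor):
            tensors[key] = value
        else:
            metadata[key] = value
    return tensors, metadata


raw = {"audio": FakeTensor([0.1, -0.2]), "sample_rate": 24000}
result = split_raw_output(raw)
```

Here the "audio" entry lands in the tensor dictionary and "sample_rate" in the metadata dictionary.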

get

get(key: str) -> Tensor | None

Get a tensor by key, returning None if not found.
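
As a rough illustration of how is_empty, primary_tensor, and get behave together, here is a minimal sketch. SketchPayload is a hypothetical stand-in rather than the real class, plain lists stand in for tensors, and reading "first" as dict insertion order is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchPayload:
    """Illustrative stand-in for MultimodalPayload; tensor values are plain lists."""

    tensors: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def is_empty(self) -> bool:
        # True when no tensors are stored; metadata is not considered.
        return not self.tensors

    @property
    def primary_tensor(self) -> Any | None:
        # Assumption: "first" means first in dict insertion order.
        return next(iter(self.tensors.values()), None)

    def get(self, key: str) -> Any | None:
        # dict.get returns None for a missing key instead of raising KeyError.
        return self.tensors.get(key)


payload = SketchPayload(
    tensors={"audio": [0.1, -0.2], "image": [1.0]},
    metadata={"sample_rate": 24000},
)
empty = SketchPayload(metadata={"note": "no tensors"})
```

In this sketch a payload that carries only metadata still counts as empty, since is_empty looks at the tensors alone.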