
vllm_omni.engine.mm_outputs

Multimodal output data structures for vLLM-Omni.

This module defines structured types for multimodal outputs.

MultimodalCompletionOutput dataclass

Bases: CompletionOutput

CompletionOutput with multimodal support.

Inherits all CompletionOutput fields and adds the multimodal_output attribute. Because it is a CompletionOutput subclass, it remains compatible with all existing vLLM consumers.

multimodal_output instance-attribute

multimodal_output = multimodal_output
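The compatibility claim rests on ordinary subclassing: any code typed against CompletionOutput accepts the subclass unchanged. A minimal sketch, assuming a hypothetical, trimmed stand-in for vLLM's CompletionOutput (the real class has more fields) and assuming a plain dict for multimodal_output; the helper describe() is also hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CompletionOutput:
    # Hypothetical, trimmed stand-in for vllm.outputs.CompletionOutput.
    index: int
    text: str
    token_ids: list[int] = field(default_factory=list)
    finish_reason: str | None = None


@dataclass
class MultimodalCompletionOutput(CompletionOutput):
    # Extra attribute; it needs a default because the parent ends with defaulted fields.
    # The dict type here is an assumption made for the sketch.
    multimodal_output: dict[str, Any] = field(default_factory=dict)


def describe(output: CompletionOutput) -> str:
    # An "existing consumer" that only knows about CompletionOutput.
    return f"{output.index}: {output.text!r} ({output.finish_reason})"


out = MultimodalCompletionOutput(
    index=0, text="hello", finish_reason="stop",
    multimodal_output={"sample_rate": 24000},
)
print(describe(out))                      # 0: 'hello' (stop)
print(isinstance(out, CompletionOutput))  # True
```

Consumers that ignore multimodal_output keep working, while multimodal-aware code can read the extra attribute.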

MultimodalPayload dataclass

Bases: Mapping

Structured multimodal output payload.

Implements collections.abc.Mapping so that isinstance(payload, dict)-style checks in downstream code can be replaced with duck typing, and so that payload.get(key), payload[key], key in payload, and len(payload) all work uniformly across both tensors and metadata.
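With collections.abc.Mapping, a class only has to define __getitem__, __iter__, and __len__; get(), __contains__, keys(), items(), and values() come from the mixin methods. The sketch below is a minimal illustration, not the module's implementation: PayloadSketch is a hypothetical name, Tensor is a stand-in for torch.Tensor so the code runs without PyTorch, and the tensors-first lookup and de-duplicated iteration are assumptions.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from dataclasses import dataclass, field
from typing import Any


class Tensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""


@dataclass
class PayloadSketch(Mapping):
    tensors: dict[str, Tensor] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)

    def __getitem__(self, key: str) -> Any:
        # Assumed lookup order: tensors first, then metadata.
        if key in self.tensors:
            return self.tensors[key]
        return self.metadata[key]  # raises KeyError if absent, as Mapping requires

    def __iter__(self) -> Iterator[str]:
        yield from self.tensors
        # Skip metadata keys shadowed by a tensor key so each key is yielded once.
        yield from (k for k in self.metadata if k not in self.tensors)

    def __len__(self) -> int:
        return sum(1 for _ in self)


audio = Tensor()
p = PayloadSketch(tensors={"audio": audio}, metadata={"sample_rate": 24000})
print(p.get("sample_rate"))                         # 24000
print("audio" in p, len(p))                         # True 2
print(isinstance(p, Mapping), isinstance(p, dict))  # True False
```

Downstream checks written as isinstance(x, Mapping) accept both plain dicts and payload objects, which is the duck-typing benefit described above.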

Attributes:

tensors (dict[str, Tensor]): Dictionary mapping modality/key names to their tensors.

metadata (dict[str, Any]): Optional dictionary for non-tensor metadata (e.g., sample rate for audio, image dimensions).

is_empty property

is_empty: bool

Return True if the payload has no tensors and no metadata.

metadata class-attribute instance-attribute

metadata: dict[str, Any] = field(default_factory=dict)

primary_tensor property

primary_tensor: Tensor | None

Return the first tensor in the payload, or None if the payload contains no tensors.
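The is_empty and primary_tensor properties documented above can be sketched as follows. This assumes "first" means first in dict insertion order; PayloadSketch is a hypothetical name and Tensor is a stand-in for torch.Tensor. Note the asymmetry: a payload holding only metadata is not empty, yet has no primary tensor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class Tensor:
    """Hypothetical stand-in for torch.Tensor."""


@dataclass
class PayloadSketch:
    tensors: dict[str, Tensor] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def is_empty(self) -> bool:
        # Empty only when BOTH dicts are empty; metadata alone makes it non-empty.
        return not self.tensors and not self.metadata

    @property
    def primary_tensor(self) -> Tensor | None:
        # "First" assumed to mean first in dict insertion order.
        return next(iter(self.tensors.values()), None)


meta_only = PayloadSketch(metadata={"sample_rate": 24000})
print(meta_only.is_empty, meta_only.primary_tensor)  # False None
```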

tensors class-attribute instance-attribute

tensors: dict[str, Tensor] = field(default_factory=dict)

from_dict classmethod

from_dict(
    data: dict[str, Any] | None,
) -> MultimodalPayload | None

Create a MultimodalPayload from a raw dictionary.

Separates torch.Tensor values into tensors and everything else into metadata.
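The splitting rule can be illustrated with a standalone function. This is a sketch under assumptions: split_payload is a hypothetical helper returning a (tensors, metadata) pair rather than a MultimodalPayload, Tensor stands in for torch.Tensor, and returning None for a None input is inferred from the Optional return type. How the real method treats an empty dict is not documented here.

```python
from __future__ import annotations

from typing import Any


class Tensor:
    """Hypothetical stand-in for torch.Tensor."""


def split_payload(
    data: dict[str, Any] | None,
) -> tuple[dict[str, Tensor], dict[str, Any]] | None:
    """Sketch of the from_dict rule: tensor values vs. everything else."""
    if data is None:
        # Assumed: a None input yields None, matching the Optional return type.
        return None
    tensors = {k: v for k, v in data.items() if isinstance(v, Tensor)}
    metadata = {k: v for k, v in data.items() if not isinstance(v, Tensor)}
    return tensors, metadata


audio = Tensor()
tensors, metadata = split_payload({"audio": audio, "sample_rate": 24000})
print(list(tensors), metadata)  # ['audio'] {'sample_rate': 24000}
print(split_payload(None))      # None
```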

get

get(key: str, default: Any = None) -> Any

Get a value by key, searching tensors first and then metadata; return default if the key is found in neither.
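The lookup order matters only when the same key appears in both dictionaries: the tensor wins. A minimal sketch with a hypothetical payload_get() helper operating on the two dicts directly; plain strings stand in for tensors to keep it dependency-free.

```python
from typing import Any


def payload_get(
    tensors: dict[str, Any], metadata: dict[str, Any], key: str, default: Any = None
) -> Any:
    # Tensors take precedence: a key present in both dicts resolves to the tensor.
    if key in tensors:
        return tensors[key]
    return metadata.get(key, default)


tensors = {"image": "<image tensor>"}  # placeholder string standing in for a tensor
metadata = {"image": "512x512", "fps": 30}
print(payload_get(tensors, metadata, "image"))        # <image tensor>
print(payload_get(tensors, metadata, "fps"))          # 30
print(payload_get(tensors, metadata, "missing", -1))  # -1
```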

to_dict

to_dict() -> dict[str, Any]

Convert back to a plain dict (tensors + metadata merged).
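Because from_dict splits a dict and to_dict merges it back, the two should round-trip whenever no key appears in both parts. The sketch below shows that property with hypothetical split() and merge() helpers and a Tensor stand-in. Which side wins on a key collision in the real to_dict is not stated; here the tensor is assumed to win, mirroring get().

```python
from typing import Any


class Tensor:
    """Hypothetical stand-in for torch.Tensor."""


def split(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    # Same rule as from_dict: tensors on one side, everything else on the other.
    tensors = {k: v for k, v in data.items() if isinstance(v, Tensor)}
    metadata = {k: v for k, v in data.items() if not isinstance(v, Tensor)}
    return tensors, metadata


def merge(tensors: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:
    # Unpack metadata first so that, on a key collision, the tensor value wins
    # (an assumption chosen to match get()'s tensors-first lookup order).
    return {**metadata, **tensors}


raw = {"audio": Tensor(), "sample_rate": 24000}
print(merge(*split(raw)) == raw)  # True
```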