vllm_omni.utils.mm_outputs ¶

Utilities for handling multimodal outputs and building multimodal output payloads. Most of them are shared by the prefix-cache and non-prefix-cache paths.

logger `module-attribute` ¶

logger = init_logger(__name__)

build_mm_cpu ¶

build_mm_cpu(multimodal_outputs: dict) -> dict[str, object]

Pre-copies multimodal tensors to CPU once (rather than once per request) to avoid redundant device-to-host (D2H) transfers when `gpu_resident_buffer_keys` keeps them on the GPU.

With prefix caching, the provided multimodal outputs contain only the passthrough data.

Parameters:

Name	Type	Description	Default
`multimodal_outputs`	`dict`	Multimodal dict mapping strings to objects.	required
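
The "copy once, then reuse per request" idea can be illustrated with a minimal sketch. This is not the library's implementation: `FakeDeviceTensor` is a hypothetical stand-in for a GPU tensor that only counts device-to-host copies, and `build_mm_cpu_sketch` is an assumed, simplified version of the behavior described above.

```python
class FakeDeviceTensor:
    """Hypothetical stand-in for a GPU tensor; counts device-to-host copies."""

    copies = 0  # total D2H copies performed across all instances

    def __init__(self, data, device="gpu"):
        self.data = data
        self.device = device

    def cpu(self):
        # Already on CPU: no transfer needed.
        if self.device == "cpu":
            return self
        FakeDeviceTensor.copies += 1
        return FakeDeviceTensor(self.data, device="cpu")


def build_mm_cpu_sketch(multimodal_outputs: dict) -> dict[str, object]:
    """Assumed behavior: move tensor-like values to CPU, pass the rest through."""
    mm_cpu: dict[str, object] = {}
    for key, value in multimodal_outputs.items():
        if hasattr(value, "cpu"):
            mm_cpu[key] = value.cpu()
        elif isinstance(value, list):
            mm_cpu[key] = [v.cpu() if hasattr(v, "cpu") else v for v in value]
        else:
            mm_cpu[key] = value
    return mm_cpu


outputs = {"audio": FakeDeviceTensor([0.1, 0.2, 0.3, 0.4]), "sample_rate": 24000}

# One copy for the whole batch ...
mm_cpu = build_mm_cpu_sketch(outputs)

# ... then each request slices the CPU copy without triggering another transfer.
request_ranges = [(0, 2), (2, 4)]
per_request = [mm_cpu["audio"].data[start:end] for start, end in request_ranges]
```

Slicing the shared CPU copy per request keeps the number of transfers independent of the number of requests in the batch.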

partition_flat_payload ¶

partition_flat_payload(
    payload: Mapping[str, object],
) -> tuple[dict[str, object], dict[str, object]]

Split a flattened per-request payload into inter-stage and client multimodal (mm) dicts.
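
As a rough illustration of this kind of split, the sketch below routes each key of a flat payload into one of two dicts. The routing rule is an assumption: this reference does not state how keys are classified, so `INTER_STAGE_KEYS`, the key names, and `partition_flat_payload_sketch` are hypothetical.

```python
from collections.abc import Mapping

# Hypothetical routing rule; the real criterion is not documented here.
INTER_STAGE_KEYS = frozenset({"hidden_states"})


def partition_flat_payload_sketch(
    payload: Mapping[str, object],
) -> tuple[dict[str, object], dict[str, object]]:
    """Split a flat payload into (inter_stage, client) dicts by key."""
    inter_stage: dict[str, object] = {}
    client: dict[str, object] = {}
    for key, value in payload.items():
        target = inter_stage if key in INTER_STAGE_KEYS else client
        target[key] = value
    return inter_stage, client


inter_stage, client = partition_flat_payload_sketch(
    {"hidden_states": [1, 2], "audio": [0.5]}
)
```

Every key lands in exactly one of the two dicts, so the two outputs together always reproduce the input payload.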

partition_payload_list ¶

partition_payload_list(
    payloads: list[dict[str, object]],
) -> tuple[
    list[dict[str, object] | None] | None,
    list[dict[str, object] | None] | None,
]

to_payload_element ¶

to_payload_element(
    element: object,
    idx: int,
    start: int,
    end: int,
    pass_lists_through: bool = False,
    seq_len: int | None = None,
    scheduled_seq_len: int | None = None,
)

Build an mm payload element corresponding to one request index from an element containing 0 or more CPU tensors.