vllm_omni.engine.output_processor

logger module-attribute

logger = init_logger(__name__)

MultimodalOutputProcessor

Bases: OutputProcessor

Handles multimodal output processing.

Captures multimodal outputs from OmniEngineCoreOutput and accumulates them as MultimodalPayload in OmniRequestState, before delegating to the base vLLM OutputProcessor for text handling.

The data flow is:

1. For each EngineCoreOutput with multimodal_output:
   - Capture into OmniRequestState.add_multimodal_tensor()
2. Base vLLM OutputProcessor handles text detokenization
3. On finish, _consolidate_multimodal_tensors() concatenates accumulated tensors using strategy-based dispatch
4. _new_completion_output() returns MultimodalCompletionOutput
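The data flow above can be sketched in plain Python. This is a minimal illustration, not the vllm_omni implementation: SketchState, STRATEGIES, consolidate and process are hypothetical names, payloads are plain lists standing in for tensors, and the per-modality strategies ("audio" concatenates, "image" keeps the last chunk) are assumptions chosen only to show the dispatch shape.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SketchState:
    """Hypothetical stand-in for OmniRequestState."""
    mm_type: Optional[str] = None
    chunks: list = field(default_factory=list)

    def add_multimodal_tensor(self, payload: Any, mm_type: Optional[str]) -> None:
        # Outputs without a multimodal payload are ignored.
        if payload is None:
            return
        self.mm_type = mm_type or self.mm_type
        self.chunks.append(payload)


def _concat(chunks: list) -> list:
    # Join all chunks in arrival order (lists stand in for tensors).
    merged: list = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


def _keep_last(chunks: list) -> Any:
    # Keep only the most recent chunk.
    return chunks[-1]


# Assumed strategy table: modality name -> consolidation function.
STRATEGIES: dict[str, Callable[[list], Any]] = {
    "audio": _concat,
    "image": _keep_last,
}


def consolidate(state: SketchState) -> Any:
    if not state.chunks:
        return None
    strategy = STRATEGIES.get(state.mm_type or "", _concat)
    return strategy(state.chunks)


def process(outputs: list, state: SketchState) -> Any:
    """outputs: (multimodal_output, mm_type, finished) tuples."""
    for payload, mm_type, finished in outputs:
        state.add_multimodal_tensor(payload, mm_type)  # step 1
        # Step 2 (text detokenization by the base processor) is omitted.
        if finished:
            # Step 3; step 4 would wrap this in a completion output.
            return consolidate(state)
    return None
```

Keeping the strategy lookup in a table means a new modality only needs a new entry, not a new branch in the processing loop.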

engine_core_output_type instance-attribute

engine_core_output_type = engine_core_output_type

output_modality instance-attribute

output_modality = OutputModality.from_string(
    engine_core_output_type
)

abort_requests

abort_requests(request_ids, internal: bool) -> list[str]

add_request

add_request(
    request: EngineCoreRequest,
    prompt: str | None,
    parent_req: ParentRequest | None = None,
    request_index: int = 0,
    queue: RequestOutputCollector | None = None,
) -> None

Add a new request to be processed.

Creates an OmniRequestState for the request and registers it for output processing.

Parameters:

request (EngineCoreRequest, required): Engine core request to add

prompt (str | None, required): Optional prompt string for the request

parent_req (ParentRequest | None, default: None): Optional parent request for parallel sampling

request_index (int, default: 0): Index of the request in the batch

queue (RequestOutputCollector | None, default: None): Optional queue for collecting outputs

Raises:

ValueError: If the request ID is already registered

pop_native_text_metrics

pop_native_text_metrics(request_id: str) -> dict[str, Any]

process_outputs

process_outputs(
    engine_core_outputs: list[EngineCoreOutput],
    engine_core_timestamp: float | None = None,
    iteration_stats: IterationStats | None = None,
) -> OutputProcessorOutput

remove_request

remove_request(request_id: str) -> None

Roll back one previously registered request if it was never submitted.
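The register-then-roll-back pattern implied by add_request and remove_request can be sketched as follows. This is an assumption-laden illustration: RequestRegistry, register_and_submit and the submit callable are hypothetical, and the real processor tracks considerably more state than a single dictionary.

```python
from typing import Any, Callable


class RequestRegistry:
    """Hypothetical registry keyed by request ID."""

    def __init__(self) -> None:
        self._states: dict[str, Any] = {}

    def add_request(self, request_id: str, state: Any) -> None:
        # Refuse duplicates, mirroring the documented ValueError.
        if request_id in self._states:
            raise ValueError(f"Request ID {request_id} is already registered")
        self._states[request_id] = state

    def remove_request(self, request_id: str) -> None:
        # Dropping an unknown ID is treated as a no-op in this sketch.
        self._states.pop(request_id, None)

    def __contains__(self, request_id: str) -> bool:
        return request_id in self._states


def register_and_submit(
    registry: RequestRegistry,
    request_id: str,
    state: Any,
    submit: Callable[[str], None],
) -> None:
    registry.add_request(request_id, state)
    try:
        submit(request_id)
    except Exception:
        # The request never reached the engine, so undo the registration.
        registry.remove_request(request_id)
        raise
```

Rolling back on a failed submission keeps the registry from holding state for a request that will never produce output.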

OmniRequestState

Bases: RequestState

Request state for omni models, tracking multimodal outputs.

Extends the base RequestState with support for accumulating multimodal tensor outputs (e.g., images, audio, latents) that are produced incrementally during generation.

mm_accumulated instance-attribute

mm_type instance-attribute

mm_type: str | None = None

native_text_stats instance-attribute

native_text_stats = RequestStateStats(
    arrival_time=float(arrival_time or 0.0)
)

add_multimodal_tensor

add_multimodal_tensor(
    payload: Any | None, mm_type: str | None
) -> None

Accumulate a multimodal tensor payload into the request state.

Normalizes incoming payloads (dict or raw tensor) into a MultimodalPayload and merges with any previously accumulated data. Uses list-based deferred concatenation to avoid O(n²) repeated torch.cat calls.
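A rough sketch of list-based deferred concatenation follows. DeferredAccumulator and concat_lists are hypothetical names, and Python lists stand in for tensors so the example runs without PyTorch. With real tensors one would pass something like `lambda chunks: torch.cat(chunks, dim=0)` as the concat function.

```python
import itertools
from typing import Any, Callable


class DeferredAccumulator:
    """Collects chunks in a list and concatenates only on demand."""

    def __init__(self, concat: Callable[[list], Any]) -> None:
        self._concat = concat
        self._chunks: list = []

    def add(self, chunk: Any) -> None:
        if chunk is None:
            return
        # Appending stores a reference; no existing data is copied.
        self._chunks.append(chunk)

    def consolidate(self) -> Any:
        if not self._chunks:
            return None
        if len(self._chunks) > 1:
            # A single concatenation over all pending chunks.
            self._chunks = [self._concat(self._chunks)]
        return self._chunks[0]


def concat_lists(chunks: list) -> list:
    # Stand-in for torch.cat: joins the chunks in order.
    return list(itertools.chain.from_iterable(chunks))
```

Concatenating after every chunk would recopy all earlier data each time, so total copying grows quadratically with the number of chunks. Deferring the join to a single call copies each element once.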

apply_streaming_update

apply_streaming_update(update) -> None

make_request_output

make_request_output(
    new_token_ids: list[int],
    pooling_output: Tensor | None,
    finish_reason: FinishReason | None,
    stop_reason: int | str | None,
    kv_transfer_params: dict[str, Any] | None = None,
) -> OmniRequestOutput | PoolingRequestOutput | None

Create a request output from generation results.

Creates an OmniRequestOutput or PoolingRequestOutput from the generated tokens and accumulated multimodal outputs. Attaches multimodal tensors to the completion output if available.

Parameters:

new_token_ids (list[int], required): List of newly generated token IDs

pooling_output (Tensor | None, required): Optional pooling output tensor

finish_reason (FinishReason | None, required): Optional finish reason indicating why generation stopped

stop_reason (int | str | None, required): Optional stop reason (token ID or stop string)

kv_transfer_params (dict[str, Any] | None, default: None): Optional KV cache transfer parameters

Returns:

OmniRequestOutput | PoolingRequestOutput | None: OmniRequestOutput or PoolingRequestOutput if output should be emitted (based on finish status and output kind), None otherwise
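The emission rule in the Returns description can be illustrated with a small sketch. It assumes the behavior mirrors base vLLM, where a final-only output kind suppresses intermediate outputs. OutputKind and should_emit are hypothetical local names, not part of vllm_omni.

```python
from enum import Enum
from typing import Optional


class OutputKind(Enum):
    """Local stand-in for an output-kind setting (assumed values)."""
    CUMULATIVE = 0
    DELTA = 1
    FINAL_ONLY = 2


def should_emit(finish_reason: Optional[str], output_kind: OutputKind) -> bool:
    # A request counts as finished once a finish reason is set.
    finished = finish_reason is not None
    # Final-only requests stay silent until they finish.
    if output_kind is OutputKind.FINAL_ONLY and not finished:
        return False
    return True
```

Under this assumption, a caller receives None for every intermediate step of a final-only request and a single output once the finish reason is set.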