vllm_omni.engine.output_processor

logger module-attribute

logger = init_logger(__name__)

MultimodalOutputProcessor

Bases: OutputProcessor

Handles multimodal output processing by capturing pooling_output from EngineCoreOutput and accumulating it as multimodal tensors, before delegating to the base vLLM OutputProcessor for text handling.

The actual data flow is:

1. For each EngineCoreOutput with pooling_output and a detokenizer:
   - Capture pooling_output into OmniRequestState.add_multimodal_tensor()
   - Clear eco.pooling_output to force text path in base processor
2. Base vLLM OutputProcessor handles text detokenization
3. On finish, _consolidate_multimodal_tensors() concatenates accumulated tensors
4. _new_completion_output() attaches mm_accumulated to CompletionOutput
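The four steps above can be sketched roughly as follows. This is a minimal illustration and not the vllm_omni implementation: `FakeEngineCoreOutput`, `FakeRequestState`, the `"audio"` modality key, and the result dictionary are hypothetical stand-ins, and plain Python lists take the place of tensors so the sketch runs without any dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeEngineCoreOutput:
    """Hypothetical stand-in for EngineCoreOutput (not the real class)."""
    request_id: str
    new_token_ids: list
    pooling_output: Optional[Any] = None
    finished: bool = False


@dataclass
class FakeRequestState:
    """Hypothetical stand-in for OmniRequestState."""
    has_detokenizer: bool = True
    token_ids: list = field(default_factory=list)
    chunks: dict = field(default_factory=dict)          # per-modality pending chunks
    mm_accumulated: dict = field(default_factory=dict)  # consolidated result

    def add_multimodal_tensor(self, payload, mm_type):
        if payload is None:
            return
        self.chunks.setdefault(mm_type or "unknown", []).append(payload)

    def consolidate(self):
        # Lists stand in for tensors: "concatenate" by flattening the chunks.
        for key, parts in self.chunks.items():
            self.mm_accumulated[key] = [x for part in parts for x in part]
        self.chunks.clear()


def process_outputs(outputs, states):
    results = {}
    for eco in outputs:
        state = states[eco.request_id]
        # Step 1: capture the multimodal payload, then clear it so the
        # remaining handling follows the text path.
        if eco.pooling_output is not None and state.has_detokenizer:
            state.add_multimodal_tensor(eco.pooling_output, mm_type="audio")
            eco.pooling_output = None
        # Step 2: stand-in for the base processor's text handling.
        state.token_ids.extend(eco.new_token_ids)
        if eco.finished:
            # Step 3: consolidate the accumulated chunks on finish.
            state.consolidate()
            # Step 4: attach the consolidated data to the emitted output.
            results[eco.request_id] = {
                "token_ids": list(state.token_ids),
                "multimodal_output": dict(state.mm_accumulated),
            }
    return results
```

In this sketch, clearing `pooling_output` before the text step is what keeps a request that produces both text and multimodal data from being treated as a pure pooling request.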

engine_core_output_type instance-attribute

engine_core_output_type = engine_core_output_type

abort_requests

abort_requests(request_ids, internal: bool) -> list[str]

add_request

add_request(
    request: EngineCoreRequest,
    prompt: str | None,
    parent_req: ParentRequest | None = None,
    request_index: int = 0,
    queue: RequestOutputCollector | None = None,
) -> None

Add a new request to be processed.

Creates an OmniRequestState for the request and registers it for output processing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| request | `EngineCoreRequest` | Engine core request to add | required |
| prompt | `str \| None` | Optional prompt string for the request | required |
| parent_req | `ParentRequest \| None` | Optional parent request for parallel sampling | `None` |
| request_index | `int` | Index of the request in the batch | `0` |
| queue | `RequestOutputCollector \| None` | Optional queue for collecting outputs | `None` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the request ID is already registered |
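The registration contract described above (one state per request ID, `ValueError` on a duplicate) can be illustrated with a small sketch. `RequestRegistry` and its internal dictionary are hypothetical and only model the duplicate-ID check; the real method takes an `EngineCoreRequest` and builds an `OmniRequestState`.

```python
class RequestRegistry:
    """Hypothetical registry that models only the duplicate-ID check."""

    def __init__(self):
        self._states = {}

    def add_request(self, request_id, prompt=None):
        # Registering the same ID twice is rejected, as documented above.
        if request_id in self._states:
            raise ValueError(f"Request ID {request_id!r} is already registered")
        self._states[request_id] = {"prompt": prompt, "mm_accumulated": {}}

    def remove_request(self, request_id):
        # Dropping the entry makes the ID available for registration again.
        self._states.pop(request_id, None)

    def __contains__(self, request_id):
        return request_id in self._states
```

A caller that may retry a submission would either catch the `ValueError` or remove the stale entry before registering the same ID again.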

pop_native_text_metrics

pop_native_text_metrics(request_id: str) -> dict[str, Any]

process_outputs

process_outputs(
    engine_core_outputs: list[EngineCoreOutput],
    engine_core_timestamp: float | None = None,
    iteration_stats: IterationStats | None = None,
) -> OutputProcessorOutput

remove_request

remove_request(request_id: str) -> None

Roll back a single previously registered request if it was never submitted.

OmniRequestState

Bases: RequestState

Request state for omni models, tracking multimodal outputs.

Extends the base RequestState with support for accumulating multimodal tensor outputs (e.g., images, audio, latents) that are produced incrementally during generation.

mm_accumulated instance-attribute

mm_accumulated: dict[str, Any] = {}

native_text_stats instance-attribute

native_text_stats = RequestStateStats(
    arrival_time=float(arrival_time or 0.0)
)

add_multimodal_tensor

add_multimodal_tensor(
    payload: Any | None, mm_type: str | None
) -> None

apply_streaming_update

apply_streaming_update(update) -> None

make_request_output

make_request_output(
    new_token_ids: list[int],
    pooling_output: Tensor | None,
    finish_reason: FinishReason | None,
    stop_reason: int | str | None,
    kv_transfer_params: dict[str, Any] | None = None,
) -> OmniRequestOutput | PoolingRequestOutput | None

Create a request output from generation results.

Creates an OmniRequestOutput or PoolingRequestOutput from the generated tokens and accumulated multimodal outputs. Attaches multimodal tensors to the completion output if available.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new_token_ids | `list[int]` | List of newly generated token IDs | required |
| pooling_output | `Tensor \| None` | Optional pooling output tensor | required |
| finish_reason | `FinishReason \| None` | Optional finish reason indicating why generation stopped | required |
| stop_reason | `int \| str \| None` | Optional stop reason (token ID or stop string) | required |
| kv_transfer_params | `dict[str, Any] \| None` | Optional KV cache transfer parameters | `None` |

Returns:

| Type | Description |
|---|---|
| `OmniRequestOutput \| PoolingRequestOutput \| None` | OmniRequestOutput or PoolingRequestOutput if output should be emitted (based on finish status and output kind), None otherwise |
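The return contract above (an output when one should be emitted, `None` otherwise) might look like the following sketch. The exact emission rule is not stated here, so the rule below is an assumption: an output is suppressed only when the request is unfinished and the caller asked for final results only. `OutputKind`, `should_emit`, and `make_output` are hypothetical names, and a plain dictionary stands in for the output object.

```python
from enum import Enum
from typing import Optional


class OutputKind(Enum):
    """Hypothetical output-kind setting used only for this illustration."""
    CUMULATIVE = 0
    DELTA = 1
    FINAL_ONLY = 2


def should_emit(finish_reason: Optional[str], kind: OutputKind) -> bool:
    # Assumed rule: finished requests always emit; unfinished ones emit
    # unless only the final output was requested.
    finished = finish_reason is not None
    return finished or kind is not OutputKind.FINAL_ONLY


def make_output(new_token_ids, finish_reason, kind, mm_accumulated):
    if not should_emit(finish_reason, kind):
        return None
    out = {
        "token_ids": list(new_token_ids),
        "finished": finish_reason is not None,
    }
    # Attach multimodal data only when something was accumulated.
    if mm_accumulated:
        out["multimodal_output"] = dict(mm_accumulated)
    return out
```

Callers that stream results should therefore be prepared for `None` and skip the iteration instead of treating it as an error.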