vllm_omni.engine.output_processor
MultimodalOutputProcessor
Bases: OutputProcessor
Handles multimodal output processing.
Captures multimodal outputs from OmniEngineCoreOutput and accumulates them as MultimodalPayload in OmniRequestState, before delegating to the base vLLM OutputProcessor for text handling.
The data flow is:
1. For each EngineCoreOutput that carries multimodal_output, capture it via OmniRequestState.add_multimodal_tensor().
2. The base vLLM OutputProcessor handles text detokenization.
3. When the request finishes, _consolidate_multimodal_tensors() concatenates the accumulated tensors using strategy-based dispatch.
4. _new_completion_output() returns a MultimodalCompletionOutput.
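Step 3 (strategy-based consolidation) can be illustrated with a minimal sketch. It assumes that each payload key maps to a consolidation strategy. The key names ("audio", "latent"), the two strategies, and the helper names are hypothetical, not the actual vllm_omni implementation, and plain Python lists stand in for torch tensors so the example runs without PyTorch.

```python
from typing import Callable

# Hypothetical stand-ins: plain lists play the role of tensors,
# and list concatenation plays the role of torch.cat.
Chunk = list[float]


def concat_strategy(chunks: list[Chunk]) -> Chunk:
    # Join all chunks in arrival order (analogous to torch.cat along dim 0).
    out: Chunk = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def last_strategy(chunks: list[Chunk]) -> Chunk:
    # Keep only the most recent chunk, e.g. a value overwritten every step.
    return chunks[-1]


# Hypothetical mapping from payload key to consolidation strategy.
STRATEGIES: dict[str, Callable[[list[Chunk]], Chunk]] = {
    "audio": concat_strategy,
    "latent": last_strategy,
}


def consolidate(accumulated: dict[str, list[Chunk]]) -> dict[str, Chunk]:
    # Dispatch each key to its strategy; unknown keys default to concatenation.
    return {
        key: STRATEGIES.get(key, concat_strategy)(chunks)
        for key, chunks in accumulated.items()
        if chunks
    }


accumulated = {"audio": [[0.1, 0.2], [0.3]], "latent": [[1.0], [2.0]]}
print(consolidate(accumulated))
```

A lookup table keeps the per-modality rules in one place, so supporting a new output type means registering one function rather than adding another branch.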
output_modality instance-attribute
output_modality = OutputModality.from_string(
engine_core_output_type
)
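The real OutputModality enum is not shown on this page. The following minimal sketch shows what a from_string constructor of this kind typically does. The class name ToyOutputModality, its members, and the case-insensitive matching are all assumptions made for illustration.

```python
from enum import Enum


class ToyOutputModality(Enum):
    # Hypothetical members; the real vllm_omni enum may define others.
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    LATENT = "latent"

    @classmethod
    def from_string(cls, value: str) -> "ToyOutputModality":
        # Assumed behavior: case-insensitive match on the member value.
        normalized = value.strip().lower()
        for member in cls:
            if member.value == normalized:
                return member
        raise ValueError(f"Unknown output modality: {value!r}")


modality = ToyOutputModality.from_string("Audio")
```

Parsing the engine_core_output_type string once in the constructor lets the rest of the processor branch on an enum instead of repeating string comparisons.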
add_request
add_request(
request: EngineCoreRequest,
prompt: str | None,
parent_req: ParentRequest | None = None,
request_index: int = 0,
queue: RequestOutputCollector | None = None,
) -> None
Add a new request to be processed.
Creates an OmniRequestState for the request and registers it for output processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| request | EngineCoreRequest | Engine core request to add | required |
| prompt | str \| None | Optional prompt string for the request | required |
| parent_req | ParentRequest \| None | Optional parent request for parallel sampling | None |
| request_index | int | Index of the request in the batch | 0 |
| queue | RequestOutputCollector \| None | Optional queue for collecting outputs | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If the request ID is already registered |
OmniRequestState
Bases: RequestState
Request state for omni models, tracking multimodal outputs.
Extends the base RequestState with support for accumulating multimodal tensor outputs (e.g., images, audio, latents) that are produced incrementally during generation.
native_text_stats instance-attribute
native_text_stats = RequestStateStats(
arrival_time=float(arrival_time or 0.0)
)
add_multimodal_tensor
Accumulate a multimodal tensor payload into the request state.
Normalizes each incoming payload (a dict or a raw tensor) into a MultimodalPayload and merges it with any previously accumulated data. Chunks are appended to per-key lists and concatenated only once at consolidation time, which avoids the O(n²) copying cost of calling torch.cat on every step.
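Concatenating on every step copies all previously accumulated data each time, so n chunks cost O(n²) element copies in total. Appending to a list and concatenating once keeps the total cost linear. The following minimal sketch illustrates both the normalization and the deferred concatenation. The class ToyPayloadAccumulator, the default key "output" for raw payloads, and the use of plain lists in place of torch tensors are all assumptions.

```python
from typing import Any


class ToyPayloadAccumulator:
    """Hypothetical sketch; plain lists stand in for torch tensors."""

    DEFAULT_KEY = "output"  # assumed key for raw (non-dict) payloads

    def __init__(self) -> None:
        # Per-key list of chunks, concatenated only at consolidation time.
        self.chunks: dict[str, list[list[float]]] = {}

    def add_multimodal_tensor(self, payload: Any) -> None:
        # Normalize: a raw chunk becomes a single-entry dict.
        if not isinstance(payload, dict):
            payload = {self.DEFAULT_KEY: payload}
        # Append instead of concatenating now: O(1) work per step.
        for key, chunk in payload.items():
            self.chunks.setdefault(key, []).append(chunk)

    def consolidate(self) -> dict[str, list[float]]:
        # One concatenation per key at the end: O(total size) overall.
        return {
            key: [x for chunk in parts for x in chunk]
            for key, parts in self.chunks.items()
        }


acc = ToyPayloadAccumulator()
acc.add_multimodal_tensor([1.0, 2.0])
acc.add_multimodal_tensor({"output": [3.0], "audio": [0.5]})
result = acc.consolidate()
```

With real tensors, the final step would be a single torch.cat over each list.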
make_request_output
make_request_output(
new_token_ids: list[int],
pooling_output: Tensor | None,
finish_reason: FinishReason | None,
stop_reason: int | str | None,
kv_transfer_params: dict[str, Any] | None = None,
) -> OmniRequestOutput | PoolingRequestOutput | None
Create a request output from generation results.
Creates a RequestOutput or PoolingRequestOutput from the generated tokens and accumulated multimodal outputs. Attaches multimodal tensors to the completion output if available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_token_ids | list[int] | List of newly generated token IDs | required |
| pooling_output | Tensor \| None | Optional pooling output tensor | required |
| finish_reason | FinishReason \| None | Optional finish reason indicating why generation stopped | required |
| stop_reason | int \| str \| None | Optional stop reason (token ID or stop string) | required |
| kv_transfer_params | dict[str, Any] \| None | Optional KV cache transfer parameters | None |
Returns:
| Type | Description |
|---|---|
| OmniRequestOutput \| PoolingRequestOutput \| None | OmniRequestOutput or PoolingRequestOutput if the output should be emitted (based on finish status and output kind); None otherwise |