vllm_omni.outputs ¶
OmniConnectorOutput dataclass ¶
Communication results from Model Runner to Scheduler.
Carries transfer readiness signals so the Scheduler can make scheduling decisions without ever calling connector.put()/get() directly.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk_ready_req_ids | set[str] | Request IDs with newly arrived chunks this cycle. |
| chunk_finished_req_ids | set[str] | Request IDs whose final chunk has arrived. |
| request_metadata | dict[str, dict[str, Any]] | Lightweight scheduling metadata keyed by request ID (e.g. next_stage_prompt_len, code_predictor_codes, left_context_size). Full payloads are owned by the Model Runner's local cache. |
| kv_sent_req_ids | list[str] | Request IDs whose KV cache was successfully sent. |
| stage_recv_req_ids | set[str] | Request IDs that received batch stage inputs. |
| has_pending_kv_work | bool | True if the mixin has pending, active, or completed KV transfers that the scheduler should account for. |
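As a rough illustration of how a scheduler might act on these signals without touching the connector, the sketch below uses a stand-in dataclass that mirrors the documented fields. `ConnectorOutputSketch` and `plan_cycle` are hypothetical names, and the decision rules in `plan_cycle` are assumptions made for illustration, not the library's scheduling logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConnectorOutputSketch:
    """Hypothetical stand-in mirroring the documented OmniConnectorOutput fields."""

    chunk_ready_req_ids: set[str] = field(default_factory=set)
    chunk_finished_req_ids: set[str] = field(default_factory=set)
    request_metadata: dict[str, dict[str, Any]] = field(default_factory=dict)
    kv_sent_req_ids: list[str] = field(default_factory=list)
    stage_recv_req_ids: set[str] = field(default_factory=set)
    has_pending_kv_work: bool = False


def plan_cycle(out: ConnectorOutputSketch, waiting: set[str]) -> dict[str, Any]:
    """Hypothetical scheduler step that reads only the readiness signals."""
    # Assumption: a waiting request becomes runnable once a chunk or a
    # batch stage input has arrived for it.
    runnable = waiting & (out.chunk_ready_req_ids | out.stage_recv_req_ids)
    return {
        "runnable": sorted(runnable),
        # These requests have received their final chunk.
        "input_complete": sorted(out.chunk_finished_req_ids),
        # KV cache for these requests was sent successfully.
        "kv_sent": list(out.kv_sent_req_ids),
        # Assumption: keep stepping while KV transfers are outstanding.
        "keep_stepping": bool(runnable) or out.has_pending_kv_work,
    }


signals = ConnectorOutputSketch(
    chunk_ready_req_ids={"req-1"},
    chunk_finished_req_ids={"req-1"},
    request_metadata={"req-1": {"next_stage_prompt_len": 128}},
    has_pending_kv_work=True,
)
plan = plan_cycle(signals, waiting={"req-1", "req-2"})
```

The point of the sketch is that `plan_cycle` never calls `put()` or `get()`: everything it needs arrives as plain sets, lists, and flags.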
OmniModelRunnerOutput ¶
Bases: ModelRunnerOutput
Model runner output for omni models.
Extends the base ModelRunnerOutput with support for multimodal outputs that may be produced by non-autoregressive stages.
Attributes:
| Name | Type | Description |
|---|---|---|
| multimodal_outputs | dict[str, Tensor] \| None | Optional dictionary mapping modality names to output tensors (e.g., {"image": tensor, "audio": tensor}) |
kv_extracted_req_ids class-attribute instance-attribute ¶
multimodal_outputs class-attribute instance-attribute ¶
omni_connector_output class-attribute instance-attribute ¶
omni_connector_output: OmniConnectorOutput | None = None
OmniRequestOutput dataclass ¶
Unified request output for both pipeline stages and diffusion models.
This class handles outputs from:
1. Multi-stage LLM pipelines (with stage_id, final_output_type, request_output)
2. Diffusion models (with images, prompt, metrics)
Attributes:
| Name | Type | Description |
|---|---|---|
| request_id | str | Unique identifier for this request |
| finished | bool | Whether generation is complete |
| stage_id | int \| None | Identifier of the stage that produced this output (pipeline mode) |
| final_output_type | str | Type of output ("text", "image", "audio", "latents") |
| request_output | RequestOutput \| None | The underlying RequestOutput from the stage (pipeline mode) |
| images | list[Image] | List of generated PIL images (diffusion mode) |
| prompt | OmniPromptType \| None | The prompt used for generation (diffusion mode) |
| latents | Tensor \| None | Optional tensor of latent representations (diffusion mode) |
| metrics | dict[str, Any] | Optional dictionary of generation metrics |
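To make the two modes concrete, here is a reduced sketch of how one class can carry either a pipeline result or a diffusion result. `RequestOutputSketch` is a hypothetical stand-in, not the real class: it keeps only a few of the documented fields, and it assumes that `from_pipeline` copies `request_id` and `finished` from the wrapped output and that a diffusion output is created already finished.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class RequestOutputSketch:
    """Hypothetical, reduced stand-in for OmniRequestOutput."""

    request_id: str
    finished: bool
    final_output_type: str
    stage_id: int | None = None          # set in pipeline mode
    request_output: Any = None           # set in pipeline mode
    images: list[Any] = field(default_factory=list)  # set in diffusion mode
    prompt: Any = None                   # set in diffusion mode
    metrics: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_pipeline(
        cls, stage_id: int, final_output_type: str, request_output: Any
    ) -> RequestOutputSketch:
        # Assumption: identity and completion state come from the wrapped output.
        return cls(
            request_id=request_output.request_id,
            finished=request_output.finished,
            final_output_type=final_output_type,
            stage_id=stage_id,
            request_output=request_output,
        )

    @classmethod
    def from_diffusion(
        cls,
        request_id: str,
        images: list[Any],
        prompt: Any = None,
        metrics: dict[str, Any] | None = None,
        final_output_type: str = "image",
    ) -> RequestOutputSketch:
        # Assumption: a diffusion result is complete when it is constructed.
        return cls(
            request_id=request_id,
            finished=True,
            final_output_type=final_output_type,
            images=images,
            prompt=prompt,
            metrics=metrics or {},
        )


# A SimpleNamespace stands in for a stage's RequestOutput.
stage_result = SimpleNamespace(request_id="req-1", finished=True)
text_out = RequestOutputSketch.from_pipeline(0, "text", stage_result)
image_out = RequestOutputSketch.from_diffusion(
    "req-2", images=["<image placeholder>"], prompt="a red fox"
)
```

In this sketch a caller can tell the modes apart by checking whether `request_output` is set; whether the real class exposes a dedicated check for that is not stated in this reference.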
custom_output property writable ¶
Return custom output data from diffusion pipelines.
For diffusion outputs, returns the local _custom_output field. For pipeline outputs with an inner OmniRequestOutput, forwards the custom_output from the inner request output.
encoder_prompt_token_ids property ¶
Return encoder prompt token IDs from the underlying request output.
kv_transfer_params property ¶
kv_transfer_params: Any
Return KV transfer params from the underlying request output.
multimodal_output property ¶
Return multimodal output from the underlying request output or local field.
For pipeline outputs, this checks completion outputs first, then request_output. For diffusion outputs, this returns the local _multimodal_output field.
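The lookup order described above can be sketched as a small helper. This is an illustration only: `resolve_multimodal_output` is a hypothetical function, the `multimodal_output` attribute name on completion outputs is an assumption, and falling back to the local field when a pipeline output carries nothing is also an assumption.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def resolve_multimodal_output(
    request_output: Any, local: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper following the documented lookup order."""
    if request_output is None:
        # Diffusion mode: only the local field is available.
        return local
    # Pipeline mode, step 1: check each completion output.
    for completion in getattr(request_output, "outputs", None) or []:
        found = getattr(completion, "multimodal_output", None)
        if found:
            return found
    # Pipeline mode, step 2: check the request output itself.
    found = getattr(request_output, "multimodal_output", None)
    if found:
        return found
    # Assumption: fall back to the local field if nothing was found.
    return local


completion = SimpleNamespace(multimodal_output={"audio": "from-completion"})
with_completion = SimpleNamespace(
    outputs=[completion], multimodal_output={"audio": "from-request"}
)
without_completion = SimpleNamespace(
    outputs=[], multimodal_output={"audio": "from-request"}
)

first = resolve_multimodal_output(with_completion, local={})
second = resolve_multimodal_output(without_completion, local={})
third = resolve_multimodal_output(None, local={"image": "from-local"})
```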
num_cached_tokens property ¶
num_cached_tokens: int | None
Return number of cached tokens from the underlying request output.
outputs property ¶
Return outputs from the underlying request output.
This property is required for compatibility with vLLM's streaming and non-streaming chat completion generators.
prompt_logprobs property ¶
prompt_logprobs: Any
Return prompt logprobs from the underlying request output.
prompt_token_ids property ¶
Return prompt token IDs from the underlying request output.
This property is required for compatibility with vLLM's streaming chat completion generator which checks res.prompt_token_ids.
stage_durations class-attribute instance-attribute ¶
trajectory_log_probs class-attribute instance-attribute ¶
trajectory_timesteps class-attribute instance-attribute ¶
from_diffusion classmethod ¶
from_diffusion(
request_id: str,
images: list[Image],
prompt: OmniPromptType | None = None,
metrics: dict[str, Any] | None = None,
latents: Tensor | None = None,
trajectory_latents: Tensor | None = None,
trajectory_timesteps: Tensor | None = None,
trajectory_log_probs: Tensor | None = None,
trajectory_decoded: list | None = None,
multimodal_output: dict[str, Any] | None = None,
custom_output: dict[str, Any] | None = None,
final_output_type: str = "image",
stage_durations: dict[str, float] | None = None,
peak_memory_mb: float = 0.0,
) -> OmniRequestOutput
Create output from diffusion model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| request_id | str | Request identifier | required |
| images | list[Image] | Generated images | required |
| prompt | OmniPromptType \| None | The prompt used | None |
| metrics | dict[str, Any] \| None | Generation metrics | None |
| latents | Tensor \| None | Optional latent tensors | None |
| trajectory_latents | Tensor \| None | Optional stacked trajectory latent tensors | None |
| trajectory_timesteps | Tensor \| None | Optional stacked trajectory timestep tensors | None |
| trajectory_log_probs | Tensor \| None | Optional stacked trajectory log-probability tensors | None |
| trajectory_decoded | list \| None | Optional list of decoded trajectory images | None |
| multimodal_output | dict[str, Any] \| None | Optional multimodal output dict | None |
| custom_output | dict[str, Any] \| None | Optional custom output dict (e.g. prompt embeds) | None |
| final_output_type | str | Type of output | "image" |
| stage_durations | dict[str, float] \| None | Optional dict of stage durations (execution time of each stage) | None |
| peak_memory_mb | float | Peak memory usage in MB | 0.0 |
Returns:
| Type | Description |
|---|---|
| OmniRequestOutput | OmniRequestOutput configured for diffusion mode |
from_error classmethod ¶
from_error(
request_id: str, error_message: str
) -> OmniRequestOutput
Create a terminal error output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| request_id | str | Request identifier | required |
| error_message | str | Human-readable error description | required |

Returns:
| Type | Description |
|---|---|
| OmniRequestOutput | OmniRequestOutput with |
from_pipeline classmethod ¶
from_pipeline(
stage_id: int,
final_output_type: str,
request_output: RequestOutput,
) -> OmniRequestOutput
Create output from pipeline stage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stage_id | int | Stage identifier | required |
| final_output_type | str | Type of output | required |
| request_output | RequestOutput | The stage's output | required |

Returns:
| Type | Description |
|---|---|
| OmniRequestOutput | OmniRequestOutput configured for pipeline mode |
unwrap ¶
unwrap() -> OmniRequestOutput
Unwrap nested OmniRequestOutput to get the innermost result.
This helper handles the common pattern where pipeline outputs may wrap other OmniRequestOutput instances. It recursively unwraps until it reaches the final output with actual content (images, text, etc.).
Returns:
| Type | Description |
|---|---|
| OmniRequestOutput | The innermost OmniRequestOutput containing the actual generation results. |
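The recursive unwrapping can be pictured with a minimal stand-in class. `NestedOutputSketch` is hypothetical, and the sketch assumes the nested instance is held in the `request_output` field; the real class may track nesting differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class NestedOutputSketch:
    """Hypothetical stand-in for an output that may wrap another output."""

    request_id: str
    request_output: Any = None
    images: list[Any] = field(default_factory=list)

    def unwrap(self) -> NestedOutputSketch:
        # Recurse while the wrapped object is itself a wrapper.
        if isinstance(self.request_output, NestedOutputSketch):
            return self.request_output.unwrap()
        # Otherwise this instance holds the actual content.
        return self


inner = NestedOutputSketch("req-1", images=["<image placeholder>"])
middle = NestedOutputSketch("req-1", request_output=inner)
outer = NestedOutputSketch("req-1", request_output=middle)
result = outer.unwrap()
```

Calling `unwrap()` on an output that wraps nothing returns that same output, so callers can apply it unconditionally.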
unwrap_result staticmethod ¶
unwrap_result(result: Any) -> OmniRequestOutput
Unwrap result from omni.generate() to get the final OmniRequestOutput.
This static helper handles the full unwrapping pattern including:
1. Extracting from list if needed
2. Type validation
3. Recursive unwrapping of nested pipeline outputs

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | Any | The result from omni.generate() - may be a list or OmniRequestOutput | required |

Returns:
| Type | Description |
|---|---|
| OmniRequestOutput | The innermost OmniRequestOutput with actual content |

Raises:
| Type | Description |
|---|---|
| ValueError | If result is not an OmniRequestOutput or list containing one |
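The three steps above can be sketched as follows, again with hypothetical stand-ins (`OutputSketch`, `unwrap_result_sketch`). Two assumptions are baked in: when given a list, the first element is used, and nesting is held in `request_output`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class OutputSketch:
    """Hypothetical stand-in for OmniRequestOutput."""

    request_id: str
    request_output: Any = None


def unwrap_result_sketch(result: Any) -> OutputSketch:
    # Step 1: extract from a list if needed (assumption: take the first item).
    if isinstance(result, list):
        if not result:
            raise ValueError("result list is empty")
        result = result[0]
    # Step 2: validate the type.
    if not isinstance(result, OutputSketch):
        raise ValueError(f"unexpected result type: {type(result).__name__}")
    # Step 3: unwrap nested outputs until the innermost one is reached.
    while isinstance(result.request_output, OutputSketch):
        result = result.request_output
    return result


inner = OutputSketch("req-1")
outer = OutputSketch("req-1", request_output=inner)

from_list = unwrap_result_sketch([outer])
from_single = unwrap_result_sketch(outer)

try:
    unwrap_result_sketch("not an output")
    rejected = False
except ValueError:
    rejected = True
```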