vllm_omni.outputs

OmniConnectorOutput dataclass

Results communicated from the Model Runner to the Scheduler.

Carries transfer readiness signals so the Scheduler can make scheduling decisions without ever calling connector.put()/get() directly.

Attributes:

Name Type Description
chunk_ready_req_ids set[str]

Request IDs with newly arrived chunks this cycle.

chunk_finished_req_ids set[str]

Request IDs whose final chunk has arrived.

request_metadata dict[str, dict[str, Any]]

Lightweight scheduling metadata keyed by request ID (e.g. next_stage_prompt_len, code_predictor_codes, left_context_size). Full payloads are owned by the Model Runner's local cache.

kv_sent_req_ids list[str]

Request IDs whose KV cache was successfully sent.

stage_recv_req_ids set[str]

Request IDs that received batch stage inputs.

has_pending_kv_work bool

True if the mixin has pending, active, or completed KV transfers that the scheduler should account for.

chunk_finished_req_ids class-attribute instance-attribute

chunk_finished_req_ids: set[str] = field(
    default_factory=set
)

chunk_ready_req_ids class-attribute instance-attribute

chunk_ready_req_ids: set[str] = field(default_factory=set)

has_pending_kv_work class-attribute instance-attribute

has_pending_kv_work: bool = False

kv_sent_req_ids class-attribute instance-attribute

kv_sent_req_ids: list[str] = field(default_factory=list)

request_metadata class-attribute instance-attribute

request_metadata: dict[str, dict[str, Any]] = field(
    default_factory=dict
)

stage_recv_req_ids class-attribute instance-attribute

stage_recv_req_ids: set[str] = field(default_factory=set)
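
As a rough illustration of how a scheduler-side consumer might read these signals, the sketch below uses a stand-in dataclass that mirrors the documented fields and defaults. `ConnectorOutputStandIn` and `summarize_signals` are hypothetical names, not part of vllm_omni, and treating "ready but not finished" as "still streaming" is an assumption about how the two sets relate.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConnectorOutputStandIn:
    """Stand-in mirroring the documented fields; not the real class."""

    chunk_ready_req_ids: set[str] = field(default_factory=set)
    chunk_finished_req_ids: set[str] = field(default_factory=set)
    request_metadata: dict[str, dict[str, Any]] = field(default_factory=dict)
    kv_sent_req_ids: list[str] = field(default_factory=list)
    stage_recv_req_ids: set[str] = field(default_factory=set)
    has_pending_kv_work: bool = False


def summarize_signals(out: ConnectorOutputStandIn) -> dict[str, Any]:
    """Hypothetical helper: derive scheduler-facing facts from the signals."""
    # Assumption: a request with a new chunk that is not yet in the
    # finished set is still expecting more chunks.
    still_streaming = out.chunk_ready_req_ids - out.chunk_finished_req_ids
    return {
        "still_streaming": sorted(still_streaming),
        "finished": sorted(out.chunk_finished_req_ids),
        "kv_sent": list(out.kv_sent_req_ids),
        "keep_polling": out.has_pending_kv_work,
    }


out = ConnectorOutputStandIn(
    chunk_ready_req_ids={"req-a", "req-b"},
    chunk_finished_req_ids={"req-b"},
    request_metadata={"req-a": {"next_stage_prompt_len": 128}},
    kv_sent_req_ids=["req-c"],
)
summary = summarize_signals(out)
```

The consumer only reads plain sets, lists, and flags, which matches the stated goal of letting the Scheduler decide without calling connector.put()/get() itself.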

OmniModelRunnerOutput

Bases: ModelRunnerOutput

Model runner output for omni models.

Extends the base ModelRunnerOutput with support for multimodal outputs that may be produced by non-autoregressive stages.

Attributes:

Name Type Description
multimodal_outputs dict[str, Tensor] | None

Optional dictionary mapping modality names to output tensors (e.g., {"image": tensor, "audio": tensor})

kv_extracted_req_ids class-attribute instance-attribute

kv_extracted_req_ids: list[str] | None = None

multimodal_outputs class-attribute instance-attribute

multimodal_outputs: dict[str, Tensor] | None = None

omni_connector_output class-attribute instance-attribute

omni_connector_output: OmniConnectorOutput | None = None

OmniRequestOutput dataclass

Unified request output for both pipeline stages and diffusion models.

This class handles outputs from:

1. Multi-stage LLM pipelines (with stage_id, final_output_type, request_output)
2. Diffusion models (with images, prompt, metrics)

Attributes:

Name Type Description
request_id str

Unique identifier for this request

finished bool

Whether generation is complete

stage_id int | None

Identifier of the stage that produced this output (pipeline mode)

final_output_type str

Type of output ("text", "image", "audio", "latents")

request_output RequestOutput | None

The underlying RequestOutput from the stage (pipeline mode)

images list[Image]

List of generated PIL images (diffusion mode)

prompt OmniPromptType | None

The prompt used for generation (diffusion mode)

latents Tensor | None

Optional tensor of latent representations (diffusion mode)

metrics dict[str, Any]

Dictionary of generation metrics (empty by default)

custom_output property writable

custom_output: dict[str, Any]

Return custom output data from diffusion pipelines.

For diffusion outputs, returns the local _custom_output field. For pipeline outputs with an inner OmniRequestOutput, forwards the custom_output from the inner request output.

encoder_prompt_token_ids property

encoder_prompt_token_ids: list[int] | None

Return encoder prompt token IDs from the underlying request output.

error class-attribute instance-attribute

error: str | None = None

final_output_type class-attribute instance-attribute

final_output_type: str = 'text'

finished class-attribute instance-attribute

finished: bool = True

images class-attribute instance-attribute

images: list[Image] = field(default_factory=list)

is_diffusion_output property

is_diffusion_output: bool

Check if this is a diffusion model output.

is_pipeline_output property

is_pipeline_output: bool

Check if this is a pipeline stage output.
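
A caller that receives either kind of output could branch on these two properties. The sketch below is duck-typed and uses `SimpleNamespace` objects as stand-ins, since how the real class computes the flags is not shown here. `describe_output` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any


def describe_output(output: Any) -> str:
    """Hypothetical helper that branches on the documented mode properties."""
    if output.is_diffusion_output:
        return f"diffusion: {output.num_images} image(s)"
    if output.is_pipeline_output:
        return f"pipeline stage {output.stage_id}: {output.final_output_type}"
    return "unknown mode"


# Stand-ins exposing only the attributes the helper reads.
diffusion_like = SimpleNamespace(
    is_diffusion_output=True, is_pipeline_output=False, num_images=2
)
pipeline_like = SimpleNamespace(
    is_diffusion_output=False,
    is_pipeline_output=True,
    stage_id=1,
    final_output_type="audio",
)
```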

kv_transfer_params property

kv_transfer_params: Any

Return KV transfer params from the underlying request output.

latents class-attribute instance-attribute

latents: Tensor | None = None

metrics class-attribute instance-attribute

metrics: dict[str, Any] = field(default_factory=dict)

multimodal_output property

multimodal_output: dict[str, Any]

Return multimodal output from the underlying request output or local field.

For pipeline outputs, this checks completion outputs first, then request_output. For diffusion outputs, this returns the local _multimodal_output field.

num_cached_tokens property

num_cached_tokens: int | None

Return number of cached tokens from the underlying request output.

num_images property

num_images: int

Return the number of generated images.

outputs property

outputs: list[Any]

Return outputs from the underlying request output.

This property is required for compatibility with vLLM's streaming and non-streaming chat completion generators.

peak_memory_mb class-attribute instance-attribute

peak_memory_mb: float = 0.0

prompt class-attribute instance-attribute

prompt: OmniPromptType | None = None

prompt_logprobs property

prompt_logprobs: Any

Return prompt logprobs from the underlying request output.

prompt_token_ids property

prompt_token_ids: list[int] | None

Return prompt token IDs from the underlying request output.

This property is required for compatibility with vLLM's streaming chat completion generator which checks res.prompt_token_ids.
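
Several properties on this class are described as returning a value "from the underlying request output". One plausible shape for such a forwarding property is sketched below. `ForwardingStandIn` is a hypothetical class, and the None fallback is an assumption suggested by the `list[int] | None` return type rather than confirmed behavior.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class ForwardingStandIn:
    """Stand-in showing the forwarding pattern; not the real class."""

    request_output: Any = None

    @property
    def prompt_token_ids(self) -> Optional[list[int]]:
        # Assumption: with no wrapped request output there is nothing
        # to forward, so the property yields None.
        if self.request_output is None:
            return None
        return getattr(self.request_output, "prompt_token_ids", None)


empty = ForwardingStandIn()
wrapped = ForwardingStandIn(SimpleNamespace(prompt_token_ids=[1, 2, 3]))
```

With this shape, code that reads res.prompt_token_ids works the same way whether `res` is a plain request output or a wrapper around one.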

request_id class-attribute instance-attribute

request_id: str = ''

request_output class-attribute instance-attribute

request_output: RequestOutput | None = None

stage_durations class-attribute instance-attribute

stage_durations: dict[str, float] = field(
    default_factory=dict
)

stage_id class-attribute instance-attribute

stage_id: int | None = None

trajectory_decoded class-attribute instance-attribute

trajectory_decoded: list | None = None

trajectory_latents class-attribute instance-attribute

trajectory_latents: Tensor | None = None

trajectory_log_probs class-attribute instance-attribute

trajectory_log_probs: Tensor | None = None

trajectory_timesteps class-attribute instance-attribute

trajectory_timesteps: Tensor | None = None

from_diffusion classmethod

from_diffusion(
    request_id: str,
    images: list[Image],
    prompt: OmniPromptType | None = None,
    metrics: dict[str, Any] | None = None,
    latents: Tensor | None = None,
    trajectory_latents: Tensor | None = None,
    trajectory_timesteps: Tensor | None = None,
    trajectory_log_probs: Tensor | None = None,
    trajectory_decoded: list | None = None,
    multimodal_output: dict[str, Any] | None = None,
    custom_output: dict[str, Any] | None = None,
    final_output_type: str = "image",
    stage_durations: dict[str, float] | None = None,
    peak_memory_mb: float = 0.0,
) -> OmniRequestOutput

Create output from diffusion model.

Parameters:

Name Type Description Default
request_id str

Request identifier

required
images list[Image]

Generated images

required
prompt OmniPromptType | None

The prompt used

None
metrics dict[str, Any] | None

Generation metrics

None
latents Tensor | None

Optional latent tensors

None
trajectory_latents Tensor | None

Optional stacked trajectory latent tensors

None
trajectory_timesteps Tensor | None

Optional stacked trajectory timestep tensors

None
trajectory_log_probs Tensor | None

Optional stacked trajectory log-probability tensors

None
trajectory_decoded list | None

Optional list of decoded trajectory images

None
multimodal_output dict[str, Any] | None

Optional multimodal output dict

None
custom_output dict[str, Any] | None

Optional custom output dict (e.g. prompt embeds)

None
stage_durations dict[str, float] | None

Optional dict of stage durations (execution time of each stage)

None
peak_memory_mb float

Peak memory usage in MB

0.0

Returns:

Type Description
OmniRequestOutput

OmniRequestOutput configured for diffusion mode

from_error classmethod

from_error(
    request_id: str, error_message: str
) -> OmniRequestOutput

Create a terminal error output.

Parameters:

Name Type Description Default
request_id str

Request identifier

required
error_message str

Human-readable error description

required

Returns:

Type Description
OmniRequestOutput

OmniRequestOutput with finished=True and the error field set.
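
Because an error output is terminal (finished=True with the error field set), a consumer can check for it before touching any results. The sketch below is duck-typed. `raise_if_failed` is a hypothetical helper, and raising `RuntimeError` is an illustrative choice, not documented behavior.

```python
from types import SimpleNamespace
from typing import Any


def raise_if_failed(output: Any) -> Any:
    """Hypothetical guard: surface a terminal error as an exception."""
    if output.finished and output.error is not None:
        raise RuntimeError(f"request {output.request_id} failed: {output.error}")
    return output


# Stand-ins exposing only the documented fields the guard reads.
failed = SimpleNamespace(request_id="req-1", finished=True, error="out of memory")
succeeded = SimpleNamespace(request_id="req-2", finished=True, error=None)
```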

from_pipeline classmethod

from_pipeline(
    stage_id: int,
    final_output_type: str,
    request_output: RequestOutput,
) -> OmniRequestOutput

Create output from pipeline stage.

Parameters:

Name Type Description Default
stage_id int

Stage identifier

required
final_output_type str

Type of output

required
request_output RequestOutput

The stage's output

required

Returns:

Type Description
OmniRequestOutput

OmniRequestOutput configured for pipeline mode

to_dict

to_dict() -> dict[str, Any]

Convert to dictionary for JSON serialization.

unwrap

unwrap() -> OmniRequestOutput

Unwrap nested OmniRequestOutput to get the innermost result.

This helper handles the common pattern where pipeline outputs may wrap other OmniRequestOutput instances. It recursively unwraps until it reaches the final output with actual content (images, text, etc.).

Returns:

Type Description
OmniRequestOutput

The innermost OmniRequestOutput containing the actual generation results.

Example
result = omni.generate(...)
output = OmniRequestOutput.unwrap_result(result)
if output.images:
    # Access images directly
    video_frames = output.images

unwrap_result staticmethod

unwrap_result(result: Any) -> OmniRequestOutput

Unwrap result from omni.generate() to get the final OmniRequestOutput.

This static helper handles the full unwrapping pattern, including:

1. Extracting from list if needed
2. Type validation
3. Recursive unwrapping of nested pipeline outputs

Parameters:

Name Type Description Default
result Any

The result from omni.generate(); may be a list or an OmniRequestOutput

required

Returns:

Type Description
OmniRequestOutput

The innermost OmniRequestOutput with actual content

Raises:

Type Description
ValueError

If result is not an OmniRequestOutput or list containing one

Example
result = omni.generate(...)
output = OmniRequestOutput.unwrap_result(result)
# output is guaranteed to be the final OmniRequestOutput
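
The three steps listed for unwrap_result can be sketched with a stand-in class. `OutputStandIn` and `unwrap_result_sketch` are hypothetical names. Taking the first list element, rejecting an empty list, and following the `request_output` field to reach the nested output are all assumptions. The sketch uses a loop in place of recursion.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class OutputStandIn:
    """Stand-in for a possibly nested output; not the real class."""

    request_id: str
    request_output: Any = None  # may hold another OutputStandIn


def unwrap_result_sketch(result: Any) -> OutputStandIn:
    # Step 1: extract from a list if needed (assumption: first element).
    if isinstance(result, list):
        if not result:
            raise ValueError("empty result list")
        result = result[0]
    # Step 2: type validation.
    if not isinstance(result, OutputStandIn):
        raise ValueError(f"unexpected result type: {type(result).__name__}")
    # Step 3: follow nested outputs until the innermost one is reached.
    while isinstance(result.request_output, OutputStandIn):
        result = result.request_output
    return result


inner = OutputStandIn("inner")
outer = OutputStandIn("outer", request_output=inner)
final = unwrap_result_sketch([outer])
```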