vllm_omni.engine

Engine components for vLLM-Omni.

Modules:

Name                           Description
arg_utils
async_omni_engine              Async Omni Engine for vLLM-Omni multi-stage runtime.
cfg_companion_tracker          CFG companion request tracker for the Omni orchestrator.
messages
mm_outputs                     Multimodal output data structures for vLLM-Omni.
omni_core_engine_proc_manager  Process manager for omni stage engine subprocesses.
orchestrator                   Orchestrator for vLLM-Omni multi-stage runtime.
output_modality                Output modality types for vLLM-Omni.
output_processor
serialization                  Shared serialization helpers for omni engine request payloads.
stage_client                   Shared stage-client typing for vLLM-Omni runtime surfaces.
stage_engine_core_client       Stage Engine Core Client for vLLM-Omni multi-stage runtime.
stage_engine_core_proc         Stage Core Process for vLLM-Omni V1 architecture.
stage_engine_startup           Helpers for launching and handshaking omni engine cores.
stage_init_utils               Stage initialization helpers for vLLM-Omni multi-stage runtime.
stage_pool                     Unified stage-local runtime abstraction for vLLM-Omni.

AdditionalInformationEntry

Bases: Struct

One entry of additional_information.

Three forms are supported:
  • tensor: data/shape/dtype
  • list: a Python list (msgspec-serializable)
  • scalar: a Python scalar (msgspec-serializable)

Exactly one of (tensor_data, list_data, scalar_data) should be non-None.

list_data class-attribute instance-attribute

list_data: list[Any] | None = None

scalar_data class-attribute instance-attribute

scalar_data: Any | None = None

tensor_data class-attribute instance-attribute

tensor_data: bytes | None = None

tensor_dtype class-attribute instance-attribute

tensor_dtype: str | None = None

tensor_shape class-attribute instance-attribute

tensor_shape: list[int] | None = None
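The "exactly one of" rule above can be checked mechanically. The following is a minimal sketch using a plain dataclass as a stand-in for the real msgspec Struct; `EntrySketch` and `check_entry` are hypothetical names for illustration and are not part of vllm_omni. It also assumes that a tensor entry needs its shape and dtype alongside the raw bytes, which the field list suggests but the documentation does not state.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in mirroring the fields of AdditionalInformationEntry."""
    tensor_data: Optional[bytes] = None
    tensor_shape: Optional[list] = None
    tensor_dtype: Optional[str] = None
    list_data: Optional[list] = None
    scalar_data: Any = None


def check_entry(entry: EntrySketch) -> str:
    """Return which form is populated, or raise if the invariant is violated."""
    forms = []
    if entry.tensor_data is not None:
        forms.append("tensor")
    if entry.list_data is not None:
        forms.append("list")
    if entry.scalar_data is not None:
        forms.append("scalar")
    if len(forms) != 1:
        raise ValueError(f"expected exactly one populated form, got {forms}")
    # Assumption: tensor bytes are only interpretable with shape and dtype.
    if forms[0] == "tensor" and (
        entry.tensor_shape is None or entry.tensor_dtype is None
    ):
        raise ValueError("tensor entry needs tensor_shape and tensor_dtype")
    return forms[0]
```

Note that a scalar whose value is `None` cannot be told apart from an unset field in this sketch, since `None` is the default for all three.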

AdditionalInformationPayload

Bases: Struct

Serialized dictionary payload for additional_information.

Keys are strings; values are encoded as AdditionalInformationEntry.

entries instance-attribute

entries: dict[str, AdditionalInformationEntry]
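To illustrate how a dictionary might map onto this payload, here is a rough sketch built from plain dataclasses rather than the real msgspec Structs. `EntrySketch`, `PayloadSketch`, `RawTensor`, `encode_info`, and `decode_info` are all hypothetical names; the actual helpers in vllm_omni may be named and structured differently. `RawTensor` stands in for a tensor that has already been reduced to bytes, shape, and dtype name.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in for AdditionalInformationEntry."""
    tensor_data: Optional[bytes] = None
    tensor_shape: Optional[list] = None
    tensor_dtype: Optional[str] = None
    list_data: Optional[list] = None
    scalar_data: Any = None


@dataclass
class PayloadSketch:
    """Hypothetical stand-in for AdditionalInformationPayload."""
    entries: dict


@dataclass
class RawTensor:
    """Hypothetical tensor stand-in: raw bytes plus shape and dtype name."""
    data: bytes
    shape: list
    dtype: str


def encode_info(info: dict) -> PayloadSketch:
    """Encode each value as exactly one of the tensor, list, or scalar forms."""
    entries = {}
    for key, value in info.items():
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        if isinstance(value, RawTensor):
            entries[key] = EntrySketch(
                tensor_data=value.data,
                tensor_shape=list(value.shape),
                tensor_dtype=value.dtype,
            )
        elif isinstance(value, list):
            entries[key] = EntrySketch(list_data=value)
        else:
            entries[key] = EntrySketch(scalar_data=value)
    return PayloadSketch(entries=entries)


def decode_info(payload: PayloadSketch) -> dict:
    """Invert encode_info by inspecting which field of each entry is set."""
    out = {}
    for key, entry in payload.entries.items():
        if entry.tensor_data is not None:
            out[key] = RawTensor(
                entry.tensor_data, list(entry.tensor_shape), entry.tensor_dtype
            )
        elif entry.list_data is not None:
            out[key] = entry.list_data
        else:
            out[key] = entry.scalar_data
    return out
```

A round trip such as `decode_info(encode_info({"codes": [1, 2, 3], "speaker_id": 7}))` returns an equal dictionary; the key names here are made up for the example.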

OmniEngineCoreOutput

Bases: EngineCoreOutput

is_segment_finished class-attribute instance-attribute

is_segment_finished: bool | None = False

new_prompt_len_snapshot class-attribute instance-attribute

new_prompt_len_snapshot: int | None = None

pooling_output class-attribute instance-attribute

pooling_output: dict[str, Tensor] | None = None

OmniEngineCoreOutputs

Bases: EngineCoreOutputs

outputs class-attribute instance-attribute

outputs: list[OmniEngineCoreOutput] = []

OmniEngineCoreRequest

Bases: EngineCoreRequest

Engine core request for omni models with embeddings support.

Extends the base EngineCoreRequest with support for additional information payloads, enabling direct transfer of pre-computed data between pipeline stages.

Note: prompt_embeds is inherited from EngineCoreRequest (torch.Tensor | None). PromptEmbedsPayload should be decoded to torch.Tensor before constructing this request.

Attributes:

Name                    Type                                 Description
additional_information  AdditionalInformationPayload | None  Optional serialized additional-information dictionary containing tensors, lists, or scalars to pass along with the request.

additional_information class-attribute instance-attribute

additional_information: (
    AdditionalInformationPayload | None
) = None

from_request classmethod

from_request(
    request: EngineCoreRequest,
    *,
    prompt_embeds: Tensor | None = None,
    additional_information: AdditionalInformationPayload
    | None = None,
) -> OmniEngineCoreRequest

Clone an EngineCoreRequest into an OmniEngineCoreRequest with optional payload overrides.
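The cloning pattern can be sketched with plain dataclasses. This is an illustration only: `BaseRequestSketch` and `OmniRequestSketch` are hypothetical, heavily reduced stand-ins for EngineCoreRequest and OmniEngineCoreRequest, and the sketch assumes that an override left as `None` keeps the value from the source request. The real classmethod may treat `None` differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class BaseRequestSketch:
    """Hypothetical, reduced stand-in for EngineCoreRequest."""
    request_id: str
    prompt_token_ids: Optional[list] = None
    prompt_embeds: Any = None


@dataclass
class OmniRequestSketch(BaseRequestSketch):
    """Hypothetical stand-in for OmniEngineCoreRequest."""
    additional_information: Any = None

    @classmethod
    def from_request(
        cls,
        request: BaseRequestSketch,
        *,
        prompt_embeds: Any = None,
        additional_information: Any = None,
    ) -> "OmniRequestSketch":
        # Copy every field of the source request (shallow copy, so mutable
        # values such as prompt_token_ids are shared with the original).
        kwargs = {f.name: getattr(request, f.name) for f in fields(request)}
        # Assumption: a None override means "keep the source value".
        if prompt_embeds is not None:
            kwargs["prompt_embeds"] = prompt_embeds
        if additional_information is not None:
            kwargs["additional_information"] = additional_information
        return cls(**kwargs)
```

Making the overrides keyword-only, as in the documented signature, keeps call sites explicit about which payload is being replaced.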

PromptEmbedsPayload

Bases: Struct

Serialized prompt embeddings payload for direct transfer.

  • data: raw bytes of the tensor in row-major order
  • shape: [seq_len, hidden_size]
  • dtype: torch dtype name (e.g., "float16", "float32")

data instance-attribute

data: bytes

dtype instance-attribute

dtype: str

shape instance-attribute

shape: list[int]
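The layout described for PromptEmbedsPayload (row-major bytes plus shape and dtype name) can be illustrated without torch. The sketch below uses only the standard library; `PromptEmbedsSketch`, `encode_rows`, and `decode_rows` are hypothetical names, and little-endian byte order is an assumption, since the documentation does not specify endianness. In practice the payload would be decoded to a torch.Tensor rather than nested lists.

```python
import struct
from dataclasses import dataclass

# Map torch dtype names to struct format characters (only two shown here).
_FORMATS = {"float16": "e", "float32": "f"}


@dataclass
class PromptEmbedsSketch:
    """Hypothetical stand-in mirroring PromptEmbedsPayload's fields."""
    data: bytes
    shape: list
    dtype: str


def encode_rows(rows: list, dtype: str = "float32") -> PromptEmbedsSketch:
    """Pack a [seq_len, hidden_size] nested list into row-major bytes."""
    seq_len = len(rows)
    hidden_size = len(rows[0]) if rows else 0
    if any(len(row) != hidden_size for row in rows):
        raise ValueError("all rows must have the same length")
    # Row-major: every value of row 0 first, then row 1, and so on.
    flat = [value for row in rows for value in row]
    data = struct.pack(f"<{len(flat)}{_FORMATS[dtype]}", *flat)
    return PromptEmbedsSketch(data=data, shape=[seq_len, hidden_size], dtype=dtype)


def decode_rows(payload: PromptEmbedsSketch) -> list:
    """Rebuild the nested list, checking the byte count against the shape."""
    seq_len, hidden_size = payload.shape
    fmt = _FORMATS[payload.dtype]
    count = seq_len * hidden_size
    expected = count * struct.calcsize("<" + fmt)
    if len(payload.data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(payload.data)}")
    flat = struct.unpack(f"<{count}{fmt}", payload.data)
    return [list(flat[i * hidden_size:(i + 1) * hidden_size]) for i in range(seq_len)]
```

The size check in `decode_rows` catches the most common mismatch, where the shape or dtype recorded in the payload does not agree with the byte buffer.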