vllm_omni.request

OmniRequest

Bases: Request

Request class for omni models, extending the base Request.

It adds support for prompt embeddings and additional information payloads, so that pre-computed embeddings can be transferred directly between stages.

Parameters:

prompt_embeds (PromptEmbedsPayload | Tensor | None, default None): Optional prompt embeddings, given as a serialized payload or as a tensor. Used for direct transfer of embeddings between stages.

additional_information (AdditionalInformationPayload | None, default None): Optional additional information payload containing tensors or lists to be passed along with the request.

additional_information instance-attribute

additional_information: (
    AdditionalInformationPayload | None
) = additional_information

external_req_id instance-attribute

external_req_id: str | None = external_req_id

prompt_embeds_payload instance-attribute

prompt_embeds_payload: PromptEmbedsPayload | None = (
    prompt_embeds
    if isinstance(prompt_embeds, PromptEmbedsPayload)
    else None
)
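
The conditional above keeps `prompt_embeds` only when it is already a `PromptEmbedsPayload`; any other value, including a raw tensor, leaves `prompt_embeds_payload` as `None`. A minimal sketch of that narrowing follows. The `PromptEmbedsPayload` class here is a hypothetical stand-in whose fields are assumed for illustration, and `select_payload` is not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PromptEmbedsPayload:
    """Hypothetical stand-in for the real payload type; fields are assumed."""

    data: bytes
    shape: list[int]
    dtype: str


def select_payload(prompt_embeds: object) -> PromptEmbedsPayload | None:
    # Same narrowing as the attribute definition: keep the value only when
    # it is already a serialized payload, otherwise store None.
    return prompt_embeds if isinstance(prompt_embeds, PromptEmbedsPayload) else None


payload = PromptEmbedsPayload(data=b"\x00" * 8, shape=[1, 2], dtype="float32")
raw_embeds = [[0.0, 0.0]]  # stands in for a raw Tensor in this sketch

kept = select_payload(payload)        # the same payload object
dropped = select_payload(raw_embeds)  # None, because it is not a payload
```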

from_engine_core_request classmethod

from_engine_core_request(
    request: OmniEngineCoreRequest,
    block_hasher: Callable[[Request], list[BlockHash]]
    | None,
) -> Request

Create an OmniRequest from an OmniEngineCoreRequest.

Parameters:

request (OmniEngineCoreRequest, required): The OmniEngineCoreRequest to convert.

block_hasher (Callable[[Request], list[BlockHash]] | None, required): Function to compute block hashes for prefix caching. The argument has no default and must be passed explicitly, but it may be None.

Returns:

Request: OmniRequest instance created from the engine core request.
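
To illustrate the shape of the `block_hasher` argument, here is a rough sketch of a callable matching `Callable[[Request], list[BlockHash]]`. Everything in it is an assumption made for illustration: the `Request` stand-in, the `BlockHash = bytes` alias, the chained SHA-256 scheme, and the `make_block_hasher` and `build_request` helpers are hypothetical and do not reflect how the library computes hashes.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable

BlockHash = bytes  # assumption: the real BlockHash type may differ


@dataclass
class Request:
    """Hypothetical stand-in with only the fields this sketch needs."""

    request_id: str
    prompt_token_ids: list[int]
    block_hashes: list[BlockHash] = field(default_factory=list)


def make_block_hasher(block_size: int) -> Callable[[Request], list[BlockHash]]:
    def hasher(request: Request) -> list[BlockHash]:
        tokens = request.prompt_token_ids
        hashes: list[BlockHash] = []
        parent = b""
        # Hash full blocks only, chaining each hash to the previous one so
        # that equal prefixes produce equal leading hashes.
        for start in range(0, len(tokens) - block_size + 1, block_size):
            block = tokens[start:start + block_size]
            parent = hashlib.sha256(parent + repr(block).encode()).digest()
            hashes.append(parent)
        return hashes

    return hasher


def build_request(
    request_id: str,
    prompt_token_ids: list[int],
    block_hasher: Callable[[Request], list[BlockHash]] | None,
) -> Request:
    # Mirrors the optional-hasher contract: passing None skips hashing.
    request = Request(request_id, prompt_token_ids)
    if block_hasher is not None:
        request.block_hashes = block_hasher(request)
    return request


hasher = make_block_hasher(block_size=4)
req_a = build_request("a", [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], hasher)
req_b = build_request("b", [0, 1, 2, 3, 9, 9, 9, 9], hasher)
req_c = build_request("c", [0, 1, 2, 3], None)
```

In this sketch, `req_a` and `req_b` share their first block hash because their first four tokens match, which is the property prefix caching relies on.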

OmniStreamingUpdate dataclass

Lightweight data for streaming session continuation, overridden to carry additional information.

Contains only the fields needed to update an existing streaming session with new input data, plus an optional additional_information payload.

additional_information class-attribute instance-attribute

additional_information: (
    AdditionalInformationPayload | None
) = None

arrival_time instance-attribute

arrival_time: float

max_tokens instance-attribute

max_tokens: int

mm_features instance-attribute

mm_features: list[MultiModalFeatureSpec] | None

prompt_token_ids instance-attribute

prompt_token_ids: list[int] | None

sampling_params instance-attribute

sampling_params: SamplingParams | None

from_request classmethod

from_request(
    request: Request,
) -> OmniStreamingUpdate | None
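
Because `from_request` may return `None`, callers need to handle the case where no update is produced. The sketch below mirrors the documented fields in a stand-in dataclass and shows one way a caller might guard against `None`. `StreamingUpdateSketch` and `apply_update` are hypothetical, the field order is assumed, and appending the new token ids to the session is an assumed behavior chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class StreamingUpdateSketch:
    """Stand-in mirroring the documented fields; not the real class."""

    mm_features: list[Any] | None
    prompt_token_ids: list[int] | None
    max_tokens: int
    arrival_time: float
    sampling_params: Any | None
    # The only field with a default, so it must come last in a dataclass.
    additional_information: Any | None = None


def apply_update(
    session_tokens: list[int], update: StreamingUpdateSketch | None
) -> list[int]:
    # Guard against both "no update" and "update without new tokens".
    if update is None or update.prompt_token_ids is None:
        return session_tokens
    # Assumed behavior: new input tokens extend the existing session.
    return session_tokens + update.prompt_token_ids


update = StreamingUpdateSketch(
    mm_features=None,
    prompt_token_ids=[4, 5],
    max_tokens=16,
    arrival_time=0.0,
    sampling_params=None,
)
extended = apply_update([1, 2, 3], update)
unchanged = apply_update([1, 2, 3], None)
```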