vllm_omni.distributed.kv_transfer.monkey_patch
Monkey-patch vLLM's MooncakeConnector to fix a request-ID mismatch in prefill/decode (PD) disaggregation.
vLLM's InputProcessor appends a random suffix to each request ID. The prefill engine stores KV under its own suffixed ID, but the decode engine generates a different suffix for the same request, so its lookup does not match the stored entry. This patch threads remote_request_id through kv_transfer_params so the decode side references the correct KV entry.
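The mismatch and the fix can be illustrated with a minimal, self-contained sketch. It does not use vLLM or Mooncake; the in-memory kv_store and the helpers add_random_suffix, prefill, and decode are hypothetical stand-ins invented for illustration. Only the names remote_request_id and kv_transfer_params come from the description above.

```python
import uuid
from typing import Dict, Optional

# Hypothetical stand-in for the KV storage shared between engines.
kv_store: Dict[str, bytes] = {}


def add_random_suffix(request_id: str) -> str:
    # Assumption: models the per-engine random suffix described above.
    return f"{request_id}-{uuid.uuid4().hex[:8]}"


def prefill(request_id: str) -> dict:
    # The prefill engine stores KV under its own suffixed ID ...
    internal_id = add_random_suffix(request_id)
    kv_store[internal_id] = b"kv-blocks"
    # ... and reports that ID so it can travel in kv_transfer_params.
    return {"remote_request_id": internal_id}


def decode(request_id: str, kv_transfer_params: Optional[dict] = None) -> Optional[bytes]:
    # The decode engine generates a different suffix for the same request.
    internal_id = add_random_suffix(request_id)
    # With the fix, the lookup prefers the prefill engine's ID when present.
    params = kv_transfer_params or {}
    lookup_id = params.get("remote_request_id", internal_id)
    return kv_store.get(lookup_id)


params = prefill("req-42")
unpatched = decode("req-42")        # looks up its own suffixed ID
patched = decode("req-42", params)  # looks up the prefill engine's ID
```

In this sketch the unpatched lookup misses because the two engines disagree on the key, while the patched lookup finds the entry by reusing the ID the prefill side reported.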
PatchedRecvReqMeta dataclass
Receive-request metadata carrying the prefill engine's request ID.
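As a rough sketch of what such a dataclass might look like, the example below extends a base metadata class with one extra field. The base class RecvReqMetaSketch and its fields (local_block_ids, remote_host, remote_port) are assumptions made for illustration and are not taken from vLLM; the only element grounded in this page is that the patched metadata carries the prefill engine's request ID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecvReqMetaSketch:
    # Hypothetical base fields; the real vLLM class may differ.
    local_block_ids: List[int]
    remote_host: str
    remote_port: int


@dataclass
class PatchedRecvReqMetaSketch(RecvReqMetaSketch):
    # The prefill engine's request ID, used to address the stored KV entry.
    remote_request_id: str = ""


# Build the metadata from (hypothetical) kv_transfer_params on the decode side.
kv_transfer_params = {
    "remote_host": "10.0.0.1",
    "remote_port": 5000,
    "remote_request_id": "req-42-ab12cd34",
}
meta = PatchedRecvReqMetaSketch(
    local_block_ids=[0, 1, 2],
    remote_host=kv_transfer_params["remote_host"],
    remote_port=kv_transfer_params["remote_port"],
    remote_request_id=kv_transfer_params["remote_request_id"],
)
```

Subclassing keeps the existing fields intact and adds the ID as a defaulted trailing field, which is one way a patch could stay compatible with code that constructs the base metadata.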