vllm_omni.distributed.kv_transfer

Patched KV transfer connectors for PD disaggregation.

This package provides monkey-patched versions of vLLM's native KV transfer connectors (e.g. MooncakeConnector) that fix the request-ID mismatch problem in prefill-decode disaggregation.

vLLM's InputProcessor.assign_request_id() appends a random 8-character suffix to each request ID internally. The prefill engine stores KV data under its own suffixed ID, but the decode engine generates a different suffix, so its lookup never matches the stored KV data. The patched connector threads the prefill engine's internal request ID (remote_request_id) through kv_transfer_params so the decode side can reference the correct KV entry.
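The mismatch and the fix can be illustrated with a small toy model. This is only a sketch of the idea described above, not vLLM's or this package's actual code: `ToyKVStore`, `prefill`, `decode_unpatched`, and `decode_patched` are hypothetical names, and the local `assign_request_id` is a stand-in that mimics the random 8-character suffix. Only the `remote_request_id` key and the `kv_transfer_params` name come from the description.

```python
import secrets


def assign_request_id(external_id: str) -> str:
    """Toy stand-in: append a random 8-character suffix to the request ID."""
    return f"{external_id}-{secrets.token_hex(4)}"  # token_hex(4) -> 8 hex chars


class ToyKVStore:
    """Hypothetical shared KV store, keyed by internal request ID."""

    def __init__(self):
        self._entries = {}

    def put(self, request_id, kv):
        self._entries[request_id] = kv

    def get(self, request_id):
        return self._entries.get(request_id)


def prefill(store, external_id):
    """Prefill side: store KV under its own suffixed ID and publish that ID."""
    internal_id = assign_request_id(external_id)
    store.put(internal_id, f"kv-for-{external_id}")
    return {"remote_request_id": internal_id}  # travels in kv_transfer_params


def decode_unpatched(store, external_id):
    """Decode side without the fix: looks up under a freshly generated suffix."""
    return store.get(assign_request_id(external_id))


def decode_patched(store, external_id, kv_transfer_params):
    """Decode side with the fix: looks up under the prefill engine's ID."""
    _local_id = assign_request_id(external_id)  # still generated, but not used for lookup
    return store.get(kv_transfer_params["remote_request_id"])


store = ToyKVStore()
params = prefill(store, "req-1")
print(decode_unpatched(store, "req-1"))        # lookup under a different suffix
print(decode_patched(store, "req-1", params))  # lookup under the prefill engine's ID
```

In this model the unpatched decode path misses because the two suffixes are drawn independently, while the patched path ignores its local suffix for the lookup and uses the ID the prefill side published.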

Modules:

| Name | Description |
| --- | --- |
| monkey_patch | Monkey-patch vLLM's MooncakeConnector to fix request-ID mismatch in PD disaggregation. |
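For readers unfamiliar with the technique, the following is a generic sketch of monkey-patching a method at runtime. It does not reproduce the monkey_patch module: `NativeConnector`, `lookup_key`, and `apply_patch` are hypothetical names invented for illustration, and the real MooncakeConnector interface may look quite different.

```python
class NativeConnector:
    """Hypothetical stand-in for a third-party connector class."""

    def lookup_key(self, request_id, kv_transfer_params=None):
        # Original behavior: always use the local request ID.
        return request_id


# Keep a reference to the original method so the patch can fall back to it.
_original_lookup_key = NativeConnector.lookup_key


def _patched_lookup_key(self, request_id, kv_transfer_params=None):
    """Prefer the remote request ID when the caller supplies one."""
    params = kv_transfer_params or {}
    remote_id = params.get("remote_request_id")
    if remote_id is not None:
        return remote_id
    return _original_lookup_key(self, request_id, kv_transfer_params)


def apply_patch():
    """Replace the method on the class; existing instances see the change too."""
    NativeConnector.lookup_key = _patched_lookup_key


connector = NativeConnector()
apply_patch()
print(connector.lookup_key("local-id", {"remote_request_id": "remote-id"}))
```

Because the original method is captured once at import time, calling `apply_patch()` more than once in this sketch does not wrap the patch around itself.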