vllm.v1.worker.gpu.spec_decode.autoregressive.cudagraph_utils
DecodeSpeculatorCudaGraphManager
Bases: CudaGraphManager
CudaGraphManager for draft decode, building its own attention metadata.
Source code in vllm/v1/worker/gpu/spec_decode/autoregressive/cudagraph_utils.py
PrefillSpeculatorCudaGraphManager
Bases: CudaGraphManager
CudaGraphManager for draft prefill, using pre-built attention states from the target model's capture.
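The contrast between the two managers can be illustrated with a toy sketch. This is not vLLM's implementation or API: every name below (ToyAttnState, ToyGraphManager, ToyDecodeManager, ToyPrefillManager, attn_state_for, capture) is hypothetical, no GPU work is performed, and the only assumption taken from the descriptions above is that the decode manager builds its own attention metadata while the prefill manager reuses states produced during the target model's capture.

```python
from dataclasses import dataclass


@dataclass
class ToyAttnState:
    """Hypothetical stand-in for per-batch-size attention metadata."""
    batch_size: int
    built_by: str


class ToyGraphManager:
    """Hypothetical base class: records one 'captured graph' per batch size."""

    def __init__(self) -> None:
        self.graphs: dict[int, ToyAttnState] = {}

    def attn_state_for(self, batch_size: int) -> ToyAttnState:
        raise NotImplementedError

    def capture(self, batch_sizes: list[int]) -> None:
        for bs in batch_sizes:
            # A real manager would record GPU kernels here; the toy only
            # stores which attention state each graph was captured against.
            self.graphs[bs] = self.attn_state_for(bs)


class ToyDecodeManager(ToyGraphManager):
    """Builds its own attention state for each captured batch size."""

    def attn_state_for(self, batch_size: int) -> ToyAttnState:
        return ToyAttnState(batch_size, built_by="draft-decode")


class ToyPrefillManager(ToyGraphManager):
    """Reuses attention states handed over from the target model's capture."""

    def __init__(self, target_states: dict[int, ToyAttnState]) -> None:
        super().__init__()
        self.target_states = target_states

    def attn_state_for(self, batch_size: int) -> ToyAttnState:
        # No new state is built; the pre-built object is returned as-is.
        return self.target_states[batch_size]


# Usage: the target model's (hypothetical) capture produced these states.
target_states = {bs: ToyAttnState(bs, built_by="target") for bs in (1, 2)}

decode_mgr = ToyDecodeManager()
decode_mgr.capture([1, 2])

prefill_mgr = ToyPrefillManager(target_states)
prefill_mgr.capture([1, 2])
```

In this sketch the only difference between the subclasses is where the attention state comes from, which mirrors the distinction drawn in the two class descriptions; how the real classes override CudaGraphManager is not shown in this reference.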