vllm.v1.worker.gpu.lora_utils
LoRA utilities for the Model Runner V2 and cudagraph.
Functions:
- create_lora_capture_hook – Create a hook to set up LoRA state before each cudagraph capture.
- get_lora_capture_cases – Return num_active_loras values for cudagraph capture.
- get_num_active_loras_for_dispatch – Compute num_active_loras for cudagraph dispatch.
create_lora_capture_hook(lora_config, runner)
Create a hook to set up LoRA state before each cudagraph capture.
Source code in vllm/v1/worker/gpu/lora_utils.py
get_lora_capture_cases(lora_config, compilation_config)
Return num_active_loras values for cudagraph capture.
The returned list depends on the configuration:
- LoRA disabled: [0].
- cudagraph_specialize_lora=True: the powers of 2 up to max_loras, plus max_loras+1.
- cudagraph_specialize_lora=False: [0, max_loras+1].
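The cases described for get_lora_capture_cases can be illustrated with a small standalone sketch. It is not the vLLM source: the function name `sketch_capture_cases` and its flat arguments (`lora_enabled`, `specialize`, `max_loras`) are hypothetical stand-ins for the `lora_config` and `compilation_config` objects, and it follows a literal reading of the description. In particular, the description does not say whether 0 is also captured when cudagraph_specialize_lora=True, so the sketch leaves it out.

```python
def sketch_capture_cases(lora_enabled: bool, specialize: bool, max_loras: int) -> list[int]:
    """Hypothetical illustration of the documented capture cases.

    Not the vLLM implementation; arguments are simplified stand-ins
    for lora_config and compilation_config.
    """
    if not lora_enabled:
        # LoRA disabled: only the "no active LoRA" case.
        return [0]

    if not specialize:
        # No specialization: the no-LoRA case plus a single max_loras + 1 case.
        return [0, max_loras + 1]

    # Specialization: powers of 2 up to max_loras ...
    cases = []
    n = 1
    while n <= max_loras:
        cases.append(n)
        n *= 2
    # ... plus max_loras + 1 (assumption: 0 is not included here,
    # since the description does not mention it for this case).
    cases.append(max_loras + 1)
    return cases


if __name__ == "__main__":
    print(sketch_capture_cases(lora_enabled=True, specialize=True, max_loras=4))
    print(sketch_capture_cases(lora_enabled=True, specialize=False, max_loras=4))
    print(sketch_capture_cases(lora_enabled=False, specialize=True, max_loras=4))
```

Under this reading, specialization trades extra captures (roughly log2(max_loras) of them) for graphs sized closer to the number of adapters actually active in a batch.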
get_num_active_loras_for_dispatch(lora_config, lora_state, req_ids, dummy_run)
Compute num_active_loras for cudagraph dispatch.