vllm.v1.worker.gpu.mm.rope
Classes:
- RopeState – Unified state for multi-dimensional RoPE variants (M-RoPE, XD-RoPE).
Functions:
- get_rope_state – Create a RopeState if the model uses multi-dimensional RoPE.
RopeState
Unified state for multi-dimensional RoPE variants (M-RoPE, XD-RoPE).
M-RoPE: 3 dims, uses position delta for decode.
XD-RoPE: 3 or 4 dims, delta is 0 (decode uses orig_pos for all dims).
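The difference between the two variants at decode time can be sketched in a few lines. This is a minimal illustration, not the module's code: the helper `decode_positions` is hypothetical, and it assumes the decode position is `orig_pos + delta` applied identically to every dimension.

```python
def decode_positions(orig_pos: int, delta: int, num_dims: int) -> list[int]:
    """Position of a single decode token, one entry per RoPE dimension.

    Assumption (not confirmed by the docstring): the same shifted
    position, orig_pos + delta, is used for every dimension.
    """
    return [orig_pos + delta] * num_dims


# M-RoPE: 3 dims, shifted by a per-request delta (the value -4 is made up).
mrope_pos = decode_positions(orig_pos=10, delta=-4, num_dims=3)

# XD-RoPE: 3 or 4 dims, delta is 0, so every dim uses orig_pos directly.
xdrope_pos = decode_positions(orig_pos=10, delta=0, num_dims=4)

print(mrope_pos)
print(xdrope_pos)
```

With a delta of 0 the helper reduces to replicating `orig_pos`, which matches the XD-RoPE description above.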
NOTE: positions is deliberately allocated with one additional dummy position, which makes it non-contiguous so that it works with torch.compile. See the detailed explanation at https://github.com/vllm-project/vllm/pull/12128#discussion_r1926431923
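One way to see the effect of the dummy position is to work through the stride arithmetic by hand. The sketch below uses plain Python rather than torch, the shapes are made up, and the helpers `contiguous_strides` and `is_contiguous` are illustrative only. It assumes the buffer is shaped [num_dims, max_num_tokens + 1] and sliced as buffer[:, :n], which is one reading of the note rather than something the docstring states.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) of a freshly allocated array."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Simplified check: strides must equal the row-major strides of shape."""
    return tuple(strides) == contiguous_strides(shape)


num_dims, max_num_tokens = 3, 8

# Buffer with the extra dummy column: shape (3, 9), strides (9, 1).
padded_strides = contiguous_strides((num_dims, max_num_tokens + 1))
# A slice buffer[:, :n] keeps the buffer's strides but has shape (3, n).
padded_full = is_contiguous((num_dims, max_num_tokens), padded_strides)
padded_part = is_contiguous((num_dims, 5), padded_strides)

# Buffer without the dummy column: shape (3, 8), strides (8, 1).
plain_strides = contiguous_strides((num_dims, max_num_tokens))
plain_full = is_contiguous((num_dims, max_num_tokens), plain_strides)
plain_part = is_contiguous((num_dims, 5), plain_strides)

print(padded_full, padded_part)  # both slices of the padded buffer
print(plain_full, plain_part)    # full vs. partial slice of the plain buffer
```

In this toy model every slice of the padded buffer is non-contiguous, whereas the unpadded buffer is contiguous only for the full-length slice, so the padding gives all slices the same layout property.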
NOTE: When M-RoPE is enabled, position ids are 3D regardless of the modality of inputs. For text-only inputs, each dimension has identical position IDs, making M-RoPE functionally equivalent to 1D-RoPE. See page 5 of https://arxiv.org/abs/2409.12191
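The text-only equivalence can be checked numerically. The following sketch is a simplified model, not vLLM's kernel: it assumes the rotary frequency slots are split into consecutive chunks, one per position dimension, and the section sizes `[2, 3, 3]` are hypothetical.

```python
def rope_angles_1d(pos, half_dim, base=10000.0):
    """Rotation angles of standard 1D RoPE for one token position."""
    return [pos * base ** (-i / half_dim) for i in range(half_dim)]


def mrope_angles(pos_3d, sections, base=10000.0):
    """Rotation angles when each chunk of frequency slots reads its
    position from a different dimension of pos_3d."""
    half_dim = sum(sections)
    angles, i = [], 0
    for dim, count in enumerate(sections):
        for _ in range(count):
            angles.append(pos_3d[dim] * base ** (-i / half_dim))
            i += 1
    return angles


sections = [2, 3, 3]  # hypothetical split of 8 frequency slots over 3 dims

# Text-only token: all three dimensions carry the same position id.
text_matches_1d = all(
    mrope_angles((p, p, p), sections) == rope_angles_1d(p, sum(sections))
    for p in range(16)
)

# Token whose three position ids differ: the angles no longer match 1D RoPE.
image_differs = mrope_angles((5, 2, 7), sections) != rope_angles_1d(5, sum(sections))

print(text_matches_1d, image_differs)
```

Because every chunk reads the same value when the three rows are identical, the choice of section sizes does not matter for text-only inputs.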
Methods:
- read_prefill_positions – Return staged per-request prefill positions as [num_dims, length].
- update_prefill_positions – Overwrite a request's staged prefill positions with recomputed values.
Source code in vllm/v1/worker/gpu/mm/rope.py
read_prefill_positions(req_idx, length)
Return staged per-request prefill positions as [num_dims, length].
update_prefill_positions(req_idx, positions, delta)
Overwrite a request's staged prefill positions with recomputed values.
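The staging behaviour of the two methods can be pictured with a toy stand-in. The class below is hypothetical and uses nested Python lists instead of tensors; only the method names and argument names come from the signatures above, while the storage layout and the handling of `delta` are assumptions.

```python
class ToyRopeState:
    """Illustrative stand-in for per-request prefill position staging."""

    def __init__(self, num_dims, max_num_reqs, max_model_len):
        self.num_dims = num_dims
        self.max_model_len = max_model_len
        # One [num_dims, max_model_len] grid per request slot (assumed layout).
        self.prefill_positions = [
            [[0] * max_model_len for _ in range(num_dims)]
            for _ in range(max_num_reqs)
        ]
        # One decode delta per request slot (assumed to be stored alongside).
        self.deltas = [0] * max_num_reqs

    def update_prefill_positions(self, req_idx, positions, delta):
        """Overwrite the staged rows of one request with recomputed values."""
        assert len(positions) == self.num_dims
        for dim, row in enumerate(positions):
            assert len(row) <= self.max_model_len
            self.prefill_positions[req_idx][dim][: len(row)] = row
        self.deltas[req_idx] = delta

    def read_prefill_positions(self, req_idx, length):
        """Return the first `length` staged positions as [num_dims, length]."""
        return [row[:length] for row in self.prefill_positions[req_idx]]


state = ToyRopeState(num_dims=3, max_num_reqs=2, max_model_len=8)
state.update_prefill_positions(
    req_idx=1,
    positions=[[0, 1, 2, 3], [0, 1, 1, 2], [0, 1, 2, 2]],
    delta=-1,
)
head = state.read_prefill_positions(req_idx=1, length=3)
print(head)
```

Reading with a `length` shorter than what was staged returns a prefix of each row, and other request slots are left untouched.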
get_rope_state(model_config, model, max_num_reqs, max_num_tokens, max_model_len, device)
Create a RopeState if the model uses multi-dimensional RoPE.
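A conditional factory of this kind is usually consumed with a check on the result. The sketch below is a hypothetical, self-contained illustration: `ToyModelConfig`, its `rope_num_dims` field, `ToyState`, and `get_toy_rope_state` are invented for the example, and returning `None` for models without multi-dimensional RoPE is an assumption that the docstring does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyModelConfig:
    # Hypothetical field: 1 for plain RoPE, 3 or 4 for multi-dimensional RoPE.
    rope_num_dims: int


@dataclass
class ToyState:
    num_dims: int
    max_num_reqs: int


def get_toy_rope_state(cfg: ToyModelConfig, max_num_reqs: int) -> Optional[ToyState]:
    """Build state only when the model needs multi-dimensional positions."""
    if cfg.rope_num_dims <= 1:
        return None  # assumed behaviour for plain 1D RoPE models
    return ToyState(num_dims=cfg.rope_num_dims, max_num_reqs=max_num_reqs)


text_only = get_toy_rope_state(ToyModelConfig(rope_num_dims=1), max_num_reqs=4)
multimodal = get_toy_rope_state(ToyModelConfig(rope_num_dims=3), max_num_reqs=4)

# Callers branch on the result instead of re-checking the model config.
for state in (text_only, multimodal):
    if state is None:
        print("plain 1D positions")
    else:
        print(f"multi-dimensional positions with {state.num_dims} dims")
```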