vllm.v1.kv_offload.tiering.async_lookup
AsyncLookupManager: per-tier async lookup manager for secondary tier existence checks.
Each secondary tier that wants non-blocking lookups composes its own AsyncLookupManager instance internally. The manager maintains lookup state and uses a background thread to execute batch_lookup() calls.
Locking design
There is no explicit lock. Thread safety is achieved by ownership:
- _lookup_state and _lookup_batch are owned exclusively by the scheduler thread. lookup(), flush(), and cleanup() read and write them directly.
- _lookup_queue is written by the scheduler (flush → put_nowait, one item per step) and read by the background thread (get). queue.Queue is thread-safe.
- _pending_results is written by the background thread (put) and read by the scheduler (get_nowait inside drain_results). queue.SimpleQueue is thread-safe by design.
lookup() accumulates new keys in _lookup_batch without touching the queue. flush() is called once per step from the tier's on_schedule_end(), posting the entire batch as a single queue item so the background thread sees one batch per step. drain_results() is called before any lookup() calls in the same step, so lookup() is a pure OrderedDict operation.
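The ownership scheme described above can be sketched in a few dozen lines. This is a toy illustration, not the vLLM class: `MiniLookupManager`, `exists_fn`, and the `None` shutdown sentinel are assumptions made for the example, and `lookup()` omits the `req_context` argument for brevity. Only the attribute names (`_lookup_state`, `_lookup_batch`, `_lookup_queue`, `_pending_results`) are taken from the text.

```python
import queue
import threading
from collections import OrderedDict


class MiniLookupManager:
    """Toy stand-in for the ownership pattern; not the vLLM implementation."""

    def __init__(self, exists_fn):
        self._exists_fn = exists_fn                   # hypothetical backend probe
        self._lookup_state = OrderedDict()            # scheduler-owned: key -> bool | None
        self._lookup_batch = []                       # scheduler-owned: keys first seen this step
        self._lookup_queue = queue.Queue()            # scheduler -> worker
        self._pending_results = queue.SimpleQueue()   # worker -> scheduler
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Worker thread: touches only the two queues, never scheduler-owned state.
        while True:
            batch = self._lookup_queue.get()
            if batch is None:                         # assumed shutdown sentinel
                return
            results = [self._exists_fn(k) for k in batch]
            self._pending_results.put((batch, results))

    def drain_results(self):
        # Scheduler thread: apply every finished batch without blocking.
        while True:
            try:
                keys, results = self._pending_results.get_nowait()
            except queue.Empty:
                return
            for key, present in zip(keys, results):
                if key in self._lookup_state:         # skip keys dropped meanwhile
                    self._lookup_state[key] = present

    def lookup(self, key):
        # Scheduler thread: pure dict work, no queue access.
        if key not in self._lookup_state:
            self._lookup_state[key] = None
            self._lookup_batch.append(key)
        return self._lookup_state[key]

    def flush(self):
        # Scheduler thread: post the whole step's batch as one queue item.
        if self._lookup_batch:
            self._lookup_queue.put_nowait(self._lookup_batch)
            self._lookup_batch = []

    def shutdown(self):
        self._lookup_queue.put_nowait(None)
        self._worker.join()
```

Because each mutable structure has exactly one writing thread and the only shared objects are the two queues, no additional lock is needed in this sketch.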
Classes:
- AsyncLookupManager: Per-tier async lookup manager for secondary tier existence checks.
AsyncLookupManager
Bases: ABC
Per-tier async lookup manager for secondary tier existence checks.
Each secondary tier that wants non-blocking lookups composes its own AsyncLookupManager instance internally. The manager maintains lookup state (cache, queue) and uses a background thread to execute the actual batch_lookup() calls.
Subclasses implement only batch_lookup(); all queue management, state tracking, and result delivery are provided by this base class.
The owning tier delegates its lookup(), on_schedule_end(), and on_request_finished() to this manager:
- lookup() → drain_results() + lookup state check
- on_schedule_end() → flush()
- on_request_finished() → cleanup()
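The delegation mapping can be illustrated with a small stand-in. `RecordingManager` and `ExampleTier` are hypothetical names invented for this sketch; the text does not say whether the drain is triggered by the tier or inside the manager's own lookup(), so calling drain_results() from the tier is an assumption.

```python
class RecordingManager:
    """Hypothetical stand-in that records which manager methods were called."""

    def __init__(self):
        self.calls = []

    def drain_results(self):
        self.calls.append("drain_results")

    def lookup(self, key, req_context):
        self.calls.append("lookup")
        return None  # pretend the result is not available yet

    def flush(self):
        self.calls.append("flush")

    def cleanup(self, req_id):
        self.calls.append(("cleanup", req_id))


class ExampleTier:
    """Hypothetical secondary tier that composes a lookup manager."""

    def __init__(self, manager):
        self._mgr = manager

    def lookup(self, key, req_context):
        # Assumption: the tier drains first, then checks lookup state.
        self._mgr.drain_results()
        return self._mgr.lookup(key, req_context)

    def on_schedule_end(self):
        self._mgr.flush()

    def on_request_finished(self, req_id):
        self._mgr.cleanup(req_id)
```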
Methods:
- batch_lookup: Check whether a batch of blocks exist in this tier.
- cleanup: Remove entries no longer needed by any active request.
- drain_results: Apply pending worker results to _lookup_state.
- flush: Post this step's accumulated keys to the worker thread.
- lookup: Non-blocking lookup called from the scheduler thread.
- shutdown: Stop the worker thread.
Source code in vllm/v1/kv_offload/tiering/async_lookup.py
batch_lookup(keys, req_context) abstractmethod
Check whether a batch of blocks exist in this tier.
Called from the worker thread — must be synchronous and must not touch the primary tier or scheduler state.
Returns a list parallel to keys: True if present, False if not.
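As an illustration of the contract (a synchronous call returning one boolean per key, in order), here is a minimal sketch. `LookupBase` is a hypothetical stand-in for the real base class so that the example runs without vLLM installed, and `SetBackedLookup` with its in-memory set is likewise an assumption; a real tier would query its own storage backend.

```python
from abc import ABC, abstractmethod


class LookupBase(ABC):
    """Hypothetical stand-in for AsyncLookupManager's abstract interface."""

    @abstractmethod
    def batch_lookup(self, keys, req_context):
        """Return a list parallel to keys: True if present, False if not."""


class SetBackedLookup(LookupBase):
    """Toy subclass that answers existence checks from an in-memory set."""

    def __init__(self, stored_keys):
        # Immutable snapshot, so the worker thread never sees it change.
        self._stored = frozenset(stored_keys)

    def batch_lookup(self, keys, req_context):
        # Synchronous, and touches no scheduler or primary-tier state.
        return [key in self._stored for key in keys]
```

The result list preserves the order of `keys`, so position `i` in the output always answers the key at position `i` in the input.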
cleanup(req_id)
Remove entries no longer needed by any active request.
Called from the tier's on_request_finished(). Uses the reverse index to visit only keys associated with this request.
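One way such a reverse index could work is sketched below. The actual data structures are not shown in this section, so `LookupStateIndex`, `track()`, `key_refs`, and `req_keys` are hypothetical names; the sketch only demonstrates the idea of visiting the finished request's keys rather than scanning all lookup state.

```python
from collections import OrderedDict, defaultdict


class LookupStateIndex:
    """Hypothetical bookkeeping: lookup state plus a request -> keys reverse index."""

    def __init__(self):
        self.state = OrderedDict()          # key -> bool | None
        self.key_refs = defaultdict(set)    # key -> request ids still using it
        self.req_keys = defaultdict(set)    # request id -> keys (reverse index)

    def track(self, key, req_id):
        self.state.setdefault(key, None)
        self.key_refs[key].add(req_id)
        self.req_keys[req_id].add(key)

    def cleanup(self, req_id):
        # Visit only this request's keys; cost is independent of total state size.
        for key in self.req_keys.pop(req_id, ()):
            refs = self.key_refs[key]
            refs.discard(req_id)
            if not refs:                    # no active request needs this key
                del self.key_refs[key]
                self.state.pop(key, None)
```

A key shared by several requests survives until the last of them finishes.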
drain_results()
Apply pending worker results to _lookup_state.
Called from lookup() before checking state.
flush()
Post this step's accumulated keys to the worker thread.
Called once per step from on_schedule_end() after all lookup() calls are done. The worker receives the full batch and processes it during the model-execution window, maximising time available before the next step's drain_results(). Safe to call with an empty batch (no-op).
lookup(key, req_context)
Non-blocking lookup called from the scheduler thread.
Returns:
- bool | None: True if the block is present in this tier.
- bool | None: False if the block is not present in this tier.
- bool | None: None if the result is not yet available; retry next step.
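Because both False and None are falsy, callers need an explicit `is None` check to tell "absent" from "not yet known". The helper below is a hypothetical consumer, not part of the module: it counts how many leading keys are confirmed present and reports whether it stopped on a pending result. The `lookup` parameter is any single-argument callable with the tri-state contract above.

```python
def count_ready_prefix(keys, lookup):
    """Return (hits, pending) for a sequence of keys.

    hits:    number of leading keys whose lookup returned True.
    pending: True if the scan stopped on None (retry next step),
             False if it stopped on a definite miss or ran out of keys.
    """
    hits = 0
    for key in keys:
        result = lookup(key)
        if result is None:      # not yet available; must be checked before falsiness
            return hits, True
        if not result:          # definite miss
            return hits, False
        hits += 1
    return hits, False
```

For example, with cached results `{"a": True, "b": None}`, calling `count_ready_prefix(["a", "b"], results.get)` returns `(1, True)`, signalling that the caller should retry on the next step.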