vllm.v1.kv_offload.tiering.example.manager
ExampleSecondaryTierManager: A simple in-memory secondary tier.
This implementation provides a minimal secondary tier that stores blocks in memory (using a dictionary) with immediate completion. It serves as a reference for writing new tiers and is useful for testing the TieringOffloadingManager without requiring actual storage or network backends.
Classes:
- ExampleSecondaryTierManager – A simple in-memory secondary tier.
ExampleSecondaryTierManager
Bases: SecondaryTierManager
A simple in-memory secondary tier.
This implementation:
- Stores blocks in a dictionary (key -> True)
- Completes transfers immediately (synchronous)
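The two properties above can be illustrated with a standalone sketch. It does not import vLLM: `DictTier`, `FakeJob`, and this `LookupResult` enum are hypothetical stand-ins that only mirror the method names documented on this page, and the return type of `get_finished_jobs` (a list of job ids) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LookupResult(Enum):
    """Stand-in enum; only the two documented outcomes are modeled."""
    HIT = auto()
    MISS = auto()


@dataclass
class FakeJob:
    """Hypothetical stand-in for JobMetadata (the real one also carries a spec)."""
    job_id: int
    keys: list[str]


class DictTier:
    """Toy tier: records which keys are stored, not the block data itself."""

    def __init__(self) -> None:
        self._blocks: dict[str, bool] = {}  # key -> True
        self._finished: list[int] = []      # job ids completed since the last poll

    def submit_store(self, job: FakeJob) -> None:
        for key in job.keys:
            self._blocks[key] = True
        self._finished.append(job.job_id)   # complete before returning

    def submit_load(self, job: FakeJob) -> None:
        # A real tier would copy block data into the primary tier here.
        self._finished.append(job.job_id)

    def lookup(self, key: str) -> LookupResult:
        return LookupResult.HIT if key in self._blocks else LookupResult.MISS

    def get_finished_jobs(self) -> list[int]:
        # Hand back everything finished so far and reset the list.
        done, self._finished = self._finished, []
        return done

    def drain_jobs(self) -> None:
        pass  # nothing is ever in flight

    def get_num_blocks(self) -> int:
        return len(self._blocks)


tier = DictTier()
tier.submit_store(FakeJob(job_id=1, keys=["a", "b"]))
finished = tier.get_finished_jobs()  # job 1 is already reported as finished
```

Because every submit call finishes its work before returning, the first poll after a submit already reports the job, which is what makes this shape convenient for tests.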
Methods:
- __init__ – Initialize the example secondary tier.
- drain_jobs – Synchronous tier: submit_*() returns only after the operation completes, so there is nothing to wait for.
- get_finished_jobs – Poll for finished jobs.
- get_num_blocks – Get the number of blocks currently stored in this tier.
- lookup – Check whether a block exists in this secondary tier.
- submit_load – Submit a job to load blocks from this tier into the primary tier.
- submit_store – Submit a job to store blocks from the primary tier into this tier.
Source code in vllm/v1/kv_offload/tiering/example/manager.py
__init__(offloading_spec, primary_kv_view, tier_type, custom_param=0)
Initialize the example secondary tier.
Parameters:
drain_jobs()
This tier is synchronous: submit_*() returns only after the operation completes, so there is nothing to wait for.
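To see why draining has nothing to do in a synchronous tier, it may help to contrast it with an asynchronous one. The sketch below is illustrative only: `SyncTier` and `AsyncTier` are hypothetical classes, not part of vLLM, and the thread pool is just one assumed way a tier could run work in the background.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait


class SyncTier:
    """Work finishes inside submit_store, so drain_jobs is empty."""

    def __init__(self) -> None:
        self.stored: set[str] = set()

    def submit_store(self, keys: list[str]) -> None:
        self.stored.update(keys)  # done by the time this returns

    def drain_jobs(self) -> None:
        pass  # nothing is in flight


class AsyncTier:
    """Work is handed to a worker thread, so drain_jobs must block."""

    def __init__(self) -> None:
        self.stored: set[str] = set()
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: list[Future] = []

    def submit_store(self, keys: list[str]) -> None:
        # Returns immediately; the update runs on the worker thread.
        self._pending.append(self._pool.submit(self.stored.update, keys))

    def drain_jobs(self) -> None:
        wait(self._pending)  # block until every submitted job has finished
        self._pending.clear()

    def close(self) -> None:
        self._pool.shutdown()


sync_tier = SyncTier()
sync_tier.submit_store(["a", "b"])
sync_tier.drain_jobs()   # returns at once

async_tier = AsyncTier()
async_tier.submit_store(["a", "b"])
async_tier.drain_jobs()  # only after this call is the store guaranteed complete
async_tier.close()
```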
get_finished_jobs()
get_num_blocks()
lookup(key, req_context)
Check whether a block exists in this secondary tier.
Parameters:
Returns:
- LookupResult – HIT if the block is present, MISS if not found.
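As a rough illustration of the HIT/MISS contract, the sketch below performs the lookup as a plain dictionary membership test and uses it to split a list of keys. Everything here is a stand-in: the `LookupResult` enum is redefined locally, `split_by_presence` is a hypothetical helper, and the `req_context` argument of the real method is omitted.

```python
from enum import Enum


class LookupResult(Enum):
    """Stand-in enum with the two documented outcomes."""
    HIT = "hit"
    MISS = "miss"


def lookup(blocks: dict[str, bool], key: str) -> LookupResult:
    """Membership test against a key -> True dictionary."""
    return LookupResult.HIT if key in blocks else LookupResult.MISS


def split_by_presence(
    blocks: dict[str, bool], keys: list[str]
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: partition keys into (hits, misses), keeping order."""
    hits: list[str] = []
    misses: list[str] = []
    for key in keys:
        if lookup(blocks, key) is LookupResult.HIT:
            hits.append(key)
        else:
            misses.append(key)
    return hits, misses


blocks = {"k1": True, "k3": True}
hits, misses = split_by_presence(blocks, ["k1", "k2", "k3"])
```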
submit_load(job_metadata)
Submit a job to load blocks from this tier into the primary tier.
Parameters:
- job_metadata (JobMetadata) – Job metadata including job_id, keys, and spec for writing blocks into the primary tier.
submit_store(job_metadata)
Submit a job to store blocks from the primary tier into this tier.
Parameters:
- job_metadata (JobMetadata) – Job metadata including job_id, keys, and spec for reading blocks from the primary tier.