vllm.distributed.kv_transfer.kv_connector.v1.hf3fs.utils.gather_scatter_helper
Classes:
- CopyBufferAllocator – Memory pool for tensor buffers to avoid frequent allocation/deallocation.
Functions:
- gather_kv_caches – Gather KV cache data from KV cache storage to destination tensor.
- scatter_kv_caches – Scatter KV cache data from source tensor to KV cache storage.
CopyBufferAllocator
Memory pool for tensor buffers to avoid frequent allocation/deallocation.
Methods:
- alloc_buffer – Allocate buffers from the pool.
- free_buffer – Return buffers to the pool.
Source code in vllm/distributed/kv_transfer/kv_connector/v1/hf3fs/utils/gather_scatter_helper.py
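The pooling idea can be illustrated with a minimal sketch. This is not the vLLM implementation: the class name `SimpleBufferPool`, the constructor arguments, the `count` parameter, and the use of `bytearray` in place of tensors are all assumptions made for illustration. Only the method names `alloc_buffer` and `free_buffer` come from the listing above, and their real signatures may differ.

```python
class SimpleBufferPool:
    """Hypothetical fixed-size buffer pool (illustration only)."""

    def __init__(self, buffer_size: int, max_buffers: int) -> None:
        self.buffer_size = buffer_size
        # Allocate every buffer once, up front, so later requests reuse memory
        # instead of triggering a fresh allocation each time.
        self._free: list[bytearray] = [
            bytearray(buffer_size) for _ in range(max_buffers)
        ]

    def alloc_buffer(self, count: int) -> list[bytearray]:
        """Take `count` buffers out of the pool (assumed signature)."""
        if count > len(self._free):
            raise RuntimeError(
                f"requested {count} buffers, only {len(self._free)} free"
            )
        return [self._free.pop() for _ in range(count)]

    def free_buffer(self, buffers: list[bytearray]) -> None:
        """Hand buffers back so they can be reused (assumed signature)."""
        self._free.extend(buffers)

    @property
    def num_free(self) -> int:
        return len(self._free)


pool = SimpleBufferPool(buffer_size=1024, max_buffers=4)
bufs = pool.alloc_buffer(3)
remaining_after_alloc = pool.num_free
pool.free_buffer(bufs)
remaining_after_free = pool.num_free
```

Pre-allocating trades a fixed memory footprint for predictable allocation cost, which is the motivation the class description gives.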
gather_kv_caches(kv_caches_ptrs, total_token_in_kvcache, dst_tensor, token_indices, is_mla=False)
Gather KV cache data from KV cache storage to destination tensor.
Parameters:
- kv_caches_ptrs (Tensor) – Tensor of KV cache pointers (one per layer)
- total_token_in_kvcache (int) – Total number of tokens in KV cache
- dst_tensor (Tensor) – Destination tensor to store gathered data
  - MHA format: [num_layers, 2, num_tokens_in_block, hidden_size]
  - MLA format: [num_layers, num_tokens_in_block, hidden_size]
- token_indices (list[int]) – List of token positions to gather
- is_mla (bool, default: False) – Whether using MLA model format
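The indexing implied by these shapes can be sketched in plain Python. This is a reference model of the semantics only, not the real function: `gather_tokens` is a hypothetical name, nested lists stand in for tensors, and the per-layer cache layout (`[2, total_tokens, hidden_size]` for MHA, `[total_tokens, hidden_size]` for MLA) is an assumption, since the actual function receives raw per-layer pointers.

```python
def gather_tokens(kv_caches, dst, token_indices, is_mla=False):
    """Copy the rows named by token_indices from each layer's cache into dst.

    Assumed layouts (nested lists standing in for tensors):
      MHA: kv_caches[layer][k_or_v][token], dst[layer][k_or_v][i]
      MLA: kv_caches[layer][token],         dst[layer][i]
    """
    for layer, cache in enumerate(kv_caches):
        for i, tok in enumerate(token_indices):
            if is_mla:
                dst[layer][i] = list(cache[tok])
            else:
                # Index 0 holds keys, index 1 holds values.
                for kv in range(2):
                    dst[layer][kv][i] = list(cache[kv][tok])
    return dst


num_layers, total_tokens, tokens_in_block = 2, 4, 2

# Each "hidden vector" records its own coordinates so the copy is easy to check.
kv_caches = [
    [[[layer, kv, tok] for tok in range(total_tokens)] for kv in range(2)]
    for layer in range(num_layers)
]
dst = [[[None] * tokens_in_block for _ in range(2)] for _ in range(num_layers)]

gather_tokens(kv_caches, dst, token_indices=[3, 1])
```

After the call, slot `i` of the destination holds the cache row at `token_indices[i]`, for every layer and for both keys and values.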
scatter_kv_caches(kv_caches_ptrs, total_token_in_kvcache, src_tensor, token_indices, is_mla=False)
Scatter KV cache data from source tensor to KV cache storage.
Parameters:
- kv_caches_ptrs (Tensor) – Tensor of KV cache pointers (one per layer)
- total_token_in_kvcache (int) – Total number of tokens in KV cache
- src_tensor (Tensor) – Source tensor containing data to scatter
  - MHA format: [num_layers, 2, num_tokens_in_block, hidden_size]
  - MLA format: [num_layers, num_tokens_in_block, hidden_size]
- token_indices (list[int]) – List of token positions to update
- is_mla (bool, default: False) – Whether using MLA model format
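Scatter is the inverse copy direction, and a plain-Python model may help make that concrete. As before, this is only a sketch of the semantics: `scatter_tokens` is a hypothetical name, nested lists stand in for tensors, and the per-layer cache layout is assumed rather than taken from the source. The example uses the MLA layout, which has no separate key/value axis.

```python
def scatter_tokens(kv_caches, src, token_indices, is_mla=False):
    """Write row i of src into cache position token_indices[i], per layer.

    Assumed layouts (nested lists standing in for tensors):
      MHA: kv_caches[layer][k_or_v][token], src[layer][k_or_v][i]
      MLA: kv_caches[layer][token],         src[layer][i]
    """
    for layer, cache in enumerate(kv_caches):
        for i, tok in enumerate(token_indices):
            if is_mla:
                cache[tok] = list(src[layer][i])
            else:
                # Index 0 holds keys, index 1 holds values.
                for kv in range(2):
                    cache[kv][tok] = list(src[layer][kv][i])


num_layers, total_tokens, hidden_size = 2, 4, 2

# MLA-style cache: one zero-filled row per token position, per layer.
mla_caches = [
    [[0.0] * hidden_size for _ in range(total_tokens)]
    for _ in range(num_layers)
]
# Source block of two tokens per layer.
src = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

scatter_tokens(mla_caches, src, token_indices=[2, 0], is_mla=True)
```

Only the positions listed in `token_indices` are overwritten; every other cache row keeps its previous contents.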