vllm_gaudi.extension.utils
B2BMatmul
Bases: Matmul
Specialized alias of Matmul for batch2block and block2batch matmul operations.
This class is functionally identical to Matmul; it exists only to
semantically mark B2B-related matmuls. The marker lets the system apply the
fix that reuses the B2B output measurements as the input measurements during
calibration, which avoids corrupted scales from the KV cache.
Source code in vllm_gaudi/extension/utils.py
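The marker-subclass idea described for B2BMatmul can be sketched in plain Python. This is an illustration under stated assumptions, not the code in utils.py: the Matmul stand-in below works on nested lists rather than tensors, and the helper is_b2b is hypothetical, since the source does not show how the calibration tooling detects B2B matmuls.

```python
class Matmul:
    """Simplified stand-in for the real Matmul module (assumption:
    nested lists are used here in place of tensors)."""

    def forward(self, a, b):
        # Row-by-column product over nested lists.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

    def __call__(self, a, b):
        return self.forward(a, b)


class B2BMatmul(Matmul):
    """Marker subclass: inherits all behavior, adds only a distinct type."""


def is_b2b(module):
    # Hypothetical helper: the real detection mechanism is not shown
    # in the source; an isinstance check is one plausible way to do it.
    return isinstance(module, B2BMatmul)


a = [[1, 2]]
b = [[3], [4]]
plain, marked = Matmul(), B2BMatmul()
same_result = plain(a, b) == marked(a, b)
```

Both objects compute the same product; only the type differs, which is what would let a calibration pass treat the marked instances separately.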
FP8Matmul
Bases: Module
__init__
forward
matmul_fp8
ModuleFP8FusedSDPA
Bases: Module
__init__
dequant_output
forward
forward(
query,
key,
value,
attn_mask,
dropout_p,
is_causal,
scale,
softmax_mode,
recompute_mode,
valid_sequence_lengths,
padding_side="left",
window_size=None,
)
ModuleFusedSDPA
Bases: Module
__init__
forward
forward(
query,
key,
value,
attn_mask,
dropout_p,
is_causal,
scale,
softmax_mode,
recompute_mode,
valid_sequence_lengths,
padding_side="left",
window_size=None,
sinks=None,
)
VLLMFP8KVCache
Bases: VLLMKVCache
__init__
dequant_output
fetch_from_cache
forward
VLLMKVCache
Bases: Module