vllm_gaudi.extension.utils
FP8Matmul
Bases: Module
Source code in vllm_gaudi/extension/utils.py
ModuleFusedSDPA
Bases: Module
__init__
forward
forward(
    query,
    key,
    value,
    attn_mask,
    dropout_p,
    is_causal,
    scale,
    softmax_mode,
    recompute_mode,
    valid_sequence_lengths,
    padding_side="left",
    window_size=None,
)
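The parameter names `query`, `key`, `value`, `is_causal`, and `scale` match the usual scaled dot-product attention convention. As a reading aid, the sketch below computes plain attention for a single head in pure Python. It is an assumption that `ModuleFusedSDPA.forward` follows these semantics; the helper name `reference_sdpa` is hypothetical, and `attn_mask`, `dropout_p`, `softmax_mode`, `recompute_mode`, `valid_sequence_lengths`, `padding_side`, and `window_size` are deliberately not modeled because this page does not describe them.

```python
import math


def reference_sdpa(query, key, value, is_causal=False, scale=None):
    """Hypothetical single-head reference, not the Gaudi fused kernel.

    query: list of L_q vectors of length d
    key:   list of L_k vectors of length d
    value: list of L_k vectors of length d_v
    """
    d = len(query[0])
    if scale is None:
        # Conventional default: 1 / sqrt(head_dim). Assumed, not confirmed here.
        scale = 1.0 / math.sqrt(d)

    output = []
    for i, q in enumerate(query):
        scores = []
        for j, k in enumerate(key):
            if is_causal and j > i:
                # Causal masking: position i may not attend to later positions.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q, k)))

        # Numerically stable softmax over the key axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted sum of the value vectors.
        d_v = len(value[0])
        output.append(
            [sum(w * v[c] for w, v in zip(weights, value)) for c in range(d_v)]
        )
    return output
```

With `is_causal=True`, the first query position can only attend to the first key, so its output row equals the first value vector. A fused kernel would be expected to produce the same numbers up to precision, but that is not verified here.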
VLLMFP8KVCache
Bases: VLLMKVCache
__init__
dequant_output
fetch_from_cache
forward
VLLMKVCache
Bases: Module