vllm.v1.attention.backends.mla.prefill
Modules:
- aiter_flash_attn – AITER FlashAttention backend for MLA prefill (ROCm).
- base – Abstract base class for MLA prefill backends.
- flash_attn – FlashAttention backend for MLA prefill.
- flashinfer – FlashInfer backend for MLA prefill.
- registry – Registry for MLA prefill backends.
- selector – Selector for MLA prefill backends.
- tokenspeed_mla – TokenSpeed CuTe DSL backend for MLA prefill.
- trtllm_ragged – TRT-LLM Ragged backend for MLA prefill.
Classes:
- MLAPrefillBackend – Abstract base class for MLA prefill backends.
- MLAPrefillBackendEnum – Enumeration of all supported MLA prefill backends.
Functions:
- get_mla_prefill_backend – Select the MLA prefill backend based on configuration and device.
- register_mla_prefill_backend – Register or override an MLA prefill backend implementation.
MLAPrefillBackend
Bases: ABC
Abstract base class for MLA prefill backends.
Methods:
- prepare_metadata – Prepare backend-specific metadata before the forward pass.
- supports_quant_output – Whether run_prefill_new_tokens can write quantized output directly.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
prepare_metadata(prefill_metadata)
Prepare backend-specific metadata before the forward pass.
Called by the metadata builder after constructing the prefill metadata.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
supports_quant_output(quant_key)
Whether run_prefill_new_tokens can write quantized output directly (fused) for the given quant key, skipping the post-quant pass. Overridden by backends that support it.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
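The interplay between supports_quant_output and run_prefill_new_tokens can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical stand-in: SketchPrefillBackend, the list-of-floats "tensors", the string quant key, and the run_prefill_new_tokens signature are assumptions for illustration and are not taken from vLLM.

```python
from abc import ABC, abstractmethod


class SketchPrefillBackend(ABC):
    """Hypothetical stand-in for MLAPrefillBackend (not the vLLM class)."""

    def prepare_metadata(self, prefill_metadata) -> None:
        # Assumed default: nothing backend-specific to prepare.
        return None

    def supports_quant_output(self, quant_key) -> bool:
        # Assumed default: no fused quantized output.
        return False

    @abstractmethod
    def run_prefill_new_tokens(self, values, quant_key=None):
        ...


class PlainBackend(SketchPrefillBackend):
    def run_prefill_new_tokens(self, values, quant_key=None):
        return list(values)  # always returns unquantized output


class FusedBackend(SketchPrefillBackend):
    def supports_quant_output(self, quant_key) -> bool:
        return quant_key == "int8"  # hypothetical quant key

    def run_prefill_new_tokens(self, values, quant_key=None):
        out = list(values)
        if quant_key == "int8":
            out = [round(v) for v in out]  # "fused" quantization
        return out


def prefill(backend, values, quant_key=None):
    """Run prefill, applying a separate post-quant pass only if needed."""
    fused = quant_key is not None and backend.supports_quant_output(quant_key)
    out = backend.run_prefill_new_tokens(values, quant_key if fused else None)
    if quant_key is not None and not fused:
        out = [round(v) for v in out]  # separate post-quant pass
    return out, fused
```

Both backends produce the same quantized result in this sketch; the difference is only whether the caller had to run the extra pass.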
MLAPrefillBackendEnum
Bases: Enum
Enumeration of all supported MLA prefill backends.
Methods:
- clear_override – Clear any override for this backend, reverting to the default.
- get_class – Get the backend class (respects overrides).
- get_path – Get the class path for this backend (respects overrides).
- is_overridden – Check if this backend has been overridden.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
clear_override()
Clear any override for this backend, reverting to the default.
get_class()
Get the backend class (respects overrides).
Returns:
- type[MLAPrefillBackend] – The backend class.
Raises:
- ImportError – If the backend class cannot be imported.
- ValueError – If CUSTOM is used without being registered.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_path()
Get the class path for this backend (respects overrides).
Returns:
- str – The fully qualified class path string.
Raises:
- ValueError – If MLAPrefillBackendEnum.CUSTOM is used without being registered.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_mla_prefill_backend(vllm_config)
Select the MLA prefill backend based on configuration and device.
This function first checks for explicit user preferences via mla_prefill_backend in AttentionConfig, then falls back to automatic priority-based selection.
Parameters:
- vllm_config (VllmConfig) – The vLLM configuration.
Returns:
- type[MLAPrefillBackend] – The selected prefill backend class.
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
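The two-stage selection (explicit preference first, then a priority-ordered fallback) can be sketched as follows. This is an illustration of the general pattern only: select_backend, the priority list, and the is_supported callback are hypothetical, and raising on an unsupported explicit preference is an assumption rather than documented behavior of get_mla_prefill_backend.

```python
from typing import Callable, Optional, Sequence


def select_backend(
    preferred: Optional[str],
    priority: Sequence[str],
    is_supported: Callable[[str], bool],
) -> str:
    """Pick a backend name: explicit preference, else first supported."""
    if preferred is not None:
        # Assumption: an unsupported explicit choice is an error, not
        # a silent fallback.
        if not is_supported(preferred):
            raise ValueError(f"backend {preferred!r} is not supported here")
        return preferred
    for candidate in priority:
        if is_supported(candidate):
            return candidate
    raise ValueError("no supported backend found")
```

For example, with priority ["a", "b", "c"] and only "b" and "c" supported, the call with preferred=None returns "b".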
register_mla_prefill_backend(backend, class_path=None)
Register or override an MLA prefill backend implementation.
Parameters:
- backend (MLAPrefillBackendEnum) – The MLAPrefillBackendEnum member to register.
- class_path (str | None, default: None) – Optional class path. If not provided and the function is used as a decorator, the path is auto-generated from the decorated class.
Returns:
- A decorator that registers the class it is applied to, as used in the decorator examples below.
Examples:
Override an existing MLA prefill backend

    @register_mla_prefill_backend(MLAPrefillBackendEnum.FLASH_ATTN)
    class MyCustomFlashAttn(MLAPrefillBackend):
        ...

Register a custom third-party MLA prefill backend

    @register_mla_prefill_backend(MLAPrefillBackendEnum.CUSTOM)
    class MyCustomPrefillBackend(MLAPrefillBackend):
        ...

Direct registration

    register_mla_prefill_backend(
        MLAPrefillBackendEnum.CUSTOM,
        "my.module.MyCustomPrefillBackend",
    )