vllm.model_executor.layers.quantization.compressed_tensors.compressed_tensors_moe.rocm_moe_rdna
ROCm MoE kernel dispatcher.
Selects architecture-specific native HIP MoE kernels in priority order. Falls back to the Triton WNA16 path when no native kernel is available.
Functions:
- is_supported: Check if a native ROCm MoE kernel is available for this config.
- make_method: Create the native ROCm MoE method. Call only after is_supported().
is_supported(weight_quant)
Check if a native ROCm MoE kernel is available for this config.
Source code in vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/rocm_moe_rdna.py
make_method(weight_quant, input_quant, moe_config)
Create the native ROCm MoE method. Call only after is_supported().