vllm.model_executor.layers.quantization.compressed_tensors.compressed_tensors_moe
Modules:
- compressed_tensors_moe_w4a4_nvfp4
- compressed_tensors_moe_w4a8_int8
- compressed_tensors_moe_w8a8_fp8
- compressed_tensors_moe_w8a8_int8
- compressed_tensors_moe_w8a8_mxfp8
- compressed_tensors_moe_wna16_marlin
- compressed_tensors_moe_wna16_rdna3 – CompressedTensors MoE W4A16 using the fused RDNA3 (gfx1100) HIP kernel.
- rocm_moe_rdna – ROCm MoE kernel dispatcher.