vllm.model_executor.kernels.linear.mxfp8
Modules:
- Mxfp8LinearKernel
- emulation
- flashinfer
- marlin
- rocm_native – Native MXFP8 linear GEMM for AMD CDNA4 (gfx950) via Triton tl.dot_scaled.
- xpu
Classes:
- Mxfp8LinearLayerConfig – Configuration for an MXFP8 linear layer.
Mxfp8LinearLayerConfig dataclass
Configuration for an MXFP8 linear layer.
All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32.
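The block-scaled layout described above can be illustrated with a small pure-Python sketch. It is not vLLM's implementation and uses none of its APIs: the names `e8m0_to_float` and `dequantize_row` are hypothetical, the FP8-E4M3 elements are assumed to be already widened to Python floats, and blocks are assumed to run contiguously along a row. The E8M0 decoding (a biased power-of-two exponent with bias 127, code 255 reserved for NaN) follows the OCP Microscaling format.

```python
import math

BLOCK_SIZE = 32   # elements that share one scale, per the description above
E8M0_BIAS = 127   # E8M0 is a biased power-of-two exponent with no mantissa


def e8m0_to_float(code: int) -> float:
    """Decode one uint8 E8M0 scale: value = 2 ** (code - 127)."""
    if not 0 <= code <= 254:
        raise ValueError("E8M0 code must be in [0, 254]; 255 is reserved for NaN")
    return math.ldexp(1.0, code - E8M0_BIAS)


def dequantize_row(elements: list[float], scale_codes: list[int]) -> list[float]:
    """Apply one scale to each consecutive block of BLOCK_SIZE elements.

    `elements` holds FP8-E4M3 values already widened to Python floats
    (an assumption made to keep this sketch dependency-free).
    """
    if len(elements) != len(scale_codes) * BLOCK_SIZE:
        raise ValueError("expected exactly one scale per block of 32 elements")
    out = []
    for block_idx, code in enumerate(scale_codes):
        scale = e8m0_to_float(code)
        start = block_idx * BLOCK_SIZE
        out.extend(x * scale for x in elements[start:start + BLOCK_SIZE])
    return out


# Two blocks: the first keeps its values (scale 2**0), the second is doubled (2**1).
row = [1.0] * BLOCK_SIZE + [2.0] * BLOCK_SIZE
restored = dequantize_row(row, [127, 128])
```

Under these assumptions, a row of K elements carries K / 32 scale bytes, so the scales add one byte of overhead per 32 one-byte weights.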