vllm.model_executor.layers.fused_moe.oracle.int8
Functions:
- convert_to_int8_moe_kernel_format: Convert INT8 MoE weights to backend-specific kernel format.
- map_int8_backend: Map user's MoEBackend to Int8MoeBackend.
- select_int8_moe_backend: Select the primary Int8 MoE backend.
_get_priority_backends(moe_config)
Get available backends in priority order based on platform and config.
Source code in vllm/model_executor/layers/fused_moe/oracle/int8.py
convert_to_int8_moe_kernel_format(int8_backend, w13, w2)
Convert INT8 MoE weights to backend-specific kernel format.
map_int8_backend(runner_backend)
Map user's MoEBackend to Int8MoeBackend.
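The mapping step can be pictured as a lookup from a user-facing backend choice to an INT8-specific one. The following is only a minimal sketch of that idea, not the vLLM implementation: `UserMoEBackend`, `SketchInt8Backend`, the member names, and `map_backend_sketch` are all hypothetical stand-ins, and the real `MoEBackend` and `Int8MoeBackend` enums may have different members and error behavior.

```python
from enum import Enum


class UserMoEBackend(Enum):
    """Hypothetical stand-in for the user-facing MoEBackend."""
    AUTO = "auto"
    TRITON = "triton"
    CUTLASS = "cutlass"


class SketchInt8Backend(Enum):
    """Hypothetical stand-in for Int8MoeBackend."""
    TRITON = "triton"


# Assumption: only backends with an INT8 implementation appear in the table.
_USER_TO_INT8 = {
    UserMoEBackend.TRITON: SketchInt8Backend.TRITON,
}


def map_backend_sketch(requested: UserMoEBackend) -> SketchInt8Backend:
    """Translate a user-facing choice into an INT8-specific one, or fail loudly."""
    try:
        return _USER_TO_INT8[requested]
    except KeyError:
        supported = ", ".join(b.value for b in _USER_TO_INT8)
        raise ValueError(
            f"Backend {requested.value!r} has no INT8 MoE implementation; "
            f"supported: {supported}"
        ) from None
```

Raising on an unmapped choice, rather than silently substituting another backend, is a design choice of this sketch; the actual function may behave differently.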
select_int8_moe_backend(config, weight_key=kInt8StaticChannelSym, activation_key=kInt8DynamicTokenSym)
Select the primary Int8 MoE backend. Note: Shape-specific fallbacks may still occur at runtime.
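Selecting a primary backend from a priority-ordered candidate list can be sketched as follows. This is an illustrative example under stated assumptions, not the code in `oracle/int8.py`: `SketchBackend`, `SketchConfig`, the divisibility constraint, and all function names are hypothetical, and the quantization keys (`weight_key`, `activation_key`) are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SketchBackend(Enum):
    """Hypothetical backend names, not the real Int8MoeBackend members."""
    FAST = "fast"
    PORTABLE = "portable"


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical subset of a MoE config."""
    on_gpu: bool
    hidden_size: int


def priority_backends(cfg: SketchConfig) -> list[SketchBackend]:
    """Order candidates by preference for the platform (assumed policy)."""
    if cfg.on_gpu:
        return [SketchBackend.FAST, SketchBackend.PORTABLE]
    return [SketchBackend.PORTABLE]


def is_supported(backend: SketchBackend, cfg: SketchConfig) -> bool:
    """Assumed constraint: the fast kernel needs hidden_size divisible by 128."""
    if backend is SketchBackend.FAST:
        return cfg.hidden_size % 128 == 0
    return True


def select_backend(
    cfg: SketchConfig, requested: Optional[SketchBackend] = None
) -> SketchBackend:
    """Return the first supported candidate; an explicit request is not overridden."""
    candidates = [requested] if requested is not None else priority_backends(cfg)
    for backend in candidates:
        if is_supported(backend, cfg):
            return backend
    raise ValueError(f"No supported INT8 MoE backend among {candidates}")
```

In this sketch the choice is made once, at configuration time. The shape-specific fallbacks mentioned in the note happen later, per call, and are not modeled here.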