vllm.model_executor.layers.fused_moe.prepare_finalize
MoEPrepareAndFinalizeNoEP
Bases: FusedMoEPrepareAndFinalize
Source code in vllm/model_executor/layers/fused_moe/prepare_finalize.py
__init__
__init__(
    quant_dtype: Optional[dtype] = None,
    per_channel_quant: bool = False,
    block_shape: Optional[list[int]] = None,
)
finalize
finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
) -> None
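This page does not describe what finalize computes. Judging only from the parameter names and the None return type, one plausible reading is that it combines each token's per-expert results into output in place, weighting them by topk_weights unless the router weight was already applied to the input. The sketch below illustrates that reading with plain Python lists instead of tensors; finalize_sketch and the assumed [num_tokens][topk][hidden] layout are hypothetical and are not taken from the vLLM source.

```python
def finalize_sketch(output, fused_expert_output, topk_weights,
                    apply_router_weight_on_input):
    """Hypothetical stand-in for a weighted top-k reduction.

    Assumed layouts (not confirmed by the reference page):
      output:              [num_tokens][hidden], overwritten in place
      fused_expert_output: [num_tokens][topk][hidden]
      topk_weights:        [num_tokens][topk]
    """
    for t, (expert_rows, weights) in enumerate(
            zip(fused_expert_output, topk_weights)):
        hidden = len(expert_rows[0])
        row = [0.0] * hidden
        for expert_row, weight in zip(expert_rows, weights):
            # Assumption: if the weight was already applied to the input,
            # the expert outputs are summed without rescaling.
            scale = 1.0 if apply_router_weight_on_input else weight
            for j in range(hidden):
                row[j] += scale * expert_row[j]
        output[t] = row  # write the result back, mirroring "-> None"


out = [[0.0, 0.0]]
finalize_sketch(out, [[[1.0, 2.0], [3.0, 4.0]]], [[0.5, 0.5]], False)
```

Writing into a caller-supplied buffer, rather than returning a new one, matches the shape of the signature above, where output is an argument and nothing is returned.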
prepare
prepare(
    a1: Tensor,
    a1_scale: Optional[Tensor],
    a2_scale: Optional[Tensor],
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool = False,
) -> tuple[Tensor, Optional[Tensor], Optional[Tensor]]
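The signature suggests that prepare may scale the input activations a1 by the router weights when apply_router_weight_on_input is set, before any quantization controlled by the constructor arguments. The following is a minimal sketch of that scaling step only, using plain Python lists. The name prepare_sketch, the restriction to a single expert per token, and the omission of quantization are all assumptions made for illustration, not documented behaviour.

```python
def prepare_sketch(a1, topk_weights, apply_router_weight_on_input=False):
    """Hypothetical stand-in for the input-side weighting step.

    Assumed layouts (not confirmed by the reference page):
      a1:           [num_tokens][hidden]
      topk_weights: [num_tokens][topk]
    Returns a new list; quantization and scale outputs are left out.
    """
    if not apply_router_weight_on_input:
        # Nothing to apply on the input side: pass activations through.
        return [row[:] for row in a1]

    # Assumption: weighting the input only makes sense when each token
    # is routed to exactly one expert, so there is a single weight.
    if any(len(weights) != 1 for weights in topk_weights):
        raise ValueError("input-side weighting assumes topk == 1")

    return [[weights[0] * x for x in row]
            for row, weights in zip(a1, topk_weights)]


scaled = prepare_sketch([[1.0, 2.0]], [[0.5]], apply_router_weight_on_input=True)
```

Under this reading, a matching finalize step would skip the weighting, since each expert output has effectively been scaled already.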