llmcompressor.modifiers.transform.smoothquant.dynamic_mappings

Dynamic SmoothQuant mapping builders for architectures that need model-aware logic.

Functions:

  • build_qwen3_5_dense_smoothquant_mappings
  • build_qwen3_5_moe_smoothquant_mappings
  • get_layer_mappings_from_model

build_qwen3_5_dense_smoothquant_mappings(
    model: Module,
) -> list[LayerMap]

Build SmoothQuant mappings for dense Qwen3.5 hybrid-attention models.

Dense Qwen3.5 variants expose a regular mlp.gate_proj/mlp.up_proj pair instead of the MoE shared_expert submodule.

Source code in src/llmcompressor/modifiers/transform/smoothquant/dynamic_mappings.py
def build_qwen3_5_dense_smoothquant_mappings(model: Module) -> list[LayerMap]:
    """
    Build SmoothQuant mappings for dense Qwen3.5 hybrid-attention models.

    Dense Qwen3.5 variants expose a regular ``mlp.gate_proj``/``mlp.up_proj``
    pair instead of the MoE ``shared_expert`` submodule.
    """
    return _build_qwen3_5_smoothquant_mappings(
        model,
        mlp_balance_layers=[
            "re:.*mlp\\.gate_proj$",
            "re:.*mlp\\.up_proj$",
        ],
    )
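To make the balance-layer patterns concrete, here is a minimal sketch of which module names they select. It assumes the text after the `re:` prefix is applied as a regular expression from the start of the module name; the module names are hypothetical and chosen only for illustration.

```python
import re

# Patterns copied from the dense builder, with the "re:" prefix removed.
DENSE_MLP_PATTERNS = [r".*mlp\.gate_proj$", r".*mlp\.up_proj$"]

# Hypothetical module names, for illustration only.
module_names = [
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.0.mlp.shared_expert.gate_proj",
]


def matches_any(name, patterns):
    """Return True if any pattern matches the name from its start."""
    return any(re.match(pattern, name) for pattern in patterns)


matched = [name for name in module_names if matches_any(name, DENSE_MLP_PATTERNS)]
print(matched)
```

Under these assumptions the dense patterns pick up only the plain `gate_proj` and `up_proj` modules; the `shared_expert` projection is left for the MoE patterns.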

build_qwen3_5_moe_smoothquant_mappings

build_qwen3_5_moe_smoothquant_mappings(
    model: Module,
) -> list[LayerMap]

Build SmoothQuant mappings for Qwen3.5 MoE hybrid-attention models.

Only full-attention layers expose self_attn q/k/v projections, so the input layernorm regex must be restricted to those layer indices. The shared expert MLP remains safe to smooth with the standard post-attention layernorm mapping.

Source code in src/llmcompressor/modifiers/transform/smoothquant/dynamic_mappings.py
def build_qwen3_5_moe_smoothquant_mappings(model: Module) -> list[LayerMap]:
    """
    Build SmoothQuant mappings for Qwen3.5 MoE hybrid-attention models.

    Only full-attention layers expose self_attn q/k/v projections, so the input
    layernorm regex must be restricted to those layer indices. The shared expert MLP
    remains safe to smooth with the standard post-attention layernorm mapping.
    """
    return _build_qwen3_5_smoothquant_mappings(
        model,
        mlp_balance_layers=[
            "re:.*mlp\\.shared_expert\\.gate_proj$",
            "re:.*mlp\\.shared_expert\\.up_proj$",
        ],
    )
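The private helper `_build_qwen3_5_smoothquant_mappings` is not shown on this page, so the following is only a sketch of one way an input layernorm regex could be restricted to full-attention layer indices. The function name, the `layer_types` list, its `"full_attention"` label, and the `layers.<index>.input_layernorm` naming are all assumptions made for illustration, not the library's implementation.

```python
import re


def full_attention_layernorm_pattern(layer_types):
    """Build a regex matching input_layernorm only in full-attention layers.

    Hypothetical helper: `layer_types` is assumed to hold one label per layer.
    """
    indices = [
        str(index)
        for index, kind in enumerate(layer_types)
        if kind == "full_attention"
    ]
    if not indices:
        raise ValueError("no full-attention layers found")
    # The literal dots around the index group keep "3" from matching "13".
    return r".*layers\.(" + "|".join(indices) + r")\.input_layernorm$"


# Assumed layout: every fourth layer uses full attention (16 layers total).
layer_types = (["linear_attention"] * 3 + ["full_attention"]) * 4
pattern = full_attention_layernorm_pattern(layer_types)

print(pattern)
print(bool(re.match(pattern, "model.layers.3.input_layernorm")))
print(bool(re.match(pattern, "model.layers.13.input_layernorm")))
```

Enumerating the indices explicitly keeps the pattern from touching layers that have no `self_attn` q/k/v projections to balance against.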

get_layer_mappings_from_model

get_layer_mappings_from_model(
    model: Module,
) -> list[LayerMap]

Infer SmoothQuant mappings from a model.

Checks the dynamic mapping registry first for model-aware builders, then falls back to the static architecture registry, then to the default mappings.

Parameters:

  • model (Module) – model instance used to infer mappings

Returns:

  • list[LayerMap] – list of SmoothQuant LayerMap entries for the model

Source code in src/llmcompressor/modifiers/transform/smoothquant/dynamic_mappings.py
def get_layer_mappings_from_model(model: Module) -> list[LayerMap]:
    """
    Infer SmoothQuant mappings from a model.

    Checks the dynamic mapping registry first for model-aware builders, then falls back
    to the static architecture registry, then to the default mappings.

    :param model: model instance used to infer mappings
    :return: list of SmoothQuant LayerMap entries for the model
    """
    architecture = model.__class__.__name__

    if architecture in SMOOTHQUANT_DYNAMIC_MAPPING_REGISTRY:
        return SMOOTHQUANT_DYNAMIC_MAPPING_REGISTRY[architecture](model)

    if architecture in MAPPINGS_REGISTRY:
        return MAPPINGS_REGISTRY[architecture]

    logger.info(
        f"Architecture {architecture} not found in mappings. "
        f"Using default mappings: {DEFAULT_SMOOTHQUANT_MAPPINGS}"
    )
    return DEFAULT_SMOOTHQUANT_MAPPINGS
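The lookup order can be tried out without loading a model. The sketch below mirrors the three-step fallback using toy registries and toy classes; every name in it is a hypothetical stand-in, not part of llmcompressor. Because the key is `model.__class__.__name__`, a subclass with a different name does not inherit its parent's registry entry.

```python
# Toy stand-ins for the three sources consulted by the lookup (hypothetical).
TOY_DYNAMIC_REGISTRY = {
    "ToyHybridModel": lambda model: [f"built for {type(model).__name__}"],
}
TOY_STATIC_REGISTRY = {"ToyStaticModel": ["static mapping"]}
TOY_DEFAULT_MAPPINGS = ["default mapping"]


def resolve_mappings(model):
    """Same fallback order as above: dynamic, then static, then default."""
    architecture = model.__class__.__name__
    if architecture in TOY_DYNAMIC_REGISTRY:
        # Dynamic entries are callables that inspect the model instance.
        return TOY_DYNAMIC_REGISTRY[architecture](model)
    if architecture in TOY_STATIC_REGISTRY:
        # Static entries are precomputed lists returned as-is.
        return TOY_STATIC_REGISTRY[architecture]
    return TOY_DEFAULT_MAPPINGS


class ToyHybridModel:
    pass


class ToyStaticModel:
    pass


class RenamedHybridModel(ToyHybridModel):
    pass


print(resolve_mappings(ToyHybridModel()))
print(resolve_mappings(ToyStaticModel()))
print(resolve_mappings(RenamedHybridModel()))
```

In this sketch the renamed subclass falls through to the default mappings, which is the behavior to expect from an exact class-name lookup.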