Qwen3.5

Quantization examples for the Qwen3.5 family of models, including dense vision-language and sparse MoE variants.

Note: These examples require transformers >= v5, which can be installed with:

uv pip install --upgrade transformers
Once transformers v5 or later is installed, the examples can run end-to-end.
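To confirm the requirement before running an example, a quick check along these lines may help. It is a minimal sketch: the helper names `major_version` and `transformers_is_v5_or_newer` are hypothetical and not part of any library, and the check only compares the major version number reported by the installed package metadata.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading (major) component of a version string such as '5.0.0'."""
    return int(version.split(".")[0])


def transformers_is_v5_or_newer() -> bool:
    """Report whether an installed transformers package has major version >= 5.

    Returns False when transformers is not installed at all.
    """
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return major_version(installed) >= 5


if __name__ == "__main__":
    if transformers_is_v5_or_newer():
        print("transformers >= v5 found")
    else:
        print("transformers >= v5 not found; upgrade before running the examples")
```

Reading the version from package metadata avoids importing transformers itself, so the check stays fast even in environments where the import is slow.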

Pre-quantized Checkpoints

| Model | Format | Hugging Face Link |
|---|---|---|
| Qwen3.5-4B | FP8-dynamic | RedHatAI/Qwen3.5-4B-FP8-dynamic |
| Qwen3.5-4B | W4A16 | RedHatAI/Qwen3.5-4B-quantized.w4a16 |
| Qwen3.5-4B | W8A8 | RedHatAI/Qwen3.5-4B-quantized.w8a8 |
| Qwen3.5-9B | FP8-dynamic | RedHatAI/Qwen3.5-9B-FP8-dynamic |
| Qwen3.5-9B | W4A16 | RedHatAI/Qwen3.5-9B-quantized.w4a16 |
| Qwen3.5-9B | W8A8 | RedHatAI/Qwen3.5-9B-quantized.w8a8 |
| Qwen3.5-35B-A3B | FP8-dynamic | RedHatAI/Qwen3.5-35B-A3B-FP8-dynamic |
| Qwen3.5-122B-A10B | FP8-dynamic | RedHatAI/Qwen3.5-122B-A10B-FP8-dynamic |
| Qwen3.5-122B-A10B | NVFP4 | RedHatAI/Qwen3.5-122B-A10B-NVFP4 |
| Qwen3.5-397B-A17B | FP8-dynamic | RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic |
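
The checkpoints listed above can be looked up by model and format programmatically. The following is a sketch only: `CHECKPOINTS`, `hub_url`, and `download_checkpoint` are hypothetical names introduced for illustration, the URL form assumes each entry is a repository ID on huggingface.co, and the download helper assumes the `huggingface_hub` package is installed.

```python
# Repository IDs copied from the table of pre-quantized checkpoints,
# keyed by (model, format).
CHECKPOINTS = {
    ("Qwen3.5-4B", "FP8-dynamic"): "RedHatAI/Qwen3.5-4B-FP8-dynamic",
    ("Qwen3.5-4B", "W4A16"): "RedHatAI/Qwen3.5-4B-quantized.w4a16",
    ("Qwen3.5-4B", "W8A8"): "RedHatAI/Qwen3.5-4B-quantized.w8a8",
    ("Qwen3.5-9B", "FP8-dynamic"): "RedHatAI/Qwen3.5-9B-FP8-dynamic",
    ("Qwen3.5-9B", "W4A16"): "RedHatAI/Qwen3.5-9B-quantized.w4a16",
    ("Qwen3.5-9B", "W8A8"): "RedHatAI/Qwen3.5-9B-quantized.w8a8",
    ("Qwen3.5-35B-A3B", "FP8-dynamic"): "RedHatAI/Qwen3.5-35B-A3B-FP8-dynamic",
    ("Qwen3.5-122B-A10B", "FP8-dynamic"): "RedHatAI/Qwen3.5-122B-A10B-FP8-dynamic",
    ("Qwen3.5-122B-A10B", "NVFP4"): "RedHatAI/Qwen3.5-122B-A10B-NVFP4",
    ("Qwen3.5-397B-A17B", "FP8-dynamic"): "RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic",
}


def hub_url(model: str, fmt: str) -> str:
    """Build the Hub page URL for a listed checkpoint (assumed URL form)."""
    repo_id = CHECKPOINTS[(model, fmt)]  # raises KeyError for unlisted pairs
    return f"https://huggingface.co/{repo_id}"


def download_checkpoint(model: str, fmt: str) -> str:
    """Download a listed checkpoint and return the local snapshot path.

    Imported lazily so the lookup helpers work without huggingface_hub.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=CHECKPOINTS[(model, fmt)])


if __name__ == "__main__":
    print(hub_url("Qwen3.5-4B", "W4A16"))
```

An explicit mapping is used rather than deriving repository names from a pattern, since the naming differs between formats and not every model is published in every format.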