Fp8 nvfp4 example
DeepSeekV4
Quantizing DeepSeekV4
DeepSeekV4 is currently supported experimentally in LLM Compressor. You can quantize the model using the following steps:
-
Install LLM Compressor from scratch and check out the DeepSeekV4 experimental branch
-
Run the example. You can replace the MODEL_ID with any of
inference-optimization/DSV4-tiny-emptyorRedHatAI/DeepSeek-V4-Flash-BF16 -
Convert the model. This is to account for issues with the DSV4 transformers model definition which may be fixed at a future date
-
Install vllm with the experimental branch. You may want to rebase on main
-
Serve with vLLM