vllm.models.deepseek_v32
DeepSeek V3.2 (deepseek_v32) model — hardware-isolated entry point.
DeepSeek V3.2 introduced the DeepSeek Sparse Attention (DSA) architecture: Multi-head Latent Attention (MLA) plus a "lightning indexer" that selects the top-k tokens for each query, so that MLA attends only to that sparse subset. The same model code serves any DSA checkpoint, including GLM-5.2 (glm_moe_dsa), which reuses this architecture.
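The select-then-attend idea can be illustrated with a minimal sketch in plain Python. This is not the vLLM implementation: the function names (`topk_indices`, `sparse_attend`) are hypothetical, the indexer scores are taken as a precomputed input rather than produced by a learned indexer, and ordinary scaled dot-product attention stands in for MLA. It assumes at least one token is available to select.

```python
import math


def topk_indices(scores, k):
    """Return the positions of the k largest scores, sorted by position."""
    k = min(k, len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_attend(query, keys, values, index_scores, k):
    """Attend only over the top-k tokens chosen by the indexer scores.

    index_scores is a stand-in for the indexer output (one score per token);
    how those scores are computed is outside the scope of this sketch.
    """
    selected = topk_indices(index_scores, k)
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]

    # Numerically stable softmax over the selected tokens only.
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)

    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return selected, out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
    scores = [0.1, 0.9, 0.8, -1.0]  # hypothetical indexer scores
    print(sparse_attend([0.5, 0.5], keys, values, scores, k=2))
```

The point of the split is that the selection step only needs one cheap score per token, while the more expensive attention computation runs over k tokens instead of the full context.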
Modules:
- nvidia
Classes:
- DeepseekV32ForCausalLM – DSA causal LM: DeepSeek V2/V3 orchestration with the DSA backbone.
DeepseekV32ForCausalLM
Bases: DeepseekV2ForCausalLM
DSA causal LM — DeepSeek V2/V3 orchestration with the DSA backbone.
Serves DeepSeek V3.2 and any architecture reusing DSA (e.g. GLM-5.2).