vllm_omni.model_executor.models.voxcpm2
Modules:
| Name | Description |
|---|---|
| minicpm4_hf_compat | fp32 RoPE + MLP matching native VoxCPM2 numerics. |
| minicpm4_paged | MiniCPM4 with PagedAttention + fp32 RoPE/RMSNorm for VoxCPM2. |
| pipeline | VoxCPM2 pipeline topology (frozen). |
| voxcpm2_import_utils | Dynamic import utilities for the native VoxCPM2 package. |
| voxcpm2_talker | VoxCPM2 autoregressive (AR) talker: PagedAttention pipeline with per-request state. |
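Two of these modules compute RoPE in fp32. The page does not say why; one plausible reason, stated here as an assumption, is the precision of the rotary angle `position * inv_freq`. For the first frequency pair `inv_freq` is 1, so the angle equals the token position, and bfloat16 (8 significant bits) cannot represent integers exactly beyond 256. The stdlib-only sketch below emulates float32 and bfloat16 rounding with `struct`; `to_fp32`, `to_bf16`, and `rope_angle` are hypothetical helpers, not functions from `minicpm4_hf_compat`.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 keeps the top 16 bits of the float32 encoding, so we round the
    float32 bit pattern at bit 16. NaN/inf are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # lowest bit that survives truncation
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_angle(position: int, pair_index: int, head_dim: int, base: float = 10000.0) -> float:
    """Standard RoPE angle m * theta_i with theta_i = base ** (-2i / d)."""
    inv_freq = base ** (-2 * pair_index / head_dim)
    return position * inv_freq


# For pair_index 0, inv_freq is 1, so the angle equals the position itself.
angle = rope_angle(1001, 0, 128)
print(to_fp32(angle), to_bf16(angle))  # 1001.0 1000.0
print(math.sin(to_fp32(angle)), math.sin(to_bf16(angle)))  # visibly different sines
```

Near position 1000, bfloat16 values are spaced 4 apart, so neighbouring positions (1001 and 1002 here) collapse onto the same angle. float32 represents integer positions exactly up to 2^24, which is one reason to keep the angle computation in fp32.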
VoxCPM2TalkerForConditionalGeneration
Bases: Module
hf_to_vllm_mapper class-attribute instance-attribute
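The value of `hf_to_vllm_mapper` is not documented here. In vLLM, a class attribute with this name usually holds a `WeightsMapper` that rewrites checkpoint parameter names (by prefix, substring, or suffix) so they match the module tree, for example the `model` and `residual_model` submodules documented on this page. The sketch below covers only prefix renaming, and every prefix in `PREFIX_MAP` is hypothetical rather than VoxCPM2's real checkpoint layout.

```python
from typing import Iterable, Iterator

# Hypothetical checkpoint-prefix -> module-prefix table (illustration only).
PREFIX_MAP: dict[str, str] = {
    "base_lm.": "model.",
    "residual_lm.": "residual_model.",
}


def remap_name(name: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Rewrite the first matching prefix; leave unmatched names untouched."""
    for old, new in prefix_map.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


def remap_names(names: Iterable[str]) -> Iterator[str]:
    """Apply remap_name lazily, as a loader streaming weights might."""
    return (remap_name(n) for n in names)


print(remap_name("base_lm.layers.0.mlp.up_proj.weight"))
# model.layers.0.mlp.up_proj.weight
```

Unmatched names pass through unchanged, so weights for components outside the two language models are left for other loading logic.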
make_empty_intermediate_tensors instance-attribute
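`make_empty_intermediate_tensors` has no documented value either. In vLLM, models that support pipeline parallelism usually set it from a factory that takes the names of the activations passed between stages (commonly `hidden_states` and `residual`) plus the hidden size, and returns a function that allocates zero-filled placeholders. The torch-free sketch below illustrates that closure pattern; `make_placeholder_factory` is a hypothetical name, and lists of floats stand in for tensors.

```python
from typing import Callable

Buffers = dict[str, list[list[float]]]


def make_placeholder_factory(keys: list[str], hidden_size: int) -> Callable[[int], Buffers]:
    """Return a function that allocates zeroed batch_size x hidden_size buffers per key."""

    def make_empty(batch_size: int) -> Buffers:
        # Build each row separately so rows do not alias one another.
        return {
            key: [[0.0] * hidden_size for _ in range(batch_size)]
            for key in keys
        }

    return make_empty


make_empty = make_placeholder_factory(["hidden_states", "residual"], 4)
buffers = make_empty(2)
print(sorted(buffers), len(buffers["hidden_states"]), len(buffers["hidden_states"][0]))
# ['hidden_states', 'residual'] 2 4
```

Capturing `keys` and `hidden_size` in a closure lets the runtime request placeholders by batch size alone, without knowing the model's internals.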
model instance-attribute
```python
model = MiniCPM4PagedForVoxCPM2(
    vllm_config=vllm_config,
    prefix=maybe_prefix(prefix, "model"),
)
```
residual_model instance-attribute
```python
residual_model = MiniCPM4PagedResidualLM(
    vllm_config=vllm_config,
    prefix=maybe_prefix(prefix, "residual_model"),
)
```
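Both submodules receive a prefix built with `maybe_prefix`, so their parameters are namespaced under `model` and `residual_model` (or under `<parent>.model` and `<parent>.residual_model` when the talker itself is nested). The sketch below mirrors the behavior of vLLM's `maybe_prefix` helper as we understand it: join with a dot, or return the bare name when the prefix is empty. Treat it as an illustration rather than a copy of the library code; the parent prefix `"talker"` is hypothetical.

```python
def maybe_prefix(prefix: str, name: str) -> str:
    """Join a parent module prefix and a child name with '.', skipping empty prefixes."""
    return name if not prefix else f"{prefix}.{name}"


# Top-level model: the prefix is empty, so submodule names are used as-is.
print(maybe_prefix("", "model"), maybe_prefix("", "residual_model"))
# model residual_model

# Nested under a (hypothetical) parent prefix:
print(maybe_prefix("talker", "residual_model"))
# talker.residual_model
```

Distinct prefixes keep the two language models' weights from colliding during loading even though both are MiniCPM4-style stacks.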
compute_logits
```python
compute_logits(
    hidden_states: Tensor | OmniOutput,
    sampling_metadata: Any = None,
) -> Tensor | None
```
forward
```python
forward(
    input_ids: Tensor,
    positions: Tensor,
    intermediate_tensors: IntermediateTensors | None = None,
    inputs_embeds: Tensor | None = None,
    **kwargs: Any,
) -> Tensor | IntermediateTensors
```
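The return annotation `Tensor | IntermediateTensors` follows vLLM's pipeline-parallel convention: a stage that is not the last one returns `IntermediateTensors` for the next stage, while the last stage returns hidden states. By the same convention, the first stage uses `inputs_embeds` when given and otherwise embeds `input_ids`. Whether VoxCPM2 actually runs across multiple stages is not stated here. The toy sketch below illustrates the convention with plain lists; `toy_forward`, `FakeIntermediateTensors`, and the doubling "layer" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeIntermediateTensors:
    """Hypothetical stand-in for IntermediateTensors: named activations between stages."""
    tensors: dict[str, list[float]]


def toy_forward(
    input_ids: list[int],
    embed_table: dict[int, float],
    *,
    is_first_rank: bool,
    is_last_rank: bool,
    intermediate_tensors: Optional[FakeIntermediateTensors] = None,
    inputs_embeds: Optional[list[float]] = None,
) -> Union[list[float], FakeIntermediateTensors]:
    if is_first_rank:
        # Precomputed embeddings take priority over an embedding lookup.
        if inputs_embeds is not None:
            hidden = inputs_embeds
        else:
            hidden = [embed_table[t] for t in input_ids]
    else:
        # Later stages start from the previous stage's activations.
        if intermediate_tensors is None:
            raise ValueError("non-first stages need intermediate_tensors")
        hidden = intermediate_tensors.tensors["hidden_states"]
    hidden = [h * 2.0 for h in hidden]  # placeholder for this stage's decoder layers
    if not is_last_rank:
        return FakeIntermediateTensors({"hidden_states": hidden})
    return hidden
```

Returning a different type per stage keeps one `forward` signature usable for every pipeline position; the caller decides what to do based on the stage it runs on.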
make_omni_output
```python
make_omni_output(
    model_outputs: Tensor | OmniOutput, **kwargs: Any
) -> OmniOutput
```
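Both `make_omni_output` and `compute_logits` accept either a plain `Tensor` or an `OmniOutput`. A minimal sketch of that wrap/unwrap pattern follows, assuming wrapping is idempotent and that unwrapping just pulls out the hidden states. `OmniOutputSketch`, its field names, and the `audio_latents` keyword are hypothetical stand-ins, not vllm-omni's actual `OmniOutput` definition.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class OmniOutputSketch:
    """Hypothetical stand-in for OmniOutput; field names are illustrative."""
    hidden_states: Any
    extras: dict[str, Any] = field(default_factory=dict)


def make_omni_output_sketch(
    model_outputs: Union[Any, OmniOutputSketch], **kwargs: Any
) -> OmniOutputSketch:
    # Idempotent: an existing wrapper is returned unchanged (kwargs are ignored then).
    if isinstance(model_outputs, OmniOutputSketch):
        return model_outputs
    return OmniOutputSketch(hidden_states=model_outputs, extras=dict(kwargs))


def unwrap_hidden_states(hidden_states: Union[Any, OmniOutputSketch]) -> Any:
    # What a compute_logits-style method could do before projecting to logits.
    if isinstance(hidden_states, OmniOutputSketch):
        return hidden_states.hidden_states
    return hidden_states


out = make_omni_output_sketch([0.1, 0.2], audio_latents=[1, 2])
print(unwrap_hidden_states(out), out.extras)
# [0.1, 0.2] {'audio_latents': [1, 2]}
```

Accepting both types lets callers pass either raw model output or an already-wrapped result without checking which one they hold.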