vllm_omni.model_executor.models.voxcpm2.voxcpm2_talker
VoxCPM2 AR talker — PagedAttention pipeline with per-request state.
Architecture
MiniCPM4PagedForVoxCPM2 (base_lm, 28 layers, PagedAttention + fp32 RoPE) → FSQ → MiniCPM4PagedResidualLM (8 layers, PagedAttention, no RoPE) → LocDiT (CFM solver) → AudioVAE → 48kHz waveform
VoxCPM2TalkerForConditionalGeneration
Bases: Module
hf_to_vllm_mapper class-attribute instance-attribute
make_empty_intermediate_tensors instance-attribute
model instance-attribute
model = MiniCPM4PagedForVoxCPM2(
    vllm_config=vllm_config,
    prefix=maybe_prefix(prefix, "model"),
)
residual_model instance-attribute
residual_model = MiniCPM4PagedResidualLM(
    vllm_config=vllm_config,
    prefix=maybe_prefix(prefix, "residual_model"),
)
compute_logits
compute_logits(
    hidden_states: Tensor | OmniOutput,
    sampling_metadata: Any = None,
) -> Tensor | None
forward
forward(
    input_ids: Tensor,
    positions: Tensor,
    intermediate_tensors: IntermediateTensors | None = None,
    inputs_embeds: Tensor | None = None,
    **kwargs: Any,
) -> Tensor | IntermediateTensors
make_omni_output
make_omni_output(
    model_outputs: Tensor | OmniOutput, **kwargs: Any
) -> OmniOutput
build_cjk_split_map
Build {multichar_cjk_token_id: [single_char_ids]} from tokenizer vocab.
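The mapping described above can be illustrated with a toy vocabulary. This is only a sketch under stated assumptions, not the module's implementation: the helper names `is_cjk_char` and `build_split_map_sketch` are hypothetical, the vocabulary is assumed to be a plain `{token_string: token_id}` dict, and "CJK" is narrowed here to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The real function may use a different character test and may treat tokens with missing single-character pieces differently.

```python
def is_cjk_char(ch: str) -> bool:
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_split_map_sketch(vocab: dict[str, int]) -> dict[int, list[int]]:
    """Map each multi-character CJK token id to its single-character token ids.

    Hypothetical helper: `vocab` is assumed to be {token_string: token_id}.
    """
    split_map: dict[int, list[int]] = {}
    for token, token_id in vocab.items():
        # Only tokens made entirely of two or more CJK characters qualify.
        if len(token) < 2 or not all(is_cjk_char(ch) for ch in token):
            continue
        # Assumption: skip tokens whose characters are not all in the vocab.
        if all(ch in vocab for ch in token):
            split_map[token_id] = [vocab[ch] for ch in token]
    return split_map


# Toy vocabulary, invented for illustration only.
toy_vocab = {"你": 1, "好": 2, "你好": 3, "hello": 4, "世界": 5, "世": 6}
split_map = build_split_map_sketch(toy_vocab)
print(split_map)
```

In this toy case only token id 3 is mapped, because "世界" has no single-character entry for its second character and "hello" is not CJK.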
build_voxcpm2_prompt
build_voxcpm2_prompt(
    hf_config: Any,
    tokenizer: Any,
    split_map: dict[int, list[int]],
    text: str,
    ref_audio: Any | None = None,
    ref_sr: int | None = None,
    ref_text: str | None = None,
    voice_profile: dict[str, Any] | None = None,
) -> dict[str, Any]
Build a VoxCPM2 prefill prompt whose prompt_token_ids length matches the talker-side prefill length.
Both online serving (serving_speech._build_voxcpm2_prompt) and the offline example build their prompts through this function, so the two lengths stay in sync and the talker-side length assertion never fires.
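Why a split map matters for the prompt length can be shown with a small sketch. It assumes, without confirmation from this page, that multi-character CJK token ids are replaced by their single-character ids before prefill, so the prompt grows by one position for each extra character. The helper `expand_with_split_map` and all ids below are hypothetical, and the real prompt may contain further positions (for example for reference audio) that this sketch ignores.

```python
def expand_with_split_map(
    token_ids: list[int], split_map: dict[int, list[int]]
) -> list[int]:
    """Replace every id found in split_map by its single-character ids."""
    expanded: list[int] = []
    for token_id in token_ids:
        # Ids without an entry are kept unchanged.
        expanded.extend(split_map.get(token_id, [token_id]))
    return expanded


# Hypothetical ids: 3 is a two-character CJK token that splits into 1 and 2.
split_map = {3: [1, 2]}
raw_ids = [4, 3, 7]

prompt_token_ids = expand_with_split_map(raw_ids, split_map)

# The prefill side would see one position per expanded id, so a prompt built
# from raw_ids (length 3) would disagree with the expanded length (4).
assert len(prompt_token_ids) == len(raw_ids) + 1
print(prompt_token_ids)
```

Building the prompt in one shared function avoids each caller having to repeat this kind of expansion and keep it consistent by hand.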