vllm_omni.model_executor.models.higgs_audio_v3.higgs_audio_v3_talker ¶
Stage-0 talker for higgs-audio v3 (Qwen3 backbone, fused multi-codebook).
Architecture:
- Backbone: Qwen3 (~4B, 36 layers, 2560 hidden, GQA 32/8). No DualFFN.
- Fused multi-codebook embedding: [N*V, D] weight, offset lookup, sum across N
- Fused multi-codebook head: same weight (tied), reshape to [L, N, V]
- MusicGen-style delay pattern [0,1,...,7] with BOC/EOC
- Audio feedback: replace continuation-token embedding with fused codebook embed
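The delay pattern in the list above can be illustrated with a minimal sketch. It assumes the usual MusicGen-style layout, in which codebook k is shifted right by k steps, the leading gap is filled with a BOC id and the trailing gap with an EOC id. The function names and the `boc_id`/`eoc_id` values are hypothetical and not taken from this module.

```python
def apply_delay_pattern(codes, boc_id, eoc_id):
    """Shift codebook k right by k steps.

    codes: list of N rows (one per codebook), each of length T.
    Returns N rows of length T + N - 1.
    """
    n = len(codes)
    delayed = []
    for k, row in enumerate(codes):
        # k BOC fillers in front, N - 1 - k EOC fillers behind
        delayed.append([boc_id] * k + list(row) + [eoc_id] * (n - 1 - k))
    return delayed


def revert_delay_pattern(delayed):
    """Undo apply_delay_pattern by slicing each row at its own offset."""
    n = len(delayed)
    t = len(delayed[0]) - (n - 1)
    return [row[k:k + t] for k, row in enumerate(delayed)]


# Three codebooks, three frames; -1 and -2 stand in for BOC and EOC (hypothetical ids).
codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
delayed = apply_delay_pattern(codes, boc_id=-1, eoc_id=-2)
restored = revert_delay_pattern(delayed)
```

With 8 codebooks and delays [0,1,...,7], the same layout would add 7 extra steps to each sequence.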
Weight loading maps from the HF checkpoint's prefixes:
- tied.embedding.text_embedding. -> model.embed_tokens.
- body.layers. -> model.layers.
- body.norm. -> model.norm.
- tied.head.text_head. -> lm_head.
- tied.embedding.modality_embeddings.0.embedding. -> multimodal_embedding.
- tied.embedding.modality_embeddings.0.model. -> skipped (codec for code2wav)
- tied.head.modality_heads.0. -> skipped when tied
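As a rough illustration of the prefix mapping above, the sketch below renames or skips a single checkpoint key. The helper `remap_name` and its `tied` flag are hypothetical; the real loader may be structured differently. How an untied modality head is loaded is not stated here, so the sketch simply passes unrecognised names through unchanged, which is an assumption.

```python
# Prefix pairs copied from the mapping above.
PREFIX_MAP = [
    ("tied.embedding.text_embedding.", "model.embed_tokens."),
    ("body.layers.", "model.layers."),
    ("body.norm.", "model.norm."),
    ("tied.head.text_head.", "lm_head."),
    ("tied.embedding.modality_embeddings.0.embedding.", "multimodal_embedding."),
]
SKIP_ALWAYS = ("tied.embedding.modality_embeddings.0.model.",)  # codec for code2wav
SKIP_WHEN_TIED = ("tied.head.modality_heads.0.",)


def remap_name(name, tied=True):
    """Return the remapped parameter name, or None if the weight is skipped."""
    if name.startswith(SKIP_ALWAYS):
        return None
    if tied and name.startswith(SKIP_WHEN_TIED):
        return None
    for src, dst in PREFIX_MAP:
        if name.startswith(src):
            return dst + name[len(src):]
    # Assumption: anything unrecognised keeps its original name.
    return name


layer_key = remap_name("body.layers.0.self_attn.q_proj.weight")
codec_key = remap_name("tied.embedding.modality_embeddings.0.model.encoder.weight")
head_key = remap_name("tied.head.modality_heads.0.weight", tied=True)
```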
HiggsAudioV3TalkerForConditionalGeneration ¶
Bases: Module
Stage-0 talker for higgs-audio v3.
Wraps a Qwen3Model backbone and the fused multi-codebook modules for TTS generation, with MusicGen-style delay-pattern sampling and audio-feedback embedding.
modality_head instance-attribute ¶
modality_head = HiggsFusedMultiTextHead(
    num_codebooks, codebook_size, hidden_size
)
model instance-attribute ¶
model = Qwen3Model(
    vllm_config=backbone_vllm_config,
    prefix=f"{prefix}.model" if prefix else "model",
)
multimodal_embedding instance-attribute ¶
multimodal_embedding = HiggsFusedMultiTextEmbedding(
    num_codebooks, codebook_size, hidden_size
)
postprocess_uses_hidden_states class-attribute instance-attribute ¶
postprocess_uses_hidden_states: bool = False
postprocess_uses_multimodal_outputs class-attribute instance-attribute ¶
postprocess_uses_multimodal_outputs: bool = False
postprocess_uses_req_infos class-attribute instance-attribute ¶
postprocess_uses_req_infos: bool = False
requires_full_prefix_cached_hidden_states instance-attribute ¶
skips_model_sampler_output_token_history class-attribute instance-attribute ¶
skips_model_sampler_output_token_history: bool = True
supports_omni_query_start_loc class-attribute instance-attribute ¶
supports_omni_query_start_loc: bool = True
forward ¶
forward(
    input_ids: Tensor,
    positions: Tensor,
    intermediate_tensors: Any | None = None,
    inputs_embeds: Tensor | None = None,
    **kwargs: Any,
) -> Tensor
make_omni_output ¶
make_omni_output(
    model_outputs: Tensor | OmniOutput, **kwargs: Any
) -> OmniOutput
postprocess ¶
postprocess(
    hidden_states_slice: Tensor,
    multimodal_outputs: Any = None,
    **req_infos: Any,
) -> dict[str, Any]
Publish per-request audio codes into model_intermediate_buffer.
Called once per request in batch order. Indexes _last_audio_codes by a running cursor (one row per request per step).
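The running-cursor behaviour described above might look roughly like the following sketch. The class `CursorPublisher`, its `begin_step` method, and the `"audio_codes"` dictionary key are all hypothetical. Only the idea of indexing `_last_audio_codes` with a cursor that advances once per request comes from the description.

```python
class CursorPublisher:
    """Toy stand-in for the per-step cursor over _last_audio_codes."""

    def __init__(self):
        self._last_audio_codes = []  # one row per request for the current step
        self._cursor = 0

    def begin_step(self, codes_per_request):
        # Assumption: the sampler stores the rows in batch order and resets the cursor.
        self._last_audio_codes = codes_per_request
        self._cursor = 0

    def postprocess(self):
        # Called once per request, in batch order.
        row = self._last_audio_codes[self._cursor]
        self._cursor += 1
        return {"audio_codes": row}  # hypothetical key name


publisher = CursorPublisher()
publisher.begin_step([[1, 2, 3], [4, 5, 6]])
first = publisher.postprocess()
second = publisher.postprocess()
```

A scheme like this only stays correct if `postprocess` is called exactly once per request and in the same order in which the rows were written.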
sample ¶
Model-owned sampler with delay-pattern audio dispatch.
Mirrors v2's pattern: bias LM logits to force audio continuation, sample multi-codebook codes via the fused head, apply delay pattern, and accumulate per-request state.
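One simple way to force a particular token through logit biasing is to mask every other entry to negative infinity, as in the sketch below. This is a generic illustration rather than the sampler's actual code: `force_token`, `argmax`, and `CONTINUATION_ID` are hypothetical names, and the real bias may be additive instead of a hard mask.

```python
import math


def force_token(logits, token_id):
    """Return a copy of logits in which only token_id remains finite."""
    return [x if i == token_id else -math.inf for i, x in enumerate(logits)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


lm_logits = [2.5, -1.0, 0.3, 4.1]
CONTINUATION_ID = 2  # hypothetical id of the audio continuation token
biased = force_token(lm_logits, CONTINUATION_ID)
chosen = argmax(biased)
```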
HiggsFusedMultiTextEmbedding ¶
Bases: Module
Fused multi-codebook embedding: [N*V, D] weight + offset lookup.
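The offset lookup can be sketched in NumPy as follows. It assumes codebook k occupies rows k*V through (k+1)*V - 1 of the fused weight, and that the per-codebook embeddings are summed across N as the module description states. The function `fused_embed` and the toy sizes are hypothetical.

```python
import numpy as np

N, V, D = 4, 16, 8  # toy sizes: codebooks, codebook size, hidden size
rng = np.random.default_rng(0)
weight = rng.standard_normal((N * V, D)).astype(np.float32)  # fused [N*V, D] table


def fused_embed(codes, weight, codebook_size):
    """codes: [L, N] integer array with entries in [0, V). Returns [L, D]."""
    num_codebooks = codes.shape[1]
    offsets = np.arange(num_codebooks) * codebook_size  # [N], start row of each codebook
    flat_ids = codes + offsets                          # [L, N] indices into the fused table
    return weight[flat_ids].sum(axis=1)                 # gather [L, N, D], then sum over N


codes = rng.integers(0, V, size=(5, N))
emb = fused_embed(codes, weight, V)
```

A single gather over one table replaces N separate embedding lookups.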
HiggsFusedMultiTextHead ¶
Bases: Module
Fused multi-codebook head: [L, D] -> [L, N, V] via one linear.
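A minimal NumPy sketch of a head of this shape: one matrix multiply against an [N*V, D] weight, followed by a reshape to [L, N, V]. It assumes codebook k owns rows k*V through (k+1)*V - 1 of the weight, so the same table could be shared with the embedding when weights are tied. `fused_head` and the toy sizes are hypothetical.

```python
import numpy as np

L, N, V, D = 3, 4, 16, 8  # toy sizes: positions, codebooks, codebook size, hidden size
rng = np.random.default_rng(1)
head_weight = rng.standard_normal((N * V, D)).astype(np.float32)
hidden = rng.standard_normal((L, D)).astype(np.float32)


def fused_head(hidden, weight, num_codebooks, codebook_size):
    """hidden: [L, D], weight: [N*V, D]. Returns logits of shape [L, N, V]."""
    flat_logits = hidden @ weight.T  # single linear projection: [L, N*V]
    return flat_logits.reshape(hidden.shape[0], num_codebooks, codebook_size)


logits = fused_head(hidden, head_weight, N, V)
```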