vllm_omni.model_executor.models.omnivoice.omnivoice_decoder
OmniVoice Decoder (Stage 1) - Audio token to waveform conversion.
Implements the HiggsAudioV2 decode path using transformers' DacModel decoder and a custom RVQ quantizer, compatible with transformers 4.x.
Decode path
audio_codes [B, 8, T]
→ RVQ codebook lookup + project_out → sum → [B, 1024, T]
→ fc2 Linear(1024, 256) → [B, 256, T]
→ DAC acoustic decoder (conv transpose upsampling) → [B, 1, T*960]
→ 24kHz waveform (25fps × 960 samples/frame)
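The shape bookkeeping in the decode path can be checked with a few lines of plain Python. This is only a sketch of the arithmetic stated above; the names `decode_shapes`, `duration_seconds`, and the constants are hypothetical helpers for illustration, not part of the module.

```python
# Constants taken from the decode path description above.
FRAME_RATE_HZ = 25          # token frames per second
SAMPLES_PER_FRAME = 960     # waveform samples produced per token frame
SAMPLE_RATE_HZ = FRAME_RATE_HZ * SAMPLES_PER_FRAME  # 25 * 960 = 24000


def decode_shapes(batch: int, frames: int) -> dict[str, tuple[int, int, int]]:
    """Return the tensor shape expected after each stage of the decode path."""
    return {
        "audio_codes": (batch, 8, frames),        # 8-codebook token IDs
        "rvq_sum": (batch, 1024, frames),         # summed project_out outputs
        "fc2": (batch, 256, frames),              # Linear(1024, 256)
        "waveform": (batch, 1, frames * SAMPLES_PER_FRAME),
    }


def duration_seconds(frames: int) -> float:
    """Audio duration implied by a given number of token frames."""
    return frames / FRAME_RATE_HZ
```

For example, 50 token frames correspond to 2 seconds of audio, i.e. 48000 samples at 24kHz.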
HiggsAudioRVQ
Bases: Module
Residual Vector Quantizer with 8 codebook layers.
quantizers instance-attribute
quantizers = ModuleList(
    [
        HiggsAudioVQLayer(codebook_size, codebook_dim, hidden_size)
        for _ in range(num_quantizers)
    ]
)
HiggsAudioVQLayer
Bases: Module
Single VQ layer: codebook lookup + project_out.
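As a rough illustration of "codebook lookup + project_out" and of how the per-layer outputs are summed, here is a minimal PyTorch sketch. The class names `SketchVQLayer` and `SketchRVQ`, the `decode` method, the default sizes, and the choice of `nn.Linear` for `project_out` are all assumptions made for the example; the actual HiggsAudioVQLayer and HiggsAudioRVQ may be implemented differently.

```python
import torch
from torch import nn


class SketchVQLayer(nn.Module):
    """Hypothetical stand-in for one VQ layer: codebook lookup + project_out."""

    def __init__(self, codebook_size: int, codebook_dim: int, hidden_size: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, codebook_dim)
        # Assumption: project_out is a plain linear map to hidden_size.
        self.project_out = nn.Linear(codebook_dim, hidden_size)

    def decode(self, codes: torch.Tensor) -> torch.Tensor:
        # codes: [B, T] integer IDs -> [B, T, codebook_dim] -> [B, T, hidden_size]
        return self.project_out(self.codebook(codes))


class SketchRVQ(nn.Module):
    """Hypothetical residual VQ decoder: sum of all layers' projected vectors."""

    def __init__(self, num_quantizers: int = 8, codebook_size: int = 1024,
                 codebook_dim: int = 64, hidden_size: int = 1024):
        super().__init__()
        self.quantizers = nn.ModuleList(
            [
                SketchVQLayer(codebook_size, codebook_dim, hidden_size)
                for _ in range(num_quantizers)
            ]
        )

    def decode(self, audio_codes: torch.Tensor) -> torch.Tensor:
        # audio_codes: [B, num_quantizers, T]; layer i decodes codebook i.
        summed = sum(
            layer.decode(audio_codes[:, i])
            for i, layer in enumerate(self.quantizers)
        )
        # Move channels first to match the [B, hidden_size, T] layout.
        return summed.transpose(1, 2)
```

With `hidden_size=1024`, the result has the `[B, 1024, T]` shape that the decode path feeds into `fc2`.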
OmniVoiceDecoder
Bases: Module
OmniVoice Stage 1: Token-to-audio decoder.
Uses the DAC acoustic decoder from transformers together with the custom HiggsAudio RVQ to convert 8-codebook tokens into a 24kHz waveform.
forward
Decode audio tokens to waveform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_codes | Tensor | [B, 8, T] - 8-codebook audio token IDs | required |
Returns:
| Name | Type | Description |
|---|---|---|
| waveform | Tensor | [B, 1, audio_samples] at 24kHz, where audio_samples = T*960 |