vllm_omni.model_executor.models.higgs_audio_v3.higgs_audio_v3_tokenizer
Prompt builder for higgs-audio v3 TTS.
Prompt formats
Zero-shot:
    <|tts|> <|text|> {text tokens} <|audio|>
Voice clone (no ref text):
    <|tts|> <|ref_audio|> [-100]×N <|text|> {text tokens} <|audio|>
Voice clone (with ref text):
    <|tts|> <|ref_text|> {ref text tokens} <|ref_audio|> [-100]×N <|text|> {text tokens} <|audio|>
The -100 placeholders are replaced at prefill time with fused multi-codebook embeddings of the reference audio codes, after those codes have been encoded with the delay pattern.
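The three layouts above differ only in which optional reference segments precede the target text. The sketch below assembles them as a flat list of token IDs. It is a hypothetical helper, not this module's API: the special-token IDs are made-up values, and `num_ref_placeholders` stands in for the N in [-100]×N, whose exact derivation (for example, from the length of the delayed reference codes) is not specified here.

```python
from typing import Optional

# Hypothetical special-token IDs for illustration only; real IDs come from
# the model's tokenizer vocabulary.
SPECIAL_IDS = {
    "<|tts|>": 1000,
    "<|text|>": 1001,
    "<|audio|>": 1002,
    "<|ref_text|>": 1003,
    "<|ref_audio|>": 1004,
}
AUDIO_PLACEHOLDER = -100  # replaced by fused audio embeddings at prefill


def build_prompt(
    text_ids: list[int],
    num_ref_placeholders: Optional[int] = None,
    ref_text_ids: Optional[list[int]] = None,
) -> list[int]:
    """Assemble one of the three prompt layouts (hypothetical helper)."""
    if ref_text_ids is not None and num_ref_placeholders is None:
        raise ValueError("a reference transcript needs reference audio")
    ids = [SPECIAL_IDS["<|tts|>"]]
    if ref_text_ids is not None:
        # Voice clone with ref text: the transcript precedes the audio slot.
        ids += [SPECIAL_IDS["<|ref_text|>"], *ref_text_ids]
    if num_ref_placeholders is not None:
        # Voice clone: reserve N positions for the reference audio.
        ids += [SPECIAL_IDS["<|ref_audio|>"]]
        ids += [AUDIO_PLACEHOLDER] * num_ref_placeholders
    # Every layout ends with the target text and the audio-generation marker.
    ids += [SPECIAL_IDS["<|text|>"], *text_ids, SPECIAL_IDS["<|audio|>"]]
    return ids
```

With these made-up IDs, `build_prompt([7, 8])` yields the zero-shot layout `[1000, 1001, 7, 8, 1002]`. Keeping the placeholders as a sentinel ID means the prompt stays a plain integer sequence until prefill, when the audio embeddings are spliced in.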
HiggsAudioV3TokenizerAdapter
apply_delay_pattern
Apply MusicGen-style delay pattern to raw codes.
Input: [T, N] raw codes (T frames, N codebooks). Output: [T + N - 1, N] delayed codes with BOC/EOC padding.
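Codebook c is delayed by c positions: rows 0..c-1 get BOC, rows c..c+T-1 get real codes, and rows c+T..T+N-2 get EOC (all ranges inclusive). Codebook 0 therefore has no BOC rows, and the last codebook (c = N-1) has no EOC rows.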
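The row ranges above can be checked with a small pure-Python sketch. It is an illustration of the described layout, not the adapter's actual implementation: it works on nested lists instead of tensors, and because the source does not give the BOC/EOC token values, they are passed in as the assumed parameters `boc` and `eoc`.

```python
def apply_delay_pattern(
    codes: list[list[int]], boc: int, eoc: int
) -> list[list[int]]:
    """Shift codebook c down by c rows; pad with BOC before, EOC after."""
    if not codes:
        raise ValueError("codes must contain at least one frame")
    num_frames = len(codes)        # T
    num_codebooks = len(codes[0])  # N
    if any(len(frame) != num_codebooks for frame in codes):
        raise ValueError("every frame must have the same number of codebooks")
    delayed = []
    for row in range(num_frames + num_codebooks - 1):
        out_row = []
        for c in range(num_codebooks):
            src = row - c  # source frame that lands in this row for codebook c
            if src < 0:
                out_row.append(boc)  # rows 0..c-1
            elif src < num_frames:
                out_row.append(codes[src][c])  # rows c..c+T-1
            else:
                out_row.append(eoc)  # rows c+T..T+N-2
        delayed.append(out_row)
    return delayed
```

For T = 2 frames and N = 3 codebooks with BOC = -1 and EOC = -2, the input `[[1, 2, 3], [4, 5, 6]]` becomes `[[1, -1, -1], [4, 2, -1], [-2, 5, 3], [-2, -2, 6]]`: each column is the original codebook shifted down by its index, which is what lets the model predict codebook c for frame t one step after codebook c-1.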