
vllm_omni.model_executor.models.higgs_audio_v3

vllm-omni integration for boson-ai's higgs-audio v3 (two-stage TTS).

Modules:

Name                           Description
configuration_higgs_audio_v3   Configuration class for higgs-audio v3 (HiggsMultimodalQwen3) in vllm-omni.
higgs_audio_v3_code2wav        Stage 1 (codec decoder) for higgs-audio v3.
higgs_audio_v3_talker          Stage-0 talker for higgs-audio v3 (Qwen3 backbone, fused multi-codebook).
higgs_audio_v3_tokenizer       Prompt builder for higgs-audio v3 TTS.
pipeline                       higgs-audio v3 pipeline: Talker (text -> 8-codebook codec) -> Code2Wav (codec -> 24 kHz PCM).
ref_audio_cache                Bounded LRU cache for Higgs Audio v3 reference-audio codec encoding.
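
The ref_audio_cache module is described only as a bounded LRU cache for reference-audio codec encoding. The following is a minimal sketch of that general technique, not the module's actual code: the class name BoundedLRUCache, the get_or_compute method, and the max_entries parameter are all hypothetical, and the encoder is stood in for by an arbitrary callable.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class BoundedLRUCache:
    """Hypothetical bounded LRU cache (illustration only, not ref_audio_cache)."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be >= 1")
        self._max_entries = max_entries
        # Insertion order doubles as recency order: oldest entry first.
        self._entries = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only on a miss."""
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = compute()
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            # Over capacity: evict the least recently used entry.
            self._entries.popitem(last=False)
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._entries

    def __len__(self) -> int:
        return len(self._entries)
```

In a TTS setting the key would plausibly identify the reference audio (for example a content hash) and compute would run the codec encoder, so repeated requests with the same reference voice skip re-encoding. That usage is an assumption about intent, not something this page states.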

HiggsAudioV3Config

Bases: PretrainedConfig

Typed config for higgs-audio v3 (HiggsMultimodalQwen3).

from_pretrained() automatically resolves <|tts|>, <|text|>, <|audio|> and eos_token_id from the checkpoint tokenizer.

audio_continuation_id instance-attribute

audio_continuation_id = audio_continuation_id

audio_encoder_config instance-attribute

audio_encoder_config = audio_encoder_config

audio_hidden_size instance-attribute

audio_hidden_size = int(get('out_dim', hidden_size))

audio_stream_bos_id instance-attribute

audio_stream_bos_id = audio_stream_bos_id

audio_stream_eos_id instance-attribute

audio_stream_eos_id = audio_stream_eos_id

audio_token_id instance-attribute

audio_token_id = audio_token_id

codebook_size instance-attribute

codebook_size = int(get('vocab_size', codebook_size))

frame_rate instance-attribute

frame_rate = frame_rate

hidden_size property

hidden_size: int

is_composition class-attribute instance-attribute

is_composition = True

mel_per_sample instance-attribute

mel_per_sample = mel_per_sample

model_type class-attribute instance-attribute

model_type: str = 'higgs_multimodal_qwen3'

num_codebooks instance-attribute

num_codebooks = int(get('num_codebooks', num_codebooks))

num_real_codes property

num_real_codes: int

sample_rate instance-attribute

sample_rate = sample_rate
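
As a rough illustration of how frame_rate and sample_rate might relate, the sketch below converts a codec frame count into a PCM sample count. It assumes frame_rate means codec frames per second and that it divides sample_rate evenly; neither is confirmed here. The 24 kHz value comes from the pipeline description, while the frame rate of 25 is a purely hypothetical placeholder, and the function name is not part of the package.

```python
def pcm_samples_for_frames(num_frames: int, sample_rate: int, frame_rate: int) -> int:
    """Number of PCM samples covered by num_frames codec frames.

    Assumes frame_rate is in frames per second and divides sample_rate evenly.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if frame_rate <= 0 or sample_rate % frame_rate != 0:
        raise ValueError("frame_rate must be positive and divide sample_rate")
    samples_per_frame = sample_rate // frame_rate
    return num_frames * samples_per_frame


# Hypothetical values: 24 kHz output, 25 codec frames per second (placeholder).
num_samples = pcm_samples_for_frames(num_frames=50, sample_rate=24_000, frame_rate=25)
duration_seconds = num_samples / 24_000
```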

text_config instance-attribute

text_config = _build_text_config(text_config)

text_token_id instance-attribute

text_token_id = text_token_id

tie_modality_embeddings instance-attribute

tie_modality_embeddings = bool(
    get("tie_word_embeddings", True)
)

tts_token_id instance-attribute

tts_token_id = tts_token_id

from_pretrained classmethod

from_pretrained(
    pretrained_model_name_or_path: str, **kwargs: Any
) -> HiggsAudioV3Config

Load config and resolve special token IDs from the checkpoint tokenizer.

Passes the original pretrained_model_name_or_path (local directory or HF repo ID) directly to AutoTokenizer.from_pretrained(), which handles cache hits, downloads, and local paths uniformly. Raises if the tokenizer is missing any required special token.

get_text_config

get_text_config(decoder: bool = False) -> PretrainedConfig

resolve_special_tokens

resolve_special_tokens(model_path: str) -> None

Resolve <|tts|>, <|text|>, <|audio|> and the EOS token ID from the HF tokenizer.

Raises ValueError if any of the three required special tokens is missing from the tokenizer's added vocabulary.
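
The lookup described above can be sketched without loading a real tokenizer. This is an illustration under stated assumptions, not the method's implementation: the function name resolve_special_token_ids is hypothetical, the pairing of each token string with a config attribute name is inferred from the attribute names on this page, and the added vocabulary is passed in as a plain mapping (with a Hugging Face tokenizer it could come from get_added_vocab() and eos_token_id).

```python
from typing import Dict, Mapping, Optional

# Assumed pairing of config attribute names to special-token strings.
REQUIRED_SPECIALS = {
    "tts_token_id": "<|tts|>",
    "text_token_id": "<|text|>",
    "audio_token_id": "<|audio|>",
}


def resolve_special_token_ids(
    added_vocab: Mapping[str, int],
    eos_token_id: Optional[int],
) -> Dict[str, Optional[int]]:
    """Map each required special token to its ID, or raise ValueError."""
    missing = [tok for tok in REQUIRED_SPECIALS.values() if tok not in added_vocab]
    if missing:
        # Fail loudly, listing every missing token rather than only the first.
        raise ValueError(f"tokenizer is missing required special tokens: {missing}")
    resolved: Dict[str, Optional[int]] = {
        attr: added_vocab[tok] for attr, tok in REQUIRED_SPECIALS.items()
    }
    # EOS is carried through as-is; only the three specials are validated here.
    resolved["eos_token_id"] = eos_token_id
    return resolved
```

Collecting all missing tokens before raising makes a mismatched checkpoint easier to diagnose than failing on the first absent token.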