vllm_omni.model_executor.models.higgs_audio_v2

vllm-omni integration for boson-ai's higgs-audio v2 (two-stage TTS).

Modules:

| Name | Description |
| --- | --- |
| configuration_higgs_audio_v2 | Configuration class for higgs-audio v2 in vllm-omni. |
| higgs_audio_decoder | HiggsAudio codec decoder kernel for higgs-audio v2. |
| higgs_audio_v2_code2wav | Stage 1 (codec decoder) for higgs-audio v2. |
| higgs_audio_v2_talker | Stage-0 talker for higgs-audio v2 (vLLM-native, DualFFN-aware). |
| higgs_audio_v2_tokenizer | TTS prompt builder + scope validators for higgs-audio v2. |
| pipeline | higgs-audio v2 pipeline: Talker (text -> 8-codebook codec) -> Code2Wav (codec -> 24 kHz PCM). |
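The two stages can be pictured through their data shapes alone. The sketch below is a shape-bookkeeping illustration only, not the vllm-omni pipeline API: `fake_talker`, `fake_code2wav`, the 25 frames-per-second rate, and the codebook size of 1024 are hypothetical stand-ins. Only the 8 codebooks and the 24 kHz output rate come from the pipeline description.

```python
import random

NUM_CODEBOOKS = 8      # from the pipeline description
SAMPLE_RATE = 24_000   # from the pipeline description (24 kHz PCM)
FRAME_RATE = 25        # ASSUMPTION: codec frames per second, illustrative only
CODEBOOK_SIZE = 1024   # ASSUMPTION: illustrative only


def fake_talker(text: str, num_frames: int) -> list[list[int]]:
    """Hypothetical stand-in for stage 0: text -> num_codebooks rows of codec ids."""
    rng = random.Random(len(text))
    return [
        [rng.randrange(CODEBOOK_SIZE) for _ in range(num_frames)]
        for _ in range(NUM_CODEBOOKS)
    ]


def fake_code2wav(codes: list[list[int]]) -> list[float]:
    """Hypothetical stand-in for stage 1: codec ids -> mono PCM (silence here)."""
    samples_per_frame = SAMPLE_RATE // FRAME_RATE
    num_frames = len(codes[0])
    return [0.0] * (num_frames * samples_per_frame)


codes = fake_talker("hello world", num_frames=50)
pcm = fake_code2wav(codes)
print(len(codes), len(codes[0]))   # 8 50
print(len(pcm) / SAMPLE_RATE)      # 2.0 (seconds of audio)
```

Under these assumed rates, each codec frame expands to 960 PCM samples, so the length of the stage-0 output fixes the duration of the stage-1 waveform.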

HiggsAudioV2Config

Bases: PretrainedConfig

Typed wrapper around the upstream higgs_audio_v2 config.

The field set here is a strict subset of the upstream transformers.models.higgs_audio_v2.HiggsAudioV2Config: only the fields that vllm-omni reads directly are surfaced. Any extra keys in the HF config dict are preserved by PretrainedConfig, so AutoConfig.from_pretrained(...) round-trips correctly.
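The "typed subset plus pass-through" pattern described here can be illustrated without any dependencies. The sketch below uses `BaseConfigSketch` and `TypedConfigSketch`, both hypothetical stand-ins written for this example; they are not the transformers or vllm-omni classes, and the default values and the key `some_upstream_only_key` are made up.

```python
from typing import Any


class BaseConfigSketch:
    """Hypothetical stand-in for a config base class that keeps unknown keys."""

    def __init__(self, **kwargs: Any) -> None:
        # Keys the subclass did not consume are stored as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self) -> dict[str, Any]:
        return dict(vars(self))


class TypedConfigSketch(BaseConfigSketch):
    """Surfaces two typed fields; everything else falls through to the base."""

    def __init__(self, hidden_size: int = 1024, num_codebooks: int = 8, **kwargs: Any) -> None:
        self.hidden_size = hidden_size      # illustrative default
        self.num_codebooks = num_codebooks  # illustrative default
        super().__init__(**kwargs)


raw = {"hidden_size": 2048, "num_codebooks": 8, "some_upstream_only_key": 123}
cfg = TypedConfigSketch(**raw)
restored = TypedConfigSketch(**cfg.to_dict())

print(cfg.some_upstream_only_key)   # 123
print(restored.to_dict() == raw)    # True
```

The design point is that a key the wrapper does not name is still carried through a save and reload cycle, so narrowing the typed surface does not lose upstream settings.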

DEFAULT_ARCHITECTURES class-attribute instance-attribute

DEFAULT_ARCHITECTURES = (
    "HiggsAudioV2ForConditionalGeneration",
)

attention_bias instance-attribute

attention_bias = attention_bias

attention_dropout instance-attribute

attention_dropout = attention_dropout

audio_bos_token_id instance-attribute

audio_bos_token_id = audio_bos_token_id

audio_delay_token_id instance-attribute

audio_delay_token_id = audio_delay_token_id

audio_eos_token_id instance-attribute

audio_eos_token_id = audio_eos_token_id

audio_stream_bos_id instance-attribute

audio_stream_bos_id = audio_stream_bos_id

audio_stream_eos_id instance-attribute

audio_stream_eos_id = audio_stream_eos_id

audio_token_id instance-attribute

audio_token_id = audio_token_id

audio_tokenizer_id instance-attribute

audio_tokenizer_id = audio_tokenizer_id

audio_tokenizer_subdir instance-attribute

audio_tokenizer_subdir = audio_tokenizer_subdir

codebook_output_size property

codebook_output_size: int

codebook_size instance-attribute

codebook_size = codebook_size

frame_rate instance-attribute

frame_rate = frame_rate

head_dim instance-attribute

head_dim = head_dim

hidden_act instance-attribute

hidden_act = hidden_act

hidden_size instance-attribute

hidden_size = hidden_size

initializer_range instance-attribute

initializer_range = initializer_range

intermediate_size instance-attribute

intermediate_size = intermediate_size

keys_to_ignore_at_inference class-attribute instance-attribute

keys_to_ignore_at_inference = ('past_key_values',)

max_position_embeddings instance-attribute

max_position_embeddings = max_position_embeddings

mlp_bias instance-attribute

mlp_bias = mlp_bias

model_type class-attribute instance-attribute

model_type: str = 'higgs_audio_v2'

num_attention_heads instance-attribute

num_attention_heads = num_attention_heads

num_codebooks instance-attribute

num_codebooks = num_codebooks

num_hidden_layers instance-attribute

num_hidden_layers = num_hidden_layers

num_key_value_heads instance-attribute

num_key_value_heads = num_key_value_heads

num_real_codes property

num_real_codes: int

pretraining_tp instance-attribute

pretraining_tp = pretraining_tp

rms_norm_eps instance-attribute

rms_norm_eps = rms_norm_eps

rope_parameters instance-attribute

rope_parameters = dict(rope_parameters)
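The `dict(...)` call in this assignment makes a shallow copy, so later changes to the stored mapping do not leak back into the caller's dictionary. A minimal illustration, with made-up keys and values:

```python
caller_rope = {"rope_type": "default", "rope_theta": 500000.0}  # illustrative values

stored = dict(caller_rope)        # same copy pattern as the assignment above
stored["rope_theta"] = 10000.0    # mutate only the stored copy

print(caller_rope["rope_theta"])  # 500000.0
print(stored is caller_rope)      # False
```

The copy is shallow: any nested mutable value inside the mapping would still be shared between the two dictionaries.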

sample_rate instance-attribute

sample_rate = sample_rate

use_audio_dual_ffn instance-attribute

use_audio_dual_ffn = use_audio_dual_ffn

use_cache instance-attribute

use_cache = use_cache

vocab_size instance-attribute

vocab_size = vocab_size

default_delay_pattern

default_delay_pattern() -> list[int]
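Only the signature of `default_delay_pattern` is documented here. For orientation, the sketch below shows the general delay-pattern technique used by multi-codebook audio language models, assuming that codebook k is delayed by k frames. That assumption is not confirmed by this page, and `delay_pattern_sketch`, `apply_delay`, and the fill ids `-1` and `-2` are hypothetical names and values chosen for the example.

```python
def delay_pattern_sketch(num_codebooks: int) -> list[int]:
    """ASSUMPTION: codebook k is delayed by k frames (not confirmed by the source)."""
    return list(range(num_codebooks))


def apply_delay(
    codes: list[list[int]], delays: list[int], bos_id: int, eos_id: int
) -> list[list[int]]:
    """Shift each codebook row right by its delay, padding with bos/eos ids.

    `codes` holds one row per codebook; every output row has
    num_frames + max(delays) entries.
    """
    max_delay = max(delays)
    out = []
    for row, delay in zip(codes, delays):
        out.append([bos_id] * delay + list(row) + [eos_id] * (max_delay - delay))
    return out


codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
delayed = apply_delay(codes, delay_pattern_sketch(3), bos_id=-1, eos_id=-2)
for row in delayed:
    print(row)
# [10, 11, 12, -2, -2]
# [-1, 20, 21, 22, -2]
# [-1, -1, 30, 31, 32]
```

With such a staggered layout, at each decoding step the later codebooks for a frame can condition on the earlier codebooks for that same frame, which were emitted one or more steps before.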