vllm_omni.transformers_utils.configs.omnivoice

OmniVoice configuration for vLLM-Omni two-stage pipeline.

OmniVoiceConfig

Bases: PretrainedConfig

Configuration for OmniVoice model in vLLM-Omni.

This mirrors the HuggingFace OmniVoiceConfig but adds fields needed for the two-stage serving pipeline.
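Most attributes below are set with `getattr(self, name, default)`, which keeps a value that is already present on the instance and otherwise falls back to the documented default. The following is a minimal sketch of that pattern, assuming the base class has already stored caller-supplied fields as attributes; `SketchConfig` is a hypothetical stand-in, not the real class.

```python
class SketchConfig:
    """Hypothetical stand-in used only to illustrate the getattr-default pattern."""

    def __init__(self, **kwargs):
        # Assumption: the base class stores caller-supplied fields as attributes
        # before the defaults are applied.
        for key, value in kwargs.items():
            setattr(self, key, value)
        # Keep an existing value, otherwise use the documented default.
        self.num_step = getattr(self, "num_step", 32)
        self.guidance_scale = getattr(self, "guidance_scale", 2.0)


default_cfg = SketchConfig()
custom_cfg = SketchConfig(num_step=16)
print(default_cfg.num_step, custom_cfg.num_step, custom_cfg.guidance_scale)
```

In this sketch, overriding `num_step` leaves `guidance_scale` at its default, which is the behaviour the attribute listings suggest.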

audio_codebook_weights instance-attribute

audio_codebook_weights = getattr(
    self, "audio_codebook_weights", [8, 8, 6, 6, 4, 4, 2, 2]
)

audio_mask_id instance-attribute

audio_mask_id = getattr(self, 'audio_mask_id', 1024)

audio_vocab_size instance-attribute

audio_vocab_size = getattr(self, 'audio_vocab_size', 1025)

class_temperature instance-attribute

class_temperature = getattr(self, 'class_temperature', 0.0)

cuda_graph_capture_sizes instance-attribute

cuda_graph_capture_sizes = getattr(
    self,
    "cuda_graph_capture_sizes",
    [128, 192, 256, 320, 384, 448, 512, 640, 768, 1024],
)
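This page does not say how the capture sizes are consumed. A common pattern with CUDA graphs is to pad each input up to the smallest captured size that fits it. The sketch below assumes that pattern; `pick_capture_size` is a hypothetical helper, not part of vLLM-Omni.

```python
from bisect import bisect_left

# Default list taken from the attribute above; it must stay sorted ascending.
CAPTURE_SIZES = [128, 192, 256, 320, 384, 448, 512, 640, 768, 1024]


def pick_capture_size(seq_len, sizes=CAPTURE_SIZES):
    """Return the smallest size >= seq_len, or None if seq_len exceeds every size."""
    idx = bisect_left(sizes, seq_len)
    return sizes[idx] if idx < len(sizes) else None


print(pick_capture_size(200))   # falls into the 256 bucket
print(pick_capture_size(2000))  # None: larger than every captured size
```

Under this assumption, a `None` result would mean the input is too long for any captured graph and needs another execution path.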

enable_cuda_graph instance-attribute

enable_cuda_graph = getattr(
    self,
    "enable_cuda_graph",
    get("OMNIVOICE_CUDA_GRAPH", "1") != "0",
)
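The default above appears to come from an environment variable. The sketch below reproduces the comparison, assuming the bare `get` in the rendered signature is `os.environ.get`; `cuda_graph_enabled` is a hypothetical helper written for illustration.

```python
import os


def cuda_graph_enabled(environ=os.environ):
    """Mimic the default: enabled unless OMNIVOICE_CUDA_GRAPH is exactly "0"."""
    return environ.get("OMNIVOICE_CUDA_GRAPH", "1") != "0"


print(cuda_graph_enabled({}))                                 # unset -> True
print(cuda_graph_enabled({"OMNIVOICE_CUDA_GRAPH": "0"}))      # "0" -> False
print(cuda_graph_enabled({"OMNIVOICE_CUDA_GRAPH": "false"}))  # not "0" -> True
```

Because the check is a string comparison against `"0"`, values such as `"false"` or an empty string would still leave CUDA graphs enabled. An `enable_cuda_graph` value already set on the config would take precedence over the environment variable, since the `getattr` default is only used when the attribute is missing.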

frame_rate instance-attribute

frame_rate = getattr(self, 'frame_rate', 25)

guidance_scale instance-attribute

guidance_scale = getattr(self, 'guidance_scale', 2.0)

head_dim instance-attribute

head_dim = llm_head_dim

hidden_size instance-attribute

hidden_size = llm_hidden_size

layer_penalty_factor instance-attribute

layer_penalty_factor = getattr(
    self, "layer_penalty_factor", 5.0
)

llm_head_dim instance-attribute

llm_head_dim = get(
    "head_dim", llm_hidden_size // llm_num_attention_heads
)
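When no explicit `head_dim` is supplied, the fallback is the integer quotient of the hidden size and the head count. The worked example below assumes `get` is called on a dict-like object holding the LLM settings; the name `llm_config` is hypothetical.

```python
def resolve_head_dim(llm_config):
    """Sketch of the fallback shown above, using a plain dict as the source."""
    llm_hidden_size = llm_config.get("hidden_size", 1024)
    llm_num_attention_heads = llm_config.get("num_attention_heads", 16)
    # An explicit head_dim wins; otherwise derive it from the other two fields.
    return llm_config.get("head_dim", llm_hidden_size // llm_num_attention_heads)


print(resolve_head_dim({}))                 # 1024 // 16 = 64
print(resolve_head_dim({"head_dim": 128}))  # explicit value is kept
```

With the documented defaults the derived value is 64. A checkpoint that specifies its own `head_dim` would override it.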

llm_hidden_size instance-attribute

llm_hidden_size = get('hidden_size', 1024)

llm_intermediate_size instance-attribute

llm_intermediate_size = get('intermediate_size', 3072)

llm_max_position_embeddings instance-attribute

llm_max_position_embeddings = get(
    "max_position_embeddings", 40960
)

llm_num_attention_heads instance-attribute

llm_num_attention_heads = get('num_attention_heads', 16)

llm_num_hidden_layers instance-attribute

llm_num_hidden_layers = get('num_hidden_layers', 28)

llm_num_key_value_heads instance-attribute

llm_num_key_value_heads = get('num_key_value_heads', 8)

llm_rms_norm_eps instance-attribute

llm_rms_norm_eps = get('rms_norm_eps', 1e-06)

llm_rope_theta instance-attribute

llm_rope_theta = get('rope_theta', 1000000.0)

llm_vocab_size instance-attribute

llm_vocab_size = get('vocab_size', 151676)

model_type class-attribute instance-attribute

model_type = 'omnivoice'

num_attention_heads instance-attribute

num_attention_heads = llm_num_attention_heads

num_audio_codebook instance-attribute

num_audio_codebook = getattr(self, 'num_audio_codebook', 8)

num_hidden_layers instance-attribute

num_hidden_layers = llm_num_hidden_layers

num_key_value_heads instance-attribute

num_key_value_heads = llm_num_key_value_heads

num_step instance-attribute

num_step = getattr(self, 'num_step', 32)

position_temperature instance-attribute

position_temperature = getattr(
    self, "position_temperature", 5.0
)

sample_rate instance-attribute

sample_rate = getattr(self, 'sample_rate', 24000)
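Taken together, `sample_rate` and `frame_rate` relate audio token frames to waveform samples. The sketch below assumes `sample_rate` is in Hz and `frame_rate` is in frames per second; this page does not state the units, and the helper name is hypothetical.

```python
sample_rate = 24000  # documented default, assumed to be in Hz
frame_rate = 25      # documented default, assumed to be frames per second

# Waveform samples covered by one frame under these assumptions.
samples_per_frame = sample_rate // frame_rate


def frames_to_seconds(num_frames, rate=frame_rate):
    """Convert a frame count to an audio duration in seconds."""
    return num_frames / rate


print(samples_per_frame)       # 960
print(frames_to_seconds(250))  # 10.0
```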

speculative_config instance-attribute

speculative_config = None

t_shift instance-attribute

t_shift = getattr(self, 't_shift', 0.1)

vocab_size instance-attribute

vocab_size = llm_vocab_size

get_text_config

get_text_config(**kwargs)

Return self so that vLLM reads model dimensions such as num_attention_heads from this top-level config instead of trying to extract a sub-config.
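The behaviour described above can be sketched with a plain class. `TextConfigSketch` is a hypothetical stand-in that does not inherit from `PretrainedConfig`; it only shows that the caller gets the same object back and can read the aliased fields directly.

```python
class TextConfigSketch:
    """Hypothetical stand-in for a config that acts as its own text config."""

    def __init__(self):
        # Top-level aliases of the kind listed on this page.
        self.num_attention_heads = 16
        self.num_hidden_layers = 28

    def get_text_config(self, **kwargs):
        # Ignore any keyword arguments and hand back the top-level config.
        return self


cfg = TextConfigSketch()
text_cfg = cfg.get_text_config()
print(text_cfg is cfg, text_cfg.num_attention_heads)
```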