vllm_omni.transformers_utils.configs.omnivoice ¶
OmniVoice configuration for the vLLM-Omni two-stage pipeline.
OmniVoiceConfig ¶
Bases: PretrainedConfig
Configuration for the OmniVoice model in vLLM-Omni.
This mirrors the Hugging Face OmniVoiceConfig and adds the fields needed for the two-stage serving pipeline.
audio_codebook_weights instance-attribute ¶
audio_codebook_weights = getattr(
    self, "audio_codebook_weights", [8, 8, 6, 6, 4, 4, 2, 2]
)
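The `getattr(self, name, default)` pattern used for these attributes keeps a value that is already present on the instance and falls back to the default otherwise. A minimal sketch of that behavior follows; `BaseConfigSketch` and `OmniVoiceConfigSketch` are hypothetical stand-ins written for illustration, not the real `PretrainedConfig` or `OmniVoiceConfig` classes.

```python
class BaseConfigSketch:
    """Hypothetical stand-in for a base config that stores kwargs as attributes."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class OmniVoiceConfigSketch(BaseConfigSketch):
    """Hypothetical subclass illustrating the getattr-with-default pattern."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Keep a value that was already set; otherwise use the documented default.
        self.audio_codebook_weights = getattr(
            self, "audio_codebook_weights", [8, 8, 6, 6, 4, 4, 2, 2]
        )


default_cfg = OmniVoiceConfigSketch()
custom_cfg = OmniVoiceConfigSketch(audio_codebook_weights=[1] * 8)
```

Here `default_cfg` ends up with the documented default list, while `custom_cfg` keeps the list passed by the caller.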
cuda_graph_capture_sizes instance-attribute ¶
cuda_graph_capture_sizes = getattr(
    self,
    "cuda_graph_capture_sizes",
    [128, 192, 256, 320, 384, 448, 512, 640, 768, 1024],
)
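This page does not say how the capture sizes are consumed. One common approach with CUDA graphs is to round an input length up to the nearest captured size and fall back to eager execution when the input is larger than every captured size. The sketch below assumes that approach; `pick_capture_size` is a hypothetical helper and not part of vLLM-Omni.

```python
from bisect import bisect_left

# Default list copied from the attribute above.
CAPTURE_SIZES = [128, 192, 256, 320, 384, 448, 512, 640, 768, 1024]


def pick_capture_size(seq_len, sizes=CAPTURE_SIZES):
    """Return the smallest capture size >= seq_len, or None if none fits.

    Assumes `sizes` is sorted in ascending order.
    """
    idx = bisect_left(sizes, seq_len)
    return sizes[idx] if idx < len(sizes) else None
```

Under this assumption, a length of 129 would be padded to 192, and a length of 1025 would return `None`, signalling that no captured graph applies.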
enable_cuda_graph instance-attribute ¶
enable_cuda_graph = getattr(
    self,
    "enable_cuda_graph",
    get("OMNIVOICE_CUDA_GRAPH", "1") != "0",
)
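The default expression compares an environment-style lookup of `OMNIVOICE_CUDA_GRAPH` against the string `"0"`. The sketch below assumes that the abbreviated `get` shown above refers to `os.environ.get`, which the rendered page does not confirm; `cuda_graph_enabled_by_default` is a hypothetical helper for illustration.

```python
import os


def cuda_graph_enabled_by_default(environ=None):
    """Evaluate the default toggle against a mapping (os.environ if omitted)."""
    env = os.environ if environ is None else environ
    # Unset means enabled; only the exact string "0" disables the feature.
    return env.get("OMNIVOICE_CUDA_GRAPH", "1") != "0"
```

With this logic, CUDA graphs are enabled when the variable is unset, and values such as `"false"` do not disable them, because only `"0"` is compared. A value set explicitly on the config still takes precedence, because `getattr` returns the existing attribute before the default is considered.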
layer_penalty_factor instance-attribute ¶
layer_penalty_factor = getattr(
    self, "layer_penalty_factor", 5.0
)
llm_head_dim instance-attribute ¶
llm_max_position_embeddings instance-attribute ¶
llm_num_attention_heads instance-attribute ¶
llm_num_key_value_heads instance-attribute ¶
position_temperature instance-attribute ¶
position_temperature = getattr(
    self, "position_temperature", 5.0
)
get_text_config ¶
Return self so that vLLM uses the top-level config (which has num_attention_heads, etc.) instead of trying to extract a sub-config.
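A minimal sketch of this override is shown here. `TopLevelConfigSketch` is a hypothetical stand-in, the attribute value is illustrative only, and the `**kwargs` signature is an assumption made so the method tolerates any arguments a caller might pass.

```python
class TopLevelConfigSketch:
    """Hypothetical config whose text-model fields live at the top level."""

    def __init__(self):
        # Illustrative value only; not taken from the real OmniVoice config.
        self.num_attention_heads = 16

    def get_text_config(self, **kwargs):
        # Return the top-level config itself rather than a nested sub-config.
        return self


cfg = TopLevelConfigSketch()
text_cfg = cfg.get_text_config()
```

Because `text_cfg` is the same object as `cfg`, code that reads `num_attention_heads` from the text config sees the top-level value.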