vllm_omni.transformers_utils.configs.voxcpm2

VoxCPM2Config

Bases: PretrainedConfig

Configuration for VoxCPM2 native AR integration.

The HuggingFace checkpoint stores LM parameters inside a nested lm_config dict. This class hoists them to top-level attributes so that vllm's MiniCPMModel can consume them directly.
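The hoisting step can be pictured with a small standalone sketch. `hoist_lm_params` is a hypothetical helper, not part of vllm_omni, and the numbers are made up; it only mirrors the `get(name, default)` pattern that appears in the attribute listing on this page.

```python
from __future__ import annotations

from typing import Any, Optional


def hoist_lm_params(lm_config: Optional[dict[str, Any]], **defaults: Any) -> dict[str, Any]:
    """Return top-level values, preferring entries found in the nested lm_config."""
    lm = lm_config or {}  # a missing lm_config behaves like an empty dict
    return {name: lm.get(name, default) for name, default in defaults.items()}


# Made-up values for illustration only.
nested = {"hidden_size": 2048, "num_hidden_layers": 28}
flat = hoist_lm_params(nested, hidden_size=1024, num_hidden_layers=24, vocab_size=32000)
print(flat)
```

Keys present in the nested dict win over the keyword defaults; keys absent from it fall back to the default, which is how the listing's `get('hidden_size', hidden_size)` style reads.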

vllm's MiniCPM always applies muP scaling (scale_emb, scale_depth, dim_model_base). VoxCPM2 was trained with use_mup=false, so we neutralise the scalings:

* scale_emb = 1.0
* scale_depth = sqrt(num_hidden_layers) (cancels the division)
* dim_model_base = hidden_size (makes scale_width = 1.0)
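A quick arithmetic check of why these three values act as no-ops. This is a sketch under assumptions: the formulas (embeddings multiplied by scale_emb, residual branches multiplied by scale_depth / sqrt(num_hidden_layers), and scale_width = hidden_size / dim_model_base) are inferred from the description above rather than copied from vllm, and the function names and sizes are hypothetical.

```python
import math


def neutral_mup_scalings(hidden_size: int, num_hidden_layers: int) -> dict:
    """Values that should turn each muP factor into 1.0 (per the description above)."""
    return {
        "scale_emb": 1.0,
        "scale_depth": math.sqrt(num_hidden_layers),
        "dim_model_base": hidden_size,
    }


def effective_factors(cfg: dict, hidden_size: int, num_hidden_layers: int) -> tuple:
    """Compute the multipliers a muP-aware model would apply (assumed formulas)."""
    emb_factor = cfg["scale_emb"]
    residual_factor = cfg["scale_depth"] / math.sqrt(num_hidden_layers)
    scale_width = hidden_size / cfg["dim_model_base"]
    return emb_factor, residual_factor, scale_width


# Hypothetical model sizes, for illustration only.
cfg = neutral_mup_scalings(hidden_size=2048, num_hidden_layers=28)
factors = effective_factors(cfg, hidden_size=2048, num_hidden_layers=28)
print(factors)
```

With all three factors equal to 1.0, the scaling code paths still run but leave the activations unchanged, which matches a checkpoint trained with use_mup=false.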

architecture instance-attribute

architecture = architecture

audio_vae_config instance-attribute

audio_vae_config = audio_vae_config or {}

device instance-attribute

device = device

dim_model_base instance-attribute

dim_model_base = get('dim_model_base', hidden_size)

dit_config instance-attribute

dit_config = dit_config or {}

dtype instance-attribute

dtype = dtype

encoder_config instance-attribute

encoder_config = encoder_config or {}

feat_dim instance-attribute

feat_dim = feat_dim

head_dim instance-attribute

head_dim = kv_channels

hidden_act instance-attribute

hidden_act = 'silu'

hidden_act_param instance-attribute

hidden_act_param = 0.0

hidden_size instance-attribute

hidden_size = get('hidden_size', hidden_size)

intermediate_size instance-attribute

intermediate_size = get(
    "intermediate_size", intermediate_size
)

keys_to_ignore_at_inference class-attribute instance-attribute

keys_to_ignore_at_inference = ['past_key_values']

lm_config instance-attribute

lm_config = lm_config or {}

max_length instance-attribute

max_length = max_length

max_position_embeddings instance-attribute

max_position_embeddings = get(
    "max_position_embeddings", max_position_embeddings
)

model_type class-attribute instance-attribute

model_type = 'voxcpm2'

num_attention_heads instance-attribute

num_attention_heads = get(
    "num_attention_heads", num_attention_heads
)

num_experts instance-attribute

num_experts = 0

num_hidden_layers instance-attribute

num_hidden_layers = get(
    "num_hidden_layers", num_hidden_layers
)

num_key_value_heads instance-attribute

num_key_value_heads = get(
    "num_key_value_heads", num_key_value_heads
)

patch_size instance-attribute

patch_size = patch_size

residual_lm_no_rope instance-attribute

residual_lm_no_rope = residual_lm_no_rope

residual_lm_num_layers instance-attribute

residual_lm_num_layers = residual_lm_num_layers

rms_norm_eps instance-attribute

rms_norm_eps = get('rms_norm_eps', rms_norm_eps)

rope_parameters instance-attribute

rope_parameters = rp

rope_scaling instance-attribute

rope_scaling = dict(raw_rope)

rope_theta instance-attribute

rope_theta = get('rope_theta', rope_theta)

scalar_quantization_latent_dim instance-attribute

scalar_quantization_latent_dim = (
    scalar_quantization_latent_dim
)

scalar_quantization_scale instance-attribute

scalar_quantization_scale = scalar_quantization_scale

scale_depth instance-attribute

scale_depth = get('scale_depth', 1.0)

scale_emb instance-attribute

scale_emb = get('scale_emb', 1.0)

tie_word_embeddings instance-attribute

tie_word_embeddings = False

vocab_size instance-attribute

vocab_size = get('vocab_size', vocab_size)

get_text_config

get_text_config(**kwargs)

Return self as the text config, because the LM attributes are top-level.
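A minimal stand-in illustrates the behaviour. `FlatConfigSketch` is a hypothetical class that does not inherit from PretrainedConfig, and ignoring the keyword arguments is an assumption made for this sketch.

```python
class FlatConfigSketch:
    """Hypothetical stand-in for a config whose LM fields live at the top level."""

    def __init__(self, hidden_size: int, num_hidden_layers: int) -> None:
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers

    def get_text_config(self, **kwargs):
        # There is no nested text sub-config to look up, so return this object.
        # Keyword arguments are accepted and ignored in this sketch.
        return self


cfg = FlatConfigSketch(hidden_size=2048, num_hidden_layers=28)
text_cfg = cfg.get_text_config()
print(text_cfg is cfg, text_cfg.hidden_size)
```

Callers that ask for the text config and then read fields such as hidden_size get the same values they would read from the config itself.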