vllm_omni.diffusion.models.cosmos3.audio_tokenizer.avae ¶
Diffusers-format AVAE audio tokenizer used by Cosmos3 sound generation.
Cosmos3AVAEAudioTokenizer ¶
Bases: Module
Decoder-only AVAE tokenizer for Cosmos3 audio latents.
audio_channels instance-attribute ¶
audio_channels = int(
    _config_get(
        config,
        "dec_out_channels",
        "audio_channels",
        default=2
        if bool(get("stereo", audio_channels == 2))
        else 1,
    )
)
decoder instance-attribute ¶
decoder = OobleckDecoder(
    channels=int(
        _config_get(config, "dec_dim", default=320)
    ),
    input_channels=latent_ch,
    audio_channels=audio_channels,
    upsampling_ratios=list(reversed(dec_strides)),
    channel_multiples=list(
        _config_get(
            config, "dec_c_mults", default=[1, 2, 4, 8, 16]
        )
    ),
)
hop_size instance-attribute ¶
hop_size = int(
    _config_get(
        config,
        "hop_size",
        default=prod(dec_strides)
        if dec_strides
        else hop_size,
    )
)
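The default above ties hop_size to the product of the decoder strides, and the decoder receives those strides in reversed order as its upsampling ratios. Presumably each latent frame therefore expands to hop_size audio samples. A minimal sketch of that arithmetic follows; the stride values and the helper `default_hop_size` are hypothetical and not read from a real Cosmos3 config.

```python
from math import prod

# Hypothetical stride values, chosen for illustration only.
dec_strides = [2, 4, 4, 8, 8]


def default_hop_size(strides, fallback):
    """Mirror the documented default: product of strides, else a fallback."""
    return prod(strides) if strides else fallback


hop_size = default_hop_size(dec_strides, fallback=1)

# The decoder is documented to receive the strides in reversed order.
upsampling_ratios = list(reversed(dec_strides))

# Assumed relationship: one latent frame maps to hop_size waveform samples.
latent_frames = 10
waveform_samples = latent_frames * hop_size

print(hop_size, upsampling_ratios, waveform_samples)
```

With an empty stride list the helper returns the fallback, which corresponds to the `else hop_size` branch in the default expression.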
latent_ch instance-attribute ¶
latent_ch = int(
    _config_get(
        config,
        "vocoder_input_dim",
        "io_channels",
        "latent_ch",
        default=io_channels,
    )
)
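Several attributes above are resolved by passing a list of config keys followed by a default. The body of `_config_get` is not shown on this page, so the following is only a sketch of that lookup pattern, under the assumption that the first key present in the config wins. The helper `first_present` and the sample dictionaries are hypothetical.

```python
def first_present(config, *keys, default=None):
    """Return the value of the first key found in config, else default.

    Assumed behaviour only: the real helper may differ, for example by
    also accepting attribute-style config objects.
    """
    for key in keys:
        if key in config:
            return config[key]
    return default


# Hypothetical configs, for illustration only.
cfg_a = {"io_channels": 64}
cfg_b = {"vocoder_input_dim": 32, "io_channels": 64}

latent_a = int(first_present(cfg_a, "vocoder_input_dim", "io_channels", "latent_ch", default=128))
latent_b = int(first_present(cfg_b, "vocoder_input_dim", "io_channels", "latent_ch", default=128))
latent_c = int(first_present({}, "vocoder_input_dim", "io_channels", "latent_ch", default=128))

print(latent_a, latent_b, latent_c)
```

Under this assumption, key order expresses precedence: an earlier alias overrides a later one when both are present.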
sample_rate instance-attribute ¶
tanh_clamp instance-attribute ¶
tanh_input_scale instance-attribute ¶
tanh_output_scale instance-attribute ¶
OobleckDecoder ¶
OobleckDecoderBlock ¶
OobleckResidualUnit ¶
Bases: Module
Residual unit used by the diffusers Oobleck decoder.
conv1 instance-attribute ¶
conv1 = weight_norm(
    Conv1d(
        dimension,
        dimension,
        kernel_size=7,
        dilation=dilation,
        padding=pad,
    )
)
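The value of `pad` is not shown on this page. Assuming it follows the usual "same" padding rule for a dilated kernel, pad = dilation * (kernel_size - 1) // 2, the convolution preserves sequence length, which a residual connection requires. The sketch below checks this with the standard Conv1d output-length formula; the function names and the dilation values are illustrative.

```python
def same_padding(kernel_size, dilation):
    """Padding that keeps length unchanged for an odd kernel at stride 1."""
    return dilation * (kernel_size - 1) // 2


def conv1d_out_len(length, kernel_size, dilation, padding, stride=1):
    """Standard Conv1d output-length formula."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Illustrative dilations; kernel_size=7 matches the attribute shown above.
results = {}
for dilation in (1, 3, 9):
    pad = same_padding(7, dilation)
    results[dilation] = (pad, conv1d_out_len(1000, 7, dilation, pad))

print(results)
```

If `pad` were computed differently, the input and output lengths would no longer match and the residual sum would need cropping or extra padding.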
Snake1d ¶
Bases: Module
One-dimensional Snake activation matching diffusers' Oobleck layout.
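For orientation, the Snake activation is commonly written as x + (1/beta) * sin^2(alpha * x), with learnable per-channel alpha and beta. The sketch below evaluates that formula on plain Python floats. It assumes log-scale parameters that are exponentiated before use and a small epsilon in the denominator, which is how the diffusers Oobleck layer is understood to behave; the function name `snake` is hypothetical, and the real module operates on tensors of shape (batch, channels, time).

```python
import math


def snake(x, log_alpha=0.0, log_beta=0.0, eps=1e-9):
    """Scalar Snake activation: x + sin(alpha * x)**2 / (beta + eps).

    Assumption: alpha and beta are stored in log scale, so zero-initialised
    parameters correspond to alpha = beta = 1.
    """
    alpha = math.exp(log_alpha)
    beta = math.exp(log_beta)
    return x + math.sin(alpha * x) ** 2 / (beta + eps)


# The periodic term vanishes at zero and is never negative elsewhere.
samples = [snake(v) for v in (-1.0, 0.0, 0.5, math.pi / 2)]
print(samples)
```

Because the sine term is squared, the output is always at least x, and the activation reduces to the identity wherever alpha * x is a multiple of pi.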