vllm_omni.diffusion.models.cosmos3.sound_tokenizer ¶
Cosmos3 sound tokenizer integration.
DEFAULT_SOUND_LATENT_FPS module-attribute ¶
DEFAULT_SOUND_LATENT_FPS = (
    DEFAULT_SOUND_SAMPLE_RATE / DEFAULT_SOUND_HOP_SIZE
)
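The default latent frame rate is the audio sample rate divided by the hop size, meaning one latent frame spans `hop_size` audio samples. The sketch below works through that arithmetic. The numeric values and both helper names are assumptions chosen for illustration; the module's actual `DEFAULT_SOUND_SAMPLE_RATE` and `DEFAULT_SOUND_HOP_SIZE` are not shown on this page and may differ.

```python
# Hypothetical values for illustration only; not the module's real constants.
EXAMPLE_SAMPLE_RATE = 48_000  # audio samples per second (assumed)
EXAMPLE_HOP_SIZE = 1_920      # audio samples per latent frame (assumed)


def latent_fps(sample_rate: int, hop_size: int) -> float:
    """Latent frames per second, using the same ratio as DEFAULT_SOUND_LATENT_FPS."""
    if hop_size <= 0:
        raise ValueError("hop_size must be positive")
    return sample_rate / hop_size


def latent_frames_to_seconds(num_frames: int, sample_rate: int, hop_size: int) -> float:
    """Audio duration covered by num_frames latent frames."""
    return num_frames * hop_size / sample_rate


example_fps = latent_fps(EXAMPLE_SAMPLE_RATE, EXAMPLE_HOP_SIZE)
example_seconds = latent_frames_to_seconds(50, EXAMPLE_SAMPLE_RATE, EXAMPLE_HOP_SIZE)
```

With these assumed values, the ratio gives 25 latent frames per second, so 50 latent frames cover 2 seconds of audio.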
SOUND_TOKENIZER_CHECKPOINT_NAME module-attribute ¶
SOUND_TOKENIZER_COMPONENT_NAME module-attribute ¶
Cosmos3SoundTokenizer ¶
Thin adapter around the local AVAE tokenizer implementation.
audio_channels instance-attribute ¶
audio_channels = int(
    getattr(
        tokenizer, "audio_channels", DEFAULT_SOUND_CHANNELS
    )
)
hop_size instance-attribute ¶
hop_size = int(
    getattr(
        tokenizer,
        "temporal_compression_factor",
        DEFAULT_SOUND_HOP_SIZE,
    )
)
latent_ch instance-attribute ¶
latent_ch = int(
    getattr(tokenizer, "latent_ch", DEFAULT_SOUND_DIM)
)
sample_rate instance-attribute ¶
sample_rate = int(
    getattr(
        tokenizer, "sample_rate", DEFAULT_SOUND_SAMPLE_RATE
    )
)
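The four instance attributes above share one pattern: read a value from the wrapped tokenizer if it exposes it, fall back to a module default otherwise, and coerce the result to `int`. Note that `hop_size` is read from the tokenizer's `temporal_compression_factor` attribute rather than from an attribute named `hop_size`. The following is a minimal sketch of that pattern, not the real class: `AdapterSketch` is a hypothetical stand-in, the default values are assumed, and `SimpleNamespace` objects substitute for the AVAE tokenizer.

```python
from types import SimpleNamespace

# Hypothetical fallback values; the module's real DEFAULT_SOUND_* constants may differ.
DEFAULT_SOUND_CHANNELS = 2
DEFAULT_SOUND_HOP_SIZE = 1_920
DEFAULT_SOUND_DIM = 64
DEFAULT_SOUND_SAMPLE_RATE = 48_000


class AdapterSketch:
    """Illustrative stand-in for the attribute handling, not Cosmos3SoundTokenizer."""

    def __init__(self, tokenizer: object) -> None:
        self.audio_channels = int(
            getattr(tokenizer, "audio_channels", DEFAULT_SOUND_CHANNELS)
        )
        # hop_size mirrors the tokenizer's temporal_compression_factor.
        self.hop_size = int(
            getattr(tokenizer, "temporal_compression_factor", DEFAULT_SOUND_HOP_SIZE)
        )
        self.latent_ch = int(getattr(tokenizer, "latent_ch", DEFAULT_SOUND_DIM))
        self.sample_rate = int(
            getattr(tokenizer, "sample_rate", DEFAULT_SOUND_SAMPLE_RATE)
        )


# A tokenizer exposing none of the attributes: every field falls back to its default.
bare = AdapterSketch(SimpleNamespace())

# A tokenizer exposing some attributes: those override the defaults, and int()
# normalizes a float sample rate to an integer.
custom = AdapterSketch(
    SimpleNamespace(sample_rate=44100.0, temporal_compression_factor=960)
)
```

In this sketch, `bare` carries only the assumed defaults, while `custom` takes `sample_rate` and `hop_size` from the stand-in tokenizer and keeps the defaults for the remaining fields.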
decode ¶
Decode sound latents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| latents | Tensor | | required |
Returns:
| Type | Description |
|---|---|
| Tensor | |
| Tensor | |