vllm_omni.model_executor.models.cosyvoice3.utils ¶
concat_text_with_prompt_ids ¶
concat_text_with_prompt_ids(
text: Tensor,
text_len: Tensor,
prompt_text: Tensor,
prompt_text_len: Tensor,
) -> tuple[Tensor, Tensor]
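This page gives no description for this function. Judging only by its name and signature, one plausible reading is that it prepends each item's prompt token ids to its text token ids and re-pads the batch. The sketch below illustrates that reading. The function name `concat_with_prompt_sketch`, the prompt-first ordering, the `(B, T)` input layout, and the padding value of 0 are all assumptions, not the library's confirmed behavior.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def concat_with_prompt_sketch(
    text: torch.Tensor,             # (B, T_text) padded token ids (assumed layout)
    text_len: torch.Tensor,         # (B,) valid lengths of `text`
    prompt_text: torch.Tensor,      # (B, T_prompt) padded prompt token ids
    prompt_text_len: torch.Tensor,  # (B,) valid lengths of `prompt_text`
) -> tuple[torch.Tensor, torch.Tensor]:
    merged = []
    for i in range(text.size(0)):
        # Strip the padding from both parts before joining them.
        prompt_ids = prompt_text[i, : int(prompt_text_len[i])]
        text_ids = text[i, : int(text_len[i])]
        # Assumption: the prompt comes first, followed by the text.
        merged.append(torch.cat([prompt_ids, text_ids], dim=0))
    merged_len = torch.tensor([m.size(0) for m in merged], dtype=torch.long)
    # Re-pad to the longest merged sequence (padding value 0 is an assumption).
    return pad_sequence(merged, batch_first=True, padding_value=0), merged_len


text = torch.tensor([[11, 12, 13], [21, 22, 0]])
text_len = torch.tensor([3, 2])
prompt_text = torch.tensor([[1, 2], [3, 0]])
prompt_text_len = torch.tensor([2, 1])

ids, ids_len = concat_with_prompt_sketch(text, text_len, prompt_text, prompt_text_len)
print(ids.tolist())      # [[1, 2, 11, 12, 13], [3, 21, 22, 0, 0]]
print(ids_len.tolist())  # [5, 3]
```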
log_mel_spectrogram ¶
log_mel_spectrogram(
audio: str | ndarray | Tensor,
n_mels: int = 80,
padding: int = 0,
device: str | device | None = None,
)
Compute the log-Mel spectrogram of the given audio.
Parameters¶
audio: Union[str, np.ndarray, torch.Tensor], shape = (*) Path to an audio file, or a NumPy array or Tensor containing the audio waveform sampled at 16 kHz
n_mels: int
The number of Mel-frequency filters; only 80 and 128 are supported
padding: int
Number of zero samples to pad on the right
device: Optional[Union[str, torch.device]]
If given, the audio tensor is moved to this device before the STFT
Returns¶
torch.Tensor, shape = (n_mels, n_frames) A Tensor that contains the Mel spectrogram
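To make the pipeline concrete, here is a minimal sketch of a log-Mel computation: right-pad, STFT, power spectrum, Mel projection, log. It is not the library's implementation. The 400-sample window and 160-sample hop (25 ms and 10 ms at 16 kHz), the triangular HTK-scale filterbank, and the helper names are assumptions, and any dynamic-range clamping or normalization the real function may apply is omitted.

```python
import math

import torch
import torch.nn.functional as F

SAMPLE_RATE = 16000  # stated in the parameter description above
N_FFT = 400          # assumption: 25 ms analysis window
HOP_LENGTH = 160     # assumption: 10 ms hop


def hz_to_mel(freq: torch.Tensor) -> torch.Tensor:
    return 2595.0 * torch.log10(1.0 + freq / 700.0)


def mel_to_hz(mel: torch.Tensor) -> torch.Tensor:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_mel_filters(n_mels: int) -> torch.Tensor:
    """Illustrative (n_mels, N_FFT // 2 + 1) triangular filterbank."""
    fft_freqs = torch.linspace(0.0, SAMPLE_RATE / 2, N_FFT // 2 + 1)
    max_mel = hz_to_mel(torch.tensor(SAMPLE_RATE / 2)).item()
    hz_edges = mel_to_hz(torch.linspace(0.0, max_mel, n_mels + 2))
    lower, center, upper = hz_edges[:-2, None], hz_edges[1:-1, None], hz_edges[2:, None]
    rising = (fft_freqs[None, :] - lower) / (center - lower)
    falling = (upper - fft_freqs[None, :]) / (upper - center)
    # Each row rises from `lower` to `center`, then falls to `upper`; zero elsewhere.
    return torch.clamp(torch.minimum(rising, falling), min=0.0)


def log_mel_sketch(audio: torch.Tensor, n_mels: int = 80, padding: int = 0) -> torch.Tensor:
    if padding > 0:
        audio = F.pad(audio, (0, padding))  # zero samples appended on the right
    window = torch.hann_window(N_FFT)
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
    power = stft.abs() ** 2                        # (N_FFT // 2 + 1, n_frames)
    mel = triangular_mel_filters(n_mels) @ power   # (n_mels, n_frames)
    return torch.clamp(mel, min=1e-10).log10()     # floor avoids log(0)


# One second of a 440 Hz tone at 16 kHz.
audio = torch.sin(2 * math.pi * 440.0 * torch.arange(SAMPLE_RATE) / SAMPLE_RATE)
spec = log_mel_sketch(audio)
print(spec.shape)  # torch.Size([80, 101])
```

With `torch.stft`'s default centered framing, a signal of `L` samples yields `1 + L // HOP_LENGTH` frames, which is why one second of audio produces 101 frames here.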
make_pad_mask ¶
make_pad_mask(lengths: Tensor, max_len: int = 0) -> Tensor
Make a mask tensor containing the indices of the padded part.
See description of make_non_pad_mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lengths | Tensor | Batch of lengths (B,). | required |
Returns: torch.Tensor: Mask tensor containing indices of padded part.
Examples:
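The example body is missing on this page. As a stand-in, the following sketch reimplements the described behavior under the name `make_pad_mask_sketch` (a hypothetical name) and shows the resulting mask. Treating `max_len <= 0` as "use the longest length in the batch" and returning a boolean `(B, T)` tensor are assumptions inferred from the signature.

```python
import torch


def make_pad_mask_sketch(lengths: torch.Tensor, max_len: int = 0) -> torch.Tensor:
    """Return a (B, T) boolean mask that is True at padded positions."""
    batch_size = lengths.size(0)
    # Assumption: a non-positive max_len falls back to the longest sequence.
    max_len = max_len if max_len > 0 else int(lengths.max().item())
    positions = torch.arange(max_len, device=lengths.device)
    positions = positions.unsqueeze(0).expand(batch_size, max_len)
    # A position is padding when its index is at or beyond the sequence length.
    return positions >= lengths.unsqueeze(1)


lengths = torch.tensor([5, 3, 2])
mask = make_pad_mask_sketch(lengths)
print(mask.int().tolist())
# [[0, 0, 0, 0, 0],
#  [0, 0, 0, 1, 1],
#  [0, 0, 1, 1, 1]]
```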
mel_filters cached ¶
mel_filters(device, n_mels: int) -> Tensor
Compute mel filterbank matrix for projecting STFT into a Mel spectrogram.
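The `cached` label suggests that results are memoized per argument combination. A minimal sketch of that pattern with `functools.lru_cache` follows. The function name `cached_filters_sketch`, the 201 frequency bins, and the zero-filled placeholder matrix are assumptions for illustration only; a real implementation would build or load the actual filterbank at that point.

```python
from functools import lru_cache

import torch

build_calls = 0  # counts how often the filterbank is actually built


@lru_cache(maxsize=None)
def cached_filters_sketch(device: str, n_mels: int) -> torch.Tensor:
    global build_calls
    build_calls += 1
    n_freqs = 201  # assumption: N_FFT // 2 + 1 with N_FFT = 400
    # Placeholder values standing in for the real mel filterbank.
    return torch.zeros(n_mels, n_freqs, device=device)


a = cached_filters_sketch("cpu", 80)
b = cached_filters_sketch("cpu", 80)   # cache hit: same tensor object returned
c = cached_filters_sketch("cpu", 128)  # new arguments: built once more
print(a is b, build_calls)  # True 2
```

Because the cache hands back the same tensor object on every hit, callers should treat the returned filterbank as read-only.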