vllm_omni.model_executor.models.cosyvoice3.utils ¶
concat_text_with_prompt_ids ¶
concat_text_with_prompt_ids(
text: Tensor,
text_len: Tensor,
prompt_text: Tensor,
prompt_text_len: Tensor,
) -> tuple[Tensor, Tensor]
extract_spk_embedding_trt ¶
TensorRT counterpart of extract_spk_embedding.
Identical fbank front-end; the campplus forward runs on a prebuilt TensorRT engine (GPU) instead of the CPU ONNX-Runtime session. Returns the same [1, 192] embedding tensor on device.
log_mel_spectrogram ¶
log_mel_spectrogram(
audio: str | ndarray | Tensor,
n_mels: int = 80,
padding: int = 0,
device: str | device | None = None,
)
Compute the log-Mel spectrogram of the input audio.
Parameters¶
audio: Union[str, np.ndarray, torch.Tensor], shape = (*)
The path to an audio file, or a NumPy array or Tensor containing the audio waveform sampled at 16 kHz
n_mels: int
The number of Mel-frequency filters; only 80 and 128 are supported
padding: int
Number of zero samples to pad to the right
device: Optional[Union[str, torch.device]]
If given, the audio tensor is moved to this device before STFT
Returns¶
torch.Tensor, shape = (n_mels, n_frames)
A Tensor that contains the Mel spectrogram
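The parameters above can be illustrated with a minimal sketch. It assumes Whisper-style settings (a 400-point FFT, a hop of 160 samples, dropping the final frame, and an 8-unit dynamic-range clamp), none of which are stated on this page. `log_mel_sketch` is a hypothetical stand-in rather than the library function, and the filterbank is a random placeholder with the right shape, not a real Mel matrix.

```python
import torch

# Assumed Whisper-style constants; this page does not state them.
N_FFT = 400
HOP_LENGTH = 160


def log_mel_sketch(audio: torch.Tensor, filters: torch.Tensor, padding: int = 0) -> torch.Tensor:
    """Hypothetical stand-in: 1-D 16 kHz waveform -> (n_mels, n_frames) log-Mel."""
    if padding > 0:
        # Zero samples are appended on the right, as the `padding` parameter describes.
        audio = torch.nn.functional.pad(audio, (0, padding))
    window = torch.hann_window(N_FFT, device=audio.device)
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
    # Power spectrum; dropping the final frame is an assumed convention.
    magnitudes = stft[..., :-1].abs() ** 2
    mel_spec = filters @ magnitudes  # (n_mels, n_freqs) @ (n_freqs, n_frames)
    log_spec = torch.clamp(mel_spec, min=1e-10).log10()
    # Limit the dynamic range to 8 log10 units below the peak, then rescale.
    log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0


# Placeholder filterbank of shape (n_mels, N_FFT // 2 + 1); values are random.
filters = torch.rand(80, N_FFT // 2 + 1)
mel = log_mel_sketch(torch.randn(16000), filters)
print(mel.shape)
```

With one second of 16 kHz audio and these assumed settings, the result has one frame per 10 ms hop, which is where the `(n_mels, n_frames)` return shape comes from.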
make_pad_mask ¶
make_pad_mask(lengths: Tensor, max_len: int = 0) -> Tensor
Make mask tensor containing indices of padded part.
See description of make_non_pad_mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lengths | Tensor | Batch of lengths (B,). | required |
Returns: torch.Tensor: Mask tensor containing indices of padded part.
Examples:
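A minimal sketch of the masking behavior, assuming that `True` marks padded positions and that `max_len = 0` means "use the longest length in the batch". `pad_mask_sketch` is a hypothetical stand-in for illustration, not the library function itself.

```python
import torch


def pad_mask_sketch(lengths: torch.Tensor, max_len: int = 0) -> torch.Tensor:
    """Hypothetical stand-in: True where a position lies beyond the sequence length."""
    # Assumption: max_len == 0 means "use the longest length in the batch".
    max_len = max_len if max_len > 0 else int(lengths.max().item())
    positions = torch.arange(max_len, device=lengths.device)  # (max_len,)
    # Broadcast (1, max_len) against (B, 1) to get a (B, max_len) boolean mask.
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)


mask = pad_mask_sketch(torch.tensor([5, 3, 2]))
print(mask.int().tolist())
# [[0, 0, 0, 0, 0], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1]]
```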
mel_filters cached ¶
mel_filters(device, n_mels: int) -> Tensor
Compute mel filterbank matrix for projecting STFT into a Mel spectrogram.
mel_spectrogram ¶
unpad_prompt_conditioning ¶
Drop right-padding from per-request prompt conditioning.
The talker emits speech_token [1, T] possibly right-padded to the batch max, plus the true speech_token_len; speech_feat is at the 2:1 mel:token ratio ([1, 2T, F]). This trims both to the real length. Called in the (eager) code2wav stage where a host read of the length is allowed. Returns (speech_token, speech_feat) unchanged when length is unavailable.
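The trimming described above can be sketched as follows. The function name `unpad_sketch`, its argument order, and the feature width `F = 80` are assumptions made for illustration; only the shapes `[1, T]` and `[1, 2T, F]` and the 2:1 ratio come from the description.

```python
import torch


def unpad_sketch(speech_token, speech_feat, speech_token_len=None):
    """Hypothetical stand-in; argument names and order are assumptions."""
    if speech_token_len is None:
        # Length unavailable: return both tensors unchanged.
        return speech_token, speech_feat
    # Host read of the length; acceptable only in an eager stage.
    n = int(speech_token_len.item())
    # Tokens are [1, T]; features are [1, 2T, F], so keep 2 * n feature frames.
    return speech_token[:, :n], speech_feat[:, : 2 * n, :]


tok = torch.zeros(1, 10, dtype=torch.long)  # padded to a batch max of T = 10
feat = torch.zeros(1, 20, 80)               # 2T frames, F = 80 (illustrative)
tok_u, feat_u = unpad_sketch(tok, feat, torch.tensor([7]))
print(tok_u.shape, feat_u.shape)
```

Calling `.item()` forces a device-to-host synchronization, which is presumably why the description restricts the trim to the eager code2wav stage.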