vllm_omni.model_executor.models.indextts2.utils.common
de_tokenized_by_CJK_char
Example
input = "你 好 世 界 是 HELLO WORLD 的 中 文"
output = "你好世界是 hello world 的中文"
Example with do_lower_case
input = "SEE YOU!"
output = "see you!"
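The two examples above can be reproduced with a short, self-contained sketch. This is not the library's implementation: the name `detokenize_cjk_sketch` is hypothetical, and the CJK range is narrowed to the CJK Unified Ideographs block as a simplifying assumption. It only illustrates the idea of dropping spaces between adjacent CJK characters while leaving Latin words separated.

```python
import re

# Assumption: only the CJK Unified Ideographs block counts as "CJK" here.
# The real function may cover a wider set of ranges.
_CJK = r"\u4e00-\u9fff"


def detokenize_cjk_sketch(line: str, do_lower_case: bool = False) -> str:
    """Hypothetical sketch: remove whitespace between adjacent CJK characters."""
    # The lookarounds keep the CJK characters and delete only the gap between them.
    merged = re.sub(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])", "", line.strip())
    return merged.lower() if do_lower_case else merged


print(detokenize_cjk_sketch("你 好 世 界 是 HELLO WORLD 的 中 文", do_lower_case=True))
# 你好世界是 hello world 的中文
print(detokenize_cjk_sketch("SEE YOU!", do_lower_case=True))
# see you!
```

Spaces next to a Latin word are kept in this sketch, which is why `hello world` stays separated from the surrounding Chinese text.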
make_pad_mask
make_pad_mask(lengths: Tensor, max_len: int = 0) -> Tensor
Make mask tensor containing indices of padded part.
See description of make_non_pad_mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lengths | Tensor | Batch of lengths (B,). | required |
Returns: torch.Tensor: Mask tensor containing indices of padded part.
Examples:
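A minimal, torch-free sketch of the mask semantics, assuming the usual convention that padded positions are marked true and that `max_len <= 0` means "use the longest length in the batch". The helper name `pad_mask_sketch` is hypothetical and works on plain Python lists rather than tensors.

```python
def pad_mask_sketch(lengths: list[int], max_len: int = 0) -> list[list[bool]]:
    """Hypothetical list-based illustration of a padding mask."""
    # Assumption: a non-positive max_len falls back to the longest sequence.
    width = max_len if max_len > 0 else max(lengths)
    # Position j in row i is padding when j >= lengths[i].
    return [[j >= n for j in range(width)] for n in lengths]


for row in pad_mask_sketch([5, 3, 2]):
    print([int(v) for v in row])
# [0, 0, 0, 0, 0]
# [0, 0, 0, 1, 1]
# [0, 0, 1, 1, 1]
```

With PyTorch tensors, the same comparison is commonly written as `torch.arange(max_len)[None, :] >= lengths[:, None]`, which broadcasts to a (B, max_len) boolean tensor.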
tokenize_by_CJK_char
Tokenize a line of text by CJK character.
Note: All returned characters will be upper case.
Example
input = "你好世界是 hello world 的中文"
output = "你 好 世 界 是 HELLO WORLD 的 中 文"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| line | str | The input text. | required |
Returns
A new string tokenized by CJK character.
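The documented behavior can be approximated with a short sketch: split the line on each CJK character, keep non-CJK runs together, and upper-case everything. The name `tokenize_cjk_sketch` is hypothetical, and the single Unicode range used here is an assumption that is likely narrower than what the library matches.

```python
import re

# Assumption: only the CJK Unified Ideographs block is treated as CJK.
# The capturing group makes re.split keep each matched character.
_CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def tokenize_cjk_sketch(line: str) -> str:
    """Hypothetical sketch: one token per CJK character, upper-cased output."""
    pieces = _CJK_CHAR.split(line.strip())
    # Drop empty or whitespace-only pieces left over from the split.
    return " ".join(p.strip().upper() for p in pieces if p.strip())


print(tokenize_cjk_sketch("你好世界是 hello world 的中文"))
# 你 好 世 界 是 HELLO WORLD 的 中 文
```

Non-CJK runs such as `hello world` are kept as a single piece in this sketch, so their internal spacing is preserved.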