vllm_omni.model_executor.models.indextts2.utils.common

de_tokenized_by_CJK_char

de_tokenized_by_CJK_char(line: str, do_lower_case=False) -> str

Example

input = "你 好 世 界 是 HELLO WORLD 的 中 文"
output = "你好世界是 hello world 的中文"

do_lower_case

input = "SEE YOU!"
output = "see you!"
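
The joining behaviour shown in the examples above can be approximated in a few lines. This is a minimal sketch, not the library's implementation: `join_cjk_sketch` is a hypothetical name, and the sketch assumes that "CJK" means only the CJK Unified Ideographs block (U+4E00 to U+9FFF). It removes whitespace only where both neighbours are CJK characters, so spaces inside and around Latin text are preserved.

```python
import re

# Assumption: CJK is approximated by the CJK Unified Ideographs block only.
_CJK = r"\u4e00-\u9fff"
# Whitespace that has a CJK character immediately before and after it.
_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")


def join_cjk_sketch(line: str, do_lower_case: bool = False) -> str:
    """Hypothetical helper: drop spaces between adjacent CJK characters."""
    joined = _BETWEEN_CJK.sub("", line.strip())
    return joined.lower() if do_lower_case else joined


print(join_cjk_sketch("你 好 世 界 是 HELLO WORLD 的 中 文", do_lower_case=True))
print(join_cjk_sketch("SEE YOU!", do_lower_case=True))
```

With `do_lower_case=True` the sketch reproduces both documented outputs. A real implementation would likely cover more Unicode ranges (for example kana and hangul).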

make_pad_mask

make_pad_mask(lengths: Tensor, max_len: int = 0) -> Tensor

Make a mask tensor that is 1 at padded positions and 0 at valid positions.

See description of make_non_pad_mask.

Parameters:

Name     Type    Description             Default
lengths  Tensor  Batch of lengths (B,).  required

Returns: torch.Tensor: Mask tensor containing indices of padded part.

Examples:

>>> lengths = [5, 3, 2]
>>> make_pad_mask(lengths)
masks = [[0, 0, 0, 0, 0],
         [0, 0, 0, 1, 1],
         [0, 0, 1, 1, 1]]
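
The rule behind the example above is that position j of row i is padding exactly when j >= lengths[i]. The following pure-Python sketch illustrates that rule with plain lists. It is not the library's code: `pad_mask_sketch` is a hypothetical name, and treating `max_len <= 0` as "use the longest length in the batch" is an assumption about how the default of 0 behaves.

```python
from typing import List


def pad_mask_sketch(lengths: List[int], max_len: int = 0) -> List[List[int]]:
    """Hypothetical list-based illustration of a padding mask."""
    # Assumption: max_len <= 0 means "use the longest sequence in the batch".
    width = max_len if max_len > 0 else max(lengths)
    # Position j of row i is padding exactly when j >= lengths[i].
    return [[1 if j >= n else 0 for j in range(width)] for n in lengths]


for row in pad_mask_sketch([5, 3, 2]):
    print(row)
```

With tensors, the same comparison is commonly written as a broadcast, `torch.arange(max_len)[None, :] >= lengths[:, None]`, which yields a boolean mask of shape (B, max_len).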

tokenize_by_CJK_char

tokenize_by_CJK_char(line: str, do_upper_case=True) -> str

Tokenize a line of text by CJK character: each CJK character becomes its own space-separated token.

Note: With the default do_upper_case=True, all returned characters are upper case.

Example

input = "你好世界是 hello world 的中文"
output = "你 好 世 界 是 HELLO WORLD 的 中 文"

Parameters:

Name  Type  Description      Default
line  str   The input text.  required

Returns:

A new string tokenized by CJK character.
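
The splitting behaviour described above can be sketched with `re.split` and a capturing group, which keeps each matched CJK character as its own piece. This is an illustrative approximation rather than the library's implementation: `split_cjk_sketch` is a hypothetical name, and the pattern assumes only the CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import re

# Assumption: CJK is approximated by the CJK Unified Ideographs block only.
# The capturing group makes re.split keep each matched character in the result.
_CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def split_cjk_sketch(line: str, do_upper_case: bool = True) -> str:
    """Hypothetical helper: put spaces around every CJK character."""
    pieces = [piece.strip() for piece in _CJK_CHAR.split(line.strip())]
    # Drop the empty strings that re.split produces between adjacent matches.
    tokens = " ".join(piece for piece in pieces if piece)
    return tokens.upper() if do_upper_case else tokens


print(split_cjk_sketch("你好世界是 hello world 的中文"))
```

Runs of non-CJK text such as "hello world" stay together as one piece, so the spaces inside them are left untouched.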