vllm_omni.model_executor.models.mimo_audio.quantization ¶
EuclideanCodebook ¶
Bases: Module
Codebook with Euclidean distance.

Args:
    dim (int): Dimension.
    codebook_size (int): Codebook size.
    kmeans_init (bool): Whether to use k-means to initialize the codebooks. If true, k-means is run on the first training batch and the learned centroids are used as the initialization.
    kmeans_iters (int): Number of k-means iterations run at initialization.
    decay (float): Decay for the exponential moving average over the codebooks.
    epsilon (float): Epsilon value for numerical stability.
    threshold_ema_dead_code (int): Threshold for dead code expiration. Any code whose exponential moving average cluster size falls below this threshold is replaced with a randomly selected vector from the current batch.
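The lookup and bookkeeping described by these arguments can be illustrated with a minimal sketch. It uses plain Python lists instead of tensors, and every function name in it (`nearest_code`, `ema_update`, `dead_codes`) is hypothetical rather than part of this module; it only shows one plausible reading of how `decay` and `threshold_ema_dead_code` interact.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def ema_update(cluster_size, batch_counts, decay):
    """Blend the previous per-code usage with the counts from the current batch."""
    return [decay * old + (1.0 - decay) * new for old, new in zip(cluster_size, batch_counts)]


def dead_codes(cluster_size, threshold_ema_dead_code):
    """Indices of codes whose EMA usage fell below the expiration threshold."""
    return [i for i, size in enumerate(cluster_size) if size < threshold_ema_dead_code]


# Toy data (assumed for illustration): three codes, three input vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
batch = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8]]

indices = [nearest_code(v, codebook) for v in batch]
counts = [indices.count(i) for i in range(len(codebook))]
cluster_size = ema_update([1.0, 1.0, 1.0], counts, decay=0.5)
expired = dead_codes(cluster_size, threshold_ema_dead_code=1)
```

In this toy run the third code is never selected, so its moving-average usage drops below the threshold and it becomes a candidate for replacement by a vector drawn from the batch.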
ResidualVectorQuantization ¶
Bases: Module
Residual vector quantization implementation. Follows Algorithm 1 in https://arxiv.org/pdf/2107.03312.pdf
layers instance-attribute ¶
layers = ModuleList(
    [
        VectorQuantization(codebook_size=codebook_size[i], **kwargs)
        for i in range(num_quantizers)
    ]
)
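As a rough illustration of the residual scheme, the sketch below quantizes one vector through a stack of codebooks: each layer quantizes what the previous layers left over. It uses plain Python lists rather than tensors, the helper names are hypothetical, and the toy codebooks have different sizes only to mirror the per-layer `codebook_size[i]` above.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean)."""
    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vector, codebooks):
    """Quantize `vector` layer by layer, passing the remaining error to the next layer."""
    residual = list(vector)
    quantized = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the chosen code and subtract it from what is left to explain.
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


# Toy codebooks (assumed for illustration): a coarse layer and a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.25]],
]
indices, quantized = residual_quantize([1.25, 1.5], codebooks)
```

The first layer picks the coarse code `[1.0, 1.0]`, and the second layer refines the remaining `[0.25, 0.5]` with `[0.25, 0.25]`.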
ResidualVectorQuantizer ¶
Bases: Module
Residual Vector Quantizer.

Args:
    dimension (int): Dimension of the codebooks.
    n_q (int): Number of residual vector quantizers used.
    bins (int): Codebook size.
    decay (float): Decay for the exponential moving average over the codebooks.
    kmeans_init (bool): Whether to use k-means to initialize the codebooks.
    kmeans_iters (int): Number of iterations used for k-means initialization.
    threshold_ema_dead_code (int): Threshold for dead code expiration. Any code whose exponential moving average cluster size falls below this threshold is replaced with a randomly selected vector from the current batch.
vq instance-attribute ¶
vq = ResidualVectorQuantization(
    dim=dimension,
    codebook_size=bins,
    num_quantizers=n_q,
    decay=decay,
    kmeans_init=kmeans_init,
    kmeans_iters=kmeans_iters,
    threshold_ema_dead_code=threshold_ema_dead_code,
)
decode ¶
decode(codes: Tensor, st: int = 0) -> Tensor
Decode the given codes to the quantized representation.

Args:
    codes (torch.Tensor): Input indices for each quantizer.
    st (int): Index of the layer from which to start decoding the input codes. Default: 0.
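Decoding in a residual quantizer amounts to summing the selected codebook entries. The sketch below shows this for a single frame with plain Python lists; the function `decode_frame` is hypothetical, and it assumes that the k-th entry of `codes` belongs to layer `st + k`, which is one reading of the `st` argument rather than a confirmed detail of this method.

```python
def decode_frame(codes, codebooks, st=0):
    """Sum the codebook entries selected by `codes`, starting at layer `st`."""
    dim = len(codebooks[0][0])
    quantized = [0.0] * dim
    for offset, idx in enumerate(codes):
        # Assumption: codes[offset] indexes into the codebook of layer st + offset.
        code = codebooks[st + offset][idx]
        quantized = [q + c for q, c in zip(quantized, code)]
    return quantized


# Toy codebooks (assumed for illustration).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.25]],
]
full = decode_frame([1, 1], codebooks)           # both layers
fine_only = decode_frame([1], codebooks, st=1)   # second layer only
```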
encode ¶
Encode the given input tensor into codebook indices. The method selects the number of quantizers to use and returns the indices for each quantizer.

Args:
    x (torch.Tensor): Input tensor.
    n_q (int): Number of quantizers used to quantize. Default: all quantizers.
    st (int): Index of the layer from which to start encoding the input. Default: 0.
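The effect of `n_q` and `st` can be sketched on a single vector. The code below is illustrative only: `encode_frame` is a hypothetical helper, plain lists stand in for tensors, and it assumes the active layers are the slice `st:n_q` of the layer stack, which the description above does not specify.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean)."""
    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_frame(vector, codebooks, n_q=None, st=0):
    """Return one index per active layer, quantizing the running residual."""
    n_q = len(codebooks) if n_q is None else n_q
    residual = list(vector)
    indices = []
    # Assumption: layers st .. n_q - 1 are the ones that take part in encoding.
    for codebook in codebooks[st:n_q]:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


# Toy codebooks (assumed for illustration).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.25]],
]
all_layers = encode_frame([1.25, 1.5], codebooks)
first_only = encode_frame([1.25, 1.5], codebooks, n_q=1)
from_second = encode_frame([0.25, 0.5], codebooks, st=1)
```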
forward ¶
Residual vector quantization on the given input tensor.

Args:
    x (torch.Tensor): Input tensor.
    n_q (int): Number of quantizers used to quantize. Default: all quantizers.
    layers (list): Layers whose quantized output should be returned. Default: None.

Returns:
    QuantizedResult: The quantized (or approximately quantized) representation, together with the number of quantizers used and the quantized outputs of the requested layers.
VectorQuantization ¶
Bases: Module
Vector quantization implementation. Currently supports only Euclidean distance.

Args:
    dim (int): Dimension.
    codebook_size (int): Codebook size.
    codebook_dim (int): Codebook dimension. If not defined, the dimension given in dim is used.
    decay (float): Decay for the exponential moving average over the codebooks.
    epsilon (float): Epsilon value for numerical stability.
    kmeans_init (bool): Whether to use k-means to initialize the codebooks.
    kmeans_iters (int): Number of iterations used for k-means initialization.
    threshold_ema_dead_code (int): Threshold for dead code expiration. Any code whose exponential moving average cluster size falls below this threshold is replaced with a randomly selected vector from the current batch.
    commitment_weight (float): Weight for the commitment loss.
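For readers unfamiliar with the commitment loss weighted by `commitment_weight`, a common formulation is a mean squared error between the input and its quantized version. The sketch below assumes that formulation; the function name is hypothetical, plain lists stand in for tensors, and the exact loss used by this class is not stated above.

```python
def commitment_loss(x, quantized, commitment_weight=1.0):
    """Weighted mean squared error between an input vector and its quantized version."""
    squared_errors = [(a - b) ** 2 for a, b in zip(x, quantized)]
    return commitment_weight * sum(squared_errors) / len(squared_errors)


# Toy values (assumed for illustration): only the second component differs.
loss = commitment_loss([1.25, 1.5], [1.25, 1.25], commitment_weight=0.5)
```

In a tensor framework the quantized term is typically detached from the graph, so that this loss pulls the encoder output toward the codebook rather than the other way around.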