vllm_omni.model_executor.models.cosyvoice3.cosyvoice3_talker
CosyVoice3LM
Qwen2LM
Bases: TransformerLM
TransformerLM
VLLMQwen2Encoder
Bases: Module
Qwen2 encoder using vLLM's Qwen2Model with external KV cache management.
This replaces the HuggingFace Qwen2ForCausalLM with vLLM's optimized implementation, which uses PagedAttention and an external KV cache via ForwardContext.
forward
Forward pass using vLLM's attention with external KV cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inputs_embeds | Tensor | Input embeddings [total_tokens, hidden_size] or [batch, seq, hidden] | required |
| positions | Tensor | Position tensor for RoPE [total_tokens] | required |
Returns:
| Name | Type | Description |
|---|---|---|
| hidden_states | Tensor | Output hidden states [total_tokens, hidden_size] |
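The shapes above imply a flattened token layout: a `[batch, seq, hidden]` input corresponds to `[total_tokens, hidden_size]` rows, with one position id per row. The following is a minimal sketch of that layout using plain Python lists in place of tensors. The helpers `flatten_batch` and `flat_positions`, and the `cached_lens` argument, are hypothetical names introduced for illustration and are not part of this module; how the actual `forward` reshapes its input or derives positions is not documented here.

```python
def flatten_batch(batch):
    """Flatten nested [batch, seq, hidden] lists into [total_tokens, hidden].

    Hypothetical helper: stands in for a tensor reshape in this sketch.
    """
    return [token for seq in batch for token in seq]


def flat_positions(seq_lens, cached_lens=None):
    """Build one position id per token for a flattened batch.

    seq_lens[i]    -- number of new tokens for sequence i
    cached_lens[i] -- assumed number of tokens already cached for sequence i
                      (defaults to 0, i.e. every sequence starts at position 0)
    """
    if cached_lens is None:
        cached_lens = [0] * len(seq_lens)
    if len(cached_lens) != len(seq_lens):
        raise ValueError("seq_lens and cached_lens must have the same length")

    positions = []
    for new_tokens, start in zip(seq_lens, cached_lens):
        # Positions continue from where the cached prefix ended.
        positions.extend(range(start, start + new_tokens))
    return positions


# Two sequences with 3 and 2 tokens, hidden size 2.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0]],
]
flat = flatten_batch(batch)          # 5 rows of width 2
positions = flat_positions([3, 2])   # one position per row
```

In this sketch the number of position ids always equals the number of flattened rows, which matches the `[total_tokens]` and `[total_tokens, hidden_size]` shapes listed in the tables.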