vllm_omni.diffusion.models.mistral_encoder
Modules:
| Name | Description |
|---|---|
| mistral_encoder | TP-aware Mistral model for use as a text encoder in diffusion pipelines. |
MistralEncoderModel
Bases: Module
TP-aware Mistral encoder for use as a text encoder in diffusion pipelines.
Accepts a HuggingFace Mistral3Config (or its text_config). The model uses vLLM's parallel layers for tensor parallelism (TP), but computes attention with plain scaled dot-product attention (SDPA) rather than PagedAttention.
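To make the tensor-parallel aspect concrete, the sketch below shows how attention heads are typically divided among TP ranks. It assumes the partitioning convention commonly used in vLLM attention layers (query heads split evenly, KV heads split when there are enough of them and replicated otherwise). `heads_per_rank` is a hypothetical helper written for illustration and is not part of this module.

```python
def heads_per_rank(num_heads: int, num_kv_heads: int, tp_size: int) -> tuple[int, int]:
    """Return (query heads, KV heads) held by each tensor-parallel rank.

    Assumed convention, not taken from this module's source.
    """
    if num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by tp_size")
    q_heads = num_heads // tp_size

    if num_kv_heads >= tp_size:
        # Enough KV heads to shard: each rank owns a distinct slice.
        if num_kv_heads % tp_size != 0:
            raise ValueError("num_kv_heads must be divisible by tp_size")
        kv_heads = num_kv_heads // tp_size
    else:
        # Fewer KV heads than ranks: each KV head is replicated on several ranks.
        if tp_size % num_kv_heads != 0:
            raise ValueError("tp_size must be divisible by num_kv_heads")
        kv_heads = 1
    return q_heads, kv_heads


# Illustrative values: 32 query heads, 8 KV heads.
for tp in (1, 4, 16):
    print(tp, heads_per_rank(32, 8, tp))
```

With these illustrative values, each rank holds fewer heads as the TP size grows, and once the TP size exceeds the number of KV heads every rank keeps a single replicated KV head.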
head_dim instance-attribute
head_dim = (
    getattr(text_config, "head_dim", None)
    or hidden_size // num_heads
)
max_position_embeddings instance-attribute
max_position_embeddings = getattr(
    text_config, "max_position_embeddings", 131072
)
num_kv_heads instance-attribute
num_kv_heads = getattr(
    text_config, "num_key_value_heads", num_attention_heads
)
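The three attributes above all fall back to a default when the config omits a field. The following minimal sketch reproduces that resolution logic on stand-in config objects. `resolve_attention_dims` and both configs are hypothetical, the numeric values are illustrative only, and the sketch assumes that `num_heads` and `num_attention_heads` in the expressions above both refer to the config's `num_attention_heads`.

```python
from types import SimpleNamespace


def resolve_attention_dims(text_config):
    """Mirror the fallback expressions shown for the three attributes above."""
    hidden_size = text_config.hidden_size
    num_heads = text_config.num_attention_heads

    # An explicit head_dim wins; otherwise derive it from the hidden size.
    head_dim = getattr(text_config, "head_dim", None) or hidden_size // num_heads
    # Without num_key_value_heads, every query head gets its own KV head.
    num_kv_heads = getattr(text_config, "num_key_value_heads", num_heads)
    max_position_embeddings = getattr(text_config, "max_position_embeddings", 131072)
    return head_dim, num_kv_heads, max_position_embeddings


# Stand-in config with every field set. Note that head_dim (128) differs from
# hidden_size // num_attention_heads (160), so the explicit value matters.
explicit = SimpleNamespace(
    hidden_size=5120,
    num_attention_heads=32,
    num_key_value_heads=8,
    head_dim=128,
    max_position_embeddings=32768,
)
# Stand-in config that relies on all three fallbacks.
minimal = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(resolve_attention_dims(explicit))  # (128, 8, 32768)
print(resolve_attention_dims(minimal))   # (128, 32, 131072)
```

Because the `head_dim` expression uses `or`, a config that sets `head_dim` to `None` (or `0`) is treated the same as one that omits it.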
forward
forward(
    input_ids: Tensor,
    attention_mask: Tensor | None = None,
    output_hidden_states: bool = False,
    use_cache: bool = False,
    past_key_values: list[tuple[Tensor, Tensor]] | None = None,
    **kwargs,
) -> MistralEncoderOutput
generate
generate(
    input_ids: Tensor,
    attention_mask: Tensor | None = None,
    max_new_tokens: int = 512,
    do_sample: bool = True,
    temperature: float = 1.0,
    eos_token_id: int | list[int] | None = None,
    **kwargs,
) -> Tensor
Autoregressive text generation with KV caching.
Accepts the keyword arguments that the pipeline passes through the HuggingFace GenerationMixin.generate interface; extra arguments such as pixel_values are accepted and ignored.
Returns the full token sequence including the input prompt.
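The general shape of autoregressive decoding with a KV cache can be sketched in plain Python. This is not the module's implementation: `toy_forward` is a hypothetical stand-in for a model step, the "cache" is just the list of tokens seen so far instead of per-layer key/value tensors, the `seed` parameter is added here for reproducibility, and the sketch handles a single sequence rather than a batched Tensor.

```python
import math
import random


def toy_forward(tokens, past):
    """Hypothetical model step: process only `tokens`, reusing `past`."""
    cache = past + tokens  # extend the cache with the new positions only
    vocab_size = 8
    favoured = (sum(cache) + len(cache)) % vocab_size
    logits = [0.0] * vocab_size
    logits[favoured] = 5.0  # make one token clearly the most likely
    return logits, cache


def pick_token(logits, do_sample, temperature, rng):
    if not do_sample:
        # Greedy decoding: take the arg-max.
        return max(range(len(logits)), key=logits.__getitem__)
    # Temperature-scaled softmax sampling (assumes temperature > 0).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate_sketch(input_ids, max_new_tokens=512, do_sample=True,
                    temperature=1.0, eos_token_id=None, seed=0):
    # Normalise eos_token_id (int, list of ints, or None) to a set.
    if eos_token_id is None:
        eos = set()
    elif isinstance(eos_token_id, int):
        eos = {eos_token_id}
    else:
        eos = set(eos_token_id)

    rng = random.Random(seed)
    output = list(input_ids)
    # Prefill: run the whole prompt once and keep the cache.
    logits, cache = toy_forward(output, [])
    for _ in range(max_new_tokens):
        token = pick_token(logits, do_sample, temperature, rng)
        output.append(token)
        if token in eos:
            break
        # Decode: feed only the newest token; the cache covers the rest.
        logits, cache = toy_forward([token], cache)
    return output  # prompt followed by the generated tokens


print(generate_sketch([1, 2, 3], max_new_tokens=4, do_sample=False))
```

The key point is that after the prefill step, each iteration passes only the newly generated token to the model, and the returned sequence starts with the original prompt.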