Optional Interfaces#
Module Contents#
- vllm.model_executor.models.interfaces.MultiModalEmbeddings[source]#
The output embeddings must be one of the following formats:
- A list or tuple of 2D tensors, where each tensor corresponds to one input multimodal data item (e.g., an image).
- A single 3D tensor, with the batch dimension grouping the 2D tensors.
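For illustration, a minimal sketch of the two accepted formats (the shapes below are made-up example values, not part of the interface):

```python
import torch

# Three images, each encoded into 16 feature vectors of hidden size 1024
# (all sizes here are hypothetical).

# Format 1: a list (or tuple) of 2D tensors, one per multimodal data item.
embeddings_as_list = [torch.zeros(16, 1024) for _ in range(3)]

# Format 2: a single 3D tensor whose batch dimension groups the 2D tensors.
embeddings_as_tensor = torch.zeros(3, 16, 1024)
```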
- class vllm.model_executor.models.interfaces.SupportsMultiModal(*args, **kwargs)[source]#
The interface required for all multi-modal models.
- supports_multimodal: ClassVar[Literal[True]] = True[source]#
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
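As a minimal sketch (the model class name is hypothetical), declaring support only requires listing the interface among the base classes; the flag then comes along via the MRO:

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal


class MyMultiModalModel(nn.Module, SupportsMultiModal):
    # supports_multimodal is inherited from SupportsMultiModal via the MRO,
    # so it does not need to be redefined here.
    ...
```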
- get_multimodal_embeddings(**kwargs: object) list[torch.Tensor] | torch.Tensor | tuple[torch.Tensor, ...] | None [source]#
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as their corresponding multimodal data items appear in the input prompt.
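A hedged sketch of a typical implementation, assuming a hypothetical `image_features` kwarg and a hypothetical `projector` module; the essential point is that the returned embeddings preserve prompt order:

```python
import torch
import torch.nn as nn


class MyMultiModalModel(nn.Module):
    """Hypothetical model; only the parts relevant to this method are shown."""

    def __init__(self) -> None:
        super().__init__()
        # Hypothetical projector from vision features to the LM hidden size.
        self.projector = nn.Linear(768, 1024)

    def get_multimodal_embeddings(self, **kwargs: object) -> torch.Tensor | None:
        image_features = kwargs.get("image_features")  # hypothetical kwarg name
        if image_features is None:
            return None  # no multimodal data in this batch
        # The output rows keep the same order as the images appear in the
        # input prompt, as required by the interface.
        return self.projector(image_features)
```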
- get_language_model() torch.nn.Module [source]#
Returns the underlying language model used for text generation.
This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.
- Returns:
The core language model component.
- Return type:
torch.nn.Module
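A minimal sketch, assuming a hypothetical model that composes a vision encoder with a language model:

```python
import torch.nn as nn


class MyMultiModalModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.vision_encoder = nn.Identity()   # hypothetical component
        self.language_model = nn.Identity()   # hypothetical component

    def get_language_model(self) -> nn.Module:
        # Return the component that consumes the merged embeddings and
        # produces hidden states, not the whole multimodal wrapper.
        return self.language_model
```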
- get_input_embeddings(input_ids: Tensor, multimodal_embeddings: MultiModalEmbeddings | None = None, attn_metadata: 'AttentionMetadata' | None = None) Tensor [source]#
- get_input_embeddings(input_ids: Tensor, multimodal_embeddings: MultiModalEmbeddings | None = None) Tensor
Returns the input embeddings for input_ids, merging multimodal_embeddings into the text embeddings when provided.
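A hedged sketch of the merging step, assuming a single 2D multimodal embeddings tensor and a hypothetical image placeholder token id; a real model would follow its own processor's placeholder convention:

```python
import torch
import torch.nn as nn

IMAGE_TOKEN_ID = 32000  # hypothetical placeholder token id


class MyMultiModalModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.embed_tokens = nn.Embedding(32064, 1024)  # hypothetical sizes

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings: torch.Tensor | None = None,
    ) -> torch.Tensor:
        inputs_embeds = self.embed_tokens(input_ids)
        if multimodal_embeddings is not None:
            # Write the multimodal embeddings into the positions of their
            # placeholder tokens; their order must match the prompt order.
            placeholder_mask = input_ids == IMAGE_TOKEN_ID
            inputs_embeds[placeholder_mask] = multimodal_embeddings.to(
                inputs_embeds.dtype)
        return inputs_embeds
```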
- class vllm.model_executor.models.interfaces.SupportsLoRA(*args, **kwargs)[source]#
The interface required for all models that support LoRA.
- class vllm.model_executor.models.interfaces.SupportsPP(*args, **kwargs)[source]#
The interface required for all models that support pipeline parallel.
- supports_pp: ClassVar[Literal[True]] = True[source]#
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
- make_empty_intermediate_tensors(batch_size: int, dtype: torch.dtype, device: torch.device) IntermediateTensors [source]#
Called when PP rank > 0 for profiling purposes.
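A minimal sketch, assuming IntermediateTensors is importable from vllm.sequence (as in recent vLLM versions) and that this model passes hidden_states and residual between pipeline stages; the tensor names and hidden size are illustrative:

```python
import torch

from vllm.sequence import IntermediateTensors

HIDDEN_SIZE = 4096  # hypothetical hidden size


def make_empty_intermediate_tensors(
    batch_size: int, dtype: torch.dtype, device: torch.device
) -> IntermediateTensors:
    # Placeholder tensors with the shapes this model sends between pipeline
    # stages; only used for profiling on PP ranks > 0.
    return IntermediateTensors({
        "hidden_states": torch.zeros(
            (batch_size, HIDDEN_SIZE), dtype=dtype, device=device),
        "residual": torch.zeros(
            (batch_size, HIDDEN_SIZE), dtype=dtype, device=device),
    })
```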
- forward(*, intermediate_tensors: IntermediateTensors | None) torch.Tensor | IntermediateTensors [source]#
Accept IntermediateTensors when PP rank > 0. Return IntermediateTensors only for the last PP rank.
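A hedged sketch of the control flow, assuming get_pp_group is importable from vllm.distributed and IntermediateTensors from vllm.sequence; the layers and key names are illustrative:

```python
import torch
import torch.nn as nn

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors


class MyDecoder(nn.Module):
    """Hypothetical model; only the pipeline-parallel plumbing is shown."""

    def __init__(self) -> None:
        super().__init__()
        # This rank's slice of the layers (hypothetical).
        self.layers = nn.ModuleList(nn.Linear(4096, 4096) for _ in range(4))

    def forward(
        self,
        *,
        inputs_embeds: torch.Tensor | None = None,
        intermediate_tensors: IntermediateTensors | None = None,
    ) -> torch.Tensor | IntermediateTensors:
        if intermediate_tensors is not None:
            # PP rank > 0: resume from the tensors sent by the previous stage.
            hidden_states = intermediate_tensors["hidden_states"]
        else:
            # First PP rank: start from the input embeddings.
            hidden_states = inputs_embeds

        for layer in self.layers:
            hidden_states = layer(hidden_states)

        if not get_pp_group().is_last_rank:
            # Intermediate ranks hand their activations to the next stage.
            return IntermediateTensors({"hidden_states": hidden_states})
        # Only the last PP rank returns plain hidden states.
        return hidden_states
```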
- class vllm.model_executor.models.interfaces.HasInnerState(*args, **kwargs)[source]#
The interface required for all models that have inner state.
- class vllm.model_executor.models.interfaces.IsAttentionFree(*args, **kwargs)[source]#
The interface required for all models like Mamba that lack attention but do have state whose size is constant with respect to the number of tokens.
- class vllm.model_executor.models.interfaces.IsHybrid(*args, **kwargs)[source]#
The interface required for all models like Jamba that have both attention and Mamba blocks. It also indicates that the model's hf_config defines 'layers_block_type'.
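For context, a small hedged sketch of what such a config might look like (the block-type strings follow Jamba's convention, but the exact values are model-specific):

```python
from transformers import PretrainedConfig

# Hypothetical hf_config of a hybrid model: layers_block_type records the
# block type of each layer, which is what IsHybrid relies on.
hf_config = PretrainedConfig(
    layers_block_type=["attention", "mamba", "mamba", "attention"])

attention_layers = [
    i for i, block in enumerate(hf_config.layers_block_type)
    if block == "attention"
]
print(attention_layers)  # [0, 3]
```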
- class vllm.model_executor.models.interfaces.SupportsCrossEncoding(*args, **kwargs)[source]#
The interface required for all models that support cross encoding.
- class vllm.model_executor.models.interfaces.SupportsQuant(*args, **kwargs)[source]#
The interface required for all models that support quantization.