Optional Interfaces#
Module Contents#
- class vllm.model_executor.models.interfaces.SupportsMultiModal(*args, **kwargs)[source]#
The interface required for all multi-modal models.
- supports_multimodal: ClassVar[Literal[True]] = True[source]#
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
- get_multimodal_embeddings(**kwargs) → T | None [source]#
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
The output embeddings must be in one of the following formats:
- A list or tuple of 2D tensors, where each tensor corresponds to one input multimodal data item (e.g., an image).
- A single 3D tensor, with the batch dimension grouping the 2D tensors.
Note
The returned multimodal embeddings must be in the same order as their corresponding multimodal data items appear in the input prompt.
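For illustration, a minimal sketch of the two accepted return formats; the number of items, the per-item token counts, and the hidden size below are arbitrary assumptions for the example:

```python
import torch

# Format 1: a list/tuple of 2D tensors, one per multimodal data item.
# Each item may contribute a different number of embedding vectors.
embeds_per_item = [
    torch.randn(576, 4096),  # image 1 -> 576 patch embeddings
    torch.randn(576, 4096),  # image 2
    torch.randn(144, 4096),  # image 3 (e.g. a lower-resolution image)
]

# Format 2: a single 3D tensor, usable when every item produces the same
# number of embeddings, so they can be stacked along the batch dimension.
embeds_batched = torch.randn(3, 576, 4096)
```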
- get_input_embeddings(input_ids: torch.Tensor, multimodal_embeddings: T | None = None, attn_metadata: 'AttentionMetadata' | None = None) → torch.Tensor [source]#
- get_input_embeddings(input_ids: torch.Tensor, multimodal_embeddings: T | None = None) → torch.Tensor
Returns the input embeddings for input_ids, merging multimodal_embeddings (if provided) into the positions of the corresponding placeholder tokens.
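For illustration, a hedged sketch of how get_input_embeddings is typically implemented: embed input_ids as usual, then overwrite the placeholder-token positions with the multimodal embeddings. The placeholder id, sizes, and class name below are assumptions for the example, not part of the interface:

```python
import torch
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal

# Hypothetical placeholder token id; a real model reads this from its config.
IMAGE_TOKEN_ID = 32000


class MyMultiModalModel(nn.Module, SupportsMultiModal):
    """Toy model showing only the embedding-merge step (illustrative).

    supports_multimodal is inherited from the interface (see the Note above);
    a real model would also implement get_multimodal_embeddings.
    """

    def __init__(self, vocab_size: int = 32064, hidden_size: int = 4096):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)

    def get_input_embeddings(
        self,
        input_ids: torch.Tensor,
        multimodal_embeddings=None,
    ) -> torch.Tensor:
        # Ordinary text embeddings first.
        inputs_embeds = self.embed_tokens(input_ids)
        if multimodal_embeddings is not None:
            # Works for both accepted formats: iterating a 3D tensor or a
            # list/tuple of 2D tensors yields 2D tensors either way.
            mm_embeds = torch.cat(list(multimodal_embeddings), dim=0)
            # Overwrite the placeholder positions; mm_embeds must follow the
            # order of the items in the prompt (see the Note above).
            mask = input_ids == IMAGE_TOKEN_ID
            inputs_embeds[mask] = mm_embeds.to(inputs_embeds.dtype)
        return inputs_embeds
```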
- class vllm.model_executor.models.interfaces.SupportsLoRA(*args, **kwargs)[source]#
The interface required for all models that support LoRA.
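No further members are documented here; a hedged sketch of how a model would opt in, following the same base-class pattern as the other interfaces on this page (the class name is illustrative, and any LoRA-specific metadata the model must additionally expose is not shown):

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class MyLoRACapableModel(nn.Module, SupportsLoRA):
    # Putting SupportsLoRA in the MRO marks the model as LoRA-capable;
    # the model's layers and any LoRA metadata are omitted in this sketch.
    ...
```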
- class vllm.model_executor.models.interfaces.SupportsPP(*args, **kwargs)[source]#
The interface required for all models that support pipeline parallelism.
- supports_pp: ClassVar[Literal[True]] = True[source]#
A flag that indicates this model supports pipeline parallelism.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
- make_empty_intermediate_tensors(batch_size: int, dtype: torch.dtype, device: torch.device) → IntermediateTensors [source]#
Called when PP rank > 0 for profiling purposes.
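A minimal sketch of such a factory, assuming that a hidden-states tensor of the model's hidden size is what gets exchanged between pipeline stages and that IntermediateTensors wraps a name-to-tensor mapping; the key name and size are assumptions for the example:

```python
import torch

from vllm.sequence import IntermediateTensors

HIDDEN_SIZE = 4096  # assumed hidden size for the example


def make_empty_intermediate_tensors(
    batch_size: int,
    dtype: torch.dtype,
    device: torch.device,
) -> IntermediateTensors:
    # Zero-filled placeholders with the same keys and shapes the real
    # forward pass would send to the next pipeline stage; real models may
    # include additional tensors (e.g. a residual).
    return IntermediateTensors({
        "hidden_states": torch.zeros(batch_size, HIDDEN_SIZE,
                                     dtype=dtype, device=device),
    })
```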
- forward(*, intermediate_tensors: IntermediateTensors | None) → torch.Tensor | IntermediateTensors [source]#
Accept IntermediateTensors when PP rank > 0. Return IntermediateTensors only for the last PP rank.
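A hedged sketch of the control flow this contract implies, assuming get_pp_group() from vllm.distributed exposes is_first_rank/is_last_rank and that only a hidden-states tensor is passed between stages; the toy layers and class name are stand-ins, not part of the interface:

```python
import torch
import torch.nn as nn

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors


class ToyPPDecoder(nn.Module):
    """Toy decoder split across pipeline ranks (illustrative only)."""

    def __init__(self, vocab_size: int = 32000, hidden_size: int = 4096):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.Linear(hidden_size, hidden_size)  # stand-in layers
        self.norm = nn.LayerNorm(hidden_size)

    def forward(
        self,
        input_ids: torch.Tensor,
        intermediate_tensors: IntermediateTensors | None = None,
    ) -> torch.Tensor | IntermediateTensors:
        if get_pp_group().is_first_rank:
            # The first PP rank starts from the token embeddings.
            hidden_states = self.embed_tokens(input_ids)
        else:
            # Later ranks resume from the tensors sent by the previous rank.
            hidden_states = intermediate_tensors["hidden_states"]

        hidden_states = self.layers(hidden_states)

        if not get_pp_group().is_last_rank:
            # Hand off to the next rank; no final output is produced here.
            return IntermediateTensors({"hidden_states": hidden_states})

        # Only the last PP rank returns a plain tensor.
        return self.norm(hidden_states)
```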
- class vllm.model_executor.models.interfaces.HasInnerState(*args, **kwargs)[source]#
The interface required for all models that have inner state.
- class vllm.model_executor.models.interfaces.IsAttentionFree(*args, **kwargs)[source]#
The interface required for models such as Mamba that lack attention but have state whose size is constant with respect to the number of tokens.
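Both HasInnerState and IsAttentionFree are documented here without additional members; a hedged sketch of how a Mamba-style model would declare them, following the base-class pattern used throughout this page (the class name is illustrative):

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import HasInnerState, IsAttentionFree


class MyMambaLikeModel(nn.Module, HasInnerState, IsAttentionFree):
    # With both interfaces in the MRO, their flags are inherited and need not
    # be redefined (see the Notes above); the state-space layers are omitted.
    ...
```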