vllm.model_executor.models.interfaces
MultiModalEmbeddings
module-attribute
The output embeddings must be one of the following formats:
- A list or tuple of 2D tensors, where each tensor corresponds to one input multimodal data item (e.g., an image).
- A single 3D tensor, with the batch dimension grouping the 2D tensors.
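For illustration, a minimal sketch of both accepted formats; the shapes (three items, 16 patches, hidden size 4096) are made up:

import torch

# Format 1: a list (or tuple) of 2D tensors, one per multimodal item.
as_list = [torch.randn(16, 4096) for _ in range(3)]

# Format 2: a single 3D tensor whose batch dimension groups the 2D
# tensors; this requires every item to share the same first dimension.
as_tensor = torch.stack(as_list)  # shape: (3, 16, 4096)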
HasInnerState
Bases: Protocol
The interface required for all models that have inner state.
HasNoOps
IsAttentionFree
Bases: Protocol
The interface required for all models like Mamba that lack attention but do have state whose size is constant with respect to the number of tokens.
IsHybrid
Bases: Protocol
The interface required for all models like Jamba that have both attention and Mamba blocks. It also indicates that the model's hf_config defines 'layers_block_type'.
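As a quick illustration, the attribute can be probed on a Hugging Face config; the checkpoint name is only an example and loading it requires network access:

from transformers import AutoConfig

hf_config = AutoConfig.from_pretrained("ai21labs/Jamba-v0.1")

# A hybrid config exposes per-layer block types, e.g.
# ["mamba", "mamba", ..., "attention", ...] for interleaved layers.
print(getattr(hf_config, "layers_block_type", None))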
SupportsCrossEncoding
Bases: Protocol
The interface required for all models that support cross encoding.
SupportsLoRA
Bases: Protocol
The interface required for all models that support LoRA.
SupportsMultiModal
Bases: Protocol
The interface required for all multi-modal models.
supports_multimodal
class-attribute
supports_multimodal: Literal[True] = True
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
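A minimal sketch (the model class is hypothetical) of opting in by inheritance; because SupportsMultiModal then sits in the MRO, the flag is inherited rather than redefined:

import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal

class MyVLM(nn.Module, SupportsMultiModal):
    """Hypothetical vision-language model."""

assert MyVLM.supports_multimodal is True  # inherited via the MRO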
get_language_model
get_language_model() -> Module
Returns the underlying language model used for text generation.
This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.
Returns:
| Type | Description |
|---|---|
| Module | torch.nn.Module: The core language model component. |
get_multimodal_embeddings
get_multimodal_embeddings(
**kwargs: object,
) -> MultiModalEmbeddings
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.
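A hedged sketch of an override that honors the ordering requirement; pixel_values and vision_tower are illustrative names, not part of the interface:

import torch
import torch.nn as nn

class MyVLM(nn.Module):
    """Hypothetical model, reduced to what this sketch needs."""

    def __init__(self) -> None:
        super().__init__()
        self.vision_tower = nn.Linear(32, 64)  # stand-in vision encoder

    def get_multimodal_embeddings(self, **kwargs):
        # Hypothetical kwarg: one tensor per image, already in prompt order.
        pixel_values = kwargs["pixel_values"]
        # Return one 2D tensor per item, preserving prompt order so the
        # later merge lines up with the placeholder tokens in the text.
        return [self.vision_tower(pv) for pv in pixel_values]

model = MyVLM()
images = [torch.randn(16, 32) for _ in range(2)]
embeds = model.get_multimodal_embeddings(pixel_values=images)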
SupportsPP
Bases: Protocol
The interface required for all models that support pipeline parallel.
supports_pp
class-attribute
supports_pp: Literal[True] = True
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
forward
forward(
*, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
Accept IntermediateTensors when PP rank > 0. Return IntermediateTensors only for the last PP rank.
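A condensed method sketch of that contract, assuming vLLM's get_pp_group helper; embed_tokens and the layer loop are illustrative:

from typing import Optional, Union

import torch

from vllm.distributed import get_pp_group
from vllm.sequence import IntermediateTensors

def forward(
    self,
    input_ids: torch.Tensor,
    *,
    intermediate_tensors: Optional[IntermediateTensors],
) -> Union[torch.Tensor, IntermediateTensors]:
    if get_pp_group().is_first_rank:
        hidden_states = self.embed_tokens(input_ids)
    else:
        # Ranks > 0 resume from the previous rank's activations.
        assert intermediate_tensors is not None
        hidden_states = intermediate_tensors["hidden_states"]

    for layer in self.layers:  # only this rank's shard of the layers
        hidden_states = layer(hidden_states)

    if not get_pp_group().is_last_rank:
        # Hand off to the next pipeline stage.
        return IntermediateTensors({"hidden_states": hidden_states})
    return hidden_states  # the last rank produces the final hidden states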
make_empty_intermediate_tensors
make_empty_intermediate_tensors(
batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
Called when PP rank > 0 for profiling purposes.
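Ranks > 0 need correctly shaped placeholders during profiling, before any real activations exist. A sketch of the usual pattern; hidden_size comes from a hypothetical model config:

import torch

from vllm.sequence import IntermediateTensors

def make_empty_intermediate_tensors(
    self, batch_size: int, dtype: torch.dtype, device: torch.device
) -> IntermediateTensors:
    # Zero-filled stand-ins shaped like what forward() would hand off.
    return IntermediateTensors({
        "hidden_states": torch.zeros(
            (batch_size, self.config.hidden_size),
            dtype=dtype, device=device),
    })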
SupportsQuant
The interface required for all models that support quantization.
_find_quant_config
staticmethod
_find_quant_config(
*args, **kwargs
) -> Optional[QuantizationConfig]
SupportsTranscription
Bases: Protocol
The interface required for all models that support transcription.
SupportsV0Only
Bases: Protocol
Models with this interface are not compatible with V1 vLLM.
_HasInnerStateType
_HasNoOpsType
_IsAttentionFreeType
_IsHybridType
_SupportsMultiModalType
_SupportsPPType
Bases: Protocol
forward
forward(
*, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
make_empty_intermediate_tensors
make_empty_intermediate_tensors(
batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
_supports_cross_encoding
_supports_cross_encoding(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsCrossEncoding]],
TypeIs[SupportsCrossEncoding],
]
_supports_lora
_supports_pp_attributes
_supports_pp_inspect
has_inner_state
has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
model: Union[type[object], object],
) -> Union[
TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
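The overloads mean the check accepts either a class or an instance; a small sketch with a hypothetical model:

import torch.nn as nn

from vllm.model_executor.models.interfaces import (
    HasInnerState, has_inner_state)

class StatefulModel(nn.Module, HasInnerState):
    """Hypothetical model that declares inner state."""

assert has_inner_state(StatefulModel)    # works on the class
assert has_inner_state(StatefulModel())  # and on an instance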
has_noops
is_attention_free
is_attention_free(model: object) -> TypeIs[IsAttentionFree]
is_attention_free(
model: type[object],
) -> TypeIs[type[IsAttentionFree]]
is_attention_free(
model: Union[type[object], object],
) -> Union[
TypeIs[type[IsAttentionFree]], TypeIs[IsAttentionFree]
]
is_hybrid
supports_cross_encoding
supports_cross_encoding(
model: type[object],
) -> TypeIs[type[SupportsCrossEncoding]]
supports_cross_encoding(
model: object,
) -> TypeIs[SupportsCrossEncoding]
supports_cross_encoding(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsCrossEncoding]],
TypeIs[SupportsCrossEncoding],
]
supports_lora
supports_lora(
model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
supports_multimodal
supports_multimodal(
model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsMultiModal]],
TypeIs[SupportsMultiModal],
]
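Because the return type is TypeIs, a successful check also narrows the static type of the argument; a usage sketch:

from vllm.model_executor.models.interfaces import supports_multimodal

def embed_if_multimodal(model: object, **mm_kwargs: object):
    if supports_multimodal(model):
        # `model` is narrowed to SupportsMultiModal here, so the call
        # type-checks without a cast.
        return model.get_multimodal_embeddings(**mm_kwargs)
    return None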
supports_pp
supports_pp(
model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
model: Union[type[object], object],
) -> Union[
bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
supports_transcription
supports_transcription(
model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsTranscription]],
TypeIs[SupportsTranscription],
]
supports_v0_only
supports_v0_only(
model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
model: Union[type[object], object],
) -> Union[
TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]