Input Definitions
User-facing inputs
Internal data structures
- class vllm.multimodal.inputs.PlaceholderRange
Bases: TypedDict
Placeholder location information for multi-modal data.
Example
Prompt: AAAA BBBB What is in these images?
Images A and B will have:
A: { "offset": 0, "length": 4 }
B: { "offset": 5, "length": 4 }
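The example above can be reproduced with a short sketch. The `PlaceholderRange` class here is a local stand-in that mirrors the documented fields rather than an import from vLLM, and `find_placeholder_ranges` is a hypothetical helper. It also assumes each character of the prompt occupies one position, which is a simplification made only for illustration.

```python
from typing import TypedDict


class PlaceholderRange(TypedDict):
    """Local stand-in mirroring the documented fields; not imported from vLLM."""
    offset: int  # index of the first placeholder position
    length: int  # number of consecutive placeholder positions


def find_placeholder_ranges(prompt: str, placeholder_chars: str) -> dict[str, PlaceholderRange]:
    """Locate the first contiguous run of each placeholder character (hypothetical helper)."""
    ranges: dict[str, PlaceholderRange] = {}
    for ch in placeholder_chars:
        offset = prompt.index(ch)  # raises ValueError if the placeholder is absent
        length = 0
        while offset + length < len(prompt) and prompt[offset + length] == ch:
            length += 1
        ranges[ch] = PlaceholderRange(offset=offset, length=length)
    return ranges


prompt = "AAAA BBBB What is in these images?"
ranges = find_placeholder_ranges(prompt, "AB")
print(ranges["A"])  # {'offset': 0, 'length': 4}
print(ranges["B"])  # {'offset': 5, 'length': 4}
```

Image B starts at offset 5 rather than 4 because the space between the two runs occupies position 4.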
- vllm.multimodal.inputs.NestedTensors
Uses a list instead of a tensor if the dimensions of each element do not match.
alias of Union[list[NestedTensors], list[Tensor], Tensor, tuple[Tensor, …]]
- class vllm.multimodal.inputs.MultiModalFieldElem(field: BaseMultiModalField, data: NestedTensors)
Contains metadata and data of an item in MultiModalKwargs.
- class vllm.multimodal.inputs.MultiModalFieldConfig(field_cls: type[vllm.multimodal.inputs.BaseMultiModalField], modality: str, **field_config: Any)
- class vllm.multimodal.inputs.MultiModalKwargsItem(dict=None, /, **kwargs)
Bases: UserDict[str, MultiModalFieldElem]
A collection of MultiModalFieldElem corresponding to a data item in MultiModalDataItems.
- class vllm.multimodal.inputs.MultiModalKwargs(data: ]], *, items: Sequence[MultiModalKwargsItem] | None = None)
Bases: UserDict[str, Union[list[NestedTensors], list[Tensor], Tensor, tuple[Tensor, …]]]
A dictionary that represents the keyword arguments to forward().
The metadata items enables us to obtain the keyword arguments corresponding to each data item in MultiModalDataItems, via get_item() and get_items().
- static batch(inputs_list: list[vllm.multimodal.inputs.MultiModalKwargs]) ]]
Batch multiple inputs together into a dictionary.
The resulting dictionary has the same keys as the inputs. If the corresponding value from each input is a tensor and they all share the same shape, the output value is a single batched tensor; otherwise, the output value is a list containing the original value from each input.
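The batching rule described above can be sketched in plain Python. This is not vLLM's implementation: `Array` is a hypothetical stand-in for a tensor (a shape plus flat data), and `stack` and `batch` are illustrative helpers written for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Array:
    """Hypothetical stand-in for a tensor: a shape plus flat data."""
    shape: tuple
    data: tuple


def stack(arrays):
    """Stack same-shape arrays along a new leading dimension."""
    shape = (len(arrays), *arrays[0].shape)
    data = tuple(x for a in arrays for x in a.data)
    return Array(shape, data)


def batch(inputs_list):
    """Merge per-input dicts key by key, following the rule described above."""
    if not inputs_list:
        return {}
    batched = {}
    for key in inputs_list[0]:
        values = [inputs[key] for inputs in inputs_list]
        # Stack only when every value is an Array with the same shape.
        same_shape = all(
            isinstance(v, Array) and v.shape == values[0].shape for v in values
        )
        batched[key] = stack(values) if same_shape else values
    return batched


first = {"pixel_values": Array((2,), (1.0, 2.0)), "sizes": Array((1,), (3.0,))}
second = {"pixel_values": Array((2,), (3.0, 4.0)), "sizes": Array((2,), (5.0, 6.0))}
out = batch([first, second])
print(out["pixel_values"].shape)  # (2, 2): shapes matched, so the values were stacked
print(type(out["sizes"]).__name__)  # list: shapes differed, so the originals are kept
```

The key names `pixel_values` and `sizes` are made up for the example. The point is only that one key can end up as a single batched value while another stays a list.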
- static from_items(items: Sequence[MultiModalKwargsItem])
Construct a new MultiModalKwargs from multiple items.
- class vllm.multimodal.inputs.MultiModalInputs
Bases: TypedDict
Represents the outputs of vllm.multimodal.processing.BaseMultiModalProcessor, ready to be passed to vLLM internals.
- mm_hashes: NotRequired[MultiModalHashDict | None]
The hashes of the multi-modal data.
- mm_kwargs: MultiModalKwargs
Keyword arguments to be directly passed to the model after batching.
- mm_placeholders: Mapping[str, Sequence[PlaceholderRange]]
For each modality, information about the placeholder tokens in prompt_token_ids.
- token_type_ids: NotRequired[list[int]]
The token type IDs of the prompt.
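To illustrate how mm_placeholders relates to prompt_token_ids, the sketch below slices out the token IDs covered by each placeholder range. It uses plain dictionaries in place of the real classes; the helper `placeholder_token_ids` and the token ID values are hypothetical, chosen only for the example.

```python
from typing import Mapping, Sequence


def placeholder_token_ids(
    prompt_token_ids: Sequence[int],
    mm_placeholders: Mapping[str, Sequence[Mapping[str, int]]],
) -> dict[str, list[list[int]]]:
    """Return, per modality, the token IDs covered by each placeholder range."""
    result: dict[str, list[list[int]]] = {}
    for modality, ranges in mm_placeholders.items():
        slices = []
        for r in ranges:
            start = r["offset"]
            end = start + r["length"]
            # Reject ranges that fall outside the prompt.
            if not (0 <= start <= end <= len(prompt_token_ids)):
                raise ValueError(f"{modality} placeholder {dict(r)} is out of bounds")
            slices.append(list(prompt_token_ids[start:end]))
        result[modality] = slices
    return result


# Hypothetical token IDs: 9 stands for an image placeholder token.
prompt_token_ids = [9, 9, 9, 9, 1, 9, 9, 9, 9, 2, 3]
mm_placeholders = {
    "image": [{"offset": 0, "length": 4}, {"offset": 5, "length": 4}],
}
image_slices = placeholder_token_ids(prompt_token_ids, mm_placeholders)
print(image_slices)  # {'image': [[9, 9, 9, 9], [9, 9, 9, 9]]}
```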