Input Definitions
User-facing inputs
Internal data structures
- class vllm.multimodal.inputs.PlaceholderRange[source]
Bases: TypedDict
Placeholder location information for multi-modal data.
Example
Prompt:
AAAA BBBB What is in these images?
Images A and B will have:
A: { "offset": 0, "length": 4 } B: { "offset": 5, "length": 4 }
- vllm.multimodal.inputs.NestedTensors[source]
Uses a list instead of a tensor if the dimensions of each element do not match.
alias of Union[list[NestedTensors], list[Tensor], Tensor, tuple[Tensor, …]]
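The alias is recursive, so ragged data can be represented without padding. A minimal sketch (PyTorch assumed; shapes are illustrative):

import torch

# Elements with matching dimensions can be a single stacked tensor.
uniform = torch.stack([torch.zeros(3, 224, 224), torch.zeros(3, 224, 224)])

# Elements with mismatched dimensions fall back to a list of tensors,
# which is itself a valid NestedTensors value.
ragged = [
    torch.zeros(3, 224, 224),
    torch.zeros(3, 448, 448),  # different spatial size
]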
- class vllm.multimodal.inputs.MultiModalFieldElem(field: BaseMultiModalField, data: NestedTensors)[source]
Contains metadata and data of an item in MultiModalKwargs.
- class vllm.multimodal.inputs.MultiModalFieldConfig(field_cls: type[vllm.multimodal.inputs.BaseMultiModalField], modality: str, **field_config: Any)[source]
- class vllm.multimodal.inputs.MultiModalKwargsItem(dict=None, /, **kwargs)[source]
Bases: UserDict[str, MultiModalFieldElem]
A collection of MultiModalFieldElem corresponding to a data item in MultiModalDataItems.
- class vllm.multimodal.inputs.MultiModalKwargs(data: Mapping[str, NestedTensors], *, items: Sequence[MultiModalKwargsItem] | None = None)[source]
Bases: UserDict[str, NestedTensors]
A dictionary that represents the keyword arguments to forward().
The metadata in items makes it possible to obtain the keyword arguments corresponding to each data item in MultiModalDataItems, via get_item() and get_items().
- static batch(inputs_list: list[vllm.multimodal.inputs.MultiModalKwargs]) → dict[str, NestedTensors][source]
Batch multiple inputs together into a dictionary.
The resulting dictionary has the same keys as the inputs. If the corresponding value from each input is a tensor and they all share the same shape, the output value is a single batched tensor; otherwise, the output value is a list containing the original value from each input.
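A minimal sketch of both batching outcomes (key names and shapes are illustrative; assumes PyTorch and a working vLLM install):

import torch
from vllm.multimodal.inputs import MultiModalKwargs

a = MultiModalKwargs({"pixel_values": torch.zeros(3, 224, 224)})
b = MultiModalKwargs({"pixel_values": torch.zeros(3, 224, 224)})

# Same key, same shape: the values are stacked into one batched tensor.
batched = MultiModalKwargs.batch([a, b])
assert batched["pixel_values"].shape == (2, 3, 224, 224)

c = MultiModalKwargs({"pixel_values": torch.zeros(3, 448, 448)})

# Mismatched shapes: the output value keeps the original tensors in a list.
ragged = MultiModalKwargs.batch([a, c])
assert isinstance(ragged["pixel_values"], list)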
- static from_items(items: Sequence[MultiModalKwargsItem])[source]
Construct a new MultiModalKwargs from multiple items.
- get_item(modality: str, item_index: int) → MultiModalKwargsItem[source]
Get the keyword arguments corresponding to an item identified by its modality and index.
- class vllm.multimodal.inputs.MultiModalInputsV2[source]
Bases: TypedDict
Represents the outputs of vllm.multimodal.processing.BaseMultiModalProcessor, ready to be passed to vLLM internals.
- mm_hashes: NotRequired[MultiModalHashDict | None][source]
The hashes of the multi-modal data.
- mm_kwargs: MultiModalKwargs[source]
Keyword arguments to be directly passed to the model after batching.
- mm_placeholders: Mapping[str, Sequence[PlaceholderRange]][source]
For each modality, information about the placeholder tokens in prompt_token_ids.
- token_type_ids: NotRequired[list[int]][source]
The token type IDs of the prompt.
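Since MultiModalInputsV2 is a TypedDict, processor outputs are plain dictionaries at runtime. A minimal sketch populating only the fields documented above (values are illustrative; the full TypedDict also carries the prompt-related keys, such as prompt_token_ids, omitted from this excerpt):

import torch
from vllm.multimodal.inputs import MultiModalKwargs, PlaceholderRange

# Hypothetical processor output covering the documented fields.
outputs = {
    "mm_kwargs": MultiModalKwargs({"pixel_values": torch.zeros(1, 3, 224, 224)}),
    "mm_placeholders": {"image": [PlaceholderRange(offset=0, length=4)]},
    "token_type_ids": [0, 0, 0, 0],  # NotRequired: may be omitted
}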