Input Definitions#
User-facing inputs#
Internal data structures#
- class vllm.multimodal.inputs.PlaceholderRange[source]#
Bases: TypedDict
Placeholder location information for multi-modal data.
Example
Prompt:
AAAA BBBB What is in these images?
Images A and B will have:
A: { "offset": 0, "length": 4 }
B: { "offset": 5, "length": 4 }
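The mapping above can be reproduced in plain Python. The helper below is an illustrative sketch (not part of vLLM): it scans a tokenized prompt for contiguous runs of a placeholder token id and reports each run as an offset/length pair.

```python
# Illustrative sketch, not the vLLM implementation: locate placeholder
# ranges in a tokenized prompt. Each multi-modal item is described by the
# offset of its first placeholder token and the number of placeholder tokens.
def find_placeholder_ranges(token_ids, placeholder_id):
    """Return an {"offset": ..., "length": ...} dict for each contiguous
    run of `placeholder_id` in `token_ids`."""
    ranges = []
    i = 0
    while i < len(token_ids):
        if token_ids[i] == placeholder_id:
            start = i
            while i < len(token_ids) and token_ids[i] == placeholder_id:
                i += 1
            ranges.append({"offset": start, "length": i - start})
        else:
            i += 1
    return ranges

# "AAAA BBBB ..." -> two runs of 4 placeholder tokens separated by one token.
tokens = [9, 9, 9, 9, 0, 9, 9, 9, 9, 0, 1, 2, 3]
print(find_placeholder_ranges(tokens, 9))
# [{'offset': 0, 'length': 4}, {'offset': 5, 'length': 4}]
```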
- vllm.multimodal.inputs.NestedTensors[source]#
Uses a list instead of a tensor if the dimensions of each element do not match.
alias of Union[list[NestedTensors], list[Tensor], Tensor, tuple[Tensor, ...]]
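Because the alias is recursive, a NestedTensors value is either a single tensor or an arbitrarily nested list/tuple of tensors. A minimal sketch of traversing such a value, using strings as tensor stand-ins (the helper name is hypothetical, not vLLM API):

```python
def flatten_nested(value):
    """Yield the leaf tensors of a NestedTensors-like value. A leaf is
    anything that is not a list or tuple (here, a string stand-in)."""
    if isinstance(value, (list, tuple)):
        for v in value:
            yield from flatten_nested(v)
    else:
        yield value

# One image produced a single tensor; another a ragged list of row tensors,
# so the combined value must be a list rather than one stacked tensor.
nested = ["img_a", ["img_b_row0", "img_b_row1"]]
print(list(flatten_nested(nested)))
# ['img_a', 'img_b_row0', 'img_b_row1']
```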
- class vllm.multimodal.inputs.MultiModalFieldElem(modality: str, key: str, data: NestedTensors, field: BaseMultiModalField)[source]#
Represents a keyword argument corresponding to a multi-modal item in MultiModalKwargs.
- data: NestedTensors[source]#
The tensor data of this field in MultiModalKwargs, i.e. the value of the keyword argument to be passed to the model.
- field: BaseMultiModalField[source]#
Defines how to combine the tensor data of this field with others in order to batch multi-modal items together for model inference.
- key: str[source]#
The key of this field in MultiModalKwargs, i.e. the name of the keyword argument to be passed to the model.
- class vllm.multimodal.inputs.MultiModalFieldConfig(field: BaseMultiModalField, modality: str)[source]#
- static batched(modality: str)[source]#
Defines a field where an element in the batch is obtained by indexing into the first dimension of the underlying data.
- Parameters:
modality – The modality of the multi-modal item that uses this keyword argument.
Example
Input: Data: [[AAAA] [BBBB] [CCCC]] Output: Element 1: [AAAA] Element 2: [BBBB] Element 3: [CCCC]
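The example above can be reproduced with plain indexing; this sketch uses nested lists in place of tensors:

```python
# "batched" semantics: element i of the batch is data[i], i.e. an index
# into the first dimension of the underlying data.
data = [list("AAAA"), list("BBBB"), list("CCCC")]
elements = [data[i] for i in range(len(data))]
print(elements[0])  # ['A', 'A', 'A', 'A']
```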
- static flat(modality: str, slices: Sequence[slice])[source]#
Defines a field where an element in the batch is obtained by slicing along the first dimension of the underlying data.
- Parameters:
modality – The modality of the multi-modal item that uses this keyword argument.
slices – For each multi-modal item, a slice that is used to extract the data corresponding to it.
Example
Given: slices: [slice(0, 3), slice(3, 7), slice(7, 9)] Input: Data: [AAABBBBCC] Output: Element 1: [AAA] Element 2: [BBBB] Element 3: [CC]
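The same example with plain Python slicing, lists standing in for tensors:

```python
# "flat" semantics: each element is a slice along the first dimension
# of the underlying data.
data = list("AAABBBBCC")
slices = [slice(0, 3), slice(3, 7), slice(7, 9)]
elements = [data[s] for s in slices]
print(elements)
# [['A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['C', 'C']]
```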
- static flat_from_sizes(modality: str, size_per_item: torch.Tensor)[source]#
Defines a field where an element in the batch is obtained by slicing along the first dimension of the underlying data.
- Parameters:
modality – The modality of the multi-modal item that uses this keyword argument.
size_per_item – For each multi-modal item, the size of the slice that is used to extract the data corresponding to it.
Example
Given: size_per_item: [3, 4, 2] Input: Data: [AAABBBBCC] Output: Element 1: [AAA] Element 2: [BBBB] Element 3: [CC]
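In other words, flat_from_sizes behaves like flat() with slices derived from a running sum of the sizes. A sketch of that conversion:

```python
from itertools import accumulate

# Convert per-item sizes into the slices that flat() would use.
size_per_item = [3, 4, 2]
ends = list(accumulate(size_per_item))   # [3, 7, 9]
starts = [0] + ends[:-1]                 # [0, 3, 7]
slices = [slice(a, b) for a, b in zip(starts, ends)]

data = list("AAABBBBCC")
elements = [data[s] for s in slices]
print(elements)
# [['A', 'A', 'A'], ['B', 'B', 'B', 'B'], ['C', 'C']]
```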
See also
flat()
- static shared(modality: str, batch_size: int)[source]#
Defines a field where an element in the batch is obtained by taking the entirety of the underlying data.
This means that the data is the same for each element in the batch.
- Parameters:
modality – The modality of the multi-modal item that uses this keyword argument.
batch_size – The number of multi-modal items which share this data.
Example
Given: batch_size: 4 Input: Data: [XYZ] Output: Element 1: [XYZ] Element 2: [XYZ] Element 3: [XYZ] Element 4: [XYZ]
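The sharing can be sketched directly; note that the elements reference the same underlying data rather than copies:

```python
# "shared" semantics: every element in the batch is the entire
# underlying data.
batch_size = 4
data = list("XYZ")
elements = [data] * batch_size
print(elements[0] is elements[3])  # True -- shared, not copied
```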
- class vllm.multimodal.inputs.MultiModalKwargsItem(dict=None, /, **kwargs)[source]#
Bases: UserDict[str, MultiModalFieldElem]
A collection of MultiModalFieldElem corresponding to a data item in MultiModalDataItems.
- class vllm.multimodal.inputs.MultiModalKwargs(data: Mapping[str, NestedTensors], *, items: Sequence[MultiModalKwargsItem] | None = None)[source]#
Bases: UserDict[str, NestedTensors]
A dictionary that represents the keyword arguments to forward().
The metadata items enables us to obtain the keyword arguments corresponding to each data item in MultiModalDataItems, via get_item() and get_items().
- static batch(inputs_list: list[vllm.multimodal.inputs.MultiModalKwargs]) Mapping[str, NestedTensors] [source]#
Batch multiple inputs together into a dictionary.
The resulting dictionary has the same keys as the inputs. If the corresponding value from each input is a tensor and they all share the same shape, the output value is a single batched tensor; otherwise, the output value is a list containing the original value from each input.
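The stack-or-list rule can be sketched without torch by standing tensors in as ("T", shape) tuples, so that stacking is visible as a new leading dimension. The helper names are hypothetical; this is not the vLLM implementation:

```python
# Tensor stand-in: ("T", shape). Stacking n tensors of shape s yields one
# tensor of shape (n, *s), mirroring torch.stack.
def stack(tensors):
    return ("T", (len(tensors),) + tensors[0][1])

def batch(inputs_list):
    """Batch per-input kwargs: stack values that share a shape,
    otherwise keep a list of the original per-input values."""
    out = {}
    for key in inputs_list[0]:
        vals = [inp[key] for inp in inputs_list]
        same_shape = all(v[1] == vals[0][1] for v in vals)
        out[key] = stack(vals) if same_shape else vals
    return out

a = {"pixel_values": ("T", (3, 224, 224))}
b = {"pixel_values": ("T", (3, 224, 224))}
print(batch([a, b]))
# {'pixel_values': ('T', (2, 3, 224, 224))}
```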
- static from_items(items: Sequence[MultiModalKwargsItem])[source]#
Construct a new MultiModalKwargs from multiple items.
- get_item(modality: str, item_index: int) MultiModalKwargsItem [source]#
Get the keyword arguments corresponding to an item identified by its modality and index.
- get_items(modality: str) Sequence[MultiModalKwargsItem] [source]#
Get the keyword arguments corresponding to each item belonging to a modality.
- class vllm.multimodal.inputs.MultiModalInputs[source]#
Bases: TypedDict
Represents the outputs of vllm.multimodal.processing.BaseMultiModalProcessor, ready to be passed to vLLM internals.
- mm_kwargs: MultiModalKwargs[source]#
Keyword arguments to be directly passed to the model after batching.
- mm_placeholders: Mapping[str, Sequence[PlaceholderRange]][source]#
For each modality, information about the placeholder tokens in prompt_token_ids.
- token_type_ids: NotRequired[list[int]][source]#
The token type IDs of the prompt.