`vllm.inputs` ¶

Modules:

engine –

Schema and utilities for inputs to the engine client (LLMEngine/AsyncLLM).
llm –

Schema and utilities for input prompts to the LLM API.
preprocess –

Classes:

DataPrompt –

Represents generic inputs that are converted to PromptType by IO processor plugins.
EmbedsInput –

Represents embeddings-based input to the engine.
EmbedsPrompt –

Schema for a prompt provided via token embeddings.
EncoderDecoderInput –

A rendered EncoderDecoderPrompt
ExplicitEncoderDecoderPrompt –

Schema for a pair of encoder and decoder singleton prompts.
MultiModalDataBuiltins –

Type annotations for modality types predefined by vLLM.
MultiModalEncDecInput –

Represents multi-modal input to the engine for encoder-decoder models.
MultiModalInput –

Represents multi-modal input to the engine.
TextPrompt –

Schema for a text prompt.
TokensInput –

Represents token-based input to the engine.
TokensPrompt –

Schema for a tokenized prompt.

Functions:

embeds_input –

Construct EmbedsInput
tokens_input –

Construct TokensInput

Attributes:

DecoderOnlyEngineInput (TypeAlias) –

A rendered DecoderOnlyPrompt
EngineInput (TypeAlias) –

A rendered PromptType
ModalityData (TypeAlias) –

Either a single data item, or a list of data items. Can only be None if UUID is provided.
MultiModalDataDict (TypeAlias) –

A dictionary containing an entry for each modality type to input.
MultiModalHashes (TypeAlias) –

A dictionary containing per-item hashes for each modality.
MultiModalPlaceholders (TypeAlias) –

A dictionary containing per-item placeholder ranges for each modality.
MultiModalUUIDDict (TypeAlias) –

A dictionary containing user-provided UUIDs for items in each modality.
PromptType (TypeAlias) –

Schema for any prompt, regardless of model type.
SingletonInput (TypeAlias) –

A rendered SingletonPrompt
SingletonPrompt (TypeAlias) –

Schema for a single prompt, as opposed to a data structure that encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.

`DecoderOnlyEngineInput = TokensInput | EmbedsInput | MultiModalInput` `module-attribute` ¶

A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

`EngineInput = DecoderOnlyEngineInput | EncoderDecoderInput` `module-attribute` ¶

A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

`ModalityData = _T | list[_T | None] | None` `module-attribute` ¶

Either a single data item, or a list of data items. Can only be None if UUID is provided.

The number of data items allowed per modality is restricted by --limit-mm-per-prompt.

`MultiModalDataDict = Mapping[str, ModalityData[Any]]` `module-attribute` ¶

A dictionary containing an entry for each modality type to input.

The built-in modalities are defined by MultiModalDataBuiltins.

`MultiModalHashes = Mapping[str, list[str]]` `module-attribute` ¶

A dictionary containing per-item hashes for each modality.

`MultiModalPlaceholders = Mapping[str, Sequence['PlaceholderRange']]` `module-attribute` ¶

A dictionary containing per-item placeholder ranges for each modality.

`MultiModalUUIDDict = Mapping[str, Sequence[str | None] | str]` `module-attribute` ¶

A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.

The UUID will be used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc).
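To make the shape concrete, here is a minimal sketch of a mapping that matches the alias above, together with a hypothetical helper (`items_needing_hash`, not part of vLLM) that lists which items have no user-provided UUID and would therefore fall back to a computed hash. The modality names and UUID strings are invented, and treating a bare string as the UUID of a single item is an assumption of this sketch.

```python
from typing import Mapping, Optional, Sequence, Union

# Local alias mirroring the documented shape, defined here so the sketch
# runs without vLLM installed.
UUIDDict = Mapping[str, Union[Sequence[Optional[str]], str]]


def items_needing_hash(uuids: UUIDDict) -> dict:
    """Return, per modality, the item indices that have no user-provided UUID."""
    missing = {}
    for modality, value in uuids.items():
        # Assumption: a bare string is the UUID of a single item.
        entries = [value] if isinstance(value, str) else list(value)
        missing[modality] = [i for i, u in enumerate(entries) if u is None]
    return missing


mm_uuids = {
    "image": ["img-cat-001", None],  # the second image has no UUID
    "audio": "clip-42",              # single item given as a bare string
}
print(items_needing_hash(mm_uuids))
```

Only the indices reported here would need a computed hash; every other item is identified by its UUID for caching.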

`PromptType = DecoderOnlyPrompt | EncoderDecoderPrompt` `module-attribute` ¶

Schema for any prompt, regardless of model type.

This is the input format accepted by most LLM APIs.

`SingletonInput = DecoderOnlyEngineInput | MultiModalEncDecInput` `module-attribute` ¶

A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

`SingletonPrompt = DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt` `module-attribute` ¶

Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.

`DataPrompt` ¶

Bases: _PromptOptions

Represents generic inputs that are converted to PromptType by IO processor plugins.

Attributes:

data (Any) –

The input data.
data_format (str) –

The input data format.

Source code in vllm/inputs/llm.py

class DataPrompt(_PromptOptions):
    """
    Represents generic inputs that are converted to
    [`PromptType`][vllm.inputs.llm.PromptType] by IO processor plugins.
    """

    data: Any
    """The input data."""

    data_format: str
    """The input data format."""

`data` `instance-attribute` ¶

The input data.

`data_format` `instance-attribute` ¶

The input data format.

`EmbedsInput` ¶

Bases: _InputOptions

Represents embeddings-based input to the engine.

Attributes:

is_token_ids (NotRequired[list[bool]]) –

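Per-position mask for mixed-mode inputs. True means the position is a real token ID (use the model's embedding layer); False means the position uses a pre-computed embedding row from prompt_embeds. Length MUST equal len(prompt_token_ids). For pure-embeds inputs this field is absent.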
prompt (NotRequired[str]) –

The prompt text corresponding to the token IDs, if available.
prompt_embeds (Tensor) –

The embeddings of the prompt.
prompt_token_ids (NotRequired[list[int]]) –

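Token IDs of the rendered prompt. Only set for mixed-mode inputs (chat completion with prompt_embeds content parts). When present, is_token_ids MUST also be present and have the same length. For pure-embeds inputs this field is absent.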
type (Literal['embeds']) –

The type of input.

Source code in vllm/inputs/engine.py

class EmbedsInput(_InputOptions):
    """Represents embeddings-based input to the engine."""

    type: Literal["embeds"]
    """The type of input."""

    prompt_embeds: "torch.Tensor"
    """The embeddings of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    prompt_token_ids: NotRequired[list[int]]
    """Token IDs of the rendered prompt. Only set for mixed-mode inputs
    (chat completion with `prompt_embeds` content parts). When present,
    `is_token_ids` MUST also be present and have the same length. 
    For pure-embeds inputs this field is absent."""

    is_token_ids: NotRequired[list[bool]]
    """Per-position mask for mixed-mode inputs. `True` means the position
    is a real token ID (use the model's embedding layer); `False` means
    the position uses a pre-computed embedding row from `prompt_embeds`.
    Length MUST equal `len(prompt_token_ids)`.
    For pure-embeds inputs this field is absent."""

`is_token_ids` `instance-attribute` ¶

Per-position mask for mixed-mode inputs. True means the position is a real token ID (use the model's embedding layer); False means the position uses a pre-computed embedding row from prompt_embeds. Length MUST equal len(prompt_token_ids). For pure-embeds inputs this field is absent.

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token IDs, if available.

`prompt_embeds` `instance-attribute` ¶

The embeddings of the prompt.

`prompt_token_ids` `instance-attribute` ¶

Token IDs of the rendered prompt. Only set for mixed-mode inputs (chat completion with prompt_embeds content parts). When present, is_token_ids MUST also be present and have the same length. For pure-embeds inputs this field is absent.

`type` `instance-attribute` ¶

The type of input.
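The invariants stated for `prompt_token_ids` and `is_token_ids` can be checked mechanically. The sketch below is a hypothetical validator (`check_mixed_mode` is not a vLLM function) that works on plain dicts. Nested lists stand in for the `prompt_embeds` tensor so that the example runs without torch, and the embedding shape is deliberately not checked.

```python
def check_mixed_mode(inputs: dict) -> None:
    """Raise ValueError if the mixed-mode fields break the documented invariants."""
    has_ids = "prompt_token_ids" in inputs
    has_mask = "is_token_ids" in inputs

    # Both fields are absent for pure-embeds inputs and present together
    # for mixed-mode inputs.
    if has_ids != has_mask:
        raise ValueError("prompt_token_ids and is_token_ids must be set together")

    # When present, the mask needs one entry per token ID.
    if has_ids and len(inputs["prompt_token_ids"]) != len(inputs["is_token_ids"]):
        raise ValueError("is_token_ids must have the same length as prompt_token_ids")


# Pure-embeds input: neither optional field is set.
pure_inputs = {"type": "embeds", "prompt_embeds": [[0.1, 0.2], [0.3, 0.4]]}

# Mixed-mode input: placeholder values, one mask entry per token ID.
mixed_inputs = {
    "type": "embeds",
    "prompt_embeds": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "prompt_token_ids": [101, 0, 102],
    "is_token_ids": [True, False, True],
}

check_mixed_mode(pure_inputs)
check_mixed_mode(mixed_inputs)
```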

`EmbedsPrompt` ¶

Bases: _PromptOptions

Schema for a prompt provided via token embeddings.

Attributes:

prompt (NotRequired[str]) –

The prompt text corresponding to the token embeddings, if available.
prompt_embeds (Tensor) –

The embeddings of the prompt.
prompt_is_token_ids (NotRequired[list[bool]]) –

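Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds. Must be the same length as prompt_token_ids when both are set.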
prompt_token_ids (NotRequired[list[int]]) –

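Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts). The tokens at positions where prompt_is_token_ids is False are placeholder tokens that get replaced by entries from prompt_embeds in the forward pass.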

Source code in vllm/inputs/llm.py

class EmbedsPrompt(_PromptOptions):
    """Schema for a prompt provided via token embeddings."""

    prompt_embeds: "torch.Tensor"
    """The embeddings of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token embeddings, if available."""

    prompt_token_ids: NotRequired[list[int]]
    """Token IDs for mixed-mode inputs (chat completion with
    `prompt_embeds` content parts). The tokens at positions where 
    `prompt_is_token_ids` is `False` are placeholder tokens that 
    get replaced by entries from `prompt_embeds` in the forward pass."""

    prompt_is_token_ids: NotRequired[list[bool]]
    """Per-position mask, `True` uses the real token ID, `False` uses
    the corresponding entry from `prompt_embeds`. 
    Must be the same length as `prompt_token_ids` when both are set."""

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token embeddings, if available.

`prompt_embeds` `instance-attribute` ¶

The embeddings of the prompt.

`prompt_is_token_ids` `instance-attribute` ¶

Per-position mask: True uses the real token ID; False uses the corresponding entry from prompt_embeds. Must be the same length as prompt_token_ids when both are set.

`prompt_token_ids` `instance-attribute` ¶

Token IDs for mixed-mode inputs (chat completion with prompt_embeds content parts). The tokens at positions where prompt_is_token_ids is False are placeholder tokens that get replaced by entries from prompt_embeds in the forward pass.
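The replacement described above can be pictured with a small pure-Python sketch. It is not how vLLM implements the forward pass: `merge_embeddings` and `embedding_table` are hypothetical, nested lists stand in for tensors, and the sketch assumes that `prompt_embeds` holds one row per position, so that row `i` is the "corresponding entry" for position `i`.

```python
def merge_embeddings(token_ids, is_token_ids, prompt_embeds, embedding_table):
    """Pick, per position, either the looked-up token embedding or the given row."""
    if len(token_ids) != len(is_token_ids):
        raise ValueError("mask and token IDs must have the same length")

    merged = []
    for pos, (token_id, use_token) in enumerate(zip(token_ids, is_token_ids)):
        if use_token:
            # Real token: look it up in the (hypothetical) embedding table.
            merged.append(embedding_table[token_id])
        else:
            # Placeholder token: take the pre-computed row instead.
            # Assumption: prompt_embeds is index-aligned with the token IDs.
            merged.append(prompt_embeds[pos])
    return merged


embedding_table = {7: [1.0, 0.0], 9: [0.0, 1.0]}  # toy 2-dimensional embeddings
token_ids = [7, 0, 9]                             # 0 is a placeholder token here
mask = [True, False, True]
prompt_embeds = [[0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]

merged = merge_embeddings(token_ids, mask, prompt_embeds, embedding_table)
print(merged)
```

Only the middle position takes its row from `prompt_embeds`; the other two come from the lookup table.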

`EncoderDecoderInput` ¶

Bases: TypedDict

A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

Attributes:

arrival_time (NotRequired[float]) –

The time when the input was received (before rendering).
decoder_prompt (DecoderEngineInput) –

The inputs for the decoder portion.
encoder_prompt (EncoderInput) –

The inputs for the encoder portion.

Source code in vllm/inputs/engine.py

class EncoderDecoderInput(TypedDict):
    """
    A rendered [`EncoderDecoderPrompt`][vllm.inputs.llm.EncoderDecoderPrompt]
    which can be passed to `LLMEngine.add_request` or `AsyncLLM.add_request`.
    """

    type: Literal["enc_dec"]

    encoder_prompt: EncoderInput
    """The inputs for the encoder portion."""

    decoder_prompt: DecoderEngineInput
    """The inputs for the decoder portion."""

    arrival_time: NotRequired[float]
    """The time when the input was received (before rendering)."""

`arrival_time` `instance-attribute` ¶

The time when the input was received (before rendering).

`decoder_prompt` `instance-attribute` ¶

The inputs for the decoder portion.

`encoder_prompt` `instance-attribute` ¶

The inputs for the encoder portion.

`ExplicitEncoderDecoderPrompt` ¶

Bases: TypedDict

Schema for a pair of encoder and decoder singleton prompts.

Note

This schema is not valid for decoder-only models.

Attributes:

decoder_prompt (DecoderPrompt | None) –

The prompt for the decoder part of the model.
encoder_prompt (EncoderPrompt) –

The prompt for the encoder part of the model.

Source code in vllm/inputs/llm.py

class ExplicitEncoderDecoderPrompt(TypedDict):
    """
    Schema for a pair of encoder and decoder singleton prompts.

    Note:
        This schema is not valid for decoder-only models.
    """

    encoder_prompt: EncoderPrompt
    """The prompt for the encoder part of the model."""

    decoder_prompt: DecoderPrompt | None
    """
    The prompt for the decoder part of the model.

    Passing `None` will cause the prompt to be inferred automatically.
    """

`decoder_prompt` `instance-attribute` ¶

The prompt for the decoder part of the model.

Passing None will cause the prompt to be inferred automatically.

`encoder_prompt` `instance-attribute` ¶

The prompt for the encoder part of the model.
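As an illustration, the sketch below mirrors the schema with local `TypedDict` classes so that it runs without vLLM installed. The `*Sketch` names and `decoder_is_inferred` are hypothetical, and using a text-prompt-shaped dict as the encoder prompt is an assumption, since the members of `EncoderPrompt` are not listed on this page.

```python
from typing import Optional, TypedDict


class TextPromptSketch(TypedDict):
    """Local stand-in with the same single field as TextPrompt."""
    prompt: str


class ExplicitEncDecSketch(TypedDict):
    """Local stand-in for the encoder/decoder pair described above."""
    encoder_prompt: TextPromptSketch
    decoder_prompt: Optional[TextPromptSketch]


def decoder_is_inferred(pair: ExplicitEncDecSketch) -> bool:
    """True when the decoder prompt is left as None, to be inferred automatically."""
    return pair["decoder_prompt"] is None


inferred = ExplicitEncDecSketch(
    encoder_prompt={"prompt": "Summarize: the quick brown fox jumps."},
    decoder_prompt=None,
)
explicit = ExplicitEncDecSketch(
    encoder_prompt={"prompt": "Summarize: the quick brown fox jumps."},
    decoder_prompt={"prompt": "Summary:"},
)
print(decoder_is_inferred(inferred), decoder_is_inferred(explicit))
```

Note that `decoder_prompt` is a required key even when its value is `None`; omitting the key is not the same as passing `None`.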

`MultiModalDataBuiltins` ¶

Bases: TypedDict

Type annotations for modality types predefined by vLLM.

Attributes:

audio (ModalityData[AudioItem]) –

The input audio(s).
image (ModalityData[ImageItem]) –

The input image(s).
video (ModalityData[VideoItem]) –

The input video(s).
vision_chunk (ModalityData[VisionChunk]) –

The input visual atom(s) - unified modality for images and video chunks.

Source code in vllm/inputs/llm.py

@final
class MultiModalDataBuiltins(TypedDict, total=False):
    """Type annotations for modality types predefined by vLLM."""

    image: ModalityData["ImageItem"]
    """The input image(s)."""

    video: ModalityData["VideoItem"]
    """The input video(s)."""

    audio: ModalityData["AudioItem"]
    """The input audio(s)."""

    vision_chunk: ModalityData["VisionChunk"]
    """The input visual atom(s) - unified modality for images and video chunks."""

`audio` `instance-attribute` ¶

The input audio(s).

`image` `instance-attribute` ¶

The input image(s).

`video` `instance-attribute` ¶

The input video(s).

`vision_chunk` `instance-attribute` ¶

The input visual atom(s) - unified modality for images and video chunks.
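Because each modality accepts either a single item or a list of items, code that enforces a per-modality limit has to normalize the two forms first. The sketch below shows one way to do that. `count_items` and `over_limit` are hypothetical helpers, strings stand in for real image and audio objects, and counting a bare value as one item and a top-level `None` as zero items are assumptions of this sketch rather than documented behavior.

```python
def count_items(value) -> int:
    """Count the data items supplied for one modality."""
    if value is None:
        return 0           # assumption: nothing supplied for this modality
    if isinstance(value, list):
        return len(value)  # list entries (including None) each count once
    return 1               # a bare value is a single item


def over_limit(mm_data: dict, limits: dict) -> dict:
    """Return {modality: item_count} for modalities that exceed their limit."""
    exceeded = {}
    for modality, value in mm_data.items():
        n = count_items(value)
        if n > limits[modality]:
            exceeded[modality] = n
    return exceeded


mm_data = {
    "image": ["image-a", "image-b"],  # two items passed as a list
    "audio": "clip-a",                # one item passed directly
}
limits = {"image": 1, "audio": 1}     # illustrative per-prompt limits

print(over_limit(mm_data, limits))
```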

`MultiModalEncDecInput` ¶

Bases: MultiModalInput

Represents multi-modal input to the engine for encoder-decoder models.

Note

Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://github.com/vllm-project/bart-plugin)

Attributes:

encoder_prompt (NotRequired[str]) –

The prompt text corresponding to the encoder token IDs, if available.
encoder_prompt_token_ids (list[int]) –

The processed token IDs of the encoder prompt.

Source code in vllm/inputs/engine.py

class MultiModalEncDecInput(MultiModalInput):
    """
    Represents multi-modal input to the engine for encoder-decoder models.

    Note:
        Even text-only encoder-decoder models are currently implemented
        as multi-modal models for convenience.
        (Example: https://github.com/vllm-project/bart-plugin)
    """

    encoder_prompt_token_ids: list[int]
    """The processed token IDs of the encoder prompt."""

    encoder_prompt: NotRequired[str]
    """The prompt text corresponding to the encoder token IDs, if available."""

`encoder_prompt` `instance-attribute` ¶

The prompt text corresponding to the encoder token IDs, if available.

`encoder_prompt_token_ids` `instance-attribute` ¶

The processed token IDs of the encoder prompt.

`MultiModalInput` ¶

Bases: _InputOptions

Represents multi-modal input to the engine.

Attributes:

assistant_tokens_mask (NotRequired[list[int] | None]) –

Per-token 0/1 mask marking assistant-generated tokens.
mm_hashes (MultiModalHashes) –

The hashes of the multi-modal data.
mm_kwargs (MultiModalKwargsOptionalItems) –

Keyword arguments to be directly passed to the model after batching.
mm_placeholders (MultiModalPlaceholders) –

For each modality, information about the placeholder tokens in prompt_token_ids.
prompt (NotRequired[str]) –

The prompt text corresponding to the token IDs, if available.
prompt_token_ids (list[int]) –

The processed token IDs, which include placeholder tokens.
type (Literal['multimodal']) –

The type of input.

Source code in vllm/inputs/engine.py

class MultiModalInput(_InputOptions):
    """Represents multi-modal input to the engine."""

    type: Literal["multimodal"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The processed token IDs which includes placeholder tokens."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    mm_kwargs: "MultiModalKwargsOptionalItems"
    """Keyword arguments to be directly passed to the model after batching."""

    mm_hashes: MultiModalHashes
    """The hashes of the multi-modal data."""

    mm_placeholders: MultiModalPlaceholders
    """
    For each modality, information about the placeholder tokens in
    `prompt_token_ids`.
    """

    assistant_tokens_mask: NotRequired[list[int] | None]
    """Per-token 0/1 mask marking assistant-generated tokens.
    Populated when ``return_assistant_tokens_mask=True`` is set on the
    render request and the chat template supports ``{% generation %}``."""

`assistant_tokens_mask` `instance-attribute` ¶

Per-token 0/1 mask marking assistant-generated tokens. Populated when return_assistant_tokens_mask=True is set on the render request and the chat template supports {% generation %}.

`mm_hashes` `instance-attribute` ¶

The hashes of the multi-modal data.

`mm_kwargs` `instance-attribute` ¶

Keyword arguments to be directly passed to the model after batching.

`mm_placeholders` `instance-attribute` ¶

For each modality, information about the placeholder tokens in prompt_token_ids.

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token IDs, if available.

`prompt_token_ids` `instance-attribute` ¶

The processed token IDs, which include placeholder tokens.

`type` `instance-attribute` ¶

The type of input.

`TextPrompt` ¶

Bases: _PromptOptions

Schema for a text prompt.

Attributes:

prompt (str) –

The input text to be tokenized before passing to the model.

Source code in vllm/inputs/llm.py

class TextPrompt(_PromptOptions):
    """Schema for a text prompt."""

    prompt: str
    """The input text to be tokenized before passing to the model."""

`prompt` `instance-attribute` ¶

The input text to be tokenized before passing to the model.

`TokensInput` ¶

Bases: _InputOptions

Represents token-based input to the engine.

Attributes:

assistant_tokens_mask (NotRequired[list[int] | None]) –

Per-token 0/1 mask marking assistant-generated tokens.
prompt (NotRequired[str]) –

The prompt text corresponding to the token IDs, if available.
prompt_token_ids (list[int]) –

The token IDs of the prompt.
prompt_token_offsets (NotRequired[list[tuple[int, int]] | None]) –

Char-level (start, end) offsets per token, propagated from the renderer's TokensPrompt when offsets were computed.
type (Literal['token']) –

The type of input.

Source code in vllm/inputs/engine.py

class TokensInput(_InputOptions):
    """Represents token-based input to the engine."""

    type: Literal["token"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The token IDs of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    prompt_token_offsets: NotRequired[list[tuple[int, int]] | None]
    """Char-level (start, end) offsets per token, propagated from the
    renderer's TokensPrompt when offsets were computed."""

    assistant_tokens_mask: NotRequired[list[int] | None]
    """Per-token 0/1 mask marking assistant-generated tokens.
    Populated when ``return_assistant_tokens_mask=True`` is set on the
    render request and the chat template supports ``{% generation %}``."""

`assistant_tokens_mask` `instance-attribute` ¶

Per-token 0/1 mask marking assistant-generated tokens. Populated when return_assistant_tokens_mask=True is set on the render request and the chat template supports {% generation %}.

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token IDs, if available.

`prompt_token_ids` `instance-attribute` ¶

The token IDs of the prompt.

`prompt_token_offsets` `instance-attribute` ¶

Char-level (start, end) offsets per token, propagated from the renderer's TokensPrompt when offsets were computed.

`type` `instance-attribute` ¶

The type of input.
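One typical use of a per-token 0/1 mask is to pick out the assistant-generated tokens from `prompt_token_ids`. The helper below is a hypothetical sketch, not a vLLM function, and it assumes the mask has exactly one entry per token.

```python
def assistant_token_ids(prompt_token_ids, assistant_tokens_mask):
    """Return the token IDs whose mask entry is 1."""
    if assistant_tokens_mask is None:
        # The field may be None when no mask was produced.
        return []
    if len(prompt_token_ids) != len(assistant_tokens_mask):
        raise ValueError("mask must have one entry per token")
    return [tok for tok, flag in zip(prompt_token_ids, assistant_tokens_mask) if flag == 1]


token_ids = [10, 11, 12, 13, 14]
mask = [0, 0, 1, 1, 0]  # positions 2 and 3 were generated by the assistant

print(assistant_token_ids(token_ids, mask))
```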

`TokensPrompt` ¶

Bases: _PromptOptions

Schema for a tokenized prompt.

Attributes:

prompt (NotRequired[str]) –

The prompt text corresponding to the token IDs, if available.
prompt_token_ids (list[int]) –

A list of token IDs to pass to the model.
prompt_token_offsets (NotRequired[list[tuple[int, int]] | None]) –

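Char-level (start, end) offsets per token, relative to the tokenized source string. Present only when offsets were requested AND a Fast (Rust-backed) tokenizer was used AND no multimodal data was present. The list length equals the length of prompt_token_ids.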
token_type_ids (NotRequired[list[int]]) –

A list of token type IDs to pass to the cross encoder model.

Source code in vllm/inputs/llm.py

class TokensPrompt(_PromptOptions):
    """Schema for a tokenized prompt."""

    prompt_token_ids: list[int]
    """A list of token IDs to pass to the model."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    token_type_ids: NotRequired[list[int]]
    """A list of token type IDs to pass to the cross encoder model."""

    prompt_token_offsets: NotRequired[list[tuple[int, int]] | None]
    """Char-level (start, end) offsets per token, relative to the
    tokenized source string. Present only when offsets were requested
    AND a Fast (Rust-backed) tokenizer was used AND no multimodal data
    was present. The list length equals the length of `prompt_token_ids`."""

`prompt` `instance-attribute` ¶

The prompt text corresponding to the token IDs, if available.

`prompt_token_ids` `instance-attribute` ¶

A list of token IDs to pass to the model.

`prompt_token_offsets` `instance-attribute` ¶

Char-level (start, end) offsets per token, relative to the tokenized source string. Present only when offsets were requested AND a Fast (Rust-backed) tokenizer was used AND no multimodal data was present. The list length equals the length of prompt_token_ids.

`token_type_ids` `instance-attribute` ¶

A list of token type IDs to pass to the cross encoder model.
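Character offsets make it possible to map each token back to the text it came from. The sketch below slices a source string with hand-written offsets. `token_texts` is a hypothetical helper, and treating `end` as exclusive (as in Python slicing) is an assumption, since the page only describes the pairs as `(start, end)`.

```python
def token_texts(source: str, offsets):
    """Return the substring of `source` covered by each (start, end) pair."""
    # Assumption: `end` is exclusive, so source[start:end] is the token text.
    return [source[start:end] for start, end in offsets]


source = "Hello world"
prompt_token_offsets = [(0, 5), (5, 11)]  # illustrative values, one pair per token

print(token_texts(source, prompt_token_offsets))
```

Since the field may be absent or `None`, callers would need to check for it before slicing.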

`embeds_input(prompt_embeds, *, prompt=None, cache_salt=None, prompt_token_ids=None, is_token_ids=None)` ¶

Construct EmbedsInput from optional values.

Source code in vllm/inputs/engine.py

def embeds_input(
    prompt_embeds: "torch.Tensor",
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
    prompt_token_ids: list[int] | None = None,
    is_token_ids: list[bool] | None = None,
) -> EmbedsInput:
    """
    Construct [`EmbedsInput`][vllm.inputs.engine.EmbedsInput]
    from optional values.
    """
    inputs = EmbedsInput(type="embeds", prompt_embeds=prompt_embeds)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt
    if prompt_token_ids is not None:
        inputs["prompt_token_ids"] = prompt_token_ids
    if is_token_ids is not None:
        inputs["is_token_ids"] = is_token_ids

    return inputs

`tokens_input(prompt_token_ids, *, prompt=None, cache_salt=None)` ¶

Construct TokensInput from optional values.

Source code in vllm/inputs/engine.py

def tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput:
    """
    Construct [`TokensInput`][vllm.inputs.engine.TokensInput]
    from optional values.
    """
    inputs = TokensInput(type="token", prompt_token_ids=prompt_token_ids)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs
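A consequence of this construction is that optional keys are absent from the result rather than set to `None`. The sketch below reproduces the pattern with a local `TypedDict` so that it runs without vLLM installed. `TokensInputSketch` and `tokens_input_sketch` are stand-ins for the real names, and `cache_salt` is declared directly on the sketch class because the definition of `_InputOptions` is not shown on this page.

```python
from typing import List, Literal, TypedDict


class _RequiredFields(TypedDict):
    type: Literal["token"]
    prompt_token_ids: List[int]


class TokensInputSketch(_RequiredFields, total=False):
    # Optional keys: they are either present with a value or absent entirely.
    prompt: str
    cache_salt: str


def tokens_input_sketch(prompt_token_ids, *, prompt=None, cache_salt=None):
    """Build the dict, adding optional keys only when a value was given."""
    inputs = TokensInputSketch(type="token", prompt_token_ids=prompt_token_ids)
    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt
    return inputs


minimal = tokens_input_sketch([1, 2, 3])
full = tokens_input_sketch([1, 2, 3], prompt="Hi there", cache_salt="tenant-a")

print(sorted(minimal))  # keys of the minimal input
print(sorted(full))     # keys of the fully populated input
```

Consumers of such a dict should therefore test for key membership (`"prompt" in inputs`) or use `inputs.get("prompt")` instead of indexing directly.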
