Input Processing Pipeline
1. Input data is passed to LLMEngine (or AsyncLLMEngine).
2. Tokenize the data if necessary.
3. Process the inputs using INPUT_REGISTRY.process_input.
   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.
4. Send the processed inputs to ExecutorBase.
5. Distribute the inputs via WorkerBase to ModelRunnerBase.
6. If the data contains multi-modal data, convert it into keyword arguments using MULTIMODAL_REGISTRY.map_input.
   - For example, convert a PIL.Image.Image input to its pixel values for a vision model.
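
The two "for example" cases above (expanding placeholder tokens, then mapping raw image data to keyword arguments) can be illustrated with a small self-contained sketch. It does not call vLLM: every name in it (ToyInputs, toy_tokenize, toy_process_input, toy_map_input, IMAGE_TOKEN_ID, TOKENS_PER_IMAGE) is a hypothetical stand-in, the token id and slot count are arbitrary assumptions, and a nested list of grayscale values stands in for a PIL.Image.Image.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical constants for illustration only; real values depend on the model.
IMAGE_TOKEN_ID = 32000   # assumed id of the image placeholder token
TOKENS_PER_IMAGE = 4     # assumed number of embedding slots one image occupies


@dataclass
class ToyInputs:
    """Stand-in for the inputs handed to the engine (not a vLLM type)."""
    prompt_token_ids: List[int]
    multi_modal_data: Optional[Dict[str, Any]] = None


def toy_tokenize(prompt: str) -> List[int]:
    """Toy tokenizer: one id per whitespace-separated word."""
    return [IMAGE_TOKEN_ID if word == "<image>" else len(word)
            for word in prompt.split()]


def toy_process_input(inputs: ToyInputs) -> ToyInputs:
    """Expand each image placeholder so the sequence reserves one
    position (and hence KV cache) per multi-modal embedding."""
    if not inputs.multi_modal_data or "image" not in inputs.multi_modal_data:
        return inputs
    expanded: List[int] = []
    for token in inputs.prompt_token_ids:
        if token == IMAGE_TOKEN_ID:
            expanded.extend([IMAGE_TOKEN_ID] * TOKENS_PER_IMAGE)
        else:
            expanded.append(token)
    return ToyInputs(expanded, inputs.multi_modal_data)


def toy_map_input(multi_modal_data: Dict[str, Any]) -> Dict[str, Any]:
    """Convert raw image data (rows of 0-255 ints) into model keyword
    arguments, here a single `pixel_values` entry scaled to [0, 1]."""
    image = multi_modal_data["image"]
    pixel_values = [[px / 255.0 for px in row] for row in image]
    return {"pixel_values": pixel_values}


# Steps 1-2: raw prompt plus image data, tokenized.
raw = ToyInputs(toy_tokenize("describe <image> please"),
                {"image": [[0, 255], [51, 102]]})
# Step 3: placeholder expansion before the inputs reach the executor.
processed = toy_process_input(raw)
# Step 6: inside the model runner, raw data becomes keyword arguments.
model_kwargs = toy_map_input(processed.multi_modal_data)
```

In this sketch the single `<image>` token becomes four placeholder tokens, so the processed prompt is six tokens long. The split between the two functions mirrors the pipeline: placeholder expansion happens early so sequence lengths are known before scheduling, while the conversion to keyword arguments is deferred until the data reaches the model runner.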