Input Processing Pipeline
1. Input data is passed to LLMEngine (or AsyncLLMEngine).
2. Tokenize the data if necessary.
3. Process the inputs using INPUT_REGISTRY.process_input.
   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.
4. Send the processed inputs to ExecutorBase.
5. Distribute the inputs via WorkerBase to ModelRunnerBase.
6. If the data contains multi-modal data, convert it into keyword arguments using MULTIMODAL_REGISTRY.map_input.
   - For example, convert a PIL.Image.Image input to its pixel values for a vision model.
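
The flow of data through these steps can be illustrated with a small, self-contained sketch. It does not use the actual engine, executor, or registry classes: every name in it (tokenize, process_input, map_input, run_pipeline, IMAGE_PLACEHOLDER_ID, IMAGE_FEATURE_SIZE) is a hypothetical stand-in, the tokenizer is a toy, and the image is a plain nested list of 0-255 integers rather than a PIL.Image.Image. It only shows where placeholder tokens are added and where multi-modal data becomes keyword arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical values, chosen only for illustration.
IMAGE_PLACEHOLDER_ID = -1  # token id that marks a slot reserved for an image embedding
IMAGE_FEATURE_SIZE = 4     # number of embedding slots assumed per image


@dataclass
class ProcessedInputs:
    prompt_token_ids: List[int]
    multi_modal_data: Optional[Dict[str, Any]] = None


def tokenize(prompt: str) -> List[int]:
    # Toy tokenizer: one id per whitespace-separated word (its length).
    return [len(word) for word in prompt.split()]


def process_input(token_ids: List[int],
                  multi_modal_data: Optional[Dict[str, Any]]) -> ProcessedInputs:
    # Stand-in for the input-processing step: prepend placeholder tokens so
    # the sequence length accounts for the image embeddings.
    if multi_modal_data and "image" in multi_modal_data:
        token_ids = [IMAGE_PLACEHOLDER_ID] * IMAGE_FEATURE_SIZE + token_ids
    return ProcessedInputs(token_ids, multi_modal_data)


def map_input(multi_modal_data: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Stand-in for the multi-modal mapping step: convert a raw image
    # (rows of 0-255 integers) into normalized pixel values.
    if not multi_modal_data or "image" not in multi_modal_data:
        return {}
    image = multi_modal_data["image"]
    pixel_values = [[p / 255.0 for p in row] for row in image]
    return {"pixel_values": pixel_values}


def run_pipeline(prompt: str,
                 multi_modal_data: Optional[Dict[str, Any]] = None
                 ) -> Tuple[ProcessedInputs, Dict[str, Any]]:
    token_ids = tokenize(prompt)                         # tokenize if necessary
    inputs = process_input(token_ids, multi_modal_data)  # add placeholder tokens
    # In the real system the processed inputs would now pass through the
    # executor and worker to the model runner; this sketch skips that hop.
    model_kwargs = map_input(inputs.multi_modal_data)    # build keyword arguments
    return inputs, model_kwargs
```

Calling run_pipeline("describe this image", {"image": [[0, 255]]}) returns inputs whose token list starts with four placeholder ids, plus a keyword-argument dictionary containing pixel_values. A text-only call returns an empty dictionary, since there is nothing to map.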