Input Processing Pipeline

1. Input data is passed to LLMEngine (or AsyncLLMEngine).
2. Tokenize the data if necessary.
3. Process the inputs using INPUT_REGISTRY.process_input.
   - For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.
4. Send the processed inputs to ExecutorBase.
5. Distribute the inputs via WorkerBase to ModelRunnerBase.
6. If the data contains multi-modal data, convert it into keyword arguments using MULTIMODAL_REGISTRY.map_input.
   - For example, convert a PIL.Image.Image input to its pixel values for a vision model.
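
The flow above can be illustrated with a minimal, self-contained sketch. It is not vLLM's implementation: `Inputs`, `tokenize`, `process_input`, `map_input`, `run_pipeline`, the placeholder token ID, and the feature size are all hypothetical stand-ins chosen for illustration, and the executor/worker distribution is skipped entirely. The toy "image" is a nested list of 0-255 integers rather than a PIL.Image.Image.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical constants for illustration only; real values depend on the model.
IMAGE_PLACEHOLDER_ID = -1  # assumed token ID marking a reserved embedding slot
IMAGE_FEATURE_SIZE = 4     # assumed number of embedding slots per image


@dataclass
class Inputs:
    prompt: str
    token_ids: Optional[list[int]] = None
    multi_modal_data: dict[str, Any] = field(default_factory=dict)


def tokenize(inputs: Inputs) -> Inputs:
    """Tokenize only if the caller did not already supply token IDs."""
    if inputs.token_ids is None:
        # Toy tokenizer: one ID per whitespace-separated word (its length).
        inputs.token_ids = [len(word) for word in inputs.prompt.split()]
    return inputs


def process_input(inputs: Inputs) -> Inputs:
    """Prepend placeholder tokens so cache slots exist for image embeddings."""
    if "image" in inputs.multi_modal_data:
        placeholders = [IMAGE_PLACEHOLDER_ID] * IMAGE_FEATURE_SIZE
        inputs.token_ids = placeholders + inputs.token_ids
    return inputs


def map_input(inputs: Inputs) -> dict[str, Any]:
    """Convert multi-modal data into keyword arguments for the model."""
    kwargs: dict[str, Any] = {}
    image = inputs.multi_modal_data.get("image")
    if image is not None:
        # Scale 0-255 pixel intensities to the range [0, 1].
        kwargs["pixel_values"] = [[p / 255 for p in row] for row in image]
    return kwargs


def run_pipeline(inputs: Inputs) -> tuple[list[int], dict[str, Any]]:
    inputs = process_input(tokenize(inputs))
    # Executor and worker distribution are omitted in this sketch.
    return inputs.token_ids, map_input(inputs)


token_ids, model_kwargs = run_pipeline(
    Inputs(prompt="describe this image", multi_modal_data={"image": [[0, 255]]})
)
print(token_ids)     # [-1, -1, -1, -1, 8, 4, 5]
print(model_kwargs)  # {'pixel_values': [[0.0, 1.0]]}
```

The point of the sketch is the ordering: placeholder tokens are added before the inputs reach the model runner, while the conversion of raw multi-modal data into keyword arguments happens last, closest to the model.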