Input Processing Pipeline

  1. Input data is passed to LLMEngine (or AsyncLLMEngine).

  2. Tokenize the data if necessary (for example, when the prompt is given as text rather than as token IDs).

  3. Process the inputs using INPUT_REGISTRY.process_input.

    • For example, add placeholder tokens to reserve KV cache for multi-modal embeddings.

  4. Send the processed inputs to ExecutorBase.

  5. Distribute the inputs via WorkerBase to ModelRunnerBase.

  6. If the inputs contain multi-modal data, convert that data into keyword arguments for the model using MULTIMODAL_REGISTRY.map_input.

    • For example, convert a PIL.Image.Image input to its pixel values for a vision model.
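
The flow of steps 2, 3, and 6 can be illustrated with a toy sketch in plain Python. This is not the vLLM implementation: `ToyInputs`, `tokenize`, `process_input`, `map_input`, `run_pipeline`, the placeholder token ID, and the number of placeholders per image are all hypothetical stand-ins, and a nested list of integers stands in for a PIL.Image.Image. It only shows the ordering: placeholder tokens are added during input processing, and the raw image is converted to keyword arguments later, just before the model would be called.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical values chosen for illustration only.
IMAGE_PLACEHOLDER_ID = -1  # token ID reserved for image embedding slots
TOKENS_PER_IMAGE = 4       # number of embedding slots one image occupies


@dataclass
class ToyInputs:
    prompt_token_ids: list[int]
    multi_modal_data: Optional[dict[str, Any]] = None


def tokenize(prompt: str) -> list[int]:
    """Toy tokenizer (step 2): maps each whitespace-separated word to its length."""
    return [len(word) for word in prompt.split()]


def process_input(inputs: ToyInputs) -> ToyInputs:
    """Stand-in for step 3: prepend placeholder tokens when an image is present."""
    if not inputs.multi_modal_data or "image" not in inputs.multi_modal_data:
        return inputs
    placeholders = [IMAGE_PLACEHOLDER_ID] * TOKENS_PER_IMAGE
    return ToyInputs(placeholders + inputs.prompt_token_ids, inputs.multi_modal_data)


def map_input(multi_modal_data: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for step 6: convert raw image data into model keyword arguments."""
    image = multi_modal_data["image"]  # nested list of 0-255 ints, not a real PIL image
    pixel_values = [[value / 255.0 for value in row] for row in image]
    return {"pixel_values": pixel_values}


def run_pipeline(prompt: str, image: Optional[list[list[int]]] = None):
    """Run the toy steps in order and return (token IDs, model keyword arguments)."""
    token_ids = tokenize(prompt)
    data = {"image": image} if image is not None else None
    processed = process_input(ToyInputs(token_ids, data))
    model_kwargs = map_input(processed.multi_modal_data) if processed.multi_modal_data else {}
    return processed.prompt_token_ids, model_kwargs


if __name__ == "__main__":
    ids, kwargs = run_pipeline("describe this image", image=[[0, 255], [51, 102]])
    print(ids)
    print(kwargs)
```

In this sketch the placeholder tokens lengthen the token sequence before anything is scheduled, which mirrors the idea of reserving KV cache space for embeddings that are only computed once the image data reaches the model.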