vllm.worker.multi_step_neuronx_distributed_model_runner
MultiStepNeuronxDistributedModelRunner
Bases: NeuronxDistributedModelRunner
A model runner for multi-step decoding using the neuronx-distributed-inference framework.
Source code in vllm/worker/multi_step_neuronx_distributed_model_runner.py
__init__
__init__(vllm_config: VllmConfig)
execute_model
execute_model(
    model_input,
    kv_caches: Optional[List[Tensor]] = None,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    num_steps: int = 1,
) -> Optional[List[SamplerOutput]]
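The signature above takes a num_steps argument and returns a list, which suggests one output per decode step. The following is a minimal sketch of that general multi-step pattern, not the vLLM or Neuron implementation. The names toy_multi_step_decode, next_token_fn, and last_plus_one are hypothetical and exist only for illustration; plain integers stand in for SamplerOutput objects.

```python
from typing import Callable, List, Optional


def toy_multi_step_decode(
    token_ids: List[int],
    next_token_fn: Callable[[List[int]], int],
    num_steps: int = 1,
) -> Optional[List[int]]:
    """Hypothetical stand-in (not vLLM code): run several decode steps per call.

    Each step picks one new token from the current context, records it,
    and appends it to the context so the next step can see it.
    """
    if num_steps < 1:
        # Assumption for this sketch only: nothing to do, so return None.
        return None

    context = list(token_ids)  # copy so the caller's list is not mutated
    outputs: List[int] = []
    for _ in range(num_steps):
        token = next_token_fn(context)
        outputs.append(token)   # one output entry per step
        context.append(token)   # feed the new token back in
    return outputs


def last_plus_one(context: List[int]) -> int:
    """Toy 'model': the next token is the last token plus one."""
    return context[-1] + 1


result = toy_multi_step_decode([1, 2, 3], last_plus_one, num_steps=3)
print(result)  # [4, 5, 6]
```

The point of batching steps this way is that the caller makes one call and receives num_steps outputs, rather than paying per-call scheduling overhead for every generated token.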