Engine Arguments
Engine arguments control the behavior of the vLLM engine.
- For offline inference, they are part of the arguments to the `LLM` class.
- For online serving, they are part of the arguments to `vllm serve`.
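As a minimal sketch of the online path, engine arguments become CLI flags of `vllm serve`; the model name and the specific flags shown here (`--max-model-len`, `--gpu-memory-utilization`) are only illustrative examples. The offline equivalent passes the same values as keyword arguments, e.g. `LLM(model=..., max_model_len=512)`.

```shell
# Online serving: each engine argument is exposed as a CLI flag.
vllm serve facebook/opt-125m \
    --max-model-len 512 \
    --gpu-memory-utilization 0.8
```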
You can look at `EngineArgs` and `AsyncEngineArgs` to see the available engine arguments.
However, these classes are a combination of the configuration classes defined in `vllm.config`, so we recommend reading about the arguments there, where they are best documented.
For offline inference, you have direct access to these configuration classes; for online serving, you can cross-reference them with the output of `vllm serve --help`, which groups its arguments by config.
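For example, you can inspect the grouped flags directly from the CLI:

```shell
# Lists every engine argument, grouped by the configuration class
# each flag maps to (model, cache, parallel, scheduler, ...).
vllm serve --help
```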
Note

Additional arguments are available for the `AsyncLLMEngine`, which is used for online serving. These can also be found by running `vllm serve --help`.