Distributed utils¶
This directory (vllm_omni/distributed/ray_utils) contains utilities for distributed execution in vllm-omni, supporting both Ray and Multiprocessing backends.
1. Installation¶
2. Ray Utils¶
The ray_utils module provides helper functions for managing Ray clusters and actors, which is essential for: * Multi-node deployment: Running pipeline stages across different physical machines. * Resource management: Efficient GPU/CPU allocation.
2.1 Basic Usage¶
To use the Ray backend, specify worker_backend="ray" when initializing the engine.
Command Line Example:
2.2 Cluster Setup¶
Step 1: Start Head Node Run this on your primary machine:
Step 2: Connect Worker Nodes Run this on each worker machine:
Tip: For a complete cluster setup script, refer to the vLLM example: run_cluster.sh
2.3 Distributed Connector Support¶
When running on Ray, the system automatically adapts its communication strategy:
- Cross-Node: Recommended to use
MooncakeTransferEngineConnector(RDMA, fastest) orMooncakeStoreConnector(TCP fallback). - Same-Node: Can still use
SharedMemoryConnectorfor efficiency, or Ray's native object store (plasma). - SHM threshold default differs: when
worker_backend="ray", the SharedMemoryConnector default threshold is set tosys.maxsize, which forces payloads to go inline (no SHM). Overrideshm_threshold_bytesin the connector config if you want SHM for Ray runs.
2.4 Internal Helpers¶
initialize_ray_cluster: Connects to an existing Ray cluster or starts a local one.
3. Troubleshooting¶
- Connection Issues: Ensure the Ray head node is accessible and ports (default 6399 in this example) are open.
- Version Mismatch: Ensure all nodes run the same version of Ray and Python.