Google Cloud Platform
Introduction
This script automatically configures a GKE LLM inference cluster.
Make sure the Google Cloud CLI (gcloud) is set up, logged in, and configured for the correct region.
You must have the following dependencies installed:
- gcloud (Google Cloud CLI)
- kubectl (Kubernetes command-line tool)
- helm (Kubernetes package manager)
Ensure that all the required tools are installed before proceeding.
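One way to confirm that the three tools are on your PATH is a small shell check. This is a minimal sketch; the function name check_deps is hypothetical and is not part of the deployment scripts.

```shell
# Print each required tool that is missing from PATH.
# Returns non-zero if at least one tool is missing.
# check_deps is a hypothetical helper, not part of the deployment scripts.
check_deps() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_deps gcloud kubectl helm
```

The check only confirms that each command can be found; it does not verify versions or authentication.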
Additionally, ensure that the following GCP APIs are enabled:
- Kubernetes Engine API
- Cloud Resource Manager API
- IAM API
- Compute Engine API
To enable these APIs, run:
gcloud services enable container.googleapis.com cloudresourcemanager.googleapis.com iam.googleapis.com compute.googleapis.com
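To confirm that the four APIs are actually enabled, you can compare the list of enabled services against the names used in the command above. The sketch below assumes that gcloud is authenticated and has a default project set; missing_apis is a hypothetical helper name.

```shell
# The four service names from the enable command above.
REQUIRED_APIS="container.googleapis.com cloudresourcemanager.googleapis.com iam.googleapis.com compute.googleapis.com"

# Read enabled service names on stdin (one per line) and print every
# required API that is absent. missing_apis is a hypothetical helper.
missing_apis() {
  local enabled api
  enabled="$(cat)"
  for api in $REQUIRED_APIS; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$enabled" | grep -qxF "$api"; then
      echo "$api"
    fi
  done
}

# Usage (assumes an authenticated gcloud with a default project):
#   gcloud services list --enabled --format="value(config.name)" | missing_apis
```

Empty output means all four APIs are enabled.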
Steps to Follow
1. Deploy GKE vLLM Stack
1.1 Modify the Configuration
Before running the deployment, ensure that the configuration file production_stack_specification.yaml is properly set up.
You need to configure:
- servingEngineSpec: Define the model repository, resource requests, and storage settings.
- routerSpec: Set up routing resource limits and requests.
Modify these fields as needed to match your cluster requirements.
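Before deploying, a quick sanity check can catch a configuration file that lacks one of the two sections. The sketch below assumes that servingEngineSpec and routerSpec appear as top-level keys (starting at column 0) in the YAML file; check_spec_keys is a hypothetical helper and does not validate the contents of either section.

```shell
# Return 0 only if the file defines both sections as top-level YAML keys.
# Assumption: servingEngineSpec and routerSpec start at column 0.
# check_spec_keys is a hypothetical helper.
check_spec_keys() {
  local file="$1" key status=0
  for key in servingEngineSpec routerSpec; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing top-level key: $key"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_spec_keys production_stack_specification.yaml
```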
1.2 Execute the Deployment Script
Run the deployment script, replacing YAML_FILE_PATH with the path to your configuration file:
bash entry_point_basic.sh YAML_FILE_PATH
After the script finishes, Kubernetes starts deploying the vLLM inference stack.
You can monitor the deployment with kubectl get pods, as described in section 2.1.
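A mistyped configuration path is easy to catch before the script starts. The wrapper below is a minimal sketch that assumes entry_point_basic.sh is in the current directory; deploy_stack is a hypothetical name.

```shell
# Run the deployment script only if the configuration file exists.
# deploy_stack is a hypothetical wrapper; it assumes entry_point_basic.sh
# is in the current working directory.
deploy_stack() {
  local yaml="$1"
  if [ ! -f "$yaml" ]; then
    echo "config file not found: $yaml" >&2
    return 1
  fi
  bash entry_point_basic.sh "$yaml"
}

# Usage: deploy_stack production_stack_specification.yaml
```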
2. Validate Installation
2.1 Monitor Deployment Status
To check whether the pods for the vLLM deployment are up and running, use:
kubectl get pods
Expected output:
NAME                                           READY   STATUS    RESTARTS   AGE
vllm-deployment-router-6786bdcc5b-flj2x        1/1     Running   0          54s
vllm-llama3-deployment-vllm-7dd564bc8f-7mf5x   1/1     Running   0          54s
Note: It may take some time for the pods to reach the Running state, depending on cluster setup and image download speed.
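Rather than rerunning kubectl get pods by hand, you can poll until every pod reports Running. The following is a sketch, not part of the deployment scripts: the helper names, the default of 60 attempts, and the 10-second delay are all assumptions to adjust for your cluster.

```shell
# Count pods whose STATUS column is not "Running", reading
# `kubectl get pods --no-headers` output on stdin.
count_not_running() {
  awk 'NF && $3 != "Running" { n++ } END { print n + 0 }'
}

# Poll until at least one pod exists and all pods are Running,
# or give up after the given number of attempts.
# wait_for_running is a hypothetical helper; defaults are assumptions.
wait_for_running() {
  local attempts="${1:-60}" delay="${2:-10}" i=1 pods pending
  while [ "$i" -le "$attempts" ]; do
    pods="$(kubectl get pods --no-headers 2>/dev/null)"
    pending="$(printf '%s\n' "$pods" | count_not_running)"
    if [ -n "$pods" ] && [ "$pending" -eq 0 ]; then
      echo "all pods Running"
      return 0
    fi
    echo "attempt $i: waiting ($pending not Running)"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_for_running 60 10
```

This checks only the STATUS column. If you also need the containers to be ready, kubectl wait --for=condition=Ready pod --all --timeout=600s is an alternative.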
2.2 Inspect Pod Logs
If a pod does not reach the Running state, inspect its logs with the following command:
kubectl logs -f <POD_NAME>
To get more detailed information about the pod, run:
kubectl describe pod <POD_NAME>
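When several pods are unhealthy, the two commands above can be applied to each of them in one pass. This is a minimal sketch; diagnose_pods and not_running_pods are hypothetical helper names, and the 50-line log tail is an arbitrary choice.

```shell
# Print the names of pods whose STATUS column is not "Running", reading
# `kubectl get pods --no-headers` output on stdin.
not_running_pods() {
  awk 'NF && $3 != "Running" { print $1 }'
}

# For each such pod, show its description and the last 50 log lines.
# diagnose_pods is a hypothetical helper built from the two commands above.
diagnose_pods() {
  local pod
  for pod in $(kubectl get pods --no-headers | not_running_pods); do
    echo "=== $pod ==="
    kubectl describe pod "$pod"
    # Logs may not exist yet (for example, for Pending pods), so ignore failures.
    kubectl logs --tail=50 "$pod" || true
  done
}

# Usage: diagnose_pods
```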
3. Uninstall
To remove the deployed vLLM stack and clean up resources, run:
bash clean_up_basic.sh production-stack
This command will remove all Kubernetes resources associated with the vLLM deployment.
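To confirm that the cleanup finished, you can check that no vLLM pods remain. The sketch below assumes that the cluster is still reachable with kubectl and that pod names start with "vllm", as in the expected output in section 2.1; both helper names are hypothetical.

```shell
# Print any pods whose names still start with "vllm", reading
# `kubectl get pods --no-headers` output on stdin.
# Assumption: pod names begin with "vllm", as in the expected output above.
remaining_vllm_pods() {
  awk '$1 ~ /^vllm/ { print $1 }'
}

# Return 0 when nothing is left; otherwise list the leftovers and return 1.
# verify_cleanup is a hypothetical helper.
verify_cleanup() {
  local left
  left="$(kubectl get pods --no-headers 2>/dev/null | remaining_vllm_pods)"
  if [ -n "$left" ]; then
    echo "still present:"
    echo "$left"
    return 1
  fi
  echo "no vllm pods remain"
}

# Usage: verify_cleanup
```

Pods can stay in the Terminating state for a short time after cleanup, so a retry after a few seconds may be needed.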
4. Troubleshooting
If you encounter issues, refer to the following solutions:
- Pods stuck in Pending state: Check available resources and ensure that the cluster has enough nodes:
kubectl describe nodes
- Pods in CrashLoopBackOff state: Inspect logs to find the issue:
kubectl logs <POD_NAME>
- Cannot connect to GKE cluster: Ensure that your gcloud CLI is properly configured:
gcloud container clusters get-credentials vllm-gke-cluster --region <REGION>
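The pod-related cases above can be sketched as a small triage helper that maps each pod's status to the first command to try. The function names are hypothetical, <POD_NAME> stays a placeholder, and the fallback for other statuses (kubectl describe pod) is an assumption taken from section 2.2.

```shell
# Map a pod STATUS value to the first check suggested in the list above.
# suggest_check is a hypothetical helper; <POD_NAME> is a placeholder.
suggest_check() {
  case "$1" in
    Pending)          echo "kubectl describe nodes" ;;
    CrashLoopBackOff) echo "kubectl logs <POD_NAME>" ;;
    Running)          echo "no action needed" ;;
    *)                echo "kubectl describe pod <POD_NAME>" ;;
  esac
}

# Print one suggestion per pod, reading `kubectl get pods --no-headers`
# output on stdin.
triage() {
  local name ready status rest
  while read -r name ready status rest; do
    [ -n "$name" ] || continue
    echo "$name: $(suggest_check "$status")"
  done
}

# Usage: kubectl get pods --no-headers | triage
```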
Following these steps should help ensure a successful deployment.