Google Cloud Platform

Google Cloud Platform#

Introduction#

This script automatically configures a GKE LLM inference cluster. Make sure your GCP CLI is set up, logged in, and the region is properly configured. You must have the following dependencies installed:

gcloud (Google Cloud CLI)
kubectl (Kubernetes command-line tool)
helm (Kubernetes package manager)

Ensure that all the required tools are installed before proceeding. Additionally, ensure that the following GCP APIs are enabled:

Kubernetes Engine API
Cloud Resource Manager API
IAM API
Compute Engine API

To enable these APIs, run:

gcloud services enable container.googleapis.com cloudresourcemanager.googleapis.com iam.googleapis.com compute.googleapis.com

Steps to Follow#

1. Deploy GKE vLLM Stack#

1.1 Modify the Configuration#

Before running the deployment, ensure that the configuration file production_stack_specification.yaml is properly set up. You need to configure:

servingEngineSpec: Define the model repository, resource requests, and storage settings.
routerSpec: Set up routing resource limits and requests.

Modify these fields as needed to match your cluster requirements.

1.2 Execute the Deployment Script#

Run the deployment script by replacing YAML_FILE_PATH with the actual configuration file path:

bash entry_point_basic.sh YAML_FILE_PATH

After executing the script, Kubernetes will start deploying the vLLM inference stack. You can monitor the status of the deployment.

2. Validate Installation#

2.1 Monitor Deployment Status#

To check whether the pods for vLLM deployment are up and running, use:

kubectl get pods

Expected output:

NAME                                           READY   STATUS    RESTARTS   AGE
vllm-deployment-router-6786bdcc5b-flj2x        1/1     Running   0          54s
vllm-llama3-deployment-vllm-7dd564bc8f-7mf5x   1/1     Running   0          54s

Note

It may take some time for the pods to reach the Running state, depending on cluster setup and image download speed.

2.2 Inspect Pod Logs#

If a pod is not transitioning to Running, use the following command to inspect logs:

kubectl logs -f <POD_NAME>

To get more detailed information about the pod, run:

kubectl describe pod <POD_NAME>

3. Uninstall#

To remove the deployed vLLM stack and clean up resources, run:

bash clean_up_basic.sh production-stack

This command will remove all Kubernetes resources associated with the vLLM deployment.

4. Troubleshooting#

If you encounter issues, refer to the following solutions:

Pods stuck in Pending state: Check available resources and ensure that the cluster has enough nodes:
```
kubectl describe nodes
```
Pods in CrashLoopBackOff state: Inspect logs to find the issue:
```
kubectl logs <POD_NAME>
```

Cannot connect to GKE cluster: Ensure that your gcloud CLI is properly configured:

gcloud container clusters get-credentials vllm-gke-cluster --region <REGION>

Following these steps should help ensure a successful deployment.