Distributed Tracing#
This tutorial walks through the basic configuration required to collect traces, metrics, and logs from a vLLM serving engine in a Kubernetes environment with GPU support. You will learn how to use OpenTelemetry tooling together with Jaeger, a distributed tracing platform, to monitor running vLLM instances, and how to specify the tracing configuration through environment variables such as OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, and OTEL_RESOURCE_ATTRIBUTES.
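As a rough illustration of the three environment variables named above, the sketch below exports them in a shell and prints the result. In this tutorial they are set through the Helm values file rather than exported by hand, and every value shown here (the service name, the endpoint, and the resource attributes) is a placeholder assumption, not a value taken from the tutorial's files.

```shell
# Illustrative values only (assumptions, not copied from values-12-otel-vllm.yaml).

# Sets the service.name resource attribute attached to emitted telemetry.
export OTEL_SERVICE_NAME="vllm-engine"

# Base OTLP endpoint for telemetry; 4317 is the conventional OTLP/gRPC port.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# Extra resource attributes as comma-separated key=value pairs.
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=tutorial,service.namespace=vllm"

# Show what was set, one variable per line.
env | grep '^OTEL_' | sort
```

In a Kubernetes deployment the same names would typically appear as container environment entries, with the endpoint pointing at the collector rather than at localhost.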
Prerequisites#
A Kubernetes environment with GPU support, as set up in the Prerequisite tutorial.
Helm installed on your system.
Access to a HuggingFace token (HF_TOKEN).
A self-defined API key or an existing secret (VLLM_API_KEY).
Step 1: Preparing the Jaeger Configuration File#
Locate the example configuration files tutorials/assets/otel-example/{jaeger, jaeger-query, jaeger-collector}.yaml.
Open the files and examine the following fields:
Specify your desired ports for the jaeger-collector and jaeger-query services, which handle trace storage and retrieval, respectively.
Verify that COLLECTOR_OTLP_ENABLED is set to true; this enables Jaeger’s native support for OpenTelemetry.
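The COLLECTOR_OTLP_ENABLED check can be scripted. The helper below is a minimal sketch: check_otlp_enabled is a hypothetical name, and it assumes the manifest declares the variable in the usual Kubernetes form, with the value: line directly after the name: line. If the example files are laid out differently, inspect them by hand instead.

```shell
# check_otlp_enabled FILE
# Succeeds (exit status 0) if FILE sets COLLECTOR_OTLP_ENABLED to true.
# Assumption: the "value:" line immediately follows the "name:" line.
check_otlp_enabled() {
  # Take the matching line plus the one after it, then look for a true value.
  # ".?" allows for an optional opening quote around the value.
  grep -A1 'COLLECTOR_OTLP_ENABLED' "$1" | grep -Eq 'value:[[:space:]]*.?true'
}
```

From the tutorials/assets directory it could be used as: check_otlp_enabled otel-example/jaeger.yaml && echo "OTLP enabled".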
Explanation of Key Items in Jaeger Files#
ports: The ports through which Jaeger performs its operations (collection and querying in this case).
name: The unique identifier for your Jaeger deployment.
image: The single Docker image that runs all of Jaeger’s backend components, as well as its UI, in one container.
type: The ClusterIP type ensures the query service is only reachable from within the cluster.
Step 2: Preparing the OpenTelemetry Collector File#
Locate the example collector files tutorials/assets/otel-example/{otel-collector, otel-collector-config}.yaml.
Open the files and examine the following fields:
Specify the receivers, processors, and exporters of the collector. The OpenTelemetry collector is an intermediary: it receives traces from the vLLM service and forwards them to Jaeger, which displays them in its UI.
The resources field specifies the compute resources available to the container running the OpenTelemetry collector.
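To make the receivers, processors, and exporters structure concrete, the sketch below writes a minimal, hypothetical collector configuration to a scratch file so you can compare it with otel-collector-config.yaml. It is not the content of the tutorial's file. The batch processor, the insecure TLS setting, and the scratch file path are assumptions; the exporter endpoint reuses the jaeger-collector service name and OTLP gRPC port 4317 that appear in the service listing later in this tutorial.

```shell
# Write a hypothetical, minimal collector configuration to a scratch file.
sketch="${TMPDIR:-/tmp}/otel-collector-sketch.yaml"
cat > "$sketch" <<'EOF'
# Receivers: accept OTLP data over gRPC from the vLLM pods.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
# Processors: group spans into batches before export (assumed choice).
processors:
  batch: {}
# Exporters: forward to Jaeger's OTLP endpoint inside the cluster.
exporters:
  otlp:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true   # assumption: plaintext traffic within the cluster
# Pipelines wire the three component kinds together per signal type.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
EOF
echo "wrote $sketch"
```

Each component is declared once at the top level and only becomes active when a pipeline under service references it.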
Step 3: Configuring Model and Monitoring#
Feel free to inspect the values-12-otel-vllm.yaml file. Setting OTEL_EXPORTER_OTLP_ENDPOINT tells the OTLP exporter where to send metrics, traces, and logs. Further configuration options are described in the OpenTelemetry OTLP Exporter documentation.
Run the following from the tutorials/assets directory:
sudo kubectl apply -f otel-example/jaeger.yaml
sudo kubectl apply -f otel-example/jaeger-collector.yaml
sudo kubectl apply -f otel-example/jaeger-query.yaml
sudo kubectl apply -f otel-example/otel-collector-config.yaml
sudo kubectl apply -f otel-example/otel-collector.yaml
sudo helm install vllm ../../helm/ -f values-12-otel-vllm.yaml
Check the pod status:
sudo kubectl get pods
Expected output:
You should see pods such as the following:
NAME                                           READY   STATUS    RESTARTS   AGE
jaeger-744484b5bc-rdrgn                        1/1     Running   0          13m
otel-collector-859db69dd4-kj8x6                1/1     Running   0          12m
vllm-deployment-router-6888598c6-m6gl9         1/1     Running   0          12m
vllm-opt125m-deployment-vllm-b489dfd8b-95gb5   1/1     Running   0          12m
The vllm-deployment-router pod acts as the router, managing requests and routing them to the appropriate model-serving pod.
The vllm-opt125m-deployment-vllm pod serves the actual model for inference.
Check service usage:
sudo kubectl get services
Expected output:
Ensure there are services for the serving engine, router, jaeger-collector, and jaeger-query. Note that the OpenTelemetry deployment does not require its own service:
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
jaeger-collector      ClusterIP   10.99.125.245   <none>        4317/TCP,4318/TCP   86m
jaeger-query          ClusterIP   10.97.59.30     <none>        16686/TCP           86m
vllm-engine-service   ClusterIP   10.102.6.58     <none>        80/TCP              86m
vllm-router-service   ClusterIP   10.103.127.48   <none>        80/TCP              86m
The vllm-engine-service exposes the serving engine.
The vllm-router-service handles routing and load balancing across model-serving pods.
The jaeger-collector service handles collection of trace data from OpenTelemetry.
The jaeger-query service pulls data from the Jaeger collector to use in the UI.
Expose the model and the Jaeger UI:
sudo kubectl port-forward svc/vllm-router-service 30080:80
sudo kubectl port-forward svc/jaeger-query 16686:16686
Each port-forward command runs in the foreground, so start them in separate terminals. The local port 30080 can be replaced with any free TCP port on your machine, and 16686 should be replaced with your own value if you chose a different jaeger-query port.
Please refer to Step 3 in the Quick Start tutorial for querying the deployed vLLM service. To monitor the queries, navigate to localhost:16686 (or to whichever port you specified for jaeger-query), select jaeger-all-in-one from the Service dropdown menu in the Jaeger UI, and click “Find Traces” to list the traces.
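For a quick end-to-end check from the command line, the sketch below defines small helpers around curl. Several details are assumptions rather than facts from this tutorial: the model name facebook/opt-125m is inferred from the pod name, the router is assumed to serve the OpenAI-compatible /v1/completions route and to accept VLLM_API_KEY as a bearer token, and /api/services is the HTTP route the Jaeger UI itself calls, which Jaeger does not document as a stable API. All function names are hypothetical.

```shell
# Defaults match the port-forward commands above; override via the environment.
ROUTER_URL="${ROUTER_URL:-http://localhost:30080}"
JAEGER_URL="${JAEGER_URL:-http://localhost:16686}"
MODEL="${MODEL:-facebook/opt-125m}"   # assumption inferred from the pod name

# Build a JSON completion request for the given prompt.
# Note: the prompt is inserted as-is, so it must not contain quotes or backslashes.
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": 10}' "$MODEL" "$1"
}

# Send one prompt through the router; this should produce a trace.
send_test_prompt() {
  curl -s "$ROUTER_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $VLLM_API_KEY" \
    -d "$(completion_payload "$1")"
}

# List the service names Jaeger has seen so far.
list_traced_services() {
  curl -s "$JAEGER_URL/api/services"
}
```

With both port-forwards running, send_test_prompt "Once upon a time," followed by list_traced_services gives a rough confirmation that requests reach the model and that Jaeger is receiving data.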
Conclusion#
In this tutorial, you configured and deployed a vLLM serving engine in a Kubernetes environment, processed and exported resulting traces to Jaeger using an OpenTelemetry collector, and viewed the traces in the Jaeger UI. For further customization, please look at the various data sources available for monitoring here.