
vLLM-Omni Helm Chart

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/chart-helm

Helm chart for deploying vLLM-Omni on Kubernetes. vLLM-Omni extends vLLM with omni-modality model serving, supporting text-to-image, multimodal chat, text-to-speech, and more.

Prerequisites

The chart targets Helm 3 (Chart.yaml uses apiVersion v2). With the default values it needs a cluster whose nodes advertise the nvidia.com/gpu resource, plus a StorageClass that can satisfy a 50Gi ReadWriteOnce claim for the model cache.

Quick Start

helm install my-release ./chart-helm \
  --set model=Tongyi-MAI/Z-Image-Turbo
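
After the install, it can help to wait for the rollout and reach the API from a workstation. The sketch below assumes the chart defaults shown in the templates (a Deployment named <release>-deployment-vllm-omni and a Service named <release>-service on port 80) and that the release lives in the current namespace. The function name omni_forward is hypothetical and not part of the chart.

```shell
# Hypothetical helper: wait for the Deployment to become ready, then
# forward the chart's Service to a local port.
#   $1 = release name (default: my-release)
#   $2 = local port   (default: 8080)
omni_forward() {
  # 30m matches the Deployment's progressDeadlineSeconds of 1800.
  kubectl rollout status "deployment/${1:-my-release}-deployment-vllm-omni" --timeout=30m &&
    kubectl port-forward "service/${1:-my-release}-service" "${2:-8080}:80"
}
```

Calling omni_forward my-release 8080 blocks while forwarding, so http://localhost:8080/health can be queried from a second terminal.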

Configuration

Model Selection

Set the model value to any supported HuggingFace model ID:

| Model | Type | GPUs | Notes |
|---|---|---|---|
| Tongyi-MAI/Z-Image-Turbo | text-to-image | 1 | Small, fast (default) |
| stabilityai/stable-diffusion-3.5-medium | text-to-image | 1 | ~6GB VRAM |
| Qwen/Qwen-Image | text-to-image | 1 | Large, ~40GB+ VRAM |
| Qwen/Qwen2.5-Omni-7B | multimodal | 2 | Text + audio + image + video |
| Qwen/Qwen3-Omni-7B-Chat | multimodal | 2 | Latest omni model |
| Qwen/Qwen3-TTS | text-to-speech | 1 | TTS |

HuggingFace Token

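For gated models that require authentication, pass a HuggingFace access token with hfToken. The chart stores it in a Secret named <release>-hf-token and exposes it to the container as HF_TOKEN: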

helm install my-release ./chart-helm \
  --set model=Qwen/Qwen2.5-Omni-7B \
  --set hfToken=hf_xxxxx \
  --set resources.requests."nvidia\.com/gpu"=2 \
  --set resources.limits."nvidia\.com/gpu"=2
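
The same settings can live in a values file, which avoids escaping the dot in nvidia.com/gpu. This is a minimal sketch: the file name values-gated.yaml is arbitrary, and the token is assumed to be available in an HF_TOKEN shell variable rather than written to disk.

```shell
# Write a values override for a gated two-GPU model.
# cpu and memory keep the chart defaults because Helm deep-merges maps.
cat > values-gated.yaml <<'EOF'
model: "Qwen/Qwen2.5-Omni-7B"
resources:
  requests:
    nvidia.com/gpu: 2
  limits:
    nvidia.com/gpu: 2
EOF
```

The file can then be passed with helm install my-release ./chart-helm -f values-gated.yaml --set hfToken="$HF_TOKEN", which keeps the token out of any file that might be committed to version control.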

Omni-Specific Flags

Enable VAE memory optimizations for diffusion models:

helm install my-release ./chart-helm \
  --set model=Qwen/Qwen-Image \
  --set omniArgs.vaeUseSlicing=true \
  --set omniArgs.vaeUseTiling=true

Enable CPU offloading:

helm install my-release ./chart-helm \
  --set model=Qwen/Qwen-Image \
  --set omniArgs.enableCpuOffload=true

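Pass additional raw CLI flags, which are appended to the end of the generated vllm serve command. Quote the argument so that shells such as zsh do not treat [0] as a glob pattern:

helm install my-release ./chart-helm \
  --set model=Qwen/Qwen-Image \
  --set 'omniArgs.extraArgs[0]=--enable-layerwise-offload'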
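
Several flags can be combined in one values file. The sketch below uses only keys defined in values.yaml; the file name is arbitrary, and the rendered command in the trailing comment is derived by reading the chart.omni-command helper, not from a live render.

```shell
# Combine VAE optimizations with one raw extra flag.
cat > values-omni-flags.yaml <<'EOF'
model: "Qwen/Qwen-Image"
omniArgs:
  vaeUseSlicing: true
  vaeUseTiling: true
  extraArgs:
    - "--enable-layerwise-offload"
EOF

# With the default containerPort, the helper should produce this command:
#   vllm serve Qwen/Qwen-Image --omni --host 0.0.0.0 --port 8000 \
#     --vae-use-slicing --vae-use-tiling --enable-layerwise-offload
```

Running helm template my-release ./chart-helm -f values-omni-flags.yaml prints the manifests without installing them, which is one way to confirm the final command before deploying.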

Model Cache

By default, a PersistentVolumeClaim is created for the HuggingFace model cache to avoid re-downloading models on pod restarts:

modelCache:
  enabled: true
  storageSize: "50Gi"
  storageClassName: ""

To use an ephemeral volume instead (an emptyDir, so models are downloaded again whenever the pod is recreated):

helm install my-release ./chart-helm \
  --set modelCache.enabled=false
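
For larger models or shared storage, the cache settings can be overridden in a values file. This sketch reuses values that appear in the chart's own tests; fast-ssd is a placeholder class name, and ReadWriteMany only works if the storage backend supports it.

```shell
# Larger cache on a specific storage class with shared access.
cat > values-cache.yaml <<'EOF'
modelCache:
  enabled: true
  storageSize: "100Gi"
  storageClassName: "fast-ssd"   # placeholder; use a class that exists in your cluster
  accessModes:
    - ReadWriteMany               # requires a backend that supports RWX
EOF
```

All replicas mount the single claim named <release>-model-cache, so with the default ReadWriteOnce mode, replicas scheduled onto different nodes may be unable to mount it.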

Custom Command Override

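To fully override the container command (when image.command is set, the chart ignores model and omniArgs). Quote each argument so that shells such as zsh do not treat the brackets as glob patterns:

helm install my-release ./chart-helm \
  --set 'image.command[0]=vllm' \
  --set 'image.command[1]=serve' \
  --set 'image.command[2]=my-model' \
  --set 'image.command[3]=--omni' \
  --set 'image.command[4]=--host' \
  --set 'image.command[5]=0.0.0.0'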
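
A list is usually easier to read and maintain in a values file than as indexed --set flags. The sketch below expresses the same override; the file name is arbitrary and my-model is the placeholder used above.

```shell
# Full command override as a YAML list.
cat > values-command.yaml <<'EOF'
image:
  command:
    - "vllm"
    - "serve"
    - "my-model"
    - "--omni"
    - "--host"
    - "0.0.0.0"
EOF
```

Because the override bypasses the generated --port flag, the server has to listen on the port configured as containerPort (8000 by default) for the probes and the Service to reach it.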

API Endpoints

Once deployed, vLLM-Omni exposes the following OpenAI-compatible endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /v1/models | GET | List available models |
| /v1/chat/completions | POST | Chat completions (text/multimodal) |
| /v1/images/generations | POST | Image generation |
| /v1/images/edits | POST | Image editing |
| /v1/audio/speech | POST | Text-to-speech |
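
The endpoints can be exercised with curl once the Service is reachable. The helpers below are a sketch with hypothetical function names. They assume the Service is forwarded to localhost:8080 (for example with kubectl port-forward) and that the image endpoint accepts the OpenAI Images request fields model, prompt, n, and size; check the vLLM-Omni documentation for the fields it actually supports.

```shell
# Base URL of the forwarded Service; override by exporting OMNI_URL.
OMNI_URL="${OMNI_URL:-http://localhost:8080}"

# GET helpers for the health and model-listing endpoints.
omni_health() { curl -fsS "${OMNI_URL}/health"; }
omni_models() { curl -fsS "${OMNI_URL}/v1/models"; }

# Build a JSON body for /v1/images/generations.
#   $1 = model ID, $2 = prompt
# No JSON escaping is done, so keep quotes and backslashes out of the prompt.
omni_image_payload() {
  printf '{"model": "%s", "prompt": "%s", "n": 1, "size": "1024x1024"}' "$1" "$2"
}

# POST the body and print the raw JSON response.
omni_generate_image() {
  curl -fsS "${OMNI_URL}/v1/images/generations" \
    -H 'Content-Type: application/json' \
    -d "$(omni_image_payload "$1" "$2")"
}
```

For example, omni_generate_image Tongyi-MAI/Z-Image-Turbo "a lighthouse at dusk" sends one request to the default model. The chat and speech endpoints follow the same pattern with their own request bodies.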

Files

| File | Description |
|---|---|
| Chart.yaml | Chart metadata (name, version, maintainers) |
| values.yaml | Default configuration values |
| values.schema.json | JSON schema for validating values |
| templates/_helpers.tpl | Helper templates for common configurations |
| templates/deployment.yaml | Kubernetes Deployment |
| templates/service.yaml | Kubernetes Service (ClusterIP) |
| templates/secrets.yaml | Secrets (generic + HuggingFace token) |
| templates/pvc.yaml | PersistentVolumeClaim for model cache |
| templates/configmap.yaml | Optional ConfigMap |
| templates/ingress.yaml | Optional Ingress |
| templates/hpa.yaml | HorizontalPodAutoscaler |
| templates/poddisruptionbudget.yaml | PodDisruptionBudget |
| templates/custom-objects.yaml | Custom Kubernetes objects |

Running Tests

This chart includes unit tests using helm-unittest. Install the plugin and run tests:

# Install plugin
helm plugin install https://github.com/helm-unittest/helm-unittest

# Run tests
helm unittest .
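
New suites follow the same shape as the existing files under tests/. As an illustration, the sketch below adds a suite for service.yaml, which the chart does not currently test. The file name is hypothetical, the expected values are read off the templates and default values, and the command is assumed to run from the chart directory.

```shell
# Add a helm-unittest suite covering the Service template.
mkdir -p tests
cat > tests/service_test.yaml <<'EOF'
suite: test service
templates:
  - service.yaml
tests:
  - it: should expose the default service port
    asserts:
      - isKind:
          of: Service
      - equal:
          path: metadata.name
          value: RELEASE-NAME-service
      - equal:
          path: spec.type
          value: ClusterIP
      - equal:
          path: spec.ports[0].port
          value: 80
      - equal:
          path: spec.ports[0].targetPort
          value: container-port
EOF
```

The next helm unittest . run picks the file up together with the existing suites.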

Example materials

.helmignore
*.png
.git/
ct.yaml
lintconf.yaml
values.schema.json
/workflows
Chart.yaml
apiVersion: v2
name: chart-vllm-omni
description: A Helm chart for deploying vLLM-Omni on Kubernetes for omni-modality model serving

# Application chart that can be packaged and deployed
type: application

# Chart version - increment on each change to the chart
version: 0.1.0

# vllm-omni application version — keep in sync with image.tag in values.yaml
appVersion: "0.16.0"

maintainers:
  - name: vllm-omni-team
ct.yaml
chart-dirs:
  - charts
validate-maintainers: false
lintconf.yaml
---
rules:
  braces:
    min-spaces-inside: 0
    max-spaces-inside: 0
    min-spaces-inside-empty: -1
    max-spaces-inside-empty: -1
  brackets:
    min-spaces-inside: 0
    max-spaces-inside: 0
    min-spaces-inside-empty: -1
    max-spaces-inside-empty: -1
  colons:
    max-spaces-before: 0
    max-spaces-after: 1
  commas:
    max-spaces-before: 0
    min-spaces-after: 1
    max-spaces-after: 1
  comments:
    require-starting-space: true
    min-spaces-from-content: 2
  document-end: disable
  document-start: disable
  empty-lines:
    max: 2
    max-start: 0
    max-end: 0
  hyphens:
    max-spaces-after: 1
  indentation:
    spaces: consistent
    indent-sequences: whatever
    check-multi-line-strings: false
  key-duplicates: enable
  line-length: disable
  new-line-at-end-of-file: disable
  new-lines:
    type: unix
  trailing-spaces: enable
  truthy:
    level: warning
templates/_helpers.tpl
{{/*
Define the vllm-omni serve command from model + omniArgs.
If image.command is set, uses that as a full override.
*/}}
{{- define "chart.omni-command" -}}
{{- if .Values.image.command }}
{{-   toYaml .Values.image.command }}
{{- else }}
- "vllm"
- "serve"
- {{ .Values.model | quote }}
- "--omni"
- "--host"
- "0.0.0.0"
- "--port"
- {{ include "chart.container-port" . | quote }}
{{- if .Values.omniArgs.vaeUseSlicing }}
- "--vae-use-slicing"
{{- end }}
{{- if .Values.omniArgs.vaeUseTiling }}
- "--vae-use-tiling"
{{- end }}
{{- if .Values.omniArgs.enableCpuOffload }}
- "--enable-cpu-offload"
{{- end }}
{{- if .Values.omniArgs.numGpus }}
- "--num-gpus"
- {{ .Values.omniArgs.numGpus | quote }}
{{- end }}
{{- if .Values.omniArgs.stageConfigsPath }}
- "--stage-configs-path"
- {{ .Values.omniArgs.stageConfigsPath | quote }}
{{- end }}
{{- if and .Values.omniArgs.cacheBackend (ne .Values.omniArgs.cacheBackend "none") }}
- "--cache-backend"
- {{ .Values.omniArgs.cacheBackend | quote }}
{{- end }}
{{- if .Values.omniArgs.defaultSamplingParams }}
- "--default-sampling-params"
- {{ .Values.omniArgs.defaultSamplingParams | quote }}
{{- end }}
{{- if and .Values.omniArgs.workerBackend (ne .Values.omniArgs.workerBackend "multi_process") }}
- "--worker-backend"
- {{ .Values.omniArgs.workerBackend | quote }}
{{- end }}
{{- range .Values.omniArgs.extraArgs }}
- {{ . | quote }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Define HuggingFace environment variables
*/}}
{{- define "chart.hf-env" -}}
- name: HF_HOME
  value: "/cache/huggingface"
- name: HOME
  value: "/cache"
{{- if .Values.hfToken }}
- name: HF_TOKEN
  valueFrom:
    secretKeyRef:
      name: {{ .Release.Name }}-hf-token
      key: token
{{- end }}
{{- end }}

{{/*
Define ports for the pods
*/}}
{{- define "chart.container-port" -}}
{{-  default "8000" .Values.containerPort }}
{{- end }}

{{/*
Define service name
*/}}
{{- define "chart.service-name" -}}
{{-  if .Values.serviceName -}}
{{ .Values.serviceName | lower | trim }}
{{-  else -}}
{{ printf "%s-service" .Release.Name }}
{{-  end -}}
{{- end }}

{{/*
Define service port
*/}}
{{- define "chart.service-port" -}}
{{-  if .Values.servicePort }}
{{-    .Values.servicePort }}
{{-  else }}
{{-    include "chart.container-port" . }}
{{-  end }}
{{- end }}

{{/*
Define service port name
*/}}
{{- define "chart.service-port-name" -}}
"service-port"
{{- end }}

{{/*
Define container port name
*/}}
{{- define "chart.container-port-name" -}}
"container-port"
{{- end }}

{{/*
Define deployment strategy
*/}}
{{- define "chart.strategy" -}}
strategy:
{{-   if not .Values.deploymentStrategy }}
  type: Recreate
{{-   else }}
{{      toYaml .Values.deploymentStrategy | indent 2 }}
{{-   end }}
{{- end }}

{{/*
Define additional ports
*/}}
{{- define "chart.extraPorts" }}
{{-   with .Values.extraPorts }}
{{      toYaml . }}
{{-   end }}
{{- end }}

{{/*
Define chart external ConfigMaps and Secrets
*/}}
{{- define "chart.externalConfigs" -}}
{{-   with .Values.externalConfigs -}}
{{      toYaml . }}
{{-   end }}
{{- end }}

{{/*
Define startup, liveness and readiness probes
*/}}
{{- define "chart.probes" -}}
{{-   if .Values.startupProbe  }}
startupProbe:
{{-     with .Values.startupProbe }}
{{-       toYaml . | nindent 2 }}
{{-     end }}
{{-   end }}
{{-   if .Values.readinessProbe  }}
readinessProbe:
{{-     with .Values.readinessProbe }}
{{-       toYaml . | nindent 2 }}
{{-     end }}
{{-   end }}
{{-   if .Values.livenessProbe  }}
livenessProbe:
{{-     with .Values.livenessProbe }}
{{-       toYaml . | nindent 2 }}
{{-     end }}
{{-   end }}
{{- end }}

{{/*
Define resources
*/}}
{{- define "chart.resources" -}}
requests:
  memory: {{ required "Value 'resources.requests.memory' must be defined !" .Values.resources.requests.memory | quote }}
  cpu: {{ required "Value 'resources.requests.cpu' must be defined !" .Values.resources.requests.cpu | quote }}
  {{- if and (gt (int (index .Values.resources.requests "nvidia.com/gpu")) 0) (gt (int (index .Values.resources.limits "nvidia.com/gpu")) 0) }}
  nvidia.com/gpu: {{ required "Value 'resources.requests.nvidia.com/gpu' must be defined !" (index .Values.resources.requests "nvidia.com/gpu") }}
  {{- end }}
limits:
  memory: {{ required "Value 'resources.limits.memory' must be defined !" .Values.resources.limits.memory | quote }}
  cpu: {{ required "Value 'resources.limits.cpu' must be defined !" .Values.resources.limits.cpu | quote }}
  {{- if and (gt (int (index .Values.resources.requests "nvidia.com/gpu")) 0) (gt (int (index .Values.resources.limits "nvidia.com/gpu")) 0) }}
  nvidia.com/gpu: {{ required "Value 'resources.limits.nvidia.com/gpu' must be defined !" (index .Values.resources.limits "nvidia.com/gpu") }}
  {{- end }}
{{- end }}

{{/*
Define user for the main container
*/}}
{{- define "chart.user" }}
{{-   if .Values.image.runAsUser  }}
runAsUser:
{{-     with .Values.image.runAsUser }}
{{-       toYaml . | nindent 2 }}
{{-     end }}
{{-   end }}
{{- end }}

{{/*
Define chart labels
*/}}
{{- define "chart.labels" -}}
{{-   with .Values.labels -}}
{{      toYaml . }}
{{-   end }}
{{- end }}
templates/configmap.yaml
{{- if .Values.configs -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Release.Name }}-configs"
  namespace: {{ .Release.Namespace }}
data:
  {{- with .Values.configs }}
  {{- toYaml . | nindent 2 }}
  {{- end }}
{{- end -}}
templates/custom-objects.yaml
{{- if .Values.customObjects }}
{{- range .Values.customObjects }}
{{- tpl (. | toYaml) $ }}
---
{{- end }}
{{- end }}
templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "{{ .Release.Name }}-deployment-vllm-omni"
  namespace: {{ .Release.Namespace }}
  labels:
  {{- include "chart.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  {{- include "chart.strategy" . | nindent 2 }}
  selector:
    matchLabels:
    {{- include "chart.labels" . | nindent 6 }}
  progressDeadlineSeconds: 1800
  template:
    metadata:
      labels:
      {{- include "chart.labels" . | nindent 8 }}
    spec:
      containers:
        - name: "vllm-omni"
          image: "{{ required "Required value 'image.repository' must be defined !" .Values.image.repository }}:{{ required "Required value 'image.tag' must be defined !" .Values.image.tag }}"
          command:
            {{- include "chart.omni-command" . | nindent 12 }}
          securityContext:
            {{- if .Values.image.securityContext }}
              {{- with .Values.image.securityContext }}
              {{- toYaml . | nindent 12 }}
              {{- end }}
            {{- else }}
            runAsNonRoot: false
              {{- include "chart.user" . | indent 12 }}
            {{- end }}
          imagePullPolicy: IfNotPresent
          env:
            {{- include "chart.hf-env" . | nindent 12 }}
            {{- if .Values.image.env }}
            {{- toYaml .Values.image.env | nindent 12 }}
            {{- end }}
          {{- if or .Values.externalConfigs .Values.configs .Values.secrets }}
          envFrom:
            {{- if .Values.configs }}
            - configMapRef:
                name: "{{ .Release.Name }}-configs"
            {{- end }}
            {{- if .Values.secrets }}
            - secretRef:
                name: "{{ .Release.Name }}-secrets"
            {{- end }}
            {{- include "chart.externalConfigs" . | nindent 12 }}
          {{- end }}
          ports:
            - name: {{ include "chart.container-port-name" . }}
              containerPort: {{ include "chart.container-port" . }}
            {{- include "chart.extraPorts" . | nindent 12 }}
          {{- include "chart.probes" . | indent 10 }}
          resources: {{- include "chart.resources" . | nindent 12 }}
          volumeMounts:
          - name: shm
            mountPath: /dev/shm
          - name: model-cache
            mountPath: /cache

        {{- with .Values.extraContainers }}
        {{ toYaml . | nindent 8 }}
        {{- end }}

      {{- if .Values.extraInit.initContainers }}
      initContainers:
        {{- toYaml .Values.extraInit.initContainers | nindent 8 }}
      {{- end }}

      volumes:
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: {{ .Values.shmSize }}
        - name: model-cache
          {{- if .Values.modelCache.enabled }}
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-model-cache
          {{- else }}
          emptyDir: {}
          {{- end }}

      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- if .Values.gpuModels }}
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: nvidia.com/gpu.product
                  operator: In
                  values:
                    {{- toYaml .Values.gpuModels | nindent 20 }}
      {{- end }}
templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "{{ .Release.Name }}-hpa"
  namespace: {{ .Release.Namespace }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "{{ .Release.Name }}-deployment-vllm-omni"
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}
templates/ingress.yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "{{ .Release.Name }}-ingress"
  namespace: {{ .Release.Namespace }}
  labels:
  {{- include "chart.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  {{- if .Values.ingress.ingressClassName }}
  ingressClassName: {{ .Values.ingress.ingressClassName }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- toYaml .Values.ingress.tls | nindent 4 }}
  {{- end }}
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: "{{ include "chart.service-name" . }}"
                port:
                  number: {{ include "chart.service-port" . }}
{{- end }}
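
The Ingress template above is driven entirely by the ingress values. A minimal sketch of an override that enables it with TLS, reusing the values from the chart's ingress tests, might look as follows. The host, class name, and secret name are placeholders, and the TLS Secret is assumed to exist already because the chart does not create it.

```shell
# Enable the Ingress with an nginx class, a body-size annotation, and TLS.
cat > values-ingress.yaml <<'EOF'
ingress:
  enabled: true
  ingressClassName: "nginx"
  host: "vllm-omni.example.com"
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
  tls:
    - secretName: vllm-omni-tls
      hosts:
        - vllm-omni.example.com
EOF
```

The rule routes the path / to the chart's Service on servicePort, so no further backend configuration is needed.
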
templates/poddisruptionbudget.yaml
{{- if gt (int .Values.replicaCount) 1 }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: "{{ .Release.Name }}-pdb"
  namespace: {{ .Release.Namespace }}
spec:
  maxUnavailable: {{ default 1 .Values.maxUnavailablePodDisruptionBudget }}
  selector:
    matchLabels:
    {{- include "chart.labels" . | nindent 6 }}
{{- end }}
templates/pvc.yaml
{{- if .Values.modelCache.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "{{ .Release.Name }}-model-cache"
  namespace: {{ .Release.Namespace }}
spec:
  accessModes:
    {{- toYaml .Values.modelCache.accessModes | nindent 4 }}
  {{- if .Values.modelCache.storageClassName }}
  storageClassName: {{ .Values.modelCache.storageClassName }}
  {{- end }}
  resources:
    requests:
      storage: {{ .Values.modelCache.storageSize }}
{{- end }}
templates/secrets.yaml
{{- if .Values.secrets }}
apiVersion: v1
kind: Secret
metadata:
  name: "{{ .Release.Name }}-secrets"
  namespace: {{ .Release.Namespace }}
type: Opaque
data:
  {{- range $key, $val := .Values.secrets }}
  {{ $key }}: {{ $val | b64enc | quote }}
  {{- end }}
---
{{- end }}
{{- if .Values.hfToken }}
apiVersion: v1
kind: Secret
metadata:
  name: "{{ .Release.Name }}-hf-token"
  namespace: {{ .Release.Namespace }}
type: Opaque
data:
  token: {{ .Values.hfToken | b64enc | quote }}
{{- end }}
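
The b64enc calls above mean plain-text values go into values.yaml and the chart encodes them. The sketch below reproduces that encoding locally and adds a hypothetical helper for reading the token back from a deployed release. The token is the placeholder from the chart's tests, and base64 -d assumes a GNU-style base64.

```shell
# Reproduce what b64enc does to the token value.
token='hf_test_token_123'                      # placeholder, not a real token
encoded=$(printf '%s' "$token" | base64)       # printf avoids a trailing newline
decoded=$(printf '%s' "$encoded" | base64 -d)  # round-trip back to plain text

# Hypothetical helper: print the token stored by a release.
#   $1 = release name (default: my-release)
omni_show_hf_token() {
  kubectl get secret "${1:-my-release}-hf-token" -o jsonpath='{.data.token}' | base64 -d
}
```

Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the token.
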
templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: "{{ include "chart.service-name" . }}"
  namespace: {{ .Release.Namespace }}
spec:
  type: ClusterIP
  ports:
    - name: {{ include "chart.service-port-name" . }}
      port: {{ include "chart.service-port" . }}
      targetPort: {{ include "chart.container-port-name" . }}
      protocol: TCP
  selector:
  {{- include "chart.labels" . | nindent 4 }}
tests/deployment_test.yaml
suite: test deployment
templates:
  - deployment.yaml
tests:
  - it: should create deployment with default omni command
    asserts:
      - hasDocuments:
          count: 1
      - isKind:
          of: Deployment
      - equal:
          path: spec.template.spec.containers[0].name
          value: vllm-omni
      - equal:
          path: spec.template.spec.containers[0].command[0]
          value: vllm
      - equal:
          path: spec.template.spec.containers[0].command[1]
          value: serve
      - equal:
          path: spec.template.spec.containers[0].command[2]
          value: Tongyi-MAI/Z-Image-Turbo
      - equal:
          path: spec.template.spec.containers[0].command[3]
          value: "--omni"
      - equal:
          path: spec.template.spec.containers[0].command[4]
          value: "--host"
      - equal:
          path: spec.template.spec.containers[0].command[5]
          value: "0.0.0.0"
      - equal:
          path: spec.template.spec.containers[0].command[6]
          value: "--port"
      - equal:
          path: spec.template.spec.containers[0].command[7]
          value: "8000"

  - it: should use custom model when set
    set:
      model: "Qwen/Qwen2.5-Omni-7B"
    asserts:
      - equal:
          path: spec.template.spec.containers[0].command[2]
          value: Qwen/Qwen2.5-Omni-7B

  - it: should include omniArgs flags in command
    set:
      omniArgs:
        vaeUseSlicing: true
        vaeUseTiling: true
        enableCpuOffload: true
        numGpus: 2
        cacheBackend: "tea_cache"
        workerBackend: "ray"
        extraArgs:
          - "--enable-layerwise-offload"
    asserts:
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--vae-use-slicing"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--vae-use-tiling"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--enable-cpu-offload"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--num-gpus"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--cache-backend"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--worker-backend"
      - contains:
          path: spec.template.spec.containers[0].command
          content: "--enable-layerwise-offload"

  - it: should use full command override when image.command is set
    set:
      image:
        command:
          - "custom-binary"
          - "--flag"
    asserts:
      - equal:
          path: spec.template.spec.containers[0].command[0]
          value: custom-binary
      - equal:
          path: spec.template.spec.containers[0].command[1]
          value: "--flag"

  - it: should mount shm and model-cache volumes
    asserts:
      - contains:
          path: spec.template.spec.containers[0].volumeMounts
          content:
            name: shm
            mountPath: /dev/shm
      - contains:
          path: spec.template.spec.containers[0].volumeMounts
          content:
            name: model-cache
            mountPath: /cache
      - contains:
          path: spec.template.spec.volumes
          content:
            name: shm
            emptyDir:
              medium: Memory
              sizeLimit: 8Gi

  - it: should use PVC for model-cache when modelCache is enabled
    set:
      modelCache:
        enabled: true
        storageSize: "50Gi"
        accessModes:
          - ReadWriteOnce
    asserts:
      - contains:
          path: spec.template.spec.volumes
          content:
            name: model-cache
            persistentVolumeClaim:
              claimName: RELEASE-NAME-model-cache

  - it: should use emptyDir for model-cache when modelCache is disabled
    set:
      modelCache:
        enabled: false
        storageSize: "50Gi"
        accessModes:
          - ReadWriteOnce
    asserts:
      - contains:
          path: spec.template.spec.volumes
          content:
            name: model-cache
            emptyDir: {}

  - it: should set HF_HOME environment variable
    asserts:
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: HF_HOME
            value: /cache/huggingface

  - it: should set HF_TOKEN env when hfToken is provided
    set:
      hfToken: "hf_test_token_123"
    asserts:
      - contains:
          path: spec.template.spec.containers[0].env
          content:
            name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: RELEASE-NAME-hf-token
                key: token

  - it: should create custom init containers when specified
    set:
      extraInit:
        initContainers:
          - name: my-init
            image: busybox:latest
            command: ["echo", "init"]
    asserts:
      - lengthEqual:
          path: spec.template.spec.initContainers
          count: 1
      - equal:
          path: spec.template.spec.initContainers[0].name
          value: my-init
      - equal:
          path: spec.template.spec.initContainers[0].image
          value: busybox:latest
tests/ingress_test.yaml
suite: test ingress
templates:
  - ingress.yaml
tests:
  - it: should not create Ingress when disabled
    set:
      ingress:
        enabled: false
    asserts:
      - hasDocuments:
          count: 0

  - it: should create Ingress when enabled
    set:
      ingress:
        enabled: true
        host: "vllm-omni.example.com"
    asserts:
      - hasDocuments:
          count: 1
      - isKind:
          of: Ingress
      - equal:
          path: spec.rules[0].host
          value: vllm-omni.example.com
      - equal:
          path: spec.rules[0].http.paths[0].path
          value: /
      - equal:
          path: spec.rules[0].http.paths[0].pathType
          value: Prefix
      - equal:
          path: spec.rules[0].http.paths[0].backend.service.name
          value: RELEASE-NAME-service
      - equal:
          path: spec.rules[0].http.paths[0].backend.service.port.number
          value: 80

  - it: should set ingressClassName when specified
    set:
      ingress:
        enabled: true
        host: "vllm-omni.example.com"
        ingressClassName: "nginx"
    asserts:
      - equal:
          path: spec.ingressClassName
          value: nginx

  - it: should not set ingressClassName when empty
    set:
      ingress:
        enabled: true
        host: "vllm-omni.example.com"
        ingressClassName: ""
    asserts:
      - notExists:
          path: spec.ingressClassName

  - it: should include annotations when specified
    set:
      ingress:
        enabled: true
        host: "vllm-omni.example.com"
        annotations:
          nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    asserts:
      - isNotEmpty:
          path: metadata.annotations

  - it: should include TLS configuration when specified
    set:
      ingress:
        enabled: true
        host: "vllm-omni.example.com"
        tls:
          - secretName: vllm-omni-tls
            hosts:
              - vllm-omni.example.com
    asserts:
      - equal:
          path: spec.tls[0].secretName
          value: vllm-omni-tls
      - equal:
          path: spec.tls[0].hosts[0]
          value: vllm-omni.example.com
tests/pvc_test.yaml
suite: test pvc
templates:
  - pvc.yaml
tests:
  - it: should create PVC when modelCache is enabled
    set:
      modelCache:
        enabled: true
        storageSize: "50Gi"
        storageClassName: ""
        accessModes:
          - ReadWriteOnce
    asserts:
      - hasDocuments:
          count: 1
      - isKind:
          of: PersistentVolumeClaim
      - equal:
          path: spec.accessModes[0]
          value: ReadWriteOnce
      - equal:
          path: spec.resources.requests.storage
          value: 50Gi

  - it: should not create PVC when modelCache is disabled
    set:
      modelCache:
        enabled: false
        storageSize: "50Gi"
        accessModes:
          - ReadWriteOnce
    asserts:
      - hasDocuments:
          count: 0

  - it: should use custom storage size
    set:
      modelCache:
        enabled: true
        storageSize: "100Gi"
        storageClassName: ""
        accessModes:
          - ReadWriteOnce
    asserts:
      - equal:
          path: spec.resources.requests.storage
          value: 100Gi

  - it: should set storageClassName when specified
    set:
      modelCache:
        enabled: true
        storageSize: "50Gi"
        storageClassName: "fast-ssd"
        accessModes:
          - ReadWriteOnce
    asserts:
      - equal:
          path: spec.storageClassName
          value: fast-ssd

  - it: should support ReadWriteMany access mode
    set:
      modelCache:
        enabled: true
        storageSize: "50Gi"
        storageClassName: ""
        accessModes:
          - ReadWriteMany
    asserts:
      - equal:
          path: spec.accessModes[0]
          value: ReadWriteMany
tests/secrets_test.yaml
suite: test secrets
templates:
  - secrets.yaml
tests:
  - it: should create HF token secret when hfToken is provided
    set:
      hfToken: "hf_test_token_123"
    asserts:
      - hasDocuments:
          count: 1
      - isKind:
          of: Secret
      - equal:
          path: metadata.name
          value: RELEASE-NAME-hf-token
      - exists:
          path: data.token

  - it: should not create any secrets when hfToken is empty and no secrets defined
    set:
      hfToken: ""
      secrets: {}
    asserts:
      - hasDocuments:
          count: 0

  - it: should create both generic and HF token secrets
    set:
      hfToken: "hf_test_token_123"
      secrets:
        mykey: myvalue
    asserts:
      - hasDocuments:
          count: 2

  - it: should create only generic secrets when no hfToken
    set:
      hfToken: ""
      secrets:
        mykey: myvalue
    asserts:
      - hasDocuments:
          count: 1
      - isKind:
          of: Secret
      - equal:
          path: metadata.name
          value: RELEASE-NAME-secrets
values.yaml
# -- Default values for chart vllm-omni
# -- Declare variables to be passed into your templates.

# -- HuggingFace model ID to serve
model: "Tongyi-MAI/Z-Image-Turbo"

# -- Image configuration
image:
  # -- Image repository
  repository: "vllm/vllm-omni"
  # -- Image tag
  tag: "v0.16.0"
  # -- Override the container command entirely. If empty, the chart constructs
  # -- it automatically from model + omniArgs.
  command: []
  # -- Optional environment variables for the container
  env: []
  # -- Security context override
  securityContext: {}

# -- Container port
containerPort: 8000
# -- Service name (auto-generated from release name if empty)
serviceName:
# -- Service port
servicePort: 80
# -- Additional ports configuration
extraPorts: []

# -- Number of replicas
replicaCount: 1

# -- Deployment strategy configuration
deploymentStrategy: {}

# -- Resource configuration
resources:
  requests:
    # -- Number of CPUs
    cpu: 4
    # -- CPU memory
    memory: 24Gi
    # -- Number of GPUs
    nvidia.com/gpu: 1
  limits:
    # -- Number of CPUs
    cpu: 4
    # -- CPU memory
    memory: 24Gi
    # -- Number of GPUs
    nvidia.com/gpu: 1

# -- GPU model types for node affinity scheduling (optional)
# -- Leave empty to schedule on any GPU node. Set to restrict to specific GPU types.
# -- Example: ["NVIDIA-A100-SXM4-40GB"]
gpuModels: []

# -- vllm-omni specific CLI arguments
omniArgs:
  # -- Additional raw CLI flags appended to the serve command (list of strings)
  extraArgs: []
  # -- Enable VAE slicing for memory optimization
  vaeUseSlicing: false
  # -- Enable VAE tiling for memory optimization
  vaeUseTiling: false
  # -- Enable CPU offloading for diffusion models
  enableCpuOffload: false
  # -- Number of GPUs for diffusion inference (empty = auto)
  numGpus:
  # -- Path to stage configs file (empty = auto-detected from model)
  stageConfigsPath:
  # -- Cache backend: none, tea_cache, or cache_dit
  cacheBackend: "none"
  # -- Default sampling params as a JSON string
  defaultSamplingParams:
  # -- Worker backend: multi_process or ray
  workerBackend: "multi_process"

# -- HuggingFace token for gated models (optional)
hfToken: ""

# -- Shared memory size for PyTorch multiprocessing
shmSize: "8Gi"

# -- Model cache configuration (HuggingFace downloads)
modelCache:
  # -- Use a PersistentVolumeClaim for model cache (recommended for production)
  enabled: true
  # -- Storage size for the PVC
  storageSize: "50Gi"
  # -- Storage class name (empty = cluster default)
  storageClassName: ""
  # -- Access modes
  accessModes:
    - ReadWriteOnce

# -- Autoscaling configuration
autoscaling:
  # -- Enable autoscaling
  enabled: false
  # -- Minimum replicas
  minReplicas: 1
  # -- Maximum replicas
  maxReplicas: 10
  # -- Target CPU utilization for autoscaling
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

# -- ConfigMap data (key-value pairs injected as environment variables)
configs: {}

# -- Secrets data (key-value pairs, base64-encoded automatically)
secrets: {}

# -- External ConfigMaps/Secrets references
externalConfigs: []

# -- Custom Kubernetes objects
customObjects: []

# -- PodDisruptionBudget max unavailable
maxUnavailablePodDisruptionBudget: ""

# -- Additional init containers
extraInit:
  initContainers: []

# -- Additional sidecar containers
extraContainers: []

# -- Startup probe configuration
# -- Protects slow-starting containers. Liveness and readiness probes
# -- do not start until the startup probe succeeds.
# -- failureThreshold * periodSeconds = max startup time (40 * 30 = 1200s = 20 min)
startupProbe:
  # -- HTTP check configuration
  httpGet:
    # -- Path to check
    path: /health
    # -- Port to check (uses named port to stay in sync with containerPort)
    port: "container-port"
  # -- Failures before killing the container
  failureThreshold: 40
  # -- How often to perform the probe
  periodSeconds: 30

# -- Readiness probe configuration
readinessProbe:
  # -- How often to perform the probe
  periodSeconds: 10
  # -- Failures before marking unready
  failureThreshold: 3
  # -- HTTP check configuration
  httpGet:
    # -- Path to check
    path: /health
    # -- Port to check (uses named port to stay in sync with containerPort)
    port: "container-port"

# -- Liveness probe configuration
livenessProbe:
  # -- Failures before restarting
  failureThreshold: 3
  # -- How often to perform the probe
  periodSeconds: 15
  # -- HTTP check configuration
  httpGet:
    # -- Path to check
    path: /health
    # -- Port to check (uses named port to stay in sync with containerPort)
    port: "container-port"

# -- Ingress configuration
ingress:
  # -- Enable Ingress
  enabled: false
  # -- Ingress class name (e.g., nginx, traefik)
  ingressClassName: ""
  # -- Hostname for the Ingress rule
  host: "vllm-omni.example.com"
  # -- Additional annotations
  annotations: {}
  # -- TLS configuration
  tls: []
  #  - secretName: vllm-omni-tls
  #    hosts:
  #      - vllm-omni.example.com

# -- Node selector for pod scheduling
nodeSelector: {}

# -- Tolerations for pod scheduling
tolerations: []

# -- Labels applied to all resources
labels:
  app: "vllm-omni"
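
The startupProbe comment in values.yaml gives the startup budget as failureThreshold * periodSeconds. As a worked example, the sketch below computes the budget for a larger threshold and writes a matching override; the variable and file names are arbitrary. The chosen values stay under the Deployment's hard-coded progressDeadlineSeconds of 1800, since a longer budget may cause the rollout to be reported as stalled before the probe gives up.

```shell
# Startup budget = failureThreshold * periodSeconds.
failure_threshold=50
period_seconds=30
startup_budget_seconds=$((failure_threshold * period_seconds))
echo "startup budget: ${startup_budget_seconds}s ($((startup_budget_seconds / 60)) min)"

# Write an override that applies the longer budget.
cat > values-slow-start.yaml <<EOF
startupProbe:
  httpGet:
    path: /health
    port: "container-port"
  failureThreshold: ${failure_threshold}
  periodSeconds: ${period_seconds}
EOF
```

With these numbers the container gets 1500 seconds (25 minutes) to pass /health before it is restarted, compared with 1200 seconds under the defaults.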