CI Failures

What should I do when a CI job fails on my PR, but I don't think my PR caused the failure?

  • Check the dashboard of current CI test failures:
    👉 CI Failures Dashboard (https://github.com/orgs/vllm-project/projects/20)

  • If your failure is already listed, it's likely unrelated to your PR.
    Help fixing it is always welcome!

    • Leave comments with links to additional instances of the failure.
    • React with a 👍 to signal how many are affected.
  • If your failure is not listed, you should file an issue.

Filing a CI Test Failure Issue

  • File a bug report:
    👉 New CI Failure Report (https://github.com/vllm-project/vllm/issues/new?template=400-bug-report.yml)

  • Use this title format:

    [CI Failure]: failing-test-job - regex/matching/failing:test
    
  • For the environment field:

    Still failing on main as of commit abcdef123

  • In the description, include failing tests:

    FAILED failing/test.py:failing_test1 - Failure description
    FAILED failing/test.py:failing_test2 - Failure description
    FAILED failing/test.py:failing_test3 - Failure description
    
  • Attach logs (collapsible section example):

    Logs:

    ERROR 05-20 03:26:38 [dump_input.py:68] Dumping input data  
    --- Logging error ---  
    Traceback (most recent call last):  
      File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model  
        return self.model_executor.execute_model(scheduler_output)  
    ...
    FAILED failing/test.py:failing_test1 - Failure description  
    FAILED failing/test.py:failing_test2 - Failure description  
    FAILED failing/test.py:failing_test3 - Failure description  
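
The description and log steps above can be scripted. The following is a minimal sketch, not part of the repository: the function name `ci_issue_body` and the 200-line default are assumptions made for illustration. It prints the unique `FAILED` lines from a log, followed by the tail of the log wrapped in a `<details>` element, which GitHub renders as a collapsible section.

```shell
# ci_issue_body LOGFILE [LINES]  (hypothetical helper, not a repository script)
# Prints an issue-body fragment on stdout:
#   1. the unique lines that start with "FAILED " (as in the examples above)
#   2. the last LINES lines of the log (default 200) inside a collapsible block
ci_issue_body() (
  log=$1
  lines=${2:-200}

  # One line per failing test, duplicates removed.
  grep '^FAILED ' "$log" | sort -u

  # <details>/<summary> is plain HTML that GitHub Markdown renders as a
  # collapsible section; the blank lines let the fenced block inside render.
  printf '\n<details>\n<summary>Logs:</summary>\n\n~~~text\n'
  tail -n "$lines" "$log"
  printf '~~~\n\n</details>\n'
)
```

For example, `ci_issue_body ci.log 525 | wl-copy` would place the fragment on the clipboard, ready to paste into the issue.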
    

Logs Wrangling

Download the full log file from Buildkite locally.

Strip timestamps and colorization with the .buildkite/scripts/ci-clean-log.sh script:

./ci-clean-log.sh ci.log
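
To illustrate what this kind of cleanup involves, here is a minimal sketch. It is not the contents of ci-clean-log.sh. The function name `clean_log` is hypothetical, and the sketch assumes that each line starts with a timestamp of the form `[2025-05-20T03:26:38Z] ` and that colorization uses standard ANSI escape sequences.

```shell
# clean_log: read a raw log on stdin, write a cleaned copy to stdout.
# Assumptions (not taken from the real script): timestamps look like
# "[YYYY-MM-DDTHH:MM:SSZ] " at the start of a line, and colors are
# ANSI CSI sequences such as ESC[31m or ESC[0m.
clean_log() (
  esc=$(printf '\033')   # literal ESC byte, portable across sed variants
  sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
      -e 's/^\[[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9:.]*Z\] //'
)
```

Usage: `clean_log < ci.log > ci.clean.log` leaves the original file untouched and writes the cleaned copy next to it.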

Use a tool such as wl-clipboard for quick copy-pasting:

tail -n 525 ci_build.log | wl-copy

Investigating a CI Test Failure

  1. Go to 👉 Buildkite main branch
  2. Bisect to find the first build that shows the issue.
  3. Add your findings to the GitHub issue.
  4. If you find a strong candidate PR, mention it in the issue and ping contributors.
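
Step 2 amounts to a binary search over the builds on main. The sketch below shows the idea under stated assumptions: build numbers are integers, the failure is deterministic once introduced, and you supply a command that reports whether a given build passed. The names `first_bad_build` and `build_passed` are hypothetical and not part of the repository or of Buildkite.

```shell
# first_bad_build GOOD BAD CHECK  (hypothetical helper)
#   GOOD  - number of a build known to pass
#   BAD   - number of a later build known to fail
#   CHECK - command that takes a build number and exits 0 if that build passed
# Prints the number of the first failing build.
first_bad_build() (
  good=$1 bad=$2 check=$3
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if "$check" "$mid"; then
      good=$mid   # mid passed: the failure was introduced after mid
    else
      bad=$mid    # mid failed: the failure was introduced at or before mid
    fi
  done
  echo "$bad"
)
```

With a `build_passed` function that looks up results you noted from the Buildkite UI, `first_bad_build 30 40 build_passed` needs about four lookups instead of ten. For a flaky test a single pass proves little, so treat the result as a candidate rather than a conclusion.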

Reproducing a Failure

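A failing CI test may be flaky. Use a bash loop to run it repeatedly, for example with the .buildkite/scripts/rerun-test.sh script. Quote the test ID so the shell does not treat the square brackets as a glob pattern:

./rerun-test.sh 'tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]'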
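
For reference, a rerun loop of this kind can be sketched as follows. This is an illustration only and not the contents of rerun-test.sh. The function name `rerun_until_fail` and the run limit are assumptions.

```shell
# rerun_until_fail MAX_RUNS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND up to MAX_RUNS times and stops at the first failure.
# Exit status: 1 if a run failed, 0 if every run passed.
rerun_until_fail() (
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    echo "=== run $i of $max ===" >&2
    if ! "$@"; then
      echo "failed on run $i" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max runs" >&2
)
```

Usage: `rerun_until_fail 20 pytest 'tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]'`. If all runs pass, that suggests the failure is rare or depends on the CI environment, but it does not rule the failure out.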

Submitting a PR

If you submit a PR to fix a CI failure:

  • Link the PR to the issue:
    Add Closes #12345 to the PR description.
  • Add the ci-failure label:
    This helps track it in the CI Failures GitHub Project.

Other Resources

Daily Triage

Use Buildkite analytics (2-day view) to:

  • Identify recent test failures on main.
  • Exclude legitimate test failures on PRs.
  • (Optional) Ignore tests with 0% reliability.

Compare to the CI Failures Dashboard.
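
The comparison can be done by hand, or sketched as a small filter. The example below assumes two plain-text files that you maintain yourself, one test ID per line: recent failures on main (copied from Buildkite analytics) and failures already on the dashboard. Both files and the function name `new_failures` are hypothetical.

```shell
# new_failures RECENT KNOWN  (hypothetical helper)
#   RECENT - file listing recent failures on main, one test ID per line
#   KNOWN  - file listing failures already tracked on the dashboard
# Prints the entries of RECENT that do not appear in KNOWN.
new_failures() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
  # grep exits 1 when nothing is printed; "|| true" keeps that from looking
  # like an error to the caller.
  grep -Fxv -f "$2" "$1" || true
}
```

Each line printed by `new_failures recent.txt known.txt` is a candidate for a new CI failure issue.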