vllm.entrypoints.openai.api_server ¶
Functions:
- build_and_serve – Build FastAPI app, initialize state, and start serving.
- build_and_serve_renderer – Build FastAPI app for a CPU-only render server, initialize state, and start serving.
- build_async_engine_client_from_engine_args – Create EngineClient, either in-process or multiprocess.
- init_render_app_state – Initialise FastAPI app state for a CPU-only render server.
- run_server – Run a single-worker API server.
- run_server_worker – Run a single API server worker.
- setup_server – Validate API server args and create the server socket.
build_and_serve(engine_client, listen_address, sock, args, **uvicorn_kwargs) async ¶
Build FastAPI app, initialize state, and start serving.
Returns the shutdown task for the caller to await.
Source code in vllm/entrypoints/openai/api_server.py
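The "returns the shutdown task for the caller to await" contract can be illustrated with plain asyncio. The following is a minimal sketch of that pattern only, not vLLM's implementation: `serve_and_return_shutdown_task`, `serve_forever`, and `stop_event` are hypothetical names, and the server loop is a placeholder.

```python
import asyncio


async def serve_and_return_shutdown_task(stop_event: asyncio.Event) -> asyncio.Task:
    """Hypothetical helper: start serving, hand back a task that shuts down."""

    async def serve_forever() -> None:
        # Placeholder for a real server loop; it just waits until cancelled.
        await asyncio.Event().wait()

    server_task = asyncio.create_task(serve_forever())

    async def shutdown() -> str:
        # Wait for the stop signal, then cancel the server loop and let it exit.
        await stop_event.wait()
        server_task.cancel()
        try:
            await server_task
        except asyncio.CancelledError:
            pass
        return "stopped"

    return asyncio.create_task(shutdown())


async def main() -> str:
    stop_event = asyncio.Event()
    shutdown_task = await serve_and_return_shutdown_task(stop_event)
    stop_event.set()  # simulate a shutdown request
    return await shutdown_task  # the caller awaits the returned task


result = asyncio.run(main())
```

The helper returns as soon as serving has started, so the caller decides when, and inside which cleanup scope, to await the shutdown.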
build_and_serve_renderer(vllm_config, listen_address, sock, args, **uvicorn_kwargs) async ¶
Build FastAPI app for a CPU-only render server, initialize state, and start serving.
Returns the shutdown task for the caller to await.
Source code in vllm/entrypoints/openai/api_server.py
build_async_engine_client_from_engine_args(engine_args, *, usage_context=UsageContext.OPENAI_API_SERVER, client_config=None) async ¶
Create EngineClient, either:
- in-process, using the AsyncLLMEngine directly
- multiprocess, using AsyncLLMEngine RPC
Returns the client, or None if creation failed.
Source code in vllm/entrypoints/openai/api_server.py
init_render_app_state(vllm_config, state, args) async ¶
Initialise FastAPI app state for a CPU-only render server.
Unlike init_app_state, this function does not require an EngineClient (vllm.engine.protocol.EngineClient); it bootstraps the preprocessing pipeline (renderer, input_processor) directly from the VllmConfig (vllm.config.VllmConfig).
Source code in vllm/entrypoints/openai/api_server.py
run_server(args, **uvicorn_kwargs) async ¶
Run a single-worker API server.
Source code in vllm/entrypoints/openai/api_server.py
run_server_worker(listen_address, sock, args, client_config=None, **uvicorn_kwargs) async ¶
Run a single API server worker.
Source code in vllm/entrypoints/openai/api_server.py
setup_server(args) ¶
Validate API server args and create the server socket.
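As a rough illustration of "validate, then create the socket", the sketch below uses only the standard `socket` module. It is not the body of `setup_server`: `create_listening_socket` is a hypothetical helper, and the single port-range check stands in for whatever argument validation the real function performs.

```python
import socket


def create_listening_socket(host: str, port: int) -> socket.socket:
    """Hypothetical helper: validate the port, then bind a TCP socket."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts on the same address.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


# Port 0 asks the OS for any free port, which keeps the example portable.
sock = create_listening_socket("127.0.0.1", 0)
bound_host, bound_port = sock.getsockname()
sock.close()
```

Binding before any heavier initialization means an invalid or already-occupied address fails fast, and the bound socket can then be passed to the worker, as the `sock` parameter of `run_server_worker` suggests.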