vllm.entrypoints.pooling.embed.protocol ¶
Embedding API protocol models for OpenAI and Cohere formats.
OpenAI: https://platform.openai.com/docs/api-reference/embeddings Cohere: https://docs.cohere.com/reference/embed
Classes:
-
EmbeddingBatchChatInputRequest–OpenAI embeddings request with batched chat conversations in
input. -
EmbeddingBatchChatRequest–OpenAI embeddings request with batched top-level chat conversations.
-
EmbeddingChatInputRequest–OpenAI embeddings request with one chat conversation in
input. -
EmbeddingChatRequest–OpenAI embeddings request with one top-level chat conversation.
Functions:
-
build_typed_embeddings–Convert float embeddings to all requested Cohere embedding types.
EmbeddingBatchChatInputRequest ¶
Bases: EmbeddingBatchChatRequest
OpenAI embeddings request with batched chat conversations in input.
Source code in vllm/entrypoints/pooling/embed/protocol.py
EmbeddingBatchChatRequest ¶
Bases: PoolingBasicRequestMixin, ChatRequestOptionsMixin, EmbedRequestMixin, EmbeddingTokenizeParamsMixin
OpenAI embeddings request with batched top-level chat conversations.
Mirrors BatchChatCompletionRequest by keeping batched conversations in messages instead of introducing a separate batch-specific field.
Source code in vllm/entrypoints/pooling/embed/protocol.py
EmbeddingChatInputRequest ¶
Bases: EmbeddingChatRequest
OpenAI embeddings request with one chat conversation in input.
Source code in vllm/entrypoints/pooling/embed/protocol.py
EmbeddingChatRequest ¶
Bases: PoolingBasicRequestMixin, ChatRequestMixin, EmbedRequestMixin, EmbeddingTokenizeParamsMixin
OpenAI embeddings request with one top-level chat conversation.
Source code in vllm/entrypoints/pooling/embed/protocol.py
_encode_base64_embeddings(float_embeddings) ¶
Encode float embeddings as base64 (little-endian float32).
Source code in vllm/entrypoints/pooling/embed/protocol.py
_pack_binary_embeddings(float_embeddings, signed) ¶
Bit-pack float embeddings: positive -> 1, negative -> 0.
Each bit is shifted left by 7 - idx%8, and every 8 bits are packed into one byte.
Source code in vllm/entrypoints/pooling/embed/protocol.py
build_typed_embeddings(float_embeddings, embedding_types) ¶
Convert float embeddings to all requested Cohere embedding types.