Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) is a technique that enables generative artificial intelligence (GenAI) models to retrieve and incorporate external information. It modifies interactions with a large language model (LLM) so that the model answers user queries with reference to a specified set of documents, supplementing what the model learned during pre-training. This allows LLMs to use domain-specific and/or up-to-date information. Use cases include giving a chatbot access to internal company data or grounding responses in authoritative sources.
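To make the retrieve-then-generate flow concrete before the integrations below, here is a minimal sketch against two OpenAI-compatible vLLM endpoints. The model names, ports, and the toy two-document "corpus" are illustrative assumptions, not part of this page; the examples below use milvus as a real vector store instead of this in-memory scan.

```python
# Minimal RAG sketch. Assumptions (not from this page): an embedding model is
# served at localhost:8000 and a chat model at localhost:8001, both behind
# vLLM's OpenAI-compatible API; the two-document "corpus" is a toy stand-in
# for a real vector store such as milvus.
import math
from openai import OpenAI

embed_client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
chat_client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]
query = "How long do customers have to return an item?"

def embed(texts):
    # Embedding model name is an illustrative assumption.
    resp = embed_client.embeddings.create(
        model="ssmits/Qwen2-7B-Instruct-embed-base", input=texts
    )
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Retrieve: pick the document whose embedding is most similar to the query's.
doc_vectors = embed(documents)
[query_vector] = embed([query])
scores = [cosine(v, query_vector) for v in doc_vectors]
context = documents[scores.index(max(scores))]

# Generate: answer the query with the retrieved document as grounding context.
completion = chat_client.chat.completions.create(
    model="qwen/Qwen1.5-0.5B-Chat",  # illustrative chat model
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
print(completion.choices[0].message.content)
```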
Here are the integrations:
- vLLM + langchain + milvus
- vLLM + llamaindex + milvus
vLLM + langchain
Prerequisites
- Set up the vLLM and langchain environment
pip install -U vllm \
langchain_milvus langchain_openai \
langchain_community beautifulsoup4 \
langchain-text-splitters
Deploy
- Start the vLLM server with a supported embedding model, for example:
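A minimal command, assuming the embedding model ssmits/Qwen2-7B-Instruct-embed-base (any embedding model supported by vLLM should work) served on the default port 8000:

```bash
# Serve the embedding model on the default port (8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```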
- Start the vLLM server with a supported chat completion model, for example:
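A minimal command, assuming the chat model qwen/Qwen1.5-0.5B-Chat (the model name is illustrative), on a second port so both servers can run side by side:

```bash
# Serve the chat model on a separate port (8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```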
- Use the script: examples/online_serving/retrieval_augmented_generation_with_langchain.py
- Run the script, as shown below
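Assuming the command is run from the root of the vLLM repository, with both servers above up:

```bash
python examples/online_serving/retrieval_augmented_generation_with_langchain.py
```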
vLLM + llamaindex
Prerequisites
- Set up the vLLM and llamaindex environment
pip install vllm \
llama-index llama-index-readers-web \
llama-index-llms-openai-like \
llama-index-embeddings-openai-like \
llama-index-vector-stores-milvus
Deploy
- Start the vLLM server with a supported embedding model, for example:
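As in the langchain section, a minimal command assuming the embedding model ssmits/Qwen2-7B-Instruct-embed-base on the default port 8000:

```bash
# Serve the embedding model on the default port (8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```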
- Start the vLLM server with a supported chat completion model, for example:
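Again assuming the illustrative chat model qwen/Qwen1.5-0.5B-Chat on a second port:

```bash
# Serve the chat model on a separate port (8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```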
- Use the script: examples/online_serving/retrieval_augmented_generation_with_llamaindex.py
- Run the script, as shown below
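Assuming the command is run from the root of the vLLM repository, with both servers above up:

```bash
python examples/online_serving/retrieval_augmented_generation_with_llamaindex.py
```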