Title: Improving Retrieval Processes
✦ The “Retrieval” step is key, since it directly determines the quality of the context the LLM has when generating a response.
✦ The methods we will cover below are: Parent-Child Index Retrieval, hierarchical index retrieval with RAPTOR, and the Self Query Retriever.
✦ Consider a RAG system designed to identify potential diseases based on the symptoms entered during a consultation. With a Naive RAG, it might only retrieve diseases sharing one or two symptoms, which would make our application unhelpful or even unusable.
✦ This scenario is perfectly suited for employing the Parent-Child Index Retrieval method.
✦ However, there's an issue with the standard chunking approach: small chunks match individual symptoms precisely but lack the surrounding context the LLM needs to reason about a disease, while large chunks preserve that context but mix many topics, so their similarity scores get diluted.
The above points are illustrated in the subsequent image:
✦ The dilemma seems inescapable: precision calls for small chunks, while context calls for large ones.
✦ This is where the Parent-child index retrieval method comes into play, promising to improve our approach.
✦ To bring this concept into practical application, a step-by-step explanation is most effective:
1. Split each document into large parent chunks.
2. Split each parent chunk into small child chunks.
3. Embed and index only the child chunks in the vector store.
4. Keep the parent chunks in a separate document store, recording which parent each child belongs to.
5. At query time, match the query against the child chunks, but hand the corresponding parent chunks to the LLM as context.
The process described is visually represented in the following image:
✦ Implementing this might sound daunting: it requires a new database for the smaller chunks, keeping the parent chunks in memory, and tracking the relationship between parent and child chunks. Fortunately, LangChain simplifies this process significantly, making it straightforward to set up.
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Some code for loading the documents is omitted
# ...
parent_docs = documents

# Embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Splitters: small child chunks for precise matching, larger parent chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=800)

# Stores: the vector store indexes the child chunks, the docstore keeps the parents
store = InMemoryStore()
vectorstore = Chroma(
    embedding_function=embeddings,
    collection_name="fullDoc",
    persist_directory="./JohnWick_db_parentsRD",
)

parent_document_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Index the documents: this populates both the vector store and the docstore
parent_document_retriever.add_documents(parent_docs)

print(f"Number of parent chunks is: {len(list(store.yield_keys()))}")
print(f"Number of child chunks is: {len(parent_document_retriever.vectorstore.get()['ids'])}")
'''
Number of parent chunks is: 75
Number of child chunks is: 3701
'''
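✦ To see the mechanism in action, we can compare a direct similarity search against the vector store (which holds the small child chunks) with a call to the retriever (which resolves each hit back to its parent chunk). The query string below is only a placeholder:
query = "some user question about the indexed documents"  # placeholder query

child_hits = vectorstore.similarity_search(query, k=1)
print(len(child_hits[0].page_content))   # on the order of 200 characters: a child chunk

parent_hits = parent_document_retriever.invoke(query)
print(len(parent_hits[0].page_content))  # on the order of 800 characters: its parent chunk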
Once we have our Parent Document Retriever, we just need to build our RAG chain on top of it, and that's it.
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# rag_prompt and chat_model are the prompt template and LLM defined earlier
setup_and_retrieval = RunnableParallel({"question": RunnablePassthrough(), "context": parent_document_retriever})
output_parser = StrOutputParser()
parent_retrieval_chain = setup_and_retrieval | rag_prompt | chat_model | output_parser
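✦ Invoking the chain is then a single call; the question below is just a placeholder for the disease-identification example:
response = parent_retrieval_chain.invoke("I have a headache, fever and sensitivity to light. What diseases could match?")
print(response)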
LangChain Documentation: Parent Document Retriever | 🦜️🔗 LangChain
✦ This approach can be understood as the reverse of the Parent-Child Index Retrieval we just discussed. It is also a more intelligent method, as it takes the semantic meaning of the child chunks into account and groups semantically similar child chunks together.
✦ RAPTOR is one such hierarchical approach, introduced by Stanford researchers: it clusters chunks, summarizes each cluster, and repeats the process recursively to build a tree of summaries.
✦ Based on the user query, the relevant summary document is retrieved first, and the relevant chunks are then retrieved from within that document.
# Installation
!git clone https://github.com/parthsarthi03/raptor.git
%cd raptor
!pip install -r requirements.txt
# Setting Up
import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

from raptor import RetrievalAugmentation
RA = RetrievalAugmentation()

# Adding Documents: RAPTOR builds its summary tree from the raw text
with open('sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)
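✦ Once the tree is built, the same object can retrieve from it and answer questions, following the usage shown in the RAPTOR repository's README; the question itself is just a placeholder:
question = "What are the main symptoms described in the document?"
answer = RA.answer_question(question=question)
print("Answer:", answer)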
✦ Its main feature is that it can search the vector store while applying filters based on the metadata, which is regarded as one of the most effective ways to improve the efficiency of the retriever.
✦ Recall that with a “Naive retrieval”, we calculate the similarity between the query and every chunk in the vector database; metadata filters let us restrict that search to the relevant subset first.
✦ Let’s look at a use case to fully understand when to apply this type of retrieval.
✦ This case is ideal for applying the Self Query Retriever.
This technique can be summarized in two very specific steps: the Query Constructor, which builds a structured query and its filters from the user input, and the Query Translator, which converts that query into the syntax of the underlying vector store.
✦ The objective of the step called “Query Constructor” is to create the appropriate query and filters according to the user input.
✦ Who is in charge of creating this query and the corresponding filters, and how does it know which ones are available? For this we are going to use an LLM.
✦ The output generated by the LLM cannot be entered directly into the database; it first has to be translated into the query syntax of the specific vector store in use. LangChain has specific query translators for almost all of the supported databases.
✦ From the previous image, we see that everything begins with the user’s query.
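✦ To make the Query Constructor step concrete, here is a purely illustrative sketch of the kind of structured output the LLM produces. The field names and filter syntax below are assumptions modeled loosely on LangChain's internal representation, not an exact schema:
# User input: "A highly rated science fiction movie released after 1990"
# Hypothetical structured query the LLM might construct:
structured_query = {
    "query": "science fiction movie",  # semantic part, used for the embedding search
    "filter": 'and(eq("genre", "science fiction"), gt("year", 1990), gt("rating", 8.0))',  # metadata part
}
# The Query Translator then converts the filter into the native filter
# format of the chosen vector store (e.g., Chroma's `where` clauses).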
✦ It is very important to provide the LLM with a detailed description of the metadata available in the vector store. This is shown through the following piece of code:
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain_openai import ChatOpenAI

# Describe each metadata field so the LLM knows which filters it can build
metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]

document_content_description = "Brief summary of a movie"
chat_model = ChatOpenAI(temperature=0)

self_query_retriever = SelfQueryRetriever.from_llm(
    llm=chat_model,
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    verbose=True,
    structured_query_translator=ChromaTranslator(),
)
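✦ With the retriever in place, a natural-language query that mixes semantics and constraints is enough; the LLM extracts the filter on its own. The query below is just an example:
# The LLM turns "rated higher than 8.5" into a rating > 8.5 filter
results = self_query_retriever.invoke("I want to watch a movie rated higher than 8.5")
for doc in results:
    print(doc.metadata, "-", doc.page_content[:60])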
LangChain Documentation: Self-querying | 🦜️🔗 LangChain
This note is not intended to exhaustively cover all techniques or methods available for improving Retrieval-Augmented Generation (RAG) processes.