icon: LiNotebook
✦ Embeddings are a type of representation that bridges the human understanding of language to that of a machine.
✦ They are a distributed representation for text that is perhaps one of the key breakthroughs for the impressive performance of deep learning methods on challenging natural language processing problems.
Large language models like GPT-4, Gemini, and BERT use word embeddings as the first layer of the model. Granted, BERT is not that "large" compared to the other two, but it is still considered a significant advancement in natural language processing.
These models convert each word into a dense vector and feed it into the model. The models then use these vectors to predict the next word in a sentence (in the case of GPT-4) or to understand the context of a word (in the case of BERT).
These models are trained on a large corpus of text, so they learn the semantic meaning of words. For example, the word “king” is closer in this space to “queen” than it is to “apple”.
They are representations of text in an N-dimensional space where words with similar meanings have similar representations.
The number of values in a text embedding — known as its “dimension” — depends on the embedding technique (the process of producing the vector), as well as how much information you want it to convey.
The embedding below shows a vector with 8 dimensions.
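For illustration, here is what such a vector could look like; the values below are made up:
# A hypothetical 8-dimensional embedding vector (values made up for illustration)
example_embedding = [0.12, -0.48, 0.91, 0.03, -0.27, 0.55, -0.66, 0.08]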
The table below shows common models and the dimensions of their embeddings:
Model | Embedding Dimension | Max Input Tokens |
---|---|---|
BERT-Base | 768 | 512 |
BERT-Large | 1024 | 512 |
GPT-2 | 768 | 1024 |
GPT-3 | 768 | 2048 |
RoBERTa-Base | 768 | 512 |
RoBERTa-Large | 1024 | 512 |
DistilBERT | 768 | 512 |
OpenAI text-embedding-3-small | 1536 | 8191 |
OpenAI text-embedding-3-large | 3072 | 8191 |
in_1 = "Flamingo spotted at the bird park"
in_2 = "Sea otter seen playing at the marine park"
in_3 = "Baby panda born at the city zoo"
in_4 = "Python developers prefer snake_case for variable naming"
in_5 = "New JavaScript framework aims to simplify coding"
in_6 = "C++ developers appreciate the power of OOP"
in_7 = "Java is a popular choice for enterprise applications"
list_of_input_texts = [in_1, in_2, in_3, in_4, in_5, in_6, in_7]
We will use the text-embedding-3-small model to embed these texts.
✦ The straightforward reason is that they can reduce data dimensionality and address the primary issue: the necessity for speed.
✦ The initial phase of any Large Language Model (LLM) training is the most crucial: the neural network is constructed from a vast amount of data with an extensive number of features (let’s refer to them as details).
Embedding models have been used for a long time, primarily for training other LLMs or ML models.
The introduction of Retrieval Augmented Generation (RAG) and subsequently of Vector Store Databases has shed new light on these models.
They have a few common issues:
As research progressed, new state-of-the-art (text) embedding models began producing embeddings with increasingly higher output dimensions, meaning each input text is represented using more values. While this improves performance, it comes at the cost of efficiency and speed. Researchers were therefore motivated to create embedding models whose embeddings could be reasonably reduced in size without significantly sacrificing performance.
icon: LiNotebook
This is our new helper function to get embeddings by passing a list of texts to the function.
from openai import OpenAI

# Assumes an OpenAI client with the OPENAI_API_KEY environment variable set (adjust if using Azure OpenAI)
client = OpenAI()

def get_embedding(input, model='text-embedding-3-small', dimensions=None):
response = client.embeddings.create(
input=input,
model=model,
dimensions=dimensions
)
return [x.embedding for x in response.data]
✦ text-embedding-3-small produces embeddings with 1536 dimensions.
✦ text-embedding-3-large produces embeddings with 3072 dimensions.
Usage is priced per input token. Below is an example of how many pages of text can be processed per US dollar (assuming ~800 tokens per page):
MODEL | ~ PAGES PER USD DOLLAR | PERFORMANCE ON MTEB EVAL | MAX INPUT |
---|---|---|---|
text-embedding-3-small | 62,500 | 62.3% | 8191 |
text-embedding-3-large | 9,615 | 64.6% | 8191 |
text-embedding-ada-002 | 12,500 | 61.0% | 8191 |
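As a quick sanity check on the table, the pages-per-dollar figures can be back-calculated from per-token prices. The prices used below (USD per 1M tokens) are assumptions inferred from the table and may change, so treat this as an illustration only:
# Back-of-the-envelope check of the "pages per USD" column, assuming ~800 tokens per page
# Prices (USD per 1M tokens) are assumptions inferred from the table; check OpenAI's pricing page
assumed_price_per_million_tokens = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
for model, price in assumed_price_per_million_tokens.items():
    pages_per_dollar = (1_000_000 / price) / 800
    print(f"{model}: ~{pages_per_dollar:,.0f} pages per USD")  # ~62,500 and ~9,615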
Using larger embeddings, for example storing them in a vector store for retrieval, generally costs more and consumes more compute, memory and storage than using smaller embeddings.
With OpenAI's new embedding models, both text-embedding-3-large and text-embedding-3-small allow builders to trade off the performance and cost of using embeddings.
✦ Specifically, builders can shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties by passing in the dimensions
API parameter.
✦ For example, on the MTEB benchmark, a text-embedding-3-large
embedding can be shortened to a size of 256 while still outperforming an unshortened text-embedding-ada-002
(One of OpenAI's older embedding models) embedding with a size of 1,536.
✦ In general, using the dimensions
parameter when creating the embedding is the suggested approach. Code below shows how the helper function is called with the dimensions specified as 512.
# Helper Function for Getting Embeddings
def get_embedding(input, model='text-embedding-3-small', dimensions=None):
response = client.embeddings.create(
input=input,
model=model,
dimensions=dimensions
)
return [x.embedding for x in response.data]
# Calling the function
text = "Python developers prefer snake_case for variable naming"
embeddings = get_embedding(text, dimensions=512)
Uniform Manifold Approximation and Projection (UMAP) is a powerful dimensionality reduction technique that can be used to compress and visualize high-dimensional data in a lower-dimensional space.
UMAP operates in two main steps: it first builds a weighted graph that captures the local neighborhood structure of the high-dimensional data, and then it optimizes a low-dimensional layout whose graph is as structurally similar as possible to the original.
UMAP has several advantages over other dimensionality reduction techniques:
Preservation of Structure: UMAP preserves both the local and global structure of the data. This means that both clusters of similar data points and the broader relationships between these clusters are maintained in the lower-dimensional space.
Scalability: UMAP is highly scalable and can handle large datasets efficiently.
Flexibility: UMAP is not limited to just visualization. It can also be used for general non-linear dimension reduction tasks, making it a versatile tool for many data analysis tasks.
The UMAP algorithm is implemented in the umap-learn
package in Python. Here's a simple example of how to use it:
import umap
import numpy as np
# Assume embeddings is your high-dimensional data
embeddings = np.random.rand(100, 50)
reducer = umap.UMAP()
umap_embeddings = reducer.fit_transform(embeddings)
In this example, umap.UMAP()
creates a UMAP object, and fit_transform()
fits the model to the data and then transforms the data to a lower-dimensional representation. The result, umap_embeddings
, is a 2D array of the lower-dimensional embeddings of your data.
In conclusion, UMAP is a powerful tool for data analysts dealing with high-dimensional data. It offers a way to visualize and understand the structure of the data, making it an invaluable tool in the data analyst's toolkit.
You may have learnt about Principal Component Analysis (PCA) in the Data Champions Bootcamp or other machine learning or statistical analysis courses. Here we try to understand why UMAP is a superior technique compared to PCA, especially when it comes to complex data.
Linearity vs Non-linearity: PCA is a linear dimension reduction technique. It works well when the data lies along a linear subspace, but it may not capture complex structures in the data. On the other hand, UMAP is a non-linear dimension reduction technique. It can capture more complex structures in the data, making it more suitable for high-dimensional data where the structure is not linear.
Preservation of Structure: PCA aims to preserve the variance in the data. It projects the data onto the directions (principal components) where the variance is maximized. However, it does not preserve the distances between data points. UMAP, on the other hand, aims to preserve both the local and global structure of the data. It tries to maintain the distances between nearby points in the high-dimensional space in the lower-dimensional projection.
Scalability: PCA scales well with the number of features, but not with the number of samples. UMAP, however, scales well with both the number of features and the number of samples, making it more suitable for large datasets.
Interpretability: The principal components in PCA are combinations of the original features, which can be interpreted in terms of the original features. This is not the case with UMAP, as it uses a more complex algorithm to reduce dimensionality, which might not be as easily interpretable.
In summary, while PCA is a good choice for linear data and when interpretability is important, UMAP is more suitable for complex, high-dimensional data where preserving the structure of the data is crucial.
import numpy as np
import pandas as pd
import umap # For compressing high-dimensional data (many columns) into lower-dimensional data (e.g. 2 columns)
import matplotlib.pyplot as plt
import seaborn as sns # For data visualization
# New Helper Function
def get_projected_embeddings(embeddings, random_state=0):
reducer = umap.UMAP(random_state=random_state).fit(embeddings)
embeddings_2d_array = reducer.transform(embeddings)
return pd.DataFrame(embeddings_2d_array, columns=['x', 'y'])
Below is an example of using the new helper function and visualizing its output using a scatterplot:
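A minimal sketch of that step, assuming the get_embedding and get_projected_embeddings helpers defined above, the list_of_input_texts from earlier, and the matplotlib/seaborn imports already shown:
# Embed the 7 example texts, project them to 2D with UMAP, and plot them
embeddings = get_embedding(list_of_input_texts)
df_2d = get_projected_embeddings(embeddings)
df_2d['text'] = list_of_input_texts

# Note: with only 7 points, UMAP will warn and automatically shrink its n_neighbors parameter
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df_2d, x='x', y='y')
for _, row in df_2d.iterrows():
    plt.annotate(row['text'][:30], (row['x'], row['y']), fontsize=8)
plt.title("2D UMAP projection of the text embeddings")
plt.show()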
Since embeddings capture semantic information, they allow us to compare a pair of texts based on their vector representations.
✦ A very common way to compare a pair of embeddings is to compute the distance between the two vectors.
✦ With the distance between a pair of embeddings, we can then support many other use cases, such as semantic search, clustering, recommendations, and classification (covered in the use cases below).
Cosine similarity is one of the most common and often the default method used in calculating the distance between a pair of embeddings.
import numpy as np

# Define two vectors A and B
A = np.array([1, 2, 3]) # Example vector A
B = np.array([4, 5, 6]) # Example vector B
# Define a function to calculate cosine similarity
def cosine_similarity(vector_a, vector_b):
# Calculate the dot product of A and B
dot_product = np.dot(vector_a, vector_b)
# Calculate the L2 norm (magnitude) of A and B
# **L2 norm** (also known as the **Euclidean norm**) of a vector is the square root of the sum of the squares of its components.
# - The Euclidean norm provides a straightforward measure of the magnitude of a vector.
# - It captures how “big” or “long” a vector is, regardless of its direction.
norm_a = np.linalg.norm(vector_a)
norm_b = np.linalg.norm(vector_b)
# Calculate cosine similarity
cosine_sim = dot_product / (norm_a * norm_b)
return cosine_sim
# Calculate and print the cosine similarity between A and B
cos_sim = cosine_similarity(A, B)
print(f"The cosine similarity between A and B is: {cos_sim}")
✦ In Python, you can use the cosine_similarity
function from the sklearn.metrics.pairwise
module to calculate cosine similarity.
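For example, a quick check with scikit-learn's implementation, using the same vectors A and B as above:
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity
import numpy as np

# scikit-learn expects 2D arrays: one row per vector
A = np.array([[1, 2, 3]])
B = np.array([[4, 5, 6]])

print(sk_cosine_similarity(A, B))  # [[0.9746...]], matching the manual calculation above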
✦ In practice, we often rely on frameworks such as Langchain that handle low-level operations like calculating the distance behind the scenes, so that we can focus on the logic of our applications instead of implementing cosine similarity on our own.
✦ Cosine similarity is particularly useful for LLM embeddings because it effectively captures the semantic similarity between text documents.
✦ For a production-level retriever that requires searching over many vectors quickly, it is generally suggested to use a vector database.
While embeddings offer significant advantages in various applications, they also pose substantial risks to privacy and data security.
Embeddings are essentially numerical representations of text data, and despite their seemingly abstract nature, they can encode sensitive information about individuals or organizations.
✦ Embeddings Contain Sensitive Information:
✦ Inversion Attacks:
✦ Privacy Implications:
✦ Balancing Utility and Privacy:
icon: LiWrench
Here is the sample data used in the use cases below:
To retrieve the most relevant documents we use the cosine similarity between the embedding vectors of the query and each document, and return the highest scored documents.
from openai.embeddings_utils import get_embedding, cosine_similarity
def search_reviews(df, product_description, n=3, pprint=True):
embedding = get_embedding(product_description, model='text-embedding-3-small')
df['similarities'] = df.ada_embedding.apply(lambda x: cosine_similarity(x, embedding))
res = df.sort_values('similarities', ascending=False).head(n)
return res
res = search_reviews(df, 'delicious beans', n=3)
The size of the embeddings varies with the complexity of the underlying model. In order to visualize this high dimensional data we use the t-SNE algorithm to transform the data into two dimensions.
The individual reviews are coloured based on the star rating which the reviewer has given:
The visualization seems to have produced roughly 3 clusters, one of which has mostly negative reviews.
This code is a way to visualize the relationship between different Amazon reviews based on their embeddings and scores. The t-SNE algorithm
is particularly good at preserving local structure in high-dimensional data, making it a popular choice for tasks like this.
import pandas as pd
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import matplotlib
df = pd.read_csv('output/embedded_1k_reviews.csv')
matrix = df.ada_embedding.apply(eval).to_list()
# Create a t-SNE model and transform the data
tsne = TSNE(n_components=2, perplexity=15, random_state=42, init='random', learning_rate=200)
vis_dims = tsne.fit_transform(matrix)
colors = ["red", "darkorange", "gold", "turquoise", "darkgreen"]
x = [x for x,y in vis_dims]
y = [y for x,y in vis_dims]
color_indices = df.Score.values - 1
colormap = matplotlib.colors.ListedColormap(colors)
plt.scatter(x, y, c=color_indices, cmap=colormap, alpha=0.3)
plt.title("Amazon ratings visualized in language using t-SNE")
✦ An embedding serves as a versatile free-text feature encoder within a machine learning model.
✦ Advantages over Traditional Methods:
✦ The code segment that splits the data into a training set and a testing set will be utilized for the regression and classification use cases; a minimal sketch of this step is shown after this list.
✦ This time, instead of having the algorithm predict a value anywhere between 1 and 5, we will attempt to classify the exact number of stars for a review into 5 buckets, ranging from 1 to 5 stars.
✦ After the training, the model learns to predict 1 and 5-star reviews much better than the more nuanced reviews (2-4 stars), likely due to more extreme sentiment expression.
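Below is a minimal sketch of that split-and-train step, assuming the same df with an ada_embedding column of precomputed embeddings and a Score column of 1-5 star ratings; the choice of a random forest classifier is just an illustration:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Features: the embedding vectors; labels: the 1-5 star ratings
X = np.vstack(df.ada_embedding.values)
y = df.Score.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))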
We can use embeddings for zero shot classification without any labeled training data.
from openai.embeddings_utils import cosine_similarity, get_embedding

model = 'text-embedding-3-small'  # embedding model assumed from the earlier examples

# Drop neutral (3-star) reviews and map the remaining scores to two sentiment labels
df = df[df.Score != 3]
df['sentiment'] = df.Score.replace({1: 'negative', 2: 'negative', 4: 'positive', 5: 'positive'})

labels = ['negative', 'positive']
label_embeddings = [get_embedding(label, model=model) for label in labels]

def label_score(review_embedding, label_embeddings):
    return cosine_similarity(review_embedding, label_embeddings[1]) - cosine_similarity(review_embedding, label_embeddings[0])

# Embed the review text first, then compare it against the two label embeddings
prediction = 'positive' if label_score(get_embedding('Sample Review', model=model), label_embeddings) > 0 else 'negative'
Clustering is one way of making sense of a large volume of textual data. Embeddings are useful for this task, as they provide semantically meaningful vector representations of each text. Thus, in an unsupervised way, clustering will uncover hidden groupings in our dataset.
In this example, we discover four distinct clusters: one focusing on dog food, one on negative reviews, and two on positive reviews.
import numpy as np
from sklearn.cluster import KMeans
matrix = np.vstack(df.ada_embedding.values)
n_clusters = 4
kmeans = KMeans(n_clusters = n_clusters, init='k-means++', random_state=42)
kmeans.fit(matrix)
df['Cluster'] = kmeans.labels_
We can obtain a user embedding by averaging over all of their reviews. Similarly, we can obtain a product embedding by averaging over all the reviews about that product. In order to showcase the usefulness of this approach we use a subset of 50k reviews to cover more reviews per user and per product.
We evaluate the usefulness of these embeddings on a separate test set, where we plot similarity of the user and product embedding as a function of the rating. Interestingly, based on this approach, even before the user receives the product we can predict better than random whether they would like the product.
user_embeddings = df.groupby('UserId').ada_embedding.apply(np.mean)
prod_embeddings = df.groupby('ProductId').ada_embedding.apply(np.mean)
✦ After seeing some of these example use cases, you might think, “Why should I care about these text embedding things? Can’t I just use GPT-4 to analyze the text for me?”
✦ Techniques like Retrieval Augmented Generation (RAG) or fine-tuning allow tailoring LLMs to specific problem domains.
✦ However, it’s important to recognize that these systems are still in their early stages. Building a robust LLM system presents challenges such as high computational costs, security risks associated with large language models, unpredictable responses, and even hallucinations.
✦ On the other hand, text embeddings have a long history, are lightweight, and deterministic.
Leveraging embeddings simplifies and reduces the cost of building LLM systems while retaining substantial value. By pre-computing text embeddings, you can significantly accelerate the training and inference process of LLMs. This leads to lower computational costs and faster development cycles. Additionally, embeddings capture semantic and syntactic information about text, providing a strong foundation for LLM performance.
Embeddings should be another tool in the NLP toolkit, allowing for efficient similarity search, clustering, and other tasks. They excel at capturing semantic and syntactic relationships between texts. This makes them invaluable for tasks like finding similar documents, grouping related content, and understanding the overall structure of a text corpus. By combining embeddings with LLMs, you can create more powerful and versatile applications.
icon: LiWrench
Now that we understand how embeddings can be used to retrieve semantically related texts, it's time to explore probably the most popular and pragmatic application of embeddings: Retrieval Augmented Generation (RAG).
A Retrieval-Augmented Generation (RAG) system is a framework that enhances the accuracy and reliability of generative AI models by incorporating information from external sources.
✦ LLMs offer a natural language interface between humans and data. Widely available models come pre-trained on vast amounts of publicly available data, such as Wikipedia, mailing lists, textbooks, source code, and more.
✦ However, while LLMs are trained on a vast amount of data, they are not trained on your data, which may be private or specific to the problem you’re trying to solve. This data could be behind APIs, in SQL databases, or trapped in PDFs and slide decks.
✦ You might choose to fine-tune an LLM with your data, but:
✦ Instead of fine-tuning, you can use a context augmentation pattern called Retrieval-Augmented Generation (RAG) to obtain more accurate text generation relevant to your specific data.
✦ By doing so, RAG overcomes all three weaknesses of the fine-tuning approach:
LangChain provides a robust framework for building LLM applications. The framework includes many components to support common LLM operations such as prompt chaining, chat memory management, and, of course, RAG.
We recommend using LangChain
or equivalent frameworks for implementing RAG, instead of writing your code from scratch. These frameworks often offer the following benefits:
✦ Ready-to-use Components: components are various modules/functions that we can use to handle many of the common operations in RAG, without having to write the code from scratch. Examples include Contextual Compression, Self Query, and Parent Document, techniques that someone would otherwise have to understand from research papers or code repositories and then translate into Python code.
✦ Community Support:
However, packages like LangChain are not without their shortcomings:
✦ Expect a learning curve to get familiar with the framework
✦ They are still in active development and may break your code
✦ Less flexibility compared to writing your own code
There are 5 main steps in a typical RAG pipeline: document loading, splitting (transformation), embedding & storage, retrieval, and generation.
✦ Use document loaders
to load data from a source as Document objects.
✦ See official documentation on LangChain's Document Loaders for different kinds of loaders for different sources.
✦ In this particular example, we are using one of the PDF loaders
from LangChain
to load the Prompt Engineering Playbook.
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("https://www.developer.tech.gov.sg/products/collections/data-science-and-artificial-intelligence/playbooks/prompt-engineering-playbook-beta-v3.pdf")
pages = loader.load()
✦ The loader loads each page of the PDF file as a separate Document object. The code below shows the first page of the PDF, using index 0.
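A minimal sketch of that inspection (the attribute names follow LangChain's Document interface):
# Inspect the first page (index 0) of the loaded PDF
first_page = pages[0]
print(first_page.metadata)            # e.g. the source and page number
print(first_page.page_content[:500])  # the first 500 characters of the page text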
Once we have loaded the documents, we'll often want to transform them to better suit our application.
✦ The simplest example is you may want to split a long document into smaller chunks that can fit into your model's context window.
✦ LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
✦ At a high level, text splitters work as follows:
✦ In the example, we are using the RecursiveCharacterTextSplitter from Langchain to split the given some_text
into chunks. The resulting segments will have a maximum size of 26 characters, with an overlap of 4 characters between adjacent chunks.
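A minimal sketch of that call is shown below; the content of some_text is made up for illustration:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example text (made up for illustration)
some_text = "When writing documents, writers will use document structure to group content."

splitter = RecursiveCharacterTextSplitter(chunk_size=26, chunk_overlap=4)
chunks = splitter.split_text(some_text)
print(chunks)  # a list of strings, each at most 26 characters long, overlapping by up to 4 characters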
The key parameters that we often see in a splitter are the following:
✦ chunk_size: determines the maximum length (in characters) of each chunk or segment into which the document is split. A smaller chunk_size results in more fine-grained segments, while a larger value creates larger chunks.
✦ chunk_overlap: specifies the number of characters that overlap between adjacent chunks. A larger chunk_overlap value increases the overlap, allowing for smoother transitions between chunks.
.from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
embeddings_model = OpenAIEmbeddings(model='text-embedding-3-small')

# splitted_documents: the list of chunked Document objects produced by the splitting step above
db = Chroma.from_documents(splitted_documents, embeddings_model, persist_directory="./chroma_db")
✦ The embeddings and documents are persisted to the local directory “./chroma_db”.
✦ The input splitted_documents is a list of Document objects (LangChain's object).
(LangChain's object).Vector Store:
Vector Database:
In short, while a Vector Store is minimalistic and focused on storage, a Vector Database provides additional features and optimizations for efficient vector handling, making it suitable for applications like semantic search, recommendation systems, and retrieval-augmented generation (RAG).
For the Retrieval stage, LangChain provides a variety of retrievers
, each of which is an interface that returns documents given an unstructured query.
✦ A retriever takes an unstructured query as input and returns a list of Document objects as output.
Querying the vector store directly: This is a low-level implementation that is useful if you want more flexibility in customizing or developing your own retriever.
For example, if you want to retrieve only the documents whose relevance score is above a specific threshold value, this method allows you to access such scores, so you can write your own code to do the filtering or other computations before producing the final list of documents to retrieve.
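Below is a minimal sketch of this low-level approach, assuming db is the Chroma vector store created earlier; the threshold value is arbitrary:
query = "Why do LLMs hallucinate?"

# Returns (Document, relevance_score) pairs; for Chroma the scores are normalized to [0, 1]
docs_and_scores = db.similarity_search_with_relevance_scores(query, k=10)

threshold = 0.7  # arbitrary cut-off for illustration
relevant_docs = [doc for doc, score in docs_and_scores if score >= threshold]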
Using the retriever object: This is a much more common approach, where we rely on the retriever component from Langchain to retrieve the relevant documents.
# This is a very basic retriever that return a maximum of 10 most relevant documents
retriever_basic = vectorstore.as_retriever(search_kwargs={"k": 10})
from langchain.chains import RetrievalQA
from langchain_openai import AzureChatOpenAI
qa_chain = RetrievalQA.from_chain_type(
AzureChatOpenAI(model='gpt-3.5-turbo'),
retriever=retriever_basic
)
qa_chain.invoke("Why LLM hallucinate?")
Or we can easily write our own custom Q&A prompt for generating the answer:
from langchain.prompts import PromptTemplate
# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
# Run chain
qa_chain = RetrievalQA.from_chain_type(
AzureChatOpenAI(model='gpt-3.5-turbo'),
retriever=retriever_basic,
return_source_documents=True, # Make inspection of document possible
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
icon: RiCodeBoxLine
Part 1 of the Notebook: Follow Along
✦ This notebook contains pre-populated code for you to execute cell by cell.
✦ Here's how to use this notebook:
Part 2 of the Notebook: Complete It with Your Own Code
✦ While there is no submission required, we encourage you to share your solutions with your peers by pasting your link into the Sharing Board.
Feedback: By sharing your solutions, you can get insights, suggestions, and constructive criticism from your peers. This feedback can help you improve your approach and learn from others’ perspectives.
Learning from Peers: Since everyone may have different ways of solving problems, participating in these sessions allows you to see various approaches. You can learn alternative methods, explore different techniques, and gain a deeper understanding of the challenges.
✦ URL: https://miro.com/app/board/uXjVKojBjec=/?share_link_id=989058465513
✦ Passcode: abc-2024
icon: RiMeteorLine
As we delve into the exciting world of Retrieval-Augmented Generation (RAG) and LangChain, building custom pipelines to extract knowledge from our documents, it's crucial to understand the broader landscape of RAG solutions. One such solution, particularly relevant for WOG officers, is AIBots.
While building your own RAG pipeline with LangChain offers immense flexibility and control, it's not always the most efficient or necessary approach. Here's why exploring AIBots alongside your LangChain learning journey can be incredibly beneficial:
1. Not All Use Cases Need a Custom Pipeline:
Before diving headfirst into building a complex RAG pipeline, ask yourself: does your use case truly warrant it? AIBots provides a user-friendly, no-code platform for creating RAG chatbots, perfect for simpler applications. It allows you to quickly test your documents and understand their limitations within a RAG context, saving you valuable time and effort.
2. AIBots as a Learning Tool:
Even if your use case ultimately requires a custom pipeline, AIBots serves as an excellent learning tool. By experimenting with different bot configurations, prompts, and knowledge bases, you gain valuable insights into the nuances of RAG. This hands-on experience will inform your custom pipeline development, leading to more effective and robust solutions.
3. WOG Central Platform within GCC:
AIBots is a Whole-of-Government (WOG) platform hosted within the Government Commercial Cloud (GCC). This means you don't have to worry about setting up infrastructure, managing security, or navigating bureaucratic hurdles. It's a ready-to-use platform, allowing you to focus on exploring RAG and building your chatbot.
4. Understanding Limitations:
By using AIBots, you can quickly identify the limitations of your documents within a RAG context. This includes understanding how well the LLM can extract information, potential biases in the data, and the impact of document structure on response accuracy. These insights are invaluable when designing your custom pipeline, allowing you to address potential challenges upfront.
5. Community and Support:
AIBots platform has a vibrant community of users within WOG. This provides a valuable resource for sharing knowledge, best practices, and troubleshooting tips. Additionally, the AIBots team offers support and guidance, ensuring a smooth learning experience.
In conclusion, while mastering LangChain and building custom RAG pipelines is a valuable skill, understanding the role of AIBots within the RAG ecosystem is equally important.
It offers a quick and easy way to test your use cases, learn the fundamentals of RAG, and leverage a secure, readily available platform within the GCC.
icon: LiWrench
✦ As there are so many ways to tune our RAG pipelines, how would we know which of the changes actually lead to better performance?
✦ Ragas is one of the frameworks designed to assess RAG-based applications.
✦ What’s interesting about Ragas is that it started out as a framework for “reference-free” evaluation. That means, instead of having to rely on human-annotated ground truth labels in the evaluation dataset, Ragas leverages LLMs under the hood to conduct the evaluations.
✦ To evaluate the RAG pipeline, Ragas expects the following information:
✦ question: The user query that is the input of the RAG pipeline.
✦ answer: The generated answer from the RAG pipeline.
✦ contexts: The contexts retrieved from the external knowledge source and used to answer the question.
✦ ground_truths: The ground truth answer to the question. This is the only human-annotated information, and it is only required for some of the metrics.
✦ Leveraging LLMs for reference-free evaluation is an active research topic.
✦ Note that the framework has expanded to provide metrics and paradigms that require ground truth labels (e.g., context_recall
and answer_correctness
)
✦ Additionally, the framework provides you with tooling for automatic test data generation.
Ragas provides you with a few metrics to evaluate a RAG pipeline component-wise as well as from end-to-end.
On a component level, Ragas provides you with metrics to evaluate the retrieval component (context_relevancy
and context_recall
) and the generative component (faithfulness
and answer_relevancy
) separately.
Most (if not all) of the metrics are scaled to the range between 0 and 1, with higher values indicating better performance.
Ragas also provides you with metrics to evaluate the RAG pipeline end-to-end, such as answer semantic similarity and answer correctness.
pip install ragas
from datasets import Dataset
import os
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness
os.environ["OPENAI_API_KEY"] = "your-openai-key"
data_samples = {
'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],
'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}
dataset = Dataset.from_dict(data_samples)
score = evaluate(dataset,metrics=[faithfulness,answer_correctness])
score.to_pandas()
Visit the documentation here: Introduction | Ragas
icon: LiLampDesk
The Retrieval-Augmented Generation (RAG) Playbook is a comprehensive guide designed to help developers, particularly in Whole of Government (WOG), navigate the complexities of building and deploying RAG systems.
This playbook offers practical advice on constructing RAG applications, from no-code/low-code solutions to custom pipelines using open-source frameworks. It also provides metrics for evaluating RAG systems and includes experiments on realistic government use cases to demonstrate how to iteratively improve RAG performance.
As RAG technology evolves, this playbook serves as a foundational resource for understanding and leveraging its capabilities effectively.
RAG is only as good as the retrieved documents’ relevance and quality. Fortunately, an emerging set of techniques can be employed to design and improve RAG systems. The playbook focuses on grouping and summarizing many of these techniques (see below) and shares some high-level strategic guidance. Developers or builders can and should experiment with using different pieces together.
icon: LiWrench
✦ The “Retrieval” step is key since it directly improves the context that the LLM has when generating a response.
✦ The methods we will cover below are:
✦ Consider that we've developed a RAG system designed to identify potential diseases based on the symptoms entered during a consultation. If we're working with a Naive RAG, it's possible that it might only identify diseases sharing one or two symptoms, which could make our application seem not very useful, or even unusable.
✦ This scenario is perfectly suited for employing the Parent-Child Index Retrieval method.
✦ However, there's a minor issue with this approach:
The above points are illustrated in the subsequent image:
✦ The dilemma seems inescapable:
✦ This is where the Parent-child index retrieval method comes into play, promising to improve our approach.
✦ To bring this concept into practical application, a step-by-step explanation is most effective:
The process described is visually represented in the following image:
✦ To better understand this method, consider the following image that illustrates how it operates:
✦ Implementing this might sound daunting due to the need to establish a new database for the smaller chunks, maintain the parent chunks in memory, and track the relationship between parent and child chunks. Fortunately, LangChain
simplifies this process significantly, making it straightforward to set up.
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
# Some code for loading the documents is omitted
# ...
parent_docs = documents
# Embedding Model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Splitters
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=800)
# Stores
store = InMemoryStore()
vectorstore = Chroma(embedding_function=embeddings, collection_name="fullDoc", persist_directory="./JohnWick_db_parentsRD")
parent_document_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Index the documents: parents are split into child chunks, the child chunks are embedded into
# the vector store, and the full parent documents are kept in the docstore
parent_document_retriever.add_documents(parent_docs)
print(f"Number of parent chunks is: {len(list(store.yield_keys()))}")
print(f"Number of child chunks is: {len(parent_document_retriever.vectorstore.get()['ids'])}")
'''
Number of parent chunks is: 75
Number of child chunks is: 3701
'''
Once we have our Parent Document Retriever, we just need to create our RAG based on this retriever and that would be it.
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

setup_and_retrieval = RunnableParallel({"question": RunnablePassthrough(), "context": parent_document_retriever})
output_parser = StrOutputParser()
parent_retrieval_chain = setup_and_retrieval | rag_prompt | chat_model | output_parser  # rag_prompt and chat_model defined elsewhere
LangChain Documentation: Parent Document Retriever | 🦜️🔗 LangChain
✦ This approach can be understood as the reversal of Parent-Child Index Retrieval that we just discussed above. It is also a more intelligent method as it takes into consideration the "semantic meaning of the child chunks" and groups semantically-similar child chunks together.
✦ RAPTOR is one of the hierarchical approaches, introduced by Stanford researchers.
✦ Based on the user query, the relevant summary document is retrieved first, and then the relevant chunks are retrieved from that document.
# Installation
!git clone https://github.com/parthsarthi03/raptor.git
%cd raptor  # use %cd (not !cd) so the working directory change persists in a notebook
!pip install -r requirements.txt
# Setting Up
import os
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
from raptor import RetrievalAugmentation
RA = RetrievalAugmentation()
# Adding Documents
with open('sample.txt', 'r') as file:
text = file.read()
RA.add_documents(text)
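After indexing, the project's README shows how to query the tree; the method name below is taken from that README and should be treated as an assumption rather than a guaranteed, stable API:
# Answering a question with the indexed tree (method name per the RAPTOR README)
question = "What is the main topic of the sample document?"
answer = RA.answer_question(question=question)
print("Answer:", answer)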
✦ Its main feature is that it is capable of performing searches in the vector store, applying filters based on the metadata. This approach is allegedly one of the most optimal methods to improve the efficiency of the retriever.
✦ We know that when we apply a “Naive retrieval”, we are calculating the similarity of all the chunks of the vector database with the query.
✦ Let’s look at a use case to fully understand when to apply this type of retrieval.
✦ This case is ideal for applying Self Query Retriever.
This technique can be summarized in two very specific steps:
✦ The objective of the step called “Query Constructor” is to create the appropriate query and filters according to the user input.
✦ Who is in charge of applying the corresponding filters, and how do we know what they should be? For this, we are going to use an LLM.
✦ The output generated by the LLM cannot be directly entered into the database.
✦ LangChain has specific database translators for almost all of the supported databases.
✦ From the previous image, we see that everything begins with the user’s query.
✦ It is very important to provide the LLM with a detailed description of the metadata available in the vector store. This is shown through the following piece of code:
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
docs = [
Document(
page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
),
Document(
page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
),
Document(
page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
),
Document(
page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
),
Document(
page_content="Toys come alive and have a blast doing so",
metadata={"year": 1995, "genre": "animated"},
),
Document(
page_content="Three men walk into the Zone, three men walk out of the Zone",
metadata={
"year": 1979,
"director": "Andrei Tarkovsky",
"genre": "thriller",
"rating": 9.9,
},
),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain_openai import ChatOpenAI
metadata_field_info = [
AttributeInfo(
name="genre",
description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
type="string",
),
AttributeInfo(
name="year",
description="The year the movie was released",
type="integer",
),
AttributeInfo(
name="director",
description="The name of the movie director",
type="string",
),
AttributeInfo(
name="rating", description="A 1-10 rating for the movie", type="float"
),
]
document_content_description = "Brief summary of a movie"
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
chat_model = ChatOpenAI()
self_query_retriever = SelfQueryRetriever.from_llm(
    llm=ChatOpenAI(temperature=0),
    vectorstore=vectorstore,
    document_contents=document_content_description,
    metadata_field_info=metadata_field_info,
    verbose=True,
    structured_query_translator=ChromaTranslator()
)
LangChain Documentation: Self-querying | 🦜️🔗 LangChain
This note is not intended to exhaustively cover all techniques or methods available for improving Retrieval-Augmented Generation (RAG) processes.
icon: LiWrench
✦ Re-ranking is the process of ordering the retrieved context chunks in the final prompt based on their scores and relevance.
✦ This is important, as researchers have found better performance when the most relevant context is positioned at the start of the prompt.
✦ The technique consists of two very different steps:
Notice that for each new query, the similarity between the query and each of the documents needs to be calculated.
import os

from langchain_cohere import CohereRerank
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

os.environ["COHERE_API_KEY"] = "YOUR API KEY FROM COHERE"
compressor = CohereRerank(top_n=3)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor,
base_retriever=naive_retriever
)
Let’s see a comparison between a Naive Retriever (e.g., distance between embeddings) and a Reranking Retriever
We can use the ContextualCompressionRetriever from the LangChain library to improve the quality of retrieved documents by compressing them.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
"Why do LLMs hallucinate?"
)
pretty_print_docs(compressed_docs)  # pretty_print_docs: a small helper (defined elsewhere) that prints each retrieved document
LangChain Documentation: Contextual compression | 🦜️🔗 LangChain
# Install the package
!pip install llmlingua
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

# prompt: the long prompt (e.g. retrieved context plus the question) that we want to compress
compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question="", target_token=200)
This note is not intended to exhaustively cover all techniques or methods available for improving Retrieval-Augmented Generation (RAG) processes.
number headings: first-level 1, max 6, 1.1
icon: LiNotebook
The basic or "Vanilla" RAG, also known as Naive RAG, exhibits several limitations, particularly when applied to complex use cases or in the development of production-ready applications.
As we saw in the previous topic, building an RAG prototype is relatively easy – investing around 20% of the effort yields an application with 80% performance. However, achieving a further 20% performance improvement requires the remaining 80% of the effort.
Below are some key reasons why Naive RAG may not always deliver the most effective and optimized outcomes.
To overcome these limitations of naive RAG, there are two aspects that are essential:
RAG is only as good as the retrieved documents’ relevance and quality. Fortunately, an emerging set of techniques can be employed to design and improve RAG systems.
Improving RAG is not just a matter of incremental updates, such as installing a newer Python package or calling functions out of the box; many improvements involve a comprehensive rethinking of the RAG architecture and processes.
We can group the various improvements under 3 major categories:
You might also be interested in the GovTech playbook included in 6. Further Readings - WOG RAG Playbook, where the results of different techniques have been experimented on two specific use cases. This playbook can serve as a general reference point for starting your own experiments, particularly for techniques that have shown the greatest improvement in the accuracy and capability of the RAG pipeline.
Evaluation of RAG systems is essential to benchmark the overall performance of RAG output.
To evaluate RAG we can use metrics like:
These metrics provide a structured way to assess the quality of the generated answers and the relevance of the information retrieved by the system.
Enter RAGAS, a framework specifically designed for this purpose.
We will go into the details of RAG evaluation in 5. RAG Evaluation
icon: LiWrench
The techniques in this note are sometimes called Pre-Retrieval Processes because the "construction" or "enhancement" of the query happens before the retrieval process.
✦ As we have already seen in Naive RAG, chunks are simply small parts of the whole document, and the index is the vector representation of these chunks that we store in the Vector DB.
✦ We quote a paragraph from GovTech RAG Playbook that perfectly sums up the challenges of finding the right balance between the chunk size and the accuracy of the RAG pipeline. We included the RAG Playbook under the "Further Readings" for Topic 5.
While it is possible to obtain an embedding for a document as long as it fits into the embedding model’s context length, embedding an entire document is not always an optimal strategy. It is common to segment documents into chunks and to specify an overlap size between chunks.
Both of these parameters can help to facilitate the flow of context from one chunk to another, and the optimal chunk and overlap size to use is corpus specific. Embedding a single sentence focuses on its specific meaning but forgoes the broader context in the surrounding text. Embedding an entire body of text focuses on the overall meaning but may dilute the significance of individual sentences or phrases.
Generally, longer and more complex queries benefit from smaller chunk sizes while shorter and simpler queries may not require chunking.
Source: GovTech RAG Playbook
✦ While fixed-size chunking offers a straightforward approach, it often leads to context fragmentation, hindering the retrieval of accurate information.
✦ Also known as recursive structure-aware chunking or content-based chunking, this approach can preserve the context and format of text for specific file types, such as HTML, PDF, Markdown, and JSON.
✦ Simply put, using the right or suitable document splitter method for the use case will help us to derive chunks that are tailored to the specific file formats that we are dealing with.
✦ For example, an HTML-aware splitter can recognize structural elements such as headings (<h1>), paragraphs (<p>), and tables (<table>), enabling custom processing based on element types, which a generic CharacterTextSplitter cannot do.
✦ Langchain supports many of the commonly used file types. The table below describes some of the text splitters offered by Langchain:
Name | Splits On | Description |
---|---|---|
Recursive | A list of user defined characters | Recursively splits text. Splitting text recursively serves the purpose of trying to keep related pieces of text next to each other. This is the recommended way to start splitting text. |
HTML | HTML specific characters | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) |
Markdown | Markdown specific characters | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown) |
Code | Code (Python, JS) specific characters | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. |
✦ You can visualize how a given splitter chunks your text using the Chunkviz utility.
✦ Semantic chunking is one of the more sophisticated chunking methods.
✦ The easiest way to take advantage of this cutting-edge chunking approach is to use Langchain's experimental module:
!pip install --quiet langchain_experimental langchain_openai
# Load Example Data
# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
# Create Text Splitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings
# That's it. It is this simple.
text_splitter = SemanticChunker(OpenAIEmbeddings())
# Split Text
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
✦ Query transformation is a method of restructuring the user query to improve retrieval quality.
✦ It includes techniques like:
# The main part is a rewriter to rewrite the query
prompt = """Provide a better search query for \
web search engine to answer the given question.
Question: {user_query}
"""
✦ If the query is complex and covers multiple contexts, then retrieval with a single query may not be a good approach, as it may fail to retrieve the output you want.
✦ In LangChain, we can use MultiQueryRetriever for implementation of this technique. The MultiQueryRetriever
automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query.
✦ The MultiQueryRetriever might be able to overcome some of the limitations of distance-based retrieval and get a richer set of results.
Below is an example of using the MultiQueryRetriever:
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load blog post
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
# VectorDB
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
# This is the Core Part of the Code
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI
question = "What are the approaches to Task Decomposition?"
llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
retriever=vectordb.as_retriever(), llm=llm
)
template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines.
Original question: {question}"""
✦ When we have multiple vector stores/databases, or various actions to perform on the user query depending on its context, routing the user query in the right direction is very important for relevant retrieval and generation.
✦ Using specific prompt and output parsers, we can use an LLM call to decide which action to perform or where to route the user query.
✦ If you're keen to use any frameworks, you can use prompt chaining or custom Agents to implement query routing in LangChain or LlamaIndex.
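Below is a minimal sketch of LLM-based routing with a prompt and an output parser, as described above; the destination names and the example question are made up:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# The three destinations are hypothetical names used only for illustration
routing_prompt = ChatPromptTemplate.from_template(
    "Classify the user question below as one of: 'hr_policies', 'it_support', or 'general'.\n"
    "Respond with only one of these three words.\n\nQuestion: {question}"
)
router = routing_prompt | ChatOpenAI(temperature=0) | StrOutputParser()

destination = router.invoke({"question": "How do I reset my corporate laptop password?"}).strip()
print(f"Route this query to: {destination}")  # expected: it_support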
This note is not intended to exhaustively cover all techniques or methods available for improving Retrieval-Augmented Generation (RAG) processes.
icon: LiWrench
CrewAI is an open-source framework designed to orchestrate and coordinate teams of autonomous AI agents, similar to Autogen. Think of it as a way to assemble and manage a group of AI assistants that collaborate to achieve a shared objective, much like a crew on a ship or a project team.
Here are some essential aspects of CrewAI:
1. Building a Smart Assistant Platform: CrewAI can be leveraged to develop a team of agents capable of managing various tasks, such as scheduling appointments, arranging travel, and responding to user inquiries. This creates a comprehensive smart assistant that streamlines everyday activities.
2. Creating an Automated Customer Service System: With CrewAI, you can assemble a team of agents dedicated to handling customer inquiries, resolving issues, and providing support. This automated system enhances customer experience by ensuring timely and efficient responses.
3. Developing a Multi-Agent Research Team: CrewAI can facilitate the formation of a collaborative research team composed of agents that work together on projects. They can analyze data, generate hypotheses, and test ideas, making the research process more efficient and effective.
The CrewAI workflow process typically involves the following steps:
1. Agents: In this initial phase, you define the capabilities of your CrewAI workflow by specifying the agents involved. This includes outlining their roles and the skills they should possess, effectively determining who does what within the team.
2. Tasks: Next, you establish the specific objectives you want your agents to achieve. This step is crucial for guiding the agents toward accomplishing the desired outcomes.
3. Process: Here, you outline how CrewAI will utilize the defined agents and tasks to meet the overarching goals of your project. This involves mapping out the interactions and workflows that will drive the collaboration.
4. Run: Finally, you initiate the execution of your agents and tasks. Once the run is underway, assuming everything goes smoothly, CrewAI will generate results aimed at solving the stated objectives. This step marks the transition from planning to action, bringing your workflow to life.
✦ Focus
✦ Tools
✦ Memory
A tool in CrewAI is a skill or function that agents can utilize to perform various actions.
Tools are pivotal in extending the capabilities of CrewAI agents, enabling them to undertake a broad spectrum of tasks and collaborate effectively. When building solutions with CrewAI, leverage both custom and existing tools to empower your agents and enhance the AI ecosystem
Here are the primary distinctions:
Purpose of Tools for Agents vs. Tasks
Scope and Context of Tool Usage
Control Over Execution
Tool Management and Overlap
CrewAI's ability to support not only its native tools but also third-party tools from LangChain
and LlamaIndex
offers significant advantages.
✦ This flexibility allows users to leverage a broader range of functionalities and integrations, enhancing the overall versatility and capability of the platform.
✦ Developers are not confined to the tools provided by CrewAI alone; they can seamlessly integrate and utilize the best tools available in the market, tailored to their specific needs.
✦ In the walkthrough notebook, we have tried a more advanced example that uses toolkits (a suite of tools) from LangChain to create a tool that can manipulate and analyze tabular data by actually running Python code.
✦ This tool uses the pandas
library to manipulate the data and the ChatOpenAI
agent to run the code.
✦ While the example is a bit more complex, we think it is worth including because the simpler examples (using a single tool from LangChain) are already well documented in CrewAI's documentation.
✦ The toolkits are usually much more powerful and can be used to achieve more complex tasks, but we have yet to come across comprehensive documentation on how to incorporate them into CrewAI's agents.
from langchain.agents import Tool
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
pandas_tool_agent = create_pandas_dataframe_agent(
llm=ChatOpenAI(temperature=0, model='gpt-4o-mini'),
df=df,
agent_type=AgentType.OPENAI_FUNCTIONS,
allow_dangerous_code=True # <-- This is an "acknowledgement" that this can run potentially dangerous code
)
# Create the tool
pandas_tool = Tool(
name="Manipulate and Analyze tabular data with Code",
func=pandas_tool_agent.invoke, # <-- This is the function that will be called when the tool is run. Note that there is no `()` at the end
description="Useful for search-based queries",
)
✦ For more info about using LangChain tools in CrewAI, see https://docs.crewai.com/core-concepts/Using-LangChain-Tools/
✦ For more info about using LlamaIndex tools in CrewAI, see https://docs.crewai.com/core-concepts/Using-LlamaIndex-Tools
✦ For more info about Tools, such as the list of tools or how to create your own tool, see https://docs.crewai.com/core-concepts/Tools/#introduction
icon: LiNotebook
Many users of ChatGPT quickly realize that the default workflow for large language models (LLMs) has its limitations, especially as task complexity increases. Even when employing optimal prompt engineering strategies, prompts can become excessively lengthy, leading to a higher likelihood that the LLM will misinterpret or overlook critical instructions.
✦ A common workaround is to iteratively refine the chatbot's responses through additional prompting; however, this method can be labor-intensive and may cause the LLM to become trapped by previous inaccuracies within the chat context.
✦ Moreover, real-world applications often necessitate the integration of various tools, such as internet searches, access to relevant internal documents through Retrieval Augmented Generation (RAG), mathematical computations, coding capabilities, and safety measures to protect sensitive data.
The shift towards agents is about creating AI systems that can truly understand, learn, and solve problems in the real world.
While LLMs and RAG models have pushed the boundaries of what’s possible with language generation, the development of AI agents represents a step towards more intelligent, autonomous, and multi-capable systems that can work alongside humans in a wider variety of scenarios.
A multi-agent system is also often known as an agentic system.
The figure below gives a good illustration of the differences between a typical LLM workflow and an agentic workflow.
Many believe that AI agents are going to be the future of AI.
“What I'm seeing with AI agents, I think, is the exciting trend that everyone building in AI should pay attention to.”
Andrew Ng (Creator of Google Brain)
“The AI field is headed towards self-contained autonomous agents, and it won't be a single agent; it will be many agents working together.”
Andrej Karpathy (co-founder of OpenAI)
“The developer becomes the user, and so we're evolving toward any user being able to create their own autonomous agent. I'm pretty sure that in 5 years from now this will be something that you learn to do at school.”
Arthur Mensch (CEO of Mistral AI)
You may hear questions like: “So this is just GPT-4 with RAG?” or “Isn’t this the same as chaining together a couple of prompts?”
There are several key reasons why AI agents perform better than a single LLM:
✦ Goal-oriented behavior:
✦ Interaction with the environment:
✦ Memory and state tracking:
✦ Multi-task capability:
✦ Improved Accuracy
Imagine you need to book a complex trip:
✦ LLM: Could explain different places to visit or give general travel tips.
✦ RAG: Could find relevant blogs and articles about destinations.
✦ AI Agent: Could do all that, PLUS:
Now let's look at the key differences based on this simple example:
1. Task Orientation vs. General Knowledge
2. Multi-Step Reasoning
3. Proactivity
4. Integration with Existing Systems
We have discussed Prompt Chaining in 4. Prompts Chaining - Chaining Together Multiple Prompts.
A single AI agent’s architecture encompasses the essential components that empower it to think, plan, and act within its environment. This sophisticated design typically includes:
✦ Tools
✦ Memory
✦ Planning
Together, these elements create an intelligent system that can autonomously solve problems. An AI agent can analyze an issue, devise a step-by-step plan, and confidently execute it, making it a transformative force in the world of artificial intelligence. Below is one example of a more detailed architecture of an AI Agent system.
However, the development and implementation of multi-agent systems come with their own set of challenges and risks.
Notably, the increased complexity of Agentic systems often results in longer response times and higher API costs, which could be a significant drawback for various applications.
Fortunately, there are promising advancements on the horizon aimed at mitigating these issues. These include the emergence of smaller, specialized, and faster models, reduced API costs per token, and innovative hardware solutions like language processing units (LPUs) from companies such as Groq, which offer remarkable improvements in inference speed. As the field continues to evolve, it will be interesting to see what additional hardware advancements emerge to address these challenges.
A more significant problem with AI agents is that LLMs are non-deterministic.
To address this challenge, we can create a process that iteratively reflects on and refines the execution plan based on past actions and observations. The goal is to correct and improve on past mistakes, which helps to improve the quality of the final results. A rough sketch of such a loop is shown below.
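As an illustration only (this is not any particular framework's API; llm below is a hypothetical callable that takes a prompt string and returns the model's text response), a reflect-and-refine loop might look like this:
```Python
def reflect_and_refine(llm, task, max_rounds=3):
    """Iteratively execute, critique, and revise a plan for the given task."""
    plan = llm(f"Draft a step-by-step plan for this task:\n{task}")
    for _ in range(max_rounds):
        outcome = llm(f"Execute this plan and report what happened:\n{plan}")
        critique = llm(
            f"Task: {task}\nPlan: {plan}\nOutcome: {outcome}\n"
            "List any mistakes and suggest concrete improvements."
        )
        # Revise the plan based on the critique before the next round
        plan = llm(f"Revise the plan using this critique.\nCritique:\n{critique}\nPlan:\n{plan}")
    return plan
```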
Here are three criteria to determine whether you might need an agent:
✦ Does your application follow an iterative flow based on incoming data?
✦ Does your application need to adapt and follow different flows based on previously taken actions or feedback along the way?
✦ Is there a state space of actions that can be taken?
AutoGen is an open-source framework developed by Microsoft, designed to facilitate multi-agent collaboration through conversational agents. It excels in enabling agents to work together on complex tasks by leveraging large language models (LLMs).
It supports diverse conversation patterns through conversable agents that integrate large language models (LLMs), tools, and human inputs, and it provides a collection of working systems spanning a wide range of domains, applications, and levels of complexity.
AutoGen’s flexibility allows for the creation of complex workflows and problem-solving scenarios, making it particularly attractive for developers and researchers looking to push the boundaries of AI agent capabilities.
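To make this concrete, here is a minimal sketch of a two-agent AutoGen conversation (assuming the pyautogen package with an OpenAI key available via the OPENAI_API_KEY environment variable; the exact configuration keys can vary between AutoGen versions, and the model name is just an example):
```Python
from autogen import AssistantAgent, UserProxyAgent

# LLM-backed assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
)

# Stand-in for the human user: never asks for input and does not execute code
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=0,  # stop after the assistant's first reply
)

# Kick off a short conversation between the two agents
user_proxy.initiate_chat(
    assistant,
    message="Suggest three ideas for a weekend data-science project.",
)
```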
CrewAI is another open-source framework that emphasizes structured workflows and role-based task automation within a collaborative environment.
CrewAI adopts a different strategy by providing a structured platform for the creation and management of AI agents. This framework enables users to define agents with specific roles, objectives, and narratives, promoting a role-playing approach to task automation.
Built on LangChain, CrewAI takes advantage of a comprehensive ecosystem of tools and integrations, making it accessible to a wider audience, including business users who may lack extensive technical knowledge.
CrewAI takes a more accessible approach, offering a user-friendly interface that reduces the need for extensive coding.
LangGraph is a framework that focuses on creating graph-based multi-agent systems. It is designed to handle complex interactions and dependencies between agents.
LangGraph utilizes a graph structure to manage agent interactions and dependencies. The framework focuses on scalability, allowing it to efficiently handle large-scale multi-agent systems. A minimal sketch of the graph-based approach is shown below.
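The sketch below assumes the langgraph package; the two-node "research then summarize" flow and its state fields are invented purely for illustration:
```Python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    # In a real system this node might call an LLM or a search tool
    return {"answer": f"Raw notes about: {state['question']}"}

def summarize(state: State) -> dict:
    return {"answer": f"Summary of {state['answer']}"}

builder = StateGraph(State)
builder.add_node("research", research)
builder.add_node("summarize", summarize)
builder.set_entry_point("research")         # execution starts here
builder.add_edge("research", "summarize")   # summarize depends on research
builder.add_edge("summarize", END)

graph = builder.compile()
print(graph.invoke({"question": "AI agents", "answer": ""}))
```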
✦ Core Focus: AutoGen emphasizes multi-agent conversations and LLM inference, CrewAI focuses on structured workflows and role-based task automation, while LangGraph leverages a graph-based architecture for managing complex interactions.
✦ Customization: AutoGen offers extensive customization options for developers, CrewAI provides a user-friendly approach accessible to those with limited technical expertise, and LangGraph allows for highly specialized agent creation.
✦ Scalability: LangGraph excels in handling large-scale systems, while AutoGen and CrewAI are more suited for smaller to medium-sized applications.
CrewAI also supports both LangChain and LlamaIndex tools. This flexibility means that we are not limited to the tools that CrewAI comes with, but can also leverage a diverse array of tools from other packages.
McKinsey’s most recent “State of AI” survey found that more than 72 percent of companies surveyed are deploying AI solutions, with a growing interest in GenAI. Given that activity, it would not be surprising to see companies begin to incorporate frontier technologies such as agents into their planning processes and future AI road maps. Agent-driven automation remains an exciting proposition, with the potential to revolutionize whole industries, bringing a new speed of action to work. That said, the technology is still in its early stages, and there is much development required before its full capabilities can be realized.
icon: LiWrench
After getting familiar with Jupyter Notebooks, especially Google Colab, which is hosted remotely on a server, you may realize that it is risky to specify your API key directly in the notebook or script.
What we have been doing up to this point is relying on the getpass() function to allow users (in fact, us) to input the API key and store the value in a variable, as shown below.
from openai import OpenAI
from getpass import getpass
openai_key = getpass("Enter your API Key:")
client = OpenAI(api_key=openai_key)
While the getpass() method helps keep the key secure by not hardcoding it into the script (which could be accidentally shared or exposed), it is not suitable for scenarios where the Python script or application needs to run autonomously, without human interaction, such as scheduled jobs, deployed web applications, or other automated processes where no one is available to type in the key. This makes getpass() impractical in those settings.
When building an application, the app may also require access to a variety of APIs and other services, such as Google Sheets, an AWS account, or Telegram messages. All of these require some form of credentials (e.g., a username-and-password pair or an API key).
Think of an environment variable as a special, secure place on your computer or server where you can store these credentials. Your Python scripts or applications can access the credentials, such as the OpenAI API key, when they need to use the services, but the credentials aren't visible to anyone just looking through the code (see the short example below).
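For example, if a key has already been set as an environment variable in your shell, a script can read it without the key ever appearing in the code (the variable name OPENAI_API_KEY below is a common convention, used here purely for illustration):
```Python
# Assumes the variable was set beforehand in the shell, e.g.
#   export OPENAI_API_KEY="sk-..."   (macOS/Linux)
#   set OPENAI_API_KEY=sk-...        (Windows Command Prompt)
import os

openai_key = os.getenv("OPENAI_API_KEY")  # returns None if the variable is not set
print("Key found" if openai_key else "OPENAI_API_KEY is not set")
```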
One way to set the environment variable is through a configuration file (.env).
What is a .env File?
A .env file is a simple text file used to store configuration settings, environment variables, and other key-value pairs related to a Python project. The values are defined in the .env file, which is loaded into the project's environment during runtime. Using .env files in Python ensures secure management of sensitive information and allows for flexible configuration across different environments.
Why Use a .env File?
It keeps credentials such as API keys out of your code while still making them available to your script through the .env file.
How to Create a .env File:
1. Create a file named .env at the root level of your project.
2. Add your key-value pairs to the .env file, for example: KEY="<my_OpenAI_Key>"
3. Install the python-dotenv library using the following command: pip install python-dotenv
How to Load the .env File in Your Python Code:
1. Import the dotenv module in your Python code.
2. Call load_dotenv() to load variables from the .env file.
3. Use os.getenv("KEY") for each key-value pair defined in the .env file.
Example:
```Python
import os
from dotenv import load_dotenv
load_dotenv()
print(os.getenv("KEY"))
```
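Putting this together with the OpenAI client we have been using, the script can now create the client without any hardcoded key or manual input (this assumes the key is stored under the name KEY in your .env file, as in the example above):
```Python
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()                              # reads .env and loads its variables into the environment
client = OpenAI(api_key=os.getenv("KEY"))  # "KEY" matches the name used in the .env file
```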
icon: LiWrench
A Python script is a file containing Python code that is intended to be directly executed.
Unlike Jupyter Notebooks, which allow for an interactive coding experience with immediate feedback for each cell, Python scripts are run from start to finish by the Python interpreter.
✦ This is a quick way to check if Python has been installed correctly:
✦ Open your terminal (Command Prompt on Windows, Terminal on macOS and Linux) and type:
python --version
✦ This command should return the version of Python installed.
✦ Open up your Visual Studio Code
✦ Let's create a simple script that prints "Hello, World!".
✦ Create a new file and name it hello_world.py. Note that Python scripts must have the .py extension.
✦ Add the following line to the file, then run the script (for example, using the Run button in Visual Studio Code or by typing python hello_world.py in the terminal):
print("Hello, World!")
✦ If you can see the output "Hello, World!" being printed in the Terminal, that's good news: it means that Visual Studio Code and Python are configured properly, and you're ready to start writing your Python code.
A well-structured Python script not only makes your code more readable and maintainable but also adheres to the conventions that Python developers expect. This section will guide you through the essential components and good practices for structuring your Python scripts.
All import statements should be at the top of the file.
import os
import sys
import requests
Define global variables after the import statements. These are variables that are meant to be used throughout the script.
MEASUREMENT_UNIT = "cm"
Next, define your functions and classes. Each should have a descriptive docstring (the string enclosed in triple quotes on the first line of the function body) explaining what it does. Keep related functions and classes close to each other in the code.
def calculate_area(length, width):
"""Calculate and return the area of a rectangle."""
return length * width
It's a good practice to encapsulate the script's main functionality in a function, often named main(). This function will be called when the script is executed directly.
def main():
"""Main function of the script."""
length = float(input("Enter the length: "))
width = float(input("Enter the width: "))
area = calculate_area(length, width)
print(f"The area of the rectangle is: {area} {MEASUREMENT_UNIT}")
The if __name__ == "__main__": Statement
Use the if __name__ == "__main__": statement to check whether the script is being run directly or imported as a module. When the script is run directly, the condition evaluates to True, and you can call the main() function or any other code you want to execute.
if __name__ == "__main__":
# if the script is run directly
# e.g. python myscript.py
# Then the main() function will be called
main()
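For instance (using the hypothetical file name myscript.py from the comment above), importing the script from another file does not trigger main() automatically:
```Python
# another_file.py (hypothetical)
import myscript    # __name__ inside myscript is "myscript", so main() does NOT run on import

myscript.main()    # call it explicitly when you actually want the script's logic to run
```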
This is what the complete script looks like:
import os
import sys
import requests
MEASUREMENT_UNIT = "cm"
def calculate_area(length, width):
"""Calculate and return the area of a rectangle."""
return length * width
def main():
"""Main function of the script."""
length = float(input("Enter the length in cm: "))
width = float(input("Enter the width in cm: "))
area = calculate_area(length, width)
print(f"The area of the rectangle is: {area} {MEASUREMENT_UNIT}")
if __name__ == "__main__":
main()
✦ Comments (using #) are crucial for making your code understandable to others and your future self.
✦ Use snake_case for variable and function names.
✦ Use try and except blocks to handle potential errors in your scripts (see the sketch below).
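For example, here is a minimal sketch of guarding the user input in the script above with try and except (the error message is made up for illustration):
```Python
def main():
    """Main function of the script."""
    try:
        length = float(input("Enter the length: "))
        width = float(input("Enter the width: "))
    except ValueError:
        # float() raises ValueError when the input is not a number
        print("Please enter numeric values for length and width.")
        return
    area = calculate_area(length, width)
    print(f"The area of the rectangle is: {area} {MEASUREMENT_UNIT}")
```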
icon: RiCodeBoxLine
This week, we will take a slightly different approach.
Instead of overwhelming you with the concepts and intricate details of CrewAI, we will explore the walkthrough notebook titled "Getting Started with Multi-Agent Systems using CrewAI."
We believe that by engaging with this hands-on experience first, you will be better equipped to appreciate and relate the theoretical concepts and details when you revisit them later.
However, please ensure you have completed 2. A More Secure way to Store Credentials and 3. Writing & Running Python Scripts first. This is especially true if you are not familiar with these topics.
✦ This notebook contains pre-populated code for you to execute cell by cell.
✦ Here's how to use this notebook: execute the pre-populated cells first, then complete Part 2 of the Notebook with Your Own Code.
✦ While there is no submission required, we encourage you to share your solutions with your peers by pasting your link into the Sharing Board.
Feedback: By sharing your solutions, you can get insights, suggestions, and constructive criticism from your peers. This feedback can help you improve your approach and learn from others’ perspectives.
Learning from Peers: Since everyone may have different ways of solving problems, participating in these sessions allows you to see various approaches. You can learn alternative methods, explore different techniques, and gain a deeper understanding of the challenges.
✦ URL: https://miro.com/app/board/uXjVKojBjec=/?share_link_id=989058465513
✦ Passcode: abc-2024