Towards AI Agents

1 Overview



1.1 Limitations of LLMs

Many users of ChatGPT quickly realize that the default workflow for large language models (LLMs) has its limitations, especially as task complexity increases. Even when employing optimal prompt engineering strategies, prompts can become excessively lengthy, leading to a higher likelihood that the LLM will misinterpret or overlook critical instructions.

  • ✦ A common workaround is to iteratively refine the chatbot's responses through additional prompting; however, this method can be labor-intensive and may cause the LLM to become trapped by previous inaccuracies within the chat context.

  • ✦ Moreover, real-world applications often necessitate the integration of various tools, such as internet searches, access to relevant internal documents through Retrieval Augmented Generation (RAG), mathematical computations, coding capabilities, and safety measures to protect sensitive data.



1.2 The Rise of Multi-Agent Systems

The shift towards agents is about creating AI systems that can truly understand, learn, and solve problems in the real world.

While LLMs and RAG models have pushed the boundaries of what’s possible with language generation, the development of AI agents represents a step towards more intelligent, autonomous, and multi-capable systems that can work alongside humans in a wider variety of scenarios.

Multi-agent systems are also often known as agentic systems.


The figure below illustrates the differences between a typical LLM workflow and an agentic workflow.

Many believe that AI agents are the future of AI.

What I'm seeing with AI agents, I think, is an exciting trend that everyone building AI should pay attention to.
Andrew Ng (founder of Google Brain)

The AI field is headed towards self-contained autonomous agents, and it won't be a single agent; it will be many agents working together.
Andrej Karpathy (co-founder of OpenAI)

The developer becomes the user, and so we're evolving toward any user being able to create their own autonomous agent. I'm pretty sure that five years from now this will be something that you learn to do at school.
Arthur Mensch (CEO of Mistral AI)



1.3 Why Do We Need AI Agents When We Have LLMs & RAG?

You may ask: “So this is just GPT-4 with RAG?” or “Isn’t this the same as chaining together a couple of prompts?”

There are several key reasons why AI agents perform better than a single LLM:

  • Goal-oriented behavior:

    • LLMs and RAG models are primarily focused on generating human-like text based on patterns in their training data.
    • However, they lack the ability to set and pursue specific goals in a flexible, intelligent manner. 
    • AI agents, on the other hand, can be designed to have explicit goals and to plan and take actions to achieve those goals.
  • Interaction with the environment:

    • LLMs operate solely in the text domain, without any direct interaction with the physical world.
    • AI agents can perceive and act upon their environment, whether that is the digital world, robotic systems, or even the physical world through sensors and actuators.
  • Memory and state tracking:

    • Most current language models have no persistent memory or state tracking capabilities. Each input is processed independently.
    • AI agents can maintain an internal state, accumulating knowledge over time and using that state to inform future decisions and actions.
  • Multi-task capability:

    • LLMs are typically specialized for particular language tasks.
    • AI agents can be designed as general, multi-task systems capable of fluidly combining various skills like language, reasoning, perception, and control to tackle complex, multi-faceted problems.
  • Improved accuracy:

    • Last but not least, using multiple agents can greatly improve the performance of LLMs.
    • In one of his lectures, Andrew Ng highlighted that an agentic workflow utilizing "simpler" models, such as GPT-3.5, can significantly outperform zero-shot prompting with more advanced models like GPT-4.
    • "GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%. "
    • Improved accuracy arises from iterations that give agents an opportunity to “fact-check” and “review” their answers, which leads to fewer hallucinations.
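
To make the agent-loop idea concrete, here is a minimal sketch of a generate → critique → refine loop. `call_llm` is a stub standing in for a real LLM API call (e.g., GPT-3.5), and its canned responses are purely illustrative.

```python
# Sketch of an iterative "generate -> critique -> refine" agent loop.
# `call_llm` is a stub; a real implementation would call an LLM API.

def call_llm(prompt: str) -> str:
    # Placeholder responses, purely for illustration.
    if "Critique" in prompt:
        return "OK" if "refined" in prompt else "The draft misses edge cases."
    return "refined draft" if "Feedback" in prompt else "first draft"

def agent_loop(task: str, max_iters: int = 3) -> str:
    draft = call_llm(f"Solve: {task}")
    for _ in range(max_iters):
        critique = call_llm(f"Critique this answer to '{task}':\n{draft}")
        if critique.strip() == "OK":          # reviewer is satisfied
            break
        draft = call_llm(f"Solve: {task}\nFeedback: {critique}")
    return draft

print(agent_loop("write a sorting function"))  # -> refined draft
```

The "review" pass is what lets the weaker model catch its own mistakes before the answer is returned.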


1.4 Understanding the Differences Between LLMs, RAG & AI Agents

Imagine you need to book a complex trip:

  • LLM: Could explain different places to visit or give general travel tips.

  • RAG: Could find relevant blogs and articles about destinations.

  • AI Agent: Could do all that, PLUS:

    • Search for flights and hotels based on your budget
    • Actually make the bookings
    • Add everything to your calendar
    • Send pre-departure reminders with relevant information

Now let's look at the key differences, based on this simple example:

1. Task Orientation vs. General Knowledge

  • LLMs: 
    • Excel at broad language understanding and generation.
    • They’re like massive libraries of information.
  • RAG: 
    • Improves LLMs by finding relevant information. Still, the focus is on knowledge and text generation.
  • AI Agents: 
    • Are built with specific goals in mind. They bridge the gap between understanding language and taking action in the real world or within digital systems.

2. Multi-Step Reasoning

  • LLMs & RAG: 
    • Primarily work on a single input and provide a response based on it.
  • AI Agents: 
    • Can chain together multiple steps:
      • Retrieve information (like RAG)
      • Process the information to make decisions
      • Take actions like:
        • Sending an email
        • Booking an appointment
        • Controlling smart home devices

3. Proactivity

  • LLMs & RAG: Usually respond to direct prompts.
  • AI Agents: 
    • Can be proactive. They can:
      • Monitor data streams and alert you to critical changes
      • Initiate actions based on your preferences
      • Adapt their behavior over time as they learn about you

4. Integration with Existing Systems

  • LLMs & RAG: 
    • Tend to operate within their own environment.
  • AI Agents: 
    • Are designed to interface with various systems and APIs, for example to:
      • Access your email or calendar
      • Interact with databases
      • Control other software or devices

Difference between Agents and Prompt Chaining
  • ✦ The core idea of agents is to use a language model to choose a sequence of actions to take.
  • ✦ In chains or pipelines, the sequence of actions (or prompts) is hardcoded.
  • ✦ In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
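
The contrast can be shown in a few lines: in the chain, the sequence is fixed in code, while in the agent, a reasoning step picks the next action. `choose_action` is a stub standing in for an LLM call.

```python
# Contrast sketch: a hardcoded chain vs. an agent where a (stubbed) LLM
# picks the next action and decides when to stop.

ACTIONS = {
    "search": lambda state: state + ["search results"],
    "summarize": lambda state: state + ["summary"],
}

def run_chain(state):
    # Chain/pipeline: the sequence is fixed in code.
    for name in ["search", "summarize"]:
        state = ACTIONS[name](state)
    return state

def choose_action(state):
    # Stub for an LLM reasoning step deciding what to do next.
    if "search results" not in state:
        return "search"
    if "summary" not in state:
        return "summarize"
    return "finish"

def run_agent(state):
    # Agent: the model decides which action to take, and in which order.
    while (name := choose_action(state)) != "finish":
        state = ACTIONS[name](state)
    return state

print(run_chain(["question"]))
print(run_agent(["question"]))
```

Both produce the same result here, but only the agent could reorder, skip, or repeat actions for a different question.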

We have discussed Prompt Chaining in 4. Prompts Chaining - Chaining Together Multiple Prompts



2 Overview of the Key Components of an AI Agent

A single AI agent’s architecture encompasses the essential components that empower it to think, plan, and act within its environment. This sophisticated design typically includes:

Tools

  • ✦ The agent learns to call external APIs or tools for extra information/context or capability that might be missing in the model weights (often hard to change after pre-training).
  • ✦ This includes things like current information, mathematical engines, code execution capability, access to proprietary information sources, and many more.
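
A minimal sketch of the tool-calling loop, assuming a toy request format: the (stubbed) model emits a tool request, the runtime executes it, and the observation is fed back to the model. Tool names and the request format are illustrative, not a specific vendor API.

```python
# Minimal sketch of tool use: model requests a tool, runtime executes it,
# the observation is fed back for the final answer.
import math

TOOLS = {
    "sqrt": lambda x: math.sqrt(float(x)),
    "today": lambda _: "2024-01-01",  # stand-in for a live data source
}

def model_step(question: str, observation=None) -> dict:
    # Stub: a real LLM would decide whether a tool is needed.
    if observation is None:
        return {"tool": "sqrt", "arg": "2"}
    return {"answer": f"sqrt(2) is about {observation:.3f}"}

def run(question: str) -> str:
    step = model_step(question)
    while "tool" in step:
        obs = TOOLS[step["tool"]](step["arg"])   # runtime executes the tool
        step = model_step(question, observation=obs)
    return step["answer"]

print(run("What is the square root of 2?"))  # -> sqrt(2) is about 1.414
```

The math engine here could equally be a web search, code interpreter, or proprietary document store; the loop structure stays the same.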

Memory

  • Short-term memory: 
    • In-context learning (see Prompt Engineering) can be thought of as utilizing the model's short-term memory to operate on a given problem: the context window acts as short-term memory.
  • Long-term memory: 
    • Providing the agent with the capability to retain and recall (effectively unlimited) information over extended periods, often by leveraging an external vector store and fast retrieval. The retrieval part of RAG can be thought of as long-term memory.
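
The two memory types can be sketched as follows, with a bounded buffer as the context window and a toy word-overlap search standing in for a real vector store with embeddings.

```python
# Sketch of the two memory types: a bounded context window (short-term)
# and a similarity-searched store (long-term). The "similarity" here is
# toy word overlap, not a real embedding model.
from collections import deque

context = deque(maxlen=4)   # short-term: only the last 4 turns fit
long_term = []              # long-term: everything, retrieved on demand

def remember(turn: str):
    context.append(turn)
    long_term.append(turn)

def recall(query: str) -> str:
    # Toy retrieval: most shared words wins (a vector store in practice).
    return max(long_term, key=lambda t: len(set(t.split()) & set(query.split())))

for i in range(6):
    remember(f"turn {i}: note about topic {i}")

print(list(context))                 # only turns 2-5 survive in context
print(recall("note about topic 0"))  # but turn 0 is still retrievable
```

The point of the example: once the window overflows, early turns are gone from short-term memory, yet long-term retrieval still recovers them.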

Planning

  • Subgoal & task decomposition: 
    • The agent breaks down larger tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
  • Reflection and refinement: 
    • The agent can perform self-criticism (though its reliability is debatable) and self-reflection over past actions, learn from mistakes, and refine its approach for future steps, thus improving the final results.
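
A minimal sketch of planning with decomposition and reflection; `plan` and `reflect` are stubs for LLM calls, and the subgoals are illustrative.

```python
# Sketch of subgoal decomposition with a reflection pass.
# `plan` and `reflect` stand in for LLM calls.

def plan(task: str) -> list[str]:
    # Stub: an LLM would break the task into subgoals.
    return [f"{task} - research", f"{task} - draft", f"{task} - polish"]

def execute(subgoal: str) -> str:
    return f"done({subgoal})"

def reflect(results: list[str]) -> list[str]:
    # Stub self-critique: flag any step that did not complete (none here).
    return [r for r in results if not r.startswith("done")]

results = [execute(s) for s in plan("write report")]
assert reflect(results) == []        # nothing flagged for redo
print(results)
```

In a real agent, anything `reflect` flags would be re-planned and re-executed, which is where the iterative quality gains come from.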

Together, these elements create an intelligent system that can autonomously solve problems. An AI agent can analyze an issue, devise a step-by-step plan, and confidently execute it, making it a transformative force in the world of artificial intelligence. Below is one example of a more detailed architecture of an AI Agent system.




3 Pitfalls & Challenges of Multi-Agent Systems

However, the development and implementation of multi-agent systems come with their own set of challenges and risks.


3.1 High Token Usage

Notably, the increased complexity of Agentic systems often results in longer response times and higher API costs, which could be a significant drawback for various applications.

  • ✦ An agentic system often makes a large number of calls to answer a single/simple question, accumulating tokens for each query made to LLM.
  • ✦ Not only is this costly, it also introduces latency.
  • ✦ Token generation is still a relatively slow process; most (though not all) of the latency in LLM-based applications comes from generating output tokens.
  • ✦ By calling an LLM repeatedly and asking it to provide thoughts/observations, we end up generating a lot of output tokens (cost), resulting in high latency (degraded user experience).
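
A back-of-envelope calculation makes the cost/latency point; all numbers here are made up for illustration, as real prices and generation speeds vary by model and provider.

```python
# Back-of-envelope cost/latency sketch for an agent loop.
# Every number below is illustrative, not a real price sheet.

calls_per_question = 8        # agent loops call the LLM many times
out_tokens_per_call = 300
price_per_1k_out = 0.002      # USD per 1k output tokens, illustrative
tokens_per_second = 50        # output generation speed, illustrative

total_out = calls_per_question * out_tokens_per_call
cost = total_out / 1000 * price_per_1k_out
latency = total_out / tokens_per_second

print(f"{total_out} output tokens, ~${cost:.4f}, ~{latency:.0f}s of generation")
```

Even at toy prices, an 8-call loop generates roughly 8× the output tokens (and 8× the generation time) of a single zero-shot call.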

Fortunately, there are promising advancements on the horizon aimed at mitigating these issues. These include the emergence of smaller, specialized, and faster models, reduced API costs per token, and innovative hardware solutions like language processing units (LPUs) from companies such as Groq, which offer remarkable improvements in inference speed. As the field continues to evolve, it will be interesting to see what additional hardware advancements emerge to address these challenges.


3.2 Non-Determinism

A more significant problem with AI agents is that LLMs are non-deterministic.

  • ✦ While beneficial for idea generation, this poses a serious challenge in scenarios requiring predictability.
  • ✦ For instance, if we’re writing an LLM-backed chat application to make SQL queries (Text2SQL), we want high predictability.

To address this challenge, we can create a process to iteratively reflect on and refine the execution plan based on past actions and observations. The goal is to correct and improve on past mistakes, which helps improve the quality of the final results.




4 Do You Actually Need An Agent?

Here are three criteria to determine whether you might need an agent:

  • Does your application follow an iterative flow based on incoming data?

    • If your application processes data in a cyclical manner, where each iteration builds upon the previous one, it may be a strong candidate for an agent-based approach.
    • Agents can effectively manage and respond to new information as it arrives, allowing for continuous improvement and refinement of outputs.
    • This is particularly useful in scenarios like data analysis, where insights evolve as more data is processed.
  • Does your application need to adapt and follow different flows based on previously taken actions or feedback along the way?

    • Applications that require dynamic decision-making based on past interactions or user feedback can greatly benefit from agents.
    • An agent can track the history of actions and outcomes, enabling it to adjust its strategy in real-time.
    • This adaptability is crucial in environments where user preferences or external conditions change frequently.
  • Is there a state space of actions that can be taken?

    • If your application involves a complex set of possible actions that can be executed in various sequences, rather than a simple linear pathway, it may require an agent to navigate this state space effectively.
    • Agents can explore multiple pathways and make decisions based on the current state, optimizing for the best outcomes.
    • This is particularly relevant in scenarios like game development, robotics, or any system where multiple strategies can lead to different results.



5 Common Frameworks or Tools for Building Multi-Agent System

5.1 Autogen

AutoGen is an open-source framework developed by Microsoft, designed to facilitate multi-agent collaboration through conversational agents. It excels in enabling agents to work together on complex tasks by leveraging large language models (LLMs).

It supports diverse conversation patterns with conversable agents that integrate LLMs, tools, and human input, and it ships with a collection of working systems spanning a wide range of application domains and complexity levels, demonstrating how easily AutoGen supports these patterns.

AutoGen’s flexibility allows for the creation of complex workflows and problem-solving scenarios, making it particularly attractive for developers and researchers looking to push the boundaries of AI agent capabilities.


5.2 CrewAI

CrewAI is another open-source framework that emphasizes structured workflows and role-based task automation within a collaborative environment.

CrewAI adopts a different strategy by providing a structured platform for the creation and management of AI agents. This framework enables users to define agents with specific roles, objectives, and narratives, promoting a role-playing approach to task automation.

Built on LangChain, CrewAI takes advantage of a comprehensive ecosystem of tools and integrations, making it accessible to a wider audience, including business users who may lack extensive technical knowledge.

CrewAI takes a more accessible approach, offering a user-friendly interface that reduces the need for extensive coding.


5.3 LangGraph

LangGraph is a framework that focuses on creating graph-based multi-agent systems. It is designed to handle complex interactions and dependencies between agents.

LangGraph utilizes a graph structure to manage agent interactions and dependencies. The framework focuses on scalability, allowing it to efficiently handle large-scale multi-agent systems.


5.4 Comparing the Three Frameworks

  • Core Focus: AutoGen emphasizes multi-agent conversations and LLM inference, CrewAI focuses on structured workflows and role-based task automation, while LangGraph leverages a graph-based architecture for managing complex interactions.

  • Customization: AutoGen offers extensive customization options for developers, CrewAI provides a user-friendly approach accessible to those with limited technical expertise, and LangGraph allows for highly specialized agent creation.

  • Scalability: LangGraph excels in handling large-scale systems, while AutoGen and CrewAI are more suited for smaller to medium-sized applications.

This Bootcamp will use CrewAI as the framework for developing multi-agent systems, for three reasons:
  1. User-friendly and Quick Experimentation: CrewAI offers an intuitive interface that allows users to easily experiment with multi-agent systems without requiring extensive technical knowledge.
  2. Support for a Variety of Tools: CrewAI is compatible with a wide range of tools, including both LangChain and Llama Index tools. This flexibility means that we are not limited to the tools that CrewAI comes with, but can also leverage a diverse array of tools from other packages.
  3. Structured Workflows and Role-based Task Automation: CrewAI facilitates the creation of structured workflows and enables role-based task automation, which appears relevant to a wide variety of use cases.



6 Agentic System and the Future

McKinsey’s most recent “State of AI” survey found that more than 72 percent of companies surveyed are deploying AI solutions, with a growing interest in GenAI. Given that activity, it would not be surprising to see companies begin to incorporate frontier technologies such as agents into their planning processes and future AI road maps. Agent-driven automation remains an exciting proposition, with the potential to revolutionize whole industries, bringing a new speed of action to work. That said, the technology is still in its early stages, and there is much development required before its full capabilities can be realized.