icon: LiNotebook
Towards AI Agents
Many users of ChatGPT quickly realize that the default workflow for large language models (LLMs) has its limitations, especially as task complexity increases. Even when employing optimal prompt engineering strategies, prompts can become excessively lengthy, leading to a higher likelihood that the LLM will misinterpret or overlook critical instructions.
✦ A common workaround is to iteratively refine the chatbot's responses through additional prompting; however, this method can be labor-intensive and may cause the LLM to become trapped by previous inaccuracies within the chat context.
✦ Moreover, real-world applications often necessitate the integration of various tools, such as internet searches, access to relevant internal documents through Retrieval Augmented Generation (RAG), mathematical computations, coding capabilities, and safety measures to protect sensitive data.
The shift towards agents is about creating AI systems that can truly understand, learn, and solve problems in the real world.
While LLMs and RAG models have pushed the boundaries of what’s possible with language generation, the development of AI agents represents a step towards more intelligent, autonomous, and multi-capable systems that can work alongside humans in a wider variety of scenarios.
Multi-agent systems are also often referred to as agentic systems.
The figure below illustrates the differences between a typical LLM workflow and an agentic workflow.
Many believe that AI agents are the future of AI.
What I'm seeing with AI agents is an exciting trend that I think everyone building in AI should pay attention to.
Andrew Ng (co-founder of Google Brain)
The AI field is headed towards self-contained autonomous agents, and it won't be a single agent; it will be many agents working together.
Andrej Karpathy (co-founder of OpenAI)
The developer becomes the user, and so we're evolving toward any user being able to create their own autonomous agent. I'm pretty sure that five years from now this will be something that you learn to do at school.
Arthur Mensch (CEO of Mistral AI)
A common skeptical reaction to AI agents is: “So this is just GPT-4 with RAG?” or “Isn’t this the same as chaining together a couple of prompts?”
There are several key reasons why AI agents perform better than a single LLM:
✦ Goal-oriented behavior: an agent works towards an explicit goal rather than simply responding to a single prompt.
✦ Interaction with the environment: an agent can call tools, query APIs, and act on external systems instead of only generating text.
✦ Memory and state tracking: an agent keeps track of what has already been done, so later steps can build on earlier results.
✦ Multi-task capability: an agent can decompose a complex task into subtasks and work through them in turn.
✦ Improved accuracy: by combining tools, iteration, and feedback, an agent can check and correct its own outputs.
Imagine you need to book a complex trip:
✦ LLM: Could explain different places to visit or give general travel tips.
✦ RAG: Could find relevant blogs and articles about destinations.
✦ AI Agent: Could do all that, plus actually search live flight and hotel options, compare them against your budget and dates, make the bookings, and adjust the itinerary based on your feedback.
Now let's look at the key differences this simple example highlights (a short code sketch of the tool-integration side follows the list):
1. Task Orientation vs. General Knowledge: the agent works towards completing your specific trip rather than just answering general questions about travel.
2. Multi-Step Reasoning: it breaks the booking into steps (search, compare, book, confirm) and carries information from one step to the next.
3. Proactivity: it can ask clarifying questions, flag conflicts, and suggest alternatives without being prompted for each one.
4. Integration with Existing Systems: it connects to search engines, booking services, and calendars rather than stopping at text output.
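To make the booking example concrete, the sketch below shows the tool-integration side: Python functions exposed as tools that an LLM-driven agent can choose to call. The `search_flights` and `book_hotel` functions and the structured tool call are hypothetical placeholders, not a real travel API.

```python
# A minimal, hypothetical sketch of tool integration for the trip-booking example.
# The tool functions and their data are placeholders, not a real travel API.

def search_flights(origin: str, destination: str, date: str) -> list[dict]:
    """Pretend to query a flight provider and return candidate flights."""
    return [{"flight": "XY123", "price": 420, "depart": f"{date} 09:15"}]

def book_hotel(city: str, check_in: str, nights: int) -> dict:
    """Pretend to reserve a hotel room and return a confirmation."""
    return {"hotel": "Central Inn", "city": city, "confirmation": "H-98765", "nights": nights}

# The agent exposes its tools to the LLM by name; the LLM decides which to call.
TOOLS = {"search_flights": search_flights, "book_hotel": book_hotel}

def run_tool(tool_name: str, arguments: dict):
    """Dispatch a tool call chosen by the LLM to the matching Python function."""
    return TOOLS[tool_name](**arguments)

# Example: the LLM's (hypothetical) decision, expressed as a structured tool call.
action = {"tool": "search_flights",
          "args": {"origin": "SFO", "destination": "NRT", "date": "2025-03-01"}}
print(run_tool(action["tool"], action["args"]))
```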
We have discussed prompt chaining in 4. Prompts Chaining - Chaining Together Multiple Prompts.
A single AI agent’s architecture encompasses the essential components that empower it to think, plan, and act within its environment. This sophisticated design typically includes:
Tools
Memory
Planning
Together, these elements create an intelligent system that can autonomously solve problems. An AI agent can analyze an issue, devise a step-by-step plan, and confidently execute it, making it a transformative force in the world of artificial intelligence. Below is one example of a more detailed architecture of an AI Agent system.
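To make these components concrete, here is a minimal sketch of how planning, memory, and tools can fit together in a single agent loop. The `call_llm` helper and the plan/step format are assumptions for illustration; a real implementation would plug in an actual LLM client and tool set.

```python
# A minimal sketch of a single-agent loop combining planning, memory, and tools.
# `call_llm` is a stand-in for a real LLM client; the plan/step format is assumed.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. an OpenAI or local model client)."""
    raise NotImplementedError

def run_agent(goal: str, tools: dict, max_steps: int = 5) -> str:
    memory: list[str] = []                      # short-term memory of past steps

    # Planning: ask the LLM to break the goal into ordered steps.
    plan = call_llm(f"Break this goal into numbered steps: {goal}").splitlines()

    for step in plan[:max_steps]:
        # Acting: ask the LLM which tool to use for this step, given memory so far.
        decision = call_llm(
            f"Goal: {goal}\nStep: {step}\nMemory: {memory}\n"
            f"Pick one tool from {list(tools)} and its input, as 'tool: input'."
        )
        tool_name, _, tool_input = decision.partition(":")
        observation = tools[tool_name.strip()](tool_input.strip())

        # Memory: record what was done and what was observed.
        memory.append(f"{step} -> {observation}")

    # Final answer synthesized from everything the agent has seen.
    return call_llm(f"Goal: {goal}\nObservations: {memory}\nWrite the final answer.")
```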
However, the development and implementation of multi-agent systems come with their own set of challenges and risks.
Notably, the increased complexity of agentic systems often results in longer response times and higher API costs, which can be a significant drawback for many applications.
Fortunately, there are promising advancements on the horizon aimed at mitigating these issues. These include the emergence of smaller, specialized, and faster models, reduced API costs per token, and innovative hardware solutions like language processing units (LPUs) from companies such as Groq, which offer remarkable improvements in inference speed. As the field continues to evolve, it will be interesting to see what additional hardware advancements emerge to address these challenges.
A more significant problem with AI agents is that LLMs are non-deterministic: the same prompt can produce different outputs on different runs, so errors can compound as an agent chains many LLM calls together.
To address this challenge, we can create a process to iteratively reflect on and refine the execution plan based on past actions and observations. The goal is to correct and improve on past mistakes, which helps to improve the quality of the final results.
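One way to implement such a reflect-and-refine cycle is sketched below. The `call_llm` and `execute_plan` helpers are placeholders rather than real APIs; the loop simply alternates between executing the current plan and asking the model to critique and revise it.

```python
# A minimal sketch of iterative reflection: execute, critique, revise, repeat.
# `call_llm` and `execute_plan` are placeholders for a real LLM client and executor.

def call_llm(prompt: str) -> str:
    raise NotImplementedError          # plug in a real LLM client here

def execute_plan(plan: str) -> str:
    raise NotImplementedError          # run the plan's steps and return observations

def reflect_and_refine(task: str, max_rounds: int = 3) -> str:
    plan = call_llm(f"Draft a step-by-step plan for: {task}")
    for _ in range(max_rounds):
        observations = execute_plan(plan)

        # Reflection: ask the model to critique the outcome of the last attempt.
        critique = call_llm(
            f"Task: {task}\nPlan: {plan}\nObservations: {observations}\n"
            "List mistakes or gaps. Reply 'DONE' if the result is satisfactory."
        )
        if critique.strip() == "DONE":
            break

        # Refinement: revise the plan using the critique before the next attempt.
        plan = call_llm(f"Revise the plan to fix these issues:\n{critique}\n\nPlan:\n{plan}")
    return plan
```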
Here are three criteria to determine whether you might need an agent:
✦ Does your application follow an iterative flow based on incoming data?
✦ Does your application need to adapt and follow different flows based on previously taken actions or feedback along the way?
✦ Is there a state space of actions that can be taken?
AutoGen is an open-source framework developed by Microsoft, designed to facilitate multi-agent collaboration through conversational agents. It excels in enabling agents to work together on complex tasks by leveraging large language models (LLMs).
It supports diverse conversation patterns with conversable agents that integrate LLMs, tools, and human input, and it ships with a collection of working systems spanning a wide range of domains and complexity levels, demonstrating how easily diverse conversation patterns can be supported.
AutoGen’s flexibility allows for the creation of complex workflows and problem-solving scenarios, making it particularly attractive for developers and researchers looking to push the boundaries of AI agent capabilities.
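As a rough illustration, the snippet below wires up a two-agent AutoGen conversation: an LLM-backed assistant and a user proxy that can execute the code the assistant writes. It assumes the `pyautogen` package and an OpenAI API key; configuration details can differ between AutoGen versions.

```python
# A minimal sketch of a two-agent AutoGen setup (assumes the `pyautogen` package;
# exact configuration options may differ between AutoGen versions).
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_OPENAI_API_KEY"}]}

# The assistant agent is backed by the LLM and proposes solutions (including code).
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# The user proxy can execute the proposed code locally and feed results back.
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",                       # run without human input
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Start a multi-turn conversation in which the two agents collaborate on the task.
user_proxy.initiate_chat(
    assistant,
    message="Write and run a Python script that prints the first 10 Fibonacci numbers.",
)
```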
CrewAI is another open-source framework that emphasizes structured workflows and role-based task automation within a collaborative environment.
CrewAI adopts a different strategy by providing a structured platform for the creation and management of AI agents. This framework enables users to define agents with specific roles, objectives, and narratives, promoting a role-playing approach to task automation.
Built on LangChain, CrewAI takes advantage of a comprehensive ecosystem of tools and integrations, making it accessible to a wider audience, including business users who may lack extensive technical knowledge.
CrewAI takes a more accessible approach, offering a user-friendly interface that reduces the need for extensive coding.
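The sketch below illustrates CrewAI's role-based style: agents defined by a role, goal, and backstory, assembled into a crew with tasks. It assumes the `crewai` package and an LLM provider configured via environment variables; field names may vary slightly across versions, and the travel scenario itself is just an example.

```python
# A minimal sketch of CrewAI's role-based workflow (assumes the `crewai` package
# and an LLM provider configured via environment variables, e.g. OPENAI_API_KEY).
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Travel Researcher",
    goal="Find flight and hotel options that fit the traveller's budget",
    backstory="An experienced travel planner who knows how to compare options.",
)

writer = Agent(
    role="Itinerary Writer",
    goal="Turn the research into a clear day-by-day itinerary",
    backstory="A concise writer who produces practical travel plans.",
)

research_task = Task(
    description="Research a 5-day trip to Tokyo under a $2,500 budget.",
    expected_output="A shortlist of flights, hotels, and activities with prices.",
    agent=researcher,
)

writing_task = Task(
    description="Write a day-by-day itinerary based on the research.",
    expected_output="A 5-day itinerary with estimated costs.",
    agent=writer,
)

# Agents and tasks are combined into a crew, which executes the tasks in order.
crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()
print(result)
```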
LangGraph is a framework that focuses on creating graph-based multi-agent systems. It is designed to handle complex interactions and dependencies between agents.
LangGraph utilizes a graph structure to manage agent interactions and dependencies. The framework focuses on scalability, allowing it to efficiently handle large-scale multi-agent systems.
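Here is a small sketch of LangGraph's graph-based style: nodes are plain functions that update a shared state, and edges define the control flow between them. It assumes the `langgraph` package; the node logic is a trivial placeholder rather than real agent behaviour.

```python
# A minimal sketch of a LangGraph state graph (assumes the `langgraph` package).
# Nodes are plain functions that read and update a shared state dictionary.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TripState(TypedDict):
    request: str
    plan: str
    booked: bool

def plan_node(state: TripState) -> dict:
    # Placeholder planning step; a real node would call an LLM here.
    return {"plan": f"Plan for: {state['request']}"}

def book_node(state: TripState) -> dict:
    # Placeholder booking step; a real node would call external booking tools.
    return {"booked": True}

graph = StateGraph(TripState)
graph.add_node("plan", plan_node)
graph.add_node("book", book_node)
graph.set_entry_point("plan")
graph.add_edge("plan", "book")     # control flow: plan first, then book
graph.add_edge("book", END)

app = graph.compile()
print(app.invoke({"request": "5-day trip to Tokyo", "plan": "", "booked": False}))
```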
✦ Core Focus: AutoGen emphasizes multi-agent conversations and LLM inference, CrewAI focuses on structured workflows and role-based task automation, while LangGraph leverages a graph-based architecture for managing complex interactions.
✦ Customization: AutoGen offers extensive customization options for developers, CrewAI provides a user-friendly approach accessible to those with limited technical expertise, and LangGraph allows for highly specialized agent creation.
✦ Scalability: LangGraph excels in handling large-scale systems, while AutoGen and CrewAI are more suited for smaller to medium-sized applications.
Because it is built on LangChain, CrewAI can also draw on LangChain and LlamaIndex tools. This flexibility means that we are not limited to the tools that CrewAI comes with, but can also leverage a diverse array of tools from other packages.

McKinsey’s most recent “State of AI” survey found that more than 72 percent of companies surveyed are deploying AI solutions, with a growing interest in GenAI. Given that activity, it would not be surprising to see companies begin to incorporate frontier technologies such as agents into their planning processes and future AI road maps. Agent-driven automation remains an exciting proposition, with the potential to revolutionize whole industries, bringing a new speed of action to work. That said, the technology is still in its early stages, and there is much development required before its full capabilities can be realized.