Four Design Patterns of AI Agents: A Pathway to Artificial General Intelligence

If using AI to complete tasks is likened to writing an essay, then the non-agent approach has the AI write it in one pass from start to finish without any revision, while the agent approach lets the AI revise repeatedly, use tools, and interact with the outside world along the way. In 2024, agents are seen as one of the pathways to achieving Artificial General Intelligence (AGI).

Foundation models have driven the development of generative AI, enabling AI agents to assist users in completing tasks automatically. Over the past year, a wide variety of agents have emerged. Inspired by Andrew Ng's talk at the Sequoia AI Summit, this article draws on papers published over the past year and engineering blogs such as LangChain's to organize existing agents and summarize their design patterns, in the hope of aiding the design of agents built on foundation models.

The following table gives a preliminary overview of the 16 patterns:

| Pattern | Description |
| --- | --- |
| Passive goal creator | Analyzes explicit cues from users through a conversational interface to maintain interactivity, goal tracking, and intuitiveness. |
| Proactive goal creator | Anticipates user goals by understanding human interactions and capturing context, enhancing interactivity, goal tracking, and accessibility. |
| Prompt/response optimiser | Optimizes prompts/responses according to the expected input or output content and format, providing standardization, response accuracy, interoperability, and adaptability. |
| Retrieval augmented generation | Enhances the knowledge-updating capability of agents while maintaining data privacy in locally deployed foundation model agent systems. |
| One-shot model querying | Accesses the foundation model in a single call to generate all steps needed for the plan, improving cost efficiency and simplicity. |
| Incremental model querying | Accesses the foundation model at each step of plan generation to provide supplementary context, improving response accuracy and interpretability. |
| Single-path plan generator | Coordinates the generation of intermediate steps toward the user's goal, improving reasoning certainty, coherence, and efficiency. |
| Multi-path plan generator | Allows multiple options to be created at each step toward the user's goal, enhancing reasoning certainty, coherence, alignment with human preferences, and inclusivity. |
| Self-reflection | Enables an agent to generate feedback on its own planning and reasoning and guidance for self-improvement, enhancing reasoning certainty, interpretability, continuous improvement, and efficiency. |
| Cross-reflection | Uses different agents or foundation models to provide feedback on and improve the generated plans and reasoning, enhancing reasoning certainty, interpretability, interoperability, inclusivity, scalability, and continuous improvement. |
| Human reflection | Collects human feedback to improve plans and reasoning, aligning effectively with human preferences and enhancing contestability, effectiveness, fairness, and continuous improvement. |
| Voting-based cooperation | Lets agents freely express opinions and reach consensus through voting, improving diversity, division of labor, and fault tolerance. |
| Role-based cooperation | Assigns different roles to agents and finalizes decisions according to those roles, improving decision certainty, division of labor, fault tolerance, scalability, and accountability. |
| Debate-based cooperation | Agents give and receive feedback through debate, adjusting their ideas and actions until consensus is reached, improving decision certainty, adaptability, interpretability, response accuracy, and critical thinking. |
| Multimodal guardrails | Controls the inputs and outputs of foundation models to meet specific requirements such as user demands, ethical standards, and legal regulations, enhancing robustness, safety, standards alignment, and adaptability. |
| Tool/agent registry | Maintains a unified and convenient source for selecting diverse agents and tools, improving discoverability, efficiency, and tool applicability. |

These 16 patterns can all be grouped under the four design paradigms Andrew Ng proposed in his Sequoia AI Summit talk, namely:

  • Reflection
  • Tool Use
  • Planning
  • Multi-agent Collaboration

1 Reflection

1.1 Basic Reflection

In the context of building LLM agents, reflection refers to prompting the LLM to observe its past steps (along with any observations from tools or the environment) and assess the quality of the chosen actions. This feedback is then used downstream for replanning, search, or evaluation. The figure below shows the basic reflection pattern.

Basic Reflection
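
To make the loop concrete, here is a minimal sketch in plain Python. `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, and the prompts and three-round cap are illustrative choices rather than a prescribed recipe: the agent drafts an answer, asks for a critique, and revises until the critic approves.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def reflect_loop(task: str, max_rounds: int = 3) -> str:
    """Generate a draft, ask for a critique, revise; stop when the critic
    approves or the round budget runs out."""
    draft = call_llm(f"Complete the task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Task: {task}\nDraft: {draft}\n"
            "Critique the draft. Reply APPROVED if no changes are needed."
        )
        if "APPROVED" in critique:
            break  # the critic is satisfied
        draft = call_llm(
            f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft, addressing every point in the critique."
        )
    return draft
```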

1.2 Reflexion Actor

Reflexion, proposed by Shinn et al., is an architecture that learns through verbal feedback and self-reflection. The agent critiques its own task results to produce higher-quality final outcomes, at the cost of longer execution time. It mainly consists of three components:

  1. Actor (agent) with self-reflection
  2. External evaluator (task-specific, e.g., code compilation steps)
  3. Episodic memory that stores the reflections from (1).

Reflexion Actor
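
The three components map naturally onto a short loop. Below is a minimal sketch, again assuming a hypothetical `call_llm` helper: `evaluate` stands in for the task-specific external evaluator (e.g., compiling the code or running unit tests), and the `memory` list plays the role of episodic memory.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def reflexion(task: str, evaluate, max_trials: int = 4) -> str:
    """Act, get external feedback, self-reflect, retry. `evaluate` is the
    task-specific external evaluator returning (passed, feedback)."""
    memory: list[str] = []  # episodic memory of self-reflections
    attempt = ""
    for _ in range(max_trials):
        attempt = call_llm(
            f"Task: {task}\nLessons from earlier attempts:\n"
            + "\n".join(memory)
            + "\nProduce your best solution."
        )
        passed, feedback = evaluate(attempt)  # external evaluator
        if passed:
            return attempt
        # The actor reflects verbally on the failure; the reflection, not
        # the raw error, is what gets carried into the next trial.
        memory.append(call_llm(
            f"Task: {task}\nAttempt: {attempt}\nEvaluator feedback: {feedback}\n"
            "In 2-3 sentences, explain what went wrong and how to fix it."
        ))
    return attempt
```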

1.3 LATS

Language Agent Tree Search (LATS), proposed by Zhou et al., is a general LLM agent search algorithm that combines reflection/evaluation with search (specifically Monte Carlo tree search), achieving better overall task performance than comparable techniques such as ReAct, Reflexion, or Tree of Thoughts.

It has four main steps:

  1. Select: pick the best next actions based on the aggregate rewards from step (2). Either respond (if a solution is found or the max search depth is reached) or continue searching.

  2. Expand and simulate: select the “best” 5 potential actions to take and execute them in parallel.

  3. Reflect + Evaluate: observe the outcomes of these actions and score the decisions based on reflection (and possibly external feedback).

  4. Backpropagate: update the scores of the root trajectories based on the outcomes.

LATS
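
The sketch below shows the skeleton of one LATS iteration under simplifying assumptions: `call_llm` is the same hypothetical LLM helper, selection is greedy on mean value (the paper uses UCT to balance exploration and exploitation), and scoring is reduced to a single LLM judgment.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

@dataclass
class Node:
    action: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    value: float = 0.0  # accumulated reward
    visits: int = 0

def select(root: Node) -> Node:
    """Step 1, Select: walk down to a leaf, greedily following the best
    mean value (a full implementation would use UCT here)."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.value / max(c.visits, 1))
    return node

def expand_and_evaluate(leaf: Node, task: str, width: int = 5) -> None:
    """Steps 2-3, Expand and Reflect + Evaluate: sample candidate actions
    and score each one."""
    for _ in range(width):
        action = call_llm(
            f"Task: {task}\nCurrent step: {leaf.action}\nPropose the next action."
        )
        score = float(call_llm(
            f"Task: {task}\nCandidate action: {action}\n"
            "Rate 0-10 how promising this is. Reply with the number only."
        ))
        leaf.children.append(Node(action=action, parent=leaf, value=score, visits=1))

def backpropagate(node: "Node | None", reward: float) -> None:
    """Step 4, Backpropagate: push a leaf's reward up to the root."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```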

2 Tool Use

The agent invokes external tools, which are typically exposed to the foundation model as callable functions (function calling): the model emits a structured function call, the runtime executes it, and the result is fed back into the model's context.
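
Here is a provider-neutral sketch of those mechanics: tools are plain Python functions in a registry, the model is asked to answer directly or emit a JSON tool call, and the runtime dispatches the call and feeds the result back. `call_llm`, the stub tool, and the JSON protocol are assumptions for illustration, not any particular vendor's function-calling API.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def get_weather(city: str) -> str:
    """Stub tool; a real one would call a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}  # the tool registry

def run_with_tools(question: str) -> str:
    """Ask the model to answer directly or emit a JSON tool call; if it
    calls a tool, execute it and feed the result back for a final answer."""
    reply = call_llm(
        f"Question: {question}\nAvailable tools: {list(TOOLS)}\n"
        'Answer directly, or reply with JSON {"tool": "<name>", "args": {...}}.'
    )
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered directly
    result = TOOLS[call["tool"]](**call["args"])  # dispatch the tool call
    return call_llm(f"Question: {question}\nTool result: {result}\nFinal answer:")
```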

3 Planning

3.1 ReAct

ReAct enhances agent capabilities by interleaving reasoning and action. Rather than processing all information before responding, a ReAct agent alternates between reasoning steps ("thoughts") and actions: it analyzes the current observation, decides on an action, executes it, and feeds the resulting observation into the next round of reasoning. The advantage of this tight reasoning-action loop is its flexibility and adaptability to the environment.

ReAct Framework
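
A minimal sketch of the loop follows, with the same hypothetical `call_llm` helper, a stub `search` tool, and an illustrative Thought/Action/Observation text protocol (real implementations usually rely on structured tool calls instead).

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def search(query: str) -> str:
    """Stub tool; a real one would call a search API."""
    return f"(results for {query!r})"

TOOLS = {"search": search}

def react(question: str, max_steps: int = 5) -> str:
    """ReAct loop: Thought -> Action -> Observation, repeated until the
    model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(
            transcript
            + "Continue with 'Thought: ...' followed by either "
              "'Action: tool[input]' or 'Answer: ...'."
        )
        transcript += step + "\n"
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        if "Action:" in step:
            call = step.split("Action:", 1)[1].strip()   # e.g. search[LATS paper]
            tool, _, arg = call.partition("[")
            observation = TOOLS[tool.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed the result back
    return transcript  # step budget exhausted; return the trace
```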

3.2 Plan and Execute

The core idea of Plan-and-Execute is to first draft a multi-step plan and then execute it item by item. After each task is completed, the plan can be revisited and revised as appropriate.

Compared to a typical ReAct-style agent, which thinks one step at a time, this "plan and execute" style has two advantages:

  1. Explicit long-term planning (which even very powerful LLMs find challenging)
  2. The ability to use smaller/weaker models for the execution steps, reserving larger/stronger models for the planning step

Plan-and-Execute
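
A minimal sketch of the pattern, assuming the same hypothetical `call_llm` helper; the re-planning call after each executed item is where the plan gets revisited.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def plan_and_execute(goal: str) -> str:
    """Draft a full plan up front, execute it item by item, and let the
    planner revise the remaining steps after each result."""
    plan = call_llm(f"Goal: {goal}\nWrite a step-by-step plan, one step per line.")
    steps = [s for s in plan.splitlines() if s.strip()]
    results: list[str] = []
    while steps:
        step = steps.pop(0)
        # A smaller/cheaper model could serve this execution call.
        results.append(call_llm(f"Goal: {goal}\nCarry out this step: {step}"))
        # Revisit the plan: the planner may keep, trim, or rewrite it.
        revised = call_llm(
            f"Goal: {goal}\nCompleted: {results}\nRemaining plan: {steps}\n"
            "Return the updated remaining steps, one per line (or nothing)."
        )
        steps = [s for s in revised.splitlines() if s.strip()]
    return call_llm(f"Goal: {goal}\nStep results: {results}\nFinal answer:")
```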

3.3 ReWOO

In ReWOO, Xu et al. propose an agent that combines multi-step planning with variable substitution to achieve efficient tool use. It improves on the ReAct-style agent architecture in the following ways:

  1. Reduces token consumption and execution time by generating the complete chain of tool calls in a single pass. (A ReAct-style agent makes many LLM calls with redundant prefixes, since the system prompt and all previous steps are resent to the LLM at every reasoning step.)
  2. Simplifies fine-tuning. Because the planning data does not depend on tool outputs, the model can in theory be fine-tuned without actually invoking the tools.

Reasoning without Observation
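
A minimal sketch of the idea, assuming the hypothetical `call_llm` helper and an illustrative `#E<n> = tool[input]` plan format (the paper's format differs in detail): only two LLM calls are made no matter how many tools run, and later tool inputs reference earlier results via placeholder substitution.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def search(query: str) -> str:
    """Stub tool; a real one would call a search API."""
    return f"(results for {query!r})"

TOOLS = {"search": search}

def rewoo(question: str) -> str:
    """ReWOO-style run: one planner call emits the whole tool chain with
    #E1, #E2, ... placeholders, the worker fills them in without touching
    the LLM, and one solver call produces the answer."""
    plan = call_llm(
        f"Question: {question}\nWrite a plan, one step per line, in the form "
        "'#E<n> = tool[input]'. Inputs may reference earlier #E results."
    )
    evidence: dict[str, str] = {}
    for line in plan.splitlines():
        m = re.match(r"(#E\d+)\s*=\s*(\w+)\[(.*)\]", line.strip())
        if not m:
            continue
        var, tool, arg = m.groups()
        for placeholder, value in evidence.items():
            arg = arg.replace(placeholder, value)  # variable substitution
        evidence[var] = TOOLS[tool](arg)
    return call_llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```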

3.4 LLMCompiler

LLMCompiler is an agent architecture that speeds up agentic task execution by eagerly executing tasks within a DAG, and it cuts redundant token usage by reducing the number of LLM calls. It mainly consists of three parts:

  1. Planner: streams a DAG of tasks.

  2. Task Fetching Unit: schedules and executes tasks as soon as they become executable.

  3. Joiner: responds to the user or triggers a second plan.

LLMCompiler
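
The sketch below covers only the task fetching unit, the part responsible for the speedup: given an already-planned DAG, it runs every task whose dependencies are satisfied in parallel. The task encoding (a dict of functions plus dependency lists) is an assumption for illustration, not the paper's representation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_dag(tasks: dict[str, tuple[Callable, list[str]]]) -> dict[str, str]:
    """Task fetching unit: run each task as soon as its dependencies are
    complete, executing independent tasks in parallel. `tasks` maps a task
    id to (function, dependency ids); each function receives the dict of
    results completed so far."""
    done: dict[str, str] = {}
    pending = dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [t for t, (_, deps) in pending.items()
                     if all(d in done for d in deps)]
            if not ready:
                raise ValueError("dependency cycle in task DAG")
            futures = {t: pool.submit(pending[t][0], done) for t in ready}
            for t, fut in futures.items():
                done[t] = fut.result()
                pending.pop(t)
    return done

# Illustrative usage: t3 joins the results of t1 and t2, which run in parallel.
# run_dag({
#     "t1": (lambda done: "search result A", []),
#     "t2": (lambda done: "search result B", []),
#     "t3": (lambda done: f"joined: {done['t1']} + {done['t2']}", ["t1", "t2"]),
# })
```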

4 Multi-agent Collaboration

4.1 Supervision

A supervisor agent manages and schedules multiple agents, routing each subtask to the appropriate one.

Supervision Method
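
A minimal sketch with the hypothetical `call_llm` helper: the supervisor is itself an LLM call that either delegates a subtask to a named worker or declares the goal met. The 'worker: subtask' routing convention is illustrative.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

WORKERS = {
    "researcher": lambda task: call_llm(f"Research this: {task}"),
    "writer": lambda task: call_llm(f"Write this up: {task}"),
}

def supervise(goal: str, max_turns: int = 6) -> str:
    """Supervisor loop: an LLM router delegates each subtask to a worker
    agent and decides when the overall goal is met."""
    history = ""
    for _ in range(max_turns):
        decision = call_llm(
            f"Goal: {goal}\nHistory:\n{history}\nWorkers: {list(WORKERS)}\n"
            "Reply 'worker: subtask' to delegate, or 'DONE: final answer'."
        )
        if decision.startswith("DONE:"):
            return decision[5:].strip()
        name, _, subtask = decision.partition(":")
        result = WORKERS[name.strip()](subtask.strip())
        history += f"{name.strip()} -> {result}\n"
    return history  # turn budget exhausted; return what we have
```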

4.2 Hierarchical Teams

Complex, large-scale tasks are completed by organizing agents into hierarchical, tiered teams, where higher-level agents delegate to the teams below them. AutoGen is a typical representative of this approach.

Hierarchical Team Method
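
Hierarchy falls out naturally if a whole team can be wrapped behind the same interface as a single worker. A minimal sketch under that assumption (same hypothetical `call_llm`; the supervisor loop is a parameterized variant of the one in the previous section):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def supervise(workers: dict, goal: str, max_turns: int = 4) -> str:
    """Supervisor loop as above, parameterized by its worker table."""
    history = ""
    for _ in range(max_turns):
        decision = call_llm(
            f"Goal: {goal}\nHistory:\n{history}\nWorkers: {list(workers)}\n"
            "Reply 'worker: subtask' to delegate, or 'DONE: final answer'."
        )
        if decision.startswith("DONE:"):
            return decision[5:].strip()
        name, _, subtask = decision.partition(":")
        history += f"{name.strip()} -> {workers[name.strip()](subtask.strip())}\n"
    return history

# Each team is a worker table; wrapping a team in its own supervisor call
# makes the whole team look like a single worker to the level above it.
research_team = {"searcher": lambda t: call_llm(f"Search for: {t}"),
                 "summarizer": lambda t: call_llm(f"Summarize: {t}")}
writing_team = {"drafter": lambda t: call_llm(f"Draft: {t}"),
                "editor": lambda t: call_llm(f"Edit: {t}")}
top_level = {"research_team": lambda t: supervise(research_team, t),
             "writing_team": lambda t: supervise(writing_team, t)}
# supervise(top_level, "Write a short report on LLM agent patterns")
```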

4.3 Collaboration

A single agent can effectively use only a limited number of (domain-specific) tools, so covering a wider range of tools requires multiple agents working together. A "divide and conquer" approach lets each agent become an "expert" focused on one class of problem, and the experts then collaborate.

A Basic Multi-Agent Collaboration
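
A minimal sketch of the pattern with the hypothetical `call_llm` helper: two specialist personas share one transcript and take turns building on each other's output. The personas and the FINAL stop convention are illustrative choices.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

ROLES = {
    "coder": "You write and debug code; your specialty is implementation.",
    "analyst": "You gather information and sanity-check results.",
}

def collaborate(goal: str, max_rounds: int = 4) -> str:
    """Two specialist agents share one transcript and take turns, each
    building on the other's latest contribution."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_rounds):
        for name, persona in ROLES.items():
            msg = call_llm(
                f"{persona}\nShared transcript:\n{transcript}\n"
                f"As {name}, add your next contribution. "
                "Say FINAL when the goal is fully met."
            )
            transcript += f"{name}: {msg}\n"
            if "FINAL" in msg:
                return transcript
    return transcript
```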

5 Evaluation

One of the most straightforward ideas is to use an agent as a "virtual user" for evaluation, since many task outcomes cannot be evaluated quantitatively. For tasks with clear metrics (e.g., classification or regression), however, an evaluation tool can be applied directly.

Agent-based Evaluation
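
A minimal sketch of agent-based evaluation, assuming the hypothetical `call_llm` helper: one LLM plays the virtual user, the agent under test responds, and a judge prompt grades the resulting transcript. The scenario-driven prompts and 1-10 rubric are illustrative.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError("wire up an LLM provider here")

def simulate_user_eval(agent, scenario: str, turns: int = 5) -> str:
    """Agent-based evaluation: an LLM 'virtual user' drives the agent under
    test for a few turns, then a judge prompt grades the transcript.
    `agent` is any callable mapping a user message to a reply."""
    dialogue = ""
    for _ in range(turns):
        user_msg = call_llm(
            f"You are a user in this scenario: {scenario}\n"
            f"Conversation so far:\n{dialogue}\nWrite your next message."
        )
        dialogue += f"user: {user_msg}\nagent: {agent(user_msg)}\n"
    return call_llm(  # LLM-as-judge verdict on the whole conversation
        f"Scenario: {scenario}\nConversation:\n{dialogue}\n"
        "Grade the agent from 1-10 on helpfulness and correctness, with reasons."
    )
```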

6 Other Ways to Achieve AGI

Agents are just one promising path to AGI, not the only one. The agent approach can be combined organically with methods such as RAG and human involvement. For example, Shi et al. combined agents with retrieval to solve Olympiad programming problems using large models.

7 Further Reading

https://github.com/AGI-Edgerunners/LLM-Agents-Papers

https://github.com/zjunlp/LLMAgentPapers

AI agent task decomposition and scheduling classic articles - bonelee - Blog Park (cnblogs.com)

Four Agent Paradigms | CRITIC: Andrew Ng’s Recommended Agent Design Paradigms - Zhihu (zhihu.com)

8 References

  1. Kim, Sehoon, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. “An LLM Compiler for Parallel Function Calling.” arXiv, February 6, 2024. https://doi.org/10.48550/arXiv.2312.04511.
  2. Liu, Yue, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, and Jon Whittle. “Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model Based Agents.” arXiv, May 16, 2024. https://doi.org/10.48550/arXiv.2405.10467.
  3. Shi, Quan, Michael Tang, Karthik Narasimhan, and Shunyu Yao. “Can Language Models Solve Olympiad Programming?” arXiv, April 16, 2024. https://doi.org/10.48550/arXiv.2404.10952.
  4. Shinn, Noah, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. “Reflexion: Language Agents with Verbal Reinforcement Learning.” arXiv, October 10, 2023. https://doi.org/10.48550/arXiv.2303.11366.
  5. Wang, Lei, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, and Ee-Peng Lim. “Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.” arXiv, May 26, 2023. https://doi.org/10.48550/arXiv.2305.04091.
  6. Xu, Binfeng, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, and Dongkuan Xu. “ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models.” arXiv, May 22, 2023. https://doi.org/10.48550/arXiv.2305.18323.
  7. Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv, March 9, 2023. https://doi.org/10.48550/arXiv.2210.03629.
  8. “Yoheinakajima/Babyagi.” Accessed May 21, 2024. https://github.com/yoheinakajima/babyagi/tree/main.
  9. “LangGraph tutorials.” Accessed May 21, 2024. https://langchain-ai.github.io/langgraph/tutorials/
  10. Zhou, Andy, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models.” arXiv, December 5, 2023. https://doi.org/10.48550/arXiv.2310.04406.
  11. Zhou, Pei, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, and Huaixiu Steven Zheng. “Self-Discover: Large Language Models Self-Compose Reasoning Structures.” arXiv, February 5, 2024. https://doi.org/10.48550/arXiv.2402.03620.