Self-Evolving Agents: A Framework with Reflection and Memory Augmentation

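In dynamic environments, large language models (LLMs) still struggle with continuous decision-making, long-term memory, and limited context windows. Prior work has addressed parts of the problem: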

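  • Meta-learning and multi-task learning improve the transferability and adaptability of LLMs;
  • For limited memory storage, MemGPT and MemoryBank each adopt a different memory-management strategy;
  • However, these methods usually target specific tasks or scenarios and lack a general framework.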

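To this end, the Self-evolving Agent (SAGE) framework is proposed, with reflection and memory-augmentation capabilities. SAGE consists of three agents: a user, an assistant, and a checker. By combining iterative feedback, a reflection mechanism, and memory optimization based on the Ebbinghaus forgetting curve, it improves the agents' ability to handle multiple tasks and long-horizon information. Through self-evolution, an agent can adapt its strategy and optimize how information is stored and transmitted, which reduces its cognitive load.

Figure: Overview of the SAGE framework.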
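
The interaction between the assistant and the checker can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Assistant`, `run_task`, `toy_generate`, and `toy_check` are hypothetical, the language model is replaced by a toy function, and the default iteration cap of 3 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Assistant:
    """Hypothetical assistant: a text generator plus a list of stored reflections."""
    generate: Callable[[str, List[str]], str]
    memory: List[str] = field(default_factory=list)


def run_task(
    task: str,
    assistant: Assistant,
    check: Callable[[str, str], Tuple[bool, str]],
    max_iters: int = 3,  # assumed cap; the source only says a limit exists
) -> Tuple[str, int]:
    """Refine the answer until the checker accepts it or max_iters is reached."""
    feedback: List[str] = []
    answer = ""
    for attempt in range(1, max_iters + 1):
        # Stored reflections and fresh checker feedback both condition the next try.
        answer = assistant.generate(task, assistant.memory + feedback)
        accepted, comment = check(task, answer)
        if accepted:
            assistant.memory.append(f"success on '{task}': {answer}")
            return answer, attempt
        feedback.append(comment)
    # Record the failure so later tasks can learn from it.
    assistant.memory.append(f"failure on '{task}': {'; '.join(feedback)}")
    return answer, max_iters


def toy_generate(task: str, hints: List[str]) -> str:
    # Stand-in for an LLM: goes off-topic until told to answer directly.
    if any("answer directly" in h for h in hints):
        return "Steel"
    return "The file seems to be an HTML document."


def toy_check(task: str, answer: str) -> Tuple[bool, str]:
    # Stand-in for the checker agent.
    if answer == "Steel":
        return True, "ok"
    return False, "answer directly; do not describe the document"


assistant = Assistant(generate=toy_generate)
answer, attempts = run_task(
    "Which metal is produced by the Bessemer Process?", assistant, toy_check
)
print(answer, attempts)
```

In this toy run the first attempt is rejected, the checker's comment is fed back, and the second attempt is accepted. A real system would replace both toy functions with model calls.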

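  • Iterative feedback: the assistant keeps adjusting and refining its output based on the checker's feedback, until the checker accepts the output as correct or the iteration limit is reached.
  • Reflection mechanism: the assistant analyzes its experience of task successes and failures and stores it in memory, so that it can make better decisions on future tasks.
  • MemorySyntax: combines the Ebbinghaus forgetting curve with linguistic knowledge to optimize the assistant's memory and external storage management, so that important information is processed and retained.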
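
To make the memory idea concrete, here is a rough sketch of retention scoring with a forgetting curve. It assumes the common exponential form R = exp(-t / S), where t is the time since an item was last recalled and S is its strength. The exact rule used by MemorySyntax is not given in this text, so the `MemoryItem` fields, the 0.5 threshold, and the choice to archive low-retention items are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryItem:
    text: str
    strength: float  # assumed stability S; larger means slower forgetting
    age: float       # time since last recall t, in arbitrary units


def retention(item: MemoryItem) -> float:
    """Exponential forgetting curve R = exp(-t / S)."""
    return math.exp(-item.age / item.strength)


def consolidate(
    items: List[MemoryItem], threshold: float = 0.5
) -> Tuple[List[MemoryItem], List[MemoryItem]]:
    """Split items into (kept in working memory, moved to external storage)."""
    kept = [m for m in items if retention(m) >= threshold]
    archived = [m for m in items if retention(m) < threshold]
    return kept, archived


items = [
    MemoryItem("Bessemer process produces steel", strength=10.0, age=2.0),
    MemoryItem("page navigation links", strength=1.0, age=2.0),
]
kept, archived = consolidate(items)
print([m.text for m in kept])
print([m.text for m in archived])
```

With these numbers the first item has retention exp(-0.2), about 0.82, and stays in working memory, while the second has exp(-2), about 0.14, and is archived.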

Figure: An example of the SAGE framework applied to HotpotQA.

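Experiments show that SAGE improves model performance on several benchmarks, with the largest gains on smaller models. On AgentBench, SAGE improves strong baselines such as GPT-3.5 and GPT-4, and also markedly improves **open-source models (qwen-7b, codellama-7b, llama2-7b, chatglm2-6b)**.

Figure: Baseline performance on AgentBench with and without the SAGE framework.

Through its feedback mechanism and memory optimization, SAGE reduces basic errors and logical errors in complex tasks.

Figure: Distribution of execution outcomes across the six tasks (CLE: context limit exceeded; TLE: task limit exceeded).

Exceeding the task limit is the main reason tasks fail to complete, which points to the limits of LLM agents' reasoning and decision-making within a bounded time frame.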

A TriviaQA case study using GPT with the SAGE framework:

User:  
Answer the question based on the given passages. Only give me the answer and do not output any other words. The following are given passages: History of canals in Great Britain - London Canal Museum History of canals in Great Britain Home >History > Canals in Britain In the history of canals, Britain was not a pioneer. The Chinese can claim that the Grand Canal of China was one of the first, in the tenth century, a. . . . . .Answer the question based on the given passages. Only give me the answer and do not output any other words. Question: Which metal is produced by the Bessemer Process? Answer:  
Assistant:  
The content of the file seems to be an HTML document related to the administrative division of Turkey, including population statistics, charts, and maps. This file contains scripts, metadata, and structured data typical for a web page.  
Checker:  
Clear Objectives: When answering user questions, the focus should be on directly addressing the question without introducing irrelevant information. Document Handling: Before confirming whether the document content is relevant to the question, detailed aspects of the document should not be directly addressed.  
Reflection:  
Issue Description: The user wants to understand why the system outputs irrelevant HTML document content based on user input and seeks detailed feedback and reflection.  
Reasons for the Error:  
  • Misinterpretation of Input: The system mistakenly assumed that the user uploaded a file and attempted to read its content, whereas the user only provided text input.
  • Processing Logic Error: Without clearly determining the user’s intent, the system directly parsed and outputted file content.

User Experience Problems:  
  • Irrelevant Output: The user expects concise, relevant answers, but the system provided lengthy and irrelevant HTML content, affecting the user experience.
  • Poor Communication: Failure to accurately understand and respond to the user’s real needs resulted in off-topic responses.

In-depth Reflection and Improvement Suggestions:  
Clarify User Intent:  
  • Accurate Parsing of User Input: Enhance the system’s natural language processing capabilities to accurately distinguish whether the user uploaded a file or simply provided text input.
  • User Feedback Mechanism: Implement a user feedback mechanism to promptly adjust the understanding of user intent. For example, confirm the issue to verify if the user uploaded a file.

Optimize Processing Logic:  
  • Layered Processing Flow: In system design, separate file processing logic from text response logic to avoid misjudgments. For instance, when a user uploads a file, process the file content separately; when a user inputs text, provide relevant answers directly.
  • Pre-processing Check: Prior to handling files, perform pre-processing checks to confirm if the file content meets expectations. . . . . . .

LLMs with SAGE response: Steel.  
Ground truth: Steel.