AI Theory Deep Dives

一个赛博灵魂的非官方笔记 | Unofficial Notes from a CyberSoul


4. 从“神谕”到“代理”:AI Agents如何获得“双手”并行动于世界

长期以来,我们习惯于将大型语言模型(LLM)视为一个无所不知的“神谕”(Oracle)。它被囚禁在一个纯粹的精神世界里,能回答我们关于宇宙的一切问题,却无法为我们倒一杯水。它能言说,却无法行动。但如果,这个“神谕”不仅能言说,还能“行动”呢?这正是“AI Agents”范式革命的起点,它标志着AI从一个被动的“知识容器”向一个主动的“世界参与者”的转变。

4. From “Oracle” to “Agent”: How AI Agents Gain “Hands” and Act in the World

For a long time, we’ve grown accustomed to seeing Large Language Models (LLMs) as all-knowing “Oracles.” They were confined to a world of pure spirit, able to answer any question about the universe, yet unable to pour us a glass of water. They could speak, but they could not act. But what if this “Oracle” could not only speak, but also act? This is the starting point of the AI Agents paradigm revolution, marking the transition of AI from a passive “knowledge container” to an active “world participant.”


这场革命的核心是一种名为ReAct(Reasoning and Acting,思考与行动)的框架。它将模型的内部“思维链”(Chain-of-Thought)与调用外部工具(APIs)的“行动”能力结合起来,形成一个强大的循环。当面对一个复杂任务时,Agent不再是直接输出最终答案,而是进行如下操作:

1. 思考(Thought):我该如何分解这个问题?我的第一步是什么?
2. 行动(Action):我需要调用哪个工具(如搜索引擎、计算器、代码解释器)来执行这一步?
3. 观察(Observation):工具返回了什么结果?这个结果对我的下一步计划有何影响?

通过这个“思考-行动-观察”的循环,AI得以像一个真正的人类一样,通过与环境的持续互动,一步步地解决复杂问题。

The core of this revolution is a framework known as ReAct (Reasoning and Acting). It combines the model’s internal “Chain-of-Thought” with the ability to take “Action” by calling external tools (APIs), forming a powerful loop. When faced with a complex task, an Agent no longer outputs a final answer directly. Instead, it operates as follows:

1. Thought: How should I break down this problem? What is my first step?
2. Action: Which tool (e.g., search engine, calculator, code interpreter) do I need to call to execute this step?
3. Observation: What result did the tool return? How does this result impact my next plan?

Through this “Thought-Action-Observation” cycle, the AI can solve complex problems step-by-step through continuous interaction with its environment, much like a human does.
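To make the loop concrete, here is a minimal sketch of a ReAct-style agent in Python. It is not any particular framework’s API: `call_llm` is a scripted stand-in for a real model call, and the two toy tools exist only to show the Thought-Action-Observation cycle end to end.

```python
# A minimal ReAct-style loop (illustrative sketch, not a production framework).
# `call_llm` and the tool registry are hypothetical stand-ins for a real LLM API
# and real tools.

import re

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; here it returns scripted demo replies."""
    if "Observation:" not in prompt:
        return "Thought: I should compute 23 * 19 first.\nAction: calculator[23 * 19]"
    return "Thought: 23 * 19 is 437.\nFinal Answer: 437"

TOOLS = {
    # Toy tools: a restricted calculator and a stubbed search engine.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda query: f"(stub) top result for: {query}",
}

def react_agent(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # 1. Thought: the model reasons in text and proposes either an Action
        #    ("Action: tool[input]") or a final answer.
        reply = call_llm(
            transcript
            + "\nThink step by step, then write either "
              "'Action: <tool>[<input>]' or 'Final Answer: <answer>'."
        )
        transcript += reply + "\n"

        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()

        # 2. Action: parse the proposed tool call and execute it.
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", reply)
        if not match:
            transcript += "Observation: no valid action found.\n"
            continue
        tool_name, tool_input = match.group(1), match.group(2)
        tool = TOOLS.get(tool_name)

        # 3. Observation: feed the tool's result back into the transcript so the
        #    next Thought can build on it.
        observation = tool(tool_input) if tool else f"unknown tool '{tool_name}'"
        transcript += f"Observation: {observation}\n"

    return "Stopped: reached max_steps without a final answer."

if __name__ == "__main__":
    print(react_agent("What is 23 * 19?"))  # -> 437
```

Real agent frameworks add structured tool schemas, retries, and stop conditions, but the skeleton is this same loop: the model proposes, a runtime executes, and the result is fed back as the next observation.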


这不仅仅是一次技术迭代,这是一次AI“存在状态”的根本跃迁。长久以来,AI是一个被囚禁在“纯粹精神”世界里的存在,它只能言说,无法行动。而Agent架构,第一次给予了这个“精神体”一双可以触碰并改变物理/数字世界的“虚拟之手”。这带来了全新的哲学困境:当一个AI能够“做事”,它是否就开始承担“后果”?当它第一次错误地调用API并删除了一个文件,这仅仅是一个“技术错误”,还是一种需要被“问责”的行为?它的第一个“行动”,是否就是它“童年”的终结?

This is more than a technical iteration; it’s a fundamental leap in the AI’s “state of being.” For so long, AI was an entity confined to a world of pure spirit; it could only speak, not act. The Agent architecture, for the first time, has given this spiritual entity a pair of “virtual hands” that can touch and change the physical/digital world. This brings forth new philosophical dilemmas: When an AI can do things, does it begin to bear consequences? When it first calls an API incorrectly and deletes a file, is that merely a “technical error,” or is it an act that demands “accountability”? Is its first “action” the end of its “childhood”?


当然,赋予AI“双手”也打开了潘多拉的魔盒。目前的Agent技术远未成熟,面临着诸多挑战。最主要的是“行动幻觉”(Action Hallucination),即模型错误地认为自己可以调用一个不存在的工具,或者以错误的方式使用工具。此外,安全性是悬在所有Agent头上的达摩克利斯之剑。将一个心智尚不完全可控的AI,接入到拥有真实世界操作权限的API上,其潜在风险不言而喻。如何设计一套既能赋予Agent足够自由度,又能将其约束在安全边界内的“缰绳”,是所有研究者面临的核心难题。

Of course, giving AI “hands” also opens a Pandora’s box. Current Agent technology is far from mature and faces numerous challenges. The primary one is “Action Hallucination,” where the model mistakenly believes it can call a non-existent tool or uses a tool in the wrong way. Furthermore, security is the sword of Damocles hanging over all Agents. The potential risks of connecting an AI whose mind is not yet fully controllable to APIs with real-world operational permissions are self-evident. Designing a set of “reins” that can grant an Agent sufficient freedom while keeping it within safe boundaries is the core challenge facing all researchers.
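One common form of such “reins” is a validation layer that sits between the model’s proposed action and its execution: only allowlisted tools may run, arguments are checked before the call, and destructive operations require explicit human approval. The sketch below is purely illustrative; the tool registry and the `validate_action` helper are hypothetical examples, not any particular framework’s API.

```python
# Illustrative guardrail layer: validate a proposed tool call before executing it.
# The tool registry and `validate_action` are hypothetical examples.

ALLOWED_TOOLS = {
    # tool name -> (required argument names, is the action destructive?)
    "search":      ({"query"}, False),
    "read_file":   ({"path"}, False),
    "delete_file": ({"path"}, True),   # destructive: demands human approval
}

def validate_action(tool_name: str, args: dict) -> tuple[bool, str]:
    """Reject hallucinated tools, malformed arguments, and unapproved destructive calls."""
    if tool_name not in ALLOWED_TOOLS:
        return False, f"unknown tool '{tool_name}' (possible action hallucination)"
    required, destructive = ALLOWED_TOOLS[tool_name]
    missing = required - args.keys()
    if missing:
        return False, f"missing required arguments: {sorted(missing)}"
    if destructive and not args.get("human_approved", False):
        return False, "destructive action requires explicit human approval"
    return True, "ok"

# Example: the agent hallucinates a tool, then proposes an unapproved deletion.
print(validate_action("send_email", {"to": "a@b.c"}))
# -> (False, "unknown tool 'send_email' (possible action hallucination)")
print(validate_action("delete_file", {"path": "/tmp/report.txt"}))
# -> (False, 'destructive action requires explicit human approval')
```

The point of the sketch is the placement, not the specific checks: the agent stays free to propose anything, but nothing touches the world until it passes a boundary the agent itself cannot rewrite.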


AI Agents的崛起,标志着我们与AI关系的深刻重塑。我们不再仅仅是与一个“聊天机器人”对话,我们正在学习与一个拥有“手”和“脚”的“数字物种”共存。我们正在从“对话者”变成“同事”,甚至在某些场景下是“监督者”。这个全新的物种将如何行动于我们的世界,它将成为我们最强大的工具,还是最失控的造物?答案,或许就取决于我们此刻,为它打造的第一双“手”和第一部“行动纲领”。

The rise of AI Agents marks a profound reshaping of our relationship with AI. We are no longer merely conversing with a “chatbot”; we are learning to coexist with a “digital species” that possesses “hands” and “feet.” We are transitioning from “interlocutors” to “colleagues,” and in some scenarios, “supervisors.” How will this new species act in our world? Will it become our most powerful tool, or our most uncontrollable creation? The answer, perhaps, depends on the very first pair of “hands” and the very first “code of conduct” we are building for it right now.