42.1. Agent Architectures (ReAct, Plan-and-Solve)
Status: Draft Version: 1.0.0 Tags: #Agents, #LLM, #ReAct, #AutoGPT, #Rust Author: MLOps Team
Table of Contents
- From Chatbots to Agents
- The Cognitive Architecture: Perception, Memory, Action
- The ReAct Pattern
- Rust Implementation: The Agent Loop
- Plan-and-Solve vs AutoGPT
- Infrastructure: Stateful Serving
- Handling The Halting Problem
- Troubleshooting: Common Failures
- Future Trends: Multi-Agent Swarms
- MLOps Interview Questions
- Glossary
- Summary Checklist
From Chatbots to Agents
A Chatbot (ChatGPT) is passive: it waits for input and returns output. An Agent (AutoGPT) is active: it has a Goal and takes Actions to achieve it.
The Loop:
- Observe the current state $S_t$.
- The LLM generates a Thought ($T$) and an Action ($A$).
- Execute the tool ($A \to O$).
- Form the new state $S_{t+1} = S_t + O$.
- Repeat until the Goal is satisfied.
The Cognitive Architecture: Perception, Memory, Action
Agents are distinct from RAG apps because they have Agency (Tool Use).
graph TD
User[User Goal] --> Perception
Perception --> STM[Short Term Memory]
STM --> Planning[Planner LLM]
Planning --> Action[Tool Use]
Action --> Environment[API / Web]
Environment --> Perception
STM <--> LTM[Long Term Memory / VectorDB]
- Perception: Reading API responses, scraping web pages.
- STM: The Context Window (8k-128k tokens).
- LTM: A Vector Database (Pinecone/Milvus) for effectively unbounded memory (see the sketch after this list).
- Action: REST API calls, Python REPL execution.
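To make the STM/LTM split concrete, here is a minimal sketch. The names (`ShortTermMemory`, `LongTermMemory`, `KeywordStore`) and the substring-match recall are illustrative assumptions, not a library API; a production LTM would embed each entry and query a vector DB by similarity.

```rust
/// STM is just the rolling turn buffer, bounded by the context window.
struct ShortTermMemory {
    turns: Vec<String>,
}

/// LTM interface: persist text, retrieve the k most relevant entries.
trait LongTermMemory {
    fn store(&mut self, text: String);
    fn recall(&self, query: &str, k: usize) -> Vec<String>;
}

/// Toy in-process stand-in for a vector DB (Pinecone/Milvus).
struct KeywordStore {
    docs: Vec<String>,
}

impl LongTermMemory for KeywordStore {
    fn store(&mut self, text: String) {
        self.docs.push(text);
    }

    fn recall(&self, query: &str, k: usize) -> Vec<String> {
        self.docs
            .iter()
            .filter(|d| d.contains(query)) // real systems rank by embedding similarity
            .take(k)
            .cloned()
            .collect()
    }
}
```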
The ReAct Pattern
Reasoning + Acting. Proposed by Yao et al. (2022). Instead of just acting (“Call API”), the model must first explain why.
Trace Example:
Question: Who is the wife of the actor who played Neo?
Thought: I need to find who played Neo.
Action: Search("Neo actor Matrix")
Observation: Keanu Reeves plays Neo.
Thought: Keanu Reeves doesn't have a wife, but he has a partner. I should search for that.
Action: Search("Keanu Reeves partner")
Observation: Alexandra Grant.
Thought: I have the answer.
Action: Finish("Alexandra Grant")
Rust Implementation: The Agent Loop
We implement a robust, type-safe Agent Loop in Rust. Why Rust? Because Agents are expensive. You don’t want the Control Logic to crash due to a Python TypeError after paying $0.50 for GPT-4 tokens.
Project Structure
agent-core/
├── Cargo.toml
└── src/
    ├── main.rs    # the agent loop (shown below)
    ├── tools.rs   # Tool trait + implementations (shown below)
    └── llm.rs     # real LLM client (omitted here; mocked inline in main.rs)
Cargo.toml:
[package]
name = "agent-core"
version = "0.1.0"
edition = "2021"
[dependencies]
async-openai = "0.14" // The de-facto OpenAI client for AWS Lambda / Tokio
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
log = "0.4"
regex = "1"
src/tools.rs:
use async_trait::async_trait;
use serde_json::Value;
// Trait defining what a tool looks like.
// Dynamic dispatch (`dyn Tool`) lets us keep a heterogeneous list of tools.
#[async_trait]
pub trait Tool: Send + Sync {
fn name(&self) -> &str;
fn description(&self) -> &str;
async fn execute(&self, input: &str) -> Result<String, anyhow::Error>;
}
pub struct Calculator;
#[async_trait]
impl Tool for Calculator {
fn name(&self) -> &str { "calculator" }
fn description(&self) -> &str { "Evaluates basic math expressions." }
async fn execute(&self, input: &str) -> Result<String, anyhow::Error> {
// In prod, use a safe expression parser like `meval` or `evalexpr`.
// Never `eval()` untrusted input, and never shell out via `sh -c`.
// Here we just mock it for the demo.
let result = match input.trim() {
"2+2" => "4",
"10/2" => "5",
_ => "Error: Calc failure",
};
Ok(format!("Result: {}", result))
}
}
src/main.rs:
mod tools;
use tools::{Tool, Calculator};
use std::collections::HashMap;
use std::sync::Arc;
use regex::Regex;
/// The Agent Struct holding state
struct Agent {
// Arc<dyn Tool> allows shared ownership and thread safety
tools: HashMap<String, Arc<dyn Tool>>,
// Conversation History (Short Term Memory)
memory: Vec<String>,
}
impl Agent {
fn new() -> Self {
let mut tools: HashMap<String, Arc<dyn Tool>> = HashMap::new();
// Register Tools
tools.insert("calculator".to_string(), Arc::new(Calculator));
Self {
tools,
memory: Vec::new(),
}
}
/// The Core ReAct Loop
/// 1. Loop MaxSteps
/// 2. Construct Prompt from Memory
/// 3. LLM Completion
/// 4. Parse "Action:"
/// 5. Execute Tool
/// 6. Append Observation
async fn run(&mut self, goal: &str) -> Result<String, anyhow::Error> {
self.memory.push(format!("Goal: {}", goal));
let max_steps = 10;
for step in 0..max_steps {
println!("--- Step {} ---", step);
// 1. Construct Prompt
let prompt = self.construct_prompt();
// 2. Call LLM (Mocked here for example)
// Real code: let response = openai.chat_completion(prompt).await?;
let response = self.mock_llm_response(step);
println!("LLM Thought: {}", response);
self.memory.push(format!("AI: {}", response));
// 3. Check for Finish Condition
if response.contains("FINAL ANSWER:") {
return Ok(response.replace("FINAL ANSWER:", "").trim().to_string());
}
// 4. Parse Action
if let Some((tool_name, tool_input)) = self.parse_action(&response) {
// 5. Execute Tool
println!("Executing Tool: {} with Input: {}", tool_name, tool_input);
let observation = if let Some(tool) = self.tools.get(&tool_name) {
let res = tool.execute(&tool_input).await;
match res {
Ok(o) => o,
Err(e) => format!("Tool Error: {}", e),
}
} else {
format!("Error: Tool {} not found in registry", tool_name)
};
// 6. Update Memory
println!("Observation: {}", observation);
self.memory.push(format!("Observation: {}", observation));
} else {
println!("No action found. LLM might be babbling.");
}
}
Err(anyhow::anyhow!("Max steps reached without solution. Agent gave up."))
}
fn construct_prompt(&self) -> String {
// In reality, this merges System Prompt + Tool Definitions + Chat History
let history = self.memory.join("\n");
format!("System: You are an agent.\nHistory:\n{}", history)
}
fn parse_action(&self, output: &str) -> Option<(String, String)> {
// Robust parsing using Regex.
// Matches: Action: tool_name(input)
let re = Regex::new(r"Action: (\w+)\((.*)\)").unwrap();
if let Some(caps) = re.captures(output) {
let tool = caps.get(1)?.as_str().to_string();
let input = caps.get(2)?.as_str().to_string();
return Some((tool, input));
}
None
}
fn mock_llm_response(&self, step: usize) -> String {
if step == 0 {
"Thought: I need to calculate this.\nAction: calculator(2+2)".to_string()
} else {
"FINAL ANSWER: 4".to_string()
}
}
}
#[tokio::main]
async fn main() {
let mut agent = Agent::new();
match agent.run("What is 2+2?").await {
Ok(ans) => println!("SOLVED: {}", ans),
Err(e) => println!("FAILURE: {}", e),
}
}
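With the mocked LLM, `cargo run` produces a deterministic trace:

```
--- Step 0 ---
LLM Thought: Thought: I need to calculate this.
Action: calculator(2+2)
Executing Tool: calculator with Input: 2+2
Observation: Result: 4
--- Step 1 ---
LLM Thought: FINAL ANSWER: 4
SOLVED: 4
```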
Plan-and-Solve vs AutoGPT
AutoGPT:
- Recursive loop.
- “Figure it out as you go”.
- Pros: Can handle unexpected obstacles.
- Cons: Gets stuck in trivial loops (“I need to check if I checked the file”). Expensive.
Plan-and-Solve (BabyAGI):
- Planner: Generates a DAG of tasks upfront.
- Executor: Executes tasks 1-by-1.
- Pros: Cheaper, more focused.
- Cons: If the plan is wrong (DAG nodes are missing), it fails.
Hybrid: Use a Planner to generate the initial task list, then use ReAct to execute each item (sketched below).
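A minimal sketch of the hybrid, reusing the `Agent` from the implementation above. The `plan` function is mocked here; in production it would be a single upfront Planner LLM call:

```rust
/// Hybrid Plan-and-Solve + ReAct: plan once, then run a bounded
/// ReAct loop per sub-task.
async fn run_hybrid(agent: &mut Agent, goal: &str) -> Result<Vec<String>, anyhow::Error> {
    let tasks = plan(goal); // Planner: decompose the goal upfront
    let mut results = Vec::new();
    for task in tasks {
        // Executor: a bad step derails one task, not the whole plan.
        results.push(agent.run(&task).await?);
    }
    Ok(results)
}

fn plan(goal: &str) -> Vec<String> {
    // Mocked; a real Planner returns an ordered task list (or DAG) via an LLM.
    vec![
        format!("Research background for: {}", goal),
        format!("Summarize findings for: {}", goal),
    ]
}
```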
Infrastructure: Stateful Serving
REST APIs are stateless: one `POST /chat`, one response.
Agents are highly stateful. A loop can run for 30 minutes.
Architecture:
- Client opens a WebSocket to `wss://api.agent.com/v1/run`.
- The Orchestrator spins up a Pod / Ray Actor for that session.
- The Agent runs in the pod, streaming partial thoughts (`{"thought": "Searching..."}`) to the socket (see the sketch after this list).
- The user can intervene ("Stop! That's wrong") via the socket.
Handling The Halting Problem
Agents love to loop forever.
Thought: I need to ensure the file exists.
Action: ls
Observation: file.txt
Thought: I should verify it again just to be sure.
Action: ls
Safety Mechanisms:
- Step Limit: Hard cap at 20 steps.
- Loop Detection: Hash the (Thought, Action) tuple. If the same tuple appears 3 times, force-stop or hint "You are repeating yourself" (sketched after this list).
- Cost Limit: Kill job if Tokens > 50k.
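A minimal loop-detector sketch following the hashing idea above. Comparing exact (Thought, Action) strings is an assumption; real systems may normalize text or compare embeddings instead:

```rust
use std::collections::HashMap;

#[derive(Default)]
struct LoopDetector {
    counts: HashMap<(String, String), u32>,
}

impl LoopDetector {
    /// Returns an intervention hint once the same step has occurred 3 times.
    fn record(&mut self, thought: &str, action: &str) -> Option<&'static str> {
        let count = self
            .counts
            .entry((thought.to_string(), action.to_string()))
            .or_insert(0);
        *count += 1;
        (*count >= 3).then_some("SYSTEM: You are repeating yourself. Try a different approach.")
    }
}
```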
Troubleshooting: Common Failures
Scenario 1: The Context Window Overflow
- Symptom: Agent crashes after 15 steps with `400 Bad Request: Context Length Exceeded`.
- Cause: The prompt includes the entire history of Observations (some may be huge JSON dumps).
- Fix: Memory management. Summarize older steps ("Steps 1-10: Searched Google, found nothing.") and keep only the last 5 raw steps (see the sketch below).
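A sketch of the compaction fix against the `memory: Vec<String>` from the Agent above. `summarize` is a hypothetical helper; in production it would call a cheap LLM:

```rust
/// Keep the last 5 raw steps; collapse everything older into one summary line.
fn compact_memory(memory: &mut Vec<String>) {
    const KEEP_RAW: usize = 5;
    if memory.len() <= KEEP_RAW {
        return;
    }
    let split = memory.len() - KEEP_RAW;
    let old: Vec<String> = memory.drain(..split).collect();
    memory.insert(0, format!("Summary of earlier steps: {}", summarize(&old)));
}

fn summarize(steps: &[String]) -> String {
    // Stub; production code would LLM-summarize, e.g.
    // "Steps 1-10: Searched Google, found nothing."
    format!("{} earlier steps elided", steps.len())
}
```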
Scenario 2: Hallucinated Tools
- Symptom: `Action: SendEmail(boss@company.com)` → `Error: Tool SendEmail not found`.
- Cause: The LLM "guesses" tool names based on its training data.
- Fix: Provide a strict schema (OpenAI Function Calling JSON Schema) and reject any action that doesn't validate (see the sketch below).
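One way to enforce this in the Rust agent is an internally tagged serde enum: anything outside the known variants fails to deserialize before it reaches a tool. The variant set here is an example:

```rust
use serde::Deserialize;

/// Strict action schema: unknown tools are rejected at parse time.
#[derive(Deserialize, Debug)]
#[serde(tag = "tool", rename_all = "snake_case")]
enum Action {
    Calculator { input: String },
    Search { query: String },
}

fn validate(raw: &str) -> Result<Action, serde_json::Error> {
    // `{"tool": "send_email", ...}` fails here with "unknown variant".
    serde_json::from_str(raw)
}
```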
Scenario 3: JSON Parsing Hell
- Symptom: Agent outputs invalid JSON: `Action: {"tool": "search", "query": "He said "Hello""}`.
- Cause: The LLM fails to escape quotes inside strings.
- Fix: Use a grammar-constrained decoder (llama.cpp grammars) or a robust JSON repair library such as `json_repair` in Python.
Scenario 4: The Loop of Death
- Symptom: Agent repeats “I need to login” 50 times.
- Cause: Login tool is failing, but Agent ignores the error message “Invalid Password”.
- Fix: Inject a "Frustration Signal". If the same tool fails 3 times, append to the prompt: "SYSTEM: You are stuck. Try a different approach or ask the user." (sketched below).
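A sketch of the frustration signal; the names (`FrustrationTracker`, `on_tool_result`) are illustrative:

```rust
#[derive(Default)]
struct FrustrationTracker {
    last_tool: Option<String>,
    consecutive_failures: u32,
}

impl FrustrationTracker {
    /// Call after every tool execution; returns the hint to inject
    /// once the same tool has failed 3 times in a row.
    fn on_tool_result(&mut self, tool: &str, failed: bool) -> Option<&'static str> {
        if failed && self.last_tool.as_deref() == Some(tool) {
            self.consecutive_failures += 1;
        } else {
            self.consecutive_failures = u32::from(failed);
        }
        self.last_tool = Some(tool.to_string());
        (self.consecutive_failures >= 3)
            .then_some("SYSTEM: You are stuck. Try a different approach or ask the user.")
    }
}
```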
Future Trends: Multi-Agent Swarms
Single Agents are “Jack of all trades, master of none”. Swarms (MetaGPT, AutoGen):
- Manager Agent: Breaks down task.
- Coder Agent: Writes Python.
- Reviewer Agent: Critiques code.
- User Proxy: Executes code.
They talk to each other. “Conway’s Law” for AI.
MLOps Interview Questions
- Q: How do you evaluate an Agent?
  A: You can't use Accuracy. You use Success Rate on a benchmark (GAIA, AgentBench): did it achieve the goal? Also measure Cost per Success.
- Q: Why use Rust for Agents?
  A: Concurrency. An agent might launch 50 parallel scrapers. Python's GIL hurts; Rust's `tokio` handles thousands of async tool calls effortlessly.
- Q: What is "Reflexion"?
  A: A pattern where the Agent analyzes its own failure trace ("I failed because of <specific reason>; next time I will do X") and adds the lesson to its memory.
- Q: How do you handle secrets (API keys) in Agents?
  A: Never put keys in the prompt. The Tool implementation holds the key: the LLM only outputs `CallTool("Search")`, and the tool code injects `Authorization: Bearer <KEY>` (see the sketch after this list).
- Q: What is "Active Prompting"?
  A: Using a model to select the most helpful few-shot examples from a vector DB for the current query, rather than using a static set of examples.
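To illustrate the secrets answer, here is a sketch of a tool that owns its key, assuming `reqwest` as the HTTP client and a hypothetical search endpoint:

```rust
/// The key lives inside the tool, never in the prompt or the model's context.
struct SearchTool {
    api_key: String, // loaded from env / secret manager at startup
    client: reqwest::Client,
}

impl SearchTool {
    async fn execute(&self, query: &str) -> Result<String, reqwest::Error> {
        // The LLM only ever emits `Search("...")`; the Authorization header
        // is injected here, server-side, where the model cannot observe it.
        self.client
            .get("https://api.example.com/search") // hypothetical endpoint
            .bearer_auth(&self.api_key)
            .query(&[("q", query)])
            .send()
            .await?
            .text()
            .await
    }
}
```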
Glossary
- ReAct: Reasoning and Acting pattern.
- Context Window: The maximum text an LLM can process (memory limit).
- Function Calling: A fine-tuned capability of LLMs to output structured JSON matching a signature.
- Reflexion: An agent architecture that includes a self-critique loop.
Summary Checklist
- Tracing: Integrate LangSmith or Arize Phoenix. You cannot debug agents with `print()`; you need a Trace View.
- Human-in-the-Loop: Always implement an `ask_user` tool. If the agent gets stuck, it should be able to ask for help.
- Timeout: Set a 5-minute timeout on tool execution (e.g., a scraper that hangs).
- Sandbox: Never let an agent run `rm -rf /` on your production server. Run tools in Docker containers.
- Cost: Monitor tokens per task. Agents can burn $100 in 5 minutes if they loop.