
42.1. Agent Architectures (ReAct, Plan-and-Solve)

Status: Draft · Version: 1.0.0 · Tags: #Agents, #LLM, #ReAct, #AutoGPT, #Rust · Author: MLOps Team


Table of Contents

  1. From Chatbots to Agents
  2. The Cognitive Architecture: Perception, Memory, Action
  3. The ReAct Pattern
  4. Rust Implementation: The Agent Loop
  5. Plan-and-Solve vs AutoGPT
  6. Infrastructure: Stateful Serving
  7. Handling The Halting Problem
  8. Troubleshooting: Common Failures
  9. Future Trends: Multi-Agent Swarms
  10. MLOps Interview Questions
  11. Glossary
  12. Summary Checklist

From Chatbots to Agents

A Chatbot (ChatGPT) is passive: it waits for input and returns output. An Agent (AutoGPT) is active: it has a Goal and takes Actions to achieve it.

The Loop:

  1. Observe the current state $S_t$.
  2. The LLM generates a Thought ($T$) and an Action ($A$).
  3. Execute the tool ($A \to O$).
  4. Form the new state $S_{t+1} = S_t + O$.
  5. Repeat until the Goal is satisfied.

The Cognitive Architecture: Perception, Memory, Action

Agents are distinct from RAG apps because they have Agency (Tool Use).

graph TD
    User[User Goal] --> Perception
    Perception --> STM[Short Term Memory]
    STM --> Planning[Planner LLM]
    Planning --> Action[Tool Use]
    Action --> Environment[API / Web]
    Environment --> Perception
    STM <--> LTM[Long Term Memory / VectorDB]

  • Perception: Reading API responses, scraping web pages.
  • STM: The Context Window (8k - 128k tokens).
  • LTM: A Vector Database (Pinecone/Milvus) for persistent memory beyond the context window.
  • Action: REST API calls, Python REPL execution.
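
A minimal sketch of how STM and LTM can sit behind one memory type. The VectorStore trait and its methods are illustrative assumptions, not a real crate API:

use async_trait::async_trait;

// Hypothetical LTM interface; real systems back this with Pinecone, Milvus, etc.
#[async_trait]
pub trait VectorStore: Send + Sync {
    async fn store(&self, text: &str) -> Result<(), anyhow::Error>;
    async fn retrieve(&self, query: &str, top_k: usize) -> Result<Vec<String>, anyhow::Error>;
}

pub struct AgentMemory<S: VectorStore> {
    pub stm: Vec<String>, // raw recent turns, bounded by the context window
    pub ltm: S,           // persistent store for everything evicted from STM
}

impl<S: VectorStore> AgentMemory<S> {
    /// Push a turn into STM; evict the oldest turn into LTM when full.
    pub async fn push(&mut self, turn: String, max_stm: usize) -> Result<(), anyhow::Error> {
        self.stm.push(turn);
        if self.stm.len() > max_stm {
            let evicted = self.stm.remove(0);
            self.ltm.store(&evicted).await?;
        }
        Ok(())
    }
}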

The ReAct Pattern

Reasoning + Acting. Proposed by Yao et al. (2022). Instead of just acting (“Call API”), the model must first explain why.

Trace Example:

Question: Who is the wife of the actor who played Neo?
Thought: I need to find who played Neo.
Action: Search("Neo actor Matrix")
Observation: Keanu Reeves plays Neo.
Thought: Keanu Reeves doesn’t have a wife, but he has a partner. I should search for that.
Action: Search("Keanu Reeves partner")
Observation: Alexandra Grant.
Thought: I have the answer.
Action: Finish("Alexandra Grant")
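
This trace format is induced by the system prompt. A sketch of such a prompt as a Rust constant (the exact wording is an illustrative assumption; implementations vary):

// Illustrative ReAct system prompt. {tool_descriptions} is filled in at
// runtime from each registered tool's name() and description().
const REACT_SYSTEM_PROMPT: &str = "\
You are an agent. Work in a loop:
Thought: reason about what to do next.
Action: tool_name(input)  -- choose from: {tool_descriptions}
Observation: the tool result will be appended here.
Repeat until done, then output: Action: Finish(answer)";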


Rust Implementation: The Agent Loop

We implement a robust, type-safe Agent Loop in Rust. Why Rust? Because Agents are expensive. You don’t want the Control Logic to crash due to a Python TypeError after paying $0.50 for GPT-4 tokens.

Project Structure

agent-core/
├── Cargo.toml
└── src/
    ├── main.rs
    ├── tools.rs
    └── llm.rs

Cargo.toml:

[package]
name = "agent-core"
version = "0.1.0"
edition = "2021"

[dependencies]
async-openai = "0.14"   # the de-facto async OpenAI client for Tokio
async-trait = "0.1"     # required by the async Tool trait in tools.rs
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
log = "0.4"
regex = "1"

src/tools.rs:

use async_trait::async_trait;

// Trait defining what a tool looks like.
// Dynamic Dispatch (dyn Tool) allows us to have a heterogenous list of tools.
#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    async fn execute(&self, input: &str) -> Result<String, anyhow::Error>;
}

pub struct Calculator;

#[async_trait]
impl Tool for Calculator {
    fn name(&self) -> &str { "calculator" }
    fn description(&self) -> &str { "Evaluates basic math expressions." }
    
    async fn execute(&self, input: &str) -> Result<String, anyhow::Error> {
        // In prod, use a safe expression parser like `meval` or `evalexpr`.
        // Never eval untrusted input, and never shell out via `sh -c`.
        // Here we just mock it for the demo.
        let result = match input.trim() {
            "2+2" => "4",
            "10/2" => "5",
            _ => "Error: Calc failure",
        };
        Ok(format!("Result: {}", result)) 
    }
}

src/main.rs:

mod tools;
use tools::{Tool, Calculator};
use std::collections::HashMap;
use std::sync::Arc;
use regex::Regex;

/// The Agent Struct holding state
struct Agent {
    // Arc<dyn Tool> allows shared ownership and thread safety
    tools: HashMap<String, Arc<dyn Tool>>,
    // Conversation History (Short Term Memory)
    memory: Vec<String>, 
}

impl Agent {
    fn new() -> Self {
        let mut tools: HashMap<String, Arc<dyn Tool>> = HashMap::new();
        // Register Tools
        tools.insert("calculator".to_string(), Arc::new(Calculator));
        
        Self {
            tools,
            memory: Vec::new(),
        }
    }

    /// The Core ReAct Loop
    /// 1. Loop MaxSteps
    /// 2. Construct Prompt from Memory
    /// 3. LLM Completion
    /// 4. Parse "Action:"
    /// 5. Execute Tool
    /// 6. Append Observation
    async fn run(&mut self, goal: &str) -> Result<String, anyhow::Error> {
        self.memory.push(format!("Goal: {}", goal));
        
        let max_steps = 10;
        
        for step in 0..max_steps {
            println!("--- Step {} ---", step);
            
            // 1. Construct Prompt (underscore-prefixed: the LLM call below is mocked)
            let _prompt = self.construct_prompt();
            
            // 2. Call LLM (Mocked here for example)
            // Real code: let response = openai.chat_completion(&_prompt).await?;
            let response = self.mock_llm_response(step);
            println!("LLM Thought: {}", response);
            self.memory.push(format!("AI: {}", response));

            // 3. Check for Finish Condition
            if response.contains("FINAL ANSWER:") {
                return Ok(response.replace("FINAL ANSWER:", "").trim().to_string());
            }
            
            // 4. Parse Action
            if let Some((tool_name, tool_input)) = self.parse_action(&response) {
                // 5. Execute Tool
                println!("Executing Tool: {} with Input: {}", tool_name, tool_input);
                
                let observation = if let Some(tool) = self.tools.get(&tool_name) {
                    let res = tool.execute(&tool_input).await;
                    match res {
                        Ok(o) => o,
                        Err(e) => format!("Tool Error: {}", e),
                    }
                } else {
                    format!("Error: Tool {} not found in registry", tool_name)
                };
                
                // 6. Update Memory
                println!("Observation: {}", observation);
                self.memory.push(format!("Observation: {}", observation));
            } else {
                println!("No action found. LLM might be babbling.");
            }
        }
        
        Err(anyhow::anyhow!("Max steps reached without solution. Agent gave up."))
    }
    
    fn construct_prompt(&self) -> String {
        // In reality, this merges System Prompt + Tool Definitions + Chat History
        let history = self.memory.join("\n");
        format!("System: You are an agent.\nHistory:\n{}", history)
    }
    
    fn parse_action(&self, output: &str) -> Option<(String, String)> {
        // Robust parsing using Regex. 
        // Matches: Action: tool_name(input)
        let re = Regex::new(r"Action: (\w+)\((.*)\)").unwrap();
        if let Some(caps) = re.captures(output) {
            let tool = caps.get(1)?.as_str().to_string();
            let input = caps.get(2)?.as_str().to_string();
            return Some((tool, input));
        }
        None
    }

    fn mock_llm_response(&self, step: usize) -> String {
        if step == 0 {
            "Thought: I need to calculate this.\nAction: calculator(2+2)".to_string()
        } else {
            "FINAL ANSWER: 4".to_string()
        }
    }
}

#[tokio::main]
async fn main() {
    let mut agent = Agent::new();
    match agent.run("What is 2+2?").await {
        Ok(ans) => println!("SOLVED: {}", ans),
        Err(e) => println!("FAILURE: {}", e),
    }
}
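
Running this with the mocked LLM should print something like:

--- Step 0 ---
LLM Thought: Thought: I need to calculate this.
Action: calculator(2+2)
Executing Tool: calculator with Input: 2+2
Observation: Result: 4
--- Step 1 ---
LLM Thought: FINAL ANSWER: 4
SOLVED: 4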

Plan-and-Solve vs AutoGPT

AutoGPT:

  • Recursive loop.
  • “Figure it out as you go”.
  • Pros: Can handle unexpected obstacles.
  • Cons: Gets stuck in trivial loops (“I need to check if I checked the file”). Expensive.

Plan-and-Solve (BabyAGI):

  • Planner: Generates a DAG of tasks upfront.
  • Executor: Executes tasks 1-by-1.
  • Pros: Cheaper, more focused.
  • Cons: If the plan is wrong (the DAG is missing required nodes), it fails.

Hybrid: Use a Planner to generate the initial list. Use ReAct to execute each item.
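
A sketch of that hybrid in Rust, reusing the Agent::run loop from above. plan() is a hypothetical helper (mocked here); in real code it would be a single LLM call that returns one task per line:

// Hypothetical planner: one cheap LLM call that decomposes the goal.
async fn plan(goal: &str) -> Result<Vec<String>, anyhow::Error> {
    // Real code: ask the LLM "Break this goal into 3-7 concrete tasks: {goal}"
    // and split the response by newline. Mocked for the example.
    Ok(vec![
        format!("Research: {}", goal),
        format!("Summarize findings for: {}", goal),
    ])
}

// Hybrid control flow: plan once, then run a bounded ReAct loop per task.
async fn run_hybrid(agent: &mut Agent, goal: &str) -> Result<Vec<String>, anyhow::Error> {
    let tasks = plan(goal).await?;
    let mut results = Vec::new();
    for task in tasks {
        // Each task gets its own ReAct loop, capped by max_steps internally.
        results.push(agent.run(&task).await?);
    }
    Ok(results)
}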


Infrastructure: Stateful Serving

REST APIs are stateless: each POST /chat stands alone. Agents are highly stateful; a single loop can run for 30 minutes.

Architecture:

  1. Client opens WebSocket to wss://api.agent.com/v1/run.
  2. Orchestrator spins up a Pod / Ray Actor for that session.
  3. Agent runs in the pod, streaming partial thoughts ({"thought": "Searching..."}) to the socket.
  4. User can intervene (“Stop! That’s wrong”) via the socket.
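
A minimal sketch of steps 1-3 with axum's WebSocket support (assumed dependencies: axum and tokio; the JSON payload shape matches the example above, and the three-step loop stands in for a real agent run):

use axum::{
    extract::ws::{Message, WebSocket, WebSocketUpgrade},
    response::IntoResponse,
    routing::get,
    Router,
};

async fn ws_handler(ws: WebSocketUpgrade) -> impl IntoResponse {
    // One agent session per socket; the orchestrator would pin this to a pod/actor.
    ws.on_upgrade(run_agent_session)
}

async fn run_agent_session(mut socket: WebSocket) {
    for step in 0..3 {
        // Stream each partial thought as soon as it is produced.
        let payload = format!(r#"{{"thought": "step {}: searching..."}}"#, step);
        if socket.send(Message::Text(payload.into())).await.is_err() {
            break; // client hung up: stop the agent instead of burning tokens
        }
        // A real loop would also tokio::select! on socket.recv() so the user
        // can intervene ("Stop! That's wrong") mid-run.
    }
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/v1/run", get(ws_handler));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}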

Handling The Halting Problem

Agents love to loop forever.

Thought: “I need to ensure the file exists.”
Action: ls
Observation: file.txt
Thought: “I should verify it again just to be sure.”
Action: ls

Safety Mechanisms:

  1. Step Limit: Hard cap at 20 steps.
  2. Loop Detection: Hash the (Thought, Action) tuple. If seen 3 times, Force Stop or hint “You are repeating yourself”.
  3. Cost Limit: Kill job if Tokens > 50k.
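
Mechanism 2 is a few lines in Rust with std's DefaultHasher; a sketch:

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Default)]
struct LoopDetector {
    // hash of (thought, action) -> number of times observed
    seen: HashMap<u64, u32>,
}

impl LoopDetector {
    /// Returns true once the same (thought, action) pair has been seen 3 times.
    fn is_looping(&mut self, thought: &str, action: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        (thought, action).hash(&mut hasher);
        let count = self.seen.entry(hasher.finish()).or_insert(0);
        *count += 1;
        *count >= 3
    }
}

On a hit, force-stop the run or append the hint “You are repeating yourself” to the prompt.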

Troubleshooting: Common Failures

Scenario 1: The Context Window Overflow

  • Symptom: Agent crashes after 15 steps with 400 Bad Request: Context Length Exceeded.
  • Cause: The prompt includes the entire history of Observations (some might be huge JSON dumps).
  • Fix: Memory Management. Summarize older steps (“Steps 1-10: Searched Google, found nothing.”) and keep only the last 5 raw steps.
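
A sketch of that policy against the memory: Vec<String> from the implementation above; summarize() is a hypothetical helper wrapping one cheap LLM call:

// Compress everything older than the last `keep_raw` steps into one line.
async fn compact_memory(memory: &mut Vec<String>, keep_raw: usize) -> Result<(), anyhow::Error> {
    if memory.len() <= keep_raw {
        return Ok(());
    }
    let split = memory.len() - keep_raw;
    let old_steps: String = memory.drain(..split).collect::<Vec<_>>().join("\n");
    let summary = summarize(&old_steps).await?;
    memory.insert(0, format!("Summary of earlier steps: {}", summary));
    Ok(())
}

// Hypothetical: real code sends old_steps to a small model for a 2-sentence summary.
async fn summarize(text: &str) -> Result<String, anyhow::Error> {
    Ok(format!("({} chars of history compacted)", text.len()))
}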

Scenario 2: Hallucinated Tools

  • Symptom: Action: SendEmail(boss@company.com) -> Error: Tool SendEmail not found.
  • Cause: LLM “guesses” tool names based on training data.
  • Fix: Provide a Strict Schema (OpenAI Function Calling JSON Schema). Reject any action that doesn’t validate.
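
One way to enforce this in the Rust agent: parse actions into a closed enum with serde, so any tool name outside the registry is a hard deserialization error (the tool set here is illustrative):

use serde::Deserialize;

// Closed set of tools: "send_email" or anything else fails to parse.
#[derive(Deserialize, Debug)]
#[serde(tag = "tool", content = "input", rename_all = "snake_case")]
enum ToolCall {
    Calculator(String),
    Search(String),
    Finish(String),
}

fn validate_action(raw: &str) -> Result<ToolCall, serde_json::Error> {
    serde_json::from_str(raw)
}

// validate_action(r#"{"tool": "send_email", "input": "boss@company.com"}"#)
// => Err: unknown variant `send_email` -- reject and re-prompt the model.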

Scenario 3: JSON Parsing Hell

  • Symptom: Agent outputs invalid JSON Action: {"tool": "search", "query": "He said "Hello""}.
  • Cause: LLM fails to escape quotes inside strings.
  • Fix: Use a Grammar-Constrained Decoder (llama.cpp grammars) or robust JSON repair libraries like json_repair in Python.

Scenario 4: The Loop of Death

  • Symptom: Agent repeats “I need to login” 50 times.
  • Cause: Login tool is failing, but Agent ignores the error message “Invalid Password”.
  • Fix: Inject a “Frustration Signal”. If the same tool fails 3 times, append a system message to the prompt: “SYSTEM: You are stuck. Try a different approach or ask the user.”
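
A sketch of the counter behind that signal; in the implementation above it would live next to memory in the Agent struct:

use std::collections::HashMap;

#[derive(Default)]
struct FrustrationTracker {
    consecutive_failures: HashMap<String, u32>,
}

impl FrustrationTracker {
    /// Call after every tool execution; returns a hint to inject, if any.
    fn record(&mut self, tool: &str, failed: bool) -> Option<String> {
        if !failed {
            self.consecutive_failures.remove(tool); // success resets the streak
            return None;
        }
        let n = self.consecutive_failures.entry(tool.to_string()).or_insert(0);
        *n += 1;
        (*n >= 3).then(|| {
            "SYSTEM: You are stuck. Try a different approach or ask the user.".to_string()
        })
    }
}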

Future Trends: Multi-Agent Swarms

Single Agents are “Jack of all trades, master of none”. Swarms (MetaGPT, AutoGen) split the work across specialized roles:

  • Manager Agent: Breaks the task down.
  • Coder Agent: Writes Python.
  • Reviewer Agent: Critiques the code.
  • User Proxy: Executes the code.

The agents talk to each other; the team structure shapes the output, a “Conway’s Law” for AI.


MLOps Interview Questions

  1. Q: How do you evaluate an Agent? A: You can’t use Accuracy. You use Success Rate on a benchmark (GAIA, AgentBench). Did it achieve the goal? Also measure Cost per Success.

  2. Q: Why use Rust for Agents? A: Concurrency. An agent might launch 50 parallel scrapers. Python’s GIL hurts. Rust’s tokio handles thousands of async tools effortlessly.

  3. Q: What is “Reflexion”? A: A pattern where the Agent analyzes its own failure trace (“I failed because of X; next time I will do Y”) and adds this “lesson” to its memory.

  4. Q: How do you handle secrets (API Keys) in Agents? A: Never put keys in the Prompt. The Tool Implementation holds the key. The LLM only outputs CallTool("Search"). The Tool code injects Authorization: Bearer <KEY>. (See the sketch after this list.)

  5. Q: What is “Active Prompting”? A: Using a model to select the most helpful Few-Shot examples from a vector DB for the current specific query, rather than using a static set of examples.
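
A sketch of the secret-handling separation from question 4, using the Tool trait from tools.rs; the SEARCH_API_KEY variable and the mocked request are illustrative assumptions:

use async_trait::async_trait;

pub struct WebSearch;

#[async_trait]
impl Tool for WebSearch {
    fn name(&self) -> &str { "search" }
    fn description(&self) -> &str { "Searches the web." }

    async fn execute(&self, input: &str) -> Result<String, anyhow::Error> {
        // The credential never enters the prompt; it lives in the environment
        // and is attached here, inside the tool implementation.
        let key = std::env::var("SEARCH_API_KEY")?;
        // Real code: a reqwest GET with .header("Authorization", format!("Bearer {}", key))
        let _ = (input, &key); // mocked for the example
        Ok("mocked search results".to_string())
    }
}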


Glossary

  • ReAct: Reasoning and Acting pattern.
  • Context Window: The maximum text an LLM can process (memory limit).
  • Function Calling: A fine-tuned capability of LLMs to output structured JSON matching a signature.
  • Reflexion: An agent architecture that includes a self-critique loop.

Summary Checklist

  1. Tracing: Integrate LangSmith or Arize Phoenix. You cannot debug agents with print(). You need a Trace View.
  2. Human-in-the-Loop: Always implement a ask_user tool. If the agent gets stuck, it should be able to ask for help.
  3. Timeout: Set a 5-minute timeout on tool execution (e.g. a scraper that hangs); see the sketch after this checklist.
  4. Sandbox: Never let an agent run rm -rf / on your production server. Run tools in Docker containers.
  5. Cost: Monitor tokens per task. Agents can burn $100 in 5 minutes if they loop.
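
Item 3 as code, with tokio's built-in timeout wrapping the Tool trait from tools.rs:

use std::time::Duration;
use tokio::time::timeout;

// A hung scraper becomes an Observation the agent can react to,
// instead of a stuck loop.
async fn execute_with_timeout(tool: &dyn Tool, input: &str) -> String {
    match timeout(Duration::from_secs(300), tool.execute(input)).await {
        Ok(Ok(obs)) => obs,
        Ok(Err(e)) => format!("Tool Error: {}", e),
        Err(_) => "Tool Error: timed out after 5 minutes".to_string(),
    }
}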