Architectures · W05 · 23 min read

Agent Architectures: ReAct, Plan-and-Execute, Reflexion, LATS

Systematic comparison of dominant patterns. ReAct interleaves reasoning with action; Plan-and-Execute commits to a plan and replans on drift; Reflexion learns from failures verbally; LATS adds Monte Carlo Tree Search to reasoning.

Core concepts: ReAct loop, Plan-Execute-Replan, Reflexion

Learning Objectives

By the end of this lecture, students will be able to:

Describe the ReAct architecture and explain why interleaving reasoning with action improves agent performance.
Compare Plan-and-Execute, Reflexion, and LATS architectures and articulate when each is most appropriate.
Implement a minimal ReAct agent from scratch in Python (no frameworks).
Analyze the observe-think-act loop as the common pattern underlying all agent architectures.
Evaluate hybrid and custom architectures for specific use cases.
Compare modern agent frameworks (LangGraph, OpenAI Agents SDK, CrewAI) and understand how they implement the core architectural patterns.
Describe emerging patterns such as Agent-as-a-Service, Agentic RAG, and self-improving agents.

1. The Need for Agent Architectures

1.1 Beyond the Simple Loop

In Week 1, we built a minimal agent: call the LLM, check for tool calls, execute them, repeat. This works for simple tasks, but it breaks down when:

The task requires multi-step planning (write a research report with 10 sections).
The agent makes mistakes that it needs to detect and correct.
The agent needs to explore multiple approaches and pick the best one.
The task requires long-horizon reasoning where the agent must stay on track across many steps.

Consider an analogy. The simple agent loop is like cooking by looking at the ingredients in front of you and deciding what to do next. It works for scrambled eggs. But for a five-course dinner for 30 people, you need a recipe (plan), a schedule (dependencies), taste tests along the way (evaluation), and the flexibility to adjust when something goes wrong (replanning). Agent architectures provide this structure.

Agent architectures provide structured patterns for how the LLM reasons, acts, and learns from feedback. They are to agent development what design patterns are to software engineering: proven solutions to recurring problems.

This is arguably the most important lecture in the course. The architectures you learn this week (ReAct, Plan-and-Execute, Reflexion, and LATS) form the core vocabulary of agent design. Most production agents you encounter use one of these patterns directly or combine several of them.

1.2 The Observe-Think-Act Loop

Every agent architecture is a variation of the same fundamental loop:

Interactive · The Observe-Think-Act Loop

Every agent architecture rides on the same cycle: perceive, reason, act, observe. The architectures you'll meet later are variations on this loop.

The architectures differ in:

How much thinking happens before acting.
Whether the agent reflects on its actions after the fact.
How the agent explores alternative paths.
Whether planning is separated from execution.
How errors are handled and incorporated.
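
The loop that all of these variations share can be sketched without any LLM at all. The following is a minimal illustration, not a reference implementation: `run_loop`, the `Decision` tuple shape, and the scripted `think` function are hypothetical names chosen for this example, and the scripted function stands in for a model call so the sketch runs offline.

```python
from typing import Callable

# Hypothetical convention for this sketch: a decision is either
# ("act", tool_name, tool_input) or ("finish", answer).
Decision = tuple


def run_loop(
    task: str,
    think: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Generic observe-think-act loop with a hard step limit."""
    history = [f"Task: {task}"]              # everything observed so far
    for _ in range(max_steps):
        decision = think(history)             # think: choose the next move
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        tool = tools.get(tool_name)
        if tool is None:
            observation = f"Error: unknown tool {tool_name}"
        else:
            observation = tool(tool_input)    # act: run the chosen tool
        history.append(f"Action: {tool_name}[{tool_input}]")
        history.append(f"Observation: {observation}")  # observe the result
    return "Stopped: step limit reached."


def scripted_think(history: list[str]) -> Decision:
    """Stand-in for the LLM: act once, then finish with what was observed."""
    if not any(line.startswith("Observation:") for line in history):
        return ("act", "Echo", "hello")
    return ("finish", history[-1].removeprefix("Observation: "))


result = run_loop("say hello", scripted_think, {"Echo": lambda s: s.upper()})
print(result)
```

Each architecture in this lecture changes what happens inside `think` or wraps this loop in an outer one (planning before it, reflecting after it, or searching over several copies of it).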

2. ReAct: Reasoning + Acting

2.1 The Paper

Yao et al. (2023) introduced ReAct (Reasoning + Acting) in "ReAct: Synergizing Reasoning and Acting in Language Models," published at ICLR 2023. This paper is foundational — it established the dominant pattern for LLM-based agents.

2.2 The Core Idea

Before ReAct, there were two separate lines of research:

Chain-of-Thought (CoT): LLMs reason step by step but do not take actions.
Action-only agents: Agents take actions based on observations but do not explicitly reason.

ReAct's insight: interleave reasoning (thoughts) with actions in a single generation.

Interactive · The ReAct Architecture

ReAct simulator: pick a task and step through the agent's trace as it interleaves thought, action, and observation while its memory and tool history fill up. A complete trace looks like this:

text

Question: What is the elevation of the birthplace of the inventor of the telephone?

Thought 1: I need to find out who invented the telephone.
Action 1: Search[inventor of the telephone]
Observation 1: Alexander Graham Bell is credited with inventing the telephone.

Thought 2: Now I need to find where Alexander Graham Bell was born.
Action 2: Search[Alexander Graham Bell birthplace]
Observation 2: Alexander Graham Bell was born in Edinburgh, Scotland.

Thought 3: Now I need to find the elevation of Edinburgh, Scotland.
Action 3: Search[elevation of Edinburgh Scotland]
Observation 3: Edinburgh has an average elevation of about 47 meters (154 feet).

Thought 4: I have all the information I need.
Action 4: Finish[47 meters (154 feet)]

2.3 Why ReAct Works

Reasoning and acting reinforce each other, so the combination is worth more than the sum of its parts. Each component makes the other better:

Reasoning grounds actions: The thought step explains why the agent is taking a particular action, reducing random or irrelevant tool calls. Without reasoning, an agent might call a search tool with a vague query. With reasoning, it thinks "I need to find the birthplace specifically, so I should search for 'Alexander Graham Bell birthplace'" and produces a focused, effective query.
Actions ground reasoning: Tool results provide factual information that keeps the reasoning chain accurate, reducing hallucination. Without actions, the model might "reason" using fabricated facts. With actions, it verifies its reasoning against real data.
Transparency: The interleaved thoughts make the agent's decision process inspectable and debuggable. When an agent makes a mistake, you can read its thoughts and understand where the reasoning went wrong.
Error recovery: When an observation is unexpected, the agent can reason about what went wrong and adjust. "The search returned no results. Let me try a different query."

Key Insight: ReAct's magic is in the interleaving. Pure reasoning without actions leads to hallucination. Pure action without reasoning leads to aimless tool use. The combination keeps both on track.

2.4 ReAct vs. Alternatives

Yao et al. (2023) compared ReAct against several baselines on knowledge-intensive tasks (HotpotQA, FEVER) and decision-making tasks (ALFWorld, WebShop):

Approach	Strengths	Weaknesses
CoT only (no actions)	Good reasoning on simple problems	Hallucinates facts, cannot access current info
Act only (no reasoning)	Can use tools	Makes random or suboptimal tool choices
ReAct (reasoning + acting)	Grounded reasoning, better tool use	Higher token cost, can get stuck in loops
CoT + Self-Consistency	High accuracy on reasoning tasks	Cannot take actions, expensive

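In the paper's results, ReAct beat the action-only baseline on all four benchmarks and beat CoT on FEVER. On HotpotQA, CoT alone scored slightly higher than ReAct alone, and the best results on the knowledge tasks came from combining ReAct with CoT self-consistency. Both findings point the same way: thinking and doing complement each other.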

2.5 Minimal ReAct Implementation (No Frameworks)

python

"""
Minimal ReAct agent implemented from scratch.

No LangChain, no LangGraph, no frameworks — just the OpenAI API
and Python. This implementation demonstrates the core ReAct pattern.
"""

import json
import re
from openai import OpenAI

client = OpenAI()

# --- Tool Implementations ---

def search(query: str) -> str:
    """Simulated search tool."""
    # In production, use a real search API
    knowledge = {
        "python creator": "Python was created by Guido van Rossum, first released in 1991.",
        "guido van rossum": "Guido van Rossum is a Dutch programmer, born in Haarlem, Netherlands, on January 31, 1956.",
        "haarlem netherlands": "Haarlem is a city in the Netherlands, capital of North Holland province. Population approximately 162,000.",
        "haarlem elevation": "Haarlem sits at an elevation of approximately 1 meter above sea level, in the low-lying Netherlands.",
        "eiffel tower height": "The Eiffel Tower is 330 meters tall (including antenna) or 300 meters to the roof.",
        "transformer paper": "The Transformer was introduced in 'Attention Is All You Need' by Vaswani et al. (2017) at NeurIPS.",
        "react paper": "ReAct was published by Yao et al. (2023) at ICLR. It synergizes reasoning and acting in LLMs.",
    }

    query_lower = query.lower()
    for key, value in knowledge.items():
        if key in query_lower:
            return value

    return f"No results found for: {query}"


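def calculator(expression: str) -> str:
    """Evaluate a math expression with a restricted eval().

    Removing builtins limits what the expression can reach, but eval() on
    model-generated text is not a real sandbox. Treat this as a demo only.
    """
    import math
    allowed = {"__builtins__": {}, "math": math, "abs": abs, "round": round,
               "sqrt": math.sqrt, "pi": math.pi, "e": math.e}
    try:
        result = eval(expression, allowed)
        return str(result)
    except Exception as e:
        return f"Error: {e}"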


def finish(answer: str) -> str:
    """Signal that the agent has reached a final answer."""
    return answer


TOOLS = {
    "Search": search,
    "Calculator": calculator,
    "Finish": finish,
}


# --- The ReAct Prompt ---

REACT_SYSTEM_PROMPT = """You are a helpful assistant that answers questions using a Thought-Action-Observation loop.

You have access to the following tools:
- Search[query]: Search for information. Input is a search query string.
- Calculator[expression]: Calculate a mathematical expression. Input is a Python math expression.
- Finish[answer]: Return the final answer. Use this when you have enough information.

You MUST follow this EXACT format for each step:

Thought: <your reasoning about what to do next>
Action: <ToolName>[<input>]

Then you will receive:
Observation: <result of the action>

Important rules:
1. Always start with a Thought before taking an Action.
2. Each response should contain exactly ONE Thought and ONE Action.
3. Use Search when you need factual information.
4. Use Calculator when you need precise computations.
5. Use Finish[answer] when you have enough information to answer the question.
6. If a search returns unhelpful results, try rephrasing the query.
7. Do not make up information — always search for facts you are unsure about.
"""


# --- The ReAct Agent ---

class ReActAgent:
    """A minimal ReAct agent."""

    def __init__(self, model: str = "gpt-4o", max_steps: int = 10, verbose: bool = True):
        self.model = model
        self.max_steps = max_steps
        self.verbose = verbose

    def run(self, question: str) -> str:
        """Run the ReAct loop to answer a question."""

        messages = [
            {"role": "system", "content": REACT_SYSTEM_PROMPT},
            {"role": "user", "content": f"Question: {question}"},
        ]

        trajectory = []  # For logging

        for step in range(self.max_steps):
            if self.verbose:
                print(f"\n{'='*50}")
                print(f"Step {step + 1}")
                print(f"{'='*50}")

            # Get the next thought and action from the LLM
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.0,
                max_tokens=500,
            )

            # content can be None (for example on a refusal), so fall back to ""
            assistant_text = response.choices[0].message.content or ""
            messages.append({"role": "assistant", "content": assistant_text})

            if self.verbose:
                print(assistant_text)

            # Parse the action
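            # Greedy (.*) runs to the last "]" on the line, so inputs that
            # contain brackets themselves (e.g. "xs[0] + 1") are not cut short.
            action_match = re.search(r'Action:\s*(\w+)\[(.*)\]', assistant_text)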

            if not action_match:
                # The model did not follow the format — try to extract an answer
                if self.verbose:
                    print("  [Warning: Could not parse action, attempting recovery]")

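                # Accept an explicit "Final answer: ..." line as a fallback.
                # Matching the bare word "answer" would also fire on thoughts
                # such as "to answer this I need ...", so require the full phrase.
                finish_match = re.search(r'final answer\s*:\s*(.+)', assistant_text, re.IGNORECASE)
                if finish_match:
                    return finish_match.group(1).strip()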

                # Ask the model to reformulate
                messages.append({
                    "role": "user",
                    "content": "Please follow the required format: Thought: <reasoning>\nAction: <ToolName>[<input>]"
                })
                continue

            tool_name = action_match.group(1)
            tool_input = action_match.group(2)

            # Check if this is the Finish action
            if tool_name == "Finish":
                trajectory.append({
                    "step": step + 1,
                    "thought": assistant_text.split("Action:")[0].strip(),
                    "action": f"Finish[{tool_input}]",
                    "observation": "DONE",
                })
                if self.verbose:
                    print(f"\nObservation: Final answer reached.")
                return tool_input

            # Execute the tool
            if tool_name in TOOLS:
                observation = TOOLS[tool_name](tool_input)
            else:
                observation = f"Error: Unknown tool '{tool_name}'. Available tools: {list(TOOLS.keys())}"

            if self.verbose:
                print(f"\nObservation: {observation}")

            # Record the trajectory
            trajectory.append({
                "step": step + 1,
                "thought": assistant_text.split("Action:")[0].strip(),
                "action": f"{tool_name}[{tool_input}]",
                "observation": observation,
            })

            # Feed the observation back
            messages.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })

        return "Agent reached maximum steps without finding an answer."


# --- Usage ---
if __name__ == "__main__":
    agent = ReActAgent(verbose=True)

    # Example 1: Multi-hop question requiring multiple searches
    print("\n" + "=" * 70)
    print("QUESTION 1: Multi-hop factual question")
    print("=" * 70)
    answer = agent.run(
        "What is the elevation of the birthplace of the creator of Python?"
    )
    print(f"\nFINAL ANSWER: {answer}")

    # Example 2: Question requiring search + calculation
    print("\n" + "=" * 70)
    print("QUESTION 2: Search + calculation")
    print("=" * 70)
    answer = agent.run(
        "How many Eiffel Towers stacked on top of each other would it take to reach the cruising altitude of a commercial airplane (35,000 feet)?"
    )
    print(f"\nFINAL ANSWER: {answer}")
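
The calculator tool in the listing passes model-written text to `eval`, which is convenient for a demo but hard to make safe. One possible alternative, sketched here under the assumption that only basic arithmetic and a few math helpers are needed, is to parse the expression with the standard `ast` module and evaluate only node types on an allow-list. The names `safe_calculator`, `_eval`, and the allow-list tables are hypothetical and not part of the original agent.

```python
import ast
import math
import operator

# Allow-lists: anything not named here is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "abs": abs, "round": round}
_CONSTS = {"pi": math.pi, "e": math.e}


def _eval(node: ast.AST):
    """Recursively evaluate an expression tree, rejecting unknown nodes."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculator(expression: str) -> str:
    """Evaluate arithmetic without eval(); errors come back as text."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree.body))
    except Exception as exc:
        return f"Error: {exc}"


print(safe_calculator("2 + 3 * 4"))
print(safe_calculator("__import__('os').getcwd()"))
```

This still does not bound computation cost (a very large exponent can take a long time), so a production tool would likely add limits on operand size or run the evaluation with a timeout.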

2.6 Analyzing the Implementation

Key design decisions in this ReAct agent:

Text-based tool dispatch: Instead of using the API's native function calling, we use a text format (Action: ToolName[input]) and parse it with regex. This is closer to the original ReAct paper and works with any LLM, including open-source models.
Strict format enforcement: The system prompt specifies the exact format, and we handle format violations gracefully (asking the model to try again).
Observation as user message: Tool results are fed back as "user" messages prefixed with "Observation:". This maintains the Thought-Action-Observation cycle.
Maximum steps: A hard limit prevents infinite loops.
Verbose mode: The trajectory is printed for debugging and education.
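
The text-based dispatch described in the first point can be isolated into a small parser, which makes the format handling easy to test on its own. This is a sketch: `parse_action` and `ACTION_RE` are hypothetical names. Note the choice between a greedy and a non-greedy capture group; a non-greedy group stops at the first closing bracket, whereas the greedy one used here runs to the last closing bracket on the line.

```python
import re
from typing import Optional

# "." does not match newlines, so the greedy group stays on the Action line.
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def parse_action(text: str) -> Optional[tuple[str, str]]:
    """Return (tool_name, tool_input), or None if no action is present."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(parse_action("Thought: find the birthplace\nAction: Search[Alexander Graham Bell birthplace]"))
print(parse_action("Action: Calculator[abs(-3) + [1, 2][0]]"))
print(parse_action("I think the answer is 42"))
```

The trade-off is that a greedy match will over-capture if the model puts two bracketed actions on one line, which is one reason the prompt insists on exactly one Action per response.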

2.7 Limitations of ReAct

Understanding ReAct's limitations is important because they motivate the more sophisticated architectures that follow:

No explicit planning: ReAct is reactive; it takes one step at a time without a global plan. It is like navigating a city by looking at the next intersection rather than consulting a map. This works for short trips but leads to inefficient routes for long journeys.
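No structured self-correction: ReAct can react to a surprising observation in its next thought, but it has no dedicated step for reviewing a trajectory that went wrong. If the agent takes a wrong turn, it tends to keep moving forward, potentially building on flawed intermediate results.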
Greedy: It follows a single path and does not explore alternatives. If the first approach does not work, the agent has no mechanism for trying a fundamentally different strategy.
Context growth: Each step adds tokens to the context, eventually hitting limits. For long tasks (20+ steps), the early context may be lost or degraded.
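
One simple way to limit context growth is to keep the system prompt and the original question and drop the oldest intermediate turns. The sketch below assumes the message layout used in the ReAct listing (system message first, question second, then alternating assistant and observation messages); `trim_history` and the `keep_last` default are hypothetical, and truncation is only one option (summarising old turns is another).

```python
def trim_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the first two messages plus the most recent `keep_last` turns."""
    head = messages[:2]           # system prompt + original question
    tail = messages[2:]           # thoughts, actions, and observations
    if len(tail) <= keep_last:
        return list(messages)
    dropped = len(tail) - keep_last
    # Tell the model that something was removed so the gap is not silent.
    note = {"role": "user", "content": f"[{dropped} earlier messages omitted]"}
    return head + [note] + tail[-keep_last:]


history = [
    {"role": "system", "content": "You are a ReAct agent."},
    {"role": "user", "content": "Question: ..."},
]
for i in range(10):
    role = "assistant" if i % 2 == 0 else "user"
    history.append({"role": role, "content": f"turn {i}"})

trimmed = trim_history(history, keep_last=4)
print(len(history), "->", len(trimmed))
```

Using an even `keep_last` keeps each retained action together with its observation, since the turns alternate.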

Try It Yourself: Run the ReAct implementation above with a multi-hop question like "What is the population of the country where the inventor of the World Wide Web was born?" Observe: (1) How many steps does the agent take? (2) Does it ever lose track of the original question? (3) If a search returns unhelpful results, does it recover? These observations will help you understand why more sophisticated architectures are sometimes needed.

The architectures in the rest of this lecture address these limitations.

3. Plan-and-Execute

3.1 The Idea

Plan-and-Execute separates planning from execution into two distinct phases:

Planning phase: The LLM generates a complete plan (list of steps) before taking any action.
Execution phase: Each step is executed, potentially with replanning if results deviate from expectations.
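
The control flow of the two phases can be sketched independently of any model by passing the planner, executor, and replanner in as plain functions. Everything here is illustrative: `plan_execute` and the three `fake_*` stand-ins are hypothetical, and the stand-ins replace what would be LLM calls in a real agent.

```python
from typing import Callable, Optional


def plan_execute(
    goal: str,
    make_plan: Callable[[str], list[str]],
    run_step: Callable[[str], str],
    replan: Callable[[list[tuple[str, str]], list[str]], Optional[list[str]]],
    max_steps: int = 20,
) -> list[tuple[str, str]]:
    """Run a plan step by step; `replan` may replace the remaining steps."""
    remaining = list(make_plan(goal))            # planning phase
    done: list[tuple[str, str]] = []
    while remaining and len(done) < max_steps:   # execution phase
        step = remaining.pop(0)
        done.append((step, run_step(step)))
        revised = replan(done, remaining)        # None means "keep the plan"
        if revised is not None:
            remaining = list(revised)
    return done


def fake_plan(goal: str) -> list[str]:
    return ["fetch data", "analyse", "draft"]


def fake_run(step: str) -> str:
    return "FAILED" if step == "fetch data" else "ok"


def fake_replan(done, remaining):
    last_step, last_result = done[-1]
    if last_result == "FAILED" and not last_step.startswith("retry"):
        return [f"retry {last_step} from backup source"] + remaining
    return None


trace = plan_execute("report", fake_plan, fake_run, fake_replan)
for step, result in trace:
    print(f"{step} -> {result}")
```

The `max_steps` cap matters once replanning is allowed, because a replanner that keeps adding steps could otherwise run forever.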

Interactive · Plan-and-Execute Architecture

Plan-Execute-Replan: the agent plans, executes step by step, and replans the moment it drifts from the goal.

Example goal: Deliver an executive report with verified data.
Example plan: 1. Scope the task; 2. Fetch data; 3. Analyse; 4. Draft; 5. Review & deliver.

3.2 Implementation

python

"""
Plan-and-Execute agent architecture.

Separates high-level planning from step-by-step execution.
"""

import json
from openai import OpenAI

client = OpenAI()


def create_plan(question: str) -> list[str]:
    """Use the LLM to create a plan for answering the question."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a planning agent. Given a question, create a "
                    "step-by-step plan to answer it. Each step should be a "
                    "clear, actionable instruction.\n\n"
                    "Available tools:\n"
                    "- Search the web for information\n"
                    "- Calculate mathematical expressions\n"
                    "- Read files from the workspace\n\n"
                    # JSON mode (response_format below) returns an object, not a
                    # bare array, so ask for the steps under a "steps" key.
                    'Return a JSON object of the form {"steps": ["...", "..."]}, '
                    "where each string is one step. Return ONLY the JSON object."
                )
            },
            {"role": "user", "content": f"Question: {question}"}
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )

    result = json.loads(response.choices[0].message.content)
    # Handle both {"steps": [...]} and {"plan": [...]} formats
    if isinstance(result, dict):
        steps = result.get("steps") or result.get("plan") or list(result.values())[0]
    else:
        steps = result
    return steps


def execute_step(step: str, context: str, tools: list[dict]) -> str:
    """Execute a single step of the plan, using tools as needed."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an execution agent. Complete the given step "
                    "using the available tools. Provide the result clearly.\n\n"
                    f"Context from previous steps:\n{context}"
                )
            },
            {"role": "user", "content": f"Execute this step: {step}"}
        ],
        tools=tools,
        tool_choice="auto",
        temperature=0.0,
    )
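    # NOTE: this sketch stops at the model's first reply. If the model requests
    # a tool (message.tool_calls), a full agent would run it and send the result
    # back to the model before reading the final content.
    return response.choices[0].message.content or "[Tool call made]"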


def should_replan(original_plan: list[str], completed_steps: list[dict], remaining_steps: list[str]) -> tuple[bool, list[str]]:
    """Check if the plan needs adjustment based on execution results."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a replanning agent. Given the original plan and "
                    "the results of completed steps, determine if the remaining "
                    "steps need to be adjusted.\n\n"
                    "Respond with a JSON object:\n"
                    '{"needs_replan": true/false, "new_remaining_steps": [...]}\n\n'
                    "Only set needs_replan to true if the results significantly "
                    "deviate from what was expected."
                )
            },
            {
                "role": "user",
                "content": json.dumps({
                    "original_plan": original_plan,
                    "completed": completed_steps,
                    "remaining": remaining_steps,
                })
            }
        ],
        response_format={"type": "json_object"},
        temperature=0.0,
    )

    result = json.loads(response.choices[0].message.content)
    return result.get("needs_replan", False), result.get("new_remaining_steps", remaining_steps)


def plan_and_execute(question: str, tools: list[dict], verbose: bool = True) -> str:
    """Run the Plan-and-Execute agent."""

    # Phase 1: Create the plan
    if verbose:
        print("PHASE 1: PLANNING")
        print("-" * 40)

    plan = create_plan(question)

    if verbose:
        for i, step in enumerate(plan):
            print(f"  Step {i+1}: {step}")

    # Phase 2: Execute each step
    if verbose:
        print(f"\nPHASE 2: EXECUTION")
        print("-" * 40)

    completed = []
    remaining = list(plan)
    context = ""

    while remaining:
        current_step = remaining.pop(0)
        step_num = len(completed) + 1

        if verbose:
            print(f"\n  Executing Step {step_num}: {current_step}")

        result = execute_step(current_step, context, tools)

        if verbose:
            print(f"  Result: {result[:200]}...")

        completed.append({"step": current_step, "result": result})
        context += f"\nStep {step_num}: {current_step}\nResult: {result}\n"

        # Check if replanning is needed (every 2 steps)
        if remaining and len(completed) % 2 == 0:
            needs_replan, new_remaining = should_replan(plan, completed, remaining)
            if needs_replan:
                if verbose:
                    print(f"\n  [REPLANNING] Adjusting remaining steps")
                    for i, step in enumerate(new_remaining):
                        print(f"    New Step {len(completed)+i+1}: {step}")
                remaining = new_remaining

    # Phase 3: Synthesize the final answer
    if verbose:
        print(f"\nPHASE 3: SYNTHESIS")
        print("-" * 40)

    synthesis = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Synthesize the results of the completed steps into a clear, complete answer to the original question."
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nCompleted steps and results:\n{context}"
            }
        ],
        temperature=0.0,
    )

    return synthesis.choices[0].message.content

3.3 When to Use Plan-and-Execute

Scenario	ReAct	Plan-and-Execute
Simple factual questions	Preferred (less overhead)	Overkill
Multi-step research tasks	Works but may lose focus	Preferred (stays on track)
Tasks with clear dependencies	Adequate	Preferred (explicit ordering)
Highly dynamic tasks	Preferred (more adaptive)	May plan poorly if environment changes
Tasks requiring many tools	Can struggle with selection	Better (plan clarifies which tools)

4. Reflexion: Verbal Reinforcement Learning

4.1 The Paper

Shinn et al. (2023) introduced Reflexion in "Reflexion: Language Agents with Verbal Reinforcement Learning," published at NeurIPS 2023. The key insight: agents can improve by reflecting on their failures in natural language.

4.2 The Architecture

Reflexion adds a self-reflection step after each failed attempt:

Interactive · The Reflexion Architecture

Learn from failure, no retraining: an actor agent generates an attempt, a critic flags the error, and the actor rewrites the next attempt using a verbal reflection.

4.3 How Reflexion Works

Attempt: The agent tries to complete the task.
Evaluate: An evaluator checks if the result is correct (this can be automated for tasks with ground truth, or use an LLM for open-ended tasks).
Reflect: If the attempt failed, the agent generates a natural language reflection analyzing what went wrong and how to improve.
Retry: The agent tries again, with previous reflections added to its context.
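
For tasks with ground truth, the Evaluate step does not need an LLM. The sketch below shows one way to automate it for coding tasks by running a candidate function against known input/output pairs and returning a `(passed, feedback)` pair that the Reflect step can consume. The names `evaluate_with_cases`, `attempt_1`, `attempt_2`, and `CASES` are hypothetical, and running model-generated code like this is only reasonable inside a sandbox.

```python
from typing import Any, Callable


def evaluate_with_cases(
    func: Callable[..., Any],
    cases: list[tuple[tuple, Any]],
) -> tuple[bool, str]:
    """Run `func` on each (args, expected) pair; report the first failure."""
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:   # a crash counts as a failed attempt
            return False, f"{func.__name__}{args} raised {type(exc).__name__}: {exc}"
        if actual != expected:
            return False, f"{func.__name__}{args} returned {actual!r}, expected {expected!r}"
    return True, "All cases passed."


# Two hypothetical attempts at "return the largest element, or None if empty".
def attempt_1(xs):
    return sorted(xs)[-1]            # breaks on an empty list


def attempt_2(xs):
    return max(xs) if xs else None


CASES = [(([3, 1, 2],), 3), (([],), None)]
print(evaluate_with_cases(attempt_1, CASES))
print(evaluate_with_cases(attempt_2, CASES))
```

The feedback string is deliberately specific (which input failed and how), because that detail is what makes the following reflection useful.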

4.4 Implementation

python

"""
Reflexion agent: learns from failures through self-reflection.
"""

import json
from openai import OpenAI

client = OpenAI()


class ReflexionAgent:
    """An agent that improves through verbal self-reflection."""

    def __init__(self, model: str = "gpt-4o", max_attempts: int = 3):
        self.model = model
        self.max_attempts = max_attempts
        self.reflections = []  # Persistent memory of past reflections

    def attempt(self, task: str, attempt_num: int) -> str:
        """Make one attempt at the task."""

        reflection_context = ""
        if self.reflections:
            reflection_context = "\n\nLessons from previous attempts:\n"
            for i, ref in enumerate(self.reflections):
                reflection_context += f"\nAttempt {i+1} reflection:\n{ref}\n"

        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a problem-solving agent. Solve the given task carefully. "
                        "Think step by step.\n"
                        f"{reflection_context}"
                        "\nIf there are lessons from previous attempts, "
                        "make sure to incorporate them into your approach."
                    )
                },
                {"role": "user", "content": task}
            ],
            temperature=0.0 if attempt_num == 0 else 0.3,  # More exploration on retries
        )

        return response.choices[0].message.content

    def evaluate(self, task: str, response: str) -> tuple[bool, str]:
        """Evaluate whether the response correctly addresses the task."""
        eval_response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a strict evaluator. Assess whether the response "
                        "correctly and completely answers the task. "
                        "Respond with JSON: {\"correct\": true/false, \"feedback\": \"explanation\"}"
                    )
                },
                {
                    "role": "user",
                    "content": f"Task: {task}\n\nResponse:\n{response}"
                }
            ],
            response_format={"type": "json_object"},
            temperature=0.0,
        )

        result = json.loads(eval_response.choices[0].message.content)
        return result.get("correct", False), result.get("feedback", "")

    def reflect(self, task: str, response: str, feedback: str) -> str:
        """Generate a reflection on what went wrong and how to improve."""
        reflection_response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a self-reflection agent. Analyze the failed attempt "
                        "and provide specific, actionable insights for improvement.\n\n"
                        "Your reflection should:\n"
                        "1. Identify the specific error or gap.\n"
                        "2. Explain WHY the error occurred.\n"
                        "3. Provide a concrete strategy for the next attempt.\n"
                        "Be concise but specific."
                    )
                },
                {
                    "role": "user",
                    "content": (
                        f"Task: {task}\n\n"
                        f"My response:\n{response}\n\n"
                        f"Evaluation feedback: {feedback}"
                    )
                }
            ],
            temperature=0.0,
        )

        return reflection_response.choices[0].message.content

    def run(self, task: str, verbose: bool = True) -> str:
        """Run the Reflexion loop."""

        for attempt_num in range(self.max_attempts):
            if verbose:
                print(f"\n{'='*50}")
                print(f"ATTEMPT {attempt_num + 1}")
                print(f"{'='*50}")

            # Step 1: Make an attempt
            response = self.attempt(task, attempt_num)
            if verbose:
                print(f"\nResponse:\n{response[:500]}...")

            # Step 2: Evaluate
            correct, feedback = self.evaluate(task, response)
            if verbose:
                print(f"\nEvaluation: {'PASS' if correct else 'FAIL'}")
                print(f"Feedback: {feedback}")

            if correct:
                if verbose:
                    print(f"\nTask completed successfully on attempt {attempt_num + 1}!")
                return response

            # Step 3: Reflect (only if failed and more attempts remain)
            if attempt_num < self.max_attempts - 1:
                reflection = self.reflect(task, response, feedback)
                self.reflections.append(reflection)
                if verbose:
                    print(f"\nReflection:\n{reflection}")

        if verbose:
            print(f"\nMax attempts ({self.max_attempts}) reached.")
        return response  # Return the last attempt


# Usage
if __name__ == "__main__":
    agent = ReflexionAgent(max_attempts=3)

    task = """
    Write a Python function that takes a list of integers and returns
    the longest increasing subsequence. The function should handle
    edge cases: empty list, single element, all same elements,
    and already sorted list. Include type hints and a docstring.
    """

    result = agent.run(task)
    print(f"\nFinal result:\n{result}")

4.5 Key Insight: Verbal Reinforcement

Traditional RL uses scalar rewards (numbers) to update model weights. Reflexion instead uses verbal feedback (natural language reflections) stored in the agent's context, and leaves the weights untouched.

Key Insight: Reflexion is "learning" without changing any model parameters. The model's weights are frozen; what changes is the context the model operates in. By adding reflections like "Last time I forgot to handle the edge case of empty input" to the context, the model's effective behavior changes even though the model itself has not changed. This is analogous to a student who does not get smarter between attempts but takes better notes about their mistakes.

This approach is powerful because:

No weight updates needed (works with frozen models, including closed-source APIs).
Reflections carry rich information (not just "good" or "bad" but "wrong because X, try Y instead"). A scalar reward of 0.3 tells the model almost nothing; a reflection like "The function failed on empty lists because I used index [0] without checking length" tells it exactly what to fix.
Reflections accumulate across attempts, creating a growing knowledge base that makes each subsequent attempt more informed.

4.6 Limitations of Reflexion

Requires a reliable evaluator (hard for open-ended tasks).
Multiple attempts multiply cost and latency.
Context window fills up with reflections on long tasks.
Works best when the model "almost" gets it right — does not help if the model fundamentally lacks the capability.
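
One common way to keep reflections from filling the context window is to cap how many are kept. The sketch below assumes a cap of three and a simple "drop the oldest" policy; both are illustrative choices, and a summarizing or relevance-based policy would work as well.

```python
from collections import deque

MAX_REFLECTIONS = 3  # assumed cap; tune to the model's context budget

# A deque with maxlen silently evicts the oldest entry once it is full.
memory: deque[str] = deque(maxlen=MAX_REFLECTIONS)

for attempt in range(1, 6):
    memory.append(f"Reflection from attempt {attempt}")

print(list(memory))
```

After five attempts only the three most recent reflections remain, so the prompt size stays bounded regardless of how many retries occur.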

5. Language Agent Tree Search (LATS)

5.1 The Paper

Zhou et al. (2024) introduced LATS (Language Agent Tree Search) in "Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models," published at ICML 2024. LATS combines tree search (like Monte Carlo Tree Search used in AlphaGo) with LLM agents.

5.2 The Core Idea

LATS treats agent decision-making as a tree search problem:

[Interactive figure: Language Agent Tree Search (LATS), evaluation-guided search. LATS expands promising nodes via UCB and prunes the rest; nodes with a high UCB score are the paths LATS prioritises.]

Each search iteration performs four steps:

Select: Choose the most promising node to expand (based on UCB1 or similar criteria).
Expand: Generate possible next actions.
Evaluate: Assess how promising each action is.
Backpropagate: Update the value estimates of parent nodes.
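
The select and backpropagate steps can be tried on a toy tree with hand-picked scores, with no LLM involved. This is a minimal sketch: `ToyNode`, the scores, and the exploration constant 1.4 are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyNode:
    name: str
    parent: Optional["ToyNode"] = None
    value: float = 0.0   # sum of scores backpropagated through this node
    visits: int = 0
    children: list["ToyNode"] = field(default_factory=list)


def ucb1(node: ToyNode, c: float = 1.4) -> float:
    """Mean value plus an exploration bonus that shrinks with visits."""
    if node.visits == 0:
        return float("inf")  # unvisited nodes are always tried first
    mean = node.value / node.visits
    bonus = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return mean + bonus


def backpropagate(node: Optional[ToyNode], score: float) -> None:
    """Add the score to the node and to every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.value += score
        node = node.parent


root = ToyNode("root")
a, b, c_node = (ToyNode(n, parent=root) for n in "abc")
root.children = [a, b, c_node]

# Pretend the evaluator scored three visits to a and one visit to b.
for score in (0.9, 0.8, 0.7):
    backpropagate(a, score)
backpropagate(b, 0.6)

selected = max(root.children, key=ucb1)
print(selected.name)
```

The unvisited child is selected first. Among the visited children, b outranks a even though its mean score is lower, because it has been tried less often and so receives a larger exploration bonus.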

5.3 How LATS Differs from Other Architectures

Feature	ReAct	Tree of Thoughts	LATS
Search strategy	Linear (greedy)	Breadth-first/DFS	Monte Carlo Tree Search
Environment interaction	Yes (tools)	No (pure reasoning)	Yes (tools + reasoning)
Backtracking	No	Yes	Yes
Value estimation	None	LLM-based evaluation	LLM evaluation + environmental feedback
Reflection	No	No	Yes (on failed paths)

5.4 Simplified LATS Concept

python

"""
Simplified LATS-inspired agent.

This is a conceptual implementation showing the key ideas.
A full LATS implementation requires more infrastructure
(proper tree data structures, UCB1 selection, etc.).
"""

import json
import math
from dataclasses import dataclass, field
from openai import OpenAI

client = OpenAI()


@dataclass
class Node:
    """A node in the search tree."""
    state: str                    # Description of current state
    action: str = ""              # Action that led to this state
    observation: str = ""         # Result of the action
    value: float = 0.0            # Estimated value of this node
    visits: int = 0               # Number of times this node was visited
    children: list = field(default_factory=list)
    parent: object = None         # Parent node
    depth: int = 0
    is_terminal: bool = False

    def ucb1(self, exploration_weight: float = 1.4) -> float:
        """Upper Confidence Bound for tree selection."""
        if self.visits == 0:
            return float('inf')  # Unexplored nodes have highest priority
        exploitation = self.value / self.visits
        exploration = exploration_weight * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )
        return exploitation + exploration


class LATSAgent:
    """Language Agent Tree Search."""

    def __init__(self, model: str = "gpt-4o", max_iterations: int = 10,
                 max_depth: int = 5, n_children: int = 3):
        self.model = model
        self.max_iterations = max_iterations
        self.max_depth = max_depth
        self.n_children = n_children

    def generate_actions(self, node: Node, task: str) -> list[str]:
        """Generate possible next actions from the current state."""
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        f"You are solving: {task}\n\n"
                        f"Current state: {node.state}\n\n"
                        f"Generate exactly {self.n_children} different possible next actions. "
                        f"Each should be a distinct approach. "
                        f"Return as JSON: {{\"actions\": [\"action1\", \"action2\", ...]}}"
                    )
                },
                {"role": "user", "content": "What are the possible next actions?"}
            ],
            response_format={"type": "json_object"},
            temperature=0.8,
        )

        result = json.loads(response.choices[0].message.content)
        return result.get("actions", [])[:self.n_children]

    def evaluate_state(self, node: Node, task: str) -> float:
        """Evaluate how promising the current state is (0-1)."""
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Evaluate how close this state is to solving the task. "
                        "Return JSON: {\"score\": 0.0-1.0, \"is_solution\": true/false, \"reasoning\": \"...\"}"
                    )
                },
                {
                    "role": "user",
                    "content": f"Task: {task}\n\nCurrent state:\n{node.state}"
                }
            ],
            response_format={"type": "json_object"},
            temperature=0.0,
        )

        result = json.loads(response.choices[0].message.content)
        node.is_terminal = bool(result.get("is_solution", False))
        # Coerce to float in case the model returns the score as a string
        return float(result.get("score", 0.5))

    def select(self, root: Node) -> Node:
        """Select the most promising leaf node using UCB1."""
        node = root
        while node.children:
            # Select child with highest UCB1 score
            node = max(node.children, key=lambda c: c.ucb1())
        return node

    def backpropagate(self, node: Node, value: float):
        """Update value estimates up the tree."""
        current = node
        while current is not None:
            current.visits += 1
            current.value += value
            current = current.parent

    def run(self, task: str, verbose: bool = True) -> str:
        """Run LATS to solve a task."""

        # Initialize the root
        root = Node(state=f"Task: {task}\nNo actions taken yet.", visits=1)

        best_solution = None
        best_score = 0.0

        for iteration in range(self.max_iterations):
            if verbose:
                print(f"\n--- LATS Iteration {iteration + 1} ---")

            # 1. SELECT: Choose the most promising leaf
            leaf = self.select(root)

            if leaf.depth >= self.max_depth:
                if verbose:
                    print("  Max depth reached at this branch.")
                self.backpropagate(leaf, 0.0)
                continue

            # 2. EXPAND: Generate child actions
            actions = self.generate_actions(leaf, task)
            if verbose:
                print(f"  Generated {len(actions)} actions")

            for action in actions:
                new_state = f"{leaf.state}\n\nAction: {action}\n"
                child = Node(
                    state=new_state,
                    action=action,
                    parent=leaf,
                    depth=leaf.depth + 1,
                )
                leaf.children.append(child)

            # 3. EVALUATE: Score each new child
            for child in leaf.children:
                score = self.evaluate_state(child, task)
                # Do not assign child.value here: backpropagate() below adds
                # the score to the child itself, so setting it first would
                # count the score twice.

                if verbose:
                    print(f"  Action: {child.action[:60]}... Score: {score:.2f}"
                          f"{' [SOLUTION]' if child.is_terminal else ''}")

                # Track the best solution
                if child.is_terminal and score > best_score:
                    best_solution = child
                    best_score = score

                # 4. BACKPROPAGATE
                self.backpropagate(child, score)

            # Early termination if we found a good solution
            if best_solution and best_score > 0.9:
                if verbose:
                    print(f"\n  Found high-quality solution (score: {best_score:.2f})")
                break

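        if best_solution:
            return best_solution.state

        # No terminal solution found: follow the child with the highest mean
        # value at each level (pure exploitation, no UCB1 exploration bonus).
        node = root
        while node.children:
            node = max(node.children, key=lambda c: c.value / max(c.visits, 1))
        return node.state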

5.5 When to Use LATS

LATS excels when:

The task has multiple valid solution paths and some are much better than others.
Backtracking is valuable (exploring one path may reveal it is a dead end).
The evaluation function is reliable enough to guide search.
You can afford the computational cost (LATS makes many more LLM calls than ReAct).

LATS is overkill when:

The task is straightforward with a single obvious approach.
Latency requirements are tight.
Budget is constrained.
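
To make the cost point concrete, the number of LLM calls made by the simplified agent in Section 5.4 can be bounded with a one-line formula. This counts only that sketch (one `generate_actions` call plus one `evaluate_state` call per child in each iteration); a full LATS implementation may spend additional calls, so treat the result as a rough guide. The function names are hypothetical.

```python
def lats_sketch_llm_calls(iterations: int, n_children: int) -> int:
    """Upper bound on LLM calls for the simplified agent in Section 5.4.

    Each iteration makes one generate_actions call plus one
    evaluate_state call per generated child. Iterations that hit the
    depth limit or end early make fewer calls.
    """
    return iterations * (1 + n_children)


def react_llm_calls(steps: int) -> int:
    """ReAct makes roughly one LLM call per step."""
    return steps


lats_calls = lats_sketch_llm_calls(iterations=10, n_children=3)
react_calls = react_llm_calls(steps=5)
print(f"LATS sketch: up to {lats_calls} calls, ReAct: about {react_calls} calls")
```

With the default settings of the sketch (10 iterations, 3 children), the search can use up to 40 calls where a five-step ReAct run uses about 5.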

6. Comparing Architectures

6.1 Summary Table

Architecture	Planning	Reflection	Exploration	Tool Use	LLM Calls	Best For
ReAct	Implicit (step-by-step)	None	Single path	Yes	Low (1 per step)	Simple multi-step tasks
Plan-and-Execute	Explicit upfront	Via replanning	Single path (replannable)	Yes	Medium	Complex tasks with clear structure
Reflexion	Implicit	Explicit after failure	Multiple attempts	Optional	Medium-High	Tasks with clear success criteria
LATS	Implicit per branch	Via backpropagation	Tree search	Yes	High	Complex tasks with multiple approaches

6.2 Decision Framework

This decision tree is a practical guide for choosing an architecture. Print it out and put it on your wall if you are building agents regularly:

In practice, the answer is often "start with ReAct and add complexity as needed." Most agent tasks do not require the full power of LATS, and the additional cost is hard to justify unless errors are very expensive to fix.
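
One way to encode the selection criteria from the summary table and Section 5.5 is a small function. This is a sketch under stated assumptions: the function name, the four boolean inputs, and the order of the checks are illustrative choices, not a definitive rule.

```python
def choose_architecture(
    needs_upfront_plan: bool,
    has_reliable_evaluator: bool,
    multiple_solution_paths: bool,
    tight_latency_or_budget: bool,
) -> str:
    """Return a starting architecture for a task (hypothetical heuristic)."""
    # Tight latency or budget rules out the expensive options first.
    if tight_latency_or_budget:
        return "ReAct"
    # Tree search only pays off when there are alternatives to explore
    # and an evaluator good enough to rank them.
    if multiple_solution_paths and has_reliable_evaluator:
        return "LATS"
    # Complex tasks with clear structure benefit from an explicit plan.
    if needs_upfront_plan:
        return "Plan-and-Execute"
    # Clear success criteria make retry-with-reflection worthwhile.
    if has_reliable_evaluator:
        return "Reflexion"
    return "ReAct"


print(choose_architecture(False, False, False, False))
print(choose_architecture(True, False, False, False))
```

The default branch returns ReAct, which matches the advice to start simple and add complexity only when a specific need appears.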

6.3 Cost Comparison

For a task requiring ~5 effective steps:

Architecture	Approximate LLM Calls	Relative Cost
ReAct	5-10	1x
Plan-and-Execute	8-15	1.5x
Reflexion (2 attempts)	15-25	2.5x
LATS (3 iterations, 3 branches)	30-50	5x

7. Implementation Patterns in Frameworks

7.1 LangGraph

LangGraph (by LangChain) models agent workflows as directed graphs, where nodes are processing steps and edges define transitions.

python

"""
Conceptual LangGraph-style implementation.
(Simplified for educational purposes — actual LangGraph API may differ.)
"""

# LangGraph models agents as state machines with nodes and edges

# Node: A function that processes state and returns updated state
# Edge: A conditional routing function that determines the next node

from typing import TypedDict, Literal

class AgentState(TypedDict):
    """The state that flows through the graph."""
    messages: list[dict]
    plan: list[str]
    current_step: int
    results: list[str]
    status: str  # "planning", "executing", "reflecting", "done"


def planner_node(state: AgentState) -> AgentState:
    """Generate a plan based on the user's query."""
    # ... LLM call to create plan ...
    state["plan"] = ["step 1", "step 2", "step 3"]
    state["status"] = "executing"
    state["current_step"] = 0
    return state


def executor_node(state: AgentState) -> AgentState:
    """Execute the current step of the plan."""
    step = state["plan"][state["current_step"]]
    # ... Execute step with tools ...
    result = f"Result of: {step}"
    state["results"].append(result)
    state["current_step"] += 1
    return state


def reflector_node(state: AgentState) -> AgentState:
    """Reflect on results and decide whether to continue or replan."""
    # ... LLM evaluates progress ...
    if state["current_step"] >= len(state["plan"]):
        state["status"] = "done"
    return state


def router(state: AgentState) -> Literal["executor", "reflector", "done"]:
    """Route to the next node based on current state."""
    if state["status"] == "done":
        return "done"
    elif state["current_step"] < len(state["plan"]):
        return "executor"
    else:
        return "reflector"


# The graph structure:
#
#   START → planner → executor ◄──┐
#                  │               │
#                  └──▶ reflector ─┘
#                         │
#                         └──▶ END (when done)

7.2 Why Graph-Based Architectures?

Explicit control flow: The graph makes the agent's decision process visible and controllable.
State management: State is passed explicitly through nodes, making debugging easier.
Conditional routing: Different paths for different situations (success, failure, uncertainty).
Composability: Nodes can be reused across different agent workflows.
Human-in-the-loop: Specific nodes can pause for human approval.
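
These properties can be seen in a framework-free sketch: nodes are plain functions, a routing function plays the role of the conditional edges, and a short driver loop records which node ran. This is not LangGraph's API; `run_graph`, `NODES`, and the two-step plan are assumptions made for illustration.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    plan: list[str]
    current_step: int
    results: list[str]
    status: str


def planner(state: State) -> State:
    state["plan"] = ["search", "summarize"]  # stand-in for an LLM call
    state["status"] = "executing"
    return state


def executor(state: State) -> State:
    step = state["plan"][state["current_step"]]
    state["results"].append(f"Result of: {step}")
    state["current_step"] += 1
    return state


def reflector(state: State) -> State:
    if state["current_step"] >= len(state["plan"]):
        state["status"] = "done"
    return state


def route(state: State) -> str:
    """Conditional edge: pick the next node from the current state."""
    if state["status"] == "done":
        return "END"
    if state["current_step"] < len(state["plan"]):
        return "executor"
    return "reflector"


NODES: dict[str, Callable[[State], State]] = {
    "planner": planner, "executor": executor, "reflector": reflector,
}


def run_graph(state: State, entry: str = "planner",
              max_steps: int = 20) -> tuple[State, list[str]]:
    """Run nodes until the router returns END or the step limit is hit."""
    trace: list[str] = []
    current = entry
    for _ in range(max_steps):
        if current == "END":
            break
        trace.append(current)
        state = NODES[current](state)
        current = route(state)
    return state, trace


final_state, trace = run_graph(
    {"plan": [], "current_step": 0, "results": [], "status": "planning"}
)
print(trace)
```

Because the trace is an explicit list of node names, the control flow can be asserted on in a unit test, which is the debugging benefit the list above describes.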

7.3 Common Graph Patterns

Linear Pipeline: Simple, predictable. Good for well-understood workflows.

Loop with Exit: The ReAct pattern. Loop until the task is complete.

Branch and Merge: Parallel processing. Execute independent steps concurrently.

Hierarchical: Multi-agent pattern. An orchestrator delegates to specialized sub-agents.
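
The four patterns can be written down as adjacency mappings, which makes their structural difference easy to check. All node names below are hypothetical, and the cycle check is a plain depth-first search, not part of any framework.

```python
# Each pattern as an adjacency mapping: node -> nodes it can transition to.
PATTERNS: dict[str, dict[str, list[str]]] = {
    "linear_pipeline": {
        "START": ["retrieve"], "retrieve": ["summarize"],
        "summarize": ["format"], "format": ["END"],
    },
    "loop_with_exit": {
        "START": ["agent"], "agent": ["tools", "END"], "tools": ["agent"],
    },
    "branch_and_merge": {
        "START": ["search_web", "search_papers"],
        "search_web": ["merge"], "search_papers": ["merge"], "merge": ["END"],
    },
    "hierarchical": {
        "START": ["orchestrator"],
        "orchestrator": ["coder", "researcher", "END"],
        "coder": ["orchestrator"], "researcher": ["orchestrator"],
    },
}


def has_cycle(graph: dict[str, list[str]]) -> bool:
    """Depth-first search for a back edge (a node reachable from itself)."""
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in graph)


for name, graph in PATTERNS.items():
    print(f"{name}: cycle={has_cycle(graph)}")
```

The pipeline and branch-and-merge graphs are acyclic and always terminate, while the loop and hierarchical graphs contain cycles and therefore need an exit condition or a step limit.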

8. Modern Agent Frameworks and Their Architectures

The architectural patterns described in Sections 2-6 have been implemented in several production-grade frameworks. Understanding these frameworks matters because they encode design decisions about state management, control flow, and inter-agent communication that directly affect how agents behave.

8.1 LangGraph: Graph-Based State Machines

LangGraph (by LangChain) models agent workflows as stateful directed graphs. It is the most explicit framework about control flow.

Core concepts:

Nodes are Python functions that receive and return state. Each node performs one logical step (call the LLM, execute a tool, evaluate a result).
Edges define transitions between nodes. They can be unconditional (always go from A to B) or conditional (go to B if condition X, otherwise go to C).
State is a typed dictionary that flows through the graph and accumulates information.
Checkpoints allow saving and restoring graph state, enabling human-in-the-loop workflows and long-running agents.

python

"""
LangGraph ReAct implementation (conceptual).

This shows how LangGraph maps the ReAct pattern onto a graph structure.
"""

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
from operator import add

class AgentState(TypedDict):
    messages: list[dict]
    tool_calls: list[dict]
    tool_results: Annotated[list[str], add]  # Accumulates across steps

def call_model(state: AgentState) -> dict:
    """Node: Call the LLM with current messages."""
    # Placeholder for the LLM call, which may produce tool calls or a final answer.
    response = {"role": "assistant", "content": "", "tool_calls": []}
    return {"messages": state["messages"] + [response]}

def execute_tools(state: AgentState) -> dict:
    """Node: Execute any pending tool calls."""
    # Placeholder for tool execution; a real node would also append the
    # tool output to the messages so the model can see it.
    result = "tool output"
    return {"tool_results": [result]}

def should_continue(state: AgentState) -> str:
    """Edge: Decide whether to continue or finish."""
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"      # Go to execute_tools node
    return "end"            # Go to END

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_node("tools", execute_tools)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", "end": END})
graph.add_edge("tools", "agent")  # After tools, always go back to agent

app = graph.compile()

How LangGraph implements the patterns from Sections 2-5:

ReAct: A two-node loop (agent node and tools node) with a conditional edge.
Plan-and-Execute: Three nodes (planner, executor, replanner) with conditional routing.
Reflexion: Add a reflector node that evaluates results and either routes to END or back to the agent with reflection context.
LATS: Use LangGraph's branching to explore multiple paths, with a selector node to choose the best branch.

LangGraph's strength is explicitness: the graph structure makes the agent's control flow visible, debuggable, and testable. Its weakness is verbosity: simple agents require more boilerplate than other frameworks.

8.2 OpenAI Agents SDK: Agent-as-a-Function with Handoffs

The OpenAI Agents SDK (released March 2025) takes a different approach: agents are lightweight objects that can hand off control to other agents.

Core concepts:

An Agent is defined by a name, instructions (system prompt), a set of tools, and optionally a list of other agents it can hand off to.
A Handoff is a special tool call where one agent transfers the conversation to another, more specialized agent.
The Runner manages the execution loop, handling tool calls and handoffs transparently.
Guardrails are input/output validators that run in parallel with the agent.

python

"""
OpenAI Agents SDK: Multi-agent handoff pattern (conceptual).

A triage agent delegates to specialized agents based on the user's request.
"""

from agents import Agent, Runner

# Specialized agents
coding_agent = Agent(
    name="Coding Assistant",
    instructions="You are an expert Python developer. Help with code questions.",
    tools=[execute_code, read_file, write_file],
)

research_agent = Agent(
    name="Research Assistant",
    instructions="You are a research assistant. Search for and synthesize information.",
    tools=[web_search, arxiv_search, summarize],
)

# Triage agent that routes to specialists
triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You are a triage agent. Determine whether the user needs help with "
        "coding or research, then hand off to the appropriate specialist."
    ),
    handoffs=[coding_agent, research_agent],
)

# Run the agent system
result = Runner.run_sync(triage_agent, "Help me implement a binary search tree in Python")
# The triage agent hands off to coding_agent, which handles the request

Key architectural insight: The Agents SDK implements the Hierarchical graph pattern (Section 7.3) as a first-class concept. Instead of building a graph with an orchestrator node, you define agent relationships declaratively. This makes multi-agent systems easier to build but provides less fine-grained control over the execution flow than LangGraph.

8.3 CrewAI: Role-Based Agent Teams

CrewAI organizes agents as a team (crew) where each agent has a defined role, goal, and backstory. It is inspired by organizational metaphors rather than graph theory.

Core concepts:

An Agent has a role (e.g., "Senior Researcher"), a goal, and a backstory that shapes its behavior.
A Task is a specific piece of work assigned to an agent, with an expected output format.
A Crew is a collection of agents and tasks, with a defined process (sequential or hierarchical).
In sequential mode, tasks are executed one after another, with each task's output available to subsequent tasks.
In hierarchical mode, a manager agent delegates tasks to crew members dynamically.

python

"""
CrewAI: Role-based agent team (conceptual).

A research crew with complementary roles.
"""

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find and analyze the latest papers on agentic AI architectures",
    backstory="You are a senior NLP researcher with 10 years of experience.",
    tools=[arxiv_search, semantic_scholar_search],
)

writer = Agent(
    role="Technical Writer",
    goal="Synthesize research findings into a clear, structured report",
    backstory="You are a technical writer who specializes in making complex AI topics accessible.",
    tools=[],
)

# Tasks
research_task = Task(
    description="Find the 5 most influential papers on agent architectures published in 2024-2025.",
    expected_output="A list of papers with summaries and key contributions.",
    agent=researcher,
)

writing_task = Task(
    description="Write a 2-page summary of the research findings.",
    expected_output="A structured report in Markdown format.",
    agent=writer,
)

# The crew runs tasks sequentially: researcher first, then writer
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
)

result = crew.kickoff()

How CrewAI maps to architectural patterns:

Sequential process: Essentially a Plan-and-Execute architecture where the plan is defined by the task list.
Hierarchical process: A manager agent dynamically assigns work, similar to the orchestrator pattern.
CrewAI's role-based design encourages specialization, which can improve output quality by giving each agent focused instructions rather than one agent handling everything.

8.4 Framework Comparison

Feature	LangGraph	OpenAI Agents SDK	CrewAI
Paradigm	Graph-based state machine	Agent-as-a-function + handoffs	Role-based teams
Control flow	Explicit (edges/conditions)	Implicit (handoffs)	Defined by process type
State management	Typed state dict	Conversation messages	Task outputs
Multi-agent	Via graph nodes	Via handoffs	Via crew/roles
Best for	Complex, custom workflows	Multi-agent delegation	Structured team workflows
Learning curve	Higher	Lower	Lower
Flexibility	Very high	Moderate	Moderate
LLM provider	Any	OpenAI (primarily)	Any

8.5 Choosing a Framework

The choice depends on what you need:

Use LangGraph when you need fine-grained control over agent behavior, custom state management, human-in-the-loop checkpoints, or non-standard control flow. It is the most flexible but also the most verbose.
Use OpenAI Agents SDK when you are building a system of specialized agents that delegate to each other, and you are primarily using OpenAI models. It is elegant for the handoff pattern.
Use CrewAI when the task naturally decomposes into roles (researcher, writer, reviewer) and you want quick prototyping of multi-agent workflows.
Use no framework (as in Section 2.5) when the task is simple enough that a framework adds complexity without benefit, or when you need maximum control for a production system.

9. Emerging Patterns (2025-2026)

The agent landscape is evolving rapidly. Several architectural patterns have emerged that are not yet covered by the foundational papers (ReAct, Reflexion, LATS) but are increasingly important in practice.

9.1 Agent-as-a-Service

Traditional agents run within a single application. Agent-as-a-Service exposes agent capabilities through APIs, allowing other systems (including other agents) to invoke them remotely.

Key characteristics:

Agents are deployed as independent services with well-defined interfaces.
Communication happens via protocols like Google's Agent2Agent (A2A) protocol or simple REST/gRPC APIs.
Each agent can use different models, tools, and architectures internally.
Service agents can be maintained, scaled, and updated independently.

This pattern connects to the multi-agent communication protocols covered in Week 8. The architectural implication is that agent interfaces must be designed for composability: clear input/output schemas, error handling, and capability discovery.
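
The three interface requirements (input/output schemas, error handling, capability discovery) can be sketched in-process, with a dictionary standing in for the network. None of the names below come from A2A or any other protocol; `AgentRequest`, `AgentResponse`, `AgentDescriptor`, and the registry are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRequest:
    task: str
    context: dict = field(default_factory=dict)


@dataclass
class AgentResponse:
    ok: bool
    output: str = ""
    error: str = ""


@dataclass
class AgentDescriptor:
    """What a service advertises so that callers can discover it."""
    name: str
    capabilities: list[str]


class SummarizerService:
    descriptor = AgentDescriptor(name="summarizer", capabilities=["summarize"])

    def handle(self, request: AgentRequest) -> AgentResponse:
        if not request.task.strip():
            return AgentResponse(ok=False, error="empty task")
        # Stand-in for the agent's internal architecture (ReAct, LATS, ...).
        return AgentResponse(ok=True, output=request.task[:40])


# In-process registry standing in for remote services behind an API.
REGISTRY = {SummarizerService.descriptor.name: SummarizerService()}


def discover(capability: str) -> list[str]:
    """Return the names of services that advertise a capability."""
    return [name for name, svc in REGISTRY.items()
            if capability in svc.descriptor.capabilities]


def call_agent(name: str, request: AgentRequest) -> AgentResponse:
    """Invoke a service by name, returning errors as data, not exceptions."""
    service = REGISTRY.get(name)
    if service is None:
        return AgentResponse(ok=False, error=f"unknown agent: {name}")
    return service.handle(request)


print(discover("summarize"))
print(call_agent("summarizer", AgentRequest(task="Summarize the LATS paper")))
```

Returning failures as a structured response rather than raising lets a calling agent reason about the error and choose another service.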

9.2 Agentic RAG

Standard Retrieval-Augmented Generation (RAG) follows a fixed pattern: retrieve documents, then generate. Agentic RAG gives the agent control over the retrieval process itself.

[Interactive figure: Agentic RAG, agent-controlled retrieval. The RAG pipeline runs from document to answer: indexing steps such as chunking (splitting documents into semantically coherent pieces) run offline, while retrieval, reranking, and augmented generation run on every query.]

Key capabilities of Agentic RAG:

Query planning: The agent decomposes complex questions into multiple retrieval queries.
Source selection: The agent chooses which knowledge bases, databases, or APIs to query.
Result evaluation: The agent assesses whether retrieved documents are relevant and sufficient.
Iterative refinement: If initial results are poor, the agent reformulates queries or tries different sources.
Multi-hop retrieval: The agent uses information from one retrieval to inform subsequent queries.

This pattern connects to the tool use and memory topics from Weeks 4 and 7. Architecturally, Agentic RAG is typically implemented as a ReAct agent where the available tools include multiple retrieval endpoints, and the agent's reasoning determines when and how to use them.
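
The control loop behind these capabilities can be sketched with toy in-memory sources. In a real agent the LLM would decide the queries and the source order; here they are passed in as fixed lists, and keyword matching stands in for vector search. `SOURCES`, `retrieve`, and `agentic_rag` are hypothetical names.

```python
# Two toy "knowledge bases"; a real system would call search APIs.
SOURCES: dict[str, list[str]] = {
    "papers": [
        "ReAct interleaves reasoning traces with actions.",
        "Reflexion stores verbal reflections between attempts.",
    ],
    "docs": [
        "LangGraph models workflows as stateful graphs.",
    ],
}


def retrieve(source: str, query: str) -> list[str]:
    """Keyword match standing in for vector search."""
    terms = query.lower().split()
    return [d for d in SOURCES[source] if any(t in d.lower() for t in terms)]


def agentic_rag(queries: list[str], source_order: list[str],
                min_docs: int = 1) -> dict:
    """Try each query against each source until enough evidence is found."""
    evidence: list[str] = []
    trace: list[tuple[str, str, int]] = []
    for query in queries:              # query planning and reformulation
        for source in source_order:    # source selection
            hits = retrieve(source, query)
            trace.append((source, query, len(hits)))
            evidence.extend(h for h in hits if h not in evidence)
            if len(evidence) >= min_docs:   # result evaluation
                return {"evidence": evidence, "trace": trace}
    return {"evidence": evidence, "trace": trace}


result = agentic_rag(
    queries=["tree search", "langgraph"],
    source_order=["papers", "docs"],
)
for step in result["trace"]:
    print(step)
```

The first query finds nothing in either source, so the loop moves on to the reformulated query and stops as soon as one source returns a relevant document.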

9.3 Self-Improving Agents

Most agent architectures treat each task execution as independent. Self-improving agents maintain persistent memory of their successes and failures, using this history to improve performance over time.

This differs from Reflexion in an important way: Reflexion reflects within a single task (across retries), while self-improving agents learn across tasks over time. The experience memory persists between separate task executions.

Implementation approaches:

Episodic memory: Store complete task trajectories (actions taken, outcomes, reflections) in a vector database. Before starting a new task, retrieve similar past episodes to inform strategy.
Learned rules: After each task, extract generalizable heuristics ("When the API returns a 429 error, wait and retry rather than switching to a different approach") and store them as reusable rules.
Prompt evolution: Periodically revise the agent's system prompt or tool descriptions based on accumulated performance data.
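
The episodic-memory approach can be sketched with a list of past episodes and a simple similarity score. This sketch swaps the vector database for word-overlap (Jaccard) similarity so that it runs without dependencies; the class names and the 0.2 relevance threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    task: str
    outcome: str   # "success" or "failure"
    lesson: str


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class EpisodicMemory:
    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall(self, task: str, k: int = 1,
               min_score: float = 0.2) -> list[Episode]:
        """Return up to k past episodes that are similar enough to the task."""
        scored = [(similarity(task, e.task), e) for e in self.episodes]
        relevant = [(s, e) for s, e in scored if s >= min_score]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in relevant[:k]]


memory = EpisodicMemory()
memory.add(Episode("fetch weather data from api", "failure",
                   "When the API returns a 429 error, wait and retry."))
memory.add(Episode("write a poem about autumn", "success",
                   "Draft an outline before writing."))

recalled = memory.recall("fetch stock data from api")
print([e.lesson for e in recalled])
```

The threshold matters as much as the ranking: returning nothing for an unrelated task avoids injecting lessons that do not transfer.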

Challenges:

Memory curation: Over time, the experience store grows large and may contain contradictory lessons. The agent needs a mechanism for pruning or prioritizing experiences.
Generalization: A strategy that worked for one task may not transfer to another. Retrieving irrelevant experiences can degrade performance.
Evaluation reliability: Self-improvement requires accurate evaluation of outcomes. If the evaluator is unreliable, the agent may learn the wrong lessons.

Self-improving agents are an active area of research. They point toward a future where agents continuously get better at their jobs, much like human professionals who accumulate expertise over years of practice.

10. Hybrid and Custom Architectures

10.1 Combining Patterns

Real-world agents often combine elements from multiple architectures:

ReAct + Reflexion:

python

"""
Hybrid: ReAct agent with Reflexion-style self-correction.

Uses ReAct for step-by-step execution, but adds a reflection
step if the agent detects it is going in circles or making errors.
"""

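class HybridAgent:
    def __init__(self):
        self.react_steps = []
        self.reflections = []
        self.failed_actions = set()

    def run(self, task: str, max_react_steps: int = 10, max_retries: int = 3) -> str:
        result = ""  # Defined up front so the final return is safe if max_retries == 0
        for retry in range(max_retries):
            # Run ReAct (the loop can read self.reflections from earlier attempts)
            result = self._react_loop(task, max_react_steps)

            # Evaluate
            success, feedback = self._evaluate(task, result)

            if success:
                return result

            # Reflect and retry
            reflection = self._reflect(task, result, feedback)
            self.reflections.append(reflection)
            self.react_steps = []  # Reset for next attempt

        return result  # Return the last attempt as a best effort

    # The three helpers below are left abstract in this sketch.
    def _react_loop(self, task: str, max_steps: int) -> str:
        raise NotImplementedError

    def _evaluate(self, task: str, result: str) -> tuple[bool, str]:
        raise NotImplementedError

    def _reflect(self, task: str, result: str, feedback: str) -> str:
        raise NotImplementedError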

Plan-and-Execute + LATS: Use Plan-and-Execute for the overall structure, but use tree search for individual steps that are particularly complex or uncertain.

10.2 Architecture Selection Heuristics

When designing a custom agent, consider:

Task complexity: Simple tasks need simple architectures. Do not over-engineer.
Error tolerance: If errors are costly, add reflection and verification steps.
Latency requirements: Each architectural layer adds latency. Strip unnecessary layers.
Budget: More sophisticated architectures cost more. Calculate the expected cost per task.
Observability: Can you evaluate success programmatically? If yes, Reflexion works well. If not, you need human-in-the-loop.
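
The budget heuristic can be made concrete with a short calculation of expected cost when an agent retries until it succeeds or hits a cap. This is a sketch under a simplifying assumption: every attempt succeeds independently with the same probability, which ignores the improvement that reflection is meant to provide. The function names and example figures are illustrative.

```python
def expected_attempts(p_success: float, max_attempts: int) -> float:
    """Expected number of attempts when retrying until success or the cap.

    Attempt k happens only if the previous k - 1 attempts all failed.
    """
    return sum((1 - p_success) ** (k - 1) for k in range(1, max_attempts + 1))


def expected_cost(cost_per_attempt: float, p_success: float,
                  max_attempts: int) -> float:
    """Expected spend per task under the independence assumption."""
    return cost_per_attempt * expected_attempts(p_success, max_attempts)


attempts = expected_attempts(p_success=0.5, max_attempts=3)
cost = expected_cost(cost_per_attempt=0.10, p_success=0.5, max_attempts=3)
print(f"expected attempts: {attempts:.2f}, expected cost: ${cost:.3f}")
```

With a 50% success rate and a cap of three attempts, the agent makes 1.75 attempts on average, so a retry layer costs less than the worst-case multiplier suggests.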

10.3 The "Good Enough" Principle

A common mistake in agent design is over-engineering. It is tempting to implement LATS because it sounds impressive, but if your task requires 3 tool calls and has a clear success criterion, ReAct with retry logic will serve you better at a fraction of the cost.

Start with the simplest architecture that works:

Start with ReAct: Implement the basic thought-action-observation loop. Test it on your use cases.
Add planning if needed: If the agent frequently loses track of the overall goal (symptoms: random tool calls, repeated work, incomplete outputs), add a planning step.
Add reflection if needed: If the agent makes recurring mistakes (symptoms: the same type of error across different tasks), add self-reflection.
Add tree search if needed: If the agent often takes suboptimal paths (symptoms: correct eventual answer but wasteful execution), add exploration.

Each addition should be motivated by observed failures in the simpler approach. This is the engineering equivalent of "premature optimization is the root of all evil": premature architectural complexity wastes development time and inference budget.

Key Insight: The best architecture is the simplest one that meets your reliability requirements. An elegant LATS implementation that costs $5 per task is worse than a crude ReAct loop that costs $0.10 per task, unless the task is so critical that the 50x cost difference is justified by the improvement in reliability.

11. The Observe-Think-Act Loop in Practice

11.1 Real-World Agent Traces

Let us examine how a real-world coding agent (conceptually similar to Claude Code) handles a task:

text

User: "Fix the failing test in test_auth.py"

OBSERVE: Read the test file to understand what is being tested.
  → read_file("test_auth.py")
  → Found: test_login_valid_credentials is failing

THINK: The test expects a 200 status code but gets 401.
This suggests the authentication logic has a bug.
I should look at the auth module.

ACT: Read the authentication module.
  → read_file("auth/login.py")

OBSERVE: The login function checks the password, but the comparison is inverted:
  Line 42: if password != stored_hash:  (should be ==)

THINK: Found the bug: != instead of ==, so valid credentials are rejected.
I need to fix this and verify the test passes.

ACT: Edit the file.
  → edit_file("auth/login.py", line=42, new="    if password == stored_hash:")

OBSERVE: File saved successfully.

THINK: Now I should run the test to verify the fix works.

ACT: Run the test.
  → execute("python -m pytest test_auth.py::test_login_valid_credentials -v")

OBSERVE: test_login_valid_credentials PASSED (1 passed in 0.3s)

THINK: The test passes now. The fix is correct. I should also run
the full test suite to make sure I did not break anything.

ACT: Run all tests.
  → execute("python -m pytest -v")

OBSERVE: All 12 tests passed.

THINK: All tests pass. The fix is complete.

RESPOND: "Fixed the bug in auth/login.py line 42: changed != to == in the
password check, which was rejecting valid credentials. All 12 tests pass."

11.2 Patterns in Practice

Notice several patterns in this trace:

Incremental understanding: The agent builds understanding step by step, not all at once.
Hypothesis-driven: The agent forms hypotheses ("the auth module probably has a bug") and tests them.
Verification: The agent does not stop after making the fix — it verifies the result.
Defensive checking: Running the full test suite, not just the failing test, catches regressions.

12. Discussion Questions

Architecture vs. model capability: As LLMs get more capable (GPT-5, Claude 4, etc.), will complex architectures like LATS become unnecessary? Or will they remain valuable regardless of model capability?
The Reflexion paradox: If the model could correctly reflect on what went wrong, why did it make the mistake in the first place? Is self-reflection genuinely adding new information, or is it just giving the model another chance with a different random seed?
When to stop searching: In LATS, how do you decide when you have explored enough and should commit to a solution? What is the trade-off between exploration and exploitation in agent decision-making?
Human-in-the-loop placement: In a Plan-and-Execute architecture, where should human checkpoints be placed? After planning? After each step? Only on error? How does this affect agent autonomy and efficiency?
Architecture transferability: If you design an agent architecture that works well for coding tasks, how well would it transfer to other domains (medical diagnosis, legal research, creative writing)? What would need to change?

13. Summary and Key Takeaways

Agent architectures provide structured patterns for how LLMs reason, act, and learn from feedback. They determine how the model's capabilities are channeled into effective behavior.
ReAct (Yao et al., 2023) is the foundational pattern: interleave reasoning (Thought) with actions (Action) and environmental feedback (Observation). It is simple, effective, and the starting point for most agents.
Plan-and-Execute separates planning from execution, creating a more structured approach that works well for complex tasks with clear sub-goals.
Reflexion (Shinn et al., 2023) adds self-correction through verbal reflection on failures. It enables agents to improve across attempts without weight updates.
LATS (Zhou et al., 2024) applies Monte Carlo Tree Search to agent decision-making, enabling systematic exploration of alternative paths. It is powerful but expensive.
The observe-think-act loop is the common thread across all architectures. Differences lie in how much thinking, exploring, and reflecting happens around each action.
Start simple: Begin with ReAct and add complexity only when motivated by observed failures. Over-engineered agent architectures add cost and latency, often without improving outcomes.
Modern frameworks (LangGraph, OpenAI Agents SDK, CrewAI) implement these patterns with different paradigms: graph-based state machines, agent-as-a-function with handoffs, and role-based teams. The choice of framework depends on the complexity of control flow and multi-agent requirements.
Emerging patterns are reshaping agent design: Agent-as-a-Service exposes agents via APIs for inter-agent composition, Agentic RAG gives agents control over the retrieval process, and self-improving agents learn from their experience across tasks.

14. Practical Exercise

Implement and Compare Agent Architectures:

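ReAct Agent: Using the implementation from Section 2.5, extend it with real tool implementations (use the arXiv API for search and Python's eval for calculation). Because eval executes arbitrary code, restrict the calculator to arithmetic expressions before passing it model-generated input.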
Reflexion Agent: Add self-reflection to your ReAct agent. After the agent completes a task, evaluate the result and, if incorrect, let the agent reflect and retry.
Comparison experiment: Select 5 multi-hop questions (you can use questions from HotpotQA or create your own). Run each question through both agents and compare:
- Accuracy (correct final answer?)
- Efficiency (number of LLM calls)
- Cost (estimated token usage)
- Error recovery (did the agent self-correct?)
Analysis: Write a 2-page report comparing the architectures. Address:
- Which architecture performed better and why?
- What types of questions benefited most from Reflexion?
- What failure modes did you observe?
- If you could design a hybrid architecture, what would it look like?
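
For the calculator tool in the ReAct exercise, one way to keep evaluation limited to arithmetic is to parse the expression with Python's `ast` module and allow only numeric literals and a few operators. This is a minimal sketch under that assumption. `safe_calculate` and `calculator_tool` are hypothetical names, and exponentiation is deliberately left out because expressions such as `9**9**9` can consume unbounded time and memory.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Mod: operator.mod}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_calculate(expression: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


def calculator_tool(expression: str) -> str:
    """Tool wrapper: always returns a string the agent can read as an observation."""
    try:
        return str(safe_calculate(expression))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


print(calculator_tool("2 + 3 * 4"))
print(calculator_tool("__import__('os').getcwd()"))  # rejected, not executed
```

Returning errors as strings rather than raising lets the agent see the failure in its next observation and try a corrected expression.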

Deliverable: Python implementation of both agents, test results for 5 questions, and the comparison report.
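
The comparison experiment can be organized around a small evaluation harness that records the four metrics listed above. The sketch below assumes each agent is a callable that takes a question and returns a `RunResult`. `RunResult`, `evaluate`, and `normalize` are hypothetical names, and exact-match scoring after normalization is a simplification you may want to replace with a more forgiving comparison. The dataset and stub agent shown are placeholders, not real questions or results.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    answer: str
    llm_calls: int      # number of LLM calls made for this question
    tokens: int         # estimated token usage
    attempts: int = 1   # values above 1 mean the agent retried after a failure


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def evaluate(agent: Callable[[str], RunResult], dataset: list) -> dict:
    """Run an agent over (question, gold_answer) pairs and aggregate the metrics."""
    correct = calls = tokens = recovered = 0
    for question, gold in dataset:
        result = agent(question)
        is_correct = normalize(result.answer) == normalize(gold)
        correct += int(is_correct)
        calls += result.llm_calls
        tokens += result.tokens
        # Count as self-corrected only if a retry led to a correct answer.
        recovered += int(is_correct and result.attempts > 1)
    n = len(dataset)
    return {
        "accuracy": correct / n,
        "avg_llm_calls": calls / n,
        "avg_tokens": tokens / n,
        "self_corrected": recovered,
    }


# Placeholder data and a stub agent, only to show the call shape.
dataset = [("question 1", "Paris"), ("question 2", "42")]
canned = {"question 1": " paris ", "question 2": "41"}


def stub_agent(question: str) -> RunResult:
    return RunResult(answer=canned[question], llm_calls=3, tokens=100, attempts=2)


print(evaluate(stub_agent, dataset))
```

Running both the ReAct and the Reflexion agent through the same `evaluate` function keeps the comparison fair, since both are scored by identical rules.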

References

Google (2025). Agent-to-Agent (A2A) protocol specification. github.com/google/A2A.
Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022). Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In Proceedings of the International Conference on Machine Learning (ICML).
LangChain (2024). LangGraph: Build stateful, multi-actor applications with LLMs. langchain-ai.github.io/langgraph/.
Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., ... & Clark, P. (2023). Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems (NeurIPS).
Moura, J. (2024). CrewAI: Framework for orchestrating role-playing, autonomous AI agents. docs.crewai.com.
OpenAI (2025). OpenAI Agents SDK. openai.github.io/openai-agents-python/.
Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS).
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wang, J. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., ... & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. In Proceedings of the International Conference on Learning Representations (ICLR).
Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., ... & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155.
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR).
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2024). Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems (NeurIPS).
Zhou, A., Yan, K., Shlapentokh-Rothman, M., Wang, H., & Wang, Y.-X. (2024). Language agent tree search unifies reasoning, acting, and planning in language models. In Proceedings of the International Conference on Machine Learning (ICML).

Part of "Agentic AI: Foundations, Architectures, and Applications" (CC BY-SA 4.0).