Foundations · W01 · 44 min read

Introduction to AI Agents

Foundational definitions of AI agents as systems that perceive, reason, and act. Historical evolution from BDI agents and reinforcement learning to LLM-based agents, with the Russell & Norvig taxonomy mapped onto modern agentic systems.

Core concepts: Perception-action loop · Agent autonomy · Russell-Norvig taxonomy

Learning Objectives

By the end of this lecture, students will be able to:

Define what an AI agent is and distinguish agents from conventional software systems.
Trace the historical evolution from expert systems to modern LLM-based agents.
Classify agents according to the Russell and Norvig taxonomy (simple reflex, model-based, goal-based, utility-based, learning).
Identify the four core components of an LLM-based agent: planning, memory, tool use, and action.
Describe at least three real-world deployed agent systems and explain their architectures at a high level.
Implement a minimal agent loop in Python that interacts with an LLM.

1. What Is an AI Agent?

1.1 The Core Definition

An AI agent is a system that perceives its environment through sensors, reasons about what it perceives, and takes actions through actuators to achieve goals. This definition, adapted from Russell and Norvig (2021), is deceptively simple but carries profound implications.

To understand why this definition matters, consider what it excludes. A pocket calculator is not an agent: it does not perceive an environment or decide what to do next. A spam filter is barely an agent: it perceives (incoming emails) and acts (classifying them), but it follows rigid rules without any sense of goals or autonomy. An AI coding assistant that can read your codebase, decide what files need changing, make edits, run tests, and iterate until the tests pass is clearly an agent: it perceives, reasons, plans, acts, and adapts.

The key distinction between an agent and a regular program is autonomy: an agent operates with a degree of independence, making decisions without direct human intervention for each step. A web server responds to requests with predetermined logic. An agent decides what to do next.

More formally, an agent can be described as a function:

text

f: Percept* -> Action

The agent function maps a sequence of percepts (the complete history of everything the agent has perceived) to an action. The agent program is the concrete implementation of this function running on a physical or virtual architecture.

Key Insight: The notation Percept* (with the asterisk) is crucial. It means the agent's decision can depend on its entire history of observations, not just the current one. This is what distinguishes a model-based agent from a simple reflex agent, and it is the foundation of memory in agent systems.
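The distinction between the agent function and the agent program, and the role of Percept*, can be made concrete with a minimal Python sketch. Everything here (string percepts, `make_agent_program`, and the `overheat_alarm` rule) is a hypothetical illustration, not a standard API: the program stores the percept history and passes all of it to the agent function.

```python
from typing import Callable, List, Sequence

Percept = str  # assumption: percepts are simple strings in this toy example
Action = str
AgentFunction = Callable[[Sequence[Percept]], Action]


def make_agent_program(agent_function: AgentFunction) -> Callable[[Percept], Action]:
    """Wrap an agent function (percept sequence -> action) as an agent program.

    The program receives one percept at a time, appends it to the stored
    history, and hands the whole history (Percept*) to the agent function.
    """
    history: List[Percept] = []

    def program(percept: Percept) -> Action:
        history.append(percept)
        return agent_function(tuple(history))

    return program


def overheat_alarm(percepts: Sequence[Percept]) -> Action:
    """Hypothetical agent function: alarm only after three consecutive 'hot' readings."""
    recent = percepts[-3:]
    if len(recent) == 3 and all(p == "hot" for p in recent):
        return "raise_alarm"
    return "wait"


program = make_agent_program(overheat_alarm)
actions = [program(p) for p in ["hot", "hot", "hot"]]
print(actions)  # ['wait', 'wait', 'raise_alarm']
```

The same percept, "hot", produces different actions depending on what came before; a function of the current percept alone could not behave this way.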

1.2 The Perception-Action Loop

Every agent, no matter how simple or complex, follows a fundamental cycle:

Interactive figure · The Agent Perception-Action Loop: perceive → reason → act → observe, then back to perceive. Every agent architecture rides on this same cycle; the architectures you'll meet later are variations on it.

This loop repeats continuously. The environment changes (partly due to the agent's actions, partly due to external factors), the agent perceives the new state, reasons, and acts again. This is sometimes called the sense-reason-act cycle or the perception-action loop.

Think of it like a doctor treating a patient. The doctor observes symptoms (perceive), considers possible diagnoses and treatments (reason), and prescribes medication or orders tests (act). Then the doctor waits, observes the patient's response to treatment (perceive again), adjusts the diagnosis if needed (reason), and changes the treatment plan (act again). This cycle continues until the patient recovers.

The perception-action loop may seem obvious, but it is surprisingly powerful as a design pattern. Every agent we will encounter in this course, from simple chatbots to sophisticated multi-agent systems, is fundamentally organized around this loop. What changes is the sophistication of each step.
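The loop can be sketched in a few lines of Python. This is a toy under stated assumptions: `RoomEnvironment`, its heating and drift numbers, and the `reason` rule are all hypothetical. The point is the shape of the loop: observe, decide, act, and let the environment change both because of the action and on its own (the `drift`).

```python
from dataclasses import dataclass


@dataclass
class RoomEnvironment:
    """Hypothetical toy environment: a room whose temperature the agent controls."""
    temperature: float
    drift: float = -0.5  # external factor: the room loses heat every step on its own

    def observe(self) -> dict:
        return {"temperature": self.temperature}

    def apply(self, action: str) -> None:
        if action == "heat":
            self.temperature += 1.5
        elif action == "cool":
            self.temperature -= 1.5
        self.temperature += self.drift  # the world changes independently of the agent


def reason(percept: dict, target: float = 21.0) -> str:
    """Pick an action that moves the temperature toward the target band."""
    t = percept["temperature"]
    if t < target - 1:
        return "heat"
    if t > target + 1:
        return "cool"
    return "idle"


def run_loop(env: RoomEnvironment, max_steps: int = 20) -> list:
    trace = []
    for _ in range(max_steps):
        percept = env.observe()                  # perceive
        action = reason(percept)                 # reason
        trace.append((percept["temperature"], action))
        if action == "idle":
            break  # stop once in the target band; a real controller would keep looping
        env.apply(action)                        # act; the next pass observes the result
    return trace


print(run_loop(RoomEnvironment(temperature=17.0)))
# [(17.0, 'heat'), (18.0, 'heat'), (19.0, 'heat'), (20.0, 'idle')]
```

Each heating step raises the temperature by only 1.0 net, because the environment's own drift partly undoes the agent's action; the agent only learns this by observing the result on the next pass.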

1.3 Why Now? The Confluence of Factors

Before diving deeper, it is worth asking: why are AI agents suddenly everywhere in 2025-2026, when the concept has existed since the 1950s?

Several factors converged:

Large Language Models reached a capability threshold. Models like GPT-4, Claude, and Gemini can follow complex instructions, generate code, use tools, and reason through multi-step problems. Before 2022, no single model could do all of these things at once.
The cost of inference dropped dramatically. In 2023, GPT-4 (8K context) cost roughly $0.03 per 1K input tokens and $0.06 per 1K output tokens. By 2025, comparable capability was available for a fraction of that from smaller, faster models. An agent that makes 20 API calls per task went from prohibitively expensive to pennies.
Tool-use infrastructure matured. Protocols like MCP (Model Context Protocol), function-calling APIs, and sandboxed execution environments gave agents a standardized way to interact with the world.
Developer tooling emerged. Frameworks like LangGraph, CrewAI, and the OpenAI Agents SDK made it practical to build agents without starting from scratch.
Real problems demand it. Software engineering, research, customer support, and data analysis all involve multi-step, context-dependent work that is poorly served by single-turn AI interactions.
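To see why inference cost matters for agents, here is a back-of-the-envelope calculation. It assumes an illustrative workload of 20 calls with 2,000 input and 500 output tokens each, GPT-4's 2023 list prices for the 8K-context model, and, for comparison, a hypothetical model 100 times cheaper; that second price is not a quote for any real model.

```python
def task_cost(calls: int, input_tokens: int, output_tokens: int,
              input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Dollar cost of one agent task, assuming every call uses the same token counts."""
    per_call = ((input_tokens / 1000) * input_price_per_1k
                + (output_tokens / 1000) * output_price_per_1k)
    return calls * per_call


# Assumed workload: 20 calls, each with 2,000 input tokens and 500 output tokens.
gpt4_2023 = task_cost(20, 2000, 500, input_price_per_1k=0.03, output_price_per_1k=0.06)
# Hypothetical prices, 100x lower than GPT-4's 2023 list prices.
hypothetical_cheap = task_cost(20, 2000, 500, input_price_per_1k=0.0003, output_price_per_1k=0.0006)

print(f"GPT-4 (2023 list prices): ${gpt4_2023:.2f}")                 # $1.80
print(f"Hypothetical 100x cheaper model: ${hypothetical_cheap:.3f}")  # $0.018
```

Real agents usually resend the growing conversation history on every call, so input tokens per call tend to rise over a task; a fixed per-call estimate like this one can understate the total.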

1.4 Environments

The nature of the environment profoundly affects agent design. Russell and Norvig (2021) classify environments along several dimensions:

Dimension	Options	Example
Observability	Fully observable vs. Partially observable	Chess (full) vs. Poker (partial)
Determinism	Deterministic vs. Stochastic	Puzzle solving (deterministic) vs. Stock trading (stochastic)
Episodic vs. Sequential	Independent episodes vs. Dependent decisions	Spam filtering (episodic) vs. Conversation (sequential)
Static vs. Dynamic	Environment changes while agent deliberates?	Crossword (static) vs. Self-driving (dynamic)
Discrete vs. Continuous	Finite states/actions vs. Continuous space	Board game (discrete) vs. Robotics (continuous)
Single vs. Multi-agent	One agent vs. Multiple interacting agents	Solitaire (single) vs. Negotiation (multi)

To make these dimensions concrete, consider a coding agent like Claude Code operating in a software repository:

Partially observable: The agent cannot see the entire codebase at once. It must choose which files to read, and some information (like runtime behavior) is only accessible by running code.
Stochastic (from the agent's point of view): The same command can produce different results because of flaky tests, network calls, or timing, and the agent's own LLM calls are probabilistic, meaning the same input may produce different outputs.
Sequential: Every file edit changes the state of the codebase. A change to one file may break another file, and the agent must account for these dependencies.
Dynamic: If you are working in a team, other developers may push changes while the agent is working. The environment changes independently of the agent.
Multi-agent: In a modern development workflow, there may be multiple AI assistants, CI/CD bots, and human developers all acting on the same codebase.

Modern LLM-based agents typically operate in environments that are partially observable, stochastic, sequential, dynamic, and multi-agent. This makes their design particularly challenging, which is why we need the structured approaches covered in this course.

Try It Yourself: Pick three software applications you use daily (e.g., a search engine, a navigation app, an email client). For each one, classify the environment along the six dimensions above. Which ones could benefit from an agentic architecture? Why?

1.5 Agents vs. Traditional Software

Consider the differences:

Aspect	Traditional Software	AI Agent
Control flow	Predetermined	Dynamic, decided at runtime
Input handling	Defined API contract	Open-ended, often natural language
Error recovery	Predefined error handlers	Adaptive, can reason about failures
Goal specification	Hardcoded logic	Can interpret and decompose goals
Tool usage	Fixed integrations	Can discover and learn to use tools
Output	Deterministic	Variable, probabilistic
Adaptation	Requires code changes	Can adjust through prompting

An important nuance: not every system that uses an LLM is an agent. This is a common misconception worth addressing directly.

A chatbot that answers questions in a single turn is not an agent; it is a tool. A system that calls an LLM once to classify an email is not an agent; it is a classifier with an LLM backend. An agent makes decisions about what to do next, maintains state across steps, and takes actions that change its environment.

Common Misconception: "If it uses GPT-4, it's an agent." This is incorrect. Agency is about the architecture (perception-action loop, autonomy, goal-directedness), not about the underlying model. You can build a non-agentic system on GPT-4 (a simple classifier) and an agentic system on a much smaller model (a robot controller with a 7B parameter model).

The boundary between "tool" and "agent" is fuzzy, and that is fine. A more useful question than "Is this an agent?" is "How much agency does this system have?" We can think of agency as a spectrum:

Interactive figure · The Agency Spectrum: who decides at each step, from maximum safety to maximum autonomy (trading safety against efficiency). The right level depends on task risk and how reversible the action is; there is no universal pick. One level shown, L3 · HOTL (human on the loop), live oversight: the agent acts autonomously while a human watches the live trace, ready to interrupt (example: a coding assistant with a reviewer in the room).
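One way to implement a chosen point on this spectrum is an approval gate in front of the agent's actions. The sketch below assumes a hypothetical policy built on the two factors named above, risk and reversibility: irreversible or high-risk actions wait for a human decision, and everything else runs autonomously, where a human can still watch the trace as in the HOTL setting. The action names and risk labels are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProposedAction:
    name: str
    reversible: bool
    risk: str  # hypothetical labels: "low", "medium", or "high"


def needs_human_approval(action: ProposedAction) -> bool:
    """Hypothetical policy: irreversible or high-risk actions wait for a human decision."""
    return not action.reversible or action.risk == "high"


def execute_with_oversight(actions: List[ProposedAction],
                           approve: Callable[[ProposedAction], bool]) -> Tuple[List[str], List[str]]:
    executed, blocked = [], []
    for action in actions:
        if needs_human_approval(action) and not approve(action):
            blocked.append(action.name)   # the human declined this action
            continue
        executed.append(action.name)      # a real system would run the tool here
    return executed, blocked


plan = [
    ProposedAction("read_file", reversible=True, risk="low"),
    ProposedAction("edit_file", reversible=True, risk="medium"),
    ProposedAction("force_push_to_main", reversible=False, risk="high"),
]
print(execute_with_oversight(plan, approve=lambda a: False))
# (['read_file', 'edit_file'], ['force_push_to_main'])
```

Moving along the spectrum then means changing `needs_human_approval`: returning `True` for everything approaches maximum safety, returning `False` for everything approaches maximum autonomy.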

1.6 A Mental Model for Agent Design

Throughout this course, it will help to think of an AI agent as analogous to a new employee at a company:

The system prompt is like the job description and company handbook. It tells the agent who it is, what it can do, and what rules it must follow.
Tools are like the software and equipment the employee has access to (email, databases, code editors).
Memory is like the employee's notes, past experience, and the company wiki.
The agent loop is like the daily work cycle: check your tasks, figure out what to do next, do it, observe the results, repeat.
The LLM is like the employee's brain: their general knowledge, reasoning ability, and communication skills.

This analogy is imperfect (agents do not get tired, have emotions, or understand in the human sense), but it provides useful intuitions about agent design. Just as you would give a new employee clear instructions, the right tools, and manageable tasks, you need to give your agent a clear prompt, well-designed tools, and appropriate scope.

2. Historical Context: From Expert Systems to LLM-Based Agents

Understanding the history of AI agents is not just academic nostalgia; it reveals recurring challenges and design patterns that are still relevant today. Many "new" ideas in agentic AI are reinventions of older concepts with better technology.

2.1 Expert Systems (1970s-1990s)

The earliest AI agents were expert systems: rule-based programs that encoded human expertise as if-then rules. MYCIN (Shortliffe, 1976) diagnosed bacterial infections. XCON (McDermott, 1982) configured computer orders for DEC. DENDRAL (Feigenbaum et al., 1971) helped chemists identify molecular structures.

text

IF patient has fever AND patient has stiff neck
THEN suspect meningitis (confidence: 0.7)

Strengths: Explainable, deterministic, domain-specific expertise. You could ask MYCIN why it made a particular diagnosis, and it would show you the chain of rules.
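The rule above, and MYCIN's ability to explain itself by showing which rules fired, can be sketched as a tiny rule engine. This is a simplification: it uses forward chaining (from facts to conclusions), whereas MYCIN reasoned backward from candidate diagnoses, and the confidence arithmetic (rule confidence times the weakest premise) is an assumption in the spirit of MYCIN's certainty factors. Rule R2 is hypothetical and exists only to show chaining.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: FrozenSet[str]
    conclusion: str
    confidence: float


RULES = [
    Rule("R1", frozenset({"fever", "stiff_neck"}), "suspect_meningitis", 0.7),
    # Hypothetical follow-on rule, included only to show chaining
    Rule("R2", frozenset({"suspect_meningitis"}), "recommend_specialist_review", 0.9),
]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Tuple[Dict[str, float], List[str]]:
    """Fire rules until nothing new can be concluded; record each firing as an explanation."""
    known = {fact: 1.0 for fact in facts}  # observed facts are treated as certain
    explanation = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and rule.conditions.issubset(known):
                # Conclusion confidence: rule confidence times the weakest premise
                cf = rule.confidence * min(known[c] for c in rule.conditions)
                known[rule.conclusion] = cf
                explanation.append(
                    f"{rule.name}: {sorted(rule.conditions)} -> {rule.conclusion} ({cf:.2f})")
                changed = True
    return known, explanation


conclusions, why = forward_chain({"fever", "stiff_neck"}, RULES)
for line in why:
    print(line)
# R1: ['fever', 'stiff_neck'] -> suspect_meningitis (0.70)
# R2: ['suspect_meningitis'] -> recommend_specialist_review (0.63)
```

The `why` list is the explanation trace: the same chain of rules a user could inspect, which is exactly the kind of transparency that made expert systems attractive.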

Weaknesses: Brittle, could not handle novel situations, required painstaking manual knowledge engineering, no learning capability. Building an expert system for a new domain meant interviewing human experts for months and manually encoding hundreds or thousands of rules.

The knowledge bottleneck was the fatal flaw. The world is too complex to encode as explicit rules. This limitation drove the AI community toward learning-based approaches.

Key Insight: LLMs largely sidestep the knowledge bottleneck of expert systems. Instead of manually encoding knowledge as rules, LLMs absorb knowledge from vast corpora of text during pre-training. The challenge shifts from "how do we put knowledge in?" to "how do we get reliable behavior out?"

2.2 Behavior-Based Agents and Robotics (1980s-1990s)

Rodney Brooks (1986) challenged the classical AI approach with his subsumption architecture, arguing that intelligent behavior could emerge from simple reactive layers without explicit world models. His robots at MIT, like Herbert, which collected soda cans from desks, demonstrated that perception-action loops without complex reasoning could produce surprisingly capable behavior.

Brooks's key insight was that the world itself can serve as its own model. Instead of building an elaborate internal representation of the world and then reasoning about it, an agent can use its sensors to directly interact with the world in real time.

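The subsumption architecture organized behaviors in layers that run in parallel, each coupling sensing directly to action, with higher layers able to subsume (suppress or override) the outputs of lower ones.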
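The layering idea can be sketched in Python. This is a deliberately simplified, hypothetical version: real subsumption systems wire networks of augmented finite-state machines together with suppression and inhibition links, while here each layer is a function that either proposes an action or defers, and the highest active layer wins. The can-collecting layers are loosely inspired by Herbert and are not its actual design.

```python
from typing import Callable, List, Optional

Layer = Callable[[dict], Optional[str]]


def wander(percept: dict) -> Optional[str]:
    """Level 0: always has an opinion."""
    return "move_randomly"


def approach_can(percept: dict) -> Optional[str]:
    """Level 1: active only when a can is visible."""
    return "turn_toward_can" if percept.get("can_visible") else None


def grab_can(percept: dict) -> Optional[str]:
    """Level 2: active only when a can is within reach."""
    return "close_gripper" if percept.get("can_in_reach") else None


LAYERS: List[Layer] = [grab_can, approach_can, wander]  # highest level first


def subsumption_step(percept: dict, layers: List[Layer] = LAYERS) -> str:
    for layer in layers:
        action = layer(percept)
        if action is not None:
            return action  # this layer suppresses every layer below it
    return "idle"


print(subsumption_step({}))                                          # move_randomly
print(subsumption_step({"can_visible": True}))                       # turn_toward_can
print(subsumption_step({"can_visible": True, "can_in_reach": True})) # close_gripper
```

No layer builds a world model: each reads the current percept and reacts, yet the combination produces purposeful-looking behavior.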

This era introduced the idea that agents need not have a complete world model to act effectively, a principle that resurfaces in modern tool-using LLM agents. When Claude Code reads a file, it does not try to build a complete model of the entire codebase first; it reads what it needs, acts, observes the result, and iterates.

2.3 BDI Agents and Multi-Agent Systems (1990s-2000s)

The Belief-Desire-Intention (BDI) framework (Rao and Georgeff, 1995) formalized agent reasoning in a way that connects surprisingly well to modern LLM agents:

Beliefs: What the agent thinks is true about the world. In a modern agent, this is the information in the context window plus any retrieved memories.
Desires: The goals the agent wants to achieve. In a modern agent, this comes from the user's request and the system prompt.
Intentions: The plans the agent has committed to executing. In a modern agent, this is the current plan (explicit or implicit) that guides the next action.
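A minimal BDI-style deliberation cycle can be sketched as follows. It is a simplified, hypothetical version of the usual revise-beliefs, deliberate, commit, execute cycle, not Rao and Georgeff's formal interpreter; the `plan_library` mapping goals to fixed step lists is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BDIAgent:
    beliefs: Dict[str, bool] = field(default_factory=dict)             # what the agent thinks is true
    desires: List[str] = field(default_factory=list)                   # goals, highest priority first
    plan_library: Dict[str, List[str]] = field(default_factory=dict)   # hypothetical: goal -> steps
    intention: List[str] = field(default_factory=list)                 # steps it has committed to
    current_goal: Optional[str] = None

    def step(self, percept: Dict[str, bool]) -> str:
        self.beliefs.update(percept)                                   # 1. revise beliefs
        self.desires = [g for g in self.desires if not self.beliefs.get(g)]
        if self.current_goal is not None and self.beliefs.get(self.current_goal):
            self.intention, self.current_goal = [], None               # goal achieved: drop intention
        if not self.intention and self.desires:
            self.current_goal = self.desires[0]                        # 2. deliberate: pick a goal
            self.intention = list(self.plan_library.get(self.current_goal, []))  # 3. commit
        if self.intention:
            return self.intention.pop(0)                               # 4. execute the next step
        return "idle"


agent = BDIAgent(desires=["report_written"],
                 plan_library={"report_written": ["gather_sources", "draft", "revise"]})
print([agent.step({}) for _ in range(3)])    # ['gather_sources', 'draft', 'revise']
print(agent.step({"report_written": True}))  # idle
```

In an LLM agent the same roles are played implicitly: the context window holds the beliefs, the user request supplies the desire, and the model's current plan is the intention.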

Platforms like JADE (Java Agent DEvelopment Framework) and multi-agent systems research explored how multiple agents could coordinate, negotiate, and cooperate. This work laid the conceptual groundwork for modern multi-agent AI systems like CrewAI and AutoGen.

Research in this era also sharpened the important distinction between reactive and deliberative agents:

Reactive agents respond immediately to stimuli (like Brooks's robots).
Deliberative agents maintain an internal model and plan before acting (like expert systems).
Most practical agents are hybrid: they have a reactive layer for immediate responses and a deliberative layer for complex planning.

Modern LLM agents are inherently hybrid: the LLM's next-token generation is reactive (conditioned on the immediate context), but prompting strategies like Chain-of-Thought add a deliberative layer on top.

2.4 Reinforcement Learning Agents (2010s)

The combination of deep learning and reinforcement learning produced agents that could learn complex behaviors from interaction. Key milestones:

DQN (Mnih et al., 2015): Playing many Atari 2600 games at or above human level, learning directly from raw pixels. This showed that agents could learn complex control policies from high-dimensional sensory input.
AlphaGo (Silver et al., 2016): Defeating Lee Sedol, one of the world's strongest Go players, in 2016, in a game with more possible positions than there are atoms in the universe. AlphaGo combined tree search with learned policy and value networks.
AlphaFold (Jumper et al., 2021): Predicting protein structures with remarkable accuracy, addressing a 50-year-old grand challenge in biology. Strictly speaking, AlphaFold is a supervised deep-learning system rather than an RL agent, but it belongs to the same wave of breakthroughs.

These agents were powerful but narrow: each was trained for a single task and could not generalize. AlphaGo could not play chess, let alone write code or have a conversation. Training each agent required millions of episodes of interaction, specialized reward functions, and enormous computational resources.

Key Insight: RL agents learn from experience; LLM agents learn from text about experience. An RL agent that plays chess learns by playing millions of games. An LLM "learns" about chess by reading books, articles, and game transcripts written by humans. This fundamental difference explains both the generality of LLM agents (they can talk about anything) and their limitations (they may confidently describe chess strategies they cannot actually execute).

2.5 The LLM Revolution (2022-Present)

The launch of ChatGPT in November 2022 marked a turning point. For the first time, a single model could:

Understand and generate natural language across domains
Follow complex, multi-step instructions
Write and reason about code
Adopt different personas and follow system instructions

Researchers quickly realized that LLMs could serve as the reasoning core of general-purpose agents. The key papers that catalyzed this shift include:

ReAct (Yao et al., 2023): Showed that interleaving reasoning traces with actions in a single LLM prompt dramatically improved agent performance. This paper is so important that we dedicate a large portion of Week 5 to it.
Toolformer (Schick et al., 2023): Demonstrated that LLMs could learn to use external tools (calculators, search engines, APIs) through self-supervised learning. This opened the door to tool-augmented agents.
Generative Agents (Park et al., 2023): Created a simulated town of 25 AI agents with believable social behaviors, using LLMs with memory architectures. This captured the public imagination and showed that LLM agents could produce emergent collective behavior.

The pace of development since then has been extraordinary. In roughly two years, we went from "can an LLM use a calculator?" to production coding agents, web-browsing agents, and multi-agent orchestration systems.

2.6 Timeline Summary

Interactive figure · Historical Timeline of AI Agents: 70 years of agents, from expert systems to today's LLM agents, with milestones at 1956, 1972, 1987, 1995, 2014, 2022, 2024, and 2026. The 2026 milestone, A2A (agents talking to agents), marks formal protocols for multi-agent coordination at scale.

Try It Yourself: Pick one historical AI agent system (MYCIN, Herbert, AlphaGo, or another you find interesting). Research how it perceived its environment, made decisions, and took actions. Identify the perception-action loop. What were its biggest limitations, and how do modern LLM agents address them?

3. Taxonomy of Agents (Russell and Norvig)

Russell and Norvig (2021) define five types of agents, each more capable than the last. Understanding this taxonomy helps us appreciate what modern LLM-based agents are and what they are not. Think of it as a ladder of increasing sophistication.

3.1 Simple Reflex Agents

The simplest agent type. It selects actions based solely on the current percept, ignoring all history. No memory, no model of the world, no planning. Just condition-action rules.

python

def simple_reflex_agent(percept):
    """A thermostat-like agent."""
    if percept["temperature"] > 25:
        return "turn_on_cooling"
    elif percept["temperature"] < 18:
        return "turn_on_heating"
    else:
        return "do_nothing"

This is the thermostat in your house. If the temperature is too high, turn on cooling. If too low, turn on heating. The thermostat does not remember what it did five minutes ago, does not plan for tomorrow's weather forecast, and does not reason about why the temperature is rising. It just reacts.

Condition-action rules (if-then rules) drive behavior. They can be surprisingly effective for well-defined tasks in fully observable, deterministic environments.

Limitations: Cannot handle partially observable environments. If the sensor breaks, the agent has no way to compensate. Cannot reason about the consequences of actions. Cannot handle situations not covered by its rules.

Real-world examples: Thermostats, simple spam filters (keyword-based), automatic door sensors, basic email auto-replies.

3.2 Model-Based Reflex Agents

These agents maintain an internal state, a model of the world, that they update with each percept. This allows them to handle partial observability by remembering things they cannot currently see.

python

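class ModelBasedAgent:
    OCCUPANCY_TIMEOUT = 600  # seconds (10 minutes) without motion before assuming the room is empty

    def __init__(self):
        self.state = {"room_occupied": False, "lights_on": False, "last_motion_time": None}

    def _time_since_last_motion(self, now):
        if self.state["last_motion_time"] is None:
            return float("inf")  # motion has never been seen
        return now - self.state["last_motion_time"]

    def update_state(self, percept):
        now = percept["timestamp"]  # every percept carries a timestamp in seconds
        if percept.get("motion_detected"):
            self.state["room_occupied"] = True
            self.state["last_motion_time"] = now
        elif self._time_since_last_motion(now) > self.OCCUPANCY_TIMEOUT:
            self.state["room_occupied"] = False

    def act(self, percept):
        self.update_state(percept)
        if self.state["room_occupied"] and not self.state["lights_on"]:
            self.state["lights_on"] = True
            return "turn_on_lights"
        elif not self.state["room_occupied"] and self.state["lights_on"]:
            self.state["lights_on"] = False
            return "turn_off_lights"
        return "do_nothing"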

The critical difference from a simple reflex agent: this smart lighting system remembers that someone moved in the room a few minutes ago, even though the motion sensor has detected nothing since. Until ten minutes pass without motion, it treats the room as probably still occupied (maybe the person is sitting still, reading).

Key Insight: The internal state allows the agent to reason about things it cannot currently observe. This is the same principle behind the "context window" in LLM agents: the conversation history serves as the agent's internal model of the ongoing task.

Real-world examples: Smart home systems, adaptive cruise control, anti-lock braking systems, inventory management systems.

3.3 Goal-Based Agents

Goal-based agents go beyond reacting to the current state. They consider the future: specifically, which actions will lead to achieving their goals. This is a qualitative leap from model-based agents. Model-based agents ask "What is the world like?" Goal-based agents ask "What do I want the world to be like, and how do I get there?"

This requires three things:

A model of how the world evolves (if I take action A in state S, what state will result?).
A representation of the goal state (what does "success" look like?).
A search or planning algorithm to find a sequence of actions from the current state to the goal.

python

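from collections import deque

class GoalBasedAgent:
    def __init__(self, goal, roads, start="home"):
        self.goal = goal  # e.g., {"location": "airport"}
        self.roads = roads  # world model: location -> directly reachable locations
        self.state = {"location": start}
        self.plan = []

    def update_state(self, percept):
        self.state.update(percept)  # e.g., {"location": "highway"}

    def goal_achieved(self):
        return all(self.state.get(k) == v for k, v in self.goal.items())

    def search_for_plan(self):
        """Breadth-first search: the route with the fewest road segments."""
        start = self.state["location"]
        frontier, visited = deque([[start]]), {start}
        while frontier:
            path = frontier.popleft()
            if path[-1] == self.goal["location"]:
                return [f"drive_to:{loc}" for loc in path[1:]]
            for nxt in self.roads.get(path[-1], []):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append(path + [nxt])
        return []

    def act(self, percept):
        self.update_state(percept)
        if self.goal_achieved():
            return "done"
        if not self.plan:
            self.plan = self.search_for_plan()
        if self.plan:
            return self.plan.pop(0)
        return "replan"  # No route found; try again after new percepts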

Consider a navigation app. It knows your current location (state), your desired destination (goal), and how roads connect (world model). It searches for the best route (plan) and guides you step by step. If a road is blocked, it replans.

Goal-based agents can be flexible: if the route to the airport is blocked, they can find an alternative route. Simple reflex agents cannot do this because they have no concept of a goal or a plan.

Put differently, model-based agents react to the world as it is, while goal-based agents act to make the world into what they want it to be.

Real-world examples: Navigation systems (Google Maps, Waze), game-playing AI, automated scheduling systems, robot path planning.

3.4 Utility-Based Agents

Sometimes there are multiple ways to achieve a goal, and some are better than others. A utility function maps states to a real number representing how "happy" the agent is in that state.

python

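class UtilityBasedAgent:
    def __init__(self, action_model):
        # action_model: action -> list of (probability, resulting_state) pairs.
        # Simplification: outcomes do not depend on the current state.
        self.action_model = action_model
        self.state = {}

    def utility(self, state):
        """Multi-objective utility combining several factors."""
        return (
            0.4 * state.get("comfort", 0) +
            0.3 * state.get("energy_saved", 0) +
            0.2 * state.get("safety", 0) +
            0.1 * state.get("cost_saved", 0)
        )

    def update_state(self, percept):
        self.state.update(percept)

    def get_possible_actions(self):
        return list(self.action_model)

    def expected_utility(self, action):
        """Probability-weighted average utility over the action's possible outcomes."""
        return sum(p * self.utility(s) for p, s in self.action_model[action])

    def act(self, percept):
        self.update_state(percept)
        possible_actions = self.get_possible_actions()

        # Choose the action that maximizes expected utility
        best_action = max(
            possible_actions,
            key=lambda a: self.expected_utility(a)
        )
        return best_action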

Consider choosing a flight. A goal-based agent asks: "Does this flight get me to London?" If yes, the goal is achieved. But a utility-based agent asks: "How good is this option?" It weighs price, departure time, number of stops, airline reputation, and seat comfort. It might choose a slightly more expensive flight that departs at a reasonable hour over a cheap red-eye.

Utility-based agents handle trade-offs and uncertainty naturally. They can reason about risk (should I take the faster but riskier route?) and make decisions under multiple competing objectives.
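The flight example can be made concrete by contrasting a goal test with a utility function. The flights, weights, and penalty below are hypothetical: the goal-based chooser accepts the first flight that reaches London, while the utility-based chooser ranks every goal-satisfying flight by a weighted score.

```python
flights = [  # hypothetical options, in the order a search might return them
    {"id": "red-eye", "destination": "London", "price": 320, "departure_hour": 1, "stops": 0},
    {"id": "morning", "destination": "London", "price": 410, "departure_hour": 9, "stops": 0},
    {"id": "budget-paris", "destination": "Paris", "price": 150, "departure_hour": 10, "stops": 0},
]


def reaches_goal(flight: dict) -> bool:
    return flight["destination"] == "London"


def goal_based_choice(options: list) -> dict:
    # Any option that satisfies the goal is acceptable; take the first one found
    return next(f for f in options if reaches_goal(f))


def utility(flight: dict) -> float:
    # Hypothetical weights: each $100 costs 1 point, each stop 2 points,
    # and departing before 6 a.m. costs 3 points
    early_penalty = 3 if flight["departure_hour"] < 6 else 0
    return -flight["price"] / 100 - 2 * flight["stops"] - early_penalty


def utility_based_choice(options: list) -> dict:
    # Among options that reach the goal, pick the one with the highest utility
    return max((f for f in options if reaches_goal(f)), key=utility)


print(goal_based_choice(flights)["id"])     # red-eye
print(utility_based_choice(flights)["id"])  # morning
```

Both agents reach London; only the utility-based one can say that the morning flight is better, because it scores how well each option achieves the goal rather than just whether it does.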

Key Insight: When an LLM agent chooses between different approaches to solving a task, it is implicitly performing utility-based reasoning. The "utility function" is embedded in the model's training and the system prompt, not explicitly programmed. Understanding this helps explain why prompt engineering is so important: you are shaping the agent's implicit utility function.

Real-world examples: Recommendation systems (Netflix, Spotify), autonomous vehicles (balancing speed, safety, comfort), portfolio optimization, dynamic pricing systems.

3.5 Learning Agents

A learning agent has four conceptual components:

Learning element: Improves the agent based on experience.
Performance element: Selects actions (this is the agent as described above).
Critic: Provides feedback on how the agent is doing.
Problem generator: Suggests exploratory actions to discover new knowledge.

Interactive figure · Agent Taxonomy (Russell & Norvig): five architectures, one root, ordered from reactive to adaptive: simple reflex, model-based, goal-based, utility-based, and learning. The classic taxonomy still describes what we see in modern LLM-based agents. Card shown, learning agent: any of the above becomes a learning agent once it adds a critic that updates the model or policy, with decision rule policy ← update(policy, critic(trajectory)). Example: Reflexion, which learns from failure verbally, with no retraining.

The learning agent framework is elegant because it explains how an agent can improve over time. The problem generator is particularly interesting: it suggests actions that might not be optimal right now but that help the agent learn something useful for the future. This is the exploration vs. exploitation trade-off that appears throughout AI.
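The four components can be mapped onto a toy problem to see how they interact. The sketch below assumes a hypothetical multi-armed bandit (three actions with fixed payoffs hidden from the agent) and an epsilon-greedy problem generator; it is one illustrative mapping, not the only way to realize Russell and Norvig's design.

```python
import random
from typing import Callable, Dict, List, Optional


class BanditLearningAgent:
    """Toy learning agent with Russell and Norvig's four components as methods."""

    def __init__(self, actions: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.estimates: Dict[str, float] = {a: 0.0 for a in actions}  # learned value of each action
        self.counts: Dict[str, int] = {a: 0 for a in actions}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def performance_element(self) -> str:
        """Exploit: pick the action currently believed to be best."""
        return max(self.estimates, key=self.estimates.get)

    def problem_generator(self) -> Optional[str]:
        """Explore: suggest untried actions first, then a random one with probability epsilon."""
        untried = [a for a, n in self.counts.items() if n == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.estimates))
        return None

    def critic(self, reward: float) -> float:
        """Turn the environment's reward into feedback (here it simply passes it through)."""
        return reward

    def learning_element(self, action: str, feedback: float) -> None:
        """Incremental mean: move the action's estimate toward the new feedback."""
        self.counts[action] += 1
        self.estimates[action] += (feedback - self.estimates[action]) / self.counts[action]

    def step(self, environment: Callable[[str], float]) -> str:
        action = self.problem_generator() or self.performance_element()
        self.learning_element(action, self.critic(environment(action)))
        return action


payoffs = {"a": 0.2, "b": 0.8, "c": 0.5}  # hypothetical, deterministic, hidden from the agent
agent = BanditLearningAgent(list(payoffs))
for _ in range(50):
    agent.step(lambda action: payoffs[action])
print(agent.performance_element())  # b
```

The first steps are pure exploration; once every action has been tried, the performance element exploits the best estimate while the problem generator still occasionally proposes a random action.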

Modern LLM-based agents are closest to learning agents, but with an important caveat. Most current LLM agents do not update their model weights during deployment. Instead, they "learn" through:

In-context learning: Adapting within a conversation based on examples and feedback provided in the prompt.
External memory: Storing and retrieving past experiences using vector databases or other storage.
Prompt refinement: Improving their own instructions based on feedback (this is what Reflexion does, as we will see in Week 5).

This is a form of non-parametric learning: the model parameters stay fixed, but the agent's effective behavior changes through its context and memory. It is as if an employee improved not by gaining new skills, but by taking better notes and referring to them more effectively.
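Non-parametric learning can be sketched with a frozen "model" whose behavior changes only because its prompt changes. Everything here is hypothetical: `frozen_model` and `critic` are scripted stand-ins, and the loop is in the spirit of Reflexion rather than a reproduction of it.

```python
from typing import List, Optional, Tuple


class ReflectionMemory:
    """Lessons stored as plain text; the model's weights are never touched."""

    def __init__(self) -> None:
        self.lessons: List[str] = []

    def add(self, lesson: str) -> None:
        if lesson not in self.lessons:
            self.lessons.append(lesson)

    def build_prompt(self, task: str) -> str:
        if not self.lessons:
            return f"Task: {task}"
        notes = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Lessons from earlier attempts:\n{notes}\n\nTask: {task}"


def frozen_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM with fixed weights: output depends only on the prompt."""
    if "run the tests" in prompt.lower():
        return "edited code; ran the tests; all passing"
    return "edited code; declared success"


def critic(output: str) -> Optional[str]:
    """Hypothetical check: success requires evidence that the tests were run."""
    return None if "ran the tests" in output else "Run the tests before declaring success."


def solve_with_reflection(task: str, max_attempts: int = 3) -> Tuple[int, str]:
    memory = ReflectionMemory()
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = frozen_model(memory.build_prompt(task))
        lesson = critic(output)
        if lesson is None:
            return attempt, output
        memory.add(lesson)  # "learning" = better notes for the next attempt
    return max_attempts, output


print(solve_with_reflection("Fix the failing import in utils.py"))
# (2, 'edited code; ran the tests; all passing')
```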

Try It Yourself: Think about a recent multi-step task you completed at a computer (debugging code, writing a report, planning a trip). Identify the moments where you: (a) reacted reflexively, (b) consulted your "internal model" of the situation, (c) planned ahead, (d) weighed trade-offs, (e) learned from a mistake. How many of these behaviors would you want an AI agent to have?

4. The Modern AI Agent: LLM as the "Brain"

4.1 Why LLMs Changed Everything

Before LLMs, building an agent required:

Manually specifying rules or reward functions
Training specialized models for each capability
Engineering complex pipelines to connect components
Hiring domain experts to encode knowledge

LLMs provide a general-purpose reasoning engine that can:

Understand natural language instructions (no formal specification needed)
Generate plans in natural language
Write and execute code
Use tools described in natural language
Maintain conversational context
Adapt to new tasks through prompting alone

This collapses what used to require an entire engineering team into a single model call, augmented with relatively simple infrastructure.

To appreciate the magnitude of this shift, consider what it took to build a question-answering agent in 2015 versus 2025:

2015 approach: You would need a named entity recognizer, a dependency parser, a question classifier, a knowledge base, a query generator, a passage retriever, a reading comprehension model, and a response generator. Each component would be a separate model, trained on separate data, with custom glue code between them. The system might work well for factoid questions about one domain and fail completely on everything else.

2025 approach: You write a system prompt, define some tools (web search, calculator), and implement a basic agent loop. The LLM handles understanding, reasoning, planning, and response generation. Adding a new domain is a prompt change, not a re-engineering effort.

4.2 The LLM Agent Paradigm

In the modern paradigm, an LLM-based agent consists of:

Interactive figure · Components of an LLM-Based Agent: four capabilities (planning, memory, tool use, and action) wrapped around a core LLM. Planning, for example, decomposes the goal into executable steps and replans when execution drifts (examples: Plan-and-Execute, ReAct, LATS).

The LLM is the central coordinator. It receives information from perception (user input, tool outputs, environment observations), reasons about what to do next, and dispatches actions through tools.

Imagine the LLM as the brain of a knowledge worker sitting at a desk. The desk has various tools (phone, computer, calculator, filing cabinet). The brain decides which tool to pick up and how to use it. The tools extend the brain's capabilities: the calculator handles precise arithmetic, the filing cabinet stores information for later retrieval, and the phone connects to the outside world. Similarly, the LLM decides when to call a search tool, when to run code, and when to retrieve information from memory.
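Putting the pieces together, here is a minimal sketch of an LLM agent loop with one tool. To keep it runnable offline, `scripted_llm` is a hypothetical stand-in that always requests the calculator and then answers; the message dictionaries and the `{"tool": ..., "input": ...}` decision format are assumptions, not any provider's API. In a real agent you would replace `scripted_llm` with a call to your LLM provider's function-calling interface.

```python
import ast
import operator
from typing import Callable, Dict, List

# Tools: plain functions the agent may call, looked up by name.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate + - * / arithmetic by walking the syntax tree (never calls eval)."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(ev(ast.parse(expression, mode="eval").body))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_llm(messages: List[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical stand-in for a real model call, scripted so the loop runs offline."""
    last = messages[-1]
    if last["role"] == "user":
        expression = last["content"].split(":", 1)[-1].strip()
        return {"tool": "calculator", "input": expression}
    return {"final": f"The answer is {last['content']}."}


def run_agent(task: str, llm=scripted_llm, tools=TOOLS, max_steps: int = 5) -> str:
    messages = [
        {"role": "system", "content": "You are a careful assistant. Use tools for arithmetic."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        decision = llm(messages)                 # reason: decide the next action
        if "final" in decision:
            return decision["final"]             # goal reached: answer the user
        tool = tools.get(decision["tool"])
        observation = (tool(decision["input"]) if tool
                       else f"Error: unknown tool {decision['tool']!r}")  # act
        messages.append({"role": "assistant",
                         "content": f"{decision['tool']}({decision['input']})"})
        messages.append({"role": "tool", "content": observation})  # perceive the result
    return "Stopped: step limit reached without a final answer."


print(run_agent("Compute: 17 * 23"))  # The answer is 391.
```

The `max_steps` cap is a deliberate design choice: because the model decides when to stop, the loop needs an external bound so a confused model cannot run forever.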

5. Components of an LLM-Based Agent

The influential survey by Wang et al. (2024), "A Survey on Large Language Model Based Autonomous Agents," identifies four key components of LLM-based agents. We examine each in detail.

5.1 Planning

Planning is the ability to decompose a complex task into manageable sub-tasks and determine the order of execution. It is perhaps the most uniquely "agentic" capability: a system that plans is making decisions about future actions, not just reacting to the present.

Why planning matters: Without planning, an agent that is asked to "write a research report on climate change" might immediately start writing the conclusion. With planning, it first identifies the sections, determines what information it needs for each section, searches for that information, and then writes in a logical order.

Think about how you would approach a complex task at work. If your boss asks you to "prepare a market analysis," you do not immediately start typing. First, you think: What markets? What data do I need? Where can I find it? What format does the boss expect? What is the deadline? This decomposition step is planning, and it is essential for any task that cannot be completed in a single action.

LLM agents need the same capability, but they are surprisingly inconsistent at it. Sometimes they produce excellent plans; other times they miss critical steps or create plans with circular dependencies. This variability is why agent architectures (Week 5) often include explicit planning mechanisms rather than relying on the LLM's implicit planning ability.

Task Decomposition Strategies:

Chain of Thought (CoT): The agent reasons step-by-step through the problem (Wei et al., 2022). This is the simplest form of planning: think before you act.
Plan-then-Execute: Generate a complete plan first, then execute each step. This provides a roadmap but may need adjustment as the agent learns more.
Iterative Refinement: Start with a rough plan, refine it as more information becomes available. This is the most flexible approach and mirrors how humans tackle complex tasks.

Example: Planning for a research task:

text

Goal: "Write a literature review on media bias detection"

Plan:
1. Define the scope: What types of media bias? Which time period?
2. Search academic databases for relevant papers
3. Read and categorize the papers by methodology
4. Identify key themes and trends
5. Draft the literature review sections
6. Add citations and format properly
7. Review and revise

Replanning: Good agents recognize when their initial plan is failing and adapt. If step 2 returns no results for a specific search query, the agent should try alternative queries rather than proceeding with no data. This adaptive replanning is what distinguishes a sophisticated agent from a rigid script.
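The replanning behavior can be sketched without an LLM by stubbing out the planner and the executor. In this minimal sketch, `make_plan`, `execute_step`, and `replan_step` are hypothetical stand-ins for LLM calls and tools: a search step that returns nothing is retried with a broader query instead of letting the agent continue with no data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    description: str
    query: Optional[str] = None  # only search steps carry a query


@dataclass
class RunLog:
    completed: list[str] = field(default_factory=list)
    replans: int = 0


def make_plan(goal: str) -> list[Step]:
    # Hypothetical planner: a real agent would ask the LLM to decompose the goal.
    return [
        Step("Define the scope"),
        Step("Search for papers", query="media bias detection survey 2019"),
        Step("Categorize papers by methodology"),
    ]


def execute_step(step: Step) -> bool:
    # Hypothetical executor: pretend the overly narrow "survey" query finds nothing.
    if step.query is not None:
        return "survey" not in step.query
    return True


def replan_step(step: Step) -> Step:
    # Hypothetical replanner: broaden the query instead of giving up.
    return Step(step.description, query="media bias detection")


def run(goal: str, max_replans: int = 2) -> RunLog:
    """Execute the plan step by step, replanning a failed step at most max_replans times."""
    log = RunLog()
    for step in make_plan(goal):
        attempts = 0
        while not execute_step(step):
            if attempts == max_replans:
                raise RuntimeError(f"Could not complete: {step.description}")
            step = replan_step(step)
            attempts += 1
            log.replans += 1
        log.completed.append(step.description)
    return log


log = run("Write a literature review on media bias detection")
print(log)
```

The `max_replans` bound plays the same role as an iteration cap: an agent that can replan forever can also fail forever.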

The planning hierarchy: In practice, planning happens at multiple levels:

Strategic planning: "I need to write a literature review" (overall goal)
Tactical planning: "First I'll search for papers, then categorize them, then write each section" (sub-task decomposition)
Operational planning: "For the search, I'll use Google Scholar with these keywords" (specific action details)

Good agent architectures handle all three levels. We will see this in Plan-and-Execute (Week 5), where the planner handles strategic and tactical levels, and the executor handles operational details.

Key Insight: Planning is where LLM agents most frequently fail. Studies have shown that LLMs often generate plausible-looking plans that have subtle dependency errors (step 4 requires information that is only available after step 6) or miss critical steps. This is why architectures like Plan-and-Execute (Week 5) include explicit replanning mechanisms.
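The dependency error described above, where a step consumes information that no earlier step produces, can be caught mechanically if each plan step declares its inputs and outputs. The `needs`/`produces` fields below are an assumption made for illustration, since the plans in this lecture are free text; an agent framework would have to extract them from the LLM's plan.

```python
from typing import NamedTuple


class PlanStep(NamedTuple):
    name: str
    needs: frozenset[str]     # information the step consumes
    produces: frozenset[str]  # information the step makes available


def find_dependency_errors(plan: list[PlanStep], given: frozenset[str] = frozenset()) -> list[str]:
    """Return one message for every step that uses information not yet available."""
    available = set(given)
    errors = []
    for index, step in enumerate(plan, start=1):
        missing = step.needs - available
        if missing:
            errors.append(f"Step {index} ({step.name}) needs {sorted(missing)}, which no earlier step produces")
        available |= step.produces
    return errors


# A plan with the steps in the wrong order: the search runs before the scope exists.
plan = [
    PlanStep("Search papers", frozenset({"scope"}), frozenset({"papers"})),
    PlanStep("Define scope", frozenset(), frozenset({"scope"})),
    PlanStep("Categorize papers", frozenset({"papers"}), frozenset({"categories"})),
]
print(find_dependency_errors(plan))
```

A check like this is cheap to run before execution, and its error messages can be fed back to the planner as replanning hints.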

5.2 Memory

LLM agents need memory systems because LLMs alone have a fixed context window. Once the conversation exceeds the context window, older information is lost. Memory comes in several forms, each serving a different purpose:

Short-Term Memory (Working Memory):

The current conversation or context window
Ranges from roughly 8K to over 1M tokens, depending on the model
Includes the current task, recent observations, and intermediate results
Analogous to a human's working memory: the information you are actively thinking about

Long-Term Memory:

Persistent storage beyond the context window
Implemented via vector databases (e.g., Pinecone, Chroma, pgvector)
Retrieval-Augmented Generation (RAG): retrieve relevant past information when needed
Analogous to a human's reference library: information stored externally that you look up when needed

Episodic Memory:

Records of past experiences and their outcomes
"Last time I tried approach X, it failed because Y"
Enables learning from past mistakes
Analogous to a human's autobiographical memory: remembering what happened in specific past situations

Semantic Memory:

Factual knowledge about the world
Often stored as embeddings or knowledge graphs
Enables the agent to recall domain-specific information
Analogous to a human's general knowledge: facts you know but cannot trace to a specific experience

python

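from datetime import datetime


class AgentMemory:
    """A simplified agent memory system."""

    def __init__(self, max_short_term: int = 50):
        self.short_term: list[str] = []   # Recent conversation turns (working memory)
        self.long_term: list[str] = []    # Archived observations (stand-in for a vector store)
        self.episodic: list[dict] = []    # Past task outcomes
        self.max_short_term = max_short_term

    def add_to_short_term(self, observation: str) -> None:
        self.short_term.append(observation)
        # When working memory overflows, archive older entries to fit the context window
        if len(self.short_term) > self.max_short_term:
            self.compress_and_archive()

    def compress_and_archive(self, keep_last: int = 10) -> None:
        """Move all but the most recent entries into long-term memory.

        A real agent would first ask the LLM to summarize them; this sketch archives them verbatim.
        """
        self.long_term.extend(self.short_term[:-keep_last])
        self.short_term = self.short_term[-keep_last:]

    def recall_relevant(self, query: str, k: int = 5) -> list[str]:
        """Retrieve the k most relevant memories for the current context."""
        return self._semantic_search(query, self.long_term, k)

    def _semantic_search(self, query: str, items: list[str], k: int) -> list[str]:
        # Placeholder ranking by shared words; in practice, use embedding similarity search
        words = set(query.lower().split())
        ranked = sorted(items, key=lambda item: len(words & set(item.lower().split())), reverse=True)
        return ranked[:k]

    def store_experience(self, task: str, outcome: str, success: bool) -> None:
        """Store a task outcome for future reference."""
        self.episodic.append({
            "task": task,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now(),
        })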

Let us walk through this code to understand the design:

add_to_short_term: Every observation (tool result, user message, etc.) goes into short-term memory. But short-term memory has a limit (50 items here, representing the context window constraint). When it overflows, compress_and_archive would summarize older entries and move them to long-term storage.
recall_relevant: When the agent needs information from the past, it does not search through everything; it uses semantic similarity to find the most relevant memories. This is like how you do not re-read every book you have ever read; you recall the one that seems most relevant to your current question.
store_experience: After completing a task, the agent records what happened. This creates a growing database of experiences that can inform future decisions.
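`recall_relevant` relies on a similarity search over stored memories. A common way to picture embedding-based retrieval is cosine similarity between vectors. In the sketch below, a bag-of-words count vector stands in for a real embedding model (an assumption made so the example runs without any external service); a production system would call an embedding model and a vector database instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real agent would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k memories most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


memories = [
    "The tests use pytest and live in tests/",
    "The user prefers concise answers",
    "Authentication bugs were reported in auth/login.py",
]
print(top_k("where are the authentication bugs", memories, k=1))
```

Word counts only match exact words; real embeddings also match paraphrases ("login failures" vs. "authentication bugs"), which is the reason agents use them.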

Try It Yourself: Think about how you manage your own "memory" when working on a complex project. What do you keep in your head (working memory)? What do you write down in notes (long-term memory)? What past experiences influence your current approach (episodic memory)? How would you design an artificial system that mimics your approach?

5.3 Tool Use

Tools extend the agent beyond pure text generation. They bridge the gap between what the LLM can reason about and what it can actually do in the world.

Common tool categories:

Category	Examples	Purpose
Information retrieval	Web search, database queries, document retrieval	Access external knowledge
Computation	Calculator, Python interpreter, Wolfram Alpha	Precise calculations
Code execution	Sandboxed environments, REPLs	Run and test code
File operations	Read, write, edit files	Interact with the file system
Communication	Email, Slack, API calls	Interact with external services
Perception	Image analysis, OCR, audio transcription	Process non-text inputs

A tool is typically described to the agent as a function signature with a natural language description:

python

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use this when you need up-to-date facts.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculator",
        "description": "Perform mathematical calculations. Use this for any arithmetic.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A mathematical expression, e.g., '2 + 2 * 3'"
                }
            },
            "required": ["expression"]
        }
    }
]

The LLM decides when to call a tool, which tool to call, and what arguments to pass. This capability comes from training on large corpora of code and documentation, combined with fine-tuning on explicit tool-use examples.

Think about how you decide to use a calculator. You do not pull out a calculator for "2 + 3"; you compute that in your head. But for "What is 7.3% of $148,293.57?", you reach for the calculator because you know mental arithmetic is unreliable for that level of precision. LLM agents exhibit similar behavior: they use tools selectively, based on the difficulty and precision requirements of the task.

Key Insight: The tool description is arguably more important than the tool implementation. The LLM decides whether and how to use a tool based only on its name, description, and parameter schema; it never sees the implementation. A well-implemented tool with a poor description will be misused, while a well-described tool is far more likely to be used correctly, even if its implementation is simple.
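Because the model sees only the name, description, and parameters, one option is to generate the tool description directly from a documented Python function, so the docstring and the schema cannot drift apart. The `tool_schema` helper below is a hypothetical sketch, not any framework's API; it uses the standard `inspect` module and handles only `str`, `int`, `float`, and `bool` parameters.

```python
import inspect

# Map Python annotations to JSON Schema type names (only simple types handled here).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build a tool description from a function's name, docstring, and signature."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES[param.annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def web_search(query: str, max_results: int = 5) -> str:
    """Search the web for current information. Use this when you need up-to-date facts."""
    return ""  # implementation omitted


print(tool_schema(web_search))
```

Generation only moves the problem: a vague docstring still produces a vague tool description, and per-parameter descriptions still have to be written by hand.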

5.4 Action

The action component is the interface between the agent's decisions and the external world. Actions can be:

Digital actions: API calls, code execution, file modifications, sending messages.
Physical actions: In robotics, controlling motors, grippers, or other actuators.
Communicative actions: Generating responses to users, asking clarifying questions, delegating to other agents.

A critical design decision is the action space: what actions are available to the agent? A narrow action space (few, well-defined actions) is safer but less capable. A broad action space (many possible actions, including code execution) is more powerful but riskier.

Consider the difference:

Narrow: The agent can classify emails as "urgent" or "not urgent." Safe, predictable, but very limited.
Medium: The agent can read emails, search a knowledge base, and draft replies for human review. Useful and reasonably safe.
Broad: The agent can read emails, search the web, write code, execute code, send emails, and modify files. Very powerful, but a mistake could have serious consequences.
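One way to make the action space explicit is to register every action with a risk level and to gate irreversible ones behind human approval. This is a hypothetical sketch: the action names mirror the email example above, and `approve` stands in for whatever confirmation mechanism (a terminal prompt, a UI button) a real system would use.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    READ = 1          # no side effects
    DRAFT = 2         # side effects a human reviews before anything leaves the system
    IRREVERSIBLE = 3  # cannot be undone: sending, deleting, paying


# Hypothetical action space: the "medium" tier plus one irreversible action.
ACTION_SPACE: dict[str, Risk] = {
    "read_email": Risk.READ,
    "search_knowledge_base": Risk.READ,
    "draft_reply": Risk.DRAFT,
    "send_email": Risk.IRREVERSIBLE,
}


def authorize(action: str, approve: Callable[[str], bool]) -> bool:
    """Allow an action only if it is in the action space and, when irreversible, approved."""
    risk = ACTION_SPACE.get(action)
    if risk is None:
        return False  # outside the action space: refuse outright
    if risk is Risk.IRREVERSIBLE:
        return approve(action)  # a human must confirm irreversible actions
    return True


def deny_all(action: str) -> bool:
    return False


print(authorize("read_email", deny_all))   # True
print(authorize("send_email", deny_all))   # False: needs human approval
print(authorize("delete_file", deny_all))  # False: not in the action space
```

Widening the agent's capabilities then becomes an explicit, reviewable change to `ACTION_SPACE` rather than a side effect of adding a tool.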

This trade-off between capability and safety is one of the central tensions in agent design, and we will return to it throughout this course.

5.5 How the Four Components Work Together

In practice, the four components are deeply interconnected. Consider a coding agent fixing a bug:

Planning: "I need to understand the bug, find the cause, fix it, and verify the fix."
Memory: "This codebase uses a specific testing framework (from earlier conversation). The bug was reported in the authentication module."
Tool use: Read the failing test file, search the codebase for related functions, edit the buggy file, run the tests.
Action: Execute each tool call, observe the results, and decide what to do next.

The agent alternates between these components fluidly. It plans (decompose the task), uses tools (read files), stores information in memory (what did the code look like?), plans again (now that I see the bug, how should I fix it?), uses tools (edit the file, run tests), and evaluates the result (did the tests pass?).

Key Insight: The four components (planning, memory, tool use, action) are not separate modules that run in sequence. They are interleaved and interdependent. Planning requires memory (what do I already know?). Tool use requires planning (which tool should I use?). Memory is updated by actions (what did I just learn?). Designing agents means designing how these components interact.

6. Real-World Agent Systems (2025-2026)

6.1 Coding Agents

Coding agents are the most mature and widely deployed category of LLM-based agents in 2025-2026. They demonstrate the full agent paradigm: perception (reading code), reasoning (understanding bugs), planning (deciding what to change), tool use (running tests, searching codebases), and action (editing files).

Why did coding agents mature first? Several reasons: (1) Code has clear success criteria (does it compile? do the tests pass?), making it easier to evaluate agent performance. (2) The environment is digital, avoiding the challenges of physical interaction. (3) There is a massive amount of code in training data, so LLMs have strong coding priors. (4) Developers were among the earliest adopters of AI tools, providing rapid feedback loops for improvement.

Claude Code (Anthropic, 2025)

An agentic coding assistant that operates directly in the terminal
Can read files, write code, execute commands, search codebases, and run tests
Uses a perception-action loop: observe the codebase state, reason about what to change, make edits, verify with tests
Notable for its tool-use architecture with explicit permission systems: it asks for approval before running potentially destructive commands
Demonstrates the "human-in-the-loop" pattern at its best: autonomous for safe operations, supervised for risky ones

Cursor (Anysphere, 2024-2025)

An IDE built around AI-assisted coding
The "Composer" feature acts as an agent: it can edit multiple files, run terminal commands, and iterate on feedback
Demonstrates how agents can be embedded in existing developer workflows rather than replacing them
Important lesson: agents do not always need to be standalone systems; they can augment existing tools

Devin (Cognition Labs, 2024)

Marketed as an "AI software engineer"
Can plan tasks, write code, debug, and deploy applications
Operates in a sandboxed environment with browser, terminal, and editor access
Illustrated both the potential and limitations of autonomous coding agents: impressive demos, but real-world performance showed that full autonomy remains challenging

SWE-agent (Yang et al., 2024)

An academic research project that created an agent for solving GitHub issues
Demonstrated that agent-computer interface design matters as much as the underlying model: by designing a custom interface (a simplified terminal with helpful commands), the researchers significantly improved the agent's ability to navigate and edit code
This is an important lesson: the environment and tools you give an agent are as important as the agent's reasoning capability

6.2 Research and Knowledge Agents

Elicit (originally developed at Ought)

An AI research assistant that can search papers, extract claims, and synthesize findings
Uses tool-augmented LLMs to interact with academic databases
Demonstrates domain-specific agent design for literature review
Shows how agents can add value in knowledge work by handling tedious tasks (finding papers, extracting data) while leaving analysis to humans

Semantic Scholar's Research Agent

AI-powered features for finding, filtering, and understanding scientific literature
Illustrates how agents can operate over structured knowledge bases
Uses the concept of "Agentic RAG": the agent decides what to search for, evaluates results, and iterates

6.3 Web and Computer Use Agents

Anthropic Computer Use (2024-2025)

An agent that can see and interact with a computer screen
Uses screenshot perception + mouse/keyboard actions
Demonstrates how visual perception can augment traditional text-based agents
The perception-action loop here is literal: look at the screen, decide what to click, click it, look at the result

OpenAI Operator (2025)

A web browsing agent that can complete tasks in a browser
Navigates websites, fills forms, clicks buttons
Raises significant questions about authentication, security, and trust
When it encounters a payment page, it pauses and asks the human for approval: a practical example of action-space restriction for safety

6.4 Multi-Agent Systems

ChatDev (Qian et al., 2024)

Simulates a software company with multiple LLM agents (CEO, CTO, Programmer, Tester)
Agents collaborate through structured communication protocols
Demonstrates role-based coordination through a structured sequence of phases: the CEO sets requirements, the CTO designs architecture, the Programmer implements, and the Tester finds bugs
Shows that role specialization can improve quality even when all agents use the same underlying model

AutoGen (Microsoft, 2023)

A framework for building multi-agent conversational systems
Agents can have different roles, capabilities, and even different underlying models
Supports both autonomous and human-in-the-loop workflows
Pioneered the concept of "conversational agent orchestration" where agents talk to each other to solve problems

6.5 The Agent Landscape in 2025-2026

The agent ecosystem has matured significantly:

Frameworks:

LangChain / LangGraph: Among the most widely used agent frameworks, with support for complex agent workflows as directed graphs
CrewAI: Focuses on multi-agent role-based collaboration
AutoGen: Microsoft's framework for conversational multi-agent systems
Pydantic AI: Type-safe agent framework with strong validation
OpenAI Agents SDK: Lightweight framework with first-class support for agent handoffs

Infrastructure:

Model Context Protocol (MCP): An open standard, introduced by Anthropic in late 2024, for connecting agents to tools and data sources; it is often described as a "USB-C port" for AI. We will cover this in depth in Week 4.
Vector databases: Pinecone, Chroma, Weaviate, pgvector for agent memory
Sandboxing: E2B, Modal, and Docker for safe code execution

Deployment patterns:

Single-agent, human-in-the-loop: The most common pattern (e.g., Claude Code, Cursor). The agent works autonomously but pauses for approval at critical points.
Multi-agent orchestration: Specialized agents coordinated by an orchestrator. Emerging in production for complex workflows.
Fully autonomous: Still rare in production due to reliability and safety concerns. Used mainly in controlled environments with clear success criteria.

7. Building a Simple Agent: A Python Implementation

Let us build a minimal but functional agent loop. This example demonstrates the core pattern that underlies all LLM-based agents. We will go through it carefully, explaining every design decision.

7.1 The Agent Loop

python

"""
A minimal LLM-based agent demonstrating the perception-action loop.

This agent can answer questions by optionally searching the web
or performing calculations. It decides which tool to use (if any)
based on the user's query.
"""

import json
from openai import OpenAI  # Or any LLM client library

# --- Configuration ---
client = OpenAI()  # Uses OPENAI_API_KEY environment variable
MODEL = "gpt-4o"   # Claude models need the Anthropic client, which uses a different tool format

# --- Tool Definitions ---
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression. Use for any arithmetic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A Python mathematical expression, e.g., '2**10 + 3*4'"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_date",
            "description": "Get the current date and time.",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    }
]


# --- Tool Implementations ---
def calculator(expression: str) -> str:
    """Evaluate a math expression with builtins disabled (not a real sandbox)."""
    try:
        # WARNING: In production, use a proper sandbox, not eval()
        allowed_names = {"__builtins__": {}}
        import math
        allowed_names.update({k: v for k, v in math.__dict__.items()
                              if not k.startswith('_')})
        result = eval(expression, allowed_names)
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})


def get_current_date() -> str:
    """Return the current date and time."""
    from datetime import datetime
    return json.dumps({"date": datetime.now().isoformat()})


TOOL_REGISTRY = {
    "calculator": calculator,
    "get_current_date": get_current_date,
}


# --- The Agent Loop ---
def run_agent(user_query: str, max_iterations: int = 10) -> str:
    """
    Run the agent loop.

    The agent will:
    1. Receive the user's query
    2. Decide whether to use a tool or respond directly
    3. If using a tool, execute it and feed the result back
    4. Repeat until the agent produces a final response
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. You have access to tools. "
                "Use them when needed to provide accurate answers. "
                "Think step by step before answering."
            )
        },
        {"role": "user", "content": user_query}
    ]

    for iteration in range(max_iterations):
        print(f"\n--- Iteration {iteration + 1} ---")

        # Step 1: Call the LLM
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=TOOLS,
            tool_choice="auto"  # Let the model decide
        )

        assistant_message = response.choices[0].message
        messages.append(assistant_message)

        # Step 2: Check if the agent wants to use tools
        if assistant_message.tool_calls:
            for tool_call in assistant_message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)

                print(f"  Tool call: {function_name}({function_args})")

                # Step 3: Execute the tool
                if function_name in TOOL_REGISTRY:
                    result = TOOL_REGISTRY[function_name](**function_args)
                else:
                    result = json.dumps({"error": f"Unknown tool: {function_name}"})

                print(f"  Tool result: {result}")

                # Step 4: Feed the result back to the LLM
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No tool calls — the agent is done
            final_response = assistant_message.content
            print(f"\n  Final response: {final_response}")
            return final_response

    return "Agent reached maximum iterations without producing a final answer."


# --- Run the Agent ---
if __name__ == "__main__":
    # Example queries
    queries = [
        "What is 2^32 minus 1?",
        "What day is it today?",
        "If I invest $10,000 at 7% annual compound interest, how much will I have after 20 years?",
    ]

    for query in queries:
        print(f"\n{'='*60}")
        print(f"User: {query}")
        result = run_agent(query)
        print(f"\nAgent: {result}")

7.2 Analyzing the Agent Loop Line by Line

Let us trace through what happens when a user asks: "If I invest $10,000 at 7% annual compound interest, how much will I have after 20 years?"

Setting up (lines in run_agent):

The messages list is initialized with a system prompt and the user's query. This is the agent's "working memory" for this task.
The max_iterations parameter is a safety valve: it prevents the agent from looping forever.

Iteration 1:

The LLM receives the messages and the tool definitions. It sees the calculator tool and decides: "I need to calculate compound interest. I should use the calculator."
It generates a tool_call with function_name="calculator" and arguments={"expression": "10000 * (1.07 ** 20)"}.
We execute the calculator: eval("10000 * (1.07 ** 20)") returns approximately 38696.84 (the raw float carries more decimal places).
The result is added to messages as a tool response.
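For reference, the expression the model passes to the calculator is the compound-interest formula with annual compounding, A = P(1 + r)^n, with principal P = 10,000, annual rate r = 0.07, and n = 20 years:

```latex
A = P(1 + r)^{n} = 10000 \times 1.07^{20} \approx 10000 \times 3.869684 \approx 38{,}696.84
```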

Iteration 2:

The LLM now sees the original question AND the calculator result.
It decides it has enough information and generates a final text response: "After 20 years at 7% compound interest, your $10,000 investment would grow to approximately $38,696.84."
Since there are no tool calls, the loop exits.

The code demonstrates several fundamental principles:

The loop: The agent iterates, calling the LLM and executing tools, until it produces a final response without tool calls.
Tool selection: The LLM decides which tool to use (or not). This decision is based on the system prompt, tool descriptions, and the user's query. The tool_choice="auto" parameter tells the model it can choose to use tools or respond directly.
Message history: Each LLM call includes the full conversation history, including tool call results. This gives the LLM context to reason about what it has already done.
Termination: The loop ends when the LLM produces a response without requesting any tool calls, or when the maximum iteration count is reached (a safety measure).
Error handling: The TOOL_REGISTRY lookup handles the case where the model requests an unknown tool. In production, you would add much more robust error handling.
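As written, a malformed argument string or a wrong argument name raises an exception inside the loop and crashes the agent. A minimal hardening, sketched here as an assumption rather than as part of the lecture's code, is to catch these failures and return them to the model as a tool result, so it can correct itself on the next iteration. `registry` plays the role of `TOOL_REGISTRY`, and the `echo` tool is hypothetical.

```python
import json
from typing import Callable


def dispatch_tool(name: str, raw_arguments: str, registry: dict[str, Callable[..., str]]) -> str:
    """Execute a tool call, converting every failure into a JSON error message for the model."""
    if name not in registry:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Arguments are not valid JSON: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})
    try:
        return registry[name](**args)
    except TypeError as exc:  # wrong or missing argument names (or a TypeError inside the tool)
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


registry = {"echo": lambda text: json.dumps({"echo": text})}
print(dispatch_tool("echo", '{"text": "hi"}', registry))
print(dispatch_tool("echo", '{"txt": "hi"}', registry))
print(dispatch_tool("echo", "not json", registry))
```

Returning errors instead of raising keeps the loop alive, but it also means a misbehaving model can burn iterations on repeated bad calls, so the `max_iterations` cap still matters.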

7.3 Tracing Through a Multi-Tool Scenario

To build deeper intuition, let us trace through a more complex query: "What day is it today, and what is 2^(today's day number)?"

Iteration 1:

The LLM sees the question and the two available tools. It reasons (implicitly): "I need today's date first, then I can calculate."
It generates a tool call: get_current_date().
We execute the tool, which returns something like {"date": "2026-03-19T14:30:00"} (the real value usually also includes microseconds).
The result is added to the messages.

Iteration 2:

The LLM now sees the question AND the date result. It extracts the day number (19) and decides to calculate.
It generates a tool call: calculator(expression="2**19").
We execute the tool: returns {"result": 524288}.
The result is added to the messages.

Iteration 3:

The LLM now sees the question, the date, and the calculation result. It has everything it needs.
It generates a final text response: "Today is March 19, 2026. 2 raised to the power of 19 (the day number) equals 524,288."
No tool calls, so the loop exits.

This trace demonstrates sequential tool use: the second tool call depended on the result of the first. The LLM managed this dependency implicitly, without any explicit planning mechanism. For simple dependencies like this, the basic loop works well. For more complex dependency chains, we need the architectures covered in Week 5.
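It can help to see what `messages` holds at the start of iteration 3 in this trace. The sketch below writes it out as plain dictionaries in the Chat Completions format the loop uses (in the real loop, assistant turns are SDK message objects rather than dicts); the `call_1`/`call_2` IDs and the date are made up. The helper checks the invariant the API relies on: every `tool` message answers a `tool_call_id` issued by an earlier assistant message, and every issued call gets an answer.

```python
history = [
    {"role": "system", "content": "You are a helpful assistant."},  # system prompt abbreviated
    {"role": "user", "content": "What day is it today, and what is 2^(today's day number)?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_current_date", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": '{"date": "2026-03-19T14:30:00"}'},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_2", "type": "function",
         "function": {"name": "calculator", "arguments": '{"expression": "2**19"}'}}]},
    {"role": "tool", "tool_call_id": "call_2", "content": '{"result": 524288}'},
]


def tool_results_are_paired(messages: list[dict]) -> bool:
    """Check that every tool result answers an earlier call and every call gets a result."""
    issued, answered = set(), set()
    for message in messages:
        for call in message.get("tool_calls") or []:
            issued.add(call["id"])
        if message["role"] == "tool":
            if message["tool_call_id"] not in issued:
                return False
            answered.add(message["tool_call_id"])
    return issued == answered


print(tool_results_are_paired(history))
```

Note that the arguments travel as a JSON string, not a dict, which is why the agent loop calls `json.loads(tool_call.function.arguments)`.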

7.4 What This Agent Cannot Do (Yet)

This minimal agent lacks several capabilities that production agents require:

Memory beyond the conversation: It has no long-term memory. If you ask it a question in one run, it will not remember the answer in the next run. Each invocation of run_agent() starts from scratch.
Multi-step planning: It reacts turn-by-turn rather than planning ahead. It does not think "First I'll calculate this, then I'll look up that, then I'll combine the results." It just takes the next best action at each step.
Self-reflection: It does not evaluate whether its answers are correct. If the calculator returned a wrong result (due to a wrong expression), the agent would not catch it. It trusts its tools completely.
Error recovery: If a tool fails, it may not recover gracefully. A production agent should retry with different parameters or try an alternative approach.
Parallel tool execution: It processes tool calls sequentially. Even when the model requests several independent tool calls in one response, the loop executes them one after another, although they could run concurrently.
Cost awareness: The agent does not track how many tokens it has used or how much the task is costing. A production agent needs budget controls.
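Independent tool calls from a single model response can run concurrently. The sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor`, assuming the tools are I/O-bound and thread-safe; `slow_lookup` is a hypothetical stand-in for a network call. Results come back in the same order as the calls, so each one can still be matched to its `tool_call_id`.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def slow_lookup(key: str) -> str:
    """Hypothetical I/O-bound tool: simulates a 0.5 s network request."""
    time.sleep(0.5)
    return json.dumps({"key": key, "value": key.upper()})


def run_tools_in_parallel(calls: list[tuple[str, dict]],
                          registry: dict[str, Callable[..., str]]) -> list[str]:
    """Execute (tool_name, kwargs) pairs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(registry[name], **kwargs) for name, kwargs in calls]
        return [future.result() for future in futures]


registry = {"slow_lookup": slow_lookup}
calls = [("slow_lookup", {"key": k}) for k in ("a", "b", "c")]
start = time.perf_counter()
results = run_tools_in_parallel(calls, registry)
print(results, f"{time.perf_counter() - start:.2f}s")  # roughly 0.5 s rather than 1.5 s
```

Threads suit tools that wait on the network; CPU-heavy tools would need processes, and tools with side effects may need to stay sequential.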

We will address each of these limitations in subsequent weeks of this course: prompting strategies for better reasoning (Week 3), tool use and MCP (Week 4), agent architectures for planning and reflection (Week 5), and memory systems (Week 7).

7.5 From Minimal to Production: The Gap

The gap between our minimal agent and a production agent is significant. To give you a sense of what production agents handle, here is a non-exhaustive list of concerns:

Concern	Minimal Agent	Production Agent
Error handling	Crashes on API errors	Retries with exponential backoff, graceful degradation
Tool validation	Trusts model's tool calls	Validates arguments against schemas, sanitizes inputs
Cost control	No budget tracking	Token counting, cost limits, model routing
Observability	Print statements	Structured logging, tracing, metrics dashboards
Security	No restrictions	Sandboxed execution, permission systems, input sanitization
Memory	Conversation only	Vector databases, episodic memory, experience replay
Concurrency	Sequential	Parallel tool execution, async operations
User experience	Wait for full response	Streaming output, progress indicators
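The "Error handling" row can be made concrete with a small retry helper. This is a sketch under the assumption that transient failures surface as exceptions such as `ConnectionError` or `TimeoutError`; many clients already retry on their own (the official OpenAI Python SDK, for example, accepts a `max_retries` setting), so check your client before adding another layer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.5,
                 retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient errors with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller degrade gracefully
            # Delays grow as base, 2*base, 4*base, ... with up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")


attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary network failure")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)  # succeeds on the third attempt
print(result, delays)
```

Injecting `sleep` keeps the helper testable; the jitter spreads out retries so that many agents hitting the same failing service do not retry in lockstep.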

Building production agents is an engineering discipline, not just a prompting exercise. This course will equip you with the concepts and patterns needed to bridge this gap.

Try It Yourself: Run the agent code (or adapt it for the Anthropic API). Try these queries and observe the agent's behavior: (1) "What is the square root of 144?" (Does it use the calculator for something it could compute directly?) (2) "What is the meaning of life?" (Does it try to use a tool when no tool is appropriate?) (3) "Calculate 2+2, then tell me what day it is" (Does it call both tools?)

8. Ethical Considerations and Open Questions

Ethics in AI agent design is not an afterthought or a box to check; it is woven into every design decision. When you choose how much autonomy to give an agent, you are making an ethical decision. When you decide what data the agent can access, you are making a privacy decision. When you deploy an agent that makes consequential decisions, you are making a fairness decision. This section introduces the key ethical dimensions that will recur throughout the course.

8.1 Autonomy and Control

As agents become more capable, a fundamental tension emerges: autonomy vs. control. More autonomous agents can accomplish more, but they are also harder to supervise and may take unexpected actions.

Consider this concrete scenario: you ask a coding agent to "clean up the repository." A cautious agent might rename some files and update imports. A more aggressive agent might delete files it considers unnecessary, refactor large sections of code, and rewrite the README. Both interpretations are valid, but one might destroy work you wanted to keep.

Key questions:

How much autonomy should we grant an AI agent?
What actions should always require human approval? (Deleting files? Sending emails? Making purchases?)
How do we design effective "human-in-the-loop" systems that do not become bottlenecks?
What is the right default: opt-in (agent does nothing without permission) or opt-out (agent does everything unless told to stop)?

Claude Code provides a practical example: by default it reads files without asking, but requests permission before editing files or running shell commands, and users can widen or narrow these permissions. This is a pragmatic compromise between autonomy and safety: read operations are safe; write operations need a check; destructive operations need explicit approval.

8.2 Accountability

When an agent makes a mistake, say, a coding agent introduces a security vulnerability, or a research agent cites a nonexistent paper, who is responsible? The developer who built the agent? The user who instructed it? The company behind the LLM?

This is not a hypothetical question. As agents take on more consequential tasks (reviewing legal documents, making medical recommendations, managing financial portfolios), the accountability question becomes urgent. Current legal frameworks were not designed for autonomous AI agents, and the industry is still grappling with how to assign responsibility.

8.3 Transparency

Users interacting with agents should understand:

That they are interacting with an AI system (not a human)
What capabilities the agent has (and what it cannot do)
What limitations exist (knowledge cutoffs, potential for errors)
What data the agent can access (privacy implications)

Transparency is also important for debugging and trust. When an agent takes an unexpected action, the developer (and ideally the user) should be able to trace why. This is one reason why the ReAct architecture (Week 5) is so popular: the explicit reasoning traces make the agent's decisions inspectable.

8.4 Bias and Fairness

LLM-based agents inherit the biases of their training data and the design decisions of their creators. An agent that screens job applications, moderates content, or makes recommendations carries these biases into consequential decisions.

For example, a research agent tasked with "finding influential papers" might systematically favor papers from prestigious institutions or English-language venues, not because it was explicitly programmed to do so, but because its training data reflects existing biases in academic publishing.

Key Insight: Agents amplify biases through their action-taking capability. A biased language model generates biased text; a biased agent takes biased actions. The stakes are fundamentally different. This is why responsible agent design requires not just good models, but thoughtful architecture, careful tool design, and robust evaluation.

9. Discussion Questions

Agent vs. Tool: Consider a spell-checker and an AI writing assistant. At what point does a tool become an agent? What features would you require before calling a system an "agent"?

Starting point for thinking: Consider the spectrum from a simple Grammarly-style tool (fixes errors as you type) to a system that rewrites entire paragraphs, suggests structural changes, and maintains its own understanding of your document's goals. Where on this spectrum does "agency" begin?
Autonomy spectrum: Claude Code asks for permission before executing destructive commands. OpenAI's Operator pauses when it encounters a payment page. What principles should guide decisions about when agents should act autonomously vs. asking for permission?

Starting point: Think about the "reversibility" of actions. Can the action be undone? If yes, more autonomy might be acceptable. If no (sending an email, deleting data), human oversight becomes more important.
Russell and Norvig mapping: Where do modern LLM-based agents like Claude Code fit in the Russell and Norvig taxonomy? Are they goal-based? Utility-based? Learning agents? Could they be all of these at once?

Starting point: Consider that Claude Code maintains an internal model (of the codebase), pursues goals (fixing bugs, implementing features), and adapts its approach based on feedback (test results, error messages). Does it fit neatly into one category?
Unintended consequences: The story of Microsoft's Tay chatbot (2016) showed how an agent interacting with a hostile environment can quickly go wrong. What safeguards would you design for an agent operating on the open internet?

Starting point: Consider input validation, output filtering, action restrictions, rate limiting, and monitoring. Which of these would have prevented the Tay incident?
Historical perspective: Expert systems required explicit knowledge engineering. LLM agents learn from data. What are the advantages and disadvantages of each approach? Are there cases where expert systems might still be preferable?

Starting point: Think about domains where errors are extremely costly (medical diagnosis, nuclear power plant control). Would you trust an LLM agent or an expert system with rigorously tested rules? Why?

10. Summary and Key Takeaways

An AI agent perceives its environment, reasons about it, and takes actions to achieve goals. The perception-action loop is the fundamental pattern.
The historical trajectory from expert systems to LLM-based agents represents a shift from manual knowledge engineering to learned general-purpose reasoning. Each generation addressed limitations of the previous one.
The Russell and Norvig taxonomy (simple reflex, model-based, goal-based, utility-based, learning) provides a useful framework for understanding agent capabilities, though modern LLM agents often blur the boundaries between categories.
LLM-based agents use a large language model as their reasoning core, augmented with four key components: planning (decomposing tasks), memory (storing and retrieving information), tool use (interacting with external systems), and action (executing decisions).
The agent loop (perceive, reason, act, observe, repeat) is simple in structure but rich in implementation detail. Even a minimal agent requires careful handling of tool selection, error recovery, and termination.
Real-world agents in 2025-2026 include coding agents (Claude Code, Cursor, Devin), research agents (Elicit), web agents (Computer Use, Operator), and multi-agent systems (ChatDev, AutoGen).
Ethical considerations around autonomy, accountability, transparency, and bias are not secondary concerns; they are central to responsible agent design and will only become more important as agents take on more consequential tasks.

11. Practical Exercise

Build Your First Agent: Using the code template provided in Section 7, extend the minimal agent with the following:

Add a third tool: read_file(filepath) that reads the contents of a text file. Include proper error handling (file not found, permission denied) and a security check (restrict to a specific directory to prevent the agent from reading arbitrary files).
Modify the system prompt to give the agent a specific persona (e.g., "You are a helpful teaching assistant for a Computer Science course"). Observe how the persona affects the agent's responses and tool usage patterns.
Test the agent with three different queries that exercise different tools. Document what the agent does at each step.
Test multi-tool scenarios: Give the agent a query that requires multiple tool calls in sequence (e.g., "Read the file data.txt and calculate the average of the numbers in it"). Document:
- How many iterations the agent loop takes
- What tool calls the agent makes and in what order
- Whether the agent's reasoning (if visible) is correct
- Any failures or unexpected behaviors
Reflect on limitations: After testing, write a paragraph about what the agent does well and what it struggles with. What would you change to make it more reliable?

Deliverable: A Python script and a short report (1-2 pages) describing the agent's behavior and any limitations you observed.
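The directory restriction in the first task is easy to get subtly wrong: a plain string-prefix check on the path can be bypassed with `..` segments, symbolic links, or a sibling directory whose name shares the prefix. One possible shape for the check is sketched here, assuming Python 3.9+ (for `Path.is_relative_to`) and a hypothetical sandbox folder named `agent_workspace`. Errors are returned as strings rather than raised so that the agent receives them as observations and can recover; that is a design assumption, not a requirement of the exercise.

```python
from pathlib import Path

# Hypothetical sandbox root; the exercise leaves the directory choice to you.
ALLOWED_DIR = Path("agent_workspace").resolve()


def read_file(filepath: str, allowed_dir: Path = ALLOWED_DIR) -> str:
    """Return file contents, or an error string the agent can read as an observation."""
    root = allowed_dir.resolve()
    # Join relative paths onto the root (an absolute filepath replaces it),
    # then resolve ".." and symlinks before checking containment.
    target = (root / filepath).resolve()
    if not target.is_relative_to(root):
        return f"Error: access denied, {filepath} is outside the allowed directory."
    try:
        return target.read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"Error: file not found: {filepath}"
    except PermissionError:
        return f"Error: permission denied: {filepath}"
    except IsADirectoryError:
        return f"Error: {filepath} is a directory, not a file."
    except UnicodeDecodeError:
        return f"Error: {filepath} is not a UTF-8 text file."
```

Checking containment only after `resolve()` is what closes the `..` and symlink loopholes, and comparing path components with `is_relative_to` rather than raw strings avoids the shared-prefix case.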

References

Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2(1), 14-23.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST).
Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., ... & Sun, M. (2024). ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the ACL.
Rao, A. S., & Georgeff, M. P. (1995). BDI agents: From theory to practice. In Proceedings of the First International Conference on Multiagent Systems (ICMAS).
Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., ... & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS).
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., ... & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wang, J. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems (NeurIPS).
Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., ... & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155.
Yang, J., Jimenez, C. E., Wettig, A., Lieret, K., Yao, S., Narasimhan, K., & Press, O. (2024). SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems (NeurIPS).
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR).

Part of "Agentic AI: Foundations, Architectures, and Applications" (CC BY-SA 4.0).