Introduction to AI Agents
Foundational definitions of AI agents as systems that perceive, reason, and act. Historical evolution from BDI agents and reinforcement learning to LLM-based agents, with the Russell & Norvig taxonomy mapped onto modern agentic systems.
Learning Objectives
By the end of this lecture, students will be able to:
- Define what an AI agent is and distinguish agents from conventional software systems.
- Trace the historical evolution from expert systems to modern LLM-based agents.
- Classify agents according to the Russell and Norvig taxonomy (simple reflex, model-based, goal-based, utility-based, learning).
- Identify the four core components of an LLM-based agent: planning, memory, tool use, and action.
- Describe at least three real-world deployed agent systems and explain their architectures at a high level.
- Implement a minimal agent loop in Python that interacts with an LLM.
1. What Is an AI Agent?
1.1 The Core Definition
An AI agent is a system that perceives its environment through sensors, reasons about what it perceives, and takes actions through actuators to achieve goals. This definition, adapted from Russell and Norvig (2021), is deceptively simple but carries profound implications.
To understand why this definition matters, consider what it excludes. A pocket calculator is not an agent: it does not perceive an environment or decide what to do next. A spam filter is barely an agent: it perceives (incoming emails) and acts (classifying them), but it follows rigid rules without any sense of goals or autonomy. An AI coding assistant that can read your codebase, decide what files need changing, make edits, run tests, and iterate until the tests pass is clearly an agent: it perceives, reasons, plans, acts, and adapts.
The key distinction between an agent and a regular program is autonomy: an agent operates with a degree of independence, making decisions without direct human intervention for each step. A web server responds to requests with predetermined logic. An agent decides what to do next.
More formally, an agent can be described as a function:
```
f: Percept* -> Action
```

The agent function maps a sequence of percepts (the complete history of everything the agent has perceived) to an action. The agent program is the concrete implementation of this function running on a physical or virtual architecture.
Key Insight: The notation Percept* (with the asterisk) is crucial. It means the agent's decision can depend on its entire history of observations, not just the current one. This is what distinguishes a model-based agent from a simple reflex agent, and it is the foundation of memory in agent systems.
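To make the role of the percept history concrete, here is a minimal sketch contrasting a decision based on the latest percept alone with one that uses the whole sequence. The names (`reflex_agent`, `history_agent`) and the temperature scenario are illustrative assumptions, not part of the formal definition.

```python
from typing import Sequence

Percept = float  # assumption: each percept is a single temperature reading
Action = str


def reflex_agent(percepts: Sequence[Percept]) -> Action:
    """Looks only at the most recent percept."""
    return "cool" if percepts[-1] > 25 else "idle"


def history_agent(percepts: Sequence[Percept]) -> Action:
    """Uses the whole percept sequence: also reacts to a rising trend."""
    if len(percepts) >= 3 and percepts[-3] < percepts[-2] < percepts[-1]:
        return "cool"  # temperature has risen twice in a row
    return "cool" if percepts[-1] > 25 else "idle"


history = [21.0, 23.0, 24.5]
print(reflex_agent(history))   # idle: 24.5 is below the threshold
print(history_agent(history))  # cool: the trend is visible only in the history
```

Both functions have the signature Percept* -> Action; only the second one actually exploits the sequence.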
1.2 The Perception-Action Loop
Every agent, no matter how simple or complex, follows a fundamental cycle:
[Figure: The agent perception-action loop (perceive, reason, act, observe). Every agent architecture rides on the same cycle; the architectures you'll meet later are variations on this loop.]
This loop repeats continuously. The environment changes (partly due to the agent's actions, partly due to external factors), the agent perceives the new state, reasons, and acts again. This is sometimes called the sense-reason-act cycle or the perception-action loop.
Think of it like a doctor treating a patient. The doctor observes symptoms (perceive), considers possible diagnoses and treatments (reason), and prescribes medication or orders tests (act). Then the doctor waits, observes the patient's response to treatment (perceive again), adjusts the diagnosis if needed (reason), and changes the treatment plan (act again). This cycle continues until the patient recovers.
The perception-action loop may seem obvious, but it is surprisingly powerful as a design pattern. Every agent we will encounter in this course, from simple chatbots to sophisticated multi-agent systems, is fundamentally organized around this loop. What changes is the sophistication of each step.
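As a rough illustration of the loop as a design pattern, the sketch below runs a trivial policy against a toy environment. The `Room` class, its drift rate, and the `decide` policy are assumptions made up for this example; a real agent would replace `decide` with far richer reasoning.

```python
class Room:
    """Toy environment (hypothetical): a room that warms up on its own."""

    def __init__(self, temperature: float):
        self.temperature = temperature

    def perceive(self) -> float:
        return self.temperature

    def apply(self, action: str) -> None:
        if action == "cool":
            self.temperature -= 2.0
        self.temperature += 0.5  # external change the agent does not control


def decide(temperature: float) -> str:
    """Reasoning step: a deliberately trivial policy."""
    return "cool" if temperature > 24.0 else "wait"


def run_loop(env: Room, max_steps: int) -> list[str]:
    actions = []
    for _ in range(max_steps):
        percept = env.perceive()  # perceive
        action = decide(percept)  # reason
        env.apply(action)         # act; the next iteration observes the result
        actions.append(action)
    return actions


print(run_loop(Room(26.0), max_steps=4))  # ['cool', 'cool', 'wait', 'wait']
```

Note that the environment changes both because of the agent's action and because of the external drift, which is why the agent has to keep perceiving rather than act on a fixed script.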
1.3 Why Now? The Confluence of Factors
Before diving deeper, it is worth asking: why are AI agents suddenly everywhere in 2025-2026, when the concept has existed since the 1950s?
Several factors converged:
- Large Language Models reached a capability threshold. Models like GPT-4, Claude, and Gemini can follow complex instructions, generate code, use tools, and reason through multi-step problems. Before 2022, no single model could do all of these things at once.
- The cost of inference dropped dramatically. In 2023, GPT-4 input cost roughly $0.03 per 1K tokens. By 2025, equivalent capability costs a fraction of that with smaller, faster models. An agent that makes 20 API calls per task went from prohibitively expensive to costing pennies.
- Tool-use infrastructure matured. Protocols like MCP (Model Context Protocol), function-calling APIs, and sandboxed execution environments gave agents a standardized way to interact with the world.
- Developer tooling emerged. Frameworks like LangGraph, CrewAI, and the OpenAI Agents SDK made it practical to build agents without starting from scratch.
- Real problems demand it. Software engineering, research, customer support, and data analysis all involve multi-step, context-dependent work that is poorly served by single-turn AI interactions.
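The cost argument is simple arithmetic, sketched below. Only the $0.03 per 1K tokens figure and the 20 calls per task come from the text; the 2,000 tokens per call and the ten-times-cheaper price are assumptions chosen purely to show how the total scales.

```python
def task_cost(calls: int, tokens_per_call: int, price_per_1k_tokens: float) -> float:
    """Total cost in dollars of one agent task."""
    return calls * (tokens_per_call / 1000) * price_per_1k_tokens


CALLS = 20               # from the text
TOKENS_PER_CALL = 2_000  # assumption for illustration

scenarios = [
    ("2023 GPT-4 price", 0.03),                 # from the text
    ("hypothetical 10x cheaper model", 0.003),  # assumption
]
for label, price in scenarios:
    print(f"{label}: ${task_cost(CALLS, TOKENS_PER_CALL, price):.2f} per task")
```

Because the cost is linear in calls, tokens, and price, a tenfold price drop translates directly into a tenfold cheaper task, and long agent trajectories multiply whatever per-call price applies.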
1.4 Environments
The nature of the environment profoundly affects agent design. Russell and Norvig (2021) classify environments along several dimensions:
| Dimension | Options | Example |
|---|---|---|
| Observability | Fully observable vs. Partially observable | Chess (full) vs. Poker (partial) |
| Determinism | Deterministic vs. Stochastic | Puzzle solving (deterministic) vs. Stock trading (stochastic) |
| Episodic vs. Sequential | Independent episodes vs. Dependent decisions | Spam filtering (episodic) vs. Conversation (sequential) |
| Static vs. Dynamic | Environment changes while agent deliberates? | Crossword (static) vs. Self-driving (dynamic) |
| Discrete vs. Continuous | Finite states/actions vs. Continuous space | Board game (discrete) vs. Robotics (continuous) |
| Single vs. Multi-agent | One agent vs. Multiple interacting agents | Solitaire (single) vs. Negotiation (multi) |
To make these dimensions concrete, consider a coding agent like Claude Code operating in a software repository:
- Partially observable: The agent cannot see the entire codebase at once. It must choose which files to read, and some information (like runtime behavior) is only accessible by running code.
- Stochastic: Even deterministic code can have surprising behavior. The agent's own LLM calls are probabilistic, meaning the same input may produce different outputs.
- Sequential: Every file edit changes the state of the codebase. A change to one file may break another file, and the agent must account for these dependencies.
- Dynamic: If you are working in a team, other developers may push changes while the agent is working. The environment changes independently of the agent.
- Multi-agent: In a modern development workflow, there may be multiple AI assistants, CI/CD bots, and human developers all acting on the same codebase.
Modern LLM-based agents typically operate in environments that are partially observable, stochastic, sequential, dynamic, and multi-agent. This makes their design particularly challenging, which is why we need the structured approaches covered in this course.
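One way to keep these dimensions in view during design is to record them explicitly. The sketch below encodes the six dimensions as a small data class; the class name, field names, and the `challenges` helper are hypothetical. The text does not classify the coding agent on the discrete/continuous dimension, so treating it as discrete is an assumption.

```python
from dataclasses import dataclass, fields

# Label for the harder side of each dimension.
HARDER_SIDE = {
    "fully_observable": "partially observable",
    "deterministic": "stochastic",
    "episodic": "sequential",
    "static": "dynamic",
    "discrete": "continuous",
    "single_agent": "multi-agent",
}


@dataclass(frozen=True)
class EnvironmentProfile:
    fully_observable: bool
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool
    single_agent: bool

    def challenges(self) -> list[str]:
        """Dimensions on which this environment falls on the harder side."""
        return [HARDER_SIDE[f.name] for f in fields(self) if not getattr(self, f.name)]


coding_agent = EnvironmentProfile(
    fully_observable=False,
    deterministic=False,
    episodic=False,
    static=False,
    discrete=True,  # assumption: file edits and commands are discrete actions
    single_agent=False,
)
print(coding_agent.challenges())
```

For the coding agent this lists partially observable, stochastic, sequential, dynamic, and multi-agent, matching the classification above.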
Try It Yourself: Pick three software applications you use daily (e.g., a search engine, a navigation app, an email client). For each one, classify the environment along the six dimensions above. Which ones could benefit from an agentic architecture? Why?
1.5 Agents vs. Traditional Software
Consider the differences:
| Aspect | Traditional Software | AI Agent |
|---|---|---|
| Control flow | Predetermined | Dynamic, decided at runtime |
| Input handling | Defined API contract | Open-ended, often natural language |
| Error recovery | Predefined error handlers | Adaptive, can reason about failures |
| Goal specification | Hardcoded logic | Can interpret and decompose goals |
| Tool usage | Fixed integrations | Can discover and learn to use tools |
| Output | Deterministic | Variable, probabilistic |
| Adaptation | Requires code changes | Can adjust through prompting |
An important nuance: not every system that uses an LLM is an agent. This is a common misconception worth addressing directly.
A chatbot that answers questions in a single turn is not an agent; it is a tool. A system that calls an LLM once to classify an email is not an agent; it is a classifier with an LLM backend. An agent makes decisions about what to do next, maintains state across steps, and takes actions that change its environment.
Common Misconception: "If it uses GPT-4, it's an agent." This is incorrect. Agency is about the architecture (perception-action loop, autonomy, goal-directedness), not about the underlying model. You can build a non-agentic system on GPT-4 (a simple classifier) and an agentic system on a much smaller model (a robot controller with a 7B parameter model).
The boundary between "tool" and "agent" is fuzzy, and that is fine. A more useful question than "Is this an agent?" is "How much agency does this system have?" We can think of agency as a spectrum:
[Figure: The agency (autonomy) spectrum: who decides at each step? The right level depends on task risk and how reversible the action is; there is no universal pick. Level shown: L3, human-on-the-loop (HOTL) live oversight, where the agent acts autonomously and the human watches the live trace, ready to interrupt. Example: a coding assistant with a reviewer in the room.]
1.6 A Mental Model for Agent Design
Throughout this course, it will help to think of an AI agent as analogous to a new employee at a company:
- The system prompt is like the job description and company handbook. It tells the agent who it is, what it can do, and what rules it must follow.
- Tools are like the software and equipment the employee has access to (email, databases, code editors).
- Memory is like the employee's notes, past experience, and the company wiki.
- The agent loop is like the daily work cycle: check your tasks, figure out what to do next, do it, observe the results, repeat.
- The LLM is like the employee's brain: their general knowledge, reasoning ability, and communication skills.
This analogy is imperfect (agents do not get tired, have emotions, or understand in the human sense), but it provides useful intuitions about agent design. Just as you would give a new employee clear instructions, the right tools, and manageable tasks, you need to give your agent a clear prompt, well-designed tools, and appropriate scope.
2. Historical Context: From Expert Systems to LLM-Based Agents
Understanding the history of AI agents is not just academic nostalgia; it reveals recurring challenges and design patterns that are still relevant today. Many "new" ideas in agentic AI are reinventions of older concepts with better technology.
2.1 Expert Systems (1970s-1990s)
The earliest AI agents were expert systems: rule-based programs that encoded human expertise as if-then rules. MYCIN (Shortliffe, 1976) diagnosed bacterial infections. XCON (McDermott, 1982) configured computer orders for DEC. DENDRAL (Feigenbaum et al., 1971) helped chemists identify molecular structures.
```
IF patient has fever AND patient has stiff neck
THEN suspect meningitis (confidence: 0.7)
```

Strengths: Explainable, deterministic, domain-specific expertise. You could ask MYCIN why it made a particular diagnosis, and it would show you the chain of rules.
Weaknesses: Brittle, could not handle novel situations, required painstaking manual knowledge engineering, no learning capability. Building an expert system for a new domain meant interviewing human experts for months and manually encoding hundreds or thousands of rules.
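To see both the explainability and the brittleness in miniature, here is a toy rule engine. It is a sketch under stated assumptions: the rules are illustrative only, confidence factors are omitted, and it uses simple forward chaining, whereas MYCIN itself reasoned backward from goals.

```python
# (rule name, required facts, concluded fact); medical content is illustrative only
RULES = [
    ("R1", {"fever", "stiff_neck"}, "suspect_meningitis"),
    ("R2", {"suspect_meningitis"}, "order_lumbar_puncture"),
]


def forward_chain(facts: set[str]) -> tuple[set[str], list[str]]:
    """Fire rules until nothing new is derived; return facts and an explanation trace."""
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                trace.append(f"{name}: {sorted(conditions)} -> {conclusion}")
                changed = True
    return known, trace


known, trace = forward_chain({"fever", "stiff_neck"})
print("\n".join(trace))            # the chain of rules that fired
print(forward_chain({"fever"})[1])  # []: no rule covers this case
```

The trace is the explanation: every conclusion points back to the rule that produced it. The second call shows the brittleness: an input that no rule anticipates produces nothing at all.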
The knowledge bottleneck was the fatal flaw. The world is too complex to encode as explicit rules. This limitation drove the AI community toward learning-based approaches.
Key Insight: The knowledge bottleneck of expert systems is the problem LLMs largely sidestep. Instead of manually encoding knowledge as rules, LLMs absorb knowledge from vast corpora of text during pre-training. The challenge shifts from "how do we put knowledge in?" to "how do we get reliable behavior out?"
2.2 Behavior-Based Agents and Robotics (1980s-1990s)
Rodney Brooks (1986) challenged the classical AI approach with his subsumption architecture, arguing that intelligent behavior could emerge from simple reactive layers without explicit world models. His robots at MIT, like Herbert, which collected soda cans from desks, demonstrated that perception-action loops without complex reasoning could produce surprisingly capable behavior.
Brooks's key insight was that the world itself can serve as its own model. Instead of building an elaborate internal representation of the world and then reasoning about it, an agent can use its sensors to directly interact with the world in real time.
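The subsumption architecture organized behaviors in layers, each a simple perception-action mapping, with some layers able to override (subsume) the output of others.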
This era introduced the idea that agents need not have a complete world model to act effectively, a principle that resurfaces in modern tool-using LLM agents. When Claude Code reads a file, it does not try to build a complete model of the entire codebase first; it reads what it needs, acts, observes the result, and iterates.
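The layering idea can be sketched as a fixed priority list of reactive behaviors. This is a simplification under stated assumptions: Brooks's robots wired layers together with suppression and inhibition signals rather than a flat list, and the behaviors and priority order below are invented for illustration.

```python
def avoid_obstacle(percept: dict):
    return "turn_left" if percept.get("obstacle_ahead") else None


def grab_can(percept: dict):
    return "grab" if percept.get("can_in_reach") else None


def wander(percept: dict):
    return "move_forward"  # default behavior, always applicable


# Assumed priority order: an earlier behavior overrides the ones after it.
LAYERS = [avoid_obstacle, grab_can, wander]


def subsumption_step(percept: dict) -> str:
    """Return the action of the first behavior that responds to the percept."""
    for behavior in LAYERS:
        action = behavior(percept)
        if action is not None:
            return action
    return "stop"  # unreachable while wander is in LAYERS; kept as a safe default


print(subsumption_step({"obstacle_ahead": True, "can_in_reach": True}))  # turn_left
print(subsumption_step({"can_in_reach": True}))                          # grab
print(subsumption_step({}))                                              # move_forward
```

No function here stores or consults a world model; each decision is made directly from the current percept.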
2.3 BDI Agents and Multi-Agent Systems (1990s-2000s)
The Belief-Desire-Intention (BDI) framework (Rao and Georgeff, 1995) formalized agent reasoning in a way that connects surprisingly well to modern LLM agents:
- Beliefs: What the agent thinks is true about the world. In a modern agent, this is the information in the context window plus any retrieved memories.
- Desires: The goals the agent wants to achieve. In a modern agent, this comes from the user's request and the system prompt.
- Intentions: The plans the agent has committed to executing. In a modern agent, this is the current plan (explicit or implicit) that guides the next action.
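A minimal sketch of the belief-desire-intention cycle follows. It is not Rao and Georgeff's formal logic or any particular BDI interpreter; the class, its methods, and the door-and-light scenario are assumptions meant only to show how the three parts interact.

```python
from dataclasses import dataclass, field


@dataclass
class BDIAgent:
    beliefs: dict = field(default_factory=dict)     # what the agent thinks is true
    desires: list = field(default_factory=list)     # goals as (key, wanted value)
    intentions: list = field(default_factory=list)  # actions it has committed to

    def revise_beliefs(self, percept: dict) -> None:
        self.beliefs.update(percept)

    def deliberate(self) -> None:
        """Commit to one action for every desire that is not yet satisfied."""
        if self.intentions:
            return  # stay committed to the current plan
        for key, wanted in self.desires:
            if self.beliefs.get(key) != wanted:
                self.intentions.append(f"set {key}={wanted}")

    def step(self, percept: dict) -> str:
        self.revise_beliefs(percept)
        self.deliberate()
        return self.intentions.pop(0) if self.intentions else "idle"


agent = BDIAgent(desires=[("door", "closed"), ("light", "off")])
print(agent.step({"door": "open", "light": "off"}))  # set door=closed
print(agent.step({"door": "closed"}))                # idle
```

The early return in `deliberate` is the commitment aspect of intentions: once the agent has a plan, it keeps executing it instead of reconsidering from scratch at every step.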
Platforms like JADE (Java Agent DEvelopment Framework) and multi-agent systems research explored how multiple agents could coordinate, negotiate, and cooperate. This work laid the conceptual groundwork for modern multi-agent AI systems like CrewAI and AutoGen.
The BDI framework also introduced the important distinction between reactive and deliberative agents:
- Reactive agents respond immediately to stimuli (like Brooks's robots).
- Deliberative agents maintain an internal model and plan before acting (like expert systems).
- Most practical agents are hybrid: they have a reactive layer for immediate responses and a deliberative layer for complex planning.
Modern LLM agents are inherently hybrid: the LLM's next-token generation is reactive (conditioned on the immediate context), but prompting strategies like Chain-of-Thought add a deliberative layer on top.
2.4 Reinforcement Learning Agents (2010s)
The combination of deep learning and reinforcement learning produced agents that could learn complex behaviors from interaction. Key milestones:
- DQN (Mnih et al., 2015): Playing many Atari games at or above human level directly from raw pixels. This showed that agents could learn complex control policies from high-dimensional sensory input.
- AlphaGo (Silver et al., 2016): Defeating the world champion in Go, a game with more possible positions than atoms in the observable universe. AlphaGo combined Monte Carlo tree search with learned policy and value networks.
- AlphaFold (Jumper et al., 2021): Predicting protein structures with remarkable accuracy, a breakthrough on a 50-year-old biology problem. Strictly speaking, AlphaFold is trained with supervised deep learning rather than reinforcement learning; it appears here as a product of the same era of powerful but narrow systems.
These agents were powerful but narrow: each was trained for a single task and could not generalize. AlphaGo could not play chess, let alone write code or have a conversation. Training each agent required millions of episodes of interaction, specialized reward functions, and enormous computational resources.
Key Insight: RL agents learn from experience; LLM agents learn from text about experience. An RL agent that plays chess learns by playing millions of games. An LLM "learns" about chess by reading books, articles, and game transcripts written by humans. This fundamental difference explains both the generality of LLM agents (they can talk about anything) and their limitations (they may confidently describe chess strategies they cannot actually execute).
2.5 The LLM Revolution (2022-Present)
The launch of ChatGPT in November 2022 marked a turning point. For the first time, a single model could:
- Understand and generate natural language across domains
- Follow complex, multi-step instructions
- Write and reason about code
- Adopt different personas and follow system instructions
Researchers quickly realized that LLMs could serve as the reasoning core of general-purpose agents. The key papers that catalyzed this shift include:
- ReAct (Yao et al., 2023): Showed that interleaving reasoning traces with actions in a single LLM prompt dramatically improved agent performance. This paper is so important that we dedicate a large portion of Week 5 to it.
- Toolformer (Schick et al., 2023): Demonstrated that LLMs could learn to use external tools (calculators, search engines, APIs) through self-supervised learning. This opened the door to tool-augmented agents.
- Generative Agents (Park et al., 2023): Created a simulated town of 25 AI agents with believable social behaviors, using LLMs with memory architectures. This captured the public imagination and showed that LLM agents could produce emergent collective behavior.
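The interleaving of reasoning and action that ReAct popularized can be sketched with a scripted stand-in for the model. Everything here is an assumption for illustration: `scripted_llm` replaces a real API call, the `Action: tool[argument]` text format is one possible convention rather than the paper's exact prompt, and the calculator handles only integer multiplication.

```python
import re


def calculator(expression: str) -> str:
    """Toy tool: supports only 'a * b' with integers."""
    a, b = (int(part) for part in expression.split("*"))
    return str(a * b)


TOOLS = {"calculator": calculator}


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model call: replies from a fixed script."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[12 * 7]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"


def react(question: str, llm=scripted_llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)  # reason
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if match is None:
            break  # the model produced neither an action nor an answer
        tool, argument = match.groups()
        transcript += f"\nObservation: {TOOLS[tool](argument)}"  # act, then observe
    return "no answer"


print(react("What is 12 * 7?"))  # 84
```

The growing transcript is the important part: each thought, action, and observation is appended and fed back in, so the next model call sees the full trajectory so far.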
The pace of development since then has been extraordinary. In roughly two years, we went from "can an LLM use a calculator?" to production coding agents, web-browsing agents, and multi-agent orchestration systems.
2.6 Timeline Summary
[Figure: Historical timeline of AI agents: 70 years of agents, from expert systems to today's LLM agents. Latest milestone shown: 2026, A2A (agents talking to agents), formal protocols for multi-agent coordination at scale.]
Try It Yourself: Pick one historical AI agent system (MYCIN, Herbert, AlphaGo, or another you find interesting). Research how it perceived its environment, made decisions, and took actions. Identify the perception-action loop. What were its biggest limitations, and how do modern LLM agents address them?
3. Taxonomy of Agents (Russell and Norvig)
Russell and Norvig (2021) define five types of agents, each more capable than the last. Understanding this taxonomy helps us appreciate what modern LLM-based agents are and what they are not. Think of it as a ladder of increasing sophistication.
3.1 Simple Reflex Agents
The simplest agent type. It selects actions based solely on the current percept, ignoring all history. No memory, no model of the world, no planning. Just condition-action rules.
```python
def simple_reflex_agent(percept):
    """A thermostat-like agent."""
    if percept["temperature"] > 25:
        return "turn_on_cooling"
    elif percept["temperature"] < 18:
        return "turn_on_heating"
    else:
        return "do_nothing"
```

This is the thermostat in your house. If the temperature is too high, turn on cooling. If too low, turn on heating. The thermostat does not remember what it did five minutes ago, does not plan for tomorrow's weather forecast, and does not reason about why the temperature is rising. It just reacts.
Condition-action rules (if-then rules) drive behavior. They can be surprisingly effective for well-defined tasks in fully observable, deterministic environments.
Limitations: Cannot handle partially observable environments. If the sensor breaks, the agent has no way to compensate. Cannot reason about the consequences of actions. Cannot handle situations not covered by its rules.
Real-world examples: Thermostats, simple spam filters (keyword-based), automatic door sensors, basic email auto-replies.
3.2 Model-Based Reflex Agents
These agents maintain an internal state, a model of the world, that they update with each percept. This allows them to handle partial observability by remembering things they cannot currently see.
```python
class ModelBasedAgent:
    """Smart-lighting agent that remembers when motion was last seen."""

    def __init__(self):
        self.state = {
            "room_occupied": False,
            "lights_on": False,
            "last_motion_time": None,  # timestamp (seconds) of the last motion
        }

    def _time_since_last_motion(self, now):
        last = self.state["last_motion_time"]
        return float("inf") if last is None else now - last

    def update_state(self, percept):
        if percept.get("motion_detected"):
            self.state["room_occupied"] = True
            self.state["last_motion_time"] = percept["timestamp"]
        elif self._time_since_last_motion(percept["timestamp"]) > 600:  # 10 minutes
            self.state["room_occupied"] = False

    def act(self, percept):
        self.update_state(percept)
        if self.state["room_occupied"] and not self.state["lights_on"]:
            self.state["lights_on"] = True
            return "turn_on_lights"
        elif not self.state["room_occupied"] and self.state["lights_on"]:
            self.state["lights_on"] = False
            return "turn_off_lights"
        return "do_nothing"
```

The critical difference from a simple reflex agent: this smart lighting system remembers that someone was in the room recently, even when the motion sensor currently detects nothing. For up to ten minutes after the last detected motion, it assumes the room is probably still occupied (maybe the person is sitting still, reading).
Key Insight: The internal state allows the agent to reason about things it cannot currently observe. This is the same principle behind the "context window" in LLM agents: the conversation history serves as the agent's internal model of the ongoing task.
Real-world examples: Smart home systems, adaptive cruise control, anti-lock braking systems, inventory management systems.
3.3 Goal-Based Agents
Goal-based agents go beyond reacting to the current state. They consider the future: specifically, which actions will lead to achieving their goals. This is a qualitative leap from model-based agents. Model-based agents ask "What is the world like?" Goal-based agents ask "What do I want the world to be like, and how do I get there?"
This requires three things:
- A model of how the world evolves (if I take action A in state S, what state will result?).
- A representation of the goal state (what does "success" look like?).
- A search or planning algorithm to find a sequence of actions from the current state to the goal.
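```python
class GoalBasedAgent:
    def __init__(self, goal):
        self.goal = goal  # e.g., {"location": "airport"}
        self.state = {"location": "home"}
        self.plan = []

    def update_state(self, percept):
        # Merge the latest observations into the internal state.
        self.state.update(percept)

    def goal_achieved(self):
        # The goal holds when every goal key matches the current state.
        return all(self.state.get(k) == v for k, v in self.goal.items())

    def search_for_plan(self):
        # Placeholder: a real agent would run a search or planning algorithm here.
        return []

    def act(self, percept):
        self.update_state(percept)
        if self.goal_achieved():
            return "done"
        if not self.plan:
            self.plan = self.search_for_plan()
        if self.plan:
            return self.plan.pop(0)
        return "replan"  # No plan found, try again
```

Consider a navigation app. It knows your current location (state), your desired destination (goal), and how roads connect (world model). It searches for the best route (plan) and guides you step by step. If a road is blocked, it replans.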
Goal-based agents can be flexible: if the route to the airport is blocked, they can find an alternative route. Simple reflex agents cannot do this because they have no concept of a goal or a plan.
Put differently, model-based agents react to the world as it is, while goal-based agents act to make the world into what they want it to be.
Real-world examples: Navigation systems (Google Maps, Waze), game-playing AI, automated scheduling systems, robot path planning.
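The search step of a goal-based agent can be illustrated with breadth-first search over a small road map. The map, the location names, and the `blocked` parameter are hypothetical; real navigation systems use weighted graphs and more sophisticated algorithms.

```python
from collections import deque

ROADS = {  # hypothetical road map: location -> directly reachable locations
    "home": ["highway", "main_street"],
    "highway": ["airport"],
    "main_street": ["station"],
    "station": ["airport"],
    "airport": [],
}


def search_for_plan(start, goal, roads, blocked=frozenset()):
    """Breadth-first search; returns the locations to visit after start, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        location, path = frontier.popleft()
        if location == goal:
            return path
        for nxt in roads[location]:
            if nxt not in visited and (location, nxt) not in blocked:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # the goal is unreachable


print(search_for_plan("home", "airport", ROADS))
# ['highway', 'airport']
print(search_for_plan("home", "airport", ROADS, blocked={("highway", "airport")}))
# ['main_street', 'station', 'airport']
```

The second call is the replanning case: the goal stays the same, the world model changes (one road is blocked), and the same search produces a different plan.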
3.4 Utility-Based Agents
Sometimes there are multiple ways to achieve a goal, and some are better than others. A utility function maps states to a real number representing how "happy" the agent is in that state.
```python
class UtilityBasedAgent:
    def __init__(self):
        self.state = {}

    def utility(self, state):
        """Multi-objective utility combining several factors."""
        return (
            0.4 * state.get("comfort", 0)
            + 0.3 * state.get("energy_saved", 0)
            + 0.2 * state.get("safety", 0)
            + 0.1 * state.get("cost_saved", 0)
        )

    def update_state(self, percept):
        # Merge the latest observations into the internal state.
        self.state.update(percept)

    def get_possible_actions(self):
        # Placeholder: subclasses return the actions available in self.state.
        raise NotImplementedError

    def expected_utility(self, action):
        # Placeholder: subclasses average utility() over the predicted
        # outcomes of `action`, weighted by their probabilities.
        raise NotImplementedError

    def act(self, percept):
        self.update_state(percept)
        possible_actions = self.get_possible_actions()
        # Choose the action that maximizes expected utility
        return max(possible_actions, key=self.expected_utility)
```

Consider choosing a flight. A goal-based agent asks: "Does this flight get me to London?" If yes, the goal is achieved. But a utility-based agent asks: "How good is this option?" It weighs price, departure time, number of stops, airline reputation, and seat comfort. It might choose a slightly more expensive flight that departs at a reasonable hour over a cheap red-eye.
Utility-based agents handle trade-offs and uncertainty naturally. They can reason about risk (should I take the faster but riskier route?) and make decisions under multiple competing objectives.
Key Insight: When an LLM agent chooses between different approaches to solving a task, it is implicitly performing utility-based reasoning. The "utility function" is embedded in the model's training and the system prompt, not explicitly programmed. Understanding this helps explain why prompt engineering is so important: you are shaping the agent's implicit utility function.
Real-world examples: Recommendation systems (Netflix, Spotify), autonomous vehicles (balancing speed, safety, comfort), portfolio optimization, dynamic pricing systems.
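Expected utility is just a probability-weighted average, which the following sketch works through for the "faster but riskier route" question. All probabilities, travel times, prices, and weights are made-up numbers for illustration.

```python
def utility(outcome: dict) -> float:
    """Higher is better: penalize travel time (minutes) and price (dollars)."""
    return -1.0 * outcome["minutes"] - 0.5 * outcome["price"]


# Each action leads to possible outcomes with probabilities (hypothetical values).
ACTIONS = {
    "highway": [
        (0.8, {"minutes": 30, "price": 5}),  # usually fast
        (0.2, {"minutes": 90, "price": 5}),  # sometimes a traffic jam
    ],
    "back_roads": [
        (1.0, {"minutes": 45, "price": 0}),  # slower but predictable
    ],
}


def expected_utility(action: str) -> float:
    return sum(p * utility(outcome) for p, outcome in ACTIONS[action])


for action in ACTIONS:
    print(action, round(expected_utility(action), 2))
print("choice:", max(ACTIONS, key=expected_utility))
```

With these numbers the highway scores -44.5 and the back roads -45.0, so the risky route wins narrowly. Raising the jam probability from 0.2 to 0.3 would flip the decision, which is the kind of trade-off a pure goal test cannot express.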
3.5 Learning Agents
A learning agent has four conceptual components:
- Learning element: Improves the agent based on experience.
- Performance element: Selects actions (this is the agent as described above).
- Critic: Provides feedback on how the agent is doing.
- Problem generator: Suggests exploratory actions to discover new knowledge.
[Figure: Agent taxonomy (Russell and Norvig): five architectures, one root. The classic agent taxonomy still describes what we see in modern LLM-based agents. Learning agent: any of the above becomes a learning agent once it adds a critic that updates the model or policy. Decision rule: policy ← update(policy, critic(trajectory)). Example: Reflexion, which learns from failure verbally, with no retraining.]
The learning agent framework is elegant because it explains how an agent can improve over time. The problem generator is particularly interesting: it suggests actions that might not be optimal right now but that help the agent learn something useful for the future. This is the exploration vs. exploitation trade-off that appears throughout AI.
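The four components can be mapped onto a few lines of code. The sketch below is a deterministic toy, not a standard algorithm implementation: the reward table, the "explore every fifth step" schedule, and all names are assumptions chosen so that each component is visible.

```python
class LearningAgent:
    def __init__(self, actions):
        self.values = {a: 0.0 for a in actions}  # learned estimate per action
        self.counts = {a: 0 for a in actions}
        self.steps = 0

    def performance_element(self) -> str:
        """Exploit: pick the action currently believed to be best."""
        return max(self.values, key=self.values.get)

    def problem_generator(self):
        """Explore: every fifth step, suggest the least-tried action."""
        if self.steps % 5 == 0:
            return min(self.counts, key=self.counts.get)
        return None

    def learning_element(self, action: str, feedback: float) -> None:
        """Update the running average of feedback for the action."""
        self.counts[action] += 1
        self.values[action] += (feedback - self.values[action]) / self.counts[action]

    def step(self, critic) -> str:
        action = self.problem_generator() or self.performance_element()
        self.learning_element(action, critic(action))
        self.steps += 1
        return action


REWARDS = {"a": 0.2, "b": 0.9, "c": 0.5}  # hypothetical environment


def critic(action: str) -> float:
    """Feedback on how well the chosen action performed."""
    return REWARDS[action]


agent = LearningAgent(REWARDS)
for _ in range(30):
    agent.step(critic)
print(agent.performance_element())  # b
```

Without the problem generator the agent would try "a" first, find it acceptable, and never discover that "b" is better. The scheduled exploration costs a little reward now in exchange for better decisions later.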
Modern LLM-based agents are closest to learning agents, but with an important caveat. Most current LLM agents do not update their model weights during deployment. Instead, they "learn" through:
- In-context learning: Adapting within a conversation based on examples and feedback provided in the prompt.
- External memory: Storing and retrieving past experiences using vector databases or other storage.
- Prompt refinement: Improving their own instructions based on feedback (this is what Reflexion does, as we will see in Week 5).
This is a form of non-parametric learning: the model parameters stay fixed, but the agent's effective behavior changes through its context and memory. It is as if an employee improved not by gaining new skills, but by taking better notes and referring to them more effectively.
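The "better notes" idea can be sketched as a lesson store whose contents are injected into the prompt. This is a toy under stated assumptions: retrieval is by keyword overlap, whereas practical systems typically use embedding similarity, and the class, function, and prompt layout are invented for illustration.

```python
class LessonMemory:
    """Stores short lessons and retrieves them by keyword overlap with a task."""

    def __init__(self):
        self.lessons: list[str] = []

    def add(self, lesson: str) -> None:
        self.lessons.append(lesson)

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        task_words = set(task.lower().split())
        scored = []
        for lesson in self.lessons:
            overlap = len(task_words & set(lesson.lower().split()))
            if overlap > 0:
                scored.append((overlap, lesson))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [lesson for _, lesson in scored[:k]]


def build_prompt(task: str, memory: LessonMemory) -> str:
    """Model weights never change; only the text placed in front of the model does."""
    notes = "\n".join(f"- {lesson}" for lesson in memory.retrieve(task)) or "- (none)"
    return f"Lessons from earlier attempts:\n{notes}\n\nTask: {task}"


memory = LessonMemory()
memory.add("run tests before committing")
memory.add("cite every source used")
print(build_prompt("fix failing tests", memory))
```

Only the lesson about tests is retrieved for this task, so the agent's behavior on the next attempt can differ even though the underlying model is identical.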
Try It Yourself: Think about a recent multi-step task you completed at a computer (debugging code, writing a report, planning a trip). Identify the moments where you: (a) reacted reflexively, (b) consulted your "internal model" of the situation, (c) planned ahead, (d) weighed trade-offs, (e) learned from a mistake. How many of these behaviors would you want an AI agent to have?
4. The Modern AI Agent: LLM as the "Brain"
4.1 Why LLMs Changed Everything
Before LLMs, building an agent required:
- Manually specifying rules or reward functions
- Training specialized models for each capability
- Engineering complex pipelines to connect components
- Hiring domain experts to encode knowledge
LLMs provide a general-purpose reasoning engine that can:
- Understand natural language instructions (no formal specification needed)
- Generate plans in natural language
- Write and execute code
- Use tools described in natural language
- Maintain conversational context
- Adapt to new tasks through prompting alone
This collapses what used to require an entire engineering team into a single model call, augmented with relatively simple infrastructure.
To appreciate the magnitude of this shift, consider what it took to build a question-answering agent in 2015 versus 2025:
2015 approach: You would need a named entity recognizer, a dependency parser, a question classifier, a knowledge base, a query generator, a passage retriever, a reading comprehension model, and a response generator. Each component would be a separate model, trained on separate data, with custom glue code between them. The system might work well for factoid questions about one domain and fail completely on everything else.
2025 approach: You write a system prompt, define some tools (web search, calculator), and implement a basic agent loop. The LLM handles understanding, reasoning, planning, and response generation. Adding a new domain is a prompt change, not a re-engineering effort.
4.2 The LLM Agent Paradigm
In the modern paradigm, an LLM-based agent consists of:
[Figure: Components of an LLM-based agent. Four capabilities are wrapped around the core LLM. Shown: C1 Planning, which decomposes the goal into executable steps and replans when execution drifts (examples: Plan-and-Execute, ReAct, LATS).]
The LLM is the central coordinator. It receives information from perception (user input, tool outputs, environment observations), reasons about what to do next, and dispatches actions through tools.
Imagine the LLM as the brain of a knowledge worker sitting at a desk. The desk has various tools (phone, computer, calculator, filing cabinet). The brain decides which tool to pick up and how to use it. The tools extend the brain's capabilities: the calculator handles precise arithmetic, the filing cabinet stores information for later retrieval, and the phone connects to the outside world. Similarly, the LLM decides when to call a search tool, when to run code, and when to retrieve information from memory.
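The "brain picks up a tool" step usually comes down to parsing a structured request from the model and routing it to a function. The sketch below assumes a simple JSON format of our own invention; it is not any vendor's function-calling schema, and both tools are stubs.

```python
import json


def search(query: str) -> str:
    return f"(stub) top result for '{query}'"


def add(a: float, b: float) -> str:
    return str(a + b)


TOOLS = {"search": search, "add": add}


def dispatch(llm_output: str) -> str:
    """Route a tool request such as {"tool": "add", "args": {"a": 2, "b": 3}}."""
    call = json.loads(llm_output)
    name = call["tool"]
    if name not in TOOLS:
        # Return the error as text so the model can see it and recover.
        return f"error: unknown tool '{name}'"
    return TOOLS[name](**call["args"])


print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))      # 5
print(dispatch('{"tool": "search", "args": {"query": "MCP"}}'))
print(dispatch('{"tool": "email", "args": {}}'))                  # error message
```

Returning errors as ordinary observations, rather than raising, keeps the loop alive: the model can read the message and choose a different tool on the next step.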
5. Components of an LLM-Based Agent
The influential survey by Wang et al. (2024), "A Survey on Large Language Model Based Autonomous Agents," identifies four key components of LLM-based agents. We examine each in detail.
5.1 Planning
Planning is the ability to decompose a complex task into manageable sub-tasks and determine the order of execution. It is perhaps the most uniquely "agentic" capability: a system that plans is making decisions about future actions, not just reacting to the present.
Why planning matters: Without planning, an agent that is asked to "write a research report on climate change" might immediately start writing the conclusion. With planning, it first identifies the sections, determines what information it needs for each section, searches for that information, and then writes in a logical order.
Think about how you would approach a complex task at work. If your boss asks you to "prepare a market analysis," you do not immediately start typing. First, you think: What markets? What data do I need? Where can I find it? What format does the boss expect? What is the deadline? This decomposition step is planning, and it is essential for any task that cannot be completed in a single action.
LLM agents need the same capability, but they are surprisingly inconsistent at it. Sometimes they produce excellent plans; other times they miss critical steps or create plans with circular dependencies. This variability is why agent architectures (Week 5) often include explicit planning mechanisms rather than relying on the LLM's implicit planning ability.
Task Decomposition Strategies:
- Chain of Thought (CoT): The agent reasons step-by-step through the problem (Wei et al., 2022). This is the simplest form of planning: think before you act.
- Plan-then-Execute: Generate a complete plan first, then execute each step. This provides a roadmap but may need adjustment as the agent learns more.
- Iterative Refinement: Start with a rough plan, refine it as more information becomes available. This is the most flexible approach and mirrors how humans tackle complex tasks.
Example: Planning for a research task:
Goal: "Write a literature review on media bias detection"
Plan:
1. Define the scope: What types of media bias? Which time period?
2. Search academic databases for relevant papers
3. Read and categorize the papers by methodology
4. Identify key themes and trends
5. Draft the literature review sections
6. Add citations and format properly
7. Review and revise
Replanning: Good agents recognize when their initial plan is failing and adapt. If step 2 returns no results for a specific search query, the agent should try alternative queries rather than proceeding with no data. This adaptive replanning is what distinguishes a sophisticated agent from a rigid script.
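The replanning idea can be sketched without an LLM. In this minimal illustration, `run_with_replanning`, `fake_search`, and the tiny in-memory index are hypothetical stand-ins: a real agent would ask the model for alternative queries and call an actual search tool.

```python
from typing import Callable, Optional


def run_with_replanning(
    queries: list[str],
    search: Callable[[str], list[str]],
) -> Optional[tuple[str, list[str]]]:
    """Try each candidate query in order until one returns results."""
    for query in queries:
        results = search(query)
        if results:
            # The step succeeded, so the plan can continue with real data
            return query, results
        # The step failed: fall through and try the next alternative query
    # Every alternative failed; the caller should escalate or revise the plan
    return None


def fake_search(query: str) -> list[str]:
    """Stand-in for a real database search (hypothetical data)."""
    index = {"media bias detection": ["paper A", "paper B"]}
    return index.get(query, [])


outcome = run_with_replanning(
    ["news slant classifier 2019", "media bias detection"],
    fake_search,
)
print(outcome)
```

The first query returns nothing, so the function moves on to the second instead of continuing with an empty result set. Returning `None` when every alternative fails makes the failure explicit to whatever sits above this step.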
The planning hierarchy: In practice, planning happens at multiple levels:
- Strategic planning: "I need to write a literature review" (overall goal)
- Tactical planning: "First I'll search for papers, then categorize them, then write each section" (sub-task decomposition)
- Operational planning: "For the search, I'll use Google Scholar with these keywords" (specific action details)
Good agent architectures handle all three levels. We will see this in Plan-and-Execute (Week 5), where the planner handles strategic and tactical levels, and the executor handles operational details.
Key Insight: Planning is where LLM agents most frequently fail. Studies have shown that LLMs often generate plausible-looking plans that have subtle dependency errors (step 4 requires information that is only available after step 6) or miss critical steps. This is why architectures like Plan-and-Execute (Week 5) include explicit replanning mechanisms.
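Dependency errors of this kind can be checked mechanically before a plan is executed. The sketch below assumes the plan has already been turned into a step order plus a mapping from each step to the steps it needs; the function name `check_plan` and the example dependencies are illustrative, not part of any framework.

```python
from graphlib import CycleError, TopologicalSorter


def check_plan(order: list[int], needs: dict[int, set[int]]) -> list[str]:
    """Return human-readable problems found in a plan (empty list if none)."""
    position = {step: i for i, step in enumerate(order)}
    problems = []
    for step in order:
        for dep in sorted(needs.get(step, set())):
            if dep not in position:
                problems.append(f"step {step} needs undefined step {dep}")
            elif position[dep] > position[step]:
                problems.append(f"step {step} runs before step {dep}, which it needs")
    try:
        # static_order() raises CycleError when no valid ordering exists
        list(TopologicalSorter(needs).static_order())
    except CycleError:
        problems.append("plan contains a circular dependency")
    return problems


# Hypothetical plan in which step 4 needs the output of step 6
plan_order = [1, 2, 3, 4, 5, 6, 7]
needs = {2: {1}, 3: {2}, 4: {6}, 5: {4}, 6: {3}, 7: {5}}
print(check_plan(plan_order, needs))
```

A check like this catches ordering and cycle errors only. It cannot tell whether a step is missing altogether, which still requires review by the model or a human.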
5.2 Memory
LLM agents need memory systems because LLMs alone have a fixed context window. Once the conversation exceeds the context window, older information is lost. Memory comes in several forms, each serving a different purpose:
Short-Term Memory (Working Memory):
- The current conversation or context window
- Typically 8K-200K tokens depending on the model, with some models offering 1M tokens or more
- Includes the current task, recent observations, and intermediate results
- Analogous to a human's working memory: the information you are actively thinking about
Long-Term Memory:
- Persistent storage beyond the context window
- Implemented via vector databases (e.g., Pinecone, Chroma, pgvector)
- Retrieval-Augmented Generation (RAG): retrieve relevant past information when needed
- Analogous to a human's reference library: information stored externally that you look up when needed
Episodic Memory:
- Records of past experiences and their outcomes
- "Last time I tried approach X, it failed because Y"
- Enables learning from past mistakes
- Analogous to a human's autobiographical memory: remembering what happened in specific past situations
Semantic Memory:
- Factual knowledge about the world
- Often stored as embeddings or knowledge graphs
- Enables the agent to recall domain-specific information
- Analogous to a human's general knowledge: facts you know but cannot trace to a specific experience
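from datetime import datetime


class AgentMemory:
    """A simplified agent memory system."""

    def __init__(self):
        self.short_term = []  # Recent conversation turns
        self.long_term = []   # Stored experiences
        self.episodic = []    # Past task outcomes

    def add_to_short_term(self, observation: str):
        self.short_term.append(observation)
        # Keep only the last N observations to fit context
        if len(self.short_term) > 50:
            self.compress_and_archive()

    def compress_and_archive(self):
        """Move the older half of short-term memory into long-term storage."""
        # Placeholder: a real system would summarize these entries first
        cutoff = len(self.short_term) // 2
        self.long_term.extend(self.short_term[:cutoff])
        self.short_term = self.short_term[cutoff:]

    def recall_relevant(self, query: str, k: int = 5) -> list[str]:
        """Retrieve k most relevant memories for the current context."""
        # In practice, this would use embedding similarity search
        return self._semantic_search(query, self.long_term, k)

    def _semantic_search(self, query: str, memories: list[str], k: int) -> list[str]:
        """Placeholder ranking by shared words; swap in embedding similarity."""
        words = set(query.lower().split())
        ranked = sorted(memories, key=lambda m: len(words & set(m.lower().split())), reverse=True)
        return ranked[:k]

    def store_experience(self, task: str, outcome: str, success: bool):
        """Store a task outcome for future reference."""
        self.episodic.append({
            "task": task,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now()
        })

Let us walk through this code to understand the design: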
- add_to_short_term: Every observation (tool result, user message, etc.) goes into short-term memory. But short-term memory has a limit (50 items here, representing the context window constraint). When it overflows, compress_and_archive would summarize older entries and move them to long-term storage.
- recall_relevant: When the agent needs information from the past, it does not search through everything; it uses semantic similarity to find the most relevant memories. This is like how you do not re-read every book you have ever read; you recall the one that seems most relevant to your current question.
- store_experience: After completing a task, the agent records what happened. This creates a growing database of experiences that can inform future decisions.
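To make the similarity search behind recall_relevant concrete, here is a toy sketch that scores memories by cosine similarity over word counts. This is an assumption-laden stand-in: production systems typically compare learned embedding vectors stored in a vector database, and the function names and sample memories below are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query: str, memories: list[str], k: int = 5) -> list[str]:
    """Return up to k memories ranked by similarity, dropping zero-score ones."""
    q = Counter(query.lower().split())
    scored = [(cosine_similarity(q, Counter(m.lower().split())), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:k] if score > 0]


memories = [
    "the login test failed because the token expired",
    "user prefers pytest over unittest",
    "deployment runs every friday",
]
print(semantic_search("why did the login test fail", memories, k=1))
```

Note that "fail" and "failed" count as different words here. Embedding models are used in practice precisely because they capture that kind of relatedness, which exact word matching misses.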
Try It Yourself: Think about how you manage your own "memory" when working on a complex project. What do you keep in your head (working memory)? What do you write down in notes (long-term memory)? What past experiences influence your current approach (episodic memory)? How would you design an artificial system that mimics your approach?
5.3 Tool Use
Tools extend the agent beyond pure text generation. They bridge the gap between what the LLM can reason about and what it can actually do in the world.
Common tool categories:
| Category | Examples | Purpose |
|---|---|---|
| Information retrieval | Web search, database queries, document retrieval | Access external knowledge |
| Computation | Calculator, Python interpreter, Wolfram Alpha | Precise calculations |
| Code execution | Sandboxed environments, REPLs | Run and test code |
| File operations | Read, write, edit files | Interact with the file system |
| Communication | Email, Slack, API calls | Interact with external services |
| Perception | Image analysis, OCR, audio transcription | Process non-text inputs |
A tool is typically described to the agent as a function signature with a natural language description:
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use this when you need up-to-date facts.",
        "parameters": {
            "query": {
                "type": "string",
                "description": "The search query"
            }
        }
    },
    {
        "name": "calculator",
        "description": "Perform mathematical calculations. Use this for any arithmetic.",
        "parameters": {
            "expression": {
                "type": "string",
                "description": "A mathematical expression, e.g., '2 + 2 * 3'"
            }
        }
    }
]

The LLM decides when to call a tool, which tool to call, and what arguments to pass. This is a remarkable capability that emerges from training on large corpora that include code, documentation, and tool-use examples.
Think about how you decide to use a calculator. You do not pull out a calculator for "2 + 3"; you compute that in your head. But for "What is 7.3% of $148,293.57?", you reach for the calculator because you know mental arithmetic is unreliable for that level of precision. LLM agents exhibit similar behavior: they use tools selectively, based on the difficulty and precision requirements of the task.
Key Insight: The tool description is arguably more important than the tool implementation. The LLM decides whether and how to use a tool based on what it is told about the tool: its name, its natural language description, and its parameter schema. A well-implemented tool with a poor description will often be misused; a well-described tool is far more likely to be used correctly, even if the implementation is simple.
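Because the model produces tool calls from the description alone, it can help to check each proposed call against the declared parameters before executing it. The sketch below assumes the simplified tool format shown earlier in this section (name, description, parameters); `validate_tool_call` and `TYPE_MAP` are illustrative names, and real APIs use fuller JSON Schema definitions.

```python
TOOLS = [
    {
        "name": "calculator",
        "description": "Perform mathematical calculations. Use this for any arithmetic.",
        "parameters": {
            "expression": {"type": "string", "description": "A mathematical expression"}
        },
    }
]

# Map schema type names to Python types (only a few, for illustration)
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def validate_tool_call(name: str, args: dict, tools: list[dict]) -> list[str]:
    """Return a list of problems with a proposed tool call (empty if valid)."""
    spec = next((t for t in tools if t["name"] == name), None)
    if spec is None:
        return [f"unknown tool: {name}"]
    params = spec["parameters"]
    problems = [f"missing argument: {key}" for key in sorted(params.keys() - args.keys())]
    for key, value in args.items():
        if key not in params:
            problems.append(f"unexpected argument: {key}")
        elif not isinstance(value, TYPE_MAP.get(params[key]["type"], object)):
            problems.append(f"argument {key} should be {params[key]['type']}")
    return problems


print(validate_tool_call("calculator", {"expression": "2 + 2 * 3"}, TOOLS))
print(validate_tool_call("calculator", {"expr": "2 + 2"}, TOOLS))
```

Returning the problems as text, rather than raising, lets the agent feed them back to the model so it can correct the call on the next turn.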
5.4 Action
The action component is the interface between the agent's decisions and the external world. Actions can be:
- Digital actions: API calls, code execution, file modifications, sending messages.
- Physical actions: In robotics, controlling motors, grippers, or other actuators.
- Communicative actions: Generating responses to users, asking clarifying questions, delegating to other agents.
A critical design decision is the action space: what actions are available to the agent? A narrow action space (few, well-defined actions) is safer but less capable. A broad action space (many possible actions, including code execution) is more powerful but riskier.
Consider the difference:
- Narrow: The agent can classify emails as "urgent" or "not urgent." Safe, predictable, but very limited.
- Medium: The agent can read emails, search a knowledge base, and draft replies for human review. Useful and reasonably safe.
- Broad: The agent can read emails, search the web, write code, execute code, send emails, and modify files. Very powerful, but a mistake could have serious consequences.
This trade-off between capability and safety is one of the central tensions in agent design, and we will return to it throughout this course.
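One way to make the action space explicit is a small policy table that every proposed action must pass through. The sketch below is hypothetical: the action names, the three risk levels, and the `approve` callback are assumptions chosen to mirror the email example, not a standard API.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    SAFE = "safe"             # run without asking
    NEEDS_APPROVAL = "ask"    # run only if a human approves
    FORBIDDEN = "deny"        # never run


# Hypothetical policy for an email agent with a medium action space
POLICY = {
    "read_email": Risk.SAFE,
    "search_knowledge_base": Risk.SAFE,
    "draft_reply": Risk.SAFE,
    "send_email": Risk.NEEDS_APPROVAL,
}


def is_allowed(action: str, approve: Callable[[str], bool]) -> bool:
    """Decide whether an action may run; unlisted actions are denied by default."""
    risk = POLICY.get(action, Risk.FORBIDDEN)
    if risk is Risk.SAFE:
        return True
    if risk is Risk.NEEDS_APPROVAL:
        return approve(action)
    return False


def always_decline(action: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


for action in ["read_email", "send_email", "delete_files"]:
    print(action, is_allowed(action, always_decline))
```

Denying unlisted actions by default keeps the action space narrow unless someone deliberately widens it, which is usually the safer failure mode.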
5.5 How the Four Components Work Together
In practice, the four components are deeply interconnected. Consider a coding agent fixing a bug:
- Planning: "I need to understand the bug, find the cause, fix it, and verify the fix."
- Memory: "This codebase uses a specific testing framework (from earlier conversation). The bug was reported in the authentication module."
- Tool use: Read the failing test file, search the codebase for related functions, edit the buggy file, run the tests.
- Action: Execute each tool call, observe the results, and decide what to do next.
The agent alternates between these components fluidly. It plans (decompose the task), uses tools (read files), stores information in memory (what did the code look like?), plans again (now that I see the bug, how should I fix it?), uses tools (edit the file, run tests), and evaluates the result (did the tests pass?).
Key Insight: The four components (planning, memory, tool use, action) are not separate modules that run in sequence. They are interleaved and interdependent. Planning requires memory (what do I already know?). Tool use requires planning (which tool should I use?). Memory is updated by actions (what did I just learn?). Designing agents means designing how these components interact.
076. Real-World Agent Systems (2025-2026)
6.1 Coding Agents
Coding agents are the most mature and widely deployed category of LLM-based agents in 2025-2026. They demonstrate the full agent paradigm: perception (reading code), reasoning (understanding bugs), planning (deciding what to change), tool use (running tests, searching codebases), and action (editing files).
Why did coding agents mature first? Several reasons: (1) Code has clear success criteria (does it compile? do the tests pass?), making it easier to evaluate agent performance. (2) The environment is digital, avoiding the challenges of physical interaction. (3) There is a massive amount of code in training data, so LLMs have strong coding priors. (4) Developers, being the first users of AI tools, provided rapid feedback loops for improvement.
Claude Code (Anthropic, 2025)
- An agentic coding assistant that operates directly in the terminal
- Can read files, write code, execute commands, search codebases, and run tests
- Uses a perception-action loop: observe the codebase state, reason about what to change, make edits, verify with tests
- Notable for its tool-use architecture with explicit permission systems: it asks for approval before running potentially destructive commands
- Demonstrates the "human-in-the-loop" pattern at its best: autonomous for safe operations, supervised for risky ones
Cursor (Anysphere, 2024-2025)
- An IDE built around AI-assisted coding
- The "Composer" feature acts as an agent: it can edit multiple files, run terminal commands, and iterate on feedback
- Demonstrates how agents can be embedded in existing developer workflows rather than replacing them
- Important lesson: agents do not always need to be standalone systems; they can augment existing tools
Devin (Cognition Labs, 2024)
- Marketed as an "AI software engineer"
- Can plan tasks, write code, debug, and deploy applications
- Operates in a sandboxed environment with browser, terminal, and editor access
- Illustrated both the potential and limitations of autonomous coding agents: impressive demos, but real-world performance showed that full autonomy remains challenging
SWE-agent (Yang et al., 2024)
- An academic research project that created an agent for solving GitHub issues
- Demonstrated that agent-computer interface design matters as much as the underlying model: by designing a custom interface (a simplified terminal with helpful commands), the researchers significantly improved the agent's ability to navigate and edit code
- This is an important lesson: the environment and tools you give an agent are as important as the agent's reasoning capability
6.2 Research and Knowledge Agents
Elicit (originally developed at Ought)
- An AI research assistant that can search papers, extract claims, and synthesize findings
- Uses tool-augmented LLMs to interact with academic databases
- Demonstrates domain-specific agent design for literature review
- Shows how agents can add value in knowledge work by handling tedious tasks (finding papers, extracting data) while leaving analysis to humans
Semantic Scholar's Research Agent
- AI-powered features for finding, filtering, and understanding scientific literature
- Illustrates how agents can operate over structured knowledge bases
- Uses the concept of "Agentic RAG": the agent decides what to search for, evaluates results, and iterates
6.3 Web and Computer Use Agents
Anthropic Computer Use (2024-2025)
- An agent that can see and interact with a computer screen
- Uses screenshot perception + mouse/keyboard actions
- Demonstrates how visual perception can augment traditional text-based agents
- The perception-action loop here is literal: look at the screen, decide what to click, click it, look at the result
OpenAI Operator (2025)
- A web browsing agent that can complete tasks in a browser
- Navigates websites, fills forms, clicks buttons
- Raises significant questions about authentication, security, and trust
- When it encounters a payment page, it pauses and asks the human for approval: a practical example of action-space restriction for safety
6.4 Multi-Agent Systems
ChatDev (Qian et al., 2024)
- Simulates a software company with multiple LLM agents (CEO, CTO, Programmer, Tester)
- Agents collaborate through structured communication protocols
- Demonstrates emergent coordination behavior: the CEO sets requirements, the CTO designs architecture, the Programmer implements, and the Tester finds bugs
- Shows that role specialization can improve quality even when all agents use the same underlying model
AutoGen (Microsoft, 2023)
- A framework for building multi-agent conversational systems
- Agents can have different roles, capabilities, and even different underlying models
- Supports both autonomous and human-in-the-loop workflows
- Pioneered the concept of "conversational agent orchestration" where agents talk to each other to solve problems
6.5 The Agent Landscape in 2025-2026
The agent ecosystem has matured significantly:
Frameworks:
- LangChain / LangGraph: Among the most widely used agent frameworks, with support for complex agent workflows as directed graphs
- CrewAI: Focuses on multi-agent role-based collaboration
- AutoGen: Microsoft's framework for conversational multi-agent systems
- Pydantic AI: Type-safe agent framework with strong validation
- OpenAI Agents SDK: Lightweight framework with first-class support for agent handoffs
Infrastructure:
- Model Context Protocol (MCP): An open standard introduced by Anthropic for connecting agents to tools and data sources, often described as a "USB-C port" for AI applications. We will cover this in depth in Week 4.
- Vector databases: Pinecone, Chroma, Weaviate, pgvector for agent memory
- Sandboxing: E2B, Modal, and Docker for safe code execution
Deployment patterns:
- Single-agent, human-in-the-loop: The most common pattern (e.g., Claude Code, Cursor). The agent works autonomously but pauses for approval at critical points.
- Multi-agent orchestration: Specialized agents coordinated by an orchestrator. Emerging in production for complex workflows.
- Fully autonomous: Still rare in production due to reliability and safety concerns. Used mainly in controlled environments with clear success criteria.
087. Building a Simple Agent: A Python Implementation
Let us build a minimal but functional agent loop. This example demonstrates the core pattern that underlies all LLM-based agents. We will go through it carefully, explaining every design decision.
7.1 The Agent Loop
"""
A minimal LLM-based agent demonstrating the perception-action loop.
This agent can answer questions by optionally checking the current date
or performing calculations. It decides which tool to use (if any)
based on the user's query.
"""
import json
import math
from datetime import datetime

from openai import OpenAI  # Or any LLM client library

# --- Configuration ---
client = OpenAI()  # Uses OPENAI_API_KEY environment variable
MODEL = "gpt-4o"   # Or "claude-sonnet-4-20250514" with Anthropic client

# --- Tool Definitions ---
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression. Use for any arithmetic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A Python mathematical expression, e.g., '2**10 + 3*4'"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_date",
            "description": "Get the current date and time.",
            "parameters": {
                "type": "object",
                "properties": {}
            }
        }
    }
]

# --- Tool Implementations ---
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression with builtins disabled (not a real sandbox)."""
    try:
        # WARNING: In production, use a proper sandbox, not eval()
        allowed_names = {"__builtins__": {}}
        allowed_names.update({k: v for k, v in math.__dict__.items()
                              if not k.startswith('_')})
        result = eval(expression, allowed_names)
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

def get_current_date() -> str:
    """Return the current date and time."""
    return json.dumps({"date": datetime.now().isoformat()})

TOOL_REGISTRY = {
    "calculator": calculator,
    "get_current_date": get_current_date,
}

# --- The Agent Loop ---
def run_agent(user_query: str, max_iterations: int = 10) -> str:
    """
    Run the agent loop.
    The agent will:
    1. Receive the user's query
    2. Decide whether to use a tool or respond directly
    3. If using a tool, execute it and feed the result back
    4. Repeat until the agent produces a final response
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. You have access to tools. "
                "Use them when needed to provide accurate answers. "
                "Think step by step before answering."
            )
        },
        {"role": "user", "content": user_query}
    ]
    for iteration in range(max_iterations):
        print(f"\n--- Iteration {iteration + 1} ---")
        # Step 1: Call the LLM
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=TOOLS,
            tool_choice="auto"  # Let the model decide
        )
        assistant_message = response.choices[0].message
        messages.append(assistant_message)
        # Step 2: Check if the agent wants to use tools
        if assistant_message.tool_calls:
            for tool_call in assistant_message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)
                print(f"  Tool call: {function_name}({function_args})")
                # Step 3: Execute the tool
                if function_name in TOOL_REGISTRY:
                    result = TOOL_REGISTRY[function_name](**function_args)
                else:
                    result = json.dumps({"error": f"Unknown tool: {function_name}"})
                print(f"  Tool result: {result}")
                # Step 4: Feed the result back to the LLM
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
        else:
            # No tool calls, so the agent is done
            final_response = assistant_message.content
            print(f"\n  Final response: {final_response}")
            return final_response
    return "Agent reached maximum iterations without producing a final answer."

# --- Run the Agent ---
if __name__ == "__main__":
    # Example queries
    queries = [
        "What is 2^32 minus 1?",
        "What day is it today?",
        "If I invest $10,000 at 7% annual compound interest, how much will I have after 20 years?",
    ]
    for query in queries:
        print(f"\n{'='*60}")
        print(f"User: {query}")
        result = run_agent(query)
        print(f"\nAgent: {result}")

7.2 Analyzing the Agent Loop Line by Line
Let us trace through what happens when a user asks: "If I invest $10,000 at 7% annual compound interest, how much will I have after 20 years?"
Setting up (lines in run_agent):
- The messages list is initialized with a system prompt and the user's query. This is the agent's "working memory" for this task.
- The max_iterations parameter is a safety valve: it prevents the agent from looping forever.
Iteration 1:
- The LLM receives the messages and the tool definitions. It sees the calculator tool and decides: "I need to calculate compound interest. I should use the calculator."
- It generates a tool_call with function_name="calculator" and arguments={"expression": "10000 * (1.07 ** 20)"}.
- We execute the calculator: eval("10000 * (1.07 ** 20)") returns approximately 38696.84.
- The result is added to messages as a tool response.
Iteration 2:
- The LLM now sees the original question AND the calculator result.
- It decides it has enough information and generates a final text response: "After 20 years at 7% compound interest, your $10,000 investment will grow to approximately $38,696.84."
- Since there are no tool calls, the loop exits.
The code demonstrates several fundamental principles:
- The loop: The agent iterates, calling the LLM and executing tools, until it produces a final response without tool calls.
- Tool selection: The LLM decides which tool to use (or not). This decision is based on the system prompt, tool descriptions, and the user's query. The tool_choice="auto" parameter tells the model it can choose to use tools or respond directly.
- Message history: Each LLM call includes the full conversation history, including tool call results. This gives the LLM context to reason about what it has already done.
- Termination: The loop ends when the LLM produces a response without requesting any tool calls, or when the maximum iteration count is reached (a safety measure).
- Error handling: The TOOL_REGISTRY lookup handles the case where the model requests an unknown tool. In production, you would add much more robust error handling.
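As a sketch of what more robust error handling might look like, the helper below wraps a single tool call so that malformed arguments and tool failures come back to the model as JSON error messages instead of crashing the loop. The name `execute_tool_call` and the `shout` demo tool are hypothetical; the helper is one possible design, not part of any SDK.

```python
import json


def execute_tool_call(name: str, raw_arguments: str, registry: dict) -> str:
    """Run one tool call and always return a JSON string the model can read."""
    if name not in registry:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Arguments were not valid JSON: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})
    try:
        return registry[name](**args)
    except TypeError as exc:
        # Typically wrong or missing argument names
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    except Exception as exc:
        # The tool itself failed
        return json.dumps({"error": f"{name} failed: {exc}"})


def shout(text: str) -> str:
    """Demo tool (hypothetical): upper-case the input."""
    return json.dumps({"result": text.upper()})


registry = {"shout": shout}
print(execute_tool_call("shout", '{"text": "hi"}', registry))
print(execute_tool_call("shout", '{"text": ', registry))
print(execute_tool_call("shout", '{"wrong": 1}', registry))
```

Because every path returns a string, the loop can append the result as a tool message unconditionally, and the model gets a chance to correct its own call on the next iteration.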
7.3 Tracing Through a Multi-Tool Scenario
To build deeper intuition, let us trace through a more complex query: "What day is it today, and what is 2^(today's day number)?"
Iteration 1:
- The LLM sees the question and the two available tools. It reasons (implicitly): "I need today's date first, then I can calculate."
- It generates a tool call: get_current_date().
- We execute the tool: returns {"date": "2026-03-19T14:30:00"}.
- The result is added to the messages.
Iteration 2:
- The LLM now sees the question AND the date result. It extracts the day number (19) and decides to calculate.
- It generates a tool call: calculator(expression="2**19").
- We execute the tool: returns {"result": 524288}.
- The result is added to the messages.
Iteration 3:
- The LLM now sees the question, the date, and the calculation result. It has everything it needs.
- It generates a final text response: "Today is March 19, 2026. 2 raised to the power of 19 (the day number) equals 524,288."
- No tool calls, so the loop exits.
This trace demonstrates sequential tool use: the second tool call depended on the result of the first. The LLM managed this dependency implicitly, without any explicit planning mechanism. For simple dependencies like this, the basic loop works well. For more complex dependency chains, we need the architectures covered in Week 5.
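The three-iteration trace can be reproduced offline by swapping the LLM for a scripted stand-in. Everything in this sketch is an assumption made for illustration: `scripted_model` hard-codes the decisions a real model would make, the date tool returns the fixed timestamp from the trace, and the calculator handles only the one expression form used here.

```python
import json


def get_current_date() -> str:
    # Fixed date so the trace is reproducible
    return json.dumps({"date": "2026-03-19T14:30:00"})


def calculator(expression: str) -> str:
    # Handles only "base**exponent", which is all this trace needs
    base, exponent = expression.split("**")
    return json.dumps({"result": int(base) ** int(exponent)})


TOOL_REGISTRY = {"get_current_date": get_current_date, "calculator": calculator}


def scripted_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: choose the next step from the tool results so far."""
    tool_results = [json.loads(m["content"]) for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        return {"tool": "get_current_date", "args": {}}
    if len(tool_results) == 1:
        day = int(tool_results[0]["date"][8:10])  # day-of-month from the ISO string
        return {"tool": "calculator", "args": {"expression": f"2**{day}"}}
    return {"final": f"2 raised to the day number is {tool_results[1]['result']}."}


def run_trace(query: str, max_iterations: int = 5) -> tuple[str, int]:
    """Same loop shape as run_agent, returning the answer and iteration count."""
    messages = [{"role": "user", "content": query}]
    for iteration in range(1, max_iterations + 1):
        step = scripted_model(messages)
        if "final" in step:
            return step["final"], iteration
        result = TOOL_REGISTRY[step["tool"]](**step["args"])
        messages.append({"role": "tool", "content": result})
    return "max iterations reached", max_iterations


answer, iterations = run_trace("What day is it today, and what is 2^(today's day number)?")
print(iterations, answer)
```

The second call cannot be constructed until the first result is in the message history, which is exactly the sequential dependency described above. A stub like this is also a convenient way to unit-test loop logic without spending API calls.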
7.4 What This Agent Cannot Do (Yet)
This minimal agent lacks several capabilities that production agents require:
- Memory beyond the conversation: It has no long-term memory. If you ask it a question in one run, it will not remember the answer in the next run. Each invocation of run_agent() starts from scratch.
- Multi-step planning: It reacts turn-by-turn rather than planning ahead. It does not think "First I'll calculate this, then I'll look up that, then I'll combine the results." It just takes the next best action at each step.
- Self-reflection: It does not evaluate whether its answers are correct. If the calculator returned a wrong result (due to a wrong expression), the agent would not catch it. It trusts its tools completely.
- Error recovery: If a tool fails, it may not recover gracefully. A production agent should retry with different parameters or try an alternative approach.
- Parallel tool execution: It processes tools sequentially. If it needs three independent pieces of information, it fetches them one at a time, even though they could be fetched simultaneously.
- Cost awareness: The agent does not track how many tokens it has used or how much the task is costing. A production agent needs budget controls.
We will address each of these limitations in subsequent weeks of this course: prompting strategies for better reasoning (Week 3), tool use and MCP (Week 4), agent architectures for planning and reflection (Week 5), and memory systems (Week 7).
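To illustrate the parallel execution limitation, here is a minimal sketch using a thread pool from the standard library. The tool `slow_lookup` is a hypothetical stand-in for an I/O-bound call such as a web request, and the approach assumes the calls are independent of one another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(topic: str) -> str:
    """Stand-in for an I/O-bound tool such as a web request."""
    time.sleep(0.2)
    return f"result for {topic}"


def run_tools_in_parallel(calls: list[tuple]) -> list[str]:
    """Run independent (function, argument) calls concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in calls]
        # Collect in submission order so results line up with the calls
        return [future.result() for future in futures]


calls = [(slow_lookup, "weather"), (slow_lookup, "news"), (slow_lookup, "stocks")]
start = time.perf_counter()
results = run_tools_in_parallel(calls)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Run sequentially, the three lookups would take about 0.6 seconds in total; run concurrently, the elapsed time should be close to that of a single lookup. Threads suit I/O-bound tools, while calls that depend on each other's output still have to run in order.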
7.5 From Minimal to Production: The Gap
The gap between our minimal agent and a production agent is significant. To give you a sense of what production agents handle, here is a non-exhaustive list of concerns:
| Concern | Minimal Agent | Production Agent |
|---|---|---|
| Error handling | Crashes on API errors | Retries with exponential backoff, graceful degradation |
| Tool validation | Trusts model's tool calls | Validates arguments against schemas, sanitizes inputs |
| Cost control | No budget tracking | Token counting, cost limits, model routing |
| Observability | Print statements | Structured logging, tracing, metrics dashboards |
| Security | No restrictions | Sandboxed execution, permission systems, input sanitization |
| Memory | Conversation only | Vector databases, episodic memory, experience replay |
| Concurrency | Sequential | Parallel tool execution, async operations |
| User experience | Wait for full response | Streaming output, progress indicators |
Building production agents is an engineering discipline, not just a prompting exercise. This course will equip you with the concepts and patterns needed to bridge this gap.
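As one small example of closing this gap, the error-handling row can be sketched as a retry wrapper with exponential backoff. The names `call_with_retries` and `flaky_api_call` are hypothetical, and the `sleep` parameter is injected only so that the demo runs without real waiting; in practice you would catch only the transient error types raised by your LLM client rather than every exception.

```python
import random
import time


def call_with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the original error
            # Delay doubles each time; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


attempts = {"count": 0}


def flaky_api_call() -> str:
    """Stand-in for an LLM API call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # Record the requested delays instead of actually sleeping
outcome = call_with_retries(flaky_api_call, sleep=delays.append)
print(outcome, attempts["count"], [round(d, 1) for d in delays])
```

The call succeeds on the third attempt after two waits of roughly 0.5 and 1.0 seconds. Capping the number of attempts matters as much as the backoff itself, since an agent that retries forever is just another way of looping without end.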
Try It Yourself: Run the agent code (or adapt it for the Anthropic API). Try these queries and observe the agent's behavior: (1) "What is the square root of 144?" (Does it use the calculator for something it could compute directly?) (2) "What is the meaning of life?" (Does it try to use a tool when no tool is appropriate?) (3) "Calculate 2+2, then tell me what day it is" (Does it call both tools?)
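The calculator tool in 7.1 carries a warning against using eval() in production. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, is to parse the expression with Python's ast module and evaluate only an allowlist of node types. The name `safe_eval` is hypothetical, and this sketch does not limit the size of exponents, so it is still not a complete sandbox.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Only these operators are permitted; anything else is rejected
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> Number:
    """Evaluate an arithmetic expression without calling eval()."""
    def walk(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access, and everything else land here
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(round(safe_eval("10000 * (1.07 ** 20)"), 2))  # 38696.84
print(safe_eval("2**19"))                           # 524288
```

Because function calls and names are rejected outright, an input such as `__import__('os')` raises ValueError instead of executing. The trade-off is that math functions like sqrt are no longer available unless you add them to the allowlist explicitly.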
098. Ethical Considerations and Open Questions
Ethics in AI agent design is not an afterthought or a box to check; it is woven into every design decision. When you choose how much autonomy to give an agent, you are making an ethical decision. When you decide what data the agent can access, you are making a privacy decision. When you deploy an agent that makes consequential decisions, you are making a fairness decision. This section introduces the key ethical dimensions that will recur throughout the course.
8.1 Autonomy and Control
As agents become more capable, a fundamental tension emerges: autonomy vs. control. More autonomous agents can accomplish more, but they are also harder to supervise and may take unexpected actions.
Consider this concrete scenario: you ask a coding agent to "clean up the repository." A cautious agent might rename some files and update imports. A more aggressive agent might delete files it considers unnecessary, refactor large sections of code, and rewrite the README. Both interpretations are valid, but one might destroy work you wanted to keep.
Key questions:
- How much autonomy should we grant an AI agent?
- What actions should always require human approval? (Deleting files? Sending emails? Making purchases?)
- How do we design effective "human-in-the-loop" systems that do not become bottlenecks?
- What is the right default: opt-in (agent does nothing without permission) or opt-out (agent does everything unless told to stop)?
Claude Code provides a practical example: it can read files freely, but by default it asks for permission before editing files or running shell commands. This is a pragmatic compromise between autonomy and safety: read operations are safe; write operations need a check; destructive operations need explicit approval.
8.2 Accountability
When an agent makes a mistake, say, a coding agent introduces a security vulnerability, or a research agent cites a nonexistent paper, who is responsible? The developer who built the agent? The user who instructed it? The company behind the LLM?
This is not a hypothetical question. As agents take on more consequential tasks (reviewing legal documents, making medical recommendations, managing financial portfolios), the accountability question becomes urgent. Current legal frameworks were not designed for autonomous AI agents, and the industry is still grappling with how to assign responsibility.
8.3 Transparency
Users interacting with agents should understand:
- That they are interacting with an AI system (not a human)
- What capabilities the agent has (and what it cannot do)
- What limitations exist (knowledge cutoffs, potential for errors)
- What data the agent can access (privacy implications)
Transparency is also important for debugging and trust. When an agent takes an unexpected action, the developer (and ideally the user) should be able to trace why. This is one reason why the ReAct architecture (Week 5) is so popular: the explicit reasoning traces make the agent's decisions inspectable.
8.4 Bias and Fairness
LLM-based agents inherit the biases of their training data and the design decisions of their creators. An agent that screens job applications, moderates content, or makes recommendations carries these biases into consequential decisions.
For example, a research agent tasked with "finding influential papers" might systematically favor papers from prestigious institutions or English-language venues, not because it was explicitly programmed to do so, but because its training data reflects existing biases in academic publishing.
Key Insight: Agents amplify biases through their action-taking capability. A biased language model generates biased text; a biased agent takes biased actions. The stakes are fundamentally different. This is why responsible agent design requires not just good models, but thoughtful architecture, careful tool design, and robust evaluation.
109. Discussion Questions
- Agent vs. Tool: Consider a spell-checker and an AI writing assistant. At what point does a tool become an agent? What features would you require before calling a system an "agent"?
  Starting point for thinking: Consider the spectrum from a simple Grammarly-style tool (fixes errors as you type) to a system that rewrites entire paragraphs, suggests structural changes, and maintains its own understanding of your document's goals. Where on this spectrum does "agency" begin?
- Autonomy spectrum: Claude Code asks for permission before executing destructive commands. OpenAI's Operator pauses when it encounters a payment page. What principles should guide decisions about when agents should act autonomously and when they should ask for permission?
  Starting point: Think about the "reversibility" of actions. Can the action be undone? If yes, more autonomy might be acceptable. If no (sending an email, deleting data), human oversight becomes more important.
- Russell and Norvig mapping: Where do modern LLM-based agents like Claude Code fit in the Russell and Norvig taxonomy? Are they goal-based? Utility-based? Learning agents? Could they be all of these at once?
  Starting point: Consider that Claude Code maintains an internal model (of the codebase), pursues goals (fixing bugs, implementing features), and adapts its approach based on feedback (test results, error messages). Does it fit neatly into one category?
- Unintended consequences: The story of Microsoft's Tay chatbot (2016) showed how an agent interacting with a hostile environment can quickly go wrong. What safeguards would you design for an agent operating on the open internet?
  Starting point: Consider input validation, output filtering, action restrictions, rate limiting, and monitoring. Which of these would have prevented the Tay incident?
- Historical perspective: Expert systems required explicit knowledge engineering. LLM agents learn from data. What are the advantages and disadvantages of each approach? Are there cases where expert systems might still be preferable?
  Starting point: Think about domains where errors are extremely costly (medical diagnosis, nuclear power plant control). Would you trust an LLM agent or an expert system with rigorously tested rules? Why?
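The "reversibility" heuristic from the autonomy question can be made concrete in code. The sketch below is one possible way to express it, not how Claude Code or Operator actually work: the `Action` class, its `reversible` flag, and the `approve` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical structure for illustration)."""
    name: str
    reversible: bool  # assumed to be declared by whoever designs the tool


def requires_approval(action: Action) -> bool:
    """Policy: only actions that cannot be undone need a human decision."""
    return not action.reversible


def execute(
    action: Action,
    run: Callable[[], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run the action, asking for approval first when the policy demands it."""
    if requires_approval(action) and not approve(action):
        return f"skipped: {action.name} (approval denied)"
    return run()


# Usage: a reversible edit runs immediately; an email waits for a human.
edit = Action("edit_file", reversible=True)
send = Action("send_email", reversible=False)
deny_all = lambda action: False  # stand-in for a human who says no

print(execute(edit, lambda: "edited", deny_all))
print(execute(send, lambda: "sent", deny_all))
```

A single boolean is a deliberate simplification. A useful discussion point is what else the policy should consider, such as cost, scope of impact, or how confident the agent is.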
10. Summary and Key Takeaways
- An AI agent perceives its environment, reasons about it, and takes actions to achieve goals. The perception-action loop is the fundamental pattern.
- The historical trajectory from expert systems to LLM-based agents represents a shift from manual knowledge engineering to learned general-purpose reasoning. Each generation addressed limitations of the previous one.
- The Russell and Norvig taxonomy (simple reflex, model-based, goal-based, utility-based, learning) provides a useful framework for understanding agent capabilities, though modern LLM agents often blur the boundaries between categories.
- LLM-based agents use a large language model as their reasoning core, augmented with four key components: planning (decomposing tasks), memory (storing and retrieving information), tool use (interacting with external systems), and action (executing decisions).
- The agent loop (perceive, reason, act, observe, repeat) is simple in structure but rich in implementation detail. Even a minimal agent requires careful handling of tool selection, error recovery, and termination.
- Real-world agents in 2025-2026 include coding agents (Claude Code, Cursor, Devin), research agents (Elicit), web agents (Computer Use, Operator), and multi-agent systems (ChatDev, AutoGen).
- Ethical considerations around autonomy, accountability, transparency, and bias are not secondary concerns; they are central to responsible agent design and will only become more important as agents take on more consequential tasks.
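As a compact reminder of the agent loop summarized above, here is a minimal sketch that runs without any API key. It is an illustration under stated assumptions rather than the lecture's reference implementation: `scripted_policy` is a hypothetical stand-in for the LLM, and the `TOOLS` registry and message format are invented for this example. It shows the three concerns named in the takeaway: tool selection, error recovery, and termination.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function of one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(float(x) for x in arg.split())),
}

PREFIX = "observation: "


def scripted_policy(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: returns (tool_name, argument) or ("final", answer)."""
    observations = [h for h in history if h.startswith(PREFIX)]
    if not observations:
        return ("add", "2 3")  # first step: decide to call a tool
    return ("final", observations[-1][len(PREFIX):])  # then report the result


def run_agent(task: str, policy: Callable[[list[str]], tuple[str, str]],
              max_steps: int = 5) -> str:
    history = [f"task: {task}"]              # perceive
    for _ in range(max_steps):               # termination: hard step limit
        kind, payload = policy(history)      # reason
        if kind == "final":
            return payload
        tool = TOOLS.get(kind)               # tool selection
        if tool is None:
            observation = f"error: unknown tool {kind!r}"
        else:
            try:
                observation = tool(payload)  # act
            except Exception as exc:         # error recovery: report, don't crash
                observation = f"error: {exc}"
        history.append(PREFIX + observation)  # observe, then repeat
    return "stopped: step limit reached"


print(run_agent("What is 2 + 3?", scripted_policy))
```

Replacing `scripted_policy` with a function that calls a real model leaves the loop structure unchanged. Errors are fed back as observations so the policy has a chance to try something else on the next step.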
11. Practical Exercise
Build Your First Agent: Using the code template provided in Section 7, extend the minimal agent with the following:
- Add a third tool: read_file(filepath) that reads the contents of a text file. Include proper error handling (file not found, permission denied) and a security check (restrict to a specific directory to prevent the agent from reading arbitrary files).
- Modify the system prompt to give the agent a specific persona (e.g., "You are a helpful teaching assistant for a Computer Science course"). Observe how the persona affects the agent's responses and tool usage patterns.
- Test the agent with three different queries that exercise different tools. Document what the agent does at each step.
- Test multi-tool scenarios: Give the agent a query that requires multiple tool calls in sequence (e.g., "Read the file data.txt and calculate the average of the numbers in it"). Document:
  - How many iterations the agent loop takes
  - What tool calls the agent makes and in what order
  - Whether the agent's reasoning (if visible) is correct
  - Any failures or unexpected behaviors
- Reflect on limitations: After testing, write a paragraph about what the agent does well and what it struggles with. What would you change to make it more reliable?
Deliverable: A Python script and a short report (1-2 pages) describing the agent's behavior and any limitations you observed.
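Hint for the read_file tool: the directory restriction is the step that is easiest to get subtly wrong, because a path such as ../../etc/passwd looks relative but escapes the sandbox. The sketch below shows one possible approach using pathlib. The directory name `agent_workspace` and the exact error strings are assumptions for illustration, and you will still need to register the function in the agent's tool list as described in the template.

```python
from pathlib import Path

# Hypothetical sandbox directory; choose whatever fits your project.
ALLOWED_DIR = Path("agent_workspace")


def read_file(filepath: str, base_dir: Path = ALLOWED_DIR) -> str:
    """Read a text file, refusing any path that resolves outside base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below is made against the real location of the file.
    target = (base / filepath).resolve()
    try:
        target.relative_to(base)  # raises ValueError if target is outside base
    except ValueError:
        return "Error: access outside the allowed directory is not permitted."
    try:
        return target.read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"Error: file not found: {filepath}"
    except PermissionError:
        return f"Error: permission denied: {filepath}"
    except OSError as exc:  # e.g. the path is a directory
        return f"Error: could not read {filepath}: {exc}"
```

Returning error strings instead of raising keeps failures visible to the model as ordinary tool output, so it can react on the next loop iteration. Whether that is the right choice for your agent is worth addressing in the report.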
References
- Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2(1), 14-23.
- Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
- Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST).
- Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., ... & Sun, M. (2024). ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the ACL.
- Rao, A. S., & Georgeff, M. P. (1995). BDI agents: From theory to practice. In Proceedings of the First International Conference on Multiagent Systems (ICMAS).
- Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
- Schick, T., Dwivedi-Yu, J., Dessi, R., Raileanu, R., Lomeli, M., Hambro, E., ... & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS).
- Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., ... & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
- Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wang, J. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems (NeurIPS).
- Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., ... & Wang, C. (2023). AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155.
- Yang, J., Jimenez, C. E., Wettig, A., Lieret, K., Yao, S., Narasimhan, K., & Press, O. (2024). SWE-agent: Agent-computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems (NeurIPS).
- Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR).
Part of "Agentic AI: Foundations, Architectures, and Applications" (CC BY-SA 4.0).