Planning and Task Decomposition
From classical planning (STRIPS, PDDL) to LLM-based natural language planning. Hierarchical task decomposition, plan validation and refinement, dynamic replanning when execution drifts. Integration with the architectures from week 5.
01Learning Objectives
By the end of this lecture, students will be able to:
- Trace the evolution of AI planning from classical approaches (STRIPS, PDDL) to LLM-based natural language planning.
- Explain hierarchical task decomposition and its advantages for complex agent tasks.
- Design plan generation, validation, and refinement pipelines for LLM-based agents.
- Implement dynamic replanning strategies that allow agents to adapt when execution deviates from the plan.
- Construct task graphs with dependency management for parallel and sequential execution.
- Compare modern planning approaches including Tree of Thoughts and Language Agent Tree Search (LATS).
- Build a task decomposition agent in Python that breaks complex goals into executable subtasks.
021. Classical AI Planning
A Brief History
Planning has been a central topic in AI since the field's inception -- arguably, it is the original AI problem. The challenge: given a description of the world, a set of available actions, and a goal, find a sequence of actions that transforms the current world state into one that satisfies the goal.
Think about what planning requires. When you plan a road trip, you consider: where you are now, where you want to be, what routes are available, how long each takes, and what constraints you have (budget, time, preferences). You mentally simulate different sequences of actions and choose the one that best achieves your goal. AI planning formalizes this process.
Understanding classical planning is valuable even when working with modern LLM-based systems because it reveals what we gain (flexibility, generality) and what we lose (guarantees, optimality) when we move from formal to natural-language planning.
STRIPS (1971)
The Stanford Research Institute Problem Solver (STRIPS), developed by Fikes and Nilsson (1971), was one of the first automated planning systems. It introduced a formal representation that remains influential:
- State: A set of logical predicates describing the world (e.g.,
on(A, B),clear(A),on_table(B)). - Actions: Defined by preconditions and effects.
- Preconditions: What must be true for the action to be applicable.
- Add effects: What becomes true after the action.
- Delete effects: What becomes false after the action.
- Goal: A set of predicates that must be true in the final state.
Action: move(Block, From, To)
Preconditions: on(Block, From), clear(Block), clear(To)
Add effects: on(Block, To), clear(From)
Delete effects: on(Block, From), clear(To)STRIPS planning works by searching through the space of possible states, applying actions whose preconditions are met, and checking whether the resulting state satisfies the goal.
PDDL (1998)
The Planning Domain Definition Language (PDDL), introduced by McDermott et al. (1998), standardized the representation of planning problems. PDDL separates the domain (what types of objects exist and what actions are possible) from the problem (what specific objects exist, what the initial state is, and what the goal is).
(define (domain logistics)
(:requirements :strips)
(:types city location package vehicle)
(:action drive
:parameters (?v - vehicle ?from - location ?to - location ?c - city)
:precondition (and (in-city ?from ?c) (in-city ?to ?c) (at ?v ?from))
:effect (and (at ?v ?to) (not (at ?v ?from)))
)
(:action load-package
:parameters (?p - package ?v - vehicle ?l - location)
:precondition (and (at ?v ?l) (at ?p ?l))
:effect (and (in ?p ?v) (not (at ?p ?l)))
)
)PDDL has been extended over the years to support durative actions (with time), numeric fluents, preferences, and probabilistic effects. It remains the standard language for the International Planning Competition (IPC).
Strengths and Limitations of Classical Planning
Strengths:
- Provably correct: if a plan is found, it is guaranteed to achieve the goal.
- Complete: if a plan exists, the planner will find it (given enough time).
- Optimal: many planners find plans that minimize some cost metric.
Limitations:
- Requires complete, formal domain descriptions (which are hard to write for real-world tasks). Try writing a PDDL domain for "plan a birthday party" and you will appreciate this challenge.
- Brittle: any discrepancy between the model and reality causes plan failure. If the restaurant you planned to eat at is closed, the entire plan breaks.
- Does not handle uncertainty, partial observability, or continuous state spaces well.
- Cannot handle open-ended, ill-defined goals ("write a good essay" is not a valid PDDL goal).
Key Insight: Classical planning gives you guarantees that LLM-based planning cannot match (correctness, completeness, optimality). But it requires you to formalize the world completely, which is impractical for most real-world tasks. LLM-based planning trades those guarantees for flexibility and generality. Understanding this trade-off is essential for choosing the right approach.
032. LLM-Based Planning: Natural Language Plans
The Paradigm Shift
LLM-based planning represents a radical departure from classical AI planning. Instead of formal state representations and action schemas, you describe the task in plain English and ask the LLM to generate a plan. No PDDL, no preconditions, no formal effects -- just a prompt.
The analogy is the difference between giving directions using formal coordinates ("Go to 40.7128N, 74.0060W") versus natural language ("Go to the corner of Broadway and Wall Street, then turn left"). Natural language is less precise but much more practical for most situations.
PLANNING_PROMPT = """You are a planning agent. Given a goal, generate a step-by-step
plan to achieve it. Each step should be:
- Specific and actionable
- Ordered logically (dependencies respected)
- Achievable with the available tools
Goal: {goal}
Available tools:
{tools}
Generate a numbered plan:"""
def generate_plan(goal: str, tools: list[str], llm_call) -> list[str]:
"""Generate a natural language plan for a goal.
Args:
goal: The goal to achieve.
tools: List of available tool descriptions.
llm_call: Function to call the LLM.
Returns:
List of plan steps as strings.
"""
tools_str = "\n".join(f"- {tool}" for tool in tools)
prompt = PLANNING_PROMPT.format(goal=goal, tools=tools_str)
response = llm_call(prompt=prompt)
# Parse numbered steps
steps = []
for line in response.strip().split("\n"):
line = line.strip()
if line and line[0].isdigit():
# Remove the number prefix
step = line.lstrip("0123456789.)")
steps.append(step.strip())
return stepsAdvantages of LLM-Based Planning
- No formal domain description needed: The LLM uses its trained knowledge to understand actions and their effects.
- Handles ambiguity: Natural language goals like "make the report better" can be interpreted and planned for.
- Flexible and general: The same planner works across domains without reconfiguration.
- Incorporates world knowledge: The LLM knows that "booking a flight requires a passport" without being told.
Limitations of LLM-Based Planning
- No correctness guarantees: The plan may contain logical errors, impossible steps, or missing dependencies.
- Hallucinated actions: The LLM may suggest actions that are not actually available.
- Poor at long-horizon planning: As plans get longer, LLMs struggle to maintain consistency and track dependencies.
- No optimality: The generated plan is rarely optimal in terms of steps, cost, or time.
The Plan-and-Execute Pattern
A practical approach combines LLM planning with step-by-step execution:
class PlanAndExecuteAgent:
"""An agent that first plans, then executes step by step.
This separates the planning phase (creative, high-level) from the
execution phase (precise, tool-using), allowing each to be optimized
independently.
"""
def __init__(self, planner_llm, executor_llm, tools: dict[str, callable]):
self.planner_llm = planner_llm
self.executor_llm = executor_llm
self.tools = tools
def plan(self, goal: str) -> list[str]:
"""Generate a plan for the goal."""
tools_desc = "\n".join(f"- {name}" for name in self.tools.keys())
prompt = PLANNING_PROMPT.format(goal=goal, tools=tools_desc)
response = self.planner_llm(prompt=prompt)
return self._parse_steps(response)
def execute_step(self, step: str, context: str) -> str:
"""Execute a single step of the plan.
Args:
step: The step to execute.
context: Results from previous steps.
Returns:
The result of executing the step.
"""
prompt = f"""Execute the following step of a plan.
Previous context:
{context}
Current step: {step}
Available tools: {list(self.tools.keys())}
Determine which tool to use (if any) and execute the step.
Respond with the result."""
response = self.executor_llm(prompt=prompt)
# Check if the executor wants to use a tool
for tool_name, tool_fn in self.tools.items():
if tool_name.lower() in response.lower():
# Extract arguments and call the tool
tool_result = tool_fn(step)
return f"Tool '{tool_name}' result: {tool_result}"
return response
def run(self, goal: str) -> dict:
"""Plan and execute a goal.
Returns:
Dict with the plan, execution results, and final outcome.
"""
# Phase 1: Plan
plan = self.plan(goal)
print(f"Plan generated with {len(plan)} steps:")
for i, step in enumerate(plan, 1):
print(f" {i}. {step}")
# Phase 2: Execute
results = []
context = ""
for i, step in enumerate(plan):
print(f"\nExecuting step {i + 1}: {step}")
result = self.execute_step(step, context)
results.append({"step": step, "result": result})
context += f"\nStep {i + 1} ({step}): {result}"
print(f" Result: {result[:200]}")
return {
"goal": goal,
"plan": plan,
"results": results,
"final_context": context,
}
def _parse_steps(self, response: str) -> list[str]:
"""Parse numbered steps from LLM response."""
steps = []
for line in response.strip().split("\n"):
line = line.strip()
if line and (line[0].isdigit() or line.startswith("-")):
step = line.lstrip("0123456789.)- ")
if step:
steps.append(step)
return steps043. Hierarchical Task Decomposition
Why Hierarchical Decomposition?
Complex tasks are difficult to plan in a single pass. This is true for humans as well as AI agents. If you ask someone to "build a web application for managing a restaurant," they do not immediately start writing code. They think at a high level first (what features do we need?), then drill down into specifics (how should the reservation system work?), and only then reach the level of individual coding tasks.
This is hierarchical decomposition: breaking a big problem into smaller problems, and those smaller problems into even smaller ones, until each piece is small enough to execute directly. It is how we manage complexity in software engineering (modules, functions), in organizations (departments, teams), and in everyday life (planning a wedding involves planning the venue, catering, invitations, etc., each of which breaks down further).
A goal like "build a web application for managing a restaurant" cannot be meaningfully decomposed into a flat list of atomic steps. It requires multiple levels of abstraction:
- High level: Design the system architecture, build the frontend, build the backend, set up the database, deploy.
- Medium level: For "build the frontend," decompose into: set up the project, create the layout, build the menu page, build the reservation system, add authentication.
- Low level: For "build the menu page," decompose into: create the data model, build the API endpoint, create the React component, add styling, write tests.
Hierarchical Task Networks (HTN)
Hierarchical Task Networks are a classical AI planning formalism that matches this intuition. In HTN planning:
- Primitive tasks are directly executable actions.
- Compound tasks must be decomposed into subtasks using methods.
- Methods define how a compound task can be decomposed, including ordering constraints between subtasks.
Compound Task: prepare_dinner
Method 1: home_cooking
Subtasks: [choose_recipe, buy_ingredients, cook, serve]
Method 2: order_delivery
Subtasks: [choose_restaurant, place_order, wait, receive]LLM-Based Hierarchical Decomposition
LLMs naturally perform hierarchical decomposition when prompted appropriately:
DECOMPOSITION_PROMPT = """Break down the following task into 3-7 subtasks.
Each subtask should be:
- More specific than the parent task
- Independent enough to be worked on separately
- Small enough to be completed in a focused work session
If a subtask is still complex, mark it with [COMPLEX] so it can be
further decomposed.
Task: {task}
Subtasks:"""
def hierarchical_decompose(
task: str, llm_call, max_depth: int = 3, current_depth: int = 0
) -> dict:
"""Recursively decompose a task into a hierarchy of subtasks.
Args:
task: The task to decompose.
llm_call: Function to call the LLM.
max_depth: Maximum decomposition depth.
current_depth: Current depth in the recursion.
Returns:
A tree structure representing the task hierarchy.
"""
if current_depth >= max_depth:
return {"task": task, "subtasks": [], "leaf": True}
response = llm_call(prompt=DECOMPOSITION_PROMPT.format(task=task))
subtasks = _parse_subtasks(response)
tree = {"task": task, "subtasks": [], "leaf": False}
for subtask_text in subtasks:
is_complex = "[COMPLEX]" in subtask_text
clean_text = subtask_text.replace("[COMPLEX]", "").strip()
if is_complex and current_depth < max_depth - 1:
# Recursively decompose complex subtasks
subtree = hierarchical_decompose(
clean_text, llm_call, max_depth, current_depth + 1
)
tree["subtasks"].append(subtree)
else:
tree["subtasks"].append({
"task": clean_text,
"subtasks": [],
"leaf": True,
})
return tree
def _parse_subtasks(response: str) -> list[str]:
"""Parse subtasks from LLM response."""
subtasks = []
for line in response.strip().split("\n"):
line = line.strip()
if line and (line[0].isdigit() or line.startswith("-")):
subtask = line.lstrip("0123456789.)- ")
if subtask:
subtasks.append(subtask)
return subtasks
def print_task_tree(tree: dict, indent: int = 0) -> None:
"""Pretty-print a task tree."""
prefix = " " * indent
marker = "[leaf]" if tree["leaf"] else "[node]"
print(f"{prefix}{marker} {tree['task']}")
for subtask in tree.get("subtasks", []):
print_task_tree(subtask, indent + 1)054. Plan Validation and Refinement
Why Plans Need Validation
LLM-generated plans are not guaranteed to be correct. Common issues include:
- Missing steps: Critical prerequisites are overlooked.
- Wrong ordering: Steps that depend on each other are in the wrong order.
- Infeasible steps: Steps that cannot be executed with available tools.
- Redundant steps: Unnecessary actions that waste resources.
- Vague steps: Steps that are too abstract to execute directly.
Plan Validation Strategies
Self-Validation
Ask the LLM to critique its own plan:
VALIDATION_PROMPT = """Review the following plan for achieving the stated goal.
Check for:
1. Missing steps: Are there any prerequisites or necessary actions not included?
2. Ordering errors: Are steps in the right order? Are dependencies respected?
3. Feasibility: Can each step be executed with the available tools?
4. Redundancy: Are there unnecessary or duplicate steps?
5. Specificity: Is each step specific enough to be actionable?
Goal: {goal}
Plan:
{plan}
Available tools: {tools}
Provide your assessment, listing any issues found and suggesting corrections."""
def validate_plan(
goal: str, plan: list[str], tools: list[str], llm_call
) -> dict:
"""Validate a plan and identify issues.
Returns:
Dict with validation results and suggested fixes.
"""
plan_str = "\n".join(f"{i+1}. {step}" for i, step in enumerate(plan))
tools_str = ", ".join(tools)
response = llm_call(
prompt=VALIDATION_PROMPT.format(goal=goal, plan=plan_str, tools=tools_str)
)
return {
"original_plan": plan,
"validation_feedback": response,
"needs_revision": any(
keyword in response.lower()
for keyword in ["missing", "error", "incorrect", "should be", "add"]
),
}Cross-Validation with a Critic Agent
Use a separate agent (or a separate LLM call with a different prompt) to review the plan. This provides a form of peer review:
CRITIC_PROMPT = """You are a plan reviewer. Your job is to find flaws in plans
before they are executed. Be thorough and skeptical.
Goal: {goal}
Plan:
{plan}
For each step, assess:
- Is this step necessary?
- Is it in the correct position?
- What could go wrong?
- Is it specific enough?
Rate the overall plan quality (1-10) and list all issues."""Simulation-Based Validation
For plans that involve code execution or API calls, validate by simulating execution in a sandbox:
def simulate_plan_execution(
plan: list[str], tools: dict[str, callable], dry_run: bool = True
) -> list[dict]:
"""Simulate plan execution to catch errors before real execution.
Args:
plan: List of plan steps.
tools: Available tools.
dry_run: If True, tools log what they would do without executing.
Returns:
List of simulation results for each step.
"""
results = []
state = {}
for i, step in enumerate(plan):
result = {
"step": i + 1,
"description": step,
"status": "unknown",
"issues": [],
}
# Check if the step references a known tool
tool_found = False
for tool_name in tools:
if tool_name.lower() in step.lower():
tool_found = True
if dry_run:
result["status"] = "simulated"
result["output"] = f"Would execute: {tool_name}"
else:
try:
output = tools[tool_name](step)
result["status"] = "success"
result["output"] = output
except Exception as e:
result["status"] = "error"
result["issues"].append(str(e))
break
if not tool_found:
result["status"] = "no_tool"
result["issues"].append("No matching tool found for this step.")
results.append(result)
return resultsPlan Refinement
Based on validation feedback, refine the plan:
REFINEMENT_PROMPT = """Revise the following plan based on the feedback provided.
Make minimal changes to fix the identified issues while preserving the parts
that are correct.
Original plan:
{plan}
Feedback:
{feedback}
Revised plan:"""
def refine_plan(
plan: list[str], feedback: str, llm_call
) -> list[str]:
"""Refine a plan based on validation feedback."""
plan_str = "\n".join(f"{i+1}. {step}" for i, step in enumerate(plan))
response = llm_call(
prompt=REFINEMENT_PROMPT.format(plan=plan_str, feedback=feedback)
)
return _parse_subtasks(response)The Plan-Validate-Refine Loop
Combining these components into an iterative improvement loop:
def iterative_planning(
goal: str, tools: list[str], llm_call, max_iterations: int = 3
) -> list[str]:
"""Generate and iteratively refine a plan.
Args:
goal: The goal to plan for.
tools: Available tools.
llm_call: Function to call the LLM.
max_iterations: Maximum refinement iterations.
Returns:
The final refined plan.
"""
# Generate initial plan
plan = generate_plan(goal, tools, llm_call)
print(f"Initial plan: {len(plan)} steps")
for iteration in range(max_iterations):
# Validate
validation = validate_plan(goal, plan, tools, llm_call)
if not validation["needs_revision"]:
print(f"Plan validated after {iteration + 1} iteration(s).")
return plan
# Refine
print(f"Iteration {iteration + 1}: Refining plan based on feedback...")
plan = refine_plan(plan, validation["validation_feedback"], llm_call)
print(f"Reached max iterations ({max_iterations}). Returning best plan.")
return plan065. Dynamic Replanning
When Plans Meet Reality
The military strategist Helmuth von Moltke famously said: "No plan survives contact with the enemy." The same is true for AI agents -- no plan survives contact with reality unchanged. The ability to adapt when things go wrong is what separates a robust agent from a fragile one.
Key Insight: Planning ability is necessary but not sufficient for agency. The real test of an agent is not whether it can make a plan, but whether it can recover when the plan fails. Dynamic replanning -- detecting failures, diagnosing causes, and generating alternative approaches -- is what makes agents robust in real-world environments.
Agents encounter:
- Tool failures: An API returns an error, a file does not exist.
- Unexpected results: A search returns no results, code produces wrong output.
- Changed conditions: New information invalidates previous assumptions.
- Blocked steps: A step cannot be executed due to missing permissions or resources.
Dynamic replanning allows agents to adapt:
class DynamicPlanExecutor:
"""Execute a plan with dynamic replanning on failure.
When a step fails, the executor asks the LLM to generate an
alternative approach or revise the remaining plan.
"""
def __init__(self, llm_call, tools: dict[str, callable]):
self.llm_call = llm_call
self.tools = tools
def execute_with_replanning(
self, goal: str, initial_plan: list[str], max_replans: int = 3
) -> dict:
"""Execute a plan, replanning when steps fail.
Args:
goal: The original goal.
initial_plan: The initial plan.
max_replans: Maximum number of times to replan.
Returns:
Execution results including any replanning events.
"""
plan = list(initial_plan)
execution_log = []
replans = 0
step_idx = 0
while step_idx < len(plan):
step = plan[step_idx]
print(f"Executing step {step_idx + 1}/{len(plan)}: {step}")
try:
result = self._execute_step(step)
execution_log.append({
"step": step,
"status": "success",
"result": result,
})
print(f" Success: {result[:100]}")
step_idx += 1
except Exception as e:
error_msg = str(e)
print(f" Failed: {error_msg}")
execution_log.append({
"step": step,
"status": "failed",
"error": error_msg,
})
if replans >= max_replans:
print(" Max replans reached. Aborting.")
break
# Replan
print(" Replanning...")
remaining_steps = plan[step_idx:]
completed_context = "\n".join(
f"- {log['step']}: {log.get('result', log.get('error', 'N/A'))[:100]}"
for log in execution_log
)
new_plan = self._replan(
goal, completed_context, remaining_steps, error_msg
)
if new_plan:
plan = plan[:step_idx] + new_plan
replans += 1
print(f" New plan ({len(new_plan)} steps remaining)")
else:
print(" Replanning failed. Skipping step.")
step_idx += 1
return {
"goal": goal,
"execution_log": execution_log,
"replans": replans,
"completed": step_idx >= len(plan),
}
def _execute_step(self, step: str) -> str:
"""Execute a single step using available tools."""
for tool_name, tool_fn in self.tools.items():
if tool_name.lower() in step.lower():
return tool_fn(step)
# If no tool matches, use the LLM to reason about the step
return self.llm_call(prompt=f"Execute this step and report the result: {step}")
def _replan(
self, goal: str, completed: str, failed_steps: list[str], error: str
) -> list[str] | None:
"""Generate a new plan given a failure.
Args:
goal: The original goal.
completed: Summary of completed steps.
failed_steps: The steps that failed or remain.
error: The error that triggered replanning.
Returns:
A new list of steps, or None if replanning fails.
"""
prompt = f"""A plan step failed during execution. Generate a revised plan
to achieve the goal, taking into account what has already been completed
and the error encountered.
Goal: {goal}
Completed steps:
{completed}
Failed step: {failed_steps[0] if failed_steps else 'N/A'}
Error: {error}
Remaining original steps:
{chr(10).join(f'- {s}' for s in failed_steps[1:])}
Generate a revised plan for the remaining work (avoid repeating completed steps):"""
response = self.llm_call(prompt=prompt)
steps = _parse_subtasks(response)
return steps if steps else NoneReplanning Strategies
Different situations call for different replanning strategies:
| Failure Type | Strategy | Example |
|---|---|---|
| Tool temporarily unavailable | Retry with backoff | API rate limit hit |
| Tool returns error | Alternative tool | Switch from one search engine to another |
| Step produces wrong result | Modify step | Adjust query parameters |
| Step is fundamentally infeasible | Alternative approach | Rewrite the plan to use a different method |
| Goal itself is unclear | Clarification | Ask the user for more specifics |
076. Task Graphs and Dependency Management
Beyond Linear Plans
Many real-world tasks are not purely sequential. Consider a software project: frontend and backend development can happen in parallel, but integration testing depends on both being complete. A database schema must be designed before the backend can implement queries, but the frontend can be built in parallel with the schema design.
A linear plan (step 1, step 2, step 3...) cannot express these relationships. A task graph -- a directed acyclic graph (DAG) where nodes are tasks and edges are dependencies -- captures them naturally. This is the same concept used by build systems (Make, Gradle), workflow orchestrators (Airflow, Prefect), and CI/CD pipelines.
Try It Yourself: Think about cooking a complete dinner with multiple dishes. Which tasks can be done in parallel (e.g., boiling pasta while grilling vegetables)? Which tasks have dependencies (e.g., cannot plate the dish before cooking it)? Draw a task graph for a three-course dinner. You will find that a good task graph significantly reduces total cooking time compared to a sequential approach.
from dataclasses import dataclass, field
from enum import Enum
class TaskStatus(Enum):
PENDING = "pending"
READY = "ready" # All dependencies met
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
@dataclass
class TaskNode:
"""A node in the task graph."""
id: str
description: str
dependencies: list[str] = field(default_factory=list) # IDs of prerequisite tasks
status: TaskStatus = TaskStatus.PENDING
result: str | None = None
error: str | None = None
def is_ready(self, completed_tasks: set[str]) -> bool:
"""Check if all dependencies are satisfied."""
return all(dep in completed_tasks for dep in self.dependencies)
class TaskGraph:
"""A directed acyclic graph (DAG) of tasks with dependency management.
Supports parallel execution of independent tasks and respects
ordering constraints.
"""
def __init__(self):
self.nodes: dict[str, TaskNode] = {}
def add_task(
self, task_id: str, description: str, dependencies: list[str] | None = None
) -> None:
"""Add a task to the graph."""
deps = dependencies or []
# Validate dependencies exist
for dep in deps:
if dep not in self.nodes:
raise ValueError(f"Dependency '{dep}' not found. Add it first.")
self.nodes[task_id] = TaskNode(
id=task_id, description=description, dependencies=deps
)
def get_ready_tasks(self) -> list[TaskNode]:
"""Get all tasks that are ready to execute (dependencies satisfied)."""
completed = {
node_id for node_id, node in self.nodes.items()
if node.status == TaskStatus.COMPLETED
}
ready = []
for node in self.nodes.values():
if node.status == TaskStatus.PENDING and node.is_ready(completed):
node.status = TaskStatus.READY
ready.append(node)
return ready
def mark_completed(self, task_id: str, result: str) -> None:
"""Mark a task as completed with its result."""
self.nodes[task_id].status = TaskStatus.COMPLETED
self.nodes[task_id].result = result
def mark_failed(self, task_id: str, error: str) -> None:
"""Mark a task as failed."""
self.nodes[task_id].status = TaskStatus.FAILED
self.nodes[task_id].error = error
# Also skip dependent tasks
self._propagate_failure(task_id)
def _propagate_failure(self, failed_id: str) -> None:
"""Skip all tasks that depend on a failed task."""
for node in self.nodes.values():
if failed_id in node.dependencies and node.status == TaskStatus.PENDING:
node.status = TaskStatus.SKIPPED
node.error = f"Skipped because dependency '{failed_id}' failed."
self._propagate_failure(node.id)
def is_complete(self) -> bool:
"""Check if all tasks are completed (or skipped/failed)."""
return all(
node.status in (TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.SKIPPED)
for node in self.nodes.values()
)
def get_execution_order(self) -> list[list[str]]:
"""Get the topological execution order as batches of parallel tasks.
Returns:
List of batches, where each batch contains tasks that can run in parallel.
"""
completed: set[str] = set()
remaining = set(self.nodes.keys())
batches = []
while remaining:
batch = []
for task_id in remaining:
node = self.nodes[task_id]
if all(dep in completed for dep in node.dependencies):
batch.append(task_id)
if not batch:
raise ValueError("Circular dependency detected!")
batches.append(batch)
completed.update(batch)
remaining -= set(batch)
return batches
def visualize(self) -> str:
"""Create a text visualization of the task graph."""
lines = ["Task Graph:"]
batches = self.get_execution_order()
for i, batch in enumerate(batches):
lines.append(f"\n Batch {i + 1} (parallel):")
for task_id in batch:
node = self.nodes[task_id]
deps = f" (depends on: {', '.join(node.dependencies)})" if node.dependencies else ""
status = f" [{node.status.value}]"
lines.append(f" - {task_id}: {node.description}{deps}{status}")
return "\n".join(lines)
# Example usage
def build_example_graph() -> TaskGraph:
"""Build an example task graph for a data analysis project."""
graph = TaskGraph()
# Independent tasks (can run in parallel)
graph.add_task("fetch_data", "Download dataset from API")
graph.add_task("setup_env", "Set up Python virtual environment and install dependencies")
# Depends on fetch_data
graph.add_task("clean_data", "Clean and preprocess the raw data", ["fetch_data"])
graph.add_task("explore_data", "Perform exploratory data analysis", ["fetch_data"])
# Depends on clean_data and setup_env
graph.add_task("train_model", "Train the ML model", ["clean_data", "setup_env"])
# Depends on train_model and explore_data
graph.add_task("evaluate", "Evaluate model on test set", ["train_model"])
graph.add_task("generate_report", "Generate analysis report", ["evaluate", "explore_data"])
return graphGenerating Task Graphs from Natural Language
An LLM can generate task graphs by identifying both tasks and their dependencies:
TASK_GRAPH_PROMPT = """Analyze the following goal and generate a task graph.
For each task, identify:
1. A unique short ID
2. A description
3. Which other tasks it depends on (by ID)
Output format (one task per line):
ID | Description | Dependencies (comma-separated IDs, or "none")
Goal: {goal}
Tasks:"""
def generate_task_graph(goal: str, llm_call) -> TaskGraph:
"""Generate a task graph from a natural language goal."""
response = llm_call(prompt=TASK_GRAPH_PROMPT.format(goal=goal))
graph = TaskGraph()
tasks_to_add = []
# Parse the response
for line in response.strip().split("\n"):
parts = [p.strip() for p in line.split("|")]
if len(parts) >= 3:
task_id = parts[0].lower().replace(" ", "_")
description = parts[1]
deps_str = parts[2]
deps = (
[] if deps_str.lower() == "none"
else [d.strip().lower().replace(" ", "_") for d in deps_str.split(",")]
)
tasks_to_add.append((task_id, description, deps))
# Add tasks in dependency order
added = set()
max_passes = len(tasks_to_add) * 2
passes = 0
while tasks_to_add and passes < max_passes:
remaining = []
for task_id, description, deps in tasks_to_add:
if all(d in added for d in deps):
graph.add_task(task_id, description, deps)
added.add(task_id)
else:
remaining.append((task_id, description, deps))
tasks_to_add = remaining
passes += 1
return graph087. Decomposition Strategies
Top-Down Decomposition
Start with the high-level goal and recursively break it down into smaller subtasks. This is the most natural approach for LLMs and mirrors how humans plan complex projects.
Advantages: Maintains coherence with the overall goal; ensures all parts serve the main objective. Disadvantages: May miss bottom-up insights; early decomposition choices constrain later options.
Iterative Refinement
Start with a rough plan and progressively refine each step:
def iterative_refinement(
goal: str, llm_call, refinement_rounds: int = 3
) -> list[str]:
"""Iteratively refine a plan through multiple passes.
Round 1: Generate a high-level outline (3-5 major phases)
Round 2: Expand each phase into specific steps
Round 3: Add details, error handling, and validation steps
"""
# Round 1: High-level outline
outline = llm_call(
prompt=f"Create a high-level outline (3-5 phases) for: {goal}"
)
# Round 2: Expand each phase
expanded = llm_call(
prompt=(
f"Expand each phase of this outline into 2-4 specific, actionable steps.\n\n"
f"Outline:\n{outline}\n\nExpanded plan:"
)
)
# Round 3: Add details
detailed = llm_call(
prompt=(
f"Review this plan and add any missing steps, error handling, "
f"and validation checkpoints.\n\n"
f"Plan:\n{expanded}\n\nDetailed plan:"
)
)
return _parse_subtasks(detailed)Least-to-Most Prompting
Introduced by Zhou et al. (2023), this approach decomposes a complex problem into a series of simpler subproblems, solving them from the simplest to the most complex. Each solution builds on the previous ones.
Complex problem: "Calculate the total cost of a round trip from Madrid to Tokyo,
including flights, hotels for 5 nights, and daily meals."
Decomposition:
1. What is the average cost of a round-trip flight from Madrid to Tokyo?
2. What is the average cost of a hotel in Tokyo per night?
3. What is the average daily meal budget in Tokyo?
4. Given answers to 1-3, what is the total cost?098. Planning with Uncertainty and Incomplete Information
The Challenge
Real-world planning often involves:
- Uncertain outcomes: Actions may succeed or fail probabilistically.
- Incomplete information: The agent does not know everything about the current state.
- Dynamic environments: The world changes while the agent is executing its plan.
Contingency Planning
Generate plans with explicit branches for different outcomes:
CONTINGENCY_PROMPT = """Generate a plan for the following goal, including
contingency branches for steps that might fail.
For each step, specify:
- The primary action
- What could go wrong
- The fallback action if it fails
Goal: {goal}
Plan with contingencies:"""
@dataclass
class ContingencyStep:
"""A plan step with contingency handling."""
primary_action: str
potential_failure: str
fallback_action: str
def to_text(self) -> str:
return (
f"DO: {self.primary_action}\n"
f" IF FAILS ({self.potential_failure}): {self.fallback_action}"
)Belief-Space Planning
Rather than planning over world states, plan over belief states (probability distributions over possible states). This is relevant for agents operating with partial observability:
Belief: "The file is probably in /data/ (80%) or /backup/ (20%)"
Plan:
1. Check /data/ for the file.
2. IF found: proceed with processing.
3. IF not found: check /backup/.
4. IF still not found: ask the user for the file location.109. Comparison: Tree of Thoughts and LATS
The methods we have discussed so far (plan-and-execute, hierarchical decomposition, replanning) all generate a single plan and then try to execute it. But what if the first plan is not the best? What if there are multiple promising approaches and we want to explore several before committing?
Tree of Thoughts and Language Agent Tree Search take a different approach: they systematically explore multiple planning paths, evaluate each, and choose the best one. This trades computational cost for plan quality.
Tree of Thoughts (ToT)
Tree of Thoughts (Yao et al., 2023) extends chain-of-thought prompting by exploring multiple reasoning paths simultaneously. Instead of committing to one chain of reasoning, the system generates several candidates at each step and evaluates which are most promising -- much like a chess player considering several possible moves before choosing one. Instead of a single linear chain of thoughts, the LLM generates multiple candidate "thoughts" at each step and evaluates them to decide which paths to pursue.
Interactive · Tree of Thoughts Search with Pruning
Prompting lab
Switch strategies, watch the output
The same problem under four different prompting strategies. The difference is easier to see than to describe — click and compare.
Task
Strategy
System
Think step by step before answering.
User
A shirt costs 24 €. With a 25% discount applied, what do I pay?
Let's think step by step. 1. Discount = 24 × 0.25 = 6 2. Final price = 24 − 6 = 18 Answer: 18 €
Key components:
- Thought generation: The LLM proposes multiple candidate next steps.
- State evaluation: The LLM (or a heuristic) evaluates how promising each partial solution is.
- Search algorithm: BFS or DFS with pruning based on evaluations.
def tree_of_thoughts(
problem: str,
llm_call,
num_thoughts: int = 3,
max_depth: int = 4,
beam_width: int = 2,
) -> str:
"""Simplified Tree of Thoughts implementation.
Uses beam search over multiple reasoning paths.
Args:
problem: The problem to solve.
llm_call: Function to call the LLM.
num_thoughts: Number of candidate thoughts to generate at each step.
max_depth: Maximum depth of the thought tree.
beam_width: Number of paths to keep at each step.
Returns:
The best solution found.
"""
# Initialize with the problem
beams = [{"path": [], "text": problem, "score": 0.0}]
for depth in range(max_depth):
all_candidates = []
for beam in beams:
# Generate candidate next thoughts
thoughts = llm_call(
prompt=(
f"Problem: {problem}\n\n"
f"Progress so far:\n{beam['text']}\n\n"
f"Generate {num_thoughts} different possible next steps. "
f"Number each option."
)
)
# Parse and evaluate each thought
for thought in _parse_thoughts(thoughts):
# Evaluate this partial solution
evaluation = llm_call(
prompt=(
f"Rate how promising this partial solution is (1-10):\n\n"
f"Problem: {problem}\n"
f"Progress: {beam['text']}\n"
f"Next step: {thought}\n\n"
f"Score (just the number):"
)
)
try:
score = float(evaluation.strip().split()[0])
except (ValueError, IndexError):
score = 5.0
all_candidates.append({
"path": beam["path"] + [thought],
"text": f"{beam['text']}\n{thought}",
"score": score,
})
# Keep top beam_width candidates
all_candidates.sort(key=lambda x: x["score"], reverse=True)
beams = all_candidates[:beam_width]
# Return the best path
return beams[0]["text"] if beams else "No solution found."
def _parse_thoughts(response: str) -> list[str]:
"""Parse numbered thoughts from LLM response."""
thoughts = []
for line in response.strip().split("\n"):
line = line.strip()
if line and line[0].isdigit():
thought = line.lstrip("0123456789.)- ").strip()
if thought:
thoughts.append(thought)
return thoughtsLanguage Agent Tree Search (LATS)
LATS (Zhou et al., 2024) combines LLM-based planning with Monte Carlo Tree Search (MCTS). It treats the agent's decision process as a tree search problem:
- Selection: Navigate the tree using UCB1 (Upper Confidence Bound) to balance exploration and exploitation.
- Expansion: Use the LLM to generate new action candidates.
- Simulation: Use the LLM to simulate the outcome of each action.
- Backpropagation: Update the scores of visited nodes based on the simulation results.
- Reflection: When a trajectory fails, generate a reflection that informs future attempts.
LATS is particularly powerful because it:
- Systematically explores the action space (unlike greedy approaches).
- Learns from failures through reflection.
- Balances trying new approaches with exploiting known good paths.
Comparison
| Feature | Chain of Thought | Tree of Thoughts | LATS |
|---|---|---|---|
| Exploration | Single path | Multiple paths | Systematic tree search |
| Evaluation | None (implicit) | LLM evaluation at each step | MCTS + LLM evaluation |
| Backtracking | No | Limited (beam search) | Full (MCTS backpropagation) |
| Learning from failure | No | No | Yes (reflection) |
| Cost | Low (1 LLM call) | Medium (many evaluations) | High (many simulations) |
| Best for | Simple reasoning | Creative/divergent tasks | Complex multi-step tasks |
1110. Practical Example: A Task Decomposition Agent
Let us build a complete task decomposition agent that breaks complex goals into manageable subtasks, creates a task graph, and executes it.
"""
A task decomposition agent that:
1. Takes a complex goal
2. Decomposes it into subtasks with dependencies
3. Builds a task graph
4. Executes tasks in dependency order
5. Replans on failure
Requirements:
pip install openai (or any LLM client)
"""
from dataclasses import dataclass, field
from datetime import datetime
@dataclass
class SubTask:
"""A subtask produced by decomposition."""
id: str
description: str
dependencies: list[str] = field(default_factory=list)
estimated_complexity: str = "medium" # low, medium, high
status: str = "pending"
result: str | None = None
DECOMPOSE_SYSTEM = """You are a task decomposition expert. Given a complex goal,
break it down into concrete, actionable subtasks.
Rules:
1. Each subtask should be completable in a single focused step.
2. Identify dependencies between subtasks explicitly.
3. Rate each subtask's complexity (low/medium/high).
4. Aim for 4-8 subtasks (not too few, not too many).
5. Make subtasks specific enough to execute without further clarification.
Output format (one task per line):
TASK_ID | Description | Dependencies (comma-separated IDs or "none") | Complexity
"""
EXECUTE_SYSTEM = """You are a task execution agent. Given a subtask and context
from previously completed subtasks, execute the task and report the result.
Be specific and thorough in your execution. If you produce any output
(text, code, analysis), include it in your response.
If you cannot complete the task, explain why specifically."""
class TaskDecompositionAgent:
"""An agent that decomposes and executes complex tasks."""
def __init__(self, llm_call):
self.llm_call = llm_call
self.subtasks: dict[str, SubTask] = {}
self.execution_log: list[dict] = []
def decompose(self, goal: str) -> list[SubTask]:
"""Decompose a goal into subtasks.
Args:
goal: The complex goal to decompose.
Returns:
List of SubTask objects.
"""
response = self.llm_call(
system=DECOMPOSE_SYSTEM,
prompt=f"Decompose this goal into subtasks:\n\n{goal}",
)
subtasks = self._parse_decomposition(response)
for task in subtasks:
self.subtasks[task.id] = task
return subtasks
def execute(self, goal: str, max_replans: int = 2) -> dict:
"""Decompose and execute a goal.
Args:
goal: The goal to achieve.
max_replans: Maximum replanning attempts on failure.
Returns:
Execution report.
"""
# Step 1: Decompose
print(f"Goal: {goal}")
print(f"{'='*60}")
subtasks = self.decompose(goal)
print(f"\nDecomposed into {len(subtasks)} subtasks:")
for task in subtasks:
deps = f" (after: {', '.join(task.dependencies)})" if task.dependencies else ""
print(f" [{task.id}] {task.description}{deps} [{task.estimated_complexity}]")
# Step 2: Execute in dependency order
print(f"\n{'='*60}")
print("Executing...\n")
completed = set()
replans = 0
while True:
# Find ready tasks
ready = [
task for task in self.subtasks.values()
if task.status == "pending"
and all(dep in completed for dep in task.dependencies)
]
if not ready:
if all(t.status in ("completed", "failed", "skipped") for t in self.subtasks.values()):
break
else:
print("Deadlock: no tasks are ready but some are still pending.")
break
for task in ready:
print(f"Executing [{task.id}]: {task.description}")
# Build context from completed tasks
context = self._build_context(completed)
try:
result = self._execute_subtask(task, context)
task.status = "completed"
task.result = result
completed.add(task.id)
self.execution_log.append({
"task_id": task.id,
"status": "completed",
"result": result[:200],
"timestamp": datetime.now().isoformat(),
})
print(f" Completed: {result[:150]}...")
except Exception as e:
error = str(e)
print(f" Failed: {error}")
task.status = "failed"
self.execution_log.append({
"task_id": task.id,
"status": "failed",
"error": error,
"timestamp": datetime.now().isoformat(),
})
# Replan if possible
if replans < max_replans:
print(" Attempting to replan...")
new_task = self._replan_task(task, error, context)
if new_task:
self.subtasks[new_task.id] = new_task
replans += 1
print(f" Created replacement task [{new_task.id}]: {new_task.description}")
# Step 3: Generate report
report = self._generate_report(goal)
return report
def _execute_subtask(self, task: SubTask, context: str) -> str:
"""Execute a single subtask."""
prompt = (
f"Execute the following subtask:\n\n"
f"Task: {task.description}\n\n"
f"Context from previous steps:\n{context}\n\n"
f"Execute the task and provide the result:"
)
return self.llm_call(system=EXECUTE_SYSTEM, prompt=prompt)
def _build_context(self, completed_ids: set[str]) -> str:
"""Build context string from completed tasks."""
if not completed_ids:
return "(No previous steps completed yet.)"
lines = []
for task_id in completed_ids:
task = self.subtasks[task_id]
result_preview = task.result[:300] if task.result else "N/A"
lines.append(f"[{task_id}] {task.description}\n Result: {result_preview}")
return "\n".join(lines)
def _replan_task(self, failed_task: SubTask, error: str, context: str) -> SubTask | None:
"""Generate a replacement task for a failed one."""
prompt = (
f"The following task failed:\n"
f"Task: {failed_task.description}\n"
f"Error: {error}\n\n"
f"Context:\n{context}\n\n"
f"Generate an alternative approach to accomplish the same objective. "
f"Respond with a single task description."
)
response = self.llm_call(prompt=prompt)
if response.strip():
new_id = f"{failed_task.id}_retry"
return SubTask(
id=new_id,
description=response.strip(),
dependencies=failed_task.dependencies,
estimated_complexity=failed_task.estimated_complexity,
)
return None
def _parse_decomposition(self, response: str) -> list[SubTask]:
"""Parse the LLM's decomposition into SubTask objects."""
subtasks = []
for line in response.strip().split("\n"):
parts = [p.strip() for p in line.split("|")]
if len(parts) >= 3:
task_id = parts[0].lower().replace(" ", "_")
description = parts[1]
deps_str = parts[2]
deps = (
[] if deps_str.lower() == "none"
else [d.strip().lower().replace(" ", "_") for d in deps_str.split(",")]
)
complexity = parts[3].strip().lower() if len(parts) > 3 else "medium"
subtasks.append(SubTask(
id=task_id,
description=description,
dependencies=deps,
estimated_complexity=complexity,
))
return subtasks
def _generate_report(self, goal: str) -> dict:
"""Generate a final execution report."""
completed = [t for t in self.subtasks.values() if t.status == "completed"]
failed = [t for t in self.subtasks.values() if t.status == "failed"]
skipped = [t for t in self.subtasks.values() if t.status == "skipped"]
return {
"goal": goal,
"total_subtasks": len(self.subtasks),
"completed": len(completed),
"failed": len(failed),
"skipped": len(skipped),
"success_rate": len(completed) / max(len(self.subtasks), 1),
"execution_log": self.execution_log,
"subtask_results": {
task_id: {
"description": task.description,
"status": task.status,
"result_preview": (task.result[:200] if task.result else None),
}
for task_id, task in self.subtasks.items()
},
}
# ── Usage Example ─────────────────────────────────────────────────
def main():
"""Demonstrate the task decomposition agent."""
def mock_llm_call(system: str = "", prompt: str = "") -> str:
"""Simulate LLM calls for demonstration."""
if "Decompose" in prompt:
return """t1 | Research existing sentiment analysis approaches | none | low
t2 | Select and download a suitable dataset | none | low
t3 | Set up Python environment with required libraries | none | low
t4 | Implement data preprocessing pipeline | t2, t3 | medium
t5 | Train a baseline sentiment classifier | t4 | medium
t6 | Evaluate model and analyze errors | t5 | medium
t7 | Write up results and conclusions | t6, t1 | low"""
elif "Execute" in prompt or "Execute" in system:
if "research" in prompt.lower():
return (
"Surveyed recent sentiment analysis approaches. Key findings: "
"BERT-based models achieve ~95% accuracy on SST-2. DistilBERT "
"offers a good speed/accuracy trade-off. Recent work explores "
"aspect-based sentiment analysis for finer-grained results."
)
elif "dataset" in prompt.lower():
return (
"Selected the Stanford Sentiment Treebank (SST-2) dataset. "
"Downloaded 67,349 training samples and 872 test samples. "
"Binary sentiment labels (positive/negative)."
)
elif "environment" in prompt.lower():
return (
"Created virtual environment. Installed: torch 2.1, "
"transformers 4.36, datasets 2.16, scikit-learn 1.3."
)
elif "preprocessing" in prompt.lower():
return (
"Implemented preprocessing: tokenization with BertTokenizer, "
"max_length=128, padding and truncation. Created DataLoader "
"with batch_size=32."
)
elif "train" in prompt.lower():
return (
"Trained DistilBERT for 3 epochs. Training loss decreased from "
"0.42 to 0.12. Validation accuracy: 91.2%."
)
elif "evaluate" in prompt.lower():
return (
"Test accuracy: 90.8%. Precision: 0.91, Recall: 0.90, F1: 0.905. "
"Error analysis: model struggles with sarcasm and negation."
)
elif "write" in prompt.lower():
return (
"Report completed. Key findings: DistilBERT achieves 90.8% accuracy "
"on SST-2 with 3 epochs of fine-tuning. Main failure modes are "
"sarcasm detection and complex negation patterns."
)
return "Task completed successfully."
# Create and run the agent
agent = TaskDecompositionAgent(mock_llm_call)
report = agent.execute(
"Build a sentiment analysis model: research approaches, train a model, "
"evaluate it, and write up the results."
)
print(f"\n{'='*60}")
print("EXECUTION REPORT")
print(f"{'='*60}")
print(f"Goal: {report['goal']}")
print(f"Subtasks: {report['total_subtasks']}")
print(f"Completed: {report['completed']}")
print(f"Failed: {report['failed']}")
print(f"Success rate: {report['success_rate']:.0%}")
if __name__ == "__main__":
main()12Discussion Questions
-
Planning horizon: How far ahead should an agent plan? Is it better to plan the entire task upfront or plan incrementally (one or two steps at a time)? What factors influence this decision? Hint: consider the trade-off between coherence (long-horizon planning keeps the big picture in view) and adaptability (short-horizon planning can react to unexpected results). Think about how certainty degrades the further you look ahead.
-
Plan faithfulness: When an LLM generates a plan and then executes it, should it follow the plan exactly or adapt freely? What is the right balance between plan adherence and opportunistic adaptation? Hint: this is analogous to the explore-exploit trade-off. Too much adherence leads to rigidity; too much adaptation leads to losing the thread of the original plan.
-
Planning for humans vs. planning for agents: How should plans generated for agent execution differ from plans generated for human execution? Consider granularity, error handling, and parallelism. Hint: agents need more explicit error handling ("if step 3 fails, try alternative approach X") because they cannot use common sense to recover from unexpected situations.
-
The planning bottleneck: Planning is often the most error-prone part of an agent pipeline. How might we improve plan quality beyond what current LLMs can produce? Hint: consider plan verification (have another LLM check the plan), plan libraries (reuse plans that worked before), and hybrid approaches (use classical planners for the parts that can be formalized).
-
Classical meets modern: Could classical planning algorithms (STRIPS, HTN) be effectively combined with LLM-based approaches? What would that look like? Hint: the LLM could generate a PDDL domain description from natural language, or could fill in the details of an HTN plan skeleton.
-
Planning under resource constraints: How should an agent plan differently when it has a limited budget (e.g., maximum 10 API calls, maximum $1 in costs)? Hint: consider how humans plan when they have limited time: they focus on the most important steps, skip nice-to-haves, and build in less margin for error. How could an agent do the same?
13Summary and Key Takeaways
-
Planning is fundamental to agency. Without the ability to decompose goals into steps, reason about dependencies, and adapt to failures, agents are limited to reactive, single-step interactions.
-
Classical planning provides formal guarantees (correctness, completeness, optimality) but requires formal domain specifications that are impractical for many real-world tasks.
-
LLM-based planning is flexible and general but offers no formal guarantees. Plans may contain errors, miss steps, or hallucinate impossible actions.
-
Hierarchical decomposition manages complexity by breaking large goals into progressively smaller subtasks, each at an appropriate level of abstraction.
-
Plan validation and refinement are essential. Self-validation, cross-validation with critic agents, and simulation-based testing all improve plan quality before execution.
-
Dynamic replanning enables robustness. Agents that can detect failures and adapt their plans are more capable than agents that follow rigid plans.
-
Task graphs with dependency management enable parallel execution and provide clear structure for complex multi-step tasks.
-
Advanced approaches like ToT and LATS systematically explore the planning space, trading computational cost for better solutions. LATS's use of reflection is particularly promising.
14References
-
Fikes, R. E., & Nilsson, N. J. (1971). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2(3-4), 189-208.
-
McDermott, D., Ghallab, M., Howe, A., Knoblock, C., Ram, A., Veloso, M., Weld, D., & Wilkins, D. (1998). PDDL -- The Planning Domain Definition Language. Technical Report CVC TR-98-003, Yale Center for Computational Vision and Control.
-
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Advances in Neural Information Processing Systems (NeurIPS), 36.
-
Zhou, A., Yan, K., Shlapentokh-Rothman, M., Wang, H., & Wang, Y.-X. (2024). Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models. International Conference on Machine Learning (ICML).
-
Zhou, D., Scharli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2023). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. International Conference on Learning Representations (ICLR).
-
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems (NeurIPS), 35.
-
Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., & Anandkumar, A. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv preprint arXiv:2305.16291.
-
Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022). Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. International Conference on Machine Learning (ICML).
-
Valmeekam, K., Marquez, M., Sreedharan, S., & Kambhampati, S. (2024). On the Planning Abilities of Large Language Models -- A Critical Investigation. Advances in Neural Information Processing Systems (NeurIPS), 36.
Part of "Agentic AI: Foundations, Architectures, and Applications" (CC BY-SA 4.0).