Systems · W08 · 23 min read

Multi-Agent Systems

From single agent to teams: division of labour, role specialisation, peer review. Communication protocols (message passing, shared memory, blackboard). Orchestration patterns. Modern standards: Agent-to-Agent (A2A) and Agent Communication Protocol (ACP).

Core concepts: Role specialisation · Orchestration · A2A / ACP

Learning Objectives

By the end of this lecture, students will be able to:

Explain the motivation for multi-agent systems and identify scenarios where multiple agents outperform a single agent.
Compare classical multi-agent systems (from distributed AI) with modern LLM-based multi-agent systems.
Describe and implement key communication protocols: message passing, shared state, and blackboard architectures.
Design multi-agent systems using collaboration patterns: sequential, parallel, hierarchical, and debate-based.
Analyze the AutoGen framework and its approach to multi-agent conversation.
Implement orchestration patterns including supervisor, round-robin, and dynamic routing.
Identify and mitigate challenges such as coordination overhead, error propagation, and achieving consensus.
Compare modern agent communication protocols (A2A, ACP, MCP) and explain how they enable interoperability.
Evaluate modern multi-agent frameworks (OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, AutoGen) and select appropriate ones for different use cases.
Build a simple two-agent debate system in Python.

1. From Single Agents to Multi-Agent Architectures

Why Multiple Agents?

Consider how human organizations work. No company assigns a single person to design, build, test, market, and sell a product. Instead, they form teams of specialists who collaborate, review each other's work, and bring different perspectives to the problem. Multi-agent AI systems apply the same principle: instead of asking one agent to do everything, we create teams of specialized agents.

A single LLM-based agent can be remarkably capable, but it has inherent limitations:

Context window saturation: Complex tasks require extensive context (tool descriptions, documents, conversation history). A single agent's context window can become overloaded.
Role confusion: When one agent is asked to be a planner, coder, reviewer, and tester simultaneously, it may struggle to maintain consistent behavior across roles.
Lack of verification: A single agent has no independent check on its work. It may confidently produce incorrect results that nothing downstream catches.
Scalability: Some tasks are naturally parallelizable. A single agent processes steps sequentially.
Specialization: Different tasks may benefit from different models, prompts, or tool sets.

Multi-agent systems address these limitations by distributing work across specialized agents that communicate and collaborate. The key insight is that division of labor and peer review can improve both the quality and efficiency of agent outputs.

Key Insight: The value of multi-agent systems comes not just from parallelism or specialization, but from verification. When one agent reviews another's work, it can catch errors that a single agent working alone might miss. This is the same principle behind code review, peer review in science, and checks and balances in government.

When to Use Multi-Agent Systems

Multi-agent systems are not always the right choice. They add complexity, latency, and cost. The table below contrasts the conditions that favour each approach:

Use Multi-Agent When	Stick to Single Agent When
Task requires multiple distinct skills	Task is focused and well-defined
Quality benefits from review/debate	Speed is more important than thoroughness
Task is naturally parallelizable	Sequential processing is fine
Different steps need different tools/models	One model handles everything well
Error detection is critical	Errors are easily detected by the user

A Brief History

The concept of multi-agent systems predates LLMs by decades -- it is one of the oldest ideas in AI. The field of Distributed Artificial Intelligence (DAI) emerged in the 1980s, studying how multiple autonomous agents could coordinate to solve problems. What is new is not the idea of multi-agent collaboration, but the use of LLMs as the "brains" of each agent. Key milestones:

1980s: DAI research on cooperative problem solving (Bond & Gasser, 1988).
1990s: Foundation for Intelligent Physical Agents (FIPA) standards for agent communication.
2000s: Multi-agent systems in robotics, simulation, and game theory.
2023-present: LLM-powered multi-agent systems (AutoGen, CrewAI, LangGraph, MetaGPT).

2. Classical Multi-Agent Systems vs. LLM-Based Systems

Classical Multi-Agent Systems

In classical multi-agent systems, agents are typically:

Rule-based or algorithmic: They follow predefined protocols and strategies.
Formally specified: Communication protocols, negotiation mechanisms, and coordination strategies are mathematically defined.
Domain-specific: Designed for specific applications (traffic control, resource allocation, robotic coordination).
Predictable: Given the same inputs and state, they produce the same outputs.

Classical frameworks include the Belief-Desire-Intention (BDI) model:

text

Agent = (Beliefs, Desires, Intentions)

Beliefs:    What the agent knows about the world
Desires:    What the agent wants to achieve (goals)
Intentions: What the agent has committed to doing (plans)

The BDI model provides a principled way to reason about agent behavior, but it requires explicit formalization of all knowledge and goals.
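
To make the three components concrete, here is a minimal sketch of one perceive, deliberate, act cycle. The `BDIAgent` class, the thermostat scenario, and the temperature thresholds are illustrative assumptions and do not come from any particular BDI framework.

```python
from dataclasses import dataclass, field


@dataclass
class BDIAgent:
    """Toy BDI agent (hypothetical names, for illustration only)."""

    beliefs: dict[str, float] = field(default_factory=dict)
    desires: list[str] = field(default_factory=list)
    intentions: list[str] = field(default_factory=list)

    def perceive(self, percepts: dict[str, float]) -> None:
        # Belief revision: newer percepts overwrite older beliefs.
        self.beliefs.update(percepts)

    def deliberate(self) -> None:
        # Commit to an action only for desires that current beliefs
        # show are not yet satisfied.
        self.intentions.clear()
        temp = self.beliefs.get("temperature")
        if "comfortable_room" in self.desires and temp is not None:
            if temp > 26:
                self.intentions.append("turn_on_cooling")
            elif temp < 18:
                self.intentions.append("turn_on_heating")

    def act(self) -> str:
        # Execute the first committed intention, if there is one.
        return self.intentions.pop(0) if self.intentions else "do_nothing"


agent = BDIAgent(desires=["comfortable_room"])
agent.perceive({"temperature": 31.0})
agent.deliberate()
action = agent.act()
```

Even in this toy version, every belief key, desire name, and plan has to be written out by hand, which is the formalization cost mentioned above.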

LLM-Based Multi-Agent Systems

LLM-based multi-agent systems differ fundamentally from classical multi-agent systems (MAS). The most striking difference is that agents communicate in natural language rather than formal protocols. This makes them much easier to build (you define agents through prompts, not code) but much harder to verify (natural language is ambiguous and LLM outputs are non-deterministic).

Key characteristics:

Natural language communication: Agents communicate in natural language rather than formal protocols.
Flexible reasoning: LLMs can handle ambiguity, unexpected situations, and novel tasks without explicit programming.
Emergent behavior: Complex behaviors can emerge from simple interaction rules between LLM agents.
Non-deterministic: The same inputs may produce different outputs due to LLM sampling.
Prompt-defined roles: Agent behavior is shaped by system prompts rather than code.

python

# Classical agent: behavior defined in code
class ClassicalAgent:
    def decide(self, percepts: dict) -> str:
        if percepts["temperature"] > 30:
            return "turn_on_cooling"
        elif percepts["temperature"] < 18:
            return "turn_on_heating"
        return "do_nothing"

# LLM agent: behavior defined by prompt
LLM_AGENT_PROMPT = """You are a building climate control agent.
Given sensor readings, decide which action to take.
Consider energy efficiency, occupant comfort, and weather forecasts.
Explain your reasoning before stating your action."""

Comparison Table

Dimension	Classical MAS	LLM-Based MAS
Communication	Formal protocols (FIPA ACL)	Natural language
Reasoning	Logic, planning algorithms	LLM inference
Flexibility	Low (predefined rules)	High (general purpose)
Predictability	High (deterministic)	Low (stochastic)
Scalability	Well-studied	Emerging research
Cost per interaction	Low (computation)	High (LLM API calls)
Verification	Formal methods applicable	Difficult to verify

3. Communication Protocols

How agents communicate is one of the most important architectural decisions in a multi-agent system. The communication protocol determines what information flows between agents, how conflicts are resolved, and how the system scales. Think of it as choosing between email (message passing), a shared whiteboard (shared state), or a structured meeting (blackboard).

Message Passing

The most common communication pattern. Agents exchange messages directly, typically in natural language for LLM-based systems. This is the digital equivalent of agents sending emails to each other.

python

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Message:
    """A message between agents."""

    sender: str
    receiver: str
    content: str
    message_type: str = "inform"  # inform, request, propose, accept, reject
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
    metadata: dict = field(default_factory=dict)


class MessageBus:
    """A simple message bus for agent communication."""

    def __init__(self):
        self.messages: list[Message] = []
        self.subscribers: dict[str, list] = {}  # agent_name -> callback list

    def send(self, message: Message) -> None:
        """Send a message to a specific agent."""
        self.messages.append(message)
        if message.receiver in self.subscribers:
            for callback in self.subscribers[message.receiver]:
                callback(message)

    def subscribe(self, agent_name: str, callback) -> None:
        """Subscribe an agent to receive messages."""
        if agent_name not in self.subscribers:
            self.subscribers[agent_name] = []
        self.subscribers[agent_name].append(callback)

    def get_history(
        self, agent_name: str | None = None, limit: int = 50
    ) -> list[Message]:
        """Get message history, optionally filtered by agent."""
        if agent_name:
            msgs = [
                m for m in self.messages
                if m.sender == agent_name or m.receiver == agent_name
            ]
        else:
            msgs = self.messages
        return msgs[-limit:]

Advantages: Simple, decoupled, easy to log and debug. Disadvantages: Can become chaotic with many agents; no shared view of the conversation.
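
The bus above pushes each message to subscriber callbacks as soon as it is sent. A common alternative is pull-based delivery, where each agent owns an inbox and reads from it when it is ready. The following is a minimal sketch of that variant; the `Envelope` and `Mailboxes` names and the planner/coder exchange are assumptions made for illustration.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Envelope:
    """A message addressed to one agent (hypothetical type)."""

    sender: str
    receiver: str
    content: str
    message_type: str = "inform"


class Mailboxes:
    """One FIFO inbox per agent; agents pull messages when ready."""

    def __init__(self, agent_names: list[str]):
        self._inboxes = {name: queue.Queue() for name in agent_names}

    def send(self, msg: Envelope) -> None:
        # Fail loudly on unknown receivers instead of dropping the message.
        if msg.receiver not in self._inboxes:
            raise KeyError(f"unknown receiver: {msg.receiver}")
        self._inboxes[msg.receiver].put(msg)

    def receive(self, agent_name: str) -> Optional[Envelope]:
        # Non-blocking read: returns None when the inbox is empty.
        try:
            return self._inboxes[agent_name].get_nowait()
        except queue.Empty:
            return None


boxes = Mailboxes(["planner", "coder"])
boxes.send(Envelope("planner", "coder", "Implement parse_config()", "request"))

# The coder picks up the request and answers it.
request = boxes.receive("coder")
boxes.send(Envelope("coder", "planner", f"Done: {request.content}"))
reply = boxes.receive("planner")
```

Pull-based inboxes keep a slow agent from blocking the sender, at the cost of each agent needing its own polling or scheduling loop.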

Shared State

Agents read from and write to a shared state object. This provides a common "view of the world" that all agents can access.

python

import threading
from datetime import datetime
from typing import Any


class SharedState:
    """Thread-safe shared state for multi-agent systems.

    All agents can read and write to the shared state, providing
    a common ground truth for coordination.
    """

    def __init__(self):
        self._state: dict[str, Any] = {}
        self._lock = threading.Lock()
        self._history: list[dict] = []

    def get(self, key: str, default: Any = None) -> Any:
        """Read a value from shared state."""
        with self._lock:
            return self._state.get(key, default)

    def set(self, key: str, value: Any, agent_name: str = "unknown") -> None:
        """Write a value to shared state."""
        with self._lock:
            old_value = self._state.get(key)
            self._state[key] = value
            self._history.append({
                "agent": agent_name,
                "key": key,
                "old_value": old_value,
                "new_value": value,
                "timestamp": datetime.now().isoformat(),
            })

    def get_all(self) -> dict[str, Any]:
        """Get a snapshot of the entire shared state."""
        with self._lock:
            return dict(self._state)

    def get_changes_by(self, agent_name: str) -> list[dict]:
        """Get all state changes made by a specific agent."""
        # Hold the lock so a concurrent set() cannot mutate the history mid-read.
        with self._lock:
            return [h for h in self._history if h["agent"] == agent_name]

Advantages: All agents have a consistent view; no message routing needed. Disadvantages: Concurrency issues; tight coupling; harder to scale.
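
One concurrency issue deserves a concrete illustration. Even when individual reads and writes are locked, a separate `get` followed by a `set` can lose updates if another agent writes in between. A minimal sketch of one possible remedy is an atomic read-modify-write operation; the `AtomicState` class and its `update` method are hypothetical names introduced here.

```python
import threading
from typing import Any, Callable


class AtomicState:
    """Shared state with an atomic read-modify-write (illustrative sketch)."""

    def __init__(self):
        self._state: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._state.get(key, default)

    def update(self, key: str, fn: Callable[[Any], Any], default: Any = None) -> Any:
        """Apply fn to the current value and store the result under one lock."""
        with self._lock:
            new_value = fn(self._state.get(key, default))
            self._state[key] = new_value
            return new_value


state = AtomicState()


def worker() -> None:
    # Each worker increments the shared counter 1000 times.
    for _ in range(1000):
        state.update("tasks_done", lambda n: n + 1, default=0)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = state.get("tasks_done")
```

Because the read and the write happen under the same lock acquisition, no increment is lost and `total` ends at 4000.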

Blackboard Architecture

A blackboard is a hybrid approach where agents post their contributions to a shared "blackboard" that is visible to all agents. A control component decides which agent should act next based on the current state of the blackboard.

python

from datetime import datetime


class Blackboard:
    """Blackboard architecture for multi-agent collaboration.

    The blackboard is a shared workspace where agents post their
    contributions. A controller decides which agent should act next.
    """

    def __init__(self):
        self.entries: list[dict] = []
        self.agents: dict[str, dict] = {}  # name -> agent config

    def register_agent(self, name: str, expertise: list[str], agent_fn) -> None:
        """Register an agent with its areas of expertise."""
        self.agents[name] = {
            "expertise": expertise,
            "agent_fn": agent_fn,
        }

    def post(self, agent_name: str, content: str, entry_type: str = "contribution") -> None:
        """Post a contribution to the blackboard."""
        self.entries.append({
            "agent": agent_name,
            "content": content,
            "type": entry_type,
            "timestamp": datetime.now().isoformat(),
        })

    def get_entries(self, entry_type: str | None = None) -> list[dict]:
        """Get blackboard entries, optionally filtered by type."""
        if entry_type:
            return [e for e in self.entries if e["type"] == entry_type]
        return list(self.entries)

    def get_current_state(self) -> str:
        """Format the blackboard contents for an agent to read."""
        if not self.entries:
            return "The blackboard is empty."
        lines = ["Current blackboard state:"]
        for entry in self.entries:
            lines.append(f"  [{entry['agent']}] ({entry['type']}): {entry['content']}")
        return "\n".join(lines)

    def select_next_agent(self, task_description: str) -> str | None:
        """Select the most appropriate agent for the current task.

        Simple implementation: match task keywords to agent expertise.
        More sophisticated implementations would use an LLM.
        """
        task_lower = task_description.lower()
        best_match = None
        best_score = 0
        for name, config in self.agents.items():
            score = sum(1 for exp in config["expertise"] if exp.lower() in task_lower)
            if score > best_score:
                best_score = score
                best_match = name
        return best_match

The blackboard architecture was originally proposed for speech understanding (Erman et al., 1980) and has been adapted for many collaborative problem-solving scenarios.
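
The control component is the part that distinguishes a blackboard from plain shared state. A minimal sketch of such a control loop follows, assuming each agent (a "knowledge source") declares a precondition over the board. The `run_blackboard` function, the `KnowledgeSource` alias, and the three stub agents are illustrative assumptions.

```python
from typing import Callable

# Each knowledge source is (name, can_contribute, contribute).
KnowledgeSource = tuple[str, Callable[[dict], bool], Callable[[dict], None]]


def run_blackboard(
    board: dict, sources: list[KnowledgeSource], max_cycles: int = 10
) -> list[str]:
    """Repeatedly fire the first source whose precondition holds."""
    trace = []
    for _ in range(max_cycles):
        for name, can_contribute, contribute in sources:
            if can_contribute(board):
                contribute(board)
                trace.append(name)
                break
        else:
            # No source can contribute any more: the work is finished.
            break
    return trace


sources: list[KnowledgeSource] = [
    (
        "researcher",
        lambda b: "notes" not in b,
        lambda b: b.update(notes="3 sources found"),
    ),
    (
        "writer",
        lambda b: "notes" in b and "draft" not in b,
        lambda b: b.update(draft=f"Draft based on: {b['notes']}"),
    ),
    (
        "editor",
        lambda b: "draft" in b and "final" not in b,
        lambda b: b.update(final=b["draft"].upper()),
    ),
]

board = {"task": "Write a summary"}
trace = run_blackboard(board, sources)
```

No agent calls another directly: the order researcher, writer, editor emerges from the state of the board rather than from a hard-coded pipeline.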

4. Collaboration Patterns

The collaboration pattern defines how agents work together. Choosing the right pattern is crucial -- using a debate pattern for a simple data pipeline is overkill, while using a sequential pipeline for a task that requires critical evaluation is insufficient.

Key Insight: The choice of collaboration pattern should match the structure of the problem. Ask yourself: "Is this task naturally sequential? Can parts be done in parallel? Does the result need peer review?" The answer guides the pattern choice.

Sequential (Pipeline)

Agents process work one after another, each building on the previous agent's output. This is the assembly line of multi-agent systems.

text

Input -> [Agent A] -> [Agent B] -> [Agent C] -> Output

Example: A content creation pipeline where:

Researcher Agent gathers information on a topic.
Writer Agent drafts the content based on the research.
Editor Agent reviews and polishes the draft.
Fact-Checker Agent verifies all claims.

python

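from typing import Callable


class SequentialPipeline:
    """Execute agents in sequence, passing output from one to the next."""

    def __init__(self):
        # Each stage is a (name, agent function) pair; agents map str -> str.
        self.stages: list[tuple[str, Callable[[str], str]]] = []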

    def add_stage(self, name: str, agent_fn) -> None:
        """Add a stage to the pipeline."""
        self.stages.append((name, agent_fn))

    def run(self, initial_input: str) -> dict:
        """Run the pipeline from start to finish.

        Returns:
            Dict with the final output and intermediate results.
        """
        current_input = initial_input
        results = {"stages": [], "final_output": ""}

        for name, agent_fn in self.stages:
            print(f"Running stage: {name}")
            output = agent_fn(current_input)
            results["stages"].append({
                "name": name,
                "input_preview": current_input[:200],
                "output_preview": output[:200],
            })
            current_input = output

        results["final_output"] = current_input
        return results

Advantages: Simple to understand and implement; clear data flow. Disadvantages: No parallelism; errors propagate forward; no feedback loops.
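
One way to limit forward error propagation is to put a cheap validation gate after each stage and stop the pipeline as soon as a gate fails. The sketch below assumes stub agents written as plain functions; `run_gated_pipeline`, `StageError`, and the validators are hypothetical names chosen for this example.

```python
from typing import Callable

# Each stage is (name, agent function, output validator).
Stage = tuple[str, Callable[[str], str], Callable[[str], bool]]


class StageError(RuntimeError):
    """Raised when a stage's output fails its validation gate."""


def run_gated_pipeline(text: str, stages: list[Stage]) -> str:
    for name, agent_fn, is_valid in stages:
        text = agent_fn(text)
        # Stop early so a bad intermediate result never reaches later stages.
        if not is_valid(text):
            raise StageError(f"stage '{name}' produced invalid output")
    return text


stages: list[Stage] = [
    (
        "researcher",
        lambda topic: f"Facts about {topic}: A, B, C",
        lambda out: out.startswith("Facts"),
    ),
    (
        "writer",
        lambda facts: f"Article. {facts}",
        lambda out: len(out) > 20,
    ),
]

final = run_gated_pipeline("solar panels", stages)
```

In a real system the validators might be schema checks, length limits, or a lightweight reviewer model; the point is only that each handoff is checked before the next agent builds on it.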

Parallel (Fan-Out/Fan-In)

Multiple agents work on different aspects of the task simultaneously, and their results are combined.

text

              -> [Agent A] ->
Input -> Fork -> [Agent B] -> Merge -> Output
              -> [Agent C] ->

python

import asyncio
from typing import Callable


async def parallel_execution(
    task: str,
    agents: dict[str, Callable],
    merger: Callable,
) -> str:
    """Execute multiple agents in parallel and merge their outputs.

    Args:
        task: The shared task description.
        agents: Dict mapping agent names to their async callable functions.
        merger: Function that combines all agent outputs into a final result.

    Returns:
        The merged output from all agents.
    """
    # Run all agents concurrently
    async def run_agent(name: str, fn: Callable) -> tuple[str, str]:
        result = await fn(task)
        return name, result

    tasks = [run_agent(name, fn) for name, fn in agents.items()]
    results = await asyncio.gather(*tasks)

    # Merge results
    agent_outputs = {name: output for name, output in results}
    final = merger(task, agent_outputs)
    return final

Advantages: Faster execution; diverse perspectives. Disadvantages: Merging is non-trivial; agents may produce contradictory results.
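
In practice, one of the parallel agents may fail or time out while the others succeed. The following sketch shows one way to fan out with `asyncio.gather(..., return_exceptions=True)` so that a single failure does not discard the remaining results. The three reviewer coroutines and the `fan_out` helper are stand-ins invented for this example.

```python
import asyncio
from typing import Awaitable, Callable


async def security_reviewer(task: str) -> str:
    await asyncio.sleep(0.01)  # simulate model latency
    return "no injection risks found"


async def style_reviewer(task: str) -> str:
    await asyncio.sleep(0.01)
    return "two naming issues"


async def flaky_reviewer(task: str) -> str:
    raise TimeoutError("model timed out")


async def fan_out(
    task: str, agents: dict[str, Callable[[str], Awaitable[str]]]
) -> dict[str, str]:
    names = list(agents)
    # return_exceptions=True returns exceptions as results instead of
    # raising the first one and losing the other agents' outputs.
    results = await asyncio.gather(
        *(agents[name](task) for name in names), return_exceptions=True
    )
    report = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = f"FAILED: {result}"
        else:
            report[name] = result
    return report


report = asyncio.run(
    fan_out(
        "Review pull request",
        {
            "security": security_reviewer,
            "style": style_reviewer,
            "flaky": flaky_reviewer,
        },
    )
)
```

The merge step can then decide whether a partial result is acceptable or whether the failed agent should be retried.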

Hierarchical

A supervisor agent delegates subtasks to worker agents and coordinates their efforts.

python

from typing import Callable


class HierarchicalSystem:
    """A hierarchical multi-agent system with a supervisor and workers."""

    def __init__(
        self,
        supervisor_fn: Callable[[str], str],
        workers: dict[str, Callable[[str], str]],
    ):
        self.supervisor_fn = supervisor_fn
        self.workers = workers

    def run(self, task: str, max_rounds: int = 10) -> str:
        """Run the hierarchical system.

        The supervisor decomposes the task, delegates to workers, and
        synthesizes results. Iteration continues until the supervisor
        declares the task complete.
        """
        conversation_history = [f"Task: {task}"]

        for round_num in range(max_rounds):
            # Supervisor decides what to do next
            supervisor_context = "\n".join(conversation_history)
            delegation = self.supervisor_fn(supervisor_context)

            if "TASK_COMPLETE" in delegation:
                # Keep only the text after the marker as the final answer
                answer = delegation.split("TASK_COMPLETE", 1)[1]
                return answer.lstrip(":").strip()

            # Parse delegation: which worker and what subtask
            worker_name, subtask = self._parse_delegation(delegation)

            if worker_name not in self.workers:
                conversation_history.append(
                    f"Error: No worker named '{worker_name}'. "
                    f"Available: {list(self.workers.keys())}"
                )
                continue

            # Execute the subtask
            worker_result = self.workers[worker_name](subtask)
            conversation_history.append(
                f"Supervisor delegated to {worker_name}: {subtask}"
            )
            conversation_history.append(
                f"{worker_name} result: {worker_result}"
            )

        return "Max rounds reached without completing the task."

    def _parse_delegation(self, delegation: str) -> tuple[str, str]:
        """Parse the supervisor's delegation into worker name and subtask."""
        # Simple format: "DELEGATE worker_name: subtask description"
        if "DELEGATE" in delegation:
            parts = delegation.split("DELEGATE", 1)[1].strip()
            if ":" in parts:
                worker, subtask = parts.split(":", 1)
                return worker.strip(), subtask.strip()
        return "", delegation

Advantages: Clear authority structure; supervisor can manage complexity; good for heterogeneous tasks. Disadvantages: Supervisor is a bottleneck; single point of failure.
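
To see the delegation protocol end to end without calling a model, the loop can be driven by a scripted supervisor. This is a minimal sketch under stated assumptions: `scripted_supervisor` stands in for an LLM call, the workers are fixed-output stubs, and the regular expression is one possible stricter parser for the `DELEGATE worker_name: subtask` format.

```python
import re

# Matches a line of the form "DELEGATE worker_name: subtask".
DELEGATION = re.compile(r"^DELEGATE\s+(\w+)\s*:\s*(.+)$", re.MULTILINE)


def scripted_supervisor(history: str) -> str:
    """Stand-in for an LLM supervisor: decides from the history alone."""
    if "researcher result:" not in history:
        return "DELEGATE researcher: collect key facts"
    if "writer result:" not in history:
        return "DELEGATE writer: summarize the research"
    return "TASK_COMPLETE: " + history.rsplit("writer result: ", 1)[1]


workers = {
    "researcher": lambda subtask: "fact A; fact B",
    "writer": lambda subtask: "Summary: fact A and fact B.",
}

history = ["Task: summarize the topic"]
answer = None
for _ in range(5):
    decision = scripted_supervisor("\n".join(history))
    if decision.startswith("TASK_COMPLETE:"):
        answer = decision.removeprefix("TASK_COMPLETE:").strip()
        break
    match = DELEGATION.search(decision)
    if match is None or match.group(1) not in workers:
        # Feed the problem back so the supervisor can correct itself.
        history.append("Error: could not parse a valid delegation")
        continue
    name, subtask = match.groups()
    history.append(f"{name} result: {workers[name](subtask)}")
```

Scripted stand-ins like this are also useful as test fixtures, since they make the control flow deterministic.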

Debate and Adversarial Collaboration

Agents take opposing positions and debate to arrive at a well-reasoned conclusion. This is perhaps the most fascinating collaboration pattern because it harnesses disagreement as a feature, not a bug.

The intuition is simple: if you only hear one perspective, you cannot evaluate its quality. But if you hear two opposing perspectives, the strengths and weaknesses of each become apparent. This is why courts have prosecutors and defense attorneys, why academic papers go through peer review, and why organizations have "red teams" that try to find flaws in plans.

Research by Du et al. (2023) demonstrated that multi-agent debate can improve the factual accuracy and reasoning quality of LLM outputs. When agents are asked to debate their answers, they can catch each other's errors and tend to converge on more accurate responses.

Key Insight: Debate works because it forces agents to justify their positions. A single agent can confidently state an incorrect answer. But when challenged by a critic, it must provide evidence -- and if the evidence does not support the claim, the error is exposed. The adversarial dynamic transforms overconfidence into careful reasoning.

This pattern is explored in detail in Section 8 (the practical example).

5. Role Specialization

Assigning Personas and Capabilities

In LLM-based multi-agent systems, agent roles are defined primarily through system prompts. Each agent receives a persona that shapes its behavior:

python

AGENT_PROMPTS = {
    "researcher": """You are a Research Agent. Your role is to gather, analyze,
and synthesize information. You are thorough, evidence-based, and skeptical of
unverified claims. When you find information, always note the source.

Your capabilities:
- Search the web for information
- Read and summarize documents
- Identify gaps in available information
- Assess the reliability of sources

You communicate clearly and distinguish between facts, inferences, and opinions.""",

    "developer": """You are a Software Developer Agent. Your role is to write,
review, and debug code. You follow best practices, write clean and well-documented
code, and consider edge cases.

Your capabilities:
- Write code in Python, JavaScript, and SQL
- Run and test code
- Review code for bugs and improvements
- Design software architecture

You explain your technical decisions and trade-offs clearly.""",

    "critic": """You are a Critical Review Agent. Your role is to identify flaws,
risks, and areas for improvement in proposals, plans, and outputs. You are
constructive but unflinching in your assessment.

Your capabilities:
- Identify logical fallacies and weak arguments
- Spot missing considerations and edge cases
- Evaluate feasibility and risks
- Suggest specific improvements

You always provide actionable feedback, not just criticism.""",
}

Capability-Based Role Assignment

Beyond prompts, agents can have different capabilities through different tool sets:

python

class SpecializedAgent:
    """An agent with a specific role and set of tools."""

    def __init__(
        self,
        name: str,
        system_prompt: str,
        tools: list[dict],
        model: str = "gpt-4",
    ):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools
        self.model = model
        self.conversation_history: list[dict] = []

    def process(self, message: str) -> str:
        """Process a message and return a response.

        The agent uses its specialized prompt and tools.
        """
        self.conversation_history.append({"role": "user", "content": message})

        # llm_call is a placeholder, not a real library function. In a real
        # implementation, this calls the LLM API with the system prompt,
        # tools, and conversation history.
        response = llm_call(
            model=self.model,
            system=self.system_prompt,
            messages=self.conversation_history,
            tools=self.tools,
        )

        self.conversation_history.append({"role": "assistant", "content": response})
        return response
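
Tool sets also act as a permission boundary: an agent should only be able to invoke the tools its role was granted. Here is a minimal sketch of that check, assuming a simple name-based registry. The tool names, the `ROLE_TOOLS` mapping, and `call_tool` are hypothetical and the tools themselves are stubs.

```python
from typing import Callable

# Stub tools: each takes one string argument and returns a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"results for '{query}'",
    "run_code": lambda source: f"executed {len(source)} characters of code",
    "read_file": lambda path: f"contents of {path}",
}

# Which tools each role may use.
ROLE_TOOLS: dict[str, set[str]] = {
    "researcher": {"web_search", "read_file"},
    "developer": {"run_code", "read_file"},
    "critic": set(),  # the critic only reads what other agents produce
}


def call_tool(role: str, tool_name: str, argument: str) -> str:
    """Run a tool on behalf of a role, enforcing the role's tool set."""
    if tool_name not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"{role} may not use {tool_name}")
    return TOOLS[tool_name](argument)


search_result = call_tool("researcher", "web_search", "blackboard systems")
```

Enforcing the boundary in code, rather than only describing it in the prompt, means a confused or manipulated agent still cannot reach tools outside its role.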

The MetaGPT Approach

MetaGPT (Hong et al., 2024) takes role specialization further by encoding human software development workflows into a multi-agent framework. Each agent represents a specific role in a software company:

Product Manager: Analyzes requirements and writes product requirement documents.
Architect: Designs the system architecture and API specifications.
Project Manager: Creates task assignments and timelines.
Engineers: Write code based on the design documents.
QA Engineer: Reviews code and writes tests.

The key innovation is that agents communicate through structured documents (like a real company) rather than free-form conversation. This reduces ambiguity and error propagation.
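
The idea of structured handoffs can be sketched with typed documents passed between role functions. This is an illustration of the general principle only, not MetaGPT's actual API; the `RequirementsDoc` and `DesignDoc` classes and both agent functions are assumptions, and the agents are stubs where a real system would call an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequirementsDoc:
    goal: str
    user_stories: tuple[str, ...]


@dataclass(frozen=True)
class DesignDoc:
    goal: str
    modules: tuple[str, ...]


def product_manager(idea: str) -> RequirementsDoc:
    # Stub: a real agent would ask an LLM to fill in these fields.
    return RequirementsDoc(
        goal=idea,
        user_stories=("As a user, I can add a task", "As a user, I can list tasks"),
    )


def architect(prd: RequirementsDoc) -> DesignDoc:
    # A structured input can be validated before any work is done on it.
    if not prd.user_stories:
        raise ValueError("requirements document has no user stories")
    # Stub: plan one module per user story.
    modules = tuple(f"module_{i}" for i, _ in enumerate(prd.user_stories, start=1))
    return DesignDoc(goal=prd.goal, modules=modules)


design = architect(product_manager("to-do list app"))
```

Because each handoff has a fixed shape, a downstream agent can reject an incomplete document immediately instead of guessing at what free-form text meant.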

6. AutoGen Framework

Overview

AutoGen (Wu et al., 2023) is a Microsoft Research framework for building multi-agent systems. Its core abstraction is the conversable agent: an agent that can send and receive messages to and from other agents.

Key Concepts

ConversableAgent: The base class for all agents. Each agent has:

A name and system message (prompt)
An LLM configuration
A human input mode (NEVER, ALWAYS, or TERMINATE)
Code execution capabilities (optional)

GroupChat: A conversation between multiple agents with a managed turn-taking protocol.

GroupChatManager: Controls the flow of conversation in a group chat, deciding which agent speaks next.

Example: Two-Agent Conversation

python

# Conceptual example showing AutoGen's design pattern
# (simplified for educational purposes)

class ConversableAgent:
    """Simplified version of AutoGen's ConversableAgent."""

    def __init__(
        self,
        name: str,
        system_message: str,
        llm_call,
        max_consecutive_auto_reply: int = 10,
    ):
        self.name = name
        self.system_message = system_message
        self.llm_call = llm_call
        self.max_consecutive_auto_reply = max_consecutive_auto_reply
        self.chat_messages: dict[str, list[dict]] = {}

    def generate_reply(self, messages: list[dict]) -> str:
        """Generate a reply based on the conversation history."""
        full_messages = [{"role": "system", "content": self.system_message}]
        full_messages.extend(messages)
        return self.llm_call(messages=full_messages)

    def initiate_chat(self, recipient: "ConversableAgent", message: str) -> list[dict]:
        """Start a conversation with another agent.

        The agents take turns responding until a termination
        condition is met.
        """
        conversation = []
        current_message = message

        for turn in range(self.max_consecutive_auto_reply * 2):
            if turn % 2 == 0:
                # This agent's turn
                sender, receiver = self, recipient
            else:
                # Other agent's turn
                sender, receiver = recipient, self

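            conversation.append({
                "role": "user" if sender is self else "assistant",
                "sender": sender.name,
                "content": current_message,
            })

            # Check for termination
            if "TERMINATE" in current_message:
                break

            # Generate reply. From the receiver's point of view, its own
            # earlier messages are "assistant" turns and the other agent's
            # messages are "user" turns (assumes agent names are distinct).
            current_message = receiver.generate_reply([
                {
                    "role": "assistant" if m["sender"] == receiver.name else "user",
                    "content": m["content"],
                }
                for m in conversation
            ])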

        return conversation

Group Chat Pattern

python

class GroupChat:
    """A group chat between multiple agents.

    Manages turn-taking and message routing.
    """

    def __init__(
        self,
        agents: list[ConversableAgent],
        max_rounds: int = 20,
        speaker_selection: str = "round_robin",  # or "auto" or "random"
    ):
        self.agents = {agent.name: agent for agent in agents}
        self.max_rounds = max_rounds
        self.speaker_selection = speaker_selection
        self.conversation: list[dict] = []

    def select_next_speaker(self, last_speaker: str) -> str:
        """Select the next agent to speak."""
        agent_names = list(self.agents.keys())

        if self.speaker_selection == "round_robin":
            current_idx = agent_names.index(last_speaker)
            next_idx = (current_idx + 1) % len(agent_names)
            return agent_names[next_idx]

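        elif self.speaker_selection == "random":
            import random
            # Avoid the same speaker twice in a row; with a single agent
            # there is no alternative, so fall back to the full list.
            candidates = [n for n in agent_names if n != last_speaker] or agent_names
            return random.choice(candidates)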

        else:  # "auto" - use LLM to decide
            return self._llm_select_speaker(last_speaker)

    def _llm_select_speaker(self, last_speaker: str) -> str:
        """Use an LLM to select the most appropriate next speaker."""
        agent_descriptions = "\n".join(
            f"- {name}: {agent.system_message[:100]}..."
            for name, agent in self.agents.items()
        )
        context = "\n".join(
            f"{m['sender']}: {m['content'][:200]}" for m in self.conversation[-5:]
        )
        # In practice, call the LLM to select the best next speaker
        # based on the conversation context and agent descriptions
        return list(self.agents.keys())[0]  # Placeholder

    def run(self, initial_message: str, initiator: str) -> list[dict]:
        """Run the group chat.

        Args:
            initial_message: The starting message.
            initiator: Name of the agent who starts the conversation.

        Returns:
            The complete conversation history.
        """
        self.conversation = [{
            "sender": initiator,
            "content": initial_message,
        }]

        current_speaker = initiator

        for round_num in range(self.max_rounds):
            # Select next speaker
            next_speaker = self.select_next_speaker(current_speaker)
            agent = self.agents[next_speaker]

            # Generate reply
            messages = [
                {"role": "user", "content": f"[{m['sender']}]: {m['content']}"}
                for m in self.conversation
            ]
            reply = agent.generate_reply(messages)

            self.conversation.append({
                "sender": next_speaker,
                "content": reply,
            })

            # Check for termination
            if "TERMINATE" in reply:
                break

            current_speaker = next_speaker

        return self.conversation
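
The "auto" mode above leaves the LLM call as a placeholder. Whatever model is used, its free-text answer still has to be mapped onto a valid agent name. The following sketch shows one way to do that with a round-robin fallback. `resolve_speaker` is a hypothetical helper and is not how AutoGen itself resolves speakers.

```python
def resolve_speaker(llm_reply: str, agent_names: list[str], last_speaker: str) -> str:
    """Map a free-text selector reply onto a valid agent name."""
    reply = llm_reply.lower()
    for name in agent_names:
        # Accept the first known agent mentioned, skipping the last speaker.
        if name.lower() in reply and name != last_speaker:
            return name
    # Fallback: the reply named no usable agent, so rotate round-robin.
    idx = agent_names.index(last_speaker) if last_speaker in agent_names else -1
    return agent_names[(idx + 1) % len(agent_names)]


names = ["planner", "coder", "critic"]
chosen = resolve_speaker("I think the Critic should go next.", names, "coder")
fallback = resolve_speaker("Not sure.", names, "critic")
```

Substring matching is deliberately forgiving here; a stricter variant could require the model to answer with the agent name alone and reject anything else.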

7. Orchestration Patterns

Supervisor Pattern

A dedicated orchestrator agent manages the workflow, deciding which agents to invoke and in what order.

python

SUPERVISOR_PROMPT = """You are a workflow supervisor managing a team of specialized agents.

Available agents:
{agent_descriptions}

Given the user's request, decide which agent(s) to invoke and in what order.
You can invoke agents sequentially or gather information from multiple agents.

For each step, respond in this format:
INVOKE <agent_name>: <instruction for that agent>

When you have enough information to provide a final answer:
FINAL_ANSWER: <your synthesized response>
"""


class SupervisorOrchestrator:
    """Supervisor-based orchestration for multi-agent systems."""

    def __init__(self, supervisor_llm_call, agents: dict[str, callable]):
        self.supervisor_llm_call = supervisor_llm_call
        self.agents = agents
        self.execution_log: list[dict] = []

    def run(self, task: str, max_steps: int = 10) -> dict:
        """Execute a task using supervisor-guided orchestration."""
        self.execution_log = []  # reset so repeated runs do not mix histories
        agent_descriptions = "\n".join(
            f"- {name}" for name in self.agents.keys()
        )
        system = SUPERVISOR_PROMPT.format(agent_descriptions=agent_descriptions)
        context = f"User request: {task}\n\nExecution history:\n"

        for step in range(max_steps):
            # Ask supervisor what to do next
            full_context = context + "\n".join(
                f"Step {i+1}: Invoked {log['agent']} -> {log['result'][:200]}"
                for i, log in enumerate(self.execution_log)
            )

            decision = self.supervisor_llm_call(system=system, prompt=full_context)

            if "FINAL_ANSWER:" in decision:
                answer = decision.split("FINAL_ANSWER:", 1)[1].strip()
                return {
                    "answer": answer,
                    "steps": self.execution_log,
                    "num_steps": len(self.execution_log),
                }

            if "INVOKE" in decision:
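                agent_name, instruction = self._parse_invocation(decision)
                if agent_name in self.agents:
                    result = self.agents[agent_name](instruction)
                else:
                    # Log the failure so the supervisor sees it on the next step
                    # rather than repeating the same invalid choice.
                    result = f"ERROR: unknown agent '{agent_name}'"
                self.execution_log.append({
                    "agent": agent_name,
                    "instruction": instruction,
                    "result": result,
                })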

        return {
            "answer": "Max steps reached.",
            "steps": self.execution_log,
            "num_steps": len(self.execution_log),
        }

    def _parse_invocation(self, decision: str) -> tuple[str, str]:
        """Parse 'INVOKE agent_name: instruction' from supervisor output."""
        parts = decision.split("INVOKE", 1)[1].strip()
        if ":" in parts:
            agent_name, instruction = parts.split(":", 1)
            return agent_name.strip(), instruction.strip()
        return "", parts

Round-Robin Pattern

Each agent takes a turn in a fixed order. Simple but ensures every agent contributes.
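
A minimal sketch of the round-robin pattern follows. The `round_robin` function, the `Agent` type alias, and the stub agents are illustrative assumptions; the `TERMINATE` convention is borrowed from the group chat example. Each agent is modelled as a plain callable that receives the transcript so far and returns its reply.

```python
from itertools import cycle
from typing import Callable

# Hypothetical agent type: takes the transcript so far, returns a reply.
Agent = Callable[[list[dict]], str]


def round_robin(agents: dict[str, Agent], task: str, max_turns: int = 6) -> list[dict]:
    """Let agents speak in a fixed, repeating order until a turn limit or TERMINATE."""
    transcript = [{"sender": "user", "content": task}]
    if not agents:
        return transcript
    order = cycle(agents)  # dicts keep insertion order, so the speaking order is fixed
    for _ in range(max_turns):
        name = next(order)
        reply = agents[name](transcript)
        transcript.append({"sender": name, "content": reply})
        if "TERMINATE" in reply:
            break
    return transcript


# Stub agents stand in for LLM calls.
def researcher(transcript: list[dict]) -> str:
    return "Found three relevant sources."


def writer(transcript: list[dict]) -> str:
    return "Drafted a summary from the sources."


def reviewer(transcript: list[dict]) -> str:
    return "Looks good. TERMINATE"


history = round_robin(
    {"researcher": researcher, "writer": writer, "reviewer": reviewer},
    task="Summarise recent work on agent protocols.",
)
speakers = [m["sender"] for m in history]
print(speakers)
```

Because the order never changes, no LLM call is spent on speaker selection, at the price of giving a turn to agents that may have nothing to add.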

Dynamic Routing

An LLM-based router examines the current state and routes to the most appropriate agent. This is similar to the supervisor pattern but the router only makes routing decisions and does not synthesize results.

python

ROUTER_PROMPT = """Given the current task state, select the best agent to handle the next step.

Current state:
{state}

Available agents:
{agents}

Select ONE agent by responding with just the agent name."""


def dynamic_router(
    state: str,
    agents: dict[str, str],  # name -> description
    llm_call,
) -> str:
    """Dynamically route to the best agent for the current state."""
    agents_str = "\n".join(f"- {name}: {desc}" for name, desc in agents.items())
    prompt = ROUTER_PROMPT.format(state=state, agents=agents_str)
    selected = llm_call(prompt=prompt).strip()
    # Validate selection
    if selected in agents:
        return selected
    # Fallback: find closest match
    for name in agents:
        if name.lower() in selected.lower():
            return name
    return list(agents.keys())[0]  # Default to first agent

8. Challenges in Multi-Agent Systems

Multi-agent systems are powerful but not free. Every benefit comes with a cost, and understanding these trade-offs is essential for building systems that actually work in production rather than just in demos.

Coordination Overhead

Every interaction between agents costs time and money (LLM API calls). A system with N agents may require O(N^2) interactions per round, making scaling expensive.

Mitigation strategies:

Limit the number of interaction rounds.
Use cheaper models for coordination messages.
Design efficient communication protocols (structured messages, not free-form conversation).
Use asynchronous communication where possible.
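
To make the scaling argument concrete, the sketch below counts messages per round under two simplifying assumptions that are not from the lecture: in a fully connected topology every agent sends one message to every other agent, and in a supervisor (hub-and-spoke) topology the supervisor exchanges one request and one reply with each worker. The function names are hypothetical.

```python
def full_mesh_messages(n_agents: int) -> int:
    """Directed messages per round when every agent messages every other agent."""
    return n_agents * (n_agents - 1)


def hub_and_spoke_messages(n_agents: int) -> int:
    """Messages per round when one supervisor sends a request to, and receives
    a reply from, each of the other agents."""
    return 2 * (n_agents - 1)


for n in (3, 5, 10):
    print(f"N={n}: full mesh={full_mesh_messages(n)}, supervisor={hub_and_spoke_messages(n)}")
```

Under these assumptions, 10 agents need 90 messages per round when fully connected but only 18 when routed through a supervisor, which illustrates why structured topologies help keep coordination cost closer to linear.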

Error Propagation

Errors from one agent can cascade through the system. If a researcher agent retrieves incorrect information, the writer agent incorporates it, and the editor agent may not catch factual errors.

text

[Researcher: wrong fact] -> [Writer: uses wrong fact] -> [Editor: fixes grammar but not facts]

Mitigation strategies:

Include dedicated verification agents.
Implement fact-checking as a separate stage.
Use adversarial agents that actively try to find errors.
Maintain provenance tracking so errors can be traced to their source.
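
Provenance tracking can be sketched as follows. The `Claim` dataclass and the `trace_to_source` helper are hypothetical names introduced for illustration: each agent output records which agent produced it and which earlier outputs it was derived from, so an error found at the end of the pipeline can be walked back to its origin.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One agent output plus the earlier claims it was derived from."""

    agent: str
    content: str
    derived_from: list["Claim"] = field(default_factory=list)


def trace_to_source(claim: Claim) -> list[str]:
    """Return the chain of agents from this claim back to its original sources."""
    chain = [claim.agent]
    for parent in claim.derived_from:
        chain.extend(trace_to_source(parent))
    return chain


# A made-up wrong fact flows through the pipeline from the diagram above.
fact = Claim("researcher", "Widget X shipped in 2019.")
draft = Claim("writer", "Since shipping in 2019, Widget X has ...", derived_from=[fact])
final = Claim("editor", "Since it shipped in 2019, Widget X has ...", derived_from=[draft])

print(" <- ".join(trace_to_source(final)))
```

Once the chain is available, a verification agent can be pointed at the earliest link rather than re-checking every stage.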

Consensus and Disagreement

When agents disagree, the system needs a mechanism to reach a resolution:

Majority voting: If multiple agents weigh in, go with the majority.
Authority-based: A designated "senior" agent or human makes the final decision.
Evidence-based: Agents must cite evidence; the best-supported position wins.
Iterative refinement: Agents debate until they converge.

python

def majority_vote(agent_responses: dict[str, str], llm_call) -> str:
    """Resolve disagreement through majority voting.

    Uses an LLM to classify responses into groups and select
    the majority position.
    """
    responses_text = "\n".join(
        f"{name}: {response}" for name, response in agent_responses.items()
    )

    prompt = f"""The following agents have provided different responses.
Group them by their position and identify the majority view.

{responses_text}

Majority position (summarize the view held by most agents):"""

    return llm_call(prompt=prompt)

Infinite Loops and Runaway Costs

Without proper termination conditions, agents can enter infinite loops of back-and-forth conversation, consuming resources without making progress. This is one of the most common failure modes in multi-agent systems and one of the most expensive -- two polite agents can exchange "Thank you!" and "You're welcome!" forever if no one tells them to stop.

Common Misconception: "More agent interaction always leads to better results." In practice, there is a point of diminishing returns. After 3-4 rounds of debate, agents often start repeating themselves or making marginal refinements. Always set hard limits on rounds and monitor whether progress is being made.

Mitigation strategies:

Set hard limits on the number of rounds and total tokens.
Track whether progress is being made (are outputs changing substantively between rounds?).
Implement budget tracking and circuit breakers.
Use a timeout mechanism.

python

class CostTracker:
    """Track and limit costs in a multi-agent system."""

    def __init__(self, max_tokens: int = 100000, max_rounds: int = 50):
        self.max_tokens = max_tokens
        self.max_rounds = max_rounds
        self.tokens_used = 0
        self.rounds_completed = 0

    def record_usage(self, tokens: int) -> None:
        self.tokens_used += tokens
        self.rounds_completed += 1

    def can_continue(self) -> bool:
        return (
            self.tokens_used < self.max_tokens
            and self.rounds_completed < self.max_rounds
        )

    def usage_report(self) -> dict:
        return {
            "tokens_used": self.tokens_used,
            "token_budget_remaining": self.max_tokens - self.tokens_used,
            "rounds_completed": self.rounds_completed,
            "rounds_remaining": self.max_rounds - self.rounds_completed,
        }
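
The "is progress being made?" check can be approximated by comparing consecutive outputs. The sketch below uses `difflib.SequenceMatcher` from the standard library; the `is_stalled` name, the 0.9 similarity threshold, and the two-step window are assumptions chosen for illustration and would need tuning on real transcripts.

```python
from difflib import SequenceMatcher


def is_stalled(outputs: list[str], threshold: float = 0.9, window: int = 2) -> bool:
    """True if each of the last `window` outputs is near-identical to its predecessor."""
    if len(outputs) < window + 1:
        return False  # not enough history to judge
    recent = outputs[-(window + 1):]
    return all(
        SequenceMatcher(None, prev, curr).ratio() >= threshold
        for prev, curr in zip(recent, recent[1:])
    )


history = [
    "Plan: use a queue for incoming jobs.",
    "Plan: use a queue for incoming jobs, with retries.",
    "Plan: use a queue for incoming jobs, with retries.",
    "Plan: use a queue for incoming jobs, with retries.",
]
print(is_stalled(history[:2]), is_stalled(history))
```

A check like this could sit next to `CostTracker.can_continue()` as a second circuit breaker: one stops the run when the budget is spent, the other when the outputs stop changing.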

9. Practical Example: A Two-Agent Debate System

Let us build a complete two-agent debate system where a Proposer agent makes an argument and a Critic agent challenges it. A Judge agent evaluates the debate and renders a final verdict.

python

"""
A two-agent debate system: Proposer vs Critic, evaluated by a Judge.

This demonstrates:
- Role-based agent specialization
- Structured multi-turn interaction
- Adversarial collaboration for better reasoning
- Final evaluation by a third-party judge

Reference: Du et al. (2023) "Improving Factuality and Reasoning in
Language Models through Multiagent Debate"

Requirements:
    pip install openai  (or any LLM client)
"""

from dataclasses import dataclass


@dataclass
class DebateConfig:
    """Configuration for the debate system."""

    max_rounds: int = 3
    min_rounds: int = 2
    proposer_model: str = "gpt-4"
    critic_model: str = "gpt-4"
    judge_model: str = "gpt-4"


PROPOSER_SYSTEM = """You are a Proposer in a structured debate. Your role is to:
1. Present clear, well-reasoned arguments in favor of your position.
2. Support your claims with evidence, examples, and logical reasoning.
3. Respond thoughtfully to the Critic's objections.
4. Acknowledge valid criticisms and refine your position when warranted.

Be persuasive but honest. Do not make claims you cannot support.
If the Critic raises a valid point, concede it and adjust your argument."""

CRITIC_SYSTEM = """You are a Critic in a structured debate. Your role is to:
1. Carefully analyze the Proposer's arguments for logical flaws,
   unsupported claims, missing evidence, and counterexamples.
2. Raise specific, substantive objections (not just vague disagreement).
3. Suggest alternative viewpoints or interpretations.
4. Acknowledge strong arguments while pressing on weak ones.

Be rigorous but fair. Your goal is not to "win" but to strengthen
the overall quality of reasoning through critical examination.
Do not be contrarian for its own sake."""

JUDGE_SYSTEM = """You are the Judge evaluating a debate between a Proposer and a Critic.

Evaluate the debate on these criteria:
1. **Argument Quality**: How well-supported and logically sound were the arguments?
2. **Evidence Use**: Were claims backed by concrete evidence or examples?
3. **Responsiveness**: Did each side address the other's points effectively?
4. **Intellectual Honesty**: Did participants acknowledge valid opposing points?
5. **Final Assessment**: What is the most well-supported conclusion given the debate?

Provide:
- A score for each side (1-10)
- Key strengths and weaknesses of each side
- Your assessment of the most defensible position
- Any important points that neither side addressed"""


class DebateAgent:
    """An agent participating in a debate."""

    def __init__(self, name: str, system_prompt: str, llm_call, model: str = "gpt-4"):
        self.name = name
        self.system_prompt = system_prompt
        self.llm_call = llm_call
        self.model = model

    def respond(self, conversation_history: list[dict]) -> str:
        """Generate a response given the conversation history."""
        messages = [{"role": "system", "content": self.system_prompt}]
        messages.extend(conversation_history)
        return self.llm_call(model=self.model, messages=messages)


class DebateSystem:
    """Orchestrates a structured debate between a Proposer and a Critic."""

    def __init__(self, llm_call, config: DebateConfig | None = None):
        self.config = config or DebateConfig()
        self.llm_call = llm_call

        # Create agents
        self.proposer = DebateAgent(
            "Proposer", PROPOSER_SYSTEM, llm_call, self.config.proposer_model
        )
        self.critic = DebateAgent(
            "Critic", CRITIC_SYSTEM, llm_call, self.config.critic_model
        )
        self.judge = DebateAgent(
            "Judge", JUDGE_SYSTEM, llm_call, self.config.judge_model
        )

    def run_debate(self, topic: str) -> dict:
        """Run a complete debate on a topic.

        Args:
            topic: The debate topic or question.

        Returns:
            Dict with the debate transcript, judge's verdict, and metadata.
        """
        transcript: list[dict] = []
        proposer_history: list[dict] = []
        critic_history: list[dict] = []

        print(f"{'='*60}")
        print(f"DEBATE TOPIC: {topic}")
        print(f"{'='*60}\n")

        # Round 1: Proposer opens
        opening_prompt = f"Present your opening argument on the following topic: {topic}"
        proposer_history.append({"role": "user", "content": opening_prompt})
        opening = self.proposer.respond(proposer_history)
        proposer_history.append({"role": "assistant", "content": opening})

        transcript.append({
            "round": 1,
            "speaker": "Proposer",
            "type": "opening",
            "content": opening,
        })
        print(f"[Round 1 - Proposer Opening]\n{opening}\n")

        # Debate rounds
        proposer_response = opening  # latest Proposer argument shown to the Critic
        for round_num in range(1, self.config.max_rounds + 1):
            # Critic responds
            critic_prompt = (
                f"The Proposer argues:\n\n{proposer_response}\n\n"
                f"Provide your critique (Round {round_num})."
            )
            critic_history.append({"role": "user", "content": critic_prompt})
            critique = self.critic.respond(critic_history)
            critic_history.append({"role": "assistant", "content": critique})

            transcript.append({
                "round": round_num,
                "speaker": "Critic",
                "type": "critique",
                "content": critique,
            })
            print(f"[Round {round_num} - Critic]\n{critique}\n")

            # Proposer responds to critique
            rebuttal_prompt = (
                f"The Critic objects:\n\n{critique}\n\n"
                f"Respond to the criticism and refine your argument (Round {round_num})."
            )
            proposer_history.append({"role": "user", "content": rebuttal_prompt})
            proposer_response = self.proposer.respond(proposer_history)
            proposer_history.append({"role": "assistant", "content": proposer_response})

            transcript.append({
                "round": round_num,
                "speaker": "Proposer",
                "type": "rebuttal",
                "content": proposer_response,
            })
            print(f"[Round {round_num} - Proposer Rebuttal]\n{proposer_response}\n")

        # Judge evaluates
        debate_text = self._format_for_judge(transcript)
        judge_prompt = (
            f"Evaluate the following debate on the topic: '{topic}'\n\n{debate_text}"
        )
        verdict = self.judge.respond([{"role": "user", "content": judge_prompt}])

        transcript.append({
            "round": "final",
            "speaker": "Judge",
            "type": "verdict",
            "content": verdict,
        })
        print(f"{'='*60}")
        print(f"[JUDGE'S VERDICT]\n{verdict}")
        print(f"{'='*60}")

        return {
            "topic": topic,
            "transcript": transcript,
            "verdict": verdict,
            "num_rounds": self.config.max_rounds,
            "total_turns": len(transcript),
        }

    def _format_for_judge(self, transcript: list[dict]) -> str:
        """Format the debate transcript for the judge."""
        lines = []
        for entry in transcript:
            if entry["speaker"] != "Judge":
                lines.append(
                    f"[{entry['speaker']} - {entry['type'].title()}]\n{entry['content']}\n"
                )
        return "\n".join(lines)


# ── Usage Example ─────────────────────────────────────────────────

def main():
    """Run a sample debate."""

    # Mock LLM call for demonstration
    call_count = {"n": 0}

    def mock_llm_call(model: str = "gpt-4", messages: list[dict] | None = None, **kwargs) -> str:
        """Simulate LLM responses for the debate."""
        call_count["n"] += 1
        last_msg = messages[-1]["content"] if messages else ""

        if "opening argument" in last_msg.lower():
            return (
                "I argue that open-source AI models are essential for democratic "
                "AI development. First, they enable transparency: researchers and "
                "the public can inspect model weights, training data, and behavior. "
                "Second, they reduce concentration of power: when only a few companies "
                "control AI, they become gatekeepers of a transformative technology. "
                "Third, open-source accelerates innovation through community contributions, "
                "as seen with Linux and the web."
            )
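        # Match the Critic's prompt only: the Proposer and Judge system prompts
        # also mention the Critic, so checking them would misroute those turns.
        elif "provide your critique" in last_msg.lower():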
            return (
                "While transparency is valuable, the Proposer overlooks significant risks. "
                "Open-source AI models can be misused for generating disinformation, "
                "creating deepfakes, or automating cyberattacks. The 'democratization' "
                "argument assumes all actors are benign, which is naive. Furthermore, "
                "truly 'open' AI requires open training data and compute, not just weights. "
                "Most 'open-source' models are still controlled by companies that choose "
                "what to release."
            )
        elif "respond to the criticism" in last_msg.lower():
            return (
                "The Critic raises valid concerns about misuse. However, restricting access "
                "does not prevent misuse; it merely shifts who can misuse. Closed models can "
                "also be jailbroken. The solution is not secrecy but robust safety research, "
                "which itself benefits from open access. On the 'controlled openness' point, "
                "I concede that current open-source AI is imperfect, but the trajectory is "
                "toward greater openness, and this should be encouraged."
            )
        elif "evaluate" in last_msg.lower():
            return (
                "VERDICT:\n\n"
                "Proposer Score: 7/10\n"
                "- Strengths: Clear structure, good examples (Linux, web), acknowledged "
                "valid criticism about 'controlled openness'.\n"
                "- Weaknesses: Insufficient engagement with the misuse argument.\n\n"
                "Critic Score: 8/10\n"
                "- Strengths: Specific counterexamples, distinguished between types of "
                "openness, practical concern about misuse.\n"
                "- Weaknesses: Did not engage with the 'restricting access shifts misuse' "
                "counterargument.\n\n"
                "Assessment: The debate reveals that the question is not binary. The most "
                "defensible position is graduated openness with safety evaluation, combining "
                "the transparency benefits the Proposer champions with the risk mitigation "
                "the Critic demands."
            )
        return "I acknowledge the points raised and will consider them carefully."

    # Run the debate
    config = DebateConfig(max_rounds=2)
    debate = DebateSystem(mock_llm_call, config)
    result = debate.run_debate(
        "Should AI models be open-source? Consider safety, innovation, and power dynamics."
    )

    print(f"\nDebate completed: {result['total_turns']} total turns across {result['num_rounds']} rounds.")


if __name__ == "__main__":
    main()

10. Modern Agent Communication Protocols

As multi-agent systems mature beyond research prototypes into production deployments, the need for standardized communication protocols has become urgent. In 2024-2025, three major protocols emerged from leading AI companies, each addressing a different layer of the agent interoperability stack. Understanding these protocols is essential for building multi-agent systems that work across frameworks, vendors, and organizational boundaries.

10.1 Background: The Interoperability Problem

Consider a realistic enterprise scenario: a company uses a customer service agent built with LangGraph, a data analysis agent built with the OpenAI Agents SDK, and an internal knowledge agent built with a custom framework on Claude. Today, making these agents collaborate requires custom integration code for every pair of agents. With N agent systems, this means O(N^2) integrations, a classic interoperability problem.

Standardized protocols solve this by defining a common language for agent interaction, just as HTTP standardized web communication and SMTP standardized email.

10.2 Agent-to-Agent Protocol (A2A) by Google

Google announced the Agent-to-Agent Protocol (A2A) in April 2025 as an open protocol for agent interoperability. A2A enables agents built on different frameworks, by different vendors, using different models, to discover each other and collaborate on tasks.

Core Design Principles

A2A is built around several key principles:

Agentic: Agents collaborate as peers. A2A supports scenarios where agents negotiate, delegate, and communicate without reducing the remote agent to a mere tool.
Opaque execution: Agents do not need to share their internal reasoning, memory, or tool usage. Each agent is a black box to others, exposing only its capabilities and results.
Modality-agnostic: Communication can include text, files, structured data, images, audio, and video. This reflects the multimodal nature of real-world agent tasks.
Framework-independent: A2A works regardless of the underlying agent framework (LangGraph, CrewAI, custom code, etc.).

Key Concepts

Agent Card: A JSON metadata document that describes an agent's identity and capabilities. Think of it as a business card for agents. Every A2A-compatible agent hosts an Agent Card at a well-known URL (typically /.well-known/agent.json).

json

{
  "name": "financial-analysis-agent",
  "description": "Analyzes financial reports, generates summaries, and answers questions about company financials.",
  "url": "https://finance-agent.example.com",
  "version": "1.0.0",
  "capabilities": {
    "streaming": true,
    "pushNotifications": true
  },
  "skills": [
    {
      "id": "financial-report-analysis",
      "name": "Financial Report Analysis",
      "description": "Analyzes 10-K, 10-Q, and annual reports to extract key metrics and trends.",
      "inputModes": ["text/plain", "application/pdf"],
      "outputModes": ["text/plain", "application/json"]
    },
    {
      "id": "earnings-qa",
      "name": "Earnings Q&A",
      "description": "Answers questions about a company's earnings based on public filings.",
      "inputModes": ["text/plain"],
      "outputModes": ["text/plain"]
    }
  ],
  "authentication": {
    "schemes": ["bearer"]
  }
}

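Task: The fundamental unit of work in A2A. A client agent creates a task on a remote agent and monitors its progress. Tasks have a lifecycle: a task starts as submitted, moves to working, may pause in input-required, and ends in a terminal state (completed, failed, or canceled).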

The input-required state is particularly important: it allows agents to negotiate and request additional information mid-task, supporting genuine multi-turn collaboration rather than simple request-response.
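
The lifecycle can be modelled as a small state machine. The sketch below is illustrative: the state names follow those used in this section's examples plus the `failed` and `canceled` terminal states, while the `ALLOWED` transition table and the `advance` helper are assumptions made for this example, not part of the A2A specification.

```python
# Assumed transition table: which states may follow which.
ALLOWED: dict[str, set[str]] = {
    "submitted": {"working", "failed", "canceled"},
    "working": {"input-required", "completed", "failed", "canceled"},
    "input-required": {"working", "canceled"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
    "canceled": set(),   # terminal
}


def advance(current: str, new: str) -> str:
    """Move a task to a new state, rejecting transitions the table does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Illegal transition: {current} -> {new}")
    return new


# A task that pauses once to ask the client for more input.
state = "submitted"
for next_state in ("working", "input-required", "working", "completed"):
    state = advance(state, next_state)
print(state)
```

The round trip through `input-required` and back to `working` is what distinguishes this from a plain request-response call.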

Message: The communication unit between agents. Each message contains one or more Parts, which can be:

TextPart: Plain text or markdown content
FilePart: Binary files (PDFs, images, spreadsheets)
DataPart: Structured JSON data

Artifact: An output produced by an agent during task execution. While messages are conversational (back-and-forth), artifacts represent deliverables: the report that was generated, the analysis results, the transformed dataset.

How A2A Works: A Walkthrough

[Interactive figure: "A2A Protocol: Agent Discovery and Task Flow". The figure steps through the fields of a message envelope sent from agent A to agent B, explaining one standardised field at a time.]

Envelope fields shown in the figure:

version: a2a/1.0 (protocol version; lets older servers refuse incompatible messages)
from: agent.research
to: agent.summariser
thread_id: th_8c1...e3
intent: summarise
payload: { doc_id, max_words: 250 }
deadline: 2026-04-26T17:00Z

Transport Layer

A2A uses HTTP as its transport with JSON-RPC 2.0 as the message format. This is a pragmatic choice: HTTP is universally supported, firewalls are configured for it, and every programming language has HTTP client libraries.

For real-time updates, A2A supports Server-Sent Events (SSE), which allow the remote agent to stream status updates and partial results back to the client without polling.
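
To show what consuming such a stream involves, here is a minimal sketch of a Server-Sent Events parser. The framing rules (`data:` lines, a blank line ends an event, lines starting with `:` are comments) come from the SSE format itself; the `parse_sse` name and the JSON payload shape in the sample stream are invented for illustration and are not the A2A wire format.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE drops a single leading space
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line dispatches the event built from the buffered data lines.
            yield json.loads("\n".join(data_lines))
            data_lines = []
        # Other fields (event:, id:, retry:) and comment lines are ignored here.


sample_stream = [
    'data: {"state": "working", "note": "Parsing report"}',
    "",
    ": keep-alive comment",
    'data: {"state": "completed"}',
    "",
]
states = [event["state"] for event in parse_sse(sample_stream)]
print(states)
```

With the synchronous `httpx` client, `response.iter_lines()` inside a `client.stream(...)` context is one possible source for the `lines` argument.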

Conceptual Code Example: A2A Agent Discovery and Task Delegation

python

"""
Conceptual example of A2A agent discovery and task delegation.

This illustrates how a client agent discovers a remote agent via its
Agent Card and delegates a task using the A2A protocol.
"""

import httpx


class A2AClient:
    """A client for interacting with A2A-compatible agents."""

    def __init__(self, base_url: str, auth_token: str | None = None):
        self.base_url = base_url.rstrip("/")
        self.auth_token = auth_token
        self.agent_card: dict | None = None

    def _headers(self) -> dict:
        headers = {"Content-Type": "application/json"}
        if self.auth_token:
            headers["Authorization"] = f"Bearer {self.auth_token}"
        return headers

    async def discover(self) -> dict:
        """Discover the remote agent by fetching its Agent Card."""
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{self.base_url}/.well-known/agent.json",
                headers=self._headers(),
            )
            response.raise_for_status()
            self.agent_card = response.json()
            return self.agent_card

    def has_skill(self, skill_id: str) -> bool:
        """Check if the remote agent has a specific skill."""
        if not self.agent_card:
            raise RuntimeError("Call discover() first to fetch the Agent Card.")
        return any(
            s["id"] == skill_id for s in self.agent_card.get("skills", [])
        )

    async def send_task(self, message_text: str, task_id: str | None = None) -> dict:
        """Send a task to the remote agent.

        Args:
            message_text: The task instruction in natural language.
            task_id: Optional task ID (generated by server if omitted).

        Returns:
            The task object with its current status.
        """
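        params: dict = {
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": message_text}],
            },
        }
        if task_id is not None:
            # The task ID belongs in params; omit it to let the server assign one.
            params["id"] = task_id

        payload = {
            "jsonrpc": "2.0",
            "method": "tasks/send",
            "params": params,
            # JSON-RPC request ID: correlates this request with its response
            # and is unrelated to the task ID.
            "id": "send-task",
        }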

        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.base_url}/a2a",
                json=payload,
                headers=self._headers(),
            )
            response.raise_for_status()
            return response.json()["result"]

    async def get_task_status(self, task_id: str) -> dict:
        """Poll the status of a previously submitted task."""
        payload = {
            "jsonrpc": "2.0",
            "method": "tasks/get",
            "params": {"id": task_id},
            "id": "status-check",
        }

        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.base_url}/a2a",
                json=payload,
                headers=self._headers(),
            )
            response.raise_for_status()
            return response.json()["result"]


async def orchestrate_financial_analysis():
    """Example: A supervisor agent discovers and delegates to a financial agent."""

    # Step 1: Discover the remote agent
    finance_agent = A2AClient(
        base_url="https://finance-agent.example.com",
        auth_token="secret-token",
    )
    card = await finance_agent.discover()
    print(f"Discovered agent: {card['name']}")
    print(f"Skills: {[s['name'] for s in card['skills']]}")

    # Step 2: Check if the agent has the skill we need
    if not finance_agent.has_skill("financial-report-analysis"):
        print("Agent does not have the required skill. Searching for another agent...")
        return

    # Step 3: Send the task
    task = await finance_agent.send_task(
        "Analyze the Q3 2025 earnings report for Acme Corp. "
        "Focus on revenue growth, profit margins, and forward guidance."
    )
    # In A2A, a task's status is an object; its "state" field holds the lifecycle state.
    print(f"Task created: {task['id']}, state: {task['status']['state']}")

    # Step 4: Poll for completion (in production, use SSE streaming)
    import asyncio
    while task["status"]["state"] in ("submitted", "working"):
        await asyncio.sleep(2)
        task = await finance_agent.get_task_status(task["id"])
        print(f"Task state: {task['status']['state']}")

    # Step 5: Process results
    state = task["status"]["state"]
    if state == "completed":
        for artifact in task.get("artifacts", []):
            print(f"Received artifact: {artifact.get('name', 'unnamed')}")
            for part in artifact.get("parts", []):
                if "text" in part:
                    print(f"  Content: {part['text'][:200]}...")
    elif state == "input-required":
        print("Agent needs more information:")
        # The agent's request, if any, travels in the status message.
        print(f"  {task['status'].get('message')}")

A2A vs MCP: Complementary, Not Competing

A common source of confusion is the relationship between Google's A2A and Anthropic's Model Context Protocol (MCP). They operate at different layers and are designed to be complementary:

MCP (Model Context Protocol): Connects a model/agent to tools and data sources. It is a vertical integration protocol. Think of it as giving an agent hands: the ability to read files, query databases, call APIs.
A2A (Agent-to-Agent Protocol): Connects agents to other agents. It is a horizontal interoperability protocol. Think of it as giving agents the ability to collaborate with other agents, regardless of how those agents are built internally.

An agent might use MCP to access its own tools (reading a database, calling an API) and A2A to delegate subtasks to specialized agents built by other teams or vendors.

10.3 Agent Communication Protocol (ACP) by IBM

IBM introduced the Agent Communication Protocol (ACP) in early 2025, targeting enterprise multi-agent systems where agents need to collaborate within organizational boundaries. ACP is developed as part of the BeeAI open-source project.

Design Philosophy

While A2A focuses on cross-vendor interoperability, ACP is designed for enterprise orchestration, where all agents are deployed within a controlled environment and need reliable, auditable communication. Key design decisions:

Async-first: All communication is asynchronous by default. Agents submit requests and receive responses via callbacks or polling. This reflects the reality that enterprise agents often perform long-running tasks (generating reports, running analyses) that cannot be completed synchronously.
Multimodal messages: Like A2A, ACP supports text, files, and structured data in messages. This is essential for enterprise use cases where agents exchange documents, spreadsheets, and visualizations.
Agent directory: ACP includes a centralized agent registry where agents publish their capabilities and other agents discover them. This is similar to A2A's Agent Cards but uses a centralized directory rather than distributed discovery.
Event-driven communication: ACP uses an event-driven architecture where agents publish events and subscribe to events from other agents. This decouples senders from receivers and supports complex workflows.
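
The event-driven style can be illustrated with an in-process publish/subscribe bus. This is a conceptual sketch only: `EventBus`, the topic name, and the handler signature are hypothetical and do not reflect the actual ACP API, where a message broker or HTTP service would sit between agents.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical handler type: receives the event payload, returns nothing.
Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-memory publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
audit_log: list[str] = []

# Two independent consumers react to the same event; the publisher knows neither.
bus.subscribe("report.ready", lambda e: audit_log.append(f"audit: {e['report_id']}"))
bus.subscribe("report.ready", lambda e: audit_log.append(f"notify: {e['report_id']}"))

delivered = bus.publish("report.ready", {"report_id": "r-42", "thread_id": "t-1"})
print(delivered, audit_log)
```

The publishing agent never names its recipients, which is the decoupling the bullet above describes: new subscribers can be added without touching the sender.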

Key Components

Agent Registration: Agents register themselves with the directory, declaring their skills, input/output formats, and availability status.

Message Structure: ACP messages carry a conversation thread ID, allowing multi-turn interactions to be grouped and tracked, which is critical for auditability in enterprise settings.

Run Lifecycle: Similar to A2A's task lifecycle, ACP defines a "run" as the unit of work, with states like created, in-progress, awaiting, completed, failed, and cancelled.

Conceptual Example: ACP Agent Interaction

python

"""
Conceptual example of ACP agent communication within an enterprise system.
"""


class ACPAgentDirectory:
    """A centralized directory for ACP agent discovery."""

    def __init__(self):
        self.agents: dict[str, dict] = {}

    def register(self, agent_id: str, metadata: dict) -> None:
        """Register an agent in the directory."""
        self.agents[agent_id] = {
            **metadata,
            "status": "available",
        }

    def discover(self, required_skill: str) -> list[dict]:
        """Find agents that have a specific skill."""
        return [
            {"id": agent_id, **info}
            for agent_id, info in self.agents.items()
            if required_skill in info.get("skills", [])
            and info["status"] == "available"
        ]

    def update_status(self, agent_id: str, status: str) -> None:
        """Update an agent's availability status."""
        if agent_id in self.agents:
            self.agents[agent_id]["status"] = status


class ACPMessage:
    """A message in the ACP protocol."""

    def __init__(self, thread_id: str, sender: str, content: list[dict]):
        self.thread_id = thread_id
        self.sender = sender
        self.content = content  # List of parts: text, file, data


class ACPAgent:
    """An agent that communicates via ACP."""

    def __init__(self, agent_id: str, skills: list[str], directory: ACPAgentDirectory):
        self.agent_id = agent_id
        self.skills = skills
        self.directory = directory
        self.inbox: list[ACPMessage] = []

        # Register with the directory on creation
        self.directory.register(agent_id, {
            "skills": skills,
            "description": f"Agent with skills: {', '.join(skills)}",
        })

    async def send_to(self, recipient_id: str, thread_id: str, content: list[dict]) -> None:
        """Send a message to another agent."""
        message = ACPMessage(
            thread_id=thread_id,
            sender=self.agent_id,
            content=content,
        )
        # In a real implementation, the message would be handed to a message
        # broker for delivery; this sketch only logs it.
        print(f"[ACP] {message.sender} -> {recipient_id} ({message.thread_id}): {message.content}")

    async def request_work(self, skill: str, instruction: str) -> str | None:
        """Find an agent with a skill and request work."""
        candidates = self.directory.discover(skill)
        if not candidates:
            print(f"No available agent found with skill: {skill}")
            return None

        target = candidates[0]
        target_id = target["id"]
        self.directory.update_status(target_id, "busy")

        await self.send_to(
            recipient_id=target_id,
            thread_id=f"thread-{self.agent_id}-{target_id}",
            content=[{"type": "text", "text": instruction}],
        )
        return target_id

11.1 OpenAI Agents SDK (OpenAI)

python
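# Conceptual example of OpenAI Agents SDK pattern
# (simplified for educational purposes; this Agent class is a stand-in,
# not the SDK's own class)


# Placeholder tools so the example runs on its own. In a real system these
# would call billing, documentation, and ticketing services.
def lookup_invoice(invoice_id: str) -> str:
    return f"(stub) invoice {invoice_id}"


def process_refund(invoice_id: str) -> str:
    return f"(stub) refund issued for {invoice_id}"


def search_docs(query: str) -> str:
    return f"(stub) docs matching {query!r}"


def create_ticket(summary: str) -> str:
    return f"(stub) ticket created: {summary}"


class Agent:
    def __init__(
        self,
        name: str,
        instructions: str,
        tools: list | None = None,
        handoffs: list | None = None,
    ):
        self.name = name
        self.instructions = instructions
        self.tools = tools or []
        self.handoffs = handoffs or []  # List of other Agents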

class HandoffExample:
    """Demonstrates the handoff pattern."""

    def build_triage_system(self):
        """Build a customer service triage system with handoffs."""

        billing_agent = Agent(
            name="Billing Specialist",
            instructions="You handle billing questions: invoices, payments, refunds.",
            tools=[lookup_invoice, process_refund],
        )

        technical_agent = Agent(
            name="Technical Support",
            instructions="You handle technical issues: bugs, configuration, integration.",
            tools=[search_docs, create_ticket],
        )

        triage_agent = Agent(
            name="Triage Agent",
            instructions=(
                "You are the first point of contact. Determine whether the customer "
                "needs billing help or technical support, then hand off to the "
                "appropriate specialist."
            ),
            handoffs=[billing_agent, technical_agent],
        )

        return triage_agent

When to use: When you are already in the OpenAI ecosystem and need a simple, lightweight multi-agent system with clear handoff patterns. Good for customer service routing, tiered support systems, and sequential workflows.
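
To make the handoff idea concrete, here is a minimal sketch of the routing step in plain Python. It does not use the OpenAI Agents SDK: `SimpleAgent`, `route`, and the keyword lists are hypothetical, and keyword matching stands in for the decision an LLM would make from the triage instructions.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleAgent:
    """Hypothetical stand-in for an agent with optional handoff targets."""

    name: str
    keywords: tuple[str, ...] = ()
    handoffs: list["SimpleAgent"] = field(default_factory=list)


def route(agent: SimpleAgent, message: str) -> SimpleAgent:
    """Follow handoffs until reaching an agent with no matching handoff.

    Assumes the handoff graph has no cycles; specialists here hand off to no one.
    """
    text = message.lower()
    for candidate in agent.handoffs:
        if any(word in text for word in candidate.keywords):
            return route(candidate, message)
    return agent  # No specialist matched: the current agent keeps the conversation


billing = SimpleAgent("Billing Specialist", keywords=("invoice", "payment", "refund"))
technical = SimpleAgent("Technical Support", keywords=("bug", "configuration", "integration"))
triage = SimpleAgent("Triage Agent", handoffs=[billing, technical])

if __name__ == "__main__":
    print(route(triage, "I need a refund for my last invoice").name)
    print(route(triage, "Our integration keeps timing out").name)
```

The triage agent holds no tools of its own; its only job is to pick the next agent, which is what keeps this pattern lightweight.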

11.2 Claude Agent SDK (Anthropic)

Anthropic's Claude Agent SDK provides tools for building agents powered by Claude models. It emphasizes safety, extended thinking, and tool use as first-class concepts.

Key concepts:

Agentic loop: The SDK manages the core loop of thinking, acting (tool use), and observing (tool results).
Extended thinking: Claude can "think" before acting, producing a chain-of-thought that is visible to the developer but not sent back to the user.
MCP integration: Native support for MCP servers, allowing agents to connect to tools and data sources via the MCP protocol.
Computer use: Agents can interact with graphical interfaces, taking screenshots and performing mouse/keyboard actions.

python

# Conceptual example of the agentic loop behind the Claude Agent SDK pattern,
# written directly against the Anthropic Messages API
# (simplified for educational purposes)

import anthropic


def build_research_agent():
    """Build a research agent using Claude Agent SDK patterns."""
    client = anthropic.Anthropic()

    tools = [
        {
            "name": "search_papers",
            "description": "Search academic papers by query.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "max_results": {"type": "integer", "default": 5},
                },
                "required": ["query"],
            },
        },
        {
            "name": "read_pdf",
            "description": "Extract text from a PDF document.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL of the PDF"},
                },
                "required": ["url"],
            },
        },
    ]

    # The agentic loop: send a message, process tool calls, repeat
    messages = [{"role": "user", "content": "Find recent papers on multi-agent debate."}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system="You are a research assistant. Use tools to find and analyze papers.",
            tools=tools,
            messages=messages,
        )

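        # Check if the model wants to use a tool
        if response.stop_reason == "tool_use":
            # Execute tool calls and feed results back.
            # `execute_tools` is a placeholder you must supply: it should run
            # each tool_use block in the response and return a list of
            # tool_result blocks with matching tool_use_id values.
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})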
        else:
            # Model produced a final response
            return response.content

When to use: When you need agents with strong reasoning capabilities, extended thinking, or when your system is built around the MCP ecosystem. Claude's emphasis on safety makes it a good choice for high-stakes applications.
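
The loop above depends on a helper that turns the model's tool requests into tool results. One possible shape for it is sketched below. The stub tools and the `TOOL_REGISTRY` name are assumptions for illustration, and `SimpleNamespace` objects stand in for the SDK's content blocks so the sketch runs without network access. The `tool_result` dictionaries follow the Messages API shape: each carries the `tool_use_id` of the request it answers.

```python
from types import SimpleNamespace


def search_papers(query: str, max_results: int = 5) -> str:
    """Stub tool: a real version would query a paper index."""
    return f"(stub) {max_results} results for {query!r}"


def read_pdf(url: str) -> str:
    """Stub tool: a real version would download and parse the PDF."""
    return f"(stub) text of {url}"


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"search_papers": search_papers, "read_pdf": read_pdf}


def execute_tools(content_blocks) -> list[dict]:
    """Run every tool_use block and return matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # Skip text (and any other) blocks
        result = {"type": "tool_result", "tool_use_id": block.id}
        handler = TOOL_REGISTRY.get(block.name)
        if handler is None:
            result.update(content=f"Unknown tool: {block.name}", is_error=True)
        else:
            try:
                result["content"] = str(handler(**block.input))
            except Exception as exc:  # Report failures to the model instead of crashing
                result.update(content=f"Tool failed: {exc}", is_error=True)
        results.append(result)
    return results


if __name__ == "__main__":
    demo_blocks = [
        SimpleNamespace(type="text", text="Let me search."),
        SimpleNamespace(
            type="tool_use",
            id="toolu_01",
            name="search_papers",
            input={"query": "multi-agent debate"},
        ),
    ]
    print(execute_tools(demo_blocks))
```

Returning errors as tool results, rather than raising, lets the model see the failure and decide whether to retry or take another path.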

11.3 LangGraph (LangChain Ecosystem)

LangGraph takes a fundamentally different approach: it models multi-agent workflows as directed graphs where nodes are agents or functions and edges are transitions between them.

Key concepts:

StateGraph: A graph where nodes read from and write to a shared state object. Edges define the flow between nodes.
Nodes: Individual processing steps (agent calls, tool executions, conditional logic).
Edges: Transitions between nodes. Can be unconditional (always follow this path) or conditional (choose path based on state).
Checkpointing: LangGraph can save and restore state at any point, enabling human-in-the-loop workflows, error recovery, and long-running processes.
Subgraphs: Graphs can contain other graphs, allowing hierarchical composition.

python

# Conceptual example of LangGraph pattern
# (simplified for educational purposes)

from typing import TypedDict


class ResearchState(TypedDict):
    query: str
    papers: list[dict]
    analysis: str
    needs_more_research: bool
    final_report: str


def search_node(state: ResearchState) -> dict:
    """Search for relevant papers."""
    papers = search_papers(state["query"])
    return {"papers": papers}


def analyze_node(state: ResearchState) -> dict:
    """Analyze the collected papers."""
    analysis = analyze_papers(state["papers"])
    needs_more = len(state["papers"]) < 5
    return {"analysis": analysis, "needs_more_research": needs_more}


def report_node(state: ResearchState) -> dict:
    """Generate the final report."""
    report = generate_report(state["analysis"], state["papers"])
    return {"final_report": report}


def should_continue(state: ResearchState) -> str:
    """Decide whether to search more or write the report."""
    if state["needs_more_research"]:
        return "search"  # Loop back to search
    return "report"  # Proceed to report generation


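# Build the graph (requires the langgraph package; left commented out so the
# node functions above can be read on their own)
# from langgraph.graph import StateGraph, END
# graph = StateGraph(ResearchState)
# graph.add_node("search", search_node)
# graph.add_node("analyze", analyze_node)
# graph.add_node("report", report_node)
# graph.set_entry_point("search")
# graph.add_edge("search", "analyze")
# graph.add_conditional_edges("analyze", should_continue, {"search": "search", "report": "report"})
# graph.add_edge("report", END)  # Mark where the workflow finishes
# app = graph.compile()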

When to use: When your multi-agent workflow has complex control flow with loops, branches, and conditional logic. Particularly strong for workflows that need human-in-the-loop intervention, checkpointing, or long-running processes. The graph abstraction makes it easy to visualize and reason about the workflow.
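
To show what a state graph does at run time, the following sketch implements a tiny executor in plain Python. It is not LangGraph: `TinyGraph`, its `END` sentinel, and the three node functions are hypothetical, and only the general idea (nodes return partial state updates, edges choose the next node) mirrors the pattern described above.

```python
from typing import Callable

END = "__end__"  # Sentinel meaning "stop here" (this sketch's own constant)

NodeFn = Callable[[dict], dict]
Router = Callable[[dict], str]


class TinyGraph:
    """Minimal state-graph executor with plain and conditional edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, NodeFn] = {}
        self.routers: dict[str, Router] = {}
        self.entry: str | None = None
        self.trace: list[str] = []

    def add_node(self, name: str, fn: NodeFn) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.routers[src] = lambda state: dst

    def add_conditional_edges(self, src: str, decide: Router, mapping: dict[str, str]) -> None:
        self.routers[src] = lambda state: mapping[decide(state)]

    def run(self, state: dict, max_steps: int = 20) -> dict:
        if self.entry is None:
            raise ValueError("entry node not set")
        current, steps = self.entry, 0
        while current != END:
            if steps >= max_steps:
                raise RuntimeError("step limit reached; possible infinite loop")
            state = {**state, **self.nodes[current](state)}  # Merge partial update
            self.trace.append(current)
            # A node with no outgoing edge ends the run
            current = self.routers.get(current, lambda s: END)(state)
            steps += 1
        return state


def search(state: dict) -> dict:
    new = [f"paper-{len(state['papers']) + i}" for i in range(3)]
    return {"papers": state["papers"] + new}


def analyze(state: dict) -> dict:
    return {"needs_more_research": len(state["papers"]) < 5}


def report(state: dict) -> dict:
    return {"final_report": f"Report on {len(state['papers'])} papers"}


graph = TinyGraph()
for name, fn in [("search", search), ("analyze", analyze), ("report", report)]:
    graph.add_node(name, fn)
graph.entry = "search"
graph.add_edge("search", "analyze")
graph.add_conditional_edges(
    "analyze",
    lambda s: "search" if s["needs_more_research"] else "report",
    {"search": "search", "report": "report"},
)
graph.add_edge("report", END)

if __name__ == "__main__":
    final_state = graph.run({"papers": []})
    print(graph.trace, final_state["final_report"])
```

The `max_steps` guard matters for any graph with a loop: without it, a search step that never yields enough papers would cycle forever.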

11.4 CrewAI

CrewAI takes inspiration from human team dynamics, using a role-based approach where agents are defined by their role, goal, and backstory. Agents form a "crew" that collaborates on tasks.

Key concepts:

Agent: Defined by a role (e.g., "Senior Data Analyst"), a goal, and a backstory that shapes its persona.
Task: A specific piece of work assigned to an agent, with a description and expected output.
Crew: A team of agents that collaborate to complete a set of tasks.
Process: The collaboration strategy: sequential (tasks executed in order) or hierarchical (a manager agent delegates to workers).

python

# Conceptual example of CrewAI pattern
# (simplified for educational purposes)

class CrewAgent:
    def __init__(self, role: str, goal: str, backstory: str, tools: list | None = None):
        self.role = role
        self.goal = goal
        self.backstory = backstory
        self.tools = tools or []

class Task:
    def __init__(self, description: str, expected_output: str, agent: CrewAgent):
        self.description = description
        self.expected_output = expected_output
        self.agent = agent

# Define agents
researcher = CrewAgent(
    role="Senior Research Analyst",
    goal="Discover and analyze the latest trends in agentic AI.",
    backstory="A veteran AI researcher with 15 years of experience in multi-agent systems.",
)

writer = CrewAgent(
    role="Technical Writer",
    goal="Transform research findings into clear, engaging content.",
    backstory="A science communicator skilled at making complex topics accessible.",
)

# Define tasks
research_task = Task(
    description="Research the latest developments in agent communication protocols (A2A, ACP, MCP).",
    expected_output="A comprehensive summary with key findings and comparisons.",
    agent=researcher,
)

writing_task = Task(
    description="Write a blog post based on the research findings.",
    expected_output="A 1500-word blog post suitable for a technical audience.",
    agent=writer,
)

# The crew would execute: research_task -> writing_task

When to use: When you want a high-level, intuitive abstraction for multi-agent collaboration. CrewAI is particularly good for content creation pipelines, research workflows, and scenarios where role-based collaboration maps naturally to the problem. Its simplicity makes it a good starting point for teams new to multi-agent systems.
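
The sequential process can be sketched as a loop that feeds each task's output to the next task as context. This is an illustration in plain Python rather than CrewAI's API: `Worker`, `Step`, `run_sequential`, and `fake_llm` are hypothetical names, and `fake_llm` replaces a real model call with string formatting so the data flow is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Worker:
    """Hypothetical agent: a role plus a callable that does the work."""

    role: str
    act: Callable[[str, str], str]  # (task description, context) -> output


@dataclass
class Step:
    description: str
    worker: Worker


def run_sequential(steps: list[Step]) -> list[str]:
    """Run steps in order, passing each output to the next step as context."""
    outputs: list[str] = []
    context = ""
    for step in steps:
        output = step.worker.act(step.description, context)
        outputs.append(output)
        context = output  # The next worker sees only the previous result
    return outputs


def fake_llm(role: str) -> Callable[[str, str], str]:
    """Stand-in for a model call: echoes the role, task, and any context."""

    def act(description: str, context: str) -> str:
        prefix = f"[{role}] {description}"
        return f"{prefix} (using: {context})" if context else prefix

    return act


researcher_worker = Worker("Researcher", fake_llm("Researcher"))
writer_worker = Worker("Writer", fake_llm("Writer"))

if __name__ == "__main__":
    for line in run_sequential([
        Step("Summarize protocols", researcher_worker),
        Step("Write blog post", writer_worker),
    ]):
        print(line)
```

A hierarchical process would replace the fixed loop with a manager that decides which worker runs next, at the cost of an extra model call per decision.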

11.5 AutoGen (Microsoft)

AutoGen was covered in detail in Section 6. Since its initial release, it has evolved into AutoGen 0.4+ with a redesigned architecture featuring:

Asynchronous, event-driven communication between agents.
Agent runtime: A managed environment that handles message routing, agent lifecycle, and distributed execution.
Extensible agent types: Beyond conversable agents, AutoGen now supports tool-use agents, code execution agents, and custom agent types.
Multi-language support: Agents can be implemented in Python or .NET.

When to use: When you need a mature, well-tested framework with strong support for conversational multi-agent patterns, code execution, and group chat dynamics.

11.6 Framework Comparison

Framework	Creator	Core Abstraction	Best For	Learning Curve
OpenAI Agents SDK	OpenAI	Handoffs between agents	Simple routing and delegation	Low
Claude Agent SDK	Anthropic	Agentic loop + MCP tools	Safety-critical, reasoning-heavy tasks	Low-Medium
LangGraph	LangChain	State graph with nodes/edges	Complex workflows with loops and branches	Medium-High
CrewAI	CrewAI Inc.	Role-based crews and tasks	Content pipelines, research workflows	Low
AutoGen	Microsoft	Conversable agents, group chat	Multi-agent conversations, code execution	Medium

11.7 Choosing a Framework

There is no single "best" framework. The choice depends on your requirements:

If you need simplicity: Start with CrewAI or OpenAI Agents SDK. Both have gentle learning curves and handle common patterns well.
If you need complex control flow: Use LangGraph. Its graph abstraction handles loops, branches, and human-in-the-loop naturally.
If you need strong reasoning and safety: Use the Claude Agent SDK with extended thinking.
If you need multi-agent conversations: Use AutoGen, which was designed specifically for conversational multi-agent patterns.
If you need cross-framework interoperability: Implement A2A on top of whatever framework you use, so your agents can collaborate with agents built on other frameworks.

The frameworks are converging: most now support similar patterns (tool use, handoffs, multi-turn conversations). The differentiators are increasingly about ecosystem integration, model support, and developer experience rather than fundamental capabilities.

1212. Further Emerging Directions

Scaling Laws for Multi-Agent Systems

Research is beginning to explore how multi-agent system performance scales with the number of agents. Early findings suggest that adding more agents helps up to a point, after which coordination overhead dominates and performance plateaus or degrades (Li et al., 2024).
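
One intuition for why coordination overhead can come to dominate: if every agent may talk to every other agent, the number of pairwise channels grows quadratically with team size. The sketch below only counts channels in a fully connected topology. It is a toy illustration, not a model of task performance, and `pairwise_channels` is a name introduced here.

```python
def pairwise_channels(n_agents: int) -> int:
    """Number of distinct agent pairs in a fully connected team: n(n-1)/2."""
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    return n_agents * (n_agents - 1) // 2


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n} agents -> {pairwise_channels(n)} channels")
```

Doubling the team from 8 to 16 agents raises the channel count from 28 to 120, which is one reason supervisor and hierarchical topologies restrict who talks to whom.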

Self-Organizing Multi-Agent Systems

Rather than pre-defining roles and communication patterns, some research explores systems where agents autonomously organize themselves, forming teams, assigning roles, and establishing communication channels as needed. This is inspired by organizational theory and swarm intelligence.

Human-Agent Teaming

Multi-agent systems increasingly include human participants. Humans may serve as supervisors, reviewers, or collaborators within a multi-agent workflow. Designing effective human-agent interfaces that leverage the strengths of both is an active research area.

13Discussion Questions

Optimal team size: In human organizations, team effectiveness often degrades beyond 7-8 members. Does a similar limit exist for LLM-based multi-agent systems? What factors determine the optimal number of agents?
Agent trust: Should agents trust each other's outputs, or should they always verify? What is the right balance between trust (for efficiency) and verification (for accuracy)?
Emergent hierarchies: If agents are not given predefined roles, will hierarchies emerge naturally? What does this tell us about the nature of organization?
Cost-effectiveness: Multi-agent systems multiply LLM costs. Under what circumstances does the quality improvement justify the additional cost?
Accountability: When a multi-agent system makes an error, which agent is "responsible"? How should we design accountability mechanisms?
Adversarial robustness: If one agent in a multi-agent system is compromised (e.g., through prompt injection), how does this affect the entire system? How can we design resilient multi-agent architectures?
Protocol adoption: Will A2A, ACP, and MCP coexist as complementary layers, or will the ecosystem consolidate around fewer protocols? What historical precedents (e.g., the "browser wars," REST vs. SOAP) can inform our predictions?
Framework lock-in: How much does the choice of multi-agent framework constrain your architecture? What strategies can you use to remain framework-agnostic while still benefiting from framework-specific features?

14Summary and Key Takeaways

Multi-agent systems extend single-agent capabilities through specialization, verification, and parallel processing. They are most valuable for complex tasks that benefit from diverse perspectives.
Communication protocols shape system behavior. Message passing offers flexibility, shared state ensures consistency, and blackboard architectures provide structured collaboration.
Collaboration patterns should match the task. Sequential pipelines suit linear workflows, parallel execution suits independent subtasks, hierarchical structures suit complex decomposition, and debate suits reasoning tasks.
Three complementary protocols are defining the agent communication stack: MCP (model-to-tools, vertical), A2A (agent-to-agent, cross-vendor), and ACP (agent-to-agent, enterprise). Understanding when and how to use each is essential for building production multi-agent systems.
Orchestration is the key challenge. Supervisor, round-robin, and dynamic routing patterns each have trade-offs between control, flexibility, and overhead.
Modern frameworks (OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, AutoGen) offer different abstractions for building multi-agent systems. The choice depends on workflow complexity, reasoning requirements, and ecosystem constraints.
Practical concerns dominate real deployments: coordination overhead, error propagation, cost management, and termination conditions require careful engineering.
Adversarial collaboration (debate) can improve reasoning quality by forcing agents to defend their positions and address counterarguments.
The ecosystem is converging: frameworks are adopting similar patterns while protocols are enabling cross-framework interoperability. Designing framework-agnostic agents with protocol support is a sound long-term strategy.

15References

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A. H., White, R. W., Burger, D., & Wang, C. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv preprint arXiv:2308.08155.
Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2023). Improving Factuality and Reasoning in Language Models through Multiagent Debate. arXiv preprint arXiv:2305.14325.
Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Zhang, C., Wang, J., Wang, Z., Yau, S. K. S., Lin, Z., Zhou, L., Ran, C., Xiao, L., Wu, C., & Schmidhuber, J. (2024). MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. International Conference on Learning Representations (ICLR).
Bond, A. H., & Gasser, L. (1988). Readings in Distributed Artificial Intelligence. Morgan Kaufmann.
Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., & Ghanem, B. (2024). CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society. Advances in Neural Information Processing Systems (NeurIPS), 36.
Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST).
Erman, L. D., Hayes-Roth, F., Lesser, V. R., & Reddy, D. R. (1980). The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty. ACM Computing Surveys, 12(2), 213-253.
Chan, C., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., Fu, J., & Liu, Z. (2024). ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate. International Conference on Learning Representations (ICLR).
Google. (2025). Agent2Agent Protocol (A2A): An Open Protocol for Agent Interoperability. https://google.github.io/A2A/
IBM. (2025). Agent Communication Protocol (ACP). BeeAI Open Source Project. https://agentcommunicationprotocol.dev/
Anthropic. (2024). Model Context Protocol (MCP). https://modelcontextprotocol.io/
OpenAI. (2025). Agents SDK Documentation. https://openai.github.io/openai-agents-python/
LangChain. (2024). LangGraph: Build Stateful Multi-Agent Applications. https://www.langchain.com/langgraph
Moura, J. (2024). CrewAI: Framework for Orchestrating Role-Playing, Autonomous AI Agents. https://www.crewai.com/

Part of "Agentic AI: Foundations, Architectures, and Applications" (CC BY-SA 4.0).