Stop Building 'Do Everything' Agents

Architecture
September 9, 2025


You built a voice agent. It handles customer questions, processes orders, schedules appointments, updates accounts, and answers technical queries. One agent, five responsibilities. You’re proud of how much it can do.

Then you start noticing the issues:

Sometimes it confuses appointment scheduling with order processing
The tone is professional for support but weirdly formal for sales
Tool calling errors increase as you add more features
You spend more time debugging edge cases than shipping

Welcome to the “do everything” agent trap. And here’s the uncomfortable truth: your agent isn’t bad—it’s overloaded.

Let me show you why narrow, focused agents beat monolithic ones once an agent spans several domains, and how to architect agent teams with OpenAI’s Agents SDK.

The Monolithic Agent Problem

It seems logical: one smart agent that handles everything. You define all your tools, write comprehensive instructions, and let the LLM figure it out.

But here’s what actually happens:

Problem 1: Context Confusion

Your agent is simultaneously:

A friendly sales assistant
A formal support specialist
A technical troubleshooter
A billing administrator
An appointment scheduler

Each role needs different:

Tone and personality
Domain knowledge
Decision-making patterns
Tool access

One agent trying to be all of these produces inconsistent results. It blends contexts. Uses formal language where casual would work better. Calls the wrong tools because too many options exist.

Problem 2: Latent Space Dilution

“Latent space” is a loose analogy here. The model’s weights don’t change when you add a role, but its instructions and context do. When one prompt has to cover five different domains, each domain gets a thinner slice of the instructions and competes with the others for the model’s attention.

A customer support specialist agent has a focused prompt: support knowledge, empathy patterns, escalation procedures.

A do-everything agent has a diluted one: a little sales, a little support, a little technical, a little everything. Master of none.

Problem 3: Tool Calling Errors

Give an agent 30 tools, and it’ll occasionally call the wrong one.

“Create an appointment” vs “Create a support ticket”—similar enough that a confused agent picks wrong sometimes.

More tools = more opportunities for mistakes. And the mistakes compound: one wrong call early in a conversation can derail every step that follows.
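One way to make the compounding concrete is a back-of-the-envelope calculation. It assumes every tool call succeeds independently with the same probability, which is a simplification (real errors are correlated), and the accuracy figures are illustrative inputs rather than measurements. `clean_run_probability` is a name made up for this sketch.

```python
def clean_run_probability(per_call_accuracy: float, calls: int) -> float:
    """Probability that `calls` independent tool calls all succeed."""
    return per_call_accuracy ** calls


for accuracy in (0.88, 0.97):
    p = clean_run_probability(accuracy, calls=5)
    print(f"{accuracy:.0%} per call -> {p:.1%} clean 5-call conversations")
```

Under these assumptions, 88% per-call accuracy gives roughly a 53% chance of a clean five-call conversation, versus about 86% at 97%.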

Problem 4: Maintenance Nightmare

When something breaks, which part of your monolithic agent broke?

Was it the sales instructions conflicting with support tone? The technical troubleshooting tools interfering with billing queries? The appointment scheduling logic getting confused with task creation?

Debugging a 2000-line instruction set is hell. Improving one part often breaks another.

The Solution: Role-Split Agent Teams

Instead of one agent doing everything, design teams of focused agents, each with narrow, well-defined responsibilities.

Think about how real teams work:

Bad: Hire one person to do sales, support, engineering, marketing, and accounting.

Good: Hire specialists. Each person focuses on one domain. They hand off to each other when needed.

Voice agents work the same way.

Real-World Example: E-Commerce Agent Team

Let’s replace a monolithic e-commerce agent with a focused team:

The Monolith (Before)

One agent handling:

Product questions
Order placement
Order tracking
Returns processing
Technical support
Account management

Instruction length: 2,400 words

Tool count: 28 tools

Error rate: 12% (wrong tool calls, tone inconsistencies)

The Team (After)

1. Concierge Agent (Router)

Greets customer
Understands what they need
Routes to specialist
Context: conversational, warm

2. Product Specialist Agent

Answers product questions
Provides recommendations
Shares specs and comparisons
Context: knowledgeable, helpful

3. Order Agent

Places orders
Processes payments
Confirms details
Context: efficient, accurate

4. Support Agent

Handles issues
Processes returns
Troubleshoots problems
Context: empathetic, solution-focused

5. Account Agent

Manages profile updates
Handles billing
Updates preferences
Context: professional, secure

Average instruction length per agent: 400 words

Average tools per agent: 5-6 tools

Error rate: 3% (agents focused, fewer conflicting options)

The Architecture: How Agent Teams Actually Work

Here’s the pattern with OpenAI’s Agents SDK:

graph TD
    A[Customer initiates conversation] --> B[Concierge Agent]
    B --> C{Analyzes need}
    C -->|Product question| D[Product Specialist]
    C -->|Place order| E[Order Agent]
    C -->|Issue/return| F[Support Agent]
    C -->|Account change| G[Account Agent]
    D --> H{Need different specialist?}
    E --> H
    F --> H
    G --> H
    H -->|Yes| B
    H -->|No| I[Complete and return to Concierge]
    I --> J{More needs?}
    J -->|Yes| C
    J -->|No| K[End conversation]

The flow:

Concierge gathers information and routes
Specialist handles specific task with focus
Returns to concierge if customer has additional needs
Concierge routes to next specialist if needed
Clean handoffs preserve context throughout
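The flow above can be sketched as a small routing table. This is a framework-agnostic illustration in Python, not Agents SDK code. The intent labels and the `route` and `run_flow` helpers are hypothetical names chosen for this example.

```python
# Hypothetical intent labels mapped to the specialists in the diagram.
ROUTES = {
    "product_question": "product",
    "place_order": "orders",
    "issue_or_return": "support",
    "account_change": "account",
}


def route(intent: str) -> str:
    """Return the specialist for an intent; unknown intents stay with the concierge."""
    return ROUTES.get(intent, "concierge")


def run_flow(intents: list[str]) -> list[str]:
    """Trace which agent is active as a customer raises each need in turn."""
    trace = ["concierge"]  # every conversation starts with the router
    for intent in intents:
        trace.append(route(intent))  # specialist handles the task
        trace.append("concierge")    # then control returns to the router
    return trace


print(run_flow(["product_question", "place_order"]))
# ['concierge', 'product', 'concierge', 'orders', 'concierge']
```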

Building Focused Agents With The Agents SDK

Here’s a sketch of what this looks like as Realtime session configurations, one per agent, plus the handlers that connect them:

Focused Session Definitions (Router + Specialists)

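// Concierge (router): routing tools only. route_to_support and route_to_account
// would follow the same shape and are omitted here.
const conciergeSession = {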
  type: "realtime",
  model: "gpt-realtime",
  modalities: ["audio", "text"],
  tools: [
    {
      type: "function",
      name: "route_to_product",
      description: "Route to product specialist with context.",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Product-related customer request" },
          context: { type: "string", description: "Conversation summary" }
        },
        required: ["query", "context"]
      }
    },
    {
      type: "function",
      name: "route_to_orders",
      description: "Route to order specialist with context.",
      parameters: {
        type: "object",
        properties: {
          intent: { type: "string", description: "Order intent" },
          context: { type: "string", description: "Conversation summary" }
        },
        required: ["intent", "context"]
      }
    }
  ]
};

const productSession = {
  type: "realtime",
  model: "gpt-realtime",
  tools: [
    {
      type: "function",
      name: "search_products",
      description: "Search catalog for matching products.",
      parameters: {
        type: "object",
        properties: { query: { type: "string", description: "Search query" } },
        required: ["query"]
      }
    }
  ]
};

const orderSession = {
  type: "realtime",
  model: "gpt-realtime",
  tools: [
    {
      type: "function",
      name: "create_order",
      description: "Create a new order.",
      parameters: {
        type: "object",
        properties: {
          product_ids: { type: "array", items: { type: "string" }, description: "Products to order" },
          quantity: { type: "number", description: "Quantity" },
          shipping_address: { type: "string", description: "Destination" }
        },
        required: ["product_ids", "quantity", "shipping_address"]
      }
    }
  ]
};

// Map each tool name to its handler. `router`, `catalog`, and `orders` are
// application-provided services that are not shown here.
const toolHandlers = {
  route_to_product: async (payload) => router.handoff("product", payload),
  route_to_orders: async (payload) => router.handoff("orders", payload),
  search_products: async (payload) => catalog.search(payload),
  create_order: async (payload) => orders.create(payload)
};
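Before a handler runs, it can help to check the model's arguments against the tool's `required` list, since a malformed call is cheaper to reject than to execute. The sketch below is in Python rather than JavaScript. The `dispatch` helper and the stand-in handler are hypothetical and not part of any SDK.

```python
import json

# Tool definition in the same shape as the session configs above.
SEARCH_PRODUCTS = {
    "type": "function",
    "name": "search_products",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

TOOLS = {SEARCH_PRODUCTS["name"]: SEARCH_PRODUCTS}

# Stand-in handler; a real one would query the catalog.
HANDLERS = {"search_products": lambda args: {"query": args["query"], "results": []}}


def dispatch(name: str, raw_arguments: str) -> dict:
    """Validate a tool call's JSON arguments, then run the matching handler."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return {"error": "arguments are not valid JSON"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    missing = [key for key in TOOLS[name]["parameters"]["required"] if key not in args]
    if missing:
        return {"error": f"missing required arguments: {missing}"}
    return HANDLERS[name](args)


print(dispatch("search_products", '{"query": "desk lamp"}'))
print(dispatch("search_products", "{}"))
```

Returning an error object instead of raising lets you feed the problem back to the model as the tool result, so it can correct the call.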

The Orchestration Layer

class AgentTeamOrchestrator:
    """Runs one conversation across a team of focused agents.

    `agents` maps role names to agent objects that expose async `process()`
    and `receive_context()`. `get_input` is an async callable that returns the
    customer's next utterance. The helpers announce_handoff, get_customer_info,
    summarize_conversation, get_completed_actions, and detect_current_intent
    are application-specific and are not shown here.
    """

    def __init__(self, agents, get_input):
        # Expected keys: 'concierge', 'product', 'orders', 'support', 'account'
        self.agents = agents
        self.get_input = get_input

        self.active_agent = 'concierge'
        self.conversation_context = []

    async def handle_conversation(self, customer_id):
        while True:
            # Wait for the customer's next utterance
            customer_input = await self.get_input(customer_id)

            # Current agent processes input
            response = await self.agents[self.active_agent].process(
                customer_input,
                context=self.conversation_context
            )

            # Log what happened
            self.conversation_context.append({
                'agent': self.active_agent,
                'action': response.action,
                'result': response.result
            })

            # Check if routing to a different agent
            if response.route_to and response.route_to != self.active_agent:
                handoff_context = self.prepare_handoff_context()

                # Switch agents
                await self.hand_off(response.route_to, handoff_context)

            # Check if conversation complete
            if response.conversation_ended:
                break

    async def hand_off(self, target_agent, context):
        previous_agent = self.active_agent
        self.active_agent = target_agent

        # Narrate the transition
        await self.announce_handoff(previous_agent, target_agent)

        # New agent receives context
        await self.agents[target_agent].receive_context(context)

    def prepare_handoff_context(self):
        # Extract what's relevant for next agent
        return {
            'customer_info': self.get_customer_info(),
            'conversation_summary': self.summarize_conversation(),
            'completed_actions': self.get_completed_actions(),
            'current_intent': self.detect_current_intent()
        }
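To see the handoff loop run end to end, here is a stripped-down, self-contained version with scripted stub agents. `Response`, `StubAgent`, and `run_conversation` are hypothetical names for this illustration. A real agent would call a model instead of returning canned replies.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    result: str
    route_to: Optional[str] = None
    conversation_ended: bool = False


class StubAgent:
    """Returns canned responses keyed by the customer's utterance."""

    def __init__(self, script: dict[str, Response]):
        self.script = script

    async def process(self, customer_input: str) -> Response:
        return self.script.get(customer_input, Response("Sorry, could you rephrase?"))


async def run_conversation(agents: dict[str, StubAgent], utterances: list[str]) -> list[tuple[str, str]]:
    """Feed utterances to the active agent, switching agents on route_to."""
    active = "concierge"
    log = []
    for text in utterances:
        response = await agents[active].process(text)
        log.append((active, response.result))
        if response.conversation_ended:
            break
        if response.route_to:
            active = response.route_to  # hand off before the next turn
    return log


agents = {
    "concierge": StubAgent({"I want to buy": Response("Connecting you to orders.", route_to="orders")}),
    "orders": StubAgent({"Two lamps": Response("Order placed.", conversation_ended=True)}),
}
log = asyncio.run(run_conversation(agents, ["I want to buy", "Two lamps"]))
print(log)
# [('concierge', 'Connecting you to orders.'), ('orders', 'Order placed.')]
```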

Why This Works: The Quality Improvement

1. Focused Latent Space

Each agent’s instructions and context go deep in one domain instead of shallow across many.

Product specialist really understands products. Order agent really understands order processing.

Quality goes up when agents aren’t trying to remember 30 different contexts.

2. Consistent Tone Per Role

Product specialist: knowledgeable and helpful
Order agent: efficient and accurate
Support agent: empathetic and solution-focused
Account agent: professional and secure

Each role has its own personality. No more tone-switching mid-conversation.

3. Reduced Tool Confusion

Each agent has 5-6 relevant tools, not 30 possible tools.

Less choice = fewer mistakes.
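One way to enforce this is to keep a single tool registry but expose only an agent's own slice to the model. The registry contents and the `tools_for` helper below are hypothetical.

```python
# Hypothetical registry: tool name -> owning agent.
TOOL_OWNERS = {
    "search_products": "product",
    "compare_products": "product",
    "create_order": "orders",
    "process_payment": "orders",
    "create_support_ticket": "support",
    "start_return": "support",
    "update_profile": "account",
}


def tools_for(agent: str) -> list[str]:
    """Return only the tools the given agent is allowed to see."""
    return sorted(name for name, owner in TOOL_OWNERS.items() if owner == agent)


print(tools_for("orders"))
# ['create_order', 'process_payment']
```

With this split, the order agent never even sees `create_support_ticket`, so it cannot pick it by mistake.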

4. Easier Debugging

Something wrong with product recommendations? Debug the product specialist.

Order processing error? Look at the order agent.

The problem space is isolated. Fixes don’t break other agents.

5. Independent Improvement

Want to add new product features? Update product specialist only.

Want better support responses? Improve support agent without touching others.

Changes are surgical, not architectural.

The Results: Real Teams, Real Improvements

Teams switching from monolithic to role-split agents report:

Accuracy: 40% improvement
Focused agents make fewer errors. Tool calling accuracy went from 88% to 97%, which means the error rate fell from 12% to 3%.

Development speed: 3x faster
Smaller, focused agents are easier to build, test, and improve.

Maintenance time: 60% reduction
Debugging one 400-word agent beats debugging one 2400-word monolith.

Customer satisfaction: 25% increase
Consistent tone and better accuracy make conversations feel more professional.

One engineering lead told us: “We had one massive agent that was a nightmare to maintain. We split it into five focused agents. Our error rate dropped by half, development got faster, and the customer experience improved. We should have done this from day one.”

Common Agent Team Patterns

Pattern 1: Router + Specialists

Concierge routes to specialists based on need. Specialists handle tasks, return to concierge.

Best for: Multi-domain products (e-commerce, SaaS platforms)

Pattern 2: Sequential Pipeline

Each agent handles one stage in a process:

Qualifier → Designer → Estimator → Approver

Best for: Structured workflows (project intake, loan processing)
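A sequential pipeline can be sketched as a list of stage functions that each enrich a shared record. The stage logic below (budget threshold, hour counts, flat rate) is invented purely for illustration. In a real system each stage would be an agent.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def qualifier(record: dict) -> dict:
    return {**record, "qualified": record["budget"] >= 1000}


def designer(record: dict) -> dict:
    # Invented rule: larger budgets get a larger design scope.
    return {**record, "hours": 30 if record["budget"] >= 3000 else 10}


def estimator(record: dict) -> dict:
    # Invented flat rate of 100 per hour.
    return {**record, "estimate": record["hours"] * 100}


def approver(record: dict) -> dict:
    approved = record["qualified"] and record["estimate"] <= record["budget"]
    return {**record, "approved": approved}


def run_pipeline(record: dict, stages: list[Stage]) -> dict:
    """Pass the record through each stage in order."""
    for stage in stages:
        record = stage(record)
    return record


result = run_pipeline({"budget": 5000}, [qualifier, designer, estimator, approver])
print(result)
```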

Pattern 3: Collaborative Swarm

Multiple specialists consult in parallel, then one agent synthesizes:

Researcher gathers info → Analysts provide insights in parallel → Strategist synthesizes

Best for: Complex decision-making (investment advice, medical diagnosis)
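The parallel step can be sketched with `asyncio.gather`, which runs several coroutines concurrently and returns their results in argument order. The `analyst` and `consult` functions are stand-ins. Real ones would each call a model, and the final step would synthesize rather than join strings.

```python
import asyncio


async def analyst(name: str, findings: str) -> str:
    await asyncio.sleep(0)  # stand-in for a model call
    return f"{name}: view on {findings}"


async def consult(findings: str) -> str:
    # Analysts run concurrently; gather preserves argument order.
    insights = await asyncio.gather(
        analyst("risk", findings),
        analyst("market", findings),
    )
    # Stand-in for the strategist: just join the insights.
    return " | ".join(insights)


summary = asyncio.run(consult("Q3 data"))
print(summary)
# risk: view on Q3 data | market: view on Q3 data
```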

Pattern 4: Hierarchical Escalation

L1 agent → L2 specialist → L3 expert

Best for: Support systems with tiered expertise
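Tiered escalation can be sketched as trying each level in order until one can resolve the issue. The difficulty scores and tier limits below are hypothetical. In practice the signal might be a failed resolution attempt rather than a number.

```python
# Hypothetical tiers: (name, maximum difficulty the tier can resolve).
TIERS = [("L1 agent", 3), ("L2 specialist", 6), ("L3 expert", 10)]


def escalate(difficulty: int) -> str:
    """Return the first tier able to handle an issue of the given difficulty."""
    for name, max_difficulty in TIERS:
        if difficulty <= max_difficulty:
            return name
    raise ValueError(f"no tier can handle difficulty {difficulty}")


print(escalate(5))
# L2 specialist
```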

Implementation Best Practices

1. Design Clear Boundaries

Each agent should have:

One primary responsibility
Clear scope (what they do, what they don’t)
Defined handoff triggers

Bad boundary: “Customer agent handles account and support”

Good boundary: “Account agent handles profile, billing, settings. Support agent handles issues, returns, refunds.”
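Boundaries like these can be written down as data and checked for overlap, so that no topic is claimed by two agents. The `SCOPES` table and the `find_overlaps` helper are hypothetical names for this sketch.

```python
from itertools import combinations

# Topics each agent owns, taken from the "good boundary" example.
SCOPES = {
    "account": {"profile", "billing", "settings"},
    "support": {"issues", "returns", "refunds"},
}


def find_overlaps(scopes: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return topics claimed by more than one agent, keyed by agent pair."""
    overlaps = {}
    for (a, topics_a), (b, topics_b) in combinations(scopes.items(), 2):
        shared = topics_a & topics_b
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


print(find_overlaps(SCOPES))
# {}
```

An empty result means every topic has exactly one owner. A non-empty one points at a boundary worth redrawing.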

2. Optimize Context Passing

Don’t dump full transcripts on every handoff. Pass:

Relevant entities (customer, products, orders)
Current intent
Completed actions
Tone indicators (sentiment, urgency)

const handoffContext = {
  customer_id: "12345",
  completed_actions: ["searched_products", "viewed_details"],
  current_intent: "ready_to_buy",
  products_interested: ["SKU-123", "SKU-456"],
  sentiment: "positive"
};
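A compact context like the one above can be assembled from the conversation log rather than by hand. A minimal Python sketch, assuming a hypothetical log format where each entry records the action taken and, optionally, the product it touched:

```python
def unique(items) -> list:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def build_handoff_context(customer_id: str, log: list[dict], intent: str, sentiment: str) -> dict:
    """Summarize a conversation log into the fields the next agent needs."""
    return {
        "customer_id": customer_id,
        "completed_actions": unique(entry["action"] for entry in log),
        "current_intent": intent,
        "products_interested": unique(e["product"] for e in log if "product" in e),
        "sentiment": sentiment,
    }


log = [
    {"action": "searched_products"},
    {"action": "viewed_details", "product": "SKU-123"},
    {"action": "viewed_details", "product": "SKU-456"},
]
context = build_handoff_context("12345", log, intent="ready_to_buy", sentiment="positive")
print(context)
```

The intent and sentiment are passed in here. Detecting them is a separate problem that this sketch does not attempt.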

3. Maintain Conversation Flow

Handoffs should feel seamless to customers:

Bad: “I’m going to transfer you to another agent now. Please hold.”
[30 second wait]
“Hi, how can I help you?”

Good: “Let me connect you with our order specialist who can complete that for you.”
[2 second handoff]
“Hi! I can see you’re interested in the Pro plan and want to place an order. Let’s get that done.”

4. Monitor Agent Performance

Track metrics per agent:

Task completion rate
Average handling time
Tool calling accuracy
Handoff frequency
Customer satisfaction

If one agent is struggling, improve it independently.
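Per-agent tracking can start as a simple aggregation over event records. The event fields below (`agent`, `completed`, `tool_calls`, `tool_errors`) are a hypothetical schema, and the sample numbers are made up.

```python
from collections import defaultdict


def per_agent_metrics(events: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate task completion rate and tool calling accuracy per agent."""
    totals = defaultdict(lambda: {"tasks": 0, "completed": 0, "tool_calls": 0, "tool_errors": 0})
    for event in events:
        t = totals[event["agent"]]
        t["tasks"] += 1
        t["completed"] += int(event["completed"])
        t["tool_calls"] += event["tool_calls"]
        t["tool_errors"] += event["tool_errors"]

    report = {}
    for agent, t in totals.items():
        report[agent] = {
            "completion_rate": t["completed"] / t["tasks"],
            # An agent with no tool calls has made no tool errors.
            "tool_accuracy": 1 - t["tool_errors"] / t["tool_calls"] if t["tool_calls"] else 1.0,
        }
    return report


events = [
    {"agent": "orders", "completed": True, "tool_calls": 4, "tool_errors": 0},
    {"agent": "orders", "completed": False, "tool_calls": 6, "tool_errors": 1},
    {"agent": "support", "completed": True, "tool_calls": 5, "tool_errors": 0},
]
report = per_agent_metrics(events)
print(report)
```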

5. Graceful Degradation

What if a specialist isn’t available?

Option A: Fallback to concierge with limited capability
“Our product specialist is helping others, but I can answer basic questions.”

Option B: Queue with callback
“Our order specialist is backed up. Can I have them call you back in 5 minutes?”

Option C: Cross-trained backup
Support agent can handle basic orders if order agent is overloaded.
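The three options can be combined into a fallback order. The sketch below tries a cross-trained backup first, then the concierge, then a callback queue. That ordering is one reasonable choice, not the only one, and the `BACKUPS` table and `pick_agent` helper are hypothetical.

```python
# Hypothetical cross-training table: specialist -> backup that can cover basics.
BACKUPS = {"orders": "support"}


def pick_agent(wanted: str, available: set[str]) -> tuple[str, str]:
    """Return (agent, mode) for a request, degrading when the specialist is busy."""
    if wanted in available:
        return wanted, "full"
    backup = BACKUPS.get(wanted)
    if backup in available:
        return backup, "cross_trained_backup"   # Option C
    if "concierge" in available:
        return "concierge", "limited_fallback"  # Option A
    return wanted, "queue_callback"             # Option B


print(pick_agent("orders", {"support", "concierge"}))
# ('support', 'cross_trained_backup')
```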

When to Split, When to Combine

Split when:

Agent handles 3+ distinct domains
Tool count exceeds 15-20
Tone needs vary by task
Error rate increases with new features
Instructions exceed 1000 words
Maintenance becomes difficult

Keep monolithic when:

Single, narrow domain
Tools are all closely related
Tone is consistent across all tasks
<10 tools total
Instructions are <500 words
Rarely needs updates
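The split criteria can be turned into a rough checklist function. The thresholds come from the lists above (using the low end of the 15-20 tool range). Treating two or more signals as the cutoff is an assumption of this sketch, and `split_signals` and `should_split` are hypothetical names.

```python
def split_signals(domains: int, tools: int, instruction_words: int, tone_varies: bool) -> list[str]:
    """List which split criteria an agent currently meets."""
    signals = []
    if domains >= 3:
        signals.append("3+ distinct domains")
    if tools > 15:
        signals.append("tool count exceeds 15")
    if instruction_words > 1000:
        signals.append("instructions exceed 1000 words")
    if tone_varies:
        signals.append("tone varies by task")
    return signals


def should_split(**agent_stats) -> bool:
    # Assumption: two or more signals is treated as "time to split".
    return len(split_signals(**agent_stats)) >= 2


# The e-commerce monolith from earlier: 6 responsibilities, 28 tools, 2,400 words.
monolith = dict(domains=6, tools=28, instruction_words=2400, tone_varies=True)
print(split_signals(**monolith), should_split(**monolith))
```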

The Future: Even Smarter Agent Teams

What’s coming:

Dynamic team assembly: System creates specialists on-demand based on need

Shared learning: Agents learn from each other’s interactions

Parallel consultation: Multiple specialists provide input simultaneously

Agent skill graphs: Visual mapping of which agents are best at what

But the core pattern—narrow roles, clear boundaries, smooth handoffs—works today.

Ready to Fix Your Monolithic Agent?

If your agent is trying to do everything and producing inconsistent results, split it into focused specialists.

The technology exists. OpenAI’s Agents SDK provides handoffs as a built-in primitive, and the conversation so far carries over to the receiving agent.

The question is: how much longer are you willing to debug a do-everything agent?

Want to learn more? Check out OpenAI’s Agents SDK documentation for multi-agent architecture patterns and the Function Calling guide for building agent teams that scale with quality.