Design Tools That Tell Voice Agents What To Say
- ZH+
- SDK Development, Architecture
- January 28, 2026
Your voice agent calls a tool that takes 30 seconds. Users hear silence. They think it crashed. They hang up.
The agent doesn’t know the tool is running, so it doesn’t say anything. You try to fix it with prompts: “Tell users to wait during operations.” The agent forgets. Or says “Please wait” for a 0.2-second database lookup.
The problem isn’t the prompt. It’s tool schemas that don’t guide voice narration.
The Long-Running Tool Problem
Text agents can show a spinner. Voice agents have nothing. Just dead air.
Here’s what happens:
```javascript
// Standard tool schema (text-optimized)
const analyzeDataTool = {
  name: "analyze_customer_data",
  description: "Analyzes customer purchasing patterns",
  parameters: {
    type: "object",
    properties: {
      customer_id: { type: "string" },
      date_range: { type: "string" }
    }
  }
};

// Voice agent behavior:
// User: "Analyze customer patterns for the last year"
// Agent: "I'll analyze that now."
// [30 seconds of silence]
// Agent: "Analysis complete. Here are the results..."
// (User already hung up)
```
The tool schema tells the agent WHAT to do, but not WHAT TO SAY while doing it.
Tool Guidance: Voice-Aware Tool Design
Add narration guidance to your tool schemas:
```javascript
const voiceGuidedTool = {
  name: "analyze_customer_data",
  description: "Analyzes customer purchasing patterns",
  parameters: {
    type: "object",
    properties: {
      customer_id: { type: "string" },
      date_range: { type: "string" }
    }
  },

  // NEW: Voice guidance
  voice_guidance: {
    estimated_duration: "25-30 seconds",
    narration: "Tell user: 'Analyzing customer data from [date_range]. This usually takes about 30 seconds.'",
    progress_updates: "If duration exceeds 15 seconds, say: 'Still processing, almost there.'",
    completion: "When complete, say: 'Analysis finished' then summarize results."
  }
};
```
Now the agent knows:
- How long it takes (set user expectations)
- What to say before calling (eliminate surprise silence)
- When to update (prevent abandonment)
- How to close (smooth transition to results)
Pattern 1: Duration Expectations
Set expectations BEFORE calling the tool:
```javascript
const longRunningTool = {
  name: "generate_report",
  description: "Generates comprehensive PDF report",
  parameters: {
    type: "object",
    properties: {
      report_type: { type: "string" },
      filters: { type: "object" }
    }
  },
  voice_guidance: {
    estimated_duration: "45-60 seconds",
    // Agent says this BEFORE calling the tool
    pre_execution: "This generates a full PDF report and takes about a minute. I'll let you know when it's ready."
  }
};

// Voice agent behavior:
// User: "Generate Q4 sales report"
// Agent: "I'll generate that Q4 sales report. This creates a full PDF and takes about a minute. Starting now..."
// [Agent calls tool]
// [Users know to wait, don't hang up]
```
Users tolerate wait times when they’re told what to expect.
Pattern 2: Progress Checkpoints
For tools longer than 15 seconds, add progress updates:
```javascript
const dataProcessingTool = {
  name: "process_batch_upload",
  description: "Processes uploaded CSV file",
  parameters: {
    type: "object",
    properties: {
      file_id: { type: "string" },
      validation_level: { type: "string" }
    }
  },
  voice_guidance: {
    estimated_duration: "30-90 seconds (depends on file size)",
    pre_execution: "Processing your CSV file. Depending on size, this can take 30 seconds to a couple minutes.",
    progress_checkpoints: [
      { at: "15s", say: "Still processing your file. It's validating the data." },
      { at: "45s", say: "Taking a bit longer than usual. Almost done." },
      { at: "75s", say: "This is a large file. Should finish any moment now." }
    ]
  }
};
```
This requires streaming tool state back to the agent. Implementation:
```javascript
// Tool implementation with progress streaming
async function processBatchUpload(fileId, validationLevel, progressCallback) {
  const startTime = Date.now();

  // Stage 1: Load file
  const file = await loadFile(fileId);
  if (Date.now() - startTime > 15000) {
    progressCallback("Still processing your file. It's validating the data.");
  }

  // Stage 2: Validate
  const validationResults = await validate(file, validationLevel);
  if (Date.now() - startTime > 45000) {
    progressCallback("Taking a bit longer than usual. Almost done.");
  }

  // Stage 3: Process
  const results = await processRows(validationResults);

  return {
    success: true,
    rowsProcessed: results.length,
    errors: results.filter(r => r.error).length
  };
}
```
The agent receives progress callbacks and narrates them to the user. No silence.
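The checkpoint-selection logic itself is small enough to sketch standalone. Here's a minimal, framework-agnostic version; the `dueCheckpoint` helper and the numeric `at` thresholds are illustrative, not part of any SDK:

```javascript
// Checkpoints mirroring the schema above, with thresholds in seconds.
const checkpoints = [
  { at: 15, say: "Still processing your file. It's validating the data." },
  { at: 45, say: "Taking a bit longer than usual. Almost done." },
  { at: 75, say: "This is a large file. Should finish any moment now." }
];

// Returns the latest checkpoint whose threshold has passed, or null.
function dueCheckpoint(elapsedSeconds, checkpoints) {
  const passed = checkpoints.filter(c => elapsedSeconds >= c.at);
  return passed.length ? passed[passed.length - 1] : null;
}
```

The executor would call this periodically with the elapsed time and narrate the result, skipping checkpoints it has already spoken.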
Pattern 3: Context-Aware Narration
Different tools need different narration styles:
```javascript
const searchTools = {
  // Fast tool - no narration needed
  lookup_account: {
    name: "lookup_account",
    description: "Gets account info by ID",
    parameters: { /* ... */ },
    voice_guidance: {
      estimated_duration: "0.5-2 seconds",
      narration: "silent", // Too fast to narrate
      completion: "Immediately present results without announcing"
    }
  },

  // Medium tool - brief narration
  search_knowledge_base: {
    name: "search_knowledge_base",
    description: "Searches documentation",
    parameters: { /* ... */ },
    voice_guidance: {
      estimated_duration: "3-5 seconds",
      narration: "Let me search our documentation.",
      completion: "Found [count] results. Here's what I found..."
    }
  },

  // Long tool - detailed narration
  generate_custom_analysis: {
    name: "generate_custom_analysis",
    description: "Runs custom data analysis",
    parameters: { /* ... */ },
    voice_guidance: {
      estimated_duration: "20-30 seconds",
      pre_execution: "This runs a custom analysis on your data, which takes about 30 seconds.",
      progress_at: "15 seconds",
      progress_say: "Analysis is still running. Processing your data now.",
      completion: "Analysis complete. Here are the insights..."
    }
  }
};
```
Rule of thumb:
- Under 2 seconds: No narration (just do it)
- 2-10 seconds: Brief pre-execution (“Let me check that”)
- 10-30 seconds: Pre-execution + expected duration
- Over 30 seconds: Pre-execution + duration + progress updates
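This rule of thumb can be encoded as a small helper so the runtime picks a style automatically. A sketch, with tier names invented for illustration:

```javascript
// Map an estimated duration (seconds) to a narration tier.
// Tier names are illustrative, not part of any real SDK.
function pickNarrationTier(estimatedSeconds) {
  if (estimatedSeconds < 2) return "silent";    // just do it
  if (estimatedSeconds <= 10) return "brief";   // "Let me check that"
  if (estimatedSeconds <= 30) return "announce"; // pre-execution + expected duration
  return "full";                                 // + progress updates
}
```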
Pattern 4: Error Guidance
Tell the agent what to say when tools fail:
```javascript
const errorAwareTool = {
  name: "charge_credit_card",
  description: "Processes payment",
  parameters: {
    type: "object",
    properties: {
      amount: { type: "number" },
      card_token: { type: "string" }
    }
  },
  voice_guidance: {
    estimated_duration: "3-5 seconds",
    pre_execution: "Processing your payment now.",
    completion: "Payment successful. Transaction ID is [id].",

    // NEW: Error handling guidance
    on_error: {
      "card_declined": "Your card was declined. Would you like to try a different payment method?",
      "insufficient_funds": "There aren't enough funds available. Would you like to try a different card?",
      "network_error": "I'm having trouble connecting to the payment processor. Can we try again in a moment?",
      "default": "I couldn't process that payment. Let me transfer you to someone who can help."
    }
  }
};

// Agent usage:
async function handlePayment(amount, token) {
  const guidance = errorAwareTool.voice_guidance;
  try {
    const result = await chargeCreditCard(amount, token);
    agent.say(guidance.completion.replace('[id]', result.transactionId));
  } catch (error) {
    const errorMsg = guidance.on_error[error.code] || guidance.on_error.default;
    agent.say(errorMsg);
  }
}
```
Errors get specific, user-friendly explanations instead of generic “something went wrong.”
Real-World Example: Multi-Stage Tool
Complex tools have multiple phases. Guide narration for each:
```javascript
const onboardingTool = {
  name: "create_enterprise_account",
  description: "Sets up new enterprise customer account",
  parameters: {
    type: "object",
    properties: {
      company_name: { type: "string" },
      admin_email: { type: "string" },
      seat_count: { type: "number" },
      billing_info: { type: "object" }
    }
  },
  voice_guidance: {
    estimated_duration: "60-90 seconds",
    pre_execution: "I'll set up your enterprise account now. This involves creating your workspace, configuring billing, and sending admin invitations. It takes about a minute.",
    stages: [
      {
        name: "create_workspace",
        duration: "10-15 seconds",
        say_before: "Creating your workspace...",
        say_after: "Workspace created."
      },
      {
        name: "configure_billing",
        duration: "20-30 seconds",
        say_before: "Setting up your billing...",
        say_after: "Billing configured."
      },
      {
        name: "send_invitations",
        duration: "15-20 seconds",
        say_before: "Sending admin invitations...",
        say_after: "Invitations sent."
      },
      {
        name: "generate_api_keys",
        duration: "5-10 seconds",
        say_before: "Generating API keys...",
        say_after: "API keys ready."
      }
    ],
    completion: "All set! Your enterprise account is active. Admin invitations were sent to [email]. You'll receive your API keys via email in a few minutes."
  }
};
```
Agent behavior with stage guidance:
User: "Set up our enterprise account"
Agent: "I'll set up your enterprise account now. This involves creating your workspace, configuring billing, and sending admin invitations. It takes about a minute."
[10s] Agent: "Creating your workspace..."
[25s] Agent: "Workspace created. Setting up your billing..."
[55s] Agent: "Billing configured. Sending admin invitations..."
[75s] Agent: "Invitations sent. Generating API keys..."
[85s] Agent: "API keys ready."
Agent: "All set! Your enterprise account is active. Admin invitations were sent to admin@company.com. You'll receive your API keys via email in a few minutes."
User knows exactly what’s happening. No surprise silence. No abandonment.
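A stage runner for this pattern takes only a few lines. This sketch is deliberately synchronous for clarity; a real implementation would `await` each stage and each `agent.say` call. The `narrateStages` function and its arguments are illustrative, not from any particular SDK:

```javascript
// Walk the stages from voice_guidance.stages, narrating around each one.
// `say` speaks a line; `runStage` executes the stage implementation by name.
function narrateStages(stages, say, runStage) {
  for (const stage of stages) {
    if (stage.say_before) say(stage.say_before);
    runStage(stage.name);
    if (stage.say_after) say(stage.say_after);
  }
}
```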
Implementation: Voice Guidance Middleware
Create middleware that reads tool guidance and injects narration:
```javascript
class VoiceGuidedToolExecutor {
  constructor(agent) {
    this.agent = agent;
  }

  async executeTool(tool, parameters) {
    const guidance = tool.voice_guidance;
    if (!guidance) {
      // No guidance - just execute
      return await tool.execute(parameters);
    }

    // Pre-execution narration
    if (guidance.pre_execution) {
      await this.agent.say(guidance.pre_execution);
    }

    // Start execution with progress tracking
    const toolPromise = tool.execute(parameters);

    // Setup progress checkpoints
    if (guidance.progress_checkpoints) {
      this.scheduleProgressUpdates(guidance.progress_checkpoints, toolPromise);
    }

    // Wait for completion
    try {
      const result = await toolPromise;
      // Completion narration
      if (guidance.completion) {
        const completionMsg = this.interpolate(guidance.completion, result);
        await this.agent.say(completionMsg);
      }
      return result;
    } catch (error) {
      // Error narration
      if (guidance.on_error) {
        const errorMsg = guidance.on_error[error.code] || guidance.on_error.default;
        await this.agent.say(errorMsg);
      }
      throw error;
    }
  }

  scheduleProgressUpdates(checkpoints, toolPromise) {
    // Track completion with a flag. (Racing toolPromise against an
    // already-resolved Promise.resolve(false) would always lose the race
    // and report "not complete" even after the tool finished.)
    let finished = false;
    toolPromise.then(() => { finished = true; }, () => { finished = true; });

    for (const checkpoint of checkpoints) {
      const delay = this.parseDuration(checkpoint.at) * 1000;
      setTimeout(() => {
        // Only speak if the tool is still running
        if (!finished) {
          this.agent.say(checkpoint.say);
        }
      }, delay);
    }
  }

  parseDuration(durationStr) {
    // "15s" -> 15, "2m" -> 120
    const match = durationStr.match(/(\d+)([sm])/);
    const value = parseInt(match[1], 10);
    return match[2] === 's' ? value : value * 60;
  }

  interpolate(template, data) {
    // Replace [field] with data.field
    return template.replace(/\[(\w+)\]/g, (_, field) => data[field] || '');
  }
}

// Usage:
const executor = new VoiceGuidedToolExecutor(voiceAgent);
await executor.executeTool(analyzeDataTool, { customer_id: '123', date_range: 'last_year' });
```
All tools automatically get voice-guided execution. No special handling in agent logic.
Metrics: Before and After Tool Guidance
Real metrics from adding voice guidance to a customer analytics voice agent:
Before (no guidance):
- 28% call abandonment during long operations
- Average user satisfaction score: 6.2/10
- “Is it working?” asked in 41% of long tool calls
- 19% of users hung up during 30+ second operations

After (voice guidance):
- 6% call abandonment (4.7x reduction)
- Average user satisfaction score: 8.7/10 (1.4x improvement)
- “Is it working?” asked in 8% of long tool calls (5x reduction)
- 3% hung up during long operations (6.3x reduction)

Time to implement: 4 hours to add guidance to 12 tools.
Impact: 78% fewer abandoned calls during processing.
Advanced Pattern: Dynamic Duration Estimates
Some tools have variable duration based on input:
```javascript
const dynamicDurationTool = {
  name: "process_video",
  description: "Processes and transcribes video file",
  parameters: {
    type: "object",
    properties: {
      video_id: { type: "string" },
      quality: { type: "string", enum: ["fast", "accurate"] }
    }
  },
  voice_guidance: {
    // NEW: Duration is a function of the input
    estimated_duration: (params) => {
      // Calculate based on video length + quality
      // (getVideoMetadata is assumed to be a synchronous lookup here)
      const video = getVideoMetadata(params.video_id);
      const multiplier = params.quality === 'accurate' ? 0.8 : 0.3;
      return Math.ceil(video.durationSeconds * multiplier);
    },
    pre_execution: (params, estimatedDuration) => {
      const minutes = Math.ceil(estimatedDuration / 60);
      return `Processing your video with ${params.quality} quality. ` +
        `This will take about ${minutes} minute${minutes > 1 ? 's' : ''}.`;
    }
  }
};

// Agent calculates duration dynamically:
const duration = dynamicDurationTool.voice_guidance.estimated_duration(parameters);
const message = dynamicDurationTool.voice_guidance.pre_execution(parameters, duration);
await agent.say(message);
// For a 240-second video at "accurate" quality (a 192-second estimate):
// "Processing your video with accurate quality. This will take about 4 minutes."
```
Summary: Tool Guidance Principles
- Add duration estimates: Users tolerate waits when they know how long
- Narrate before executing: Set expectations preemptively
- Use progress checkpoints: Don’t let 30+ seconds pass in silence
- Provide error guidance: Specific errors get specific messages
- Scale narration to duration: Fast tools are silent, slow tools are chatty
Voice agents need more than function signatures. They need guidance on what to say while the function runs.
Your API documentation tells developers how to call the function. Your tool guidance tells the voice agent how to narrate the call.
Write once, never have awkward silence again.