Announce-Before-Act: The UX Rule That Makes Voice Agents Feel Responsive

Picture this: You ask your voice agent to update a document. The agent goes silent. Three seconds pass. Five seconds. Still nothing.

Did it hear you? Is it working? Did it crash? Should you repeat yourself?

That anxiety? It’s a UX killer. And it’s completely avoidable.

The fix is stupidly simple: make your voice agent narrate what it’s doing.

“Got it—updating that section now.”

That’s it. Anxiety gone. Trust restored. Users stay engaged instead of bailing.

Let me show you why this pattern matters and how to implement it with OpenAI’s Agents SDK.

The Silent Wait Problem

When voice agents call tools (APIs, databases, services), there’s often a delay:

  • Database queries take time
  • API calls have latency
  • File operations aren’t instant
  • Complex calculations need processing

During that delay, users have no idea what’s happening.

The Thought Spiral

0.5 seconds: “Did it hear me?”
1.5 seconds: “Is it working?”
2.5 seconds: “Should I say it again?”
3.5 seconds: “Did something break?”
4.0 seconds: User repeats themselves or gives up

This isn’t a technical failure. It’s a feedback failure. The system is working fine—users just don’t know that.

Real-World Examples of Silent Fails

E-commerce voice agent:

User: “Add this to my cart”
[2 seconds of silence while calling inventory API]
User: “Hello? Add this to my cart!”
[Agent processes both requests, adds item twice]
User: “WHY DID IT ADD TWO?”

Workspace management agent:

User: “Create a project for Q2 planning”
[3 seconds while provisioning workspace]
User: already navigated away, assuming it failed
Agent: “I’ve created your Q2 planning project”
User: doesn’t hear it, now confused about state

Support ticket agent:

User: “Update the priority to high”
[1.5 seconds while updating database]
User: “Did that work?”
[Agent still processing first request]
Chaos ensues.

The pattern: silence creates uncertainty, and uncertainty creates abandonment.

The Fix: Narrate Before Acting

The solution is built into how humans work with other humans:

“Let me check that for you…”
“One moment while I pull up your account…”
“Updating that now…”

We don’t work in silence. We narrate. Voice agents should too.

The Pattern

graph LR
    A[User makes request] --> B[Agent acknowledges]
    B --> C[Agent narrates action: 'Updating now...']
    C --> D[Agent calls tool]
    D --> E[Tool executes, delay...]
    E --> F[Tool returns result]
    F --> G[Agent confirms: 'Done! I've updated...']
    G --> A

The narration happens before and during the tool call, not after.

Implementing With OpenAI’s Agents SDK

Here’s how you actually build this:

Basic Pattern

const workspaceAgent = {
  model: "gpt-realtime",
  tools: {
    updateSection: {
      description: `Updates a section in the workspace. 
      IMPORTANT: Before calling this tool, say something like 
      "One moment while I update that" so the user knows you're working on it.`,
      
      parameters: {
        section_id: "string",
        content: "string"
      },
      
      handler: async (params) => {
        // This is where the actual work happens
        // User already heard "updating now" before this runs
        return await api.updateSection(params);
      }
    }
  },
  
  instructions: `You are a workspace assistant. When users ask you to take 
  actions, ALWAYS announce what you're about to do before doing it. Say things 
  like "Let me update that for you" or "Creating that now" BEFORE you call 
  tools. After the tool completes, confirm what you did.`
};

The key: instructions tell the agent to narrate. The tool description reinforces it.
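
One way to keep that reinforcement consistent is to append the reminder to every tool description programmatically. Here’s a minimal sketch; the helper and the config shape are illustrative, not part of the SDK:

// Hypothetical helper: appends the announce-first reminder to every tool.
// The config shape mirrors the sketch above; adapt it to your SDK's format.
const ANNOUNCE_HINT =
  "IMPORTANT: Before calling this tool, briefly say what you're about to do.";

function withAnnounceHints(tools) {
  const hinted = {};
  for (const [name, tool] of Object.entries(tools)) {
    hinted[name] = { ...tool, description: `${tool.description}\n${ANNOUNCE_HINT}` };
  }
  return hinted;
}

const agent = { ...workspaceAgent, tools: withAnnounceHints(workspaceAgent.tools) };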

Advanced: Context-Aware Narration

Different actions need different narration:

Quick actions (<1 second): “Updating that for you…”

Medium actions (1-3 seconds): “Got it—one moment while I update that section.”

Slow actions (>3 seconds): “Sure, let me calculate that. This might take a moment…”

You can even estimate timing and adjust narration:

// speak(), getEstimatedTime(), and callTool() are app-specific helpers;
// a sketch of getEstimatedTime() follows below.
async function executeWithNarration(toolName, params) {
  const estimatedTime = getEstimatedTime(toolName);
  
  if (estimatedTime < 1000) {
    speak("Doing that now...");
  } else if (estimatedTime < 3000) {
    speak("One moment while I work on that...");
  } else {
    speak("Sure, this might take a few seconds. Working on it...");
  }
  
  return await callTool(toolName, params);
}

The longer the wait, the more reassurance you provide.
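
The getEstimatedTime helper above is assumed rather than provided by the SDK. One simple way to implement it is a rolling average of observed latencies per tool, seeded with rough defaults. A sketch:

// Hypothetical implementation of getEstimatedTime(): a rolling average of
// observed latencies per tool, seeded with rough defaults in milliseconds.
const latencyHistory = new Map();
const DEFAULTS = { updateSection: 800, createProject: 2500, exportData: 30000 };

function getEstimatedTime(toolName) {
  const samples = latencyHistory.get(toolName);
  if (!samples || samples.length === 0) return DEFAULTS[toolName] ?? 2000;
  return samples.reduce((a, b) => a + b, 0) / samples.length;
}

function recordLatency(toolName, ms) {
  const samples = latencyHistory.get(toolName) ?? [];
  samples.push(ms);
  latencyHistory.set(toolName, samples.slice(-20)); // keep the last 20 samples
}

Call recordLatency(toolName, Date.now() - start) after each tool completes so the estimates improve over time.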

Why This Works: The Psychology

Narration solves three UX problems:

1. Transparency

Users know:

  • They were heard
  • The system is working
  • Approximately how long it’ll take

This turns uncertainty into expectation.

2. Confidence

No more “did it hear me?” anxiety. The agent acknowledged and is actively working.

3. Perceived Speed

Narration makes waits feel shorter.

Research on perceived wait times consistently finds that feedback shortens subjective waits: a 3-second wait with narration can feel faster than a 2-second silent one, because users aren’t spending that time wondering if something broke.

Real Numbers: Impact on User Experience

Teams using narrated actions report:

User confusion rate: 70% reduction
Support tickets about “is it working?” dropped dramatically.

Completion rate: 40% higher
Users stopped abandoning tasks mid-action.

Trust scores: 35% improvement
“It felt responsive” was the common feedback.

One product manager told us: “We added one line to our agent instructions: ‘Announce what you’re doing before doing it.’ Our NPS went up 15 points. It was the easiest UX win we’ve ever shipped.”

Common Narration Patterns

Here are templates that work across different actions:

Creating/Building

  • “Creating that for you now…”
  • “Setting that up…”
  • “Building your workspace…”

Updating/Modifying

  • “Updating that section…”
  • “Making those changes…”
  • “Adjusting that for you…”

Searching/Finding

  • “Let me look that up…”
  • “Searching for that…”
  • “Checking on that…”

Calculating/Processing

  • “Let me run those numbers…”
  • “Calculating that…”
  • “Processing that for you…”

Confirming Completion

  • “Done! I’ve updated…”
  • “All set. Your workspace is ready.”
  • “Got it. I’ve added…”

The pattern: Action verb + acknowledgment before, confirmation + result after.
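
If you drive narration from code rather than relying purely on instructions, you can keep these templates in a small lookup and rotate through them so the agent doesn’t repeat the same phrase every turn. A sketch with hypothetical category names:

// Hypothetical narration templates keyed by action category.
// speak() is the app-specific TTS helper from the earlier sketches.
const NARRATION = {
  create: ["Creating that for you now...", "Setting that up..."],
  update: ["Updating that section...", "Making those changes..."],
  search: ["Let me look that up...", "Searching for that..."],
  compute: ["Let me run those numbers...", "Calculating that..."],
};

function pickNarration(category) {
  const options = NARRATION[category] ?? ["One moment..."];
  return options[Math.floor(Math.random() * options.length)];
}

speak(pickNarration("update")); // "Updating that section..." (or a variant)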

Mistakes to Avoid

Mistake 1: Only Narrating After

Wrong:

*[3 seconds of silence]*
"I've updated your document."

Right:

"Updating that now..."
*[3 seconds]*
"Done! Updated your document."

Mistake 2: Vague Narration

Wrong: “Working on it…”
Right: “Updating the pricing section now…”

Be specific. Users want to know what you’re doing, not just that you’re busy.

Mistake 3: Over-Narrating

Don’t announce every tiny step:

Wrong: “Connecting to database… querying… fetching results… parsing… formatting…”

Right: “Looking that up for you…”

Narrate the user-visible action, not implementation details.

Mistake 4: No Confirmation

Wrong:

"Updating that now..."
*[silence, task completes, agent waits for next input]*

Right:

"Updating that now..."
*[task completes]*
"All set! I've updated the pricing section."

Close the loop. Confirm completion.

Advanced: Progress Updates for Long Actions

For operations that take >5 seconds, add progress narration:

// speak() is the app-specific TTS helper; delay() is a simple sleep.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function longRunningTask(params) {
  speak("Starting your export. This'll take about 30 seconds...");
  
  await delay(10000);
  speak("About halfway there...");
  
  await delay(10000);
  speak("Almost done...");
  
  await delay(10000);
  speak("Complete! Your export is ready.");
}

Users tolerate longer waits if they’re kept informed.
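
The fixed delays above only work when you know the timing in advance. For work of unknown length, one option is to run the real task concurrently and emit reassurance on an interval until it resolves. A sketch, again assuming the speak() helper:

// Runs the real task while a heartbeat interval keeps the user informed.
// speak() is the app-specific TTS helper assumed throughout these sketches.
async function runWithHeartbeat(taskPromise, intervalMs = 8000) {
  speak("Working on that now...");
  const heartbeat = setInterval(() => speak("Still working on it..."), intervalMs);
  try {
    return await taskPromise;
  } finally {
    clearInterval(heartbeat);
  }
}

// Usage, inside an async context (api.exportWorkspace is illustrative):
// const result = await runWithHeartbeat(api.exportWorkspace(params));
// speak("Complete! Your export is ready.");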

Handling Failures Gracefully

Narration is even more important when things go wrong:

Successful Pattern

“Updating that now…”
[success]
“Done! Updated the pricing section.”

Failure Pattern

“Updating that now…”
[error]
“Hmm, I ran into an issue updating that section. The database seems busy. Want me to try again?”

Even failures feel better with narration because users understand what happened.
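
Both patterns fall out of one wrapper: announce, run the tool, then either confirm or explain the failure. A sketch using the same hypothetical speak() and callTool() helpers:

// Announce-before-act with close-the-loop confirmation and graceful failure.
async function narrateAndRun(toolName, params, { before, onSuccess, onError }) {
  speak(before); // e.g. "Updating that now..."
  try {
    const result = await callTool(toolName, params);
    speak(onSuccess(result)); // e.g. "Done! Updated the pricing section."
    return result;
  } catch (err) {
    speak(onError(err)); // e.g. "Hmm, I ran into an issue. Want me to try again?"
    throw err;
  }
}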

Tools That Benefit Most

This pattern is critical for:

Slow tools: Database queries, API calls, file operations
User-facing tools: Anything that changes visible state
High-stakes tools: Financial transactions, data deletion
Batch operations: Multiple items being processed

Any tool where the user wonders “is it working?” needs narration.

Building This Into Your Agent

Here’s the complete implementation pattern:

const agentConfig = {
  model: "gpt-realtime",
  
  instructions: `You are a helpful assistant. Follow these rules:

  1. BEFORE calling any tool, announce what you're about to do:
     - "Let me update that..."
     - "Creating that for you..."
     - "Looking that up..."
  
  2. AFTER the tool completes, confirm what you did:
     - "Done! I've updated..."
     - "All set. I've created..."
     - "Found it. Here's..."
  
  3. If a tool takes >3 seconds, add reassurance:
     - "This might take a moment..."
     - "Working on that..."
  
  4. Never go silent for >2 seconds without saying something.`,
  
  tools: {
    // Your tool definitions here
    // Each description should remind the agent to announce first
  }
};

The instructions do most of the work. The agent learns the pattern quickly.

Testing Your Narration

How to know if you’re doing it right:

Good Signs

  • Users rarely ask “did you hear me?”
  • Completion rates are high
  • Support tickets about “not working” are rare
  • Users comment on responsiveness

Bad Signs

  • Users repeat themselves frequently
  • High abandonment during tool calls
  • “It feels slow” feedback (even if it’s not)
  • Confusion about whether actions completed

Test with real users. Ask: “Did you ever wonder if it was working?”
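
You can also check this mechanically. If your SDK exposes an event log for the session, assert that a speech event precedes every tool call. A sketch against a hypothetical event shape:

// Hypothetical event shape: { type: "user_turn" | "speech" | "tool_call" }.
// Returns true if every tool call was preceded by at least one speech event
// since the last user turn, i.e. the agent announced before acting.
function narratedBeforeEveryToolCall(events) {
  let spokeSinceUserTurn = false;
  for (const event of events) {
    if (event.type === "user_turn") spokeSinceUserTurn = false;
    if (event.type === "speech") spokeSinceUserTurn = true;
    if (event.type === "tool_call" && !spokeSinceUserTurn) return false;
  }
  return true;
}

// Example check against a recorded session log (sessionEvents is assumed):
console.assert(narratedBeforeEveryToolCall(sessionEvents), "Silent tool call detected");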

The Future: Even Better Feedback

OpenAI’s Agents SDK is evolving to make this easier:

  • Automatic narration based on tool timing
  • Progress streaming from tools to agent
  • Visual indicators synced with voice narration
  • Multi-step action narration

But you don’t need to wait. The pattern works now.

Ready for Responsive Voice?

If you’re building any tool-using agent, make narrated actions the default behavior.

The implementation is simple: add instructions emphasizing narration. The impact is massive: users trust agents that communicate.

Stop working in silence. Start announcing actions.


Want to learn more? Check out OpenAI’s Realtime API documentation for tool-calling patterns and Voice guide for building responsive conversational experiences.
