Replay Voice Agent Conversations Like Code
- ZH+
- SDK Development, Debugging
- December 28, 2025
Debugging voice agents is fundamentally different from debugging text agents.
With text, you read transcripts. With voice, you need to hear what happened.
Did the agent mishear the user? Did prosody convey frustration the transcript didn’t capture? Did speech rate cause confusion?
The OpenAI Agents SDK provides built-in tracing with audio playback. You can replay entire conversations with synchronized audio, transcripts, tool calls, and agent reasoning.
Why Audio Matters For Debugging
Text transcripts miss critical information:
Transcript says: “I want to cancel my subscription”
Audio reveals:
- Tone: Angry vs matter-of-fact
- Pace: Rushed vs deliberate
- Emphasis: “I WANT to cancel” vs “I want to CANCEL”
- Background: Noisy environment vs quiet office
The agent hears all of this. You need to hear it too.
How Audio Tracing Works
The SDK captures everything automatically:
graph TD
A[User Speaks] --> B[SDK Captures Audio]
B --> C[Transcription]
C --> D[Agent Reasoning]
D --> E[Tool Calls]
E --> F[Agent Response]
F --> G[SDK Logs Everything]
G --> H[Audio Files]
G --> I[Transcripts]
G --> J[Reasoning Logs]
G --> K[Tool Results]
style H fill:#ffe6e6
style I fill:#e6f3ff
style J fill:#e6ffe6
style K fill:#fff3e6
Every turn is logged with:
- User audio (WAV file)
- Agent audio (WAV file)
- Text transcripts
- Agent internal reasoning
- Tool calls and results
- Timing information
Trace Viewer Interface
The SDK includes a trace viewer:
import { TraceViewer } from '@openai/agents-sdk';
const viewer = new TraceViewer();
// Load a specific conversation
await viewer.load('conv_abc123');
// Play turn 5
viewer.playTurn(5);
// Shows:
// - Audio player with waveform
// - Synchronized transcript
// - Tool calls timeline
// - Agent reasoning
The viewer plays audio while highlighting the corresponding transcript, tool calls, and reasoning in real-time.
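Beyond playback in the viewer, every artifact in a trace is addressable, so you can pull a turn's audio down for offline listening in any player. A minimal sketch, assuming the trace field names shown later in this post:
// Minimal sketch: download one turn's user audio for offline listening.
// Field names follow the trace structure shown later in this post.
import { writeFile } from 'node:fs/promises';
const trace = await viewer.load('conv_abc123');
const turn = trace.turns[0];
console.log(`User said: "${turn.user_transcript}"`);
// Fetch the raw WAV and write it to disk (Node 18+ global fetch).
const res = await fetch(turn.user_audio_url);
await writeFile('turn1_user.wav', Buffer.from(await res.arrayBuffer()));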
Real-World Example: Debugging Misunderstanding
Reported issue: “Agent booked wrong time for appointment”
Step 1: Load trace
const trace = await viewer.load('conv_xyz789');
console.log(`${trace.turns.length} turns in conversation`);
Step 2: Find the problematic turn
// Turn 8 was the booking
viewer.playTurn(8);
Audio plays:
User: "Book it for three thirty"
Agent: "Booking for 3:30 PM. Is that correct?"
User: "Yes"
Transcript says: “three thirty”
Step 3: Listen carefully
On replay, you hear the user say "free thirty", a common pronunciation of "three thirty", over noticeable background chatter.
First hypothesis: the ASR misheard. But "free thirty" and "three thirty" transcribe the same way, the user meant 3:30, and the agent booked 3:30. That part is correct, so mishearing doesn't explain the wrong booking…
Step 4: Listen to earlier turns
viewer.playTurn(6);
Audio reveals:
User: "I need an appointment tomorrow"
Agent: "What time works for you?"
User: "Free thirty" [background: "Thursday at 2 PM"]
Aha!: While the user said "free thirty", a background voice said "Thursday at 2 PM". The agent transcribed the foreground speech correctly, but a second, conflicting set of appointment details was in the room, and the agent never detected or flagged the competing audio.
Step 5: Check agent reasoning
{
"turn": 8,
"agent_reasoning": "User confirmed 3:30 PM appointment",
"confidence": 0.95,
"ambiguity_detected": false
}
The agent was confident, but the user was in a noisy environment with competing speech. The fix: detect background speech and ask for explicit confirmation whenever it is present and confidence is below 0.98.
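A sketch of that fix, combining the turn_complete hook and analyzeAudio noise metrics shown elsewhere in this post; requestConfirmation is a hypothetical helper standing in for however your agent re-prompts the user:
// Sketch: require explicit confirmation when a turn is both noisy and
// anything less than near-certain. requestConfirmation is hypothetical.
agent.on('turn_complete', async (turn) => {
  const noise = await viewer.analyzeAudio(turn, 'noise_level');
  if (noise.noisy && turn.agent_confidence < 0.98) {
    // Don't commit the action yet; read the details back to the user.
    await requestConfirmation(turn, {
      prompt: 'I want to make sure I heard that right. Could you confirm the time?'
    });
  }
});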
Synchronized Playback
The trace viewer synchronizes audio with events:
[0.0s] User starts speaking
[Waveform shows audio]
[2.3s] User finishes speaking
Transcript: "Book it for three thirty"
[2.5s] Agent starts reasoning
Internal: "User wants to book 3:30 PM appointment"
[3.0s] Agent calls tool
Tool: book_appointment(time="15:30", date="2025-03-26")
[3.2s] Tool returns success
[3.4s] Agent starts speaking
[Waveform shows audio]
[5.1s] Agent finishes speaking
Transcript: "Booking for 3:30 PM. Is that correct?"
Everything is timestamped and synchronized.
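You can rebuild a rough version of that timeline yourself from the durations stored on each turn (the trace structure is shown next). Reasoning gaps between events aren't timed separately, so treat the offsets as approximate:
// Rough timeline reconstruction from per-turn durations. Reasoning time
// between events isn't logged separately, so offsets are approximate.
function printTimeline(turn) {
  let t = 0;
  console.log(`[${t.toFixed(1)}s] User: "${turn.user_transcript}"`);
  t += turn.user_audio_duration_ms / 1000;
  for (const call of turn.tool_calls) {
    console.log(`[${t.toFixed(1)}s] Tool: ${call.name} (${call.duration_ms}ms)`);
    t += call.duration_ms / 1000;
  }
  console.log(`[${t.toFixed(1)}s] Agent: "${turn.agent_transcript}"`);
}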
Trace Data Structure
Each trace contains:
{
"conversation_id": "conv_abc123",
"started_at": "2025-03-25T10:00:00Z",
"ended_at": "2025-03-25T10:05:00Z",
"turns": [
{
"turn": 1,
"timestamp": "2025-03-25T10:00:00Z",
"user_audio_url": "https://storage/conv_abc123_turn1_user.wav",
"user_transcript": "I need help with my account",
"user_audio_duration_ms": 2300,
"agent_reasoning": "User needs account assistance",
"agent_confidence": 0.92,
"tool_calls": [
{
"name": "lookup_account",
"args": { "user_id": "user_123" },
"result": { /* account data */ },
"duration_ms": 150
}
],
"agent_audio_url": "https://storage/conv_abc123_turn1_agent.wav",
"agent_transcript": "I found your account. How can I help?",
"agent_audio_duration_ms": 1800,
"turn_total_ms": 4250
}
],
"metadata": {
"agent_name": "CustomerSupport",
"user_id": "user_123",
"environment": "production"
}
}
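In TypeScript terms, that record maps onto something like the following. This is a hand-written sketch of the shape above, not an official type export from the SDK:
// Hand-written sketch of the trace shape above; not an official SDK type.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
  result: Record<string, unknown> & { error?: string };
  duration_ms: number;
}
interface Turn {
  turn: number;
  timestamp: string;
  user_audio_url: string;
  user_transcript: string;
  user_audio_duration_ms: number;
  agent_reasoning: string;
  agent_confidence: number;
  tool_calls: ToolCall[];
  agent_audio_url: string;
  agent_transcript: string;
  agent_audio_duration_ms: number;
  turn_total_ms: number;
  user_interrupted?: boolean; // referenced by the interruption filter below
}
interface ConversationTrace {
  conversation_id: string;
  started_at: string;
  ended_at: string;
  turns: Turn[];
  metadata: Record<string, string>;
}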
Finding Failures Quickly
The viewer has filters:
// Find turns where agent confidence was low
const lowConfidence = trace.turns.filter(t => t.agent_confidence < 0.85);
// Find turns with tool failures
const toolErrors = trace.turns.filter(t =>
t.tool_calls.some(tc => tc.result.error)
);
// Find turns where user interrupted
const interruptions = trace.turns.filter(t => t.user_interrupted);
// Play first problematic turn
viewer.playTurn(lowConfidence[0].turn);
Audio Analysis Features
The SDK provides audio analysis:
// Detect background noise
const noiseLevel = await viewer.analyzeAudio(turn, 'noise_level');
// Returns: { avg_db: -40, max_db: -20, noisy: true }
// Detect speech rate
const speechRate = await viewer.analyzeAudio(turn, 'speech_rate');
// Returns: { words_per_minute: 180, too_fast: true }
// Detect emotion
const emotion = await viewer.analyzeAudio(turn, 'emotion');
// Returns: { primary: 'frustrated', confidence: 0.87 }
These metrics help identify why conversations failed.
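You can fold these analyses into a triage pass that surfaces the turns most worth listening to first. A sketch, using the metric shapes returned above (the thresholds are arbitrary examples):
// Sketch: rank turns by how many audio red flags they raise, using the
// metric shapes returned above. Thresholds here are arbitrary examples.
async function triage(trace) {
  const flagged = [];
  for (const turn of trace.turns) {
    const noise = await viewer.analyzeAudio(turn, 'noise_level');
    const rate = await viewer.analyzeAudio(turn, 'speech_rate');
    const emotion = await viewer.analyzeAudio(turn, 'emotion');
    const flags = [
      noise.noisy && 'noisy',
      rate.too_fast && 'fast speech',
      emotion.primary === 'frustrated' && 'frustration',
    ].filter(Boolean);
    if (flags.length > 0) flagged.push({ turn: turn.turn, flags });
  }
  // Listen to the worst turns first.
  return flagged.sort((a, b) => b.flags.length - a.flags.length);
}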
Comparing Successful vs Failed Conversations
Load multiple traces and compare:
const successTrace = await viewer.load('conv_success_123');
const failedTrace = await viewer.load('conv_failed_456');
// Compare metrics
console.log('Success avg confidence:',
successTrace.avgConfidence); // 0.94
console.log('Failed avg confidence:',
failedTrace.avgConfidence); // 0.76
// Play side by side
viewer.compare(successTrace.turns[5], failedTrace.turns[5]);
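One pattern that works well here: find the first turn where the failed conversation's confidence falls clearly behind the successful one's, then start listening there. A sketch, assuming both conversations follow roughly the same script turn-for-turn:
// Sketch: locate the first turn where the failed trace falls clearly
// behind the successful one, then jump playback to it.
function firstDivergence(success, failed, gap = 0.1) {
  const n = Math.min(success.turns.length, failed.turns.length);
  for (let i = 0; i < n; i++) {
    const delta = success.turns[i].agent_confidence - failed.turns[i].agent_confidence;
    if (delta > gap) return failed.turns[i].turn;
  }
  return null;
}
const divergedAt = firstDivergence(successTrace, failedTrace);
if (divergedAt !== null) viewer.playTurn(divergedAt);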
Export Traces For Analysis
Export to standard formats:
// Export as JSON
await viewer.export('conv_abc123', { format: 'json' });
// Export as CSV (for Excel/data analysis)
await viewer.export('conv_abc123', { format: 'csv' });
// Export audio files
await viewer.export('conv_abc123', {
format: 'zip',
include: ['audio', 'transcripts', 'tool_calls']
});
Integration With Development Tools
The trace viewer integrates with common dev tools:
VS Code Extension:
npm install -g @openai/agents-trace-viewer
code --install-extension openai-agents-trace
Browser DevTools:
// Chrome DevTools shows trace panel
// Click on any turn to play audio + see data
CI/CD Pipeline:
# Run tests, capture traces
- name: Test voice agent
run: npm test
- name: Upload traces
uses: openai/upload-traces@v1
with:
pattern: 'traces/**/*.json'
Privacy & Compliance
Audio traces contain sensitive data. The SDK supports:
1. PII Redaction:
const trace = await viewer.load('conv_abc123', {
redact: ['credit_card', 'ssn', 'phone', 'email']
});
2. Retention Policies:
// Auto-delete traces after 30 days
await viewer.setRetentionPolicy({
max_age_days: 30,
auto_delete: true
});
3. Access Controls:
// Only engineers can access traces
await viewer.setAccessControl({
roles: ['engineer', 'support_lead'],
require_justification: true
});
Performance Impact
Trace capture adds minimal overhead:
| Operation | Latency Added |
|---|---|
| Audio recording | < 5ms |
| Transcript logging | < 10ms |
| Tool call logging | < 5ms |
| File upload | Async (0ms perceived) |
| Total | < 20ms per turn |
For a 2-second voice turn, tracing adds ~1% overhead.
Storage Costs
Audio files are compressed:
- 1 minute of audio: ~500 KB (Opus codec)
- 10-turn conversation: ~5 MB
- 1,000 conversations/day: ~5 GB/day
- Monthly storage (30 days): ~150 GB
At $0.02/GB/month, that’s $3/month for 1,000 daily conversations.
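That estimate follows directly from the per-minute figure; a quick calculator if you want to plug in your own traffic (10 audio-minutes per conversation matches the 10-turn example above):
// Back-of-the-envelope storage cost using the assumptions above:
// ~500 KB per audio minute (Opus), 30-day retention, $0.02/GB/month.
function monthlyStorageCost(conversationsPerDay, minutesPerConversation) {
  const kbPerDay = conversationsPerDay * minutesPerConversation * 500;
  const gbRetained = (kbPerDay * 30) / 1e6; // 30 days retained, KB -> GB
  return gbRetained * 0.02;
}
console.log(monthlyStorageCost(1000, 10)); // ~$3/month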
Best Practices
1. Always enable tracing in production
Don’t wait for bugs to appear:
const agent = new Agent({
tracing: {
enabled: true,
captureAudio: true,
retention_days: 30
}
});
2. Add custom metadata
Tag traces for easier filtering:
agent.on('conversation_start', (context) => {
context.trace.addMetadata({
user_tier: 'premium',
feature: 'account_management',
environment: 'production'
});
});
3. Set up alerts
Get notified when confidence drops:
agent.on('turn_complete', (turn) => {
if (turn.agent_confidence < 0.80) {
alert.send({
message: `Low confidence in conversation ${turn.conversation_id}`,
trace_url: viewer.getUrl(turn.conversation_id)
});
}
});
4. Review traces regularly
Don’t just debug failures. Review successful conversations too:
# Weekly trace review meeting
npm run analyze-traces -- --last-7-days --min-confidence 0.90
5. Share traces with team
// Generate shareable link (expires in 24h)
const shareUrl = await viewer.share('conv_abc123', {
expires_in_hours: 24,
require_login: true
});
Common Debugging Scenarios
Scenario 1: Agent mishears user
// Play turn, listen to audio
viewer.playTurn(5);
// Check confidence for that turn
const turn = trace.turns.find(t => t.turn === 5);
console.log(turn.agent_confidence); // 0.65 (low)
// Check transcript vs audio
// Audio: "book free thirty"
// Transcript: "book three thirty"
// Fix: Improve ASR model or add confirmation
Scenario 2: Tool call fails
// Find failed tool calls
const failures = trace.turns.filter(t =>
t.tool_calls.some(tc => tc.result.error)
);
// Play turn with failure
viewer.playTurn(failures[0].turn);
// Check tool error
console.log(failures[0].tool_calls[0].result.error);
// "Network timeout after 5000ms"
// Fix: Increase timeout or add retry logic
Scenario 3: Agent response is unnatural
// Play agent's response
viewer.playAgentAudio(turn);
// Check prosody
const analysis = await viewer.analyzeAudio(turn, 'prosody');
console.log(analysis);
// { monotone: true, speech_rate: 250 } // Too fast
// Fix: Adjust TTS settings for natural pacing
Conclusion
Debugging voice agents requires hearing what happened, not just reading transcripts.
The Agents SDK captures:
- Full audio recordings (user + agent)
- Synchronized transcripts
- Agent reasoning
- Tool calls and results
- Timing information
The trace viewer lets you replay conversations with audio playback, making debugging as easy as replaying a video.
Result: Voice agent bugs that would take hours to debug now take minutes.
Implementation Guide:
- Enable tracing in production (< 20ms overhead)
- Use trace viewer to replay problematic conversations
- Listen to audio while watching synchronized events
- Analyze audio for noise, speech rate, emotion
- Export traces for team review and compliance
The SDK handles audio capture, storage, and playback automatically.
Next: Explore how meta-prompts can generate conversation state machines to keep voice agents on track through complex workflows.