Replay Voice Agent Conversations Like Code
- ZH+
- SDK Development, Debugging
- December 28, 2025
Debugging voice agents is fundamentally different from debugging text agents.
With text, you read transcripts. With voice, you need to hear what happened.
Did the agent mishear the user? Did prosody convey frustration the transcript didn’t capture? Did speech rate cause confusion?
The OpenAI Agents SDK provides built-in tracing with audio playback. You can replay entire conversations with synchronized audio, transcripts, tool calls, and agent reasoning.
Why Audio Matters For Debugging
Text transcripts miss critical information:
Transcript says: “I want to cancel my subscription”
Audio reveals:
- Tone: Angry vs matter-of-fact
- Pace: Rushed vs deliberate
- Emphasis: “I WANT to cancel” vs “I want to CANCEL”
- Background: Noisy environment vs quiet office
The agent hears all of this. You need to hear it too.
How Audio Tracing Works
The SDK captures everything automatically:
graph TD
A[User Speaks] --> B[SDK Captures Audio]
B --> C[Transcription]
C --> D[Agent Reasoning]
D --> E[Tool Calls]
E --> F[Agent Response]
F --> G[SDK Logs Everything]
G --> H[Audio Files]
G --> I[Transcripts]
G --> J[Reasoning Logs]
G --> K[Tool Results]
style H fill:#ffe6e6
style I fill:#e6f3ff
style J fill:#e6ffe6
style K fill:#fff3e6
Every turn is logged with:
- User audio (WAV file)
- Agent audio (WAV file)
- Text transcripts
- Agent internal reasoning
- Tool calls and results
- Timing information
Trace Viewer Interface
The SDK includes a trace viewer:
import { TraceViewer } from '@openai/agents-sdk';
const viewer = new TraceViewer();
// Load a specific conversation
await viewer.load('conv_abc123');
// Play turn 5
viewer.playTurn(5);
// Shows:
// - Audio player with waveform
// - Synchronized transcript
// - Tool calls timeline
// - Agent reasoning
The viewer plays audio while highlighting the corresponding transcript, tool calls, and reasoning in real-time.
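Beyond playback in the viewer, every artifact in a trace is addressable, so you can pull a turn's audio down for offline listening in any player. A minimal sketch, assuming the trace field names shown later in this post:
// Minimal sketch: download one turn's user audio for offline listening.
// Field names follow the trace structure shown later in this post.
import { writeFile } from 'node:fs/promises';
const trace = await viewer.load('conv_abc123');
const turn = trace.turns[0];
console.log(`User said: "${turn.user_transcript}"`);
// Fetch the raw WAV and write it to disk (Node 18+ global fetch).
const res = await fetch(turn.user_audio_url);
await writeFile('turn1_user.wav', Buffer.from(await res.arrayBuffer()));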
Real-World Example: Debugging Misunderstanding
Reported issue: “Agent booked wrong time for appointment”
Step 1: Load trace
const trace = await viewer.load('conv_xyz789');
console.log(`${trace.turns.length} turns in conversation`);
Step 2: Find the problematic turn
// Turn 8 was the booking
viewer.playTurn(8);
Audio plays:
User: "Book it for three thirty"
Agent: "Booking for 3:30 PM. Is that correct?"
User: "Yes"
Transcript says: “three thirty”
Step 3: Listen carefully
On replay, you hear the user say "free thirty", a common pronunciation of "three thirty", over noticeable background chatter.
First hypothesis: the ASR misheard. But "free thirty" and "three thirty" transcribe the same way, the user meant 3:30, and the agent booked 3:30. That part is correct, so mishearing doesn't explain the wrong booking…
Step 4: Listen to earlier turns
viewer.playTurn(6);
Audio reveals:
User: "I need an appointment tomorrow"
Agent: "What time works for you?"
User: "Free thirty" [background: "Thursday at 2 PM"]
Aha!: While the user said "free thirty", a background voice said "Thursday at 2 PM". The agent transcribed the foreground speech correctly, but a second, conflicting set of appointment details was in the room, and the agent never detected or flagged the competing audio.
Step 5: Check agent reasoning
{
"turn": 8,
"agent_reasoning": "User confirmed 3:30 PM appointment",
"confidence": 0.95,
"ambiguity_detected": false
}
The agent was confident, but the user was in a noisy environment with competing speech. The fix: detect background speech and ask for explicit confirmation whenever it is present and confidence is below 0.98.
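A sketch of that fix, combining the turn_complete hook and analyzeAudio noise metrics shown elsewhere in this post; requestConfirmation is a hypothetical helper standing in for however your agent re-prompts the user:
// Sketch: require explicit confirmation when a turn is both noisy and
// anything less than near-certain. requestConfirmation is hypothetical.
agent.on('turn_complete', async (turn) => {
  const noise = await viewer.analyzeAudio(turn, 'noise_level');
  if (noise.noisy && turn.agent_confidence < 0.98) {
    // Don't commit the action yet; read the details back to the user.
    await requestConfirmation(turn, {
      prompt: 'I want to make sure I heard that right. Could you confirm the time?'
    });
  }
});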
Synchronized Playback
The trace viewer synchronizes audio with events:
[0.0s] User starts speaking
[Waveform shows audio]
[2.3s] User finishes speaking
Transcript: "Book it for three thirty"
[2.5s] Agent starts reasoning
Internal: "User wants to book 3:30 PM appointment"
[3.0s] Agent calls tool
Tool: book_appointment(time="15:30", date="2025-03-26")
[3.2s] Tool returns success
[3.4s] Agent starts speaking
[Waveform shows audio]
[5.1s] Agent finishes speaking
Transcript: "Booking for 3:30 PM. Is that correct?"
Everything is timestamped and synchronized.
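You can rebuild a rough version of that timeline yourself from the durations stored on each turn (the trace structure is shown next). Reasoning gaps between events aren't timed separately, so treat the offsets as approximate:
// Rough timeline reconstruction from per-turn durations. Reasoning time
// between events isn't logged separately, so offsets are approximate.
function printTimeline(turn) {
  let t = 0;
  console.log(`[${t.toFixed(1)}s] User: "${turn.user_transcript}"`);
  t += turn.user_audio_duration_ms / 1000;
  for (const call of turn.tool_calls) {
    console.log(`[${t.toFixed(1)}s] Tool: ${call.name} (${call.duration_ms}ms)`);
    t += call.duration_ms / 1000;
  }
  console.log(`[${t.toFixed(1)}s] Agent: "${turn.agent_transcript}"`);
}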
Trace Data Structure
Each trace contains:
{
"conversation_id": "conv_abc123",
"started_at": "2025-03-25T10:00:00Z",
"ended_at": "2025-03-25T10:05:00Z",
"turns": [
{
"turn": 1,
"timestamp": "2025-03-25T10:00:00Z",
"user_audio_url": "https://storage/conv_abc123_turn1_user.wav",
"user_transcript": "I need help with my account",
"user_audio_duration_ms": 2300,
"agent_reasoning": "User needs account assistance",
"agent_confidence": 0.92,
"tool_calls": [
{
"name": "lookup_account",
"args": { "user_id": "user_123" },
"result": { /* account data */ },
"duration_ms": 150
}
],
"agent_audio_url": "https://storage/conv_abc123_turn1_agent.wav",
"agent_transcript": "I found your account. How can I help?",
"agent_audio_duration_ms": 1800,
"turn_total_ms": 4250
}
],
"metadata": {
"agent_name": "CustomerSupport",
"user_id": "user_123",
"environment": "production"
}
}
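In TypeScript terms, that record maps onto something like the following. This is a hand-written sketch of the shape above, not an official type export from the SDK:
// Hand-written sketch of the trace shape above; not an official SDK type.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
  result: Record<string, unknown> & { error?: string };
  duration_ms: number;
}
interface Turn {
  turn: number;
  timestamp: string;
  user_audio_url: string;
  user_transcript: string;
  user_audio_duration_ms: number;
  agent_reasoning: string;
  agent_confidence: number;
  tool_calls: ToolCall[];
  agent_audio_url: string;
  agent_transcript: string;
  agent_audio_duration_ms: number;
  turn_total_ms: number;
  user_interrupted?: boolean; // referenced by the interruption filter below
}
interface ConversationTrace {
  conversation_id: string;
  started_at: string;
  ended_at: string;
  turns: Turn[];
  metadata: Record<string, string>;
}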
Finding Failures Quickly
The viewer has filters:
// Find turns where agent confidence was low
const lowConfidence = trace.turns.filter(t => t.agent_confidence < 0.85);
// Find turns with tool failures
const toolErrors = trace.turns.filter(t =>
t.tool_calls.some(tc => tc.result.error)
);
// Find turns where user interrupted
const interruptions = trace.turns.filter(t => t.user_interrupted);
// Play first problematic turn
viewer.playTurn(lowConfidence[0].turn);
Audio Analysis Features
The SDK provides audio analysis:
// Detect background noise
const noiseLevel = await viewer.analyzeAudio(turn, 'noise_level');
// Returns: { avg_db: -40, max_db: -20, noisy: true }
// Detect speech rate
const speechRate = await viewer.analyzeAudio(turn, 'speech_rate');
// Returns: { words_per_minute: 180, too_fast: true }
// Detect emotion
const emotion = await viewer.analyzeAudio(turn, 'emotion');
// Returns: { primary: 'frustrated', confidence: 0.87 }
These metrics help identify why conversations failed.
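You can fold these analyses into a triage pass that surfaces the turns most worth listening to first. A sketch, using the metric shapes returned above (the thresholds are arbitrary examples):
// Sketch: rank turns by how many audio red flags they raise, using the
// metric shapes returned above. Thresholds here are arbitrary examples.
async function triage(trace) {
  const flagged = [];
  for (const turn of trace.turns) {
    const noise = await viewer.analyzeAudio(turn, 'noise_level');
    const rate = await viewer.analyzeAudio(turn, 'speech_rate');
    const emotion = await viewer.analyzeAudio(turn, 'emotion');
    const flags = [
      noise.noisy && 'noisy',
      rate.too_fast && 'fast speech',
      emotion.primary === 'frustrated' && 'frustration',
    ].filter(Boolean);
    if (flags.length > 0) flagged.push({ turn: turn.turn, flags });
  }
  // Listen to the worst turns first.
  return flagged.sort((a, b) => b.flags.length - a.flags.length);
}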
Comparing Successful vs Failed Conversations
Load multiple traces and compare:
const successTrace = await viewer.load('conv_success_123');
const failedTrace = await viewer.load('conv_failed_456');
// Compare metrics
console.log('Success avg confidence:',
successTrace.avgConfidence); // 0.94
console.log('Failed avg confidence:',
failedTrace.avgConfidence); // 0.76
// Play side by side
viewer.compare(successTrace.turns[5], failedTrace.turns[5]);
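One pattern that works well here: find the first turn where the failed conversation's confidence falls clearly behind the successful one's, then start listening there. A sketch, assuming both conversations follow roughly the same script turn-for-turn:
// Sketch: locate the first turn where the failed trace falls clearly
// behind the successful one, then jump playback to it.
function firstDivergence(success, failed, gap = 0.1) {
  const n = Math.min(success.turns.length, failed.turns.length);
  for (let i = 0; i < n; i++) {
    const delta = success.turns[i].agent_confidence - failed.turns[i].agent_confidence;
    if (delta > gap) return failed.turns[i].turn;
  }
  return null;
}
const divergedAt = firstDivergence(successTrace, failedTrace);
if (divergedAt !== null) viewer.playTurn(divergedAt);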
Export Traces For Analysis
Export to standard formats:
// Export as JSON
await viewer.export('conv_abc123', { format: 'json' });
// Export as CSV (for Excel/data analysis)
await viewer.export('conv_abc123', { format: 'csv' });
// Export audio files
await viewer.export('conv_abc123', {
format: 'zip',
include: ['audio', 'transcripts', 'tool_calls']
});
Integration With Development Tools
The trace viewer integrates with common dev tools:
VS Code Extension:
npm install -g @openai/agents-trace-viewer
code --install-extension openai-agents-trace
Browser DevTools:
// Chrome DevTools shows trace panel
// Click on any turn to play audio + see data
CI/CD Pipeline:
# Run tests, capture traces
- name: Test voice agent
run: npm test
- name: Upload traces
uses: openai/upload-traces@v1
with:
pattern: 'traces/**/*.json'
Privacy & Compliance
Audio traces contain sensitive data. The SDK supports:
1. PII Redaction:
const trace = await viewer.load('conv_abc123', {
redact: ['credit_card', 'ssn', 'phone', 'email']
});
2. Retention Policies:
// Auto-delete traces after 30 days
await viewer.setRetentionPolicy({
max_age_days: 30,
auto_delete: true
});
3. Access Controls:
// Only engineers can access traces
await viewer.setAccessControl({
roles: ['engineer', 'support_lead'],
require_justification: true
});
Performance Impact
Trace capture adds minimal overhead:
| Operation | Latency Added |
|---|---|
| Audio recording | < 5ms |
| Transcript logging | < 10ms |
| Tool call logging | < 5ms |
| File upload | Async (0ms perceived) |
| Total | < 20ms per turn |
For a 2-second voice turn, tracing adds ~1% overhead.
Storage Costs
Audio files are compressed:
- 1 minute of audio: ~500 KB (Opus codec)
- 10-turn conversation: ~5 MB
- 1,000 conversations/day: ~5 GB/day
- Monthly storage (30 days): ~150 GB
At $0.02/GB/month, that’s $3/month for 1,000 daily conversations.
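That estimate follows directly from the per-minute figure; a quick calculator if you want to plug in your own traffic (10 audio-minutes per conversation matches the 10-turn example above):
// Back-of-the-envelope storage cost using the assumptions above:
// ~500 KB per audio minute (Opus), 30-day retention, $0.02/GB/month.
function monthlyStorageCost(conversationsPerDay, minutesPerConversation) {
  const kbPerDay = conversationsPerDay * minutesPerConversation * 500;
  const gbRetained = (kbPerDay * 30) / 1e6; // 30 days retained, KB -> GB
  return gbRetained * 0.02;
}
console.log(monthlyStorageCost(1000, 10)); // ~$3/month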
Best Practices
1. Always enable tracing in production
Don’t wait for bugs to appear:
const agent = new Agent({
tracing: {
enabled: true,
captureAudio: true,
retention_days: 30
}
});
2. Add custom metadata
Tag traces for easier filtering:
agent.on('conversation_start', (context) => {
context.trace.addMetadata({
user_tier: 'premium',
feature: 'account_management',
environment: 'production'
});
});
3. Set up alerts
Get notified when confidence drops:
agent.on('turn_complete', (turn) => {
if (turn.agent_confidence < 0.80) {
alert.send({
message: `Low confidence in conversation ${turn.conversation_id}`,
trace_url: viewer.getUrl(turn.conversation_id)
});
}
});
4. Review traces regularly
Don’t just debug failures. Review successful conversations too:
# Weekly trace review meeting
npm run analyze-traces -- --last-7-days --min-confidence 0.90
5. Share traces with team
// Generate shareable link (expires in 24h)
const shareUrl = await viewer.share('conv_abc123', {
expires_in_hours: 24,
require_login: true
});
Common Debugging Scenarios
Scenario 1: Agent mishears user
// Play turn, listen to audio
viewer.playTurn(5);
// Check confidence for that turn
const turn = trace.turns.find(t => t.turn === 5);
console.log(turn.agent_confidence); // 0.65 (low)
// Check transcript vs audio
// Audio: "book free thirty"
// Transcript: "book three thirty"
// Fix: Improve ASR model or add confirmation
Scenario 2: Tool call fails
// Find failed tool calls
const failures = trace.turns.filter(t =>
t.tool_calls.some(tc => tc.result.error)
);
// Play turn with failure
viewer.playTurn(failures[0].turn);
// Check tool error
console.log(failures[0].tool_calls[0].result.error);
// "Network timeout after 5000ms"
// Fix: Increase timeout or add retry logic
Scenario 3: Agent response is unnatural
// Play agent's response
viewer.playAgentAudio(turn);
// Check prosody
const analysis = await viewer.analyzeAudio(turn, 'prosody');
console.log(analysis);
// { monotone: true, speech_rate: 250 } // Too fast
// Fix: Adjust TTS settings for natural pacing
Conclusion
Debugging voice agents requires hearing what happened, not just reading transcripts.
The Agents SDK captures:
- Full audio recordings (user + agent)
- Synchronized transcripts
- Agent reasoning
- Tool calls and results
- Timing information
The trace viewer lets you replay conversations with audio playback, making debugging as easy as replaying a video.
Result: Voice agent bugs that would take hours to debug now take minutes.
Implementation Guide:
- Enable tracing in production (< 20ms overhead)
- Use trace viewer to replay problematic conversations
- Listen to audio while watching synchronized events
- Analyze audio for noise, speech rate, emotion
- Export traces for team review and compliance
The SDK handles audio capture, storage, and playback automatically.
Next: Explore how meta-prompts can generate conversation state machines to keep voice agents on track through complex workflows.