Trace Voice Like You Trace Code: Debugging Voice Agents in Real-Time
- ZH+
- SDK Development
- September 16, 2025
Text agents break. You read the transcript. You see where it went wrong. You fix it.
Voice agents break. You… what exactly?
You’ve got audio tokens you can’t easily read. Timing issues you can’t see. Tool calls that might have happened at the wrong moment. User interruptions that corrupted state.
And the user’s bug report? “It didn’t work.”
Cool. Super helpful.
Debugging voice agents shouldn’t be archaeology. It should be engineering.
Let me show you how to trace voice agents like you trace code.
The Voice Debugging Problem
When a text-based agent fails, you have a clean transcript:
User: "Update the document"
Agent: "I'll update that now"
[calls update_document with wrong parameters]
Error: Document ID not found
Clear problem. Clear fix.
When a voice agent fails, you have:
- Audio input (maybe unclear)
- Transcription (maybe wrong)
- Tool calls (maybe at the wrong time)
- Audio output (maybe interrupted)
- Timing issues (maybe latency spikes)
- State changes (maybe corrupted)
Which one broke? Where? Why?
Without proper tracing, debugging takes hours of guessing. With it? Minutes of certainty.
What Voice Tracing Looks Like
Imagine this dashboard:
Session: a8f3-4e2b-9d1c
Duration: 1m 39s
Status: Failed at 1m 38s
Timeline (final seconds):
├─ 01:36.000 | User audio starts
├─ 01:37.200 | Transcription: "update the pricing section"
├─ 01:37.450 | Agent response starts: "I'll update that now"
├─ 01:38.100 | Tool call: update_section(section_id: "pricing", ...)
├─ 01:38.350 | ❌ Tool error: "Section 'pricing' not found"
├─ 01:38.400 | Agent response: "Hmm, I'm having trouble..."
├─ 01:38.800 | User interrupts: "never mind"
└─ 01:39.100 | Session ended
Root cause: Section ID should be "pricing-v2"
You can see:
- What the user actually said
- What got transcribed
- When tools were called
- What parameters were sent
- What errors happened
- When user interrupted
Debug time: 30 seconds.
Building a Voice Tracing System
Let’s build this step by step.
The Architecture
graph TD
A[Voice Agent] --> B[Tracing Layer]
B --> C[Capture Audio I/O]
B --> D[Log Transcriptions]
B --> E[Track Tool Calls]
B --> F[Record Timing]
B --> G[Monitor State]
C --> H[Timeline Database]
D --> H
E --> H
F --> H
G --> H
H --> I[Real-time Dashboard]
H --> J[Historical Query Interface]
H --> K[Replay System]
I --> L[Developers debugging live]
J --> M[Developers analyzing failures]
K --> N[Developers reproducing issues]
Every interaction flows through a tracing layer that captures everything, then makes it queryable and replayable.
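Concretely, every box in that diagram exchanges one record shape. Here is a minimal sketch of that contract; the field names match the wrapper built in Step 1 below:
/**
 * One trace event: the shared contract between the tracing layer, the
 * timeline database, the dashboard, and the replay system (a sketch).
 * @typedef {Object} TraceEvent
 * @property {string} session_id - correlation ID across all systems
 * @property {number} timestamp  - wall-clock ms epoch, for cross-session queries
 * @property {number} elapsed_ms - offset from session start, for timeline rendering
 * @property {string} event_type - 'transcription', 'tool_call_error', ...
 * @property {Object} data       - event-specific payload (stored as JSONB)
 */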
Step 1: Wrap the Realtime API
Create a tracing wrapper:
class TracedRealtimeSession {
constructor(sessionId) {
this.sessionId = sessionId;
this.events = [];
this.startTime = Date.now();
}
async connect() {
this.log('session_start');
this.session = await openai.beta.realtime.connect({
model: "gpt-realtime"
});
// Intercept all events
this.session.on('audio.input', (audio) => this.handleAudioInput(audio));
this.session.on('transcription', (event) => this.handleTranscription(event));
this.session.on('tool.call', (call) => this.handleToolCall(call));
this.session.on('audio.output', (audio) => this.handleAudioOutput(audio));
this.session.on('error', (error) => this.handleError(error));
return this.session;
}
log(eventType, data = {}) {
const event = {
session_id: this.sessionId,
timestamp: Date.now(),
elapsed_ms: Date.now() - this.startTime,
event_type: eventType,
data: data
};
this.events.push(event);
// Send to tracing backend asynchronously
this.sendToBackend(event).catch(console.error);
}
handleAudioInput(audio) {
this.log('audio_input', {
duration_ms: audio.duration,
sample_rate: audio.sampleRate,
// Store audio data or reference to blob storage
audio_ref: this.storeAudio(audio)
});
}
handleTranscription(event) {
this.log('transcription', {
text: event.text,
confidence: event.confidence || null
});
}
handleToolCall(call) {
this.log('tool_call_start', {
tool_name: call.name,
parameters: call.parameters
});
// Wrap the actual tool execution
const originalHandler = call.handler;
call.handler = async (...args) => {
const startTime = Date.now();
try {
const result = await originalHandler(...args);
this.log('tool_call_complete', {
tool_name: call.name,
duration_ms: Date.now() - startTime,
success: true,
result: this.sanitizeResult(result)
});
return result;
} catch (error) {
this.log('tool_call_error', {
tool_name: call.name,
duration_ms: Date.now() - startTime,
success: false,
error: error.message,
stack: error.stack
});
throw error;
}
};
}
handleAudioOutput(audio) {
this.log('audio_output', {
duration_ms: audio.duration,
audio_ref: this.storeAudio(audio)
});
}
handleError(error) {
this.log('error', {
message: error.message,
code: error.code,
stack: error.stack
});
}
async sendToBackend(event) {
await fetch('/api/tracing/events', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(event)
});
}
sanitizeResult(result) {
// Remove sensitive data, truncate large responses
if (typeof result === 'string' && result.length > 1000) {
return result.substring(0, 1000) + '... [truncated]';
}
return result;
}
storeAudio(audio) {
// Upload to blob storage (uploadAudio = your storage client), return reference
const ref = `audio-${this.sessionId}-${Date.now()}`;
this.uploadAudio(ref, audio); // fire-and-forget so the event loop isn't blocked
return ref;
}
}
Now every interaction is traced automatically.
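Usage is then a one-line change from connecting directly. A sketch, assuming an environment where crypto.randomUUID() is available for session IDs:
// Drop-in usage: connect through the wrapper instead of the raw API.
const traced = new TracedRealtimeSession(crypto.randomUUID());
const session = await traced.connect();
// Use `session` exactly as before; every event is now logged.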
Step 2: Build the Timeline Database
Store events in a queryable format:
-- Using PostgreSQL with JSONB for flexibility
CREATE TABLE voice_traces (
id SERIAL PRIMARY KEY,
session_id VARCHAR(255) NOT NULL,
timestamp TIMESTAMP NOT NULL,
elapsed_ms INTEGER NOT NULL,
event_type VARCHAR(50) NOT NULL,
data JSONB NOT NULL
);
-- Indexes are created separately in PostgreSQL
CREATE INDEX idx_session_id ON voice_traces (session_id);
CREATE INDEX idx_timestamp ON voice_traces (timestamp);
CREATE INDEX idx_event_type ON voice_traces (event_type);
-- Query patterns
-- Get full session timeline
SELECT * FROM voice_traces
WHERE session_id = 'a8f3-4e2b-9d1c'
ORDER BY elapsed_ms;
-- Find failed tool calls
SELECT * FROM voice_traces
WHERE event_type = 'tool_call_error'
AND timestamp > NOW() - INTERVAL '1 day';
-- Get sessions with specific error
SELECT DISTINCT session_id
FROM voice_traces
WHERE event_type = 'error'
AND data->>'message' LIKE '%timeout%';
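The wrapper's sendToBackend POSTs each event to /api/tracing/events. Here is a minimal sketch of that endpoint; Express and node-postgres are my choices for illustration, not requirements:
// Sketch of the /api/tracing/events endpoint (assumes Express + node-postgres).
import express from 'express';
import pg from 'pg';

const pool = new pg.Pool(); // connection settings come from PG* env vars
const app = express();
app.use(express.json());

app.post('/api/tracing/events', async (req, res) => {
  const { session_id, timestamp, elapsed_ms, event_type, data } = req.body;
  // pg serializes the `data` object to JSON for the JSONB column
  await pool.query(
    `INSERT INTO voice_traces (session_id, timestamp, elapsed_ms, event_type, data)
     VALUES ($1, to_timestamp($2 / 1000.0), $3, $4, $5)`,
    [session_id, timestamp, elapsed_ms, event_type, data]
  );
  res.sendStatus(204); // no body; keep the hot path cheap
});

app.listen(3000);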
Step 3: Real-Time Dashboard
Build a UI to visualize traces:
// React component for timeline visualization
import { useState, useEffect } from 'react';
function SessionTimeline({ sessionId }) {
const [events, setEvents] = useState([]);
useEffect(() => {
// Real-time updates via WebSocket
const ws = new WebSocket(`wss://api/tracing/${sessionId}`);
ws.onmessage = (msg) => {
const event = JSON.parse(msg.data);
setEvents(prev => [...prev, event]);
};
return () => ws.close();
}, [sessionId]);
return (
<div className="timeline">
{events.map(event => (
<TimelineEvent key={event.id} event={event} />
))}
</div>
);
}
function TimelineEvent({ event }) {
const getIcon = () => {
switch(event.event_type) {
case 'audio_input': return '🎤';
case 'transcription': return '📝';
case 'tool_call_start': return '🔧';
case 'tool_call_complete': return '✅';
case 'tool_call_error': return '❌';
case 'audio_output': return '🔊';
default: return '•';
}
};
const isError = event.event_type.includes('error');
return (
<div className={`event ${isError ? 'error' : ''}`}>
<span className="timestamp">{event.elapsed_ms}ms</span>
<span className="icon">{getIcon()}</span>
<span className="type">{event.event_type}</span>
<EventDetails data={event.data} />
</div>
);
}
function EventDetails({ data }) {
if (data.text) {
return <div className="transcription">"{data.text}"</div>;
}
if (data.tool_name) {
return (
<div className="tool-call">
<span className="tool-name">{data.tool_name}</span>
<pre className="params">{JSON.stringify(data.parameters, null, 2)}</pre>
{data.error && <div className="error-msg">{data.error}</div>}
</div>
);
}
return <pre>{JSON.stringify(data, null, 2)}</pre>;
}
Step 4: Audio Replay
The killer feature: replay sessions with audio:
class SessionReplay {
constructor(sessionId) {
this.sessionId = sessionId;
this.events = [];
this.currentIndex = 0;
}
async load() {
// Fetch full session timeline
const response = await fetch(`/api/tracing/sessions/${this.sessionId}`);
this.events = await response.json();
}
async replay() {
for (let i = 0; i < this.events.length; i++) {
this.currentIndex = i;
const event = this.events[i];
// Wait for proper timing
if (i > 0) {
const delay = event.elapsed_ms - this.events[i-1].elapsed_ms;
await this.sleep(delay);
}
// Render event
this.renderEvent(event);
// Play audio if available
if (event.event_type === 'audio_input' || event.event_type === 'audio_output') {
await this.playAudio(event.data.audio_ref);
}
}
}
renderEvent(event) {
// Highlight in timeline UI (guard in case the element isn't rendered yet)
const el = document.querySelector(`[data-event-id="${event.id}"]`);
if (el) el.classList.add('active');
// Show details panel
this.showDetails(event);
}
async playAudio(audioRef) {
const audioBlob = await fetch(`/api/audio/${audioRef}`).then(r => r.blob());
const audio = new Audio(URL.createObjectURL(audioBlob));
return new Promise(resolve => {
audio.onended = resolve;
audio.play();
});
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// Usage
const replay = new SessionReplay('a8f3-4e2b-9d1c');
await replay.load();
await replay.replay(); // Watch the whole session play back
Developers can literally watch the session replay with audio, transcriptions, tool calls, and errors all synchronized.
What to Trace: The Essential Events
Not everything needs tracing. Focus on what matters; a whitelist sketch for enforcing this follows the lists below:
Critical Events
Session lifecycle:
- Session start
- Session end
- Disconnections / reconnections
Audio flow:
- User audio received (duration, quality metrics)
- Agent audio sent (duration)
- Interruptions (user talking over agent)
Transcription:
- What user said (as transcribed)
- Confidence scores
- Any transcription errors
Tool execution:
- Tool call initiated (name, parameters)
- Tool execution time
- Tool result or error
- Tool retries
Errors:
- API errors
- Network timeouts
- Tool failures
- Invalid state transitions
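One way to enforce that whitelist is a guard in the tracer's log() method. A sketch, using the event types from the lists above:
// Sketch: an explicit whitelist keeps noisy event types out of the store.
const TRACED_EVENTS = new Set([
  'session_start', 'session_end',                             // lifecycle
  'audio_input', 'audio_output', 'interruption',              // audio flow
  'transcription',
  'tool_call_start', 'tool_call_complete', 'tool_call_error',
  'error', 'state_change'
]);

// In TracedRealtimeSession.log(), bail out early for anything else:
// if (!TRACED_EVENTS.has(eventType)) return;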
Performance Metrics
Latency tracking:
// Measure key latencies
const latencies = {
transcription: 0, // Audio → text
agent_thinking: 0, // Text → response
tool_execution: 0, // Tool call → result
audio_generation: 0, // Text → audio
total_turn: 0 // User speaks → agent responds
};
// Example alert thresholds in ms (tune for your stack)
const THRESHOLDS = {
transcription: 500,
agent_thinking: 1000,
tool_execution: 5000,
audio_generation: 800,
total_turn: 3000
};
function trackLatency(start, end, type) {
const latency = end - start;
latencies[type] = latency;
// Alert if threshold exceeded (alertSlowness = your notification hook)
if (latency > THRESHOLDS[type]) {
alertSlowness(type, latency);
}
}
Track p50, p95, p99 latencies for each component.
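A quick way to compute those percentiles from raw samples, using the nearest-rank method (a sketch):
// Nearest-rank percentile over raw latency samples (a sketch).
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// e.g. durations collected from tool_call_complete events
const toolLatencies = [120, 340, 95, 2800, 410];
console.log(percentile(toolLatencies, 95)); // → 2800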
Context and State
Conversation state:
// Log state changes
function logStateChange(before, after) {
trace.log('state_change', {
before: sanitize(before),
after: sanitize(after),
diff: calculateDiff(before, after)
});
}
This helps debug state corruption issues.
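sanitize and calculateDiff are left abstract above. For the latter, a shallow version that records only changed top-level keys is often enough (a sketch):
// Shallow diff: record only top-level keys whose values changed (a sketch).
function calculateDiff(before, after) {
  const diff = {};
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
      diff[key] = { before: before[key], after: after[key] };
    }
  }
  return diff;
}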
Query Patterns for Debugging
Once you have traces, you need to query them effectively:
Find All Failed Sessions
SELECT DISTINCT session_id,
MIN(timestamp) as session_start,
COUNT(*) as error_count
FROM voice_traces
WHERE event_type LIKE '%error%'
GROUP BY session_id
ORDER BY error_count DESC;
Find Sessions With Specific Tool Failures
SELECT session_id, timestamp, data
FROM voice_traces
WHERE event_type = 'tool_call_error'
AND data->>'tool_name' = 'update_section'
AND timestamp > NOW() - INTERVAL '1 day';
Find Slow Tool Calls
SELECT data->>'tool_name' AS tool,
AVG((data->>'duration_ms')::numeric) AS avg_ms,
MAX((data->>'duration_ms')::numeric) AS max_ms,
COUNT(*) AS call_count
FROM voice_traces
WHERE event_type = 'tool_call_complete'
GROUP BY data->>'tool_name'
HAVING AVG((data->>'duration_ms')::numeric) > 1000
ORDER BY avg_ms DESC;
Find Interrupted Conversations
SELECT session_id, COUNT(*) as interruption_count
FROM voice_traces
WHERE event_type = 'interruption'
AND timestamp > NOW() - INTERVAL '1 day'
GROUP BY session_id
HAVING COUNT(*) > 3;
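One gap: the Step 1 wrapper never emits the 'interruption' event this query filters on. A rough heuristic, sketched here as a subclass, is to flag user audio that arrives while agent audio should still be playing:
// Sketch: emit an 'interruption' event when user audio arrives while the
// agent should still be speaking. Extends the Step 1 wrapper.
class InterruptionAwareSession extends TracedRealtimeSession {
  handleAudioOutput(audio) {
    this.agentSpeaking = true;
    // assume audio.duration is in ms; clear the flag when playback should end
    setTimeout(() => { this.agentSpeaking = false; }, audio.duration);
    super.handleAudioOutput(audio);
  }

  handleAudioInput(audio) {
    if (this.agentSpeaking) {
      this.log('interruption', { interrupted_at_ms: Date.now() - this.startTime });
    }
    super.handleAudioInput(audio);
  }
}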
Real-World Debugging Scenarios
Let’s walk through actual bugs and how tracing helped:
Bug 1: “It didn’t hear me correctly”
User report: “I said update pricing, it updated product.”
Without tracing: Shrug. Can’t reproduce. Maybe user mumbled?
With tracing:
Timeline for session f3d8-2a1c:
├─ 00:01.200 | Transcription: "update pricing"
├─ 00:01.450 | Agent thinking...
├─ 00:02.100 | Tool call: update_section(section_id: "product", ...)
^^^^^^^^ Wrong parameter!
Root cause: Agent misinterpreted “pricing” as “product” due to ambiguous section naming. Transcription was correct; tool parameter selection was wrong.
Fix: Improve tool description to clarify section ID requirements. Add validation.
Debug time: 2 minutes (vs. hours of guessing)
Bug 2: “It’s slow and unresponsive”
User report: “Takes forever to do anything.”
Without tracing: Check API docs. Optimize code. Still slow. Why?
With tracing:
Latency Analysis:
├─ Transcription: p50=120ms, p95=180ms ✓ Normal
├─ Agent thinking: p50=350ms, p95=450ms ✓ Normal
├─ Tool execution: p50=2800ms, p95=8400ms ❌ SLOW
└─ Audio generation: p50=200ms, p95=280ms ✓ Normal
Slow tool: fetch_user_data
- Average: 3.2s
- Max: 11.4s
- Database query timeout issues
Root cause: Database query in fetch_user_data wasn’t indexed. Nothing to do with voice agent.
Fix: Add database index.
Debug time: 5 minutes (vs. days of optimization wild goose chase)
Bug 3: “It sometimes loses context”
User report: “Asked it to update two sections. It only did one.”
Without tracing: Can’t reproduce. Works fine in testing.
With tracing:
Timeline for session b7c2-9f3a:
├─ 00:01.200 | Transcription: "update the intro section"
├─ 00:02.100 | Tool call: update_section("intro")
├─ 00:02.850 | User interrupts during tool execution: "and the conclusion too"
├─ 00:03.100 | ❌ State corrupted: pending tool call cancelled
└─ 00:03.400 | Agent responds to new input, loses original context
Root cause: User interruption during tool execution cancelled the tool call. Agent started fresh conversation without completing first task.
Fix: Queue tool calls and complete them even if user interrupts. Acknowledge both requests.
Debug time: 10 minutes (vs. “cannot reproduce”)
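The fix, queuing tool calls so an interruption cannot cancel in-flight work, could look roughly like this (a sketch; updateSection stands in for your real tool handler):
// Sketch: a serial queue so an interruption can't cancel in-flight tool work.
class ToolCallQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  enqueue(toolFn) {
    // Chain onto the tail: calls run one at a time, in arrival order,
    // regardless of what the user says in the meantime.
    const run = this.tail.then(() => toolFn());
    this.tail = run.catch(() => {}); // a failure must not stall the queue
    return run;
  }
}

// Both requests complete even though the second arrived mid-execution
const queue = new ToolCallQueue();
queue.enqueue(() => updateSection('intro'));
queue.enqueue(() => updateSection('conclusion'));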
Alerting on Patterns
Don’t just log—alert when problems spike:
// Monitor error rates
class ErrorRateMonitor {
constructor(window = 60000, threshold = 0.1) {
this.window = window; // 60 seconds
this.threshold = threshold; // 10% error rate
this.events = [];
}
recordEvent(success) {
this.events.push({
success,
timestamp: Date.now()
});
// Clean old events
const cutoff = Date.now() - this.window;
this.events = this.events.filter(e => e.timestamp > cutoff);
// Check threshold
const errorRate = this.calculateErrorRate();
if (errorRate > this.threshold) {
this.alert(errorRate);
}
}
calculateErrorRate() {
if (this.events.length === 0) return 0;
const failures = this.events.filter(e => !e.success).length;
return failures / this.events.length;
}
alert(errorRate) {
// Send to Slack, PagerDuty, etc.
console.error(`⚠️ Voice agent error rate: ${(errorRate * 100).toFixed(1)}%`);
notifyTeam({
message: `Voice agent errors spiking`,
error_rate: errorRate,
window_minutes: this.window / 60000
});
}
}
const monitor = new ErrorRateMonitor();
// Hook into tracing: record both successes and failures
traceSession.on('tool_call_complete', () => monitor.recordEvent(true));
traceSession.on('tool_call_error', () => monitor.recordEvent(false));
Alert Examples
- Error rate spike: more than 10% failures in the last 60 seconds
- Latency spike: p95 latency > 2x normal (sketched below)
- Tool failure pattern: the same tool failing repeatedly
- Audio quality drop: transcription confidence < 0.7 across multiple sessions
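The error-rate monitor generalizes to the other alerts. The latency spike, for instance, is the same windowed pattern combined with the percentile() helper sketched earlier; the baseline is an assumption you would measure for your own stack:
// Sketch: alert when rolling p95 latency exceeds 2x a measured baseline.
// Reuses the percentile() helper from the latency section above.
class LatencySpikeMonitor {
  constructor(baselineP95Ms, window = 60000) {
    this.baseline = baselineP95Ms; // e.g. measured over a healthy week
    this.window = window;
    this.samples = [];
  }

  record(latencyMs) {
    const now = Date.now();
    this.samples.push({ latencyMs, timestamp: now });
    this.samples = this.samples.filter(s => s.timestamp > now - this.window);
    const p95 = percentile(this.samples.map(s => s.latencyMs), 95);
    if (p95 !== null && p95 > 2 * this.baseline) {
      console.error(`⚠️ p95 latency ${p95}ms > 2x baseline (${this.baseline}ms)`);
    }
  }
}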
Privacy and Compliance
Voice tracing captures sensitive data. Handle it carefully:
Audio Retention
// Configurable retention
const RETENTION_POLICY = {
audio_input: 7, // days
audio_output: 7,
transcriptions: 30,
tool_calls: 90,
errors: 90
};
// Auto-delete old data
async function enforceRetention() {
for (const [eventType, days] of Object.entries(RETENTION_POLICY)) {
await db.query(`
DELETE FROM voice_traces
WHERE event_type = $1
AND timestamp < NOW() - make_interval(days => $2)
`, [eventType, days]);
}
}
// Run daily
setInterval(enforceRetention, 24 * 60 * 60 * 1000);
PII Redaction
function redactPII(text) {
// Remove common PII patterns (illustrative; use a dedicated PII library in production)
return text
.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]')
.replace(/\b\d{16}\b/g, '[CC]')
.replace(/\b[\w.-]+@[\w.-]+\.\w+\b/g, '[EMAIL]');
}
// Apply before storing
trace.log('transcription', {
text: redactPII(originalText),
text_hash: hash(originalText) // For matching, not reading
});
Access Controls
// Only authorized developers can view traces
async function getSessionTrace(sessionId, userId) {
const user = await getUser(userId);
if (!user.roles.includes('developer') && !user.roles.includes('support')) {
throw new Error('Unauthorized');
}
// Log access for audit
await audit.log('trace_access', {
session_id: sessionId,
accessed_by: userId,
timestamp: Date.now()
});
return await db.getTrace(sessionId);
}
Real Numbers: Before and After Tracing
Teams that implement voice tracing report:
Debug time: 80% reduction
From hours to minutes for typical bugs.
Mean time to resolution: 70% faster
From identification to fix deployed.
Issue reproduction rate: 95% vs 40%
Almost all issues can be reproduced from traces vs. guessing.
Developer satisfaction: “Night and day”
One developer told us: “Before tracing, debugging voice agents was my least favorite task. Pure guesswork. Now? It’s actually kind of fun. I can see exactly what happened and fix it immediately.”
Common Tracing Mistakes
Mistake 1: Tracing Too Much
Don’t log raw audio buffers in your database. Reference them in blob storage:
// Wrong
trace.log('audio', { data: audioBuffer }); // 2MB per event!
// Right
trace.log('audio', { ref: uploadToS3(audioBuffer) }); // 50 bytes
Mistake 2: Tracing Too Little
Don’t only trace errors. Trace successful flows too:
// Wrong: only logs failures
if (error) {
trace.log('error', error);
}
// Right: logs everything
trace.log('tool_call_start', params);
try {
const result = await execute();
trace.log('tool_call_complete', result);
} catch (error) {
trace.log('tool_call_error', error);
}
You need context around errors, not just the errors themselves.
Mistake 3: No Correlation IDs
Connect related systems:
// Pass session ID to all related services
const sessionId = generateId();
await traceVoiceAgent(sessionId, ...);
await traceToolExecution(sessionId, ...);
await traceDatabaseQuery(sessionId, ...);
// Now you can correlate across systems
Mistake 4: Ignoring Performance
Tracing shouldn’t slow down your agent:
// Wrong: blocking
trace.log('event', data);
await sendToBackend(data); // BLOCKS voice agent!
// Right: async
trace.log('event', data);
sendToBackend(data).catch(console.error); // Fire and forget
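Fire-and-forget can silently drop events under load. A middle ground is batching with periodic flushes; a sketch, assuming a hypothetical /api/tracing/events/batch endpoint:
// Sketch: buffer events and flush in batches, so tracing stays off the hot
// path without silently losing everything when a single POST fails.
class BatchedSender {
  constructor(flushIntervalMs = 1000, maxBatch = 50) {
    this.buffer = [];
    this.maxBatch = maxBatch;
    setInterval(() => this.flush(), flushIntervalMs);
  }

  send(event) {
    this.buffer.push(event); // O(1), never blocks the agent
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.maxBatch);
    try {
      await fetch('/api/tracing/events/batch', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(batch)
      });
    } catch {
      this.buffer.unshift(...batch); // re-queue; retry on the next flush
    }
  }
}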
Getting Started: Add Tracing Today
You don’t need to build everything at once. Start minimal:
Week 1: Add basic event logging (transcriptions, tool calls, errors)
Week 2: Build simple timeline query interface
Week 3: Add audio storage and playback
Week 4: Build real-time dashboard
Most teams see wins by week 2.
Ready for Observable Voice Agents?
For engineering teams iterating on voice agents, real-time tracing is table stakes.
OpenAI’s Realtime API provides the voice capabilities. Your job is making them debuggable.
Stop debugging in the dark. Start tracing like you mean it.
Want to learn more? Check out OpenAI’s Realtime API documentation for event handling patterns and function calling guide for tool-based workflows.