Voice Agent Observability: Debug Production Speech Systems That Actually Work
Your voice agent fails in production. A customer calls support saying “it stopped working halfway through.” You check logs. Nothing. Check metrics. Everything looks fine. Check audio recordings. You don’t have any.
You have no idea what happened.
This was me three months ago. Our voice agent launched with standard logging: timestamps, status codes, error messages. First production issue? Completely blind. Customer said the agent “understood them wrong.” Which utterance? What did it transcribe? What did it actually hear? No clue.
Voice agents fail in ways text agents don’t. And standard observability doesn’t cut it. Let me show you what you actually need.
Why Voice Observability Is Different
Text-based agents are easy to observe:
// Text agent observability: straightforward
logger.info('User message', { text: "book a flight to NYC" });
logger.info('Agent response', { text: "I'll help you book that flight" });
logger.info('Tool called', { name: "search_flights", params: {...} });
You can read the logs and replay the entire conversation in your head.
Voice agents? Completely different:
- You need the actual audio – Transcripts don’t capture accents, background noise, emotion
- Timing matters critically – 200ms latency = normal, 2s = broken
- Multiple parallel streams – User audio in, agent audio out, simultaneous
- Interruptions are complex – Agent mid-sentence when user cuts in
- Tool calls happen while speaking – Agent calling APIs while generating speech
- Failures are subtle – Poor recognition, awkward responses, dead air
Standard logs miss 90% of what matters.
The Complete Observability Stack
Here’s what production-grade voice observability looks like:
graph TB
subgraph "Voice Agent Runtime"
VA[Voice Agent]
AT[Audio Transcriber]
TH[Tool Handler]
AS[Audio Synthesizer]
end
subgraph "Instrumentation Layer"
TI[Trace Interceptor]
AM[Audio Monitor]
PM[Performance Monitor]
EM[Error Monitor]
end
subgraph "Storage Layer"
TS[Trace Store]
AS_Store[Audio Store]
MS[Metrics Store]
LS[Log Store]
end
subgraph "Analysis Layer"
TV[Trace Viewer]
AP[Audio Playback]
MD[Metrics Dashboard]
AA[Anomaly Alerter]
end
VA --> TI
AT --> TI
AT --> AM
TH --> TI
AS --> TI
AS --> AM
TI --> TS
AM --> AS_Store
PM --> MS
EM --> LS
TS --> TV
AS_Store --> AP
MS --> MD
LS --> AA
style TI fill:#4CAF50
style AM fill:#2196F3
style TV fill:#FF9800
style AP fill:#FF5722
Four critical components:
- Instrumentation Layer – Captures everything happening in real-time
- Storage Layer – Persists audio, traces, metrics, logs
- Analysis Layer – Makes sense of the data
- Alerting – Notifies you when things break
Core Tracing Implementation
Start with comprehensive trace capture:
import { EventEmitter } from 'events';
interface VoiceTrace {
traceId: string;
sessionId: string;
userId: string;
agentId: string;
timestamp: Date;
duration: number;
conversation: ConversationTrace;
audio: AudioTrace;
tools: ToolTrace[];
performance: PerformanceTrace;
errors: ErrorTrace[];
metadata: {
clientInfo: any;
environment: string;
version: string;
};
}
interface ConversationTrace {
turns: Array<{
turnId: string;
speaker: 'user' | 'agent';
startTime: number;
endTime: number;
// What was said
transcript: string;
confidence: number;
// How it was said
audioUrl: string;
prosody: {
pace: number;
pitch: number;
volume: number;
};
// What happened
interruptions: number;
toolCallsDuring: string[];
// Agent internals
agentThinking?: {
prompt: string;
response: string;
latency: number;
};
}>;
}
interface AudioTrace {
userAudio: {
totalBytes: number;
durationMs: number;
sampleRate: number;
chunks: Array<{
timestamp: number;
bytes: number;
url: string; // S3 or similar
}>;
};
agentAudio: {
totalBytes: number;
durationMs: number;
synthesisLatency: number;
chunks: Array<{
timestamp: number;
bytes: number;
url: string;
text: string; // What this chunk said
}>;
};
quality: {
userNoiseLevel: number;
userSignalClarity: number;
agentClarity: number;
};
}
interface ToolTrace {
toolCallId: string;
toolName: string;
timestamp: number;
input: any;
output: any;
latency: number;
success: boolean;
error?: string;
// Voice-specific
spokenBefore: string; // What agent said before calling
spokenAfter: string; // What agent said with result
}
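Two shapes referenced above, PerformanceTrace and ErrorTrace, aren't spelled out in the snippet; here's a minimal sketch inferred from how the tracer below uses them:
interface PerformanceTrace {
  firstResponseLatency: number;   // ms from session start to the first agent response
  avgTurnLatency: number;         // mean gap between a turn ending and the agent reply starting
  toolCallLatencies: number[];    // one entry per tool call, in ms
}
interface ErrorTrace {
  timestamp: number;
  message: string;
  stack?: string;
  context: any;
  recovered: boolean;
}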
class VoiceTracer extends EventEmitter {
private traces: Map<string, VoiceTrace> = new Map();
private storage: TraceStorage;
private audioStorage: AudioStorage;
constructor(storage: TraceStorage, audioStorage: AudioStorage) {
super();
this.storage = storage;
this.audioStorage = audioStorage;
}
startTrace(sessionId: string, userId: string, agentId: string): string {
const traceId = generateTraceId();
const trace: VoiceTrace = {
traceId,
sessionId,
userId,
agentId,
timestamp: new Date(),
duration: 0,
conversation: { turns: [] },
audio: {
userAudio: { totalBytes: 0, durationMs: 0, sampleRate: 16000, chunks: [] },
agentAudio: { totalBytes: 0, durationMs: 0, synthesisLatency: 0, chunks: [] },
quality: { userNoiseLevel: 0, userSignalClarity: 0, agentClarity: 0 }
},
tools: [],
performance: {
firstResponseLatency: 0,
avgTurnLatency: 0,
toolCallLatencies: []
},
errors: [],
metadata: {
clientInfo: {},
environment: process.env.NODE_ENV ?? 'development',
version: process.env.APP_VERSION ?? 'unknown'
}
};
this.traces.set(traceId, trace);
this.emit('trace:started', trace);
return traceId;
}
async recordUserUtterance(traceId: string, audio: AudioBuffer, transcript: string, confidence: number) {
const trace = this.traces.get(traceId);
if (!trace) return;
const turnId = generateTurnId();
const startTime = Date.now();
// Upload audio
const audioUrl = await this.audioStorage.upload({
traceId,
turnId,
speaker: 'user',
buffer: audio,
format: 'wav'
});
// Analyze audio quality
const quality = this.analyzeAudioQuality(audio);
const turn = {
turnId,
speaker: 'user' as const,
startTime,
endTime: startTime + audio.duration * 1000, // AudioBuffer.duration is in seconds
transcript,
confidence,
audioUrl,
prosody: {
pace: this.calculatePace(audio, transcript),
pitch: this.calculatePitch(audio),
volume: this.calculateVolume(audio)
},
interruptions: 0,
toolCallsDuring: []
};
trace.conversation.turns.push(turn);
trace.audio.userAudio.totalBytes += audio.length;
trace.audio.userAudio.durationMs += audio.duration * 1000; // seconds → ms
trace.audio.userAudio.chunks.push({
timestamp: startTime,
bytes: audio.length,
url: audioUrl
});
trace.audio.quality.userNoiseLevel = quality.noiseLevel;
trace.audio.quality.userSignalClarity = quality.clarity;
this.emit('trace:user_utterance', { traceId, turn });
}
async recordAgentResponse(traceId: string, audio: AudioBuffer, text: string, thinking?: any) {
const trace = this.traces.get(traceId);
if (!trace) return;
const turnId = generateTurnId();
const startTime = Date.now();
// Upload audio
const audioUrl = await this.audioStorage.upload({
traceId,
turnId,
speaker: 'agent',
buffer: audio,
format: 'wav'
});
const turn = {
turnId,
speaker: 'agent' as const,
startTime,
endTime: startTime + audio.duration * 1000, // AudioBuffer.duration is in seconds
transcript: text,
confidence: 1.0, // Agent text is always accurate
audioUrl,
prosody: {
pace: this.calculatePace(audio, text),
pitch: this.calculatePitch(audio),
volume: this.calculateVolume(audio)
},
interruptions: 0,
toolCallsDuring: [],
agentThinking: thinking ? {
prompt: thinking.prompt,
response: thinking.response,
latency: thinking.latency
} : undefined
};
trace.conversation.turns.push(turn);
trace.audio.agentAudio.totalBytes += audio.length;
trace.audio.agentAudio.durationMs += audio.duration * 1000; // seconds → ms
trace.audio.agentAudio.chunks.push({
timestamp: startTime,
bytes: audio.length,
url: audioUrl,
text
});
// Update performance metrics: latency of the first agent response in the session
if (trace.conversation.turns.filter(t => t.speaker === 'agent').length === 1) {
trace.performance.firstResponseLatency = startTime - trace.timestamp.getTime();
}
this.emit('trace:agent_response', { traceId, turn });
}
recordToolCall(traceId: string, toolName: string, input: any, output: any, latency: number, success: boolean, spokenContext: { before: string, after: string }) {
const trace = this.traces.get(traceId);
if (!trace) return;
const toolTrace: ToolTrace = {
toolCallId: generateToolCallId(),
toolName,
timestamp: Date.now(),
input,
output,
latency,
success,
spokenBefore: spokenContext.before,
spokenAfter: spokenContext.after
};
trace.tools.push(toolTrace);
trace.performance.toolCallLatencies.push(latency);
// Mark which turn this tool call happened during
const currentTurn = trace.conversation.turns[trace.conversation.turns.length - 1];
if (currentTurn) {
currentTurn.toolCallsDuring.push(toolTrace.toolCallId);
}
this.emit('trace:tool_call', { traceId, toolTrace });
}
recordInterruption(traceId: string) {
const trace = this.traces.get(traceId);
if (!trace) return;
const currentTurn = trace.conversation.turns[trace.conversation.turns.length - 1];
if (currentTurn && currentTurn.speaker === 'agent') {
currentTurn.interruptions++;
}
this.emit('trace:interruption', { traceId, turnId: currentTurn?.turnId });
}
recordError(traceId: string, error: Error, context: any) {
const trace = this.traces.get(traceId);
if (!trace) return;
const errorTrace: ErrorTrace = {
timestamp: Date.now(),
message: error.message,
stack: error.stack,
context,
recovered: false
};
trace.errors.push(errorTrace);
this.emit('trace:error', { traceId, error: errorTrace });
}
async endTrace(traceId: string) {
const trace = this.traces.get(traceId);
if (!trace) return;
trace.duration = Date.now() - trace.timestamp.getTime();
// Calculate final metrics: gap between each agent turn and the turn immediately before it
const turns = trace.conversation.turns;
const latencies = turns
.map((t, i) => (t.speaker === 'agent' && i > 0 ? t.startTime - turns[i - 1].endTime : null))
.filter((l): l is number => l !== null);
trace.performance.avgTurnLatency = latencies.length > 0
? latencies.reduce((a, b) => a + b, 0) / latencies.length
: 0;
// Persist to long-term storage
await this.storage.save(trace);
// Clean up
this.traces.delete(traceId);
this.emit('trace:ended', trace);
}
private analyzeAudioQuality(audio: AudioBuffer): { noiseLevel: number, clarity: number } {
// Simplified quality analysis
const samples = new Float32Array(audio.length);
audio.copyFromChannel(samples, 0); // read channel 0 into the analysis buffer
// Calculate RMS for volume
const rms = Math.sqrt(
samples.reduce((sum, sample) => sum + sample * sample, 0) / samples.length
);
// Estimate SNR (signal-to-noise ratio)
const sortedSamples = Array.from(samples).sort((a, b) => a - b);
const noiseFloor = sortedSamples[Math.floor(sortedSamples.length * 0.1)];
const snr = 20 * Math.log10(rms / Math.abs(noiseFloor));
return {
noiseLevel: noiseFloor,
clarity: Math.min(snr / 30, 1.0) // Normalize to 0-1
};
}
private calculatePace(audio: AudioBuffer, text: string): number {
const words = text.trim().split(/\s+/).filter(Boolean).length;
const durationMin = audio.duration / 60; // AudioBuffer.duration is in seconds
return durationMin > 0 ? words / durationMin : 0; // Words per minute
}
private calculatePitch(audio: AudioBuffer): number {
// Simplified pitch detection
// In production, use proper DSP
return 200; // Placeholder Hz
}
private calculateVolume(audio: AudioBuffer): number {
const samples = new Float32Array(audio.length);
audio.copyFromChannel(samples, 0); // read channel 0 into the analysis buffer
const rms = Math.sqrt(
samples.reduce((sum, sample) => sum + sample * sample, 0) / samples.length
);
// Convert to dB
return 20 * Math.log10(rms);
}
}
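To show how this hangs together, here's a hypothetical wiring into a session loop; `session`, `storage`, `audioStorage`, and the agent ID are stand-ins for whatever your runtime actually provides:
// Hypothetical wiring – event names and session object depend on your voice runtime
const tracer = new VoiceTracer(storage, audioStorage);
const traceId = tracer.startTrace(session.id, session.userId, 'support-agent-v1');

session.on('userUtterance', ({ audio, transcript, confidence }) =>
  tracer.recordUserUtterance(traceId, audio, transcript, confidence));

session.on('agentResponse', ({ audio, text, thinking }) =>
  tracer.recordAgentResponse(traceId, audio, text, thinking));

session.on('toolCall', (call) =>
  tracer.recordToolCall(traceId, call.name, call.input, call.output, call.latency, call.success, {
    before: call.spokenBefore,
    after: call.spokenAfter
  }));

session.on('interruption', () => tracer.recordInterruption(traceId));
session.on('error', (err, context) => tracer.recordError(traceId, err, context));
session.on('end', () => tracer.endTrace(traceId));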
Trace Viewer With Audio Playback
The killer feature: replaying conversations with synchronized audio:
import React, { useState } from 'react';
interface TraceViewerProps {
trace: VoiceTrace;
}
const TraceViewer: React.FC<TraceViewerProps> = ({ trace }) => {
const [currentTurn, setCurrentTurn] = useState(0);
const [isPlaying, setIsPlaying] = useState(false);
const playConversation = async () => {
setIsPlaying(true);
for (let i = 0; i < trace.conversation.turns.length; i++) {
setCurrentTurn(i);
const turn = trace.conversation.turns[i];
// Play audio
const audio = new Audio(turn.audioUrl);
await audio.play();
// Wait for audio to finish
await new Promise(resolve => {
audio.onended = resolve;
});
// Show any tool calls that happened during this turn
if (turn.toolCallsDuring.length > 0) {
// Highlight tool calls in UI
}
}
setIsPlaying(false);
};
return (
<div className="trace-viewer">
<div className="trace-header">
<h2>Trace: {trace.traceId}</h2>
<div className="trace-meta">
<span>Session: {trace.sessionId}</span>
<span>Duration: {(trace.duration / 1000).toFixed(1)}s</span>
<span>Turns: {trace.conversation.turns.length}</span>
<span>Tools: {trace.tools.length}</span>
{trace.errors.length > 0 && (
<span className="error-badge">Errors: {trace.errors.length}</span>
)}
</div>
</div>
<div className="playback-controls">
<button onClick={playConversation} disabled={isPlaying}>
{isPlaying ? 'Playing...' : 'Play Conversation'}
</button>
</div>
<div className="conversation-timeline">
{trace.conversation.turns.map((turn, i) => (
<div
key={turn.turnId}
className={`turn ${turn.speaker} ${i === currentTurn ? 'active' : ''}`}
>
<div className="turn-header">
<span className="speaker">{turn.speaker}</span>
<span className="timestamp">
{new Date(turn.startTime).toLocaleTimeString()}
</span>
<span className="confidence">
{(turn.confidence * 100).toFixed(0)}% confident
</span>
</div>
<div className="turn-content">
<p className="transcript">{turn.transcript}</p>
<audio
controls
src={turn.audioUrl}
className="audio-player"
/>
{turn.prosody && (
<div className="prosody-info">
<span>Pace: {turn.prosody.pace.toFixed(0)} wpm</span>
<span>Volume: {turn.prosody.volume.toFixed(1)} dB</span>
</div>
)}
{turn.interruptions > 0 && (
<div className="interruption-badge">
Interrupted {turn.interruptions}x
</div>
)}
{turn.toolCallsDuring.length > 0 && (
<div className="tools-called">
<h4>Tools called during this turn:</h4>
<ul>
{turn.toolCallsDuring.map(toolCallId => {
const tool = trace.tools.find(t => t.toolCallId === toolCallId);
return tool ? (
<li key={toolCallId}>
<strong>{tool.toolName}</strong>
<span className="latency">{tool.latency}ms</span>
{!tool.success && <span className="error">Failed</span>}
</li>
) : null;
})}
</ul>
</div>
)}
{turn.agentThinking && (
<details className="agent-thinking">
<summary>Agent Reasoning ({turn.agentThinking.latency}ms)</summary>
<pre>{turn.agentThinking.prompt}</pre>
<pre>{turn.agentThinking.response}</pre>
</details>
)}
</div>
</div>
))}
</div>
{trace.errors.length > 0 && (
<div className="error-section">
<h3>Errors</h3>
{trace.errors.map((error, i) => (
<div key={i} className="error-item">
<p className="error-message">{error.message}</p>
<pre className="error-stack">{error.stack}</pre>
<code className="error-context">
{JSON.stringify(error.context, null, 2)}
</code>
</div>
))}
</div>
)}
<div className="performance-section">
<h3>Performance Metrics</h3>
<dl>
<dt>First Response Latency</dt>
<dd>{trace.performance.firstResponseLatency}ms</dd>
<dt>Average Turn Latency</dt>
<dd>{trace.performance.avgTurnLatency.toFixed(0)}ms</dd>
<dt>Tool Call Latencies</dt>
<dd>
{trace.performance.toolCallLatencies.length > 0
? `${Math.min(...trace.performance.toolCallLatencies)}ms - ${Math.max(...trace.performance.toolCallLatencies)}ms`
: 'N/A'}
</dd>
<dt>Audio Quality</dt>
<dd>
User clarity: {(trace.audio.quality.userSignalClarity * 100).toFixed(0)}%
<br />
Noise level: {trace.audio.quality.userNoiseLevel.toFixed(3)}
</dd>
</dl>
</div>
</div>
);
};
export default TraceViewer;
Real-Time Monitoring Dashboard
Track live metrics:
class VoiceAgentMonitor {
private metrics: Map<string, MetricTimeSeries> = new Map();
private alerts: Alert[] = [];
recordMetric(name: string, value: number, tags: Record<string, string> = {}) {
const key = `${name}:${JSON.stringify(tags)}`;
if (!this.metrics.has(key)) {
this.metrics.set(key, {
name,
tags,
datapoints: []
});
}
const series = this.metrics.get(key)!;
series.datapoints.push({
timestamp: Date.now(),
value
});
// Keep only last hour
const oneHourAgo = Date.now() - 3600000;
series.datapoints = series.datapoints.filter(dp => dp.timestamp > oneHourAgo);
// Check thresholds
this.checkAlertRules(name, value, tags);
}
getMetrics(name: string, timeRange: { start: Date, end: Date }): MetricTimeSeries[] {
const filtered: MetricTimeSeries[] = [];
for (const [key, series] of this.metrics) {
if (series.name === name) {
const datapoints = series.datapoints.filter(dp =>
dp.timestamp >= timeRange.start.getTime() &&
dp.timestamp <= timeRange.end.getTime()
);
if (datapoints.length > 0) {
filtered.push({ ...series, datapoints });
}
}
}
return filtered;
}
getAggregatedMetrics(timeRange: { start: Date, end: Date }): DashboardMetrics {
const activeSessions = this.getMetrics('active_sessions', timeRange);
const latencies = this.getMetrics('turn_latency', timeRange);
const errors = this.getMetrics('error_count', timeRange);
const toolCalls = this.getMetrics('tool_call_count', timeRange);
return {
currentActiveSessions: this.getLatestValue(activeSessions),
avgLatency: this.calculateAverage(latencies),
p95Latency: this.calculatePercentile(latencies, 0.95),
p99Latency: this.calculatePercentile(latencies, 0.99),
errorRate: this.calculateRate(errors, timeRange),
successRate: 1 - this.calculateRate(errors, timeRange),
totalToolCalls: this.calculateSum(toolCalls),
avgToolCallsPerSession: this.calculateAverage(toolCalls)
};
}
private checkAlertRules(name: string, value: number, tags: Record<string, string>) {
// High latency alert
if (name === 'turn_latency' && value > 3000) {
this.createAlert({
severity: 'warning',
metric: name,
message: `High latency detected: ${value}ms`,
tags,
threshold: 3000
});
}
// Error rate alert
if (name === 'error_count') {
const recentErrors = this.getMetrics('error_count', {
start: new Date(Date.now() - 300000), // Last 5 min
end: new Date()
});
const errorRate = this.calculateRate(recentErrors, {
start: new Date(Date.now() - 300000),
end: new Date()
});
if (errorRate > 0.05) { // > 5% error rate
this.createAlert({
severity: 'critical',
metric: name,
message: `High error rate: ${(errorRate * 100).toFixed(1)}%`,
tags,
threshold: 0.05
});
}
}
// Low audio quality alert
if (name === 'audio_quality' && value < 0.6) {
this.createAlert({
severity: 'warning',
metric: name,
message: `Poor audio quality: ${(value * 100).toFixed(0)}%`,
tags,
threshold: 0.6
});
}
}
private createAlert(alert: Omit<Alert, 'id' | 'timestamp' | 'acknowledged'>) {
// Deduplicate alerts (don't spam)
const existing = this.alerts.find(a =>
a.metric === alert.metric &&
a.severity === alert.severity &&
JSON.stringify(a.tags) === JSON.stringify(alert.tags) &&
!a.acknowledged &&
Date.now() - a.timestamp < 300000 // Within 5 min
);
if (existing) return;
const fullAlert: Alert = {
id: generateAlertId(),
timestamp: Date.now(),
acknowledged: false,
...alert
};
this.alerts.push(fullAlert);
// Send to alerting system
this.sendAlert(fullAlert);
}
private async sendAlert(alert: Alert) {
// Send to Slack, PagerDuty, etc.
console.error('ALERT:', alert);
if (alert.severity === 'critical') {
// Page on-call
await this.pageOnCall(alert);
}
}
private calculateAverage(series: MetricTimeSeries[]): number {
const allValues = series.flatMap(s => s.datapoints.map(dp => dp.value));
if (allValues.length === 0) return 0;
return allValues.reduce((a, b) => a + b, 0) / allValues.length;
}
private calculatePercentile(series: MetricTimeSeries[], percentile: number): number {
const allValues = series.flatMap(s => s.datapoints.map(dp => dp.value)).sort((a, b) => a - b);
if (allValues.length === 0) return 0;
const index = Math.floor(allValues.length * percentile);
return allValues[index];
}
private calculateSum(series: MetricTimeSeries[]): number {
return series.flatMap(s => s.datapoints.map(dp => dp.value)).reduce((a, b) => a + b, 0);
}
private calculateRate(series: MetricTimeSeries[], timeRange: { start: Date, end: Date }): number {
const sum = this.calculateSum(series);
const durationMs = timeRange.end.getTime() - timeRange.start.getTime();
const durationMin = durationMs / 60000;
return sum / durationMin; // Per minute
}
private getLatestValue(series: MetricTimeSeries[]): number {
// Look across all matching series, not just the first one
const all = series.flatMap(s => s.datapoints);
if (all.length === 0) return 0;
return all.sort((a, b) => b.timestamp - a.timestamp)[0].value;
}
}
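The monitor references a few supporting types that aren't shown above; a minimal sketch inferred from how they're used:
interface MetricTimeSeries {
  name: string;
  tags: Record<string, string>;
  datapoints: Array<{ timestamp: number; value: number }>;
}
interface Alert {
  id: string;
  timestamp: number;
  severity: 'warning' | 'critical';
  metric: string;
  message: string;
  tags: Record<string, string>;
  threshold: number;
  acknowledged: boolean;
}
interface DashboardMetrics {
  currentActiveSessions: number;
  avgLatency: number;
  p95Latency: number;
  p99Latency: number;
  errorRate: number;
  successRate: number;
  totalToolCalls: number;
  avgToolCallsPerSession: number;
}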
Key Metrics To Track
Essential metrics for voice agents:
// Conversation Metrics
monitor.recordMetric('active_sessions', sessionCount);
monitor.recordMetric('session_duration', durationSeconds, { userId });
monitor.recordMetric('turns_per_session', turnCount, { sessionId });
monitor.recordMetric('interruption_rate', interruptionRate, { agentId });
// Performance Metrics
monitor.recordMetric('turn_latency', latencyMs, { agentId, turnType: 'response' });
monitor.recordMetric('first_response_latency', latencyMs, { sessionId });
monitor.recordMetric('tool_call_latency', latencyMs, { toolName });
monitor.recordMetric('audio_synthesis_latency', latencyMs);
// Quality Metrics
monitor.recordMetric('transcription_confidence', confidence, { turn: 'user' });
monitor.recordMetric('audio_quality', qualityScore, { speaker: 'user' });
monitor.recordMetric('noise_level', noiseDb, { sessionId });
// Business Metrics
monitor.recordMetric('goal_completion_rate', completionRate, { agentType });
monitor.recordMetric('user_satisfaction', rating, { sessionId });
monitor.recordMetric('escalation_rate', escalationRate, { reason });
// Error Metrics
monitor.recordMetric('error_count', 1, { errorType, severity });
monitor.recordMetric('recovery_success_rate', successRate);
monitor.recordMetric('session_crash_rate', crashRate);
Production Observability Architecture
Complete system diagram:
graph TB
subgraph "Voice Agents"
VA1[Agent Instance 1]
VA2[Agent Instance 2]
VA3[Agent Instance N]
end
subgraph "Collection"
OC[OpenTelemetry Collector]
FB[Fluentbit Log Shipper]
PS[Prometheus Scraper]
end
subgraph "Storage"
JG[Jaeger Traces]
S3[S3 Audio Storage]
PDB[(Prometheus TSDB)]
ELK[Elasticsearch Logs]
end
subgraph "Visualization"
GF[Grafana Dashboards]
TV[Trace Viewer UI]
KB[Kibana Logs]
end
subgraph "Alerting"
AM[AlertManager]
SL[Slack]
PD[PagerDuty]
end
VA1 --> OC
VA2 --> OC
VA3 --> OC
VA1 --> FB
VA2 --> FB
VA3 --> FB
VA1 --> PS
VA2 --> PS
VA3 --> PS
OC --> JG
VA1 -.Audio.-> S3
VA2 -.Audio.-> S3
VA3 -.Audio.-> S3
PS --> PDB
FB --> ELK
JG --> TV
S3 --> TV
PDB --> GF
ELK --> KB
PDB --> AM
AM --> SL
AM --> PD
style OC fill:#4CAF50
style S3 fill:#FF9800
style TV fill:#2196F3
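If you ship traces through the OpenTelemetry Collector as in the diagram, each conversation turn maps naturally onto a span. A minimal sketch with @opentelemetry/api; the attribute names and the TurnResult shape are illustrative, not a standard:
import { trace, SpanStatusCode } from '@opentelemetry/api';

interface TurnResult {
  confidence: number;
  audioUrl: string;
}

const otelTracer = trace.getTracer('voice-agent');

// Wrap one conversation turn in a span so it shows up in Jaeger alongside tool-call spans
async function instrumentedTurn(sessionId: string, handleTurn: () => Promise<TurnResult>): Promise<TurnResult> {
  return otelTracer.startActiveSpan('voice.turn', async (span) => {
    span.setAttribute('voice.session_id', sessionId);
    try {
      const result = await handleTurn();
      span.setAttribute('voice.transcript_confidence', result.confidence);
      span.setAttribute('voice.audio_url', result.audioUrl);
      return result;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      throw err;
    } finally {
      span.end();
    }
  });
}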
Real Production Debugging Example
Here’s how observability saves you:
Scenario: User reports “Agent stopped understanding me halfway through”
Without observability:
Logs: "Session abc123 completed successfully"
Can't reproduce. Close ticket.
With complete observability:
// 1. Find the trace
const trace = await traceViewer.getTrace('abc123');
// 2. Review conversation timeline
// Turn 1-4: Normal, high confidence (95%+)
// Turn 5: Confidence drops to 47%
// Turn 6: Confidence 31%, agent gives generic response
// 3. Play audio from turn 5
const audio5 = await traceViewer.playTurn(trace.conversation.turns[4]);
// Hear: Background noise increases dramatically
// 4. Check audio quality metrics
console.log(trace.audio.quality);
// { userNoiseLevel: 0.42, userSignalClarity: 0.31, ... }
// 5. Root cause: User switched from quiet room to noisy environment
// Agent's transcription degraded, leading to poor responses
// 6. Solution: Add audio quality monitoring + prompt agent to ask
// user to move to quieter location when quality drops
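The fix in step 6 can be implemented as a simple quality gate in the turn loop. Here's a sketch; the 0.5 threshold and the injectSystemPrompt() hook are assumptions to adapt to your stack:
// Hypothetical quality gate – threshold and prompt-injection API are assumptions
interface AgentControl {
  injectSystemPrompt(text: string): void;
}

function maybeAskForQuieterEnvironment(clarity: number, agent: AgentControl) {
  if (clarity < 0.5) {
    agent.injectSystemPrompt(
      'Audio quality has dropped. Politely ask the user to move somewhere quieter or speak closer to the microphone.'
    );
  }
}

// e.g. after each user utterance:
// maybeAskForQuieterEnvironment(trace.audio.quality.userSignalClarity, agent);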
Real metric: With audio playback + quality metrics, we reduced mean time to diagnose issues from 2.5 hours to 8 minutes.
Anomaly Detection
Automated problem spotting:
import numpy as np
from sklearn.ensemble import IsolationForest
class VoiceAnomalyDetector:
def __init__(self):
self.model = IsolationForest(contamination=0.1)
self.feature_history = []
def extract_features(self, trace):
"""Extract features from voice trace for anomaly detection"""
return {
'avg_turn_latency': trace['performance']['avgTurnLatency'],
'error_count': len(trace['errors']),
'interruption_rate': sum(t['interruptions'] for t in trace['conversation']['turns']) / len(trace['conversation']['turns']),
'avg_confidence': np.mean([t['confidence'] for t in trace['conversation']['turns'] if t['speaker'] == 'user']),
'audio_quality': trace['audio']['quality']['userSignalClarity'],
'tool_success_rate': sum(1 for t in trace['tools'] if t['success']) / len(trace['tools']) if trace['tools'] else 1.0,
'session_duration': trace['duration'] / 1000,
'turn_count': len(trace['conversation']['turns'])
}
def train(self, historical_traces):
"""Train on historical successful traces"""
features = [self.extract_features(t) for t in historical_traces]
self.feature_history = features
X = np.array([[f[k] for k in sorted(f.keys())] for f in features])
self.model.fit(X)
def detect_anomaly(self, trace):
"""Check if trace is anomalous"""
features = self.extract_features(trace)
X = np.array([[features[k] for k in sorted(features.keys())]])
prediction = self.model.predict(X)[0]
score = self.model.score_samples(X)[0]
is_anomaly = prediction == -1
if is_anomaly:
# Identify which features are anomalous
anomalous_features = []
for key in features:
historical_values = [f[key] for f in self.feature_history]
mean = np.mean(historical_values)
std = np.std(historical_values)
z_score = abs((features[key] - mean) / std) if std > 0 else 0
if z_score > 3: # 3 sigma
anomalous_features.append({
'feature': key,
'value': features[key],
'expected_mean': mean,
'z_score': z_score
})
return {
'is_anomaly': True,
'anomaly_score': score,
'anomalous_features': anomalous_features
}
return {'is_anomaly': False}
# Usage
detector = VoiceAnomalyDetector()
# Train on 1000 successful traces
historical = trace_storage.get_successful_traces(limit=1000)
detector.train(historical)
# Check new traces
new_trace = trace_storage.get_trace('xyz789')
result = detector.detect_anomaly(new_trace)
if result['is_anomaly']:
print(f"Anomaly detected! Score: {result['anomaly_score']}")
print("Unusual features:")
for feat in result['anomalous_features']:
print(f" {feat['feature']}: {feat['value']} (expected ~{feat['expected_mean']}, z={feat['z_score']})")
Cost-Effective Storage Strategy
Voice traces generate lots of data. Store smart:
interface StoragePolicy {
// Hot storage (fast access, expensive): Recent + errors
hot: {
duration: '7 days',
includes: ['all_traces', 'all_audio'],
storage: 'Redis + S3 Standard'
};
// Warm storage (medium access, medium cost): Recent successes
warm: {
duration: '30 days',
includes: ['successful_traces', 'sampled_audio_10%'],
storage: 'S3 Infrequent Access'
};
// Cold storage (slow access, cheap): Long-term archive
cold: {
duration: 'indefinite',
includes: ['trace_metadata', 'error_audio_only'],
storage: 'S3 Glacier'
};
}
class TieredTraceStorage {
async archiveTrace(trace: VoiceTrace) {
const age = Date.now() - trace.timestamp.getTime();
const hasErrors = trace.errors.length > 0;
if (age < 7 * 24 * 60 * 60 * 1000) {
// Keep in hot storage
await this.redis.setex(
`trace:${trace.traceId}`,
7 * 24 * 60 * 60,
JSON.stringify(trace)
);
// Audio already in S3 Standard
} else if (age < 30 * 24 * 60 * 60 * 1000) {
// Move to warm storage
await this.redis.del(`trace:${trace.traceId}`);
if (!hasErrors && Math.random() > 0.1) {
// Sample: delete 90% of successful audio
await this.deleteAudio(trace.audio);
} else {
// Move audio to Infrequent Access
await this.transitionAudioStorage(trace.audio, 'STANDARD_IA');
}
} else {
// Move to cold storage
if (hasErrors) {
// Keep error audio
await this.transitionAudioStorage(trace.audio, 'GLACIER');
} else {
// Delete audio, keep only metadata
await this.deleteAudio(trace.audio);
}
// Store minimal metadata
const metadata = {
traceId: trace.traceId,
sessionId: trace.sessionId,
userId: trace.userId,
timestamp: trace.timestamp,
duration: trace.duration,
turnCount: trace.conversation.turns.length,
errorCount: trace.errors.length,
success: trace.errors.length === 0
};
await this.s3.putObject({
Bucket: 'voice-traces-archive',
Key: `metadata/${trace.traceId}.json`,
Body: JSON.stringify(metadata),
StorageClass: 'GLACIER'
});
}
}
}
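The transitions above can also be pushed down to S3 itself instead of handled in application code. A sketch with the AWS SDK v3; the bucket name and key prefix are placeholders:
import { S3Client, PutBucketLifecycleConfigurationCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

// Hypothetical bucket/prefix – move audio to Infrequent Access after 7 days
// and Glacier after 30, mirroring the hot/warm/cold tiers above
await s3.send(new PutBucketLifecycleConfigurationCommand({
  Bucket: 'voice-agent-audio',
  LifecycleConfiguration: {
    Rules: [
      {
        ID: 'tier-voice-audio',
        Status: 'Enabled',
        Filter: { Prefix: 'audio/' },
        Transitions: [
          { Days: 7, StorageClass: 'STANDARD_IA' },
          { Days: 30, StorageClass: 'GLACIER' }
        ]
      }
    ]
  }
}));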
Real cost savings: This tiered approach reduced our storage costs from $3,200/month to $420/month while keeping all important data accessible.
Key Takeaways
Voice observability is non-negotiable for production:
- Audio playback is critical – Transcripts aren’t enough
- Trace everything – Conversation, tools, performance, errors
- Monitor in real-time – Catch issues before users report them
- Store intelligently – Hot/warm/cold tiers save massive money
- Automate anomaly detection – You can’t manually review every trace
The difference between a debuggable voice agent and a black box is comprehensive observability.
Next Steps
Build production-grade observability:
- Add tracing – Capture every conversation turn with audio
- Store audio – S3 + tiered lifecycle policies
- Build trace viewer – Audio playback + conversation timeline
- Set up monitoring – Latency, errors, quality metrics
- Enable alerting – Page on-call when things break
- Train anomaly detector – Auto-spot unusual patterns
You can’t fix what you can’t see. And with voice agents, you need to hear what went wrong.
Resources:
- OpenTelemetry for Voice Agents
- S3 Lifecycle Policies
- Grafana Dashboards
- OpenAI Realtime API Observability
Running production voice agents? I’ve debugged thousands of failed conversations. Let’s talk about building observability that actually helps.