Human-In-The-Loop For Voice Agents
- ZH+
- Architecture, Safety
- December 18, 2025
Critical actions shouldn’t execute blindly. When a voice agent is about to spend money, delete data, or commit irreversible changes, there needs to be a human in the loop.
The problem is that most agent systems either run completely autonomously (risky) or require developers to hand-code pause/resume logic (tedious). The OpenAI Agents SDK provides a better way: built-in human-in-the-loop with automatic resumability.
Why Voice Agents Need Approval Gates
When users talk to voice agents, conversations flow naturally. But that natural flow can lead to actions that need oversight:
- “Pay that $500 invoice to Acme Corp”
- “Delete all customer records from 2023”
- “Transfer $10,000 to savings account”
- “Cancel the recurring subscription”
These aren’t theoretical edge cases. They’re real scenarios where blind execution would be catastrophic.
Traditional approaches fail here:
- Fully autonomous: Agent executes immediately → No safety net
- Manual checkpoints: Developer writes custom pause logic → Brittle, hard to maintain
- Confirmation questions: Agent asks “Are you sure?” → User already said yes, gets annoyed
The Agents SDK solves this with pausable workflows. The agent automatically pauses before critical actions, requests human approval, and resumes from the exact same state after confirmation.
How Human-In-The-Loop Works
Here’s the architecture:
graph TD
A[User Request] --> B[Agent Processes Intent]
B --> C{Critical Action?}
C -->|No| D[Execute Tool]
C -->|Yes| E[Pause & Request Approval]
E --> F{Human Decision}
F -->|Approved| G[Resume & Execute]
F -->|Rejected| H[Cancel & Explain]
G --> I[Return Result]
H --> I
The runtime handles the entire pause/resume cycle. Your agent code stays clean:
const agent = new Agent({
  name: "PaymentAgent",
  instructions: "You help users manage invoices and payments.",
  tools: [
    {
      name: "process_payment",
      description: "Process a payment to a vendor",
      requiresApproval: true, // This triggers pause
      parameters: {
        type: "object",
        properties: {
          vendor: { type: "string" },
          amount: { type: "number" },
          invoice_id: { type: "string" }
        }
      },
      execute: async (args) => {
        // This only runs after human approval
        return await paymentAPI.process(args);
      }
    }
  ]
});
// The SDK handles pause/resume automatically
agent.on('approval_required', async (context) => {
  // Present approval UI to human
  const approved = await showApprovalDialog({
    action: context.tool_name,
    params: context.tool_args,
    reasoning: context.agent_reasoning
  });
  if (approved) {
    context.approve(); // Agent resumes
  } else {
    context.reject("User declined payment"); // Agent handles cancellation
  }
});
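To make the pause, decide, resume flow concrete, here is a minimal sketch written without any SDK. It is an illustration of the idea only, not the Agents SDK's implementation: `runWithApprovalGate`, the `decide` callback, and the returned `status` values are hypothetical names chosen for this example.

```javascript
// Minimal sketch of an approval gate (hypothetical helper, not SDK code).
// `tool` mirrors the tool shape used above; `decide` stands in for the human.
async function runWithApprovalGate(tool, args, decide) {
  // Non-critical tools run straight through.
  if (!tool.requiresApproval) {
    return { status: "executed", result: await tool.execute(args) };
  }

  // Critical tools wait here until the human decision arrives.
  const decision = await decide({ action: tool.name, params: args });

  if (!decision.approved) {
    // Nothing is executed; the reason is passed back so the agent can explain.
    return { status: "rejected", reason: decision.reason };
  }
  return { status: "executed", result: await tool.execute(args) };
}
```

The point of the sketch is that `execute` is unreachable on the rejected branch, which is the property an approval gate exists to guarantee.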
Real-World Example: Payment Processing
Let’s walk through a complete payment flow:
User: “Process that invoice from Acme Corp”
Agent (internal reasoning):
- Searches for invoices from Acme Corp
- Finds unpaid invoice: $500.00, ID: INV-1234
- Prepares to call the process_payment tool
- SDK detects requiresApproval: true
- Agent automatically pauses
Agent (to user): “I found invoice INV-1234 for $500 to Acme Corp. This will process the payment immediately. Should I proceed?”
User: “Yes, go ahead”
SDK (internal):
- Captures approval
- Resumes agent from exact state
- Agent executes process_payment
- Returns result
Agent (to user): “Payment processed. Transaction ID: TXN-5678.”
The entire pause/resume cycle happens transparently. The agent doesn’t need custom code to handle approval state.
State Preservation During Pause
One of the hardest problems with human-in-the-loop is state management. When you pause mid-conversation, you need to preserve:
- Conversation history (what was said before)
- Agent reasoning (why it wants to act)
- Tool arguments (what it plans to execute)
- User context (who they are, what they can approve)
The Agents SDK runtime handles all of this:
// When paused, state is automatically serialized
const pausedState = {
  conversation_id: "conv_abc123",
  turn_number: 15,
  agent_state: {
    pending_tool: "process_payment",
    tool_args: {
      vendor: "Acme Corp",
      amount: 500,
      invoice_id: "INV-1234"
    },
    reasoning: "User requested payment for outstanding invoice"
  },
  transcript: [/* full conversation */],
  user_context: {/* permissions, identity */}
};
// When approved, state is restored exactly
// Agent picks up where it left off
You don’t write any of this persistence code. The runtime manages it.
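For intuition about what that persistence involves, here is a rough sketch of a JSON round trip over a state object shaped like the one above. The helper names `serializePausedState` and `restorePausedState` are hypothetical, and this is not how the runtime stores state internally.

```javascript
// Sketch: persist a paused run as JSON and restore it later (hypothetical helpers).
function serializePausedState(state) {
  // JSON keeps plain data only; functions and class instances would be lost.
  return JSON.stringify(state);
}

function restorePausedState(serialized) {
  const state = JSON.parse(serialized);
  // Guard against resuming from a record that has no pending action.
  if (!state.agent_state || !state.agent_state.pending_tool) {
    throw new Error("No pending tool call to resume");
  }
  return state;
}

// Usage: store the string anywhere (database, queue), then restore on approval.
const saved = serializePausedState({
  conversation_id: "conv_abc123",
  turn_number: 15,
  agent_state: {
    pending_tool: "process_payment",
    tool_args: { vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" }
  }
});
const resumed = restorePausedState(saved);
console.log(resumed.agent_state.pending_tool); // prints "process_payment"
```

Because the paused state is plain data, the approval can arrive minutes later or on a different process and the pending tool call is still recoverable.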
Multiple Approval Levels
Some organizations need tiered approval: low-value actions go through, high-value actions need manager approval.
const agent = new Agent({
  tools: [
    {
      name: "process_payment",
      requiresApproval: (args) => {
        // Conditional approval based on amount
        if (args.amount > 1000) {
          return {
            level: "manager",
            reason: "Payment exceeds $1000 limit"
          };
        }
        if (args.amount > 100) {
          return {
            level: "user",
            reason: "Payment confirmation required"
          };
        }
        return false; // Auto-approve small payments
      },
      execute: async (args) => {
        return await paymentAPI.process(args);
      }
    }
  ]
});
The thresholds above route approvals as follows:
- $100 or less: Auto-approved, no human check
- Over $100, up to $1,000: User confirmation required
- Over $1,000: Manager approval required
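The routing itself is easy to check in isolation. Below is a small sketch that mirrors the comparisons in the tool definition; `approvalLevelFor`, `routeApproval`, and the `approvers` table are hypothetical names used only for illustration.

```javascript
// Sketch: map a payment amount to an approval tier (hypothetical helper).
// The comparisons mirror the conditional in the tool definition above.
function approvalLevelFor(amount) {
  if (amount > 1000) return "manager";
  if (amount > 100) return "user";
  return "none"; // auto-approved
}

// Hypothetical routing table: what happens for each tier.
const approvers = {
  manager: (request) => `Ask a manager to approve ${request}`,
  user: (request) => `Ask the user to confirm ${request}`,
  none: (request) => `Run ${request} without asking`
};

function routeApproval(amount) {
  const level = approvalLevelFor(amount);
  return { level, prompt: approvers[level](`a $${amount} payment`) };
}

console.log(routeApproval(100).level);  // prints "none" (exactly $100 is not above the limit)
console.log(routeApproval(500).level);  // prints "user"
console.log(routeApproval(5000).level); // prints "manager"
```

Testing the boundary values ($100 and $1,000) explicitly is worthwhile, since strict versus inclusive comparisons decide which tier they fall into.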
Audit Trails
Every paused action creates an audit record:
{
  "event_id": "evt_abc123",
  "timestamp": "2025-03-04T10:30:00Z",
  "agent": "PaymentAgent",
  "action": "process_payment",
  "args": {
    "vendor": "Acme Corp",
    "amount": 500,
    "invoice_id": "INV-1234"
  },
  "requested_by": "user_jane",
  "approved_by": "user_jane",
  "approved_at": "2025-03-04T10:30:15Z",
  "status": "approved",
  "result": {
    "transaction_id": "TXN-5678",
    "status": "completed"
  }
}
This creates compliance-ready logs showing:
- What was requested
- Who requested it
- Who approved it
- When approval happened
- What the result was
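If you ever need to assemble a comparable record yourself, the sketch below shows one way to do it. The field names follow the example record; the `buildAuditRecord` helper and the shape of its `pending` and `decision` arguments are assumptions made for this illustration, and `event_id` and `timestamp` are left out for brevity.

```javascript
// Sketch: build an audit record once a paused action has been decided.
// Field names follow the example record above; the helper itself is hypothetical.
function buildAuditRecord(pending, decision, result, decidedAt = new Date()) {
  return {
    agent: pending.agent,
    action: pending.action,
    args: pending.args,
    requested_by: pending.requested_by,
    // Only an approval fills the approved_* fields.
    approved_by: decision.approved ? decision.decided_by : null,
    approved_at: decision.approved ? decidedAt.toISOString() : null,
    status: decision.approved ? "approved" : "rejected",
    // A rejected action never ran, so it has no result.
    result: decision.approved ? result : null
  };
}

const record = buildAuditRecord(
  {
    agent: "PaymentAgent",
    action: "process_payment",
    args: { vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" },
    requested_by: "user_jane"
  },
  { approved: true, decided_by: "user_jane" },
  { transaction_id: "TXN-5678", status: "completed" }
);
console.log(record.status); // prints "approved"
```

Recording rejections with the same shape (and a null result) keeps the log complete, so an auditor can see what was declined as well as what ran.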
Voice Makes Approval Natural
The reason human-in-the-loop works so well with voice agents is that speaking feels like real-time conversation. Text-based approval flows feel clunky:
Text Agent:
Agent: I will now process payment INV-1234 for $500 to Acme Corp.
Agent: Please approve by clicking the button below.
[Approve] [Reject]
Voice Agent:
Agent: "I found invoice INV-1234 for $500 to Acme Corp.
Should I process this payment?"
User: "Yes"
Agent: "Done. Transaction ID is TXN-5678."
The voice version needs one spoken word instead of a detour to a separate button, and it feels like a real conversation with a trusted assistant.
Error Recovery
What happens if approval is rejected?
context.reject("User declined payment");
// Agent receives rejection and responds naturally
Agent: "No problem, I won't process that payment.
Would you like me to mark the invoice for review instead?"
The SDK ensures the agent knows why it was rejected and can offer alternatives.
Best Practices
1. Make approval context clear
Bad:
Agent: "Should I proceed?"
Good:
Agent: "This will charge your card $500. Approve?"
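One way to keep prompts specific is to generate them from the pending tool arguments rather than relying on free-form wording. The sketch below assumes a pending action shaped like the paused state shown earlier; `buildApprovalPrompt` is a hypothetical helper.

```javascript
// Sketch: turn a pending tool call into a specific spoken prompt (hypothetical helper).
function buildApprovalPrompt(pending) {
  const { vendor, amount, invoice_id } = pending.tool_args;
  // Name the invoice, the amount, the recipient, and the consequence.
  return `I found invoice ${invoice_id} for $${amount} to ${vendor}. ` +
    "This will process the payment immediately. Should I proceed?";
}

console.log(buildApprovalPrompt({
  tool_args: { vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" }
}));
// prints: I found invoice INV-1234 for $500 to Acme Corp. This will process the payment immediately. Should I proceed?
```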
2. Set appropriate thresholds
Don’t require approval for trivial actions:
- Searching data ❌ No approval needed
- Reading information ❌ No approval needed
- Sending $10,000 ✅ Approval required
- Deleting customer data ✅ Approval required
3. Explain reasoning
When pausing, tell the user why:
Agent: "This invoice is larger than usual ($5,000 vs your average $500).
Want me to double-check the details before paying?"
4. Support voice and UI approvals
Some users will approve verbally (“Yes, do it”). Others will click a button. Support both:
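agent.on('approval_required', async (context) => {
  // Show UI + listen for voice; whichever answers first wins.
  // Assumes both helpers resolve to true (approve) or false (reject).
  const approved = await Promise.race([
    showApprovalButton(),
    listenForVoiceApproval()
  ]);
  if (approved) {
    context.approve();
  } else {
    context.reject("User declined");
  }
});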
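A helper like `listenForVoiceApproval()` eventually has to decide whether a transcribed reply means yes or no. The following is a rough sketch of that step, assuming the reply arrives as plain text; `classifyReply` and both phrase lists are illustrative assumptions, not an exhaustive grammar and not part of the SDK.

```javascript
// Sketch: classify a transcribed reply as approve, reject, or unclear.
// The phrase lists are illustrative assumptions, not an exhaustive grammar.
const APPROVE_PHRASES = ["yes", "yeah", "go ahead", "do it", "approve", "confirm"];
const REJECT_PHRASES = ["no", "don't", "do not", "cancel", "stop", "wait"];

function classifyReply(transcript) {
  // Lowercase, drop punctuation, and pad with spaces so phrases match whole words.
  const cleaned = transcript.toLowerCase().replace(/[^a-z' ]/g, " ");
  const words = ` ${cleaned.replace(/\s+/g, " ").trim()} `;
  const hits = (phrases) => phrases.some((p) => words.includes(` ${p} `));
  // Check rejections first so "no, don't do it" is never read as approval.
  if (hits(REJECT_PHRASES)) return "reject";
  if (hits(APPROVE_PHRASES)) return "approve";
  return "unclear"; // ask again rather than guess
}

console.log(classifyReply("Yes, go ahead"));   // prints "approve"
console.log(classifyReply("No, don't do it")); // prints "reject"
```

Checking rejection phrases before approval phrases biases ambiguous replies toward not executing, which is the safer failure mode for a critical action.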
Performance Impact
How much latency does pause/resume add?
- Pause trigger: < 50ms (detection is fast)
- State serialization: < 100ms (small state objects)
- Human decision time: Variable (1-10 seconds)
- Resume execution: < 50ms (restore is fast)
The SDK overhead is negligible: pause, serialization, and resume together stay under 200ms. The real delay is human decision time, which is the point.
When NOT To Use Human-In-The-Loop
Approval gates aren’t always needed:
- Read-only operations: No need to pause for searches or data lookups
- Low-risk actions: Scheduling a meeting doesn’t need approval
- Already confirmed: If the user explicitly said “Transfer $500 to savings account”, naming the amount and the destination, you already have approval
- Time-sensitive: Emergency actions shouldn’t wait for approval
Use judgment. The goal is safety, not friction.
Measuring Success
Track these metrics:
- Approval rate: What % of paused actions get approved? (Target: > 90%)
- Rejection reasons: Why do users reject? (Improve agent accuracy)
- Approval time: How long do users take to decide? (Target: < 5 seconds)
- False positives: How often do we pause unnecessarily? (Target: < 10%)
If approval rate is low, your agent is suggesting wrong actions. If approval time is high, your approval UX is confusing.
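Two of these metrics can be computed directly from decision records. The sketch below assumes each record carries a `status` and a `decision_seconds` field; both field names and the `approvalMetrics` helper are hypothetical.

```javascript
// Sketch: compute approval rate and median decision time (hypothetical record shape).
function approvalMetrics(records) {
  const approved = records.filter((r) => r.status === "approved");
  const times = records.map((r) => r.decision_seconds).sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  // Median: middle value, or the mean of the two middle values for even counts.
  const median = times.length === 0 ? null
    : times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
  return {
    approvalRate: records.length ? approved.length / records.length : null,
    medianDecisionSeconds: median
  };
}

const sample = [
  { status: "approved", decision_seconds: 2 },
  { status: "approved", decision_seconds: 4 },
  { status: "rejected", decision_seconds: 9 },
  { status: "approved", decision_seconds: 3 }
];
console.log(approvalMetrics(sample));
// prints: { approvalRate: 0.75, medianDecisionSeconds: 3.5 }
```

A median is used for decision time because a single user who walks away for a minute would otherwise dominate an average.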
Conclusion
Human-in-the-loop isn’t optional for production voice agents handling critical actions. The Agents SDK makes it straightforward:
- Automatic pause on critical actions
- Perfect state preservation
- Clean resume after approval
- Audit trails built in
Voice makes approval feel natural. Users don’t mind saying “yes” when they can hear exactly what’s about to happen.
Result: Voice agents that are both autonomous (fast) and safe (supervised).
Implementation Guide:
- Mark critical tools with requiresApproval: true
- Handle approval_required events
- Show clear approval context to users
- Support both voice and UI confirmation
- Log all approval decisions for audit
The SDK handles pause/resume state management automatically.
Next: Explore how the Agents SDK runtime manages agent lifecycle, state persistence, and tool execution coordination automatically.