Human-In-The-Loop For Voice Agents

ZH+
Architecture, Safety
December 18, 2025

Critical actions shouldn’t execute blindly. When a voice agent is about to spend money, delete data, or commit irreversible changes, there needs to be a human in the loop.

The problem is that most agent systems either run fully autonomously (risky) or require developers to hand-code pause/resume logic (tedious). The OpenAI Agents SDK provides a better way: built-in human-in-the-loop with automatic resumability.

Why Voice Agents Need Approval Gates

When users talk to voice agents, conversations flow naturally. But that natural flow can lead to actions that need oversight:

“Pay that $500 invoice to Acme Corp”
“Delete all customer records from 2023”
“Transfer $10,000 to savings account”
“Cancel the recurring subscription”

These aren’t theoretical edge cases. They’re real scenarios where blind execution would be catastrophic.

Traditional approaches fail here:

Fully autonomous: Agent executes immediately → No safety net
Manual checkpoints: Developer writes custom pause logic → Brittle, hard to maintain
Confirmation questions: Agent asks “Are you sure?” → The user already said yes and gets annoyed

The Agents SDK solves this with pausable workflows. The agent automatically pauses before critical actions, requests human approval, and resumes from the exact same state after confirmation.

How Human-In-The-Loop Works

Here’s the architecture:

graph TD
    A[User Request] --> B[Agent Processes Intent]
    B --> C{Critical Action?}
    C -->|No| D[Execute Tool]
    C -->|Yes| E[Pause & Request Approval]
    E --> F{Human Decision}
    F -->|Approved| G[Resume & Execute]
    F -->|Rejected| H[Cancel & Explain]
    G --> I[Return Result]
    H --> I

The runtime handles the entire pause/resume cycle. Your agent code stays clean:

const agent = new Agent({
  name: "PaymentAgent",
  instructions: "You help users manage invoices and payments.",
  tools: [
    {
      name: "process_payment",
      description: "Process a payment to a vendor",
      requiresApproval: true, // This triggers pause
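      parameters: {
        type: "object",
        properties: {
          vendor: { type: "string" },
          amount: { type: "number" },
          invoice_id: { type: "string" }
        },
        // Without "required", the model may omit arguments
        required: ["vendor", "amount", "invoice_id"]
      },
      execute: async (args) => {
        // This only runs after human approval.
        // paymentAPI stands in for your own payment client.
        return await paymentAPI.process(args);
      }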
    }
  ]
});

// The SDK handles pause/resume automatically
agent.on('approval_required', async (context) => {
  // Present approval UI to human
  const approved = await showApprovalDialog({
    action: context.tool_name,
    params: context.tool_args,
    reasoning: context.agent_reasoning
  });
  
  if (approved) {
    context.approve(); // Agent resumes
  } else {
    context.reject("User declined payment"); // Agent handles cancellation
  }
});
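The flow in the diagram can also be modeled without any SDK as a small state machine, which makes the two decision points explicit. This is a minimal sketch, not the Agents SDK's internal implementation; the ToolCall and Outcome shapes, the isCritical predicate, and the executor are hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes for illustration; these are not Agents SDK types.
type ToolCall = { id: string; tool: string; args: Record<string, unknown> };

type Outcome =
  | { kind: "executed"; callId: string; result: string }
  | { kind: "paused"; callId: string } // waiting for a human decision
  | { kind: "cancelled"; callId: string; reason: string };

type Executor = (call: ToolCall) => string;

// "Critical Action?" node: run safe calls immediately, pause critical ones.
function route(call: ToolCall, isCritical: (c: ToolCall) => boolean, execute: Executor): Outcome {
  if (!isCritical(call)) {
    return { kind: "executed", callId: call.id, result: execute(call) };
  }
  return { kind: "paused", callId: call.id };
}

// "Human Decision" node: apply the decision to a paused call.
function decide(call: ToolCall, approved: boolean, execute: Executor, reason = "Rejected by reviewer"): Outcome {
  return approved
    ? { kind: "executed", callId: call.id, result: execute(call) }
    : { kind: "cancelled", callId: call.id, reason };
}

// Usage: payments are critical, invoice lookups are not.
const isCritical = (c: ToolCall) => c.tool === "process_payment";
const fakeExecute: Executor = (c) => `${c.tool} ok`;

const lookup = { id: "call_1", tool: "find_invoice", args: { vendor: "Acme Corp" } };
const payment = { id: "call_2", tool: "process_payment", args: { amount: 500 } };

const first = route(lookup, isCritical, fakeExecute);   // executed immediately
const second = route(payment, isCritical, fakeExecute); // paused
const final = decide(payment, true, fakeExecute);       // executed after approval
```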

Real-World Example: Payment Processing

Let’s walk through a complete payment flow:

User: “Process that invoice from Acme Corp”

Agent (internal reasoning):

Searches for invoices from Acme Corp
Finds unpaid invoice: $500.00, ID: INV-1234
Prepares to call process_payment tool
SDK detects requiresApproval: true
Agent automatically pauses

Agent (to user): “I found invoice INV-1234 for $500 to Acme Corp. This will process the payment immediately. Should I proceed?”

User: “Yes, go ahead”

SDK (internal):

Captures approval
Resumes agent from exact state
Agent executes process_payment
Returns result

Agent (to user): “Payment processed. Transaction ID: TXN-5678.”

The entire pause/resume cycle happens transparently. The agent doesn’t need custom code to handle approval state.
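One detail worth making explicit: the approval should apply to the exact call the user heard described, not to whatever the agent does after resuming. A minimal sketch of binding an approval to a call ID plus a fingerprint of its arguments; PendingApproval, fingerprint, and canExecute are hypothetical names, and the fingerprint assumes flat argument objects.

```typescript
// Hypothetical types; they make the "resume the exact same call" idea explicit.
type Args = Record<string, string | number>;

interface PendingApproval {
  callId: string;
  tool: string;
  fingerprint: string; // canonical form of the args the user was told about
}

// Canonical JSON for flat argument objects: sorted keys, so key order doesn't matter.
function fingerprint(args: Args): string {
  return JSON.stringify(args, Object.keys(args).sort());
}

function pause(callId: string, tool: string, args: Args): PendingApproval {
  return { callId, tool, fingerprint: fingerprint(args) };
}

// On resume, execute only if this is the same call, with the same args, that was approved.
function canExecute(p: PendingApproval, callId: string, tool: string, args: Args): boolean {
  return p.callId === callId && p.tool === tool && p.fingerprint === fingerprint(args);
}

const pending = pause("call_42", "process_payment", {
  vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234",
});

// Same call, keys in a different order: allowed.
const ok = canExecute(pending, "call_42", "process_payment", {
  invoice_id: "INV-1234", amount: 500, vendor: "Acme Corp",
});

// Amount changed after approval: refused.
const tampered = canExecute(pending, "call_42", "process_payment", {
  vendor: "Acme Corp", amount: 5000, invoice_id: "INV-1234",
});
```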

State Preservation During Pause

One of the hardest problems with human-in-the-loop is state management. When you pause mid-conversation, you need to preserve:

Conversation history (what was said before)
Agent reasoning (why it wants to act)
Tool arguments (what it plans to execute)
User context (who they are, what they can approve)

The Agents SDK runtime handles all of this:

// When paused, state is automatically serialized
const pausedState = {
  conversation_id: "conv_abc123",
  turn_number: 15,
  agent_state: {
    pending_tool: "process_payment",
    tool_args: {
      vendor: "Acme Corp",
      amount: 500,
      invoice_id: "INV-1234"
    },
    reasoning: "User requested payment for outstanding invoice"
  },
  transcript: [/* full conversation */],
  user_context: {/* permissions, identity */}
};

// When approved, state is restored exactly
// Agent picks up where it left off

You don’t write any of this persistence code. The runtime manages it.
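To make the mechanics concrete, here is roughly what a round trip through storage involves when an approval has to outlive the current process (for example, a manager approves an hour later). A minimal sketch, assuming the pausedState shape shown above and an in-memory Map standing in for a database; savePaused and loadPaused are hypothetical, and the SDK's own serialization format may differ.

```typescript
// Mirrors the pausedState example above; field names are illustrative.
interface PausedState {
  conversation_id: string;
  turn_number: number;
  agent_state: {
    pending_tool: string;
    tool_args: Record<string, string | number>;
    reasoning: string;
  };
}

// In-memory stand-in for a durable store (database, Redis, etc.).
const store = new Map<string, string>();

function savePaused(state: PausedState): void {
  store.set(state.conversation_id, JSON.stringify(state));
}

// Restore and sanity-check; a missing or malformed record must never resume.
function loadPaused(conversationId: string): PausedState | undefined {
  const raw = store.get(conversationId);
  if (raw === undefined) return undefined;
  const parsed = JSON.parse(raw) as PausedState;
  if (typeof parsed.agent_state?.pending_tool !== "string") return undefined;
  return parsed;
}

savePaused({
  conversation_id: "conv_abc123",
  turn_number: 15,
  agent_state: {
    pending_tool: "process_payment",
    tool_args: { vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" },
    reasoning: "User requested payment for outstanding invoice",
  },
});

const restored = loadPaused("conv_abc123");
```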

Multiple Approval Levels

Some organizations need tiered approval: low-value actions go through, high-value actions need manager approval.

const agent = new Agent({
  tools: [
    {
      name: "process_payment",
      requiresApproval: (args) => {
        // Conditional approval based on amount
        if (args.amount > 1000) {
          return {
            level: "manager",
            reason: "Payment exceeds $1000 limit"
          };
        }
        if (args.amount > 100) {
          return {
            level: "user",
            reason: "Payment confirmation required"
          };
        }
        return false; // Auto-approve small payments
      },
      execute: async (args) => {
        return await paymentAPI.process(args);
      }
    }
  ]
});

The SDK supports approval routing:

$100 or less: Auto-approved, no human check
Over $100 up to $1,000: User confirmation required
Over $1,000: Manager approval required
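Because the tiers hinge on strict greater-than comparisons, exactly $100 is auto-approved and exactly $1,000 needs only user confirmation. Pulling the policy into a pure function makes those boundaries easy to test. A minimal sketch mirroring the thresholds in the requiresApproval callback above; ApprovalLevel and approvalLevel are hypothetical names, and amounts are in integer cents to avoid floating-point surprises.

```typescript
// Illustrative type; mirrors the requiresApproval callback above.
type ApprovalLevel = "none" | "user" | "manager";

// Integer cents avoid floating-point edge cases at the thresholds.
const USER_THRESHOLD_CENTS = 10_000;     // $100.00
const MANAGER_THRESHOLD_CENTS = 100_000; // $1,000.00

function approvalLevel(amountCents: number): ApprovalLevel {
  if (amountCents > MANAGER_THRESHOLD_CENTS) return "manager";
  if (amountCents > USER_THRESHOLD_CENTS) return "user";
  return "none"; // $100.00 or less is auto-approved
}

// Boundary cases: $50, $100.00, $100.01, $1,000.00, $1,000.01
const levels = [5_000, 10_000, 10_001, 100_000, 100_001].map(approvalLevel);
```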

Audit Trails

Every paused action creates an audit record:

{
  "event_id": "evt_abc123",
  "timestamp": "2025-03-04T10:30:00Z",
  "agent": "PaymentAgent",
  "action": "process_payment",
  "args": {
    "vendor": "Acme Corp",
    "amount": 500,
    "invoice_id": "INV-1234"
  },
  "requested_by": "user_jane",
  "approved_by": "user_jane",
  "approved_at": "2025-03-04T10:30:15Z",
  "status": "approved",
  "result": {
    "transaction_id": "TXN-5678",
    "status": "completed"
  }
}

This creates compliance-ready logs showing:

What was requested
Who requested it
Who approved it
When approval happened
What the result was
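A record like the one above can be assembled from the pending request and the human decision. A minimal sketch, assuming hypothetical ApprovalRequest and Decision types that follow the JSON field names; rejected_by and rejection_reason are assumed extra fields for the rejection case, and the result field would be appended once the tool returns. Note that toISOString() includes milliseconds (2025-03-04T10:30:15.000Z).

```typescript
// Field names follow the audit JSON above; rejected_by / rejection_reason are assumptions.
interface ApprovalRequest {
  eventId: string;
  agent: string;
  action: string;
  args: Record<string, string | number>;
  requestedBy: string;
  requestedAt: Date;
}

interface Decision {
  approved: boolean;
  decidedBy: string;
  decidedAt: Date;
  reason?: string; // filled in on rejection
}

interface AuditRecord {
  event_id: string;
  timestamp: string;
  agent: string;
  action: string;
  args: Record<string, string | number>;
  requested_by: string;
  status: "approved" | "rejected";
  approved_by?: string;
  approved_at?: string;
  rejected_by?: string;
  rejection_reason?: string;
}

function toAuditRecord(req: ApprovalRequest, d: Decision): AuditRecord {
  const base = {
    event_id: req.eventId,
    timestamp: req.requestedAt.toISOString(),
    agent: req.agent,
    action: req.action,
    args: req.args,
    requested_by: req.requestedBy,
  };
  return d.approved
    ? { ...base, status: "approved", approved_by: d.decidedBy, approved_at: d.decidedAt.toISOString() }
    : { ...base, status: "rejected", rejected_by: d.decidedBy, rejection_reason: d.reason ?? "unspecified" };
}

const request: ApprovalRequest = {
  eventId: "evt_abc123",
  agent: "PaymentAgent",
  action: "process_payment",
  args: { vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" },
  requestedBy: "user_jane",
  requestedAt: new Date("2025-03-04T10:30:00Z"),
};

const record = toAuditRecord(request, {
  approved: true,
  decidedBy: "user_jane",
  decidedAt: new Date("2025-03-04T10:30:15Z"),
});
```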

Voice Makes Approval Natural

Human-in-the-loop works especially well with voice agents because the approval is just another turn in a real-time conversation. Text-based approval flows feel clunky by comparison:

Text Agent:

Agent: I will now process payment INV-1234 for $500 to Acme Corp.
Agent: Please approve by clicking the button below.
[Approve] [Reject]

Voice Agent:

Agent: "I found invoice INV-1234 for $500 to Acme Corp. 
        Should I process this payment?"
User: "Yes"
Agent: "Done. Transaction ID is TXN-5678."

The voice version takes a single spoken turn instead of a read-then-click step, and it feels like a real conversation with a trusted assistant.
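The spoken side still has to map free-form replies onto a decision, and anything ambiguous should count as "not approved yet" rather than "yes". A minimal keyword-based sketch, assuming you already have the user's reply as a transcript string; classifyReply and the phrase lists are hypothetical, and a production system might ask the model or a dedicated classifier instead.

```typescript
type VoiceDecision = "approve" | "reject" | "unclear";

const YES = ["yes", "yeah", "yep", "sure", "go ahead", "do it", "approve", "confirm"];
// "not sure" is listed so that "I'm not sure" conflicts with "sure" and becomes unclear.
const NO = ["no", "nope", "don't", "do not", "cancel", "stop", "wait", "hold on", "not sure"];

// Whole-word/phrase match, so "know" or "not" does not count as "no".
function containsPhrase(text: string, phrase: string): boolean {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`).test(text);
}

function classifyReply(transcript: string): VoiceDecision {
  const text = transcript.toLowerCase().replace(/’/g, "'");
  const saysNo = NO.some((p) => containsPhrase(text, p));
  const saysYes = YES.some((p) => containsPhrase(text, p));
  if (saysNo) return saysYes ? "unclear" : "reject"; // "yes... no, wait" is ambiguous
  return saysYes ? "approve" : "unclear";            // silence or questions never approve
}
```

Returning "unclear" lets the agent re-ask the question instead of guessing.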

Error Recovery

What happens if approval is rejected?

context.reject("User declined payment");

The agent receives the rejection and responds naturally:

Agent: "No problem, I won't process that payment. 
        Would you like me to mark the invoice for review instead?"

The SDK ensures the agent knows why it was rejected and can offer alternatives.
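One common way to let the agent see why it was rejected is to return the rejection in place of the tool result, so the model reads it on its next turn and can offer alternatives. A minimal sketch, assuming a hypothetical tool-result message format; how the Agents SDK encodes a rejection internally is not shown here.

```typescript
// Hypothetical tool-result message the model sees on its next turn.
interface ToolResultMessage {
  role: "tool";
  callId: string;
  content: string;
}

type HumanDecision =
  | { approved: true; result: string }
  | { approved: false; reason: string };

// Approved: pass the real result through. Rejected: explain, and steer away from retrying.
function toToolResult(callId: string, tool: string, d: HumanDecision): ToolResultMessage {
  const content = d.approved
    ? d.result
    : `The user rejected ${tool}. Reason: ${d.reason}. ` +
      "Do not retry the same action; acknowledge the decision and offer an alternative.";
  return { role: "tool", callId, content };
}

const msg = toToolResult("call_42", "process_payment", {
  approved: false,
  reason: "User declined payment",
});
```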

Best Practices

1. Make approval context clear

Bad:

Agent: "Should I proceed?"

Good:

Agent: "This will charge your card $500. Approve?"
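The good version can be generated from the tool arguments instead of left to the model's phrasing, so every confirmation names the action, the amount, and the counterparty. A minimal sketch, assuming the process_payment argument shape from earlier; approvalPrompt and its wording are illustrative.

```typescript
interface PaymentArgs {
  vendor: string;
  amount: number; // dollars
  invoice_id: string;
}

const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });

// State what will happen, to whom, and for how much, then ask a yes/no question.
function approvalPrompt(args: PaymentArgs): string {
  return (
    `This will immediately pay ${usd.format(args.amount)} to ${args.vendor} ` +
    `for invoice ${args.invoice_id}. Should I proceed?`
  );
}

const spoken = approvalPrompt({ vendor: "Acme Corp", amount: 500, invoice_id: "INV-1234" });
// "This will immediately pay $500.00 to Acme Corp for invoice INV-1234. Should I proceed?"
```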

2. Set appropriate thresholds

Don’t require approval for trivial actions:

Searching data ❌ No approval needed
Reading information ❌ No approval needed
Sending $10,000 ✅ Approval required
Deleting customer data ✅ Approval required

3. Explain reasoning

When pausing, tell the user why:

Agent: "This invoice is larger than usual ($5,000 vs your average $500). 
        Want me to double-check the details before paying?"
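A check like the one in that prompt can be computed from payment history before the agent speaks. A minimal sketch, assuming you have the user's past payment amounts and treating anything more than 3x the average as unusual; checkUnusual, reasoningLine, and the multiplier are illustrative choices, not recommendations.

```typescript
interface UnusualCheck {
  unusual: boolean;
  average: number;
}

// Flag payments far above this user's historical average.
function checkUnusual(amount: number, past: number[], multiplier = 3): UnusualCheck {
  if (past.length === 0) return { unusual: false, average: 0 }; // no baseline yet
  const average = past.reduce((sum, x) => sum + x, 0) / past.length;
  return { unusual: amount > multiplier * average, average };
}

// Build the spoken explanation only when there is something to explain.
function reasoningLine(amount: number, check: UnusualCheck): string | undefined {
  if (!check.unusual) return undefined;
  const fmt = (n: number) => `$${Math.round(n).toLocaleString("en-US")}`;
  return (
    `This invoice is larger than usual (${fmt(amount)} vs your average ${fmt(check.average)}). ` +
    "Want me to double-check the details before paying?"
  );
}

const pastPayments = [450, 500, 550];
const check = checkUnusual(5000, pastPayments);
const line = reasoningLine(5000, check);
// "This invoice is larger than usual ($5,000 vs your average $500). Want me to double-check the details before paying?"
```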

4. Support voice and UI approvals

Some users will approve verbally (“Yes, do it”). Others will click a button. Support both:

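agent.on('approval_required', async (context) => {
  // Show the approval button and listen for a spoken answer at the same time;
  // whichever channel answers first decides (each resolves to true or false)
  const approved = await Promise.race([
    showApprovalButton(),
    listenForVoiceApproval()
  ]);

  if (approved) {
    context.approve();
  } else {
    context.reject("User declined");
  }
});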
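Promise.race settles with the first decision but does not stop the other channel, so the button can stay on screen or the microphone keep listening after the user has answered. A minimal sketch of a race that tells every channel to stop once a decision arrives; CancelToken, ApprovalChannel, and firstDecision are hypothetical names, and real code might use AbortController instead.

```typescript
// Minimal cancellation token (a stand-in for AbortController).
class CancelToken {
  cancelled = false;
  private handlers: Array<() => void> = [];
  onCancel(fn: () => void): void {
    this.handlers.push(fn);
  }
  cancel(): void {
    if (this.cancelled) return;
    this.cancelled = true;
    for (const fn of this.handlers) fn();
  }
}

// Each channel resolves true (approve) or false (reject) and cleans up when cancelled.
type ApprovalChannel = (token: CancelToken) => Promise<boolean>;

// First channel to answer wins; afterwards every channel is told to stop.
async function firstDecision(channels: ApprovalChannel[]): Promise<boolean> {
  const token = new CancelToken();
  try {
    return await Promise.race(channels.map((ch) => ch(token)));
  } finally {
    token.cancel(); // hide the button, stop listening
  }
}

// Usage with fake channels: voice answers at once, the button is never clicked.
const log: string[] = [];
const voice: ApprovalChannel = async (token) => {
  token.onCancel(() => log.push("voice stopped"));
  return true;
};
const button: ApprovalChannel = (token) =>
  new Promise<boolean>((resolve) => {
    token.onCancel(() => {
      log.push("button hidden");
      resolve(false); // settle so nothing is left pending
    });
  });
```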

Performance Impact

How much latency does pause/resume add?

Pause trigger: < 50ms (detection is fast)
State serialization: < 100ms (small state objects)
Human decision time: Variable (1-10 seconds)
Resume execution: < 50ms (restore is fast)

The SDK overhead is negligible by comparison. The real delay is human decision time, which is the point.
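These figures will vary with your runtime and state size, so it is worth measuring them in your own deployment. A minimal sketch that times each phase separately so SDK overhead and human decision time aren't lumped together; timePhase, overheadMs, and the phase names are hypothetical, the demo uses placeholder work, and Date.now() has only millisecond resolution.

```typescript
type Phase = "pause" | "serialize" | "human" | "resume";
type Timings = Partial<Record<Phase, number>>;

const timings: Timings = {};

// Runs fn, stores its wall-clock duration (ms) under the phase name, returns its result.
async function timePhase<T>(phase: Phase, fn: () => T | Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings[phase] = Date.now() - start;
  }
}

// Overhead is everything except the human decision, which should dominate.
function overheadMs(t: Timings): number {
  return (t.pause ?? 0) + (t.serialize ?? 0) + (t.resume ?? 0);
}

// Placeholder work; swap in your real storage, approval, and resume calls.
async function demo(): Promise<void> {
  const saved = await timePhase("serialize", () =>
    JSON.stringify({ pending_tool: "process_payment", amount: 500 })
  );
  await timePhase("human", () => Promise.resolve(true)); // stand-in for waiting on the user
  await timePhase("resume", () => JSON.parse(saved));
  console.log(timings, "overhead ms:", overheadMs(timings));
}
```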

When NOT To Use Human-In-The-Loop

Approval gates aren’t always needed:

Read-only operations: No need to pause for searches or data lookups
Low-risk actions: Scheduling a meeting doesn’t need approval
Already confirmed: If the user said “Transfer $500 to savings account”, the request itself is the approval
Time-sensitive: Emergency actions shouldn’t wait for approval

Use judgment. The goal is safety, not friction.
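These rules can be encoded as a small policy that runs before any amount-based tiers. A minimal sketch, assuming hypothetical per-call metadata (readOnly, risk, explicitlyRequested, emergency) that you would attach to your own tools; this is not an Agents SDK feature.

```typescript
// Hypothetical per-call metadata; you would attach this to your own tools.
interface ActionInfo {
  readOnly: boolean;
  risk: "low" | "high";
  explicitlyRequested: boolean; // the user stated this exact action, amount, and target
  emergency: boolean;
}

interface PauseDecision {
  pause: boolean;
  why: string;
}

// Mirrors the list above, checked in order; anything not excused pauses for approval.
function shouldPause(a: ActionInfo): PauseDecision {
  if (a.readOnly) return { pause: false, why: "read-only operation" };
  if (a.risk === "low") return { pause: false, why: "low-risk action" };
  if (a.explicitlyRequested) return { pause: false, why: "user already stated the exact action" };
  if (a.emergency) return { pause: false, why: "time-sensitive; review after the fact" };
  return { pause: true, why: "high-risk action inferred by the agent" };
}

const searchInvoices = shouldPause({ readOnly: true, risk: "low", explicitlyRequested: false, emergency: false });
const inferredPayment = shouldPause({ readOnly: false, risk: "high", explicitlyRequested: false, emergency: false });
const statedTransfer = shouldPause({ readOnly: false, risk: "high", explicitlyRequested: true, emergency: false });
```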

Measuring Success

Track these metrics:

Approval rate: What % of paused actions get approved? (Target: > 90%)
Rejection reasons: Why do users reject? (Improve agent accuracy)
Approval time: How long do users take to decide? (Target: < 5 seconds)
False positives: How often do we pause unnecessarily? (Target: < 10%)

If approval rate is low, your agent is suggesting wrong actions. If approval time is high, your approval UX is confusing.
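These metrics fall out of the approval records directly. A minimal sketch, assuming each record carries a status, request and decision timestamps, an optional rejection reason, and a hypothetical unnecessary flag set during later review; ApprovalEvent and summarize are illustrative names, and the targets above are not computed here.

```typescript
interface ApprovalEvent {
  status: "approved" | "rejected";
  requestedAt: number; // epoch ms
  decidedAt: number;   // epoch ms
  rejectionReason?: string;
  unnecessary?: boolean; // set during review: this pause added no value
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function summarize(events: ApprovalEvent[]) {
  const n = events.length;
  const approved = events.filter((e) => e.status === "approved").length;
  // Count rejection reasons so recurring ones can feed back into the agent's instructions.
  const reasons = new Map<string, number>();
  for (const e of events) {
    if (e.status === "rejected") {
      const r = e.rejectionReason ?? "unspecified";
      reasons.set(r, (reasons.get(r) ?? 0) + 1);
    }
  }
  return {
    approvalRate: n ? approved / n : 0,
    medianDecisionMs: n ? median(events.map((e) => e.decidedAt - e.requestedAt)) : 0,
    falsePositiveRate: n ? events.filter((e) => e.unnecessary).length / n : 0,
    rejectionReasons: reasons,
  };
}

const sample: ApprovalEvent[] = [
  { status: "approved", requestedAt: 0, decidedAt: 2000 },
  { status: "approved", requestedAt: 0, decidedAt: 3000, unnecessary: true },
  { status: "approved", requestedAt: 0, decidedAt: 4000 },
  { status: "rejected", requestedAt: 0, decidedAt: 9000, rejectionReason: "wrong vendor" },
];
const stats = summarize(sample);
```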

Conclusion

Human-in-the-loop isn’t optional for production voice agents handling critical actions. The Agents SDK makes it straightforward:

Automatic pause on critical actions
Full state preservation across the pause
Clean resume after approval
Audit trails built in

Voice makes approval feel natural. Users don’t mind saying “yes” when they can hear exactly what’s about to happen.

Result: Voice agents that are both autonomous (fast) and safe (supervised).

Implementation Guide:

Mark critical tools with requiresApproval: true
Handle approval_required events
Show clear approval context to users
Support both voice and UI confirmation
Log all approval decisions for audit

The SDK handles pause/resume state management automatically.

Next: Explore how the Agents SDK runtime manages agent lifecycle, state persistence, and tool execution coordination automatically.