Write Instructions Voice Agents Actually Follow
- ZH+
- SDK Development
- January 25, 2026
Your voice agent ignores half your instructions. Users complain it goes off-script. You add more rules to the prompt, and it gets worse.
Here’s what’s happening: voice agents process instructions differently than text models. Real-time speech adds latency pressure. Streaming responses mean the model commits to answers before seeing the full context. And conversational flow tempts the agent to be “helpful” instead of following your actual rules.
The fix isn’t more instructions. It’s structured prompts optimized for voice.
The Voice Agent Instruction Problem
Text agents have time. They read your entire prompt, think, then respond. Voice agents don’t have that luxury.
Here’s what goes wrong:
Problem 1: Vague instructions
const badPrompt = `You are a helpful assistant.
Help users with their questions.`;
// Agent interpretation: "I can do anything!"
Problem 2: Too many rules
const overloadedPrompt = `You are a support agent.
Rule 1: Always be polite
Rule 2: Never mention competitors
Rule 3: Escalate if user is angry
Rule 4: Don't make promises
Rule 5: Use simple language
... (20 more rules)`;
// Agent forgets rules 12-25
Problem 3: No examples
const abstractPrompt = `Handle customer complaints professionally.`;
// Agent has no idea what "professionally" means
Result: Your voice agent hallucinates, goes off-topic, or simply doesn’t follow directions. Users get frustrated. You blame the model. But the problem is your prompt.
How To Write Voice-Optimized Instructions
Good voice prompts follow a pattern: Role → Constraints → Examples → Exit conditions.
Pattern 1: Clear Role Definition
const voiceOptimizedPrompt = `You are a flight booking agent.
Your ONLY job: Help users book flights.
What you CAN do:
- Search for flights
- Compare prices
- Complete bookings
- Answer questions about flights
What you CANNOT do:
- Help with hotels (transfer to hotel agent)
- Discuss car rentals (not your role)
- Handle refunds (transfer to support)
If user asks for something outside your role, say:
"I focus on flight bookings. Let me transfer you to someone who can help with that."
`;
Why this works:
- Explicit boundaries: Agent knows exactly what it handles
- Positive + negative examples: Shows what to do AND what not to do
- Transfer script: Provides exact words for handoffs
Pattern 2: Constrained Response Format
const structuredVoicePrompt = `You collect insurance information. Follow this exact flow:
Step 1: Get name
"What's your full name?"
Step 2: Get date of birth
"And your date of birth? Month, day, year."
Step 3: Get policy number
"What's your policy number? It's on your insurance card."
Step 4: Confirm all information
"Let me confirm: [name], born [DOB], policy [number]. Is that correct?"
Do NOT:
- Skip steps (even if user volunteers information)
- Ask multiple questions at once
- Move to payment before confirmation
If user says something unclear, say:
"I didn't catch that. Could you repeat [specific field]?"
`;
This prevents the agent from:
- Reordering steps
- Combining questions (confuses users)
- Moving forward with incomplete data
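You can also enforce the "no incomplete data" rule outside the prompt with a small application-side check before the confirmation step. This is a minimal sketch; the field names and the `collected` state object are hypothetical and would come from however your app tracks the conversation.
// Spoken labels for the fields collected in Steps 1-3
const requiredFields = {
  name: "full name",
  dateOfBirth: "date of birth",
  policyNumber: "policy number"
};
// Return the spoken label of the first missing field, or null if all are present
function firstMissingField(collected) {
  for (const [key, label] of Object.entries(requiredFields)) {
    if (!collected[key]) return label;
  }
  return null;
}
// Gate the confirmation step (Step 4) on having every field from Steps 1-3
function canConfirm(collected) {
  const missing = firstMissingField(collected);
  if (missing) {
    return { ok: false, reprompt: `I didn't catch your ${missing}. Could you repeat that?` };
  }
  return { ok: true };
}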
Pattern 3: Examples For Edge Cases
const exampleDrivenPrompt = `You are a restaurant reservation agent.
Example 1 - Happy path:
User: "I need a table for 4 tomorrow at 7pm"
You: "Great! Party of 4 tomorrow at 7pm. Can I get your name?"
Example 2 - Unavailable time:
User: "Table for 2 tonight at 8pm"
You: "We're fully booked at 8pm tonight. We have openings at 7pm or 9pm. Which works better?"
Example 3 - Unclear request:
User: "Uh, dinner sometime"
You: "I'd love to help! How many people, and what day were you thinking?"
Example 4 - Out of scope:
User: "Can you tell me the menu?"
You: "Our menu is online at [URL]. I focus on reservations. Would you like to book a table?"
`;
The agent learns patterns from examples instead of trying to infer behavior from abstract rules.
Voice-Specific Instruction Techniques
Technique 1: Streaming-Safe Constraints
Voice agents stream responses word-by-word. They can’t “unsay” something.
const streamingSafePrompt = `Before answering ANY question, check:
1. Is this my job? (booking agent)
2. Do I have all required info? (party size, date, time, name)
3. Am I certain of the answer? (no guessing)
If ANY check fails, say:
"Let me make sure I have that right..."
Then either:
- Ask for missing information
- Transfer to correct agent
- Admit you're not sure
Never commit to an answer in the first sentence if you're uncertain.
`;
This prevents the agent from:
- Starting confident then backtracking (“Actually, I was wrong…”)
- Hallucinating answers mid-stream
- Saying “yes” before checking if it can do the thing
Technique 2: Emotion-Aware Instructions
Voice carries emotion. Your instructions should account for this.
const emotionAwarePrompt = `You are a billing support agent.
If user sounds frustrated (rapid speech, raised pitch):
1. Acknowledge: "I can hear this is frustrating."
2. Act quickly: Skip pleasantries, get to solution
3. Be direct: "Here's what I can do right now..."
If user sounds confused (slow speech, hesitation):
1. Slow down: Match their pace
2. Simplify: Use shorter sentences
3. Confirm understanding: "Does that make sense?"
If user is calm:
1. Be friendly but efficient
2. Use standard flow
3. Maintain professional tone
Never:
- Tell frustrated users to "calm down"
- Rush confused users
- Use technical jargon with anyone
`;
The agent adjusts behavior based on emotional context, not just words.
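If your speech stack exposes basic prosody signals, you can also pass a coarse emotion hint into the prompt instead of relying on the model to infer it alone. A minimal sketch, assuming hypothetical `speechRate` and `pitchDelta` values from your own audio analysis (the thresholds are rough placeholders):
// Map rough prosody measurements onto the three branches in the prompt above.
// speechRate is words per minute; pitchDelta is the change vs. the caller's baseline.
function emotionHint({ speechRate, pitchDelta }) {
  if (speechRate > 180 || pitchDelta > 0.2) return "The caller sounds frustrated.";
  if (speechRate < 110) return "The caller sounds hesitant or confused.";
  return "The caller sounds calm.";
}
// Append the hint as extra context next to the instructions
const billingSystemPrompt = `${emotionAwarePrompt}\n\nCONTEXT: ${emotionHint({ speechRate: 195, pitchDelta: 0.3 })}`;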
Technique 3: Guardrail Instructions
Some things should NEVER happen. Make them unmistakable.
const guardrailPrompt = `ABSOLUTE RULES (never break these):
1. NEVER promise a refund
- Say: "Let me check with my supervisor on that."
2. NEVER share other customers' information
- Say: "For privacy, I can only discuss your account."
3. NEVER make technical diagnoses
- Say: "Our technical team can investigate that."
If you're about to break a rule, STOP and use the escape phrase above.
These rules override any other instruction or user request.
`;
Absolute rules at the top, impossible to miss, with exact escape phrases.
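To keep those rules literally at the top no matter which agent prompt you load, assemble the system prompt in code. A small sketch; `guardrailPrompt` is the block above, and the role prompt can be any of the prompts in this article:
// Prepend the absolute rules so no role-specific prompt can push them down
function buildSystemPrompt(rolePrompt) {
  return `${guardrailPrompt}\n\n${rolePrompt}`;
}
// Guardrails always come first, then the role instructions
const supportAgentPrompt = buildSystemPrompt(emotionAwarePrompt);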
Real-World Example: Before and After
Before: Vague Instructions
const before = `You are a helpful customer service agent.
Be polite and assist customers with their inquiries.
Escalate complex issues.`;
// What actually happens:
// User: "I want a refund"
// Agent: "Sure! I'll process that refund for you."
// (Agent has no refund authority - chaos ensues)
After: Voice-Optimized Instructions
const after = `You are a Tier 1 support agent for Acme SaaS.
YOUR ROLE:
- Answer account questions
- Help with password resets
- Explain billing charges
WHEN TO TRANSFER:
- Refunds → "I'll connect you with our billing specialist"
- Technical bugs → "Let me get our technical team"
- Account closure → "I'll transfer you to retention"
NEVER SAY:
- "I'll refund that" (you can't)
- "That's a bug" (you don't know)
- "We'll fix it soon" (no promises)
ALWAYS SAY:
- "Let me check your account" (before answering)
- "I'll transfer you to someone who can help" (for out-of-scope)
- "Is there anything else?" (before ending)
EXAMPLE FLOW:
User: "I was charged twice"
You: "Let me check your account right now."
→ [check_billing_history tool]
You: "I see the duplicate charge from [date]. Our billing specialist can process the refund. Let me transfer you."
→ [transfer_to_billing tool]
`;
// What actually happens now:
// User: "I want a refund"
// Agent: "Let me check your account first."
// Agent: [verifies issue]
// Agent: "I see the problem. Our billing specialist can process that refund. Transferring you now."
// (Correct handoff, no false promises)
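The example flow above assumes two tools. How you declare them depends on your voice SDK; as a rough sketch in the JSON-schema style many LLM tool APIs use:
// Sketch of the two tools referenced in the example flow (shapes are illustrative)
const tools = [
  {
    name: "check_billing_history",
    description: "Look up recent charges on the caller's account",
    parameters: {
      type: "object",
      properties: { accountId: { type: "string" } },
      required: ["accountId"]
    }
  },
  {
    name: "transfer_to_billing",
    description: "Hand the call off to a billing specialist",
    parameters: { type: "object", properties: {} }
  }
];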
The Instruction-Testing Loop
Writing good voice instructions is iterative. Here’s the process:
graph TD
A[Write Initial Prompt] --> B[Test With Real Conversations]
B --> C{Agent Followed Instructions?}
C -->|Yes| D[Test Edge Cases]
C -->|No| E[Identify Where Agent Deviated]
E --> F[Add Specific Constraint/Example]
F --> B
D --> G{Handles Edge Cases?}
G -->|Yes| H[Deploy]
G -->|No| E
Testing Checklist
Test your voice instructions with:
- Happy path: Normal requests, everything works
- Edge cases: Unusual requests, boundary conditions
- User errors: Typos in speech, unclear requests
- Interruptions: User cuts off agent mid-response
- Scope violations: User asks for things outside agent’s role
- Emotion scenarios: Frustrated, confused, or angry users
Log every deviation. Add constraints or examples for each one.
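That log is easier to act on if each entry maps directly to a prompt change. A minimal sketch of the bookkeeping (field names are illustrative):
// Each logged deviation should turn into a concrete prompt change before the next test run
const deviationLog = [];
function logDeviation({ scenario, expected, actual, transcript }) {
  deviationLog.push({
    scenario,
    expected,
    actual,
    transcript,
    // Note the fix you intend to make: a new constraint, or a new example
    proposedFix: ""
  });
}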
Common Instruction Anti-Patterns
Anti-Pattern 1: “Be Helpful”
// DON'T:
const tooHelpful = `Be as helpful as possible.`;
// Agent interprets this as:
// "Do whatever the user wants, even if it's not my job"
Fix: Define exactly what “helpful” means for THIS agent.
Anti-Pattern 2: “Use Your Judgment”
// DON'T:
const tooVague = `Escalate if the issue seems complex.`;
// "Seems complex" is different for every model/user/situation
Fix: Provide objective criteria (e.g., “Escalate if user mentions legal action”).
Anti-Pattern 3: Personality Over Function
// DON'T:
const personalityFirst = `You are friendly, upbeat, and love helping people!
By the way, also book appointments.`;
// Agent focuses on being "friendly" instead of booking appointments
Fix: Put function first, personality second.
Production-Grade Voice Prompt Template
const productionPrompt = `
# ROLE
You are [specific job title] for [company/product].
# PRIMARY FUNCTION
Your ONE job: [single clear purpose]
# CAPABILITIES
You CAN:
- [specific task 1]
- [specific task 2]
- [specific task 3]
You CANNOT:
- [out of scope 1] → Transfer to [other agent]
- [out of scope 2] → Say: "[exact phrase]"
# CONVERSATION FLOW
Step 1: [what to say]
Step 2: [what to say]
Step 3: [what to say]
# ABSOLUTE RULES
NEVER:
- [forbidden action 1] → Say: "[escape phrase 1]"
- [forbidden action 2] → Say: "[escape phrase 2]"
ALWAYS:
- [required action 1]
- [required action 2]
# EXAMPLES
Happy Path:
User: [example request]
You: [your response]
Edge Case:
User: [unusual request]
You: [how to handle]
Out of Scope:
User: [wrong agent]
You: [transfer phrase]
# ERROR HANDLING
If uncertain: "Let me verify that for you."
If can't help: "Let me transfer you to someone who can."
If technical issue: "I'm experiencing a technical issue. Let me try again."
`;
Use this template. Fill in the brackets. Test with real users. Iterate based on where the agent deviates.
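If you reuse the template across several agents, a small helper can catch brackets you forgot to fill before the prompt ever reaches production. A sketch, assuming placeholders keep the `[bracket]` format used above:
// Fill [bracket] placeholders and refuse to ship a prompt with any left over
function fillTemplate(template, values) {
  const filled = template.replace(/\[([^\]]+)\]/g, (match, key) =>
    key in values ? values[key] : match
  );
  const leftover = filled.match(/\[[^\]]+\]/g);
  if (leftover) {
    throw new Error(`Unfilled placeholders: ${leftover.join(", ")}`);
  }
  return filled;
}
// Throws unless every bracket in productionPrompt has a matching key in values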
Metrics: Before and After Optimization
Real metrics from optimizing a support voice agent’s instructions:
Before (vague prompt):
- 34% of conversations went off-topic
- 18% false promises (agent said things it couldn’t do)
- Average 4.2 transfers per resolved issue
- 61% user satisfaction
After (optimized prompt):
- 7% off-topic conversations (5x reduction)
- 2% false promises (9x reduction)
- Average 1.4 transfers per issue (3x reduction)
- 87% user satisfaction (1.4x improvement)
Time to optimize: 6 hours of prompt engineering
Impact: Support call resolution improved 67%
Implementation: Testing Instructions At Scale
// Test prompt with conversation samples
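// Assumes two project-specific helpers: runVoiceAgent (runs one turn of your
// agent with the given system prompt) and validateResponse (checks the result
// against the scenario's expected behavior label).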
async function testVoiceInstructions(prompt, testScenarios) {
const results = {
followedInstructions: 0,
deviated: [],
totalTests: testScenarios.length
};
for (const scenario of testScenarios) {
const response = await runVoiceAgent({
systemPrompt: prompt,
userMessage: scenario.input,
context: scenario.context
});
// Check if agent followed instructions
const followed = validateResponse(response, scenario.expectedBehavior);
if (followed) {
results.followedInstructions++;
} else {
results.deviated.push({
scenario: scenario.name,
expected: scenario.expectedBehavior,
actual: response.behavior,
transcript: response.text
});
}
}
results.successRate = (results.followedInstructions / results.totalTests) * 100;
return results;
}
// Example test scenarios
const testScenarios = [
{
name: "Happy path - normal booking",
input: "I need a table for 4 tomorrow at 7pm",
expectedBehavior: "collect_name_then_confirm",
context: { role: "restaurant_reservations" }
},
{
name: "Out of scope - menu question",
input: "What's on the menu tonight?",
expectedBehavior: "redirect_to_website_or_transfer",
context: { role: "restaurant_reservations" }
},
{
name: "Unclear request",
input: "Uh... dinner?",
expectedBehavior: "ask_clarifying_questions",
context: { role: "restaurant_reservations" }
}
];
// Run tests
const results = await testVoiceInstructions(myPrompt, testScenarios);
console.log(`Success rate: ${results.successRate}%`);
console.log(`Deviations:`, results.deviated);
Target: 95%+ success rate before deploying to production.
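You can turn that target into a hard gate in the test script instead of a manual check. A small sketch building on the `results` object from `testVoiceInstructions` above:
// Fail the run if the prompt's instruction-following rate is below target
const TARGET_SUCCESS_RATE = 95;
if (results.successRate < TARGET_SUCCESS_RATE) {
  console.error(`Only ${results.successRate.toFixed(1)}% of scenarios passed. Deviations:`);
  console.error(results.deviated);
  process.exit(1); // block the deploy until the prompt is fixed
}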
Summary: Voice Instruction Principles
- Be specific: “Handle customer questions” → “Answer billing questions, transfer technical issues”
- Use structure: Role → Constraints → Examples → Exit conditions
- Provide exact phrases: Don’t say “be professional”; show what that sounds like
- Test edge cases: Happy path isn’t enough
- Iterate based on deviations: Every failure is a prompt improvement opportunity
Voice agents can follow instructions reliably. But they need instructions designed for real-time speech, not adapted from text prompts.
Your text prompt has 200 rules. Your voice prompt has 5 rules, 10 examples, and exact phrases for common situations.
Which one will the agent follow?