Write Instructions Voice Agents Actually Follow
- ZH+
- SDK Development
- January 25, 2026
Your voice agent ignores half your instructions. Users complain it goes off-script. You add more rules to the prompt, and it gets worse.
Here’s what’s happening: voice agents process instructions differently than text models. Real-time speech adds latency pressure. Streaming responses mean the model commits to answers before seeing the full context. And conversational flow tempts the agent to be “helpful” instead of following your actual rules.
The fix isn’t more instructions. It’s structured prompts optimized for voice.
The Voice Agent Instruction Problem
Text agents have time. They read your entire prompt, think, then respond. Voice agents don’t have that luxury.
Here’s what goes wrong:
Problem 1: Vague instructions
const badPrompt = `You are a helpful assistant.
Help users with their questions.`;
// Agent interpretation: "I can do anything!"
Problem 2: Too many rules
const overloadedPrompt = `You are a support agent.
Rule 1: Always be polite
Rule 2: Never mention competitors
Rule 3: Escalate if user is angry
Rule 4: Don't make promises
Rule 5: Use simple language
... (20 more rules)`;
// Agent forgets rules 12-25
Problem 3: No examples
const abstractPrompt = `Handle customer complaints professionally.`;
// Agent has no idea what "professionally" means
Result: Your voice agent hallucinates, goes off-topic, or simply doesn’t follow directions. Users get frustrated. You blame the model. But the problem is your prompt.
How To Write Voice-Optimized Instructions
Good voice prompts follow a pattern: Role → Constraints → Examples → Exit conditions.
Pattern 1: Clear Role Definition
const voiceOptimizedPrompt = `You are a flight booking agent.
Your ONLY job: Help users book flights.
What you CAN do:
- Search for flights
- Compare prices
- Complete bookings
- Answer questions about flights
What you CANNOT do:
- Help with hotels (transfer to hotel agent)
- Discuss car rentals (not your role)
- Handle refunds (transfer to support)
If user asks for something outside your role, say:
"I focus on flight bookings. Let me transfer you to someone who can help with that."
`;
Why this works:
- Explicit boundaries: Agent knows exactly what it handles
- Positive + negative examples: Shows what to do AND what not to do
- Transfer script: Provides exact words for handoffs
Pattern 2: Constrained Response Format
const structuredVoicePrompt = `You collect insurance information. Follow this exact flow:
Step 1: Get name
"What's your full name?"
Step 2: Get date of birth
"And your date of birth? Month, day, year."
Step 3: Get policy number
"What's your policy number? It's on your insurance card."
Step 4: Confirm all information
"Let me confirm: [name], born [DOB], policy [number]. Is that correct?"
Do NOT:
- Skip steps (even if user volunteers information)
- Ask multiple questions at once
- Move to payment before confirmation
If user says something unclear, say:
"I didn't catch that. Could you repeat [specific field]?"
`;
This prevents the agent from:
- Reordering steps
- Combining questions (confuses users)
- Moving forward with incomplete data
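You can also enforce the "no incomplete data" rule outside the prompt with a small application-side check before the confirmation step. This is a minimal sketch; the field names and the `collected` state object are hypothetical and would come from however your app tracks the conversation.
// Spoken labels for the fields collected in Steps 1-3
const requiredFields = {
  name: "full name",
  dateOfBirth: "date of birth",
  policyNumber: "policy number"
};
// Return the spoken label of the first missing field, or null if all are present
function firstMissingField(collected) {
  for (const [key, label] of Object.entries(requiredFields)) {
    if (!collected[key]) return label;
  }
  return null;
}
// Gate the confirmation step (Step 4) on having every field from Steps 1-3
function canConfirm(collected) {
  const missing = firstMissingField(collected);
  if (missing) {
    return { ok: false, reprompt: `I didn't catch your ${missing}. Could you repeat that?` };
  }
  return { ok: true };
}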
Pattern 3: Examples For Edge Cases
const exampleDrivenPrompt = `You are a restaurant reservation agent.
Example 1 - Happy path:
User: "I need a table for 4 tomorrow at 7pm"
You: "Great! Party of 4 tomorrow at 7pm. Can I get your name?"
Example 2 - Unavailable time:
User: "Table for 2 tonight at 8pm"
You: "We're fully booked at 8pm tonight. We have openings at 7pm or 9pm. Which works better?"
Example 3 - Unclear request:
User: "Uh, dinner sometime"
You: "I'd love to help! How many people, and what day were you thinking?"
Example 4 - Out of scope:
User: "Can you tell me the menu?"
You: "Our menu is online at [URL]. I focus on reservations. Would you like to book a table?"
`;
The agent learns patterns from examples instead of trying to infer behavior from abstract rules.
Voice-Specific Instruction Techniques
Technique 1: Streaming-Safe Constraints
Voice agents stream responses word-by-word. They can’t “unsay” something.
const streamingSafePrompt = `Before answering ANY question, check:
1. Is this my job? (booking agent)
2. Do I have all required info? (party size, date, time, name)
3. Am I certain of the answer? (no guessing)
If ANY check fails, say:
"Let me make sure I have that right..."
Then either:
- Ask for missing information
- Transfer to correct agent
- Admit you're not sure
Never commit to an answer in the first sentence if you're uncertain.
`;
This prevents the agent from:
- Starting confident then backtracking (“Actually, I was wrong…”)
- Hallucinating answers mid-stream
- Saying “yes” before checking if it can do the thing
Technique 2: Emotion-Aware Instructions
Voice carries emotion. Your instructions should account for this.
const emotionAwarePrompt = `You are a billing support agent.
If user sounds frustrated (rapid speech, raised pitch):
1. Acknowledge: "I can hear this is frustrating."
2. Act quickly: Skip pleasantries, get to solution
3. Be direct: "Here's what I can do right now..."
If user sounds confused (slow speech, hesitation):
1. Slow down: Match their pace
2. Simplify: Use shorter sentences
3. Confirm understanding: "Does that make sense?"
If user is calm:
1. Be friendly but efficient
2. Use standard flow
3. Maintain professional tone
Never:
- Tell frustrated users to "calm down"
- Rush confused users
- Use technical jargon with anyone
`;
The agent adjusts behavior based on emotional context, not just words.
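If your speech stack exposes basic prosody signals, you can also pass a coarse emotion hint into the prompt instead of relying on the model to infer it alone. A minimal sketch, assuming hypothetical `speechRate` and `pitchDelta` values from your own audio analysis (the thresholds are rough placeholders):
// Map rough prosody measurements onto the three branches in the prompt above.
// speechRate is words per minute; pitchDelta is the change vs. the caller's baseline.
function emotionHint({ speechRate, pitchDelta }) {
  if (speechRate > 180 || pitchDelta > 0.2) return "The caller sounds frustrated.";
  if (speechRate < 110) return "The caller sounds hesitant or confused.";
  return "The caller sounds calm.";
}
// Append the hint as extra context next to the instructions
const billingSystemPrompt = `${emotionAwarePrompt}\n\nCONTEXT: ${emotionHint({ speechRate: 195, pitchDelta: 0.3 })}`;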
Technique 3: Guardrail Instructions
Some things should NEVER happen. Make them unmistakable.
const guardrailPrompt = `ABSOLUTE RULES (never break these):
1. NEVER promise a refund
- Say: "Let me check with my supervisor on that."
2. NEVER share other customers' information
- Say: "For privacy, I can only discuss your account."
3. NEVER make technical diagnoses
- Say: "Our technical team can investigate that."
If you're about to break a rule, STOP and use the escape phrase above.
These rules override any other instruction or user request.
`;
Absolute rules at the top, impossible to miss, with exact escape phrases.
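To keep those rules literally at the top no matter which agent prompt you load, assemble the system prompt in code. A small sketch; `guardrailPrompt` is the block above, and the role prompt can be any of the prompts in this article:
// Prepend the absolute rules so no role-specific prompt can push them down
function buildSystemPrompt(rolePrompt) {
  return `${guardrailPrompt}\n\n${rolePrompt}`;
}
// Guardrails always come first, then the role instructions
const supportAgentPrompt = buildSystemPrompt(emotionAwarePrompt);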
Real-World Example: Before and After
Before: Vague Instructions
const before = `You are a helpful customer service agent.
Be polite and assist customers with their inquiries.
Escalate complex issues.`;
// What actually happens:
// User: "I want a refund"
// Agent: "Sure! I'll process that refund for you."
// (Agent has no refund authority - chaos ensues)
After: Voice-Optimized Instructions
const after = `You are a Tier 1 support agent for Acme SaaS.
YOUR ROLE:
- Answer account questions
- Help with password resets
- Explain billing charges
WHEN TO TRANSFER:
- Refunds → "I'll connect you with our billing specialist"
- Technical bugs → "Let me get our technical team"
- Account closure → "I'll transfer you to retention"
NEVER SAY:
- "I'll refund that" (you can't)
- "That's a bug" (you don't know)
- "We'll fix it soon" (no promises)
ALWAYS SAY:
- "Let me check your account" (before answering)
- "I'll transfer you to someone who can help" (for out-of-scope)
- "Is there anything else?" (before ending)
EXAMPLE FLOW:
User: "I was charged twice"
You: "Let me check your account right now."
→ [check_billing_history tool]
You: "I see the duplicate charge from [date]. Our billing specialist can process the refund. Let me transfer you."
→ [transfer_to_billing tool]
`;
// What actually happens now:
// User: "I want a refund"
// Agent: "Let me check your account first."
// Agent: [verifies issue]
// Agent: "I see the problem. Our billing specialist can process that refund. Transferring you now."
// (Correct handoff, no false promises)
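The example flow above assumes two tools. How you declare them depends on your voice SDK; as a rough sketch in the JSON-schema style many LLM tool APIs use:
// Sketch of the two tools referenced in the example flow (shapes are illustrative)
const tools = [
  {
    name: "check_billing_history",
    description: "Look up recent charges on the caller's account",
    parameters: {
      type: "object",
      properties: { accountId: { type: "string" } },
      required: ["accountId"]
    }
  },
  {
    name: "transfer_to_billing",
    description: "Hand the call off to a billing specialist",
    parameters: { type: "object", properties: {} }
  }
];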
The Instruction-Testing Loop
Writing good voice instructions is iterative. Here’s the process:
graph TD
A[Write Initial Prompt] --> B[Test With Real Conversations]
B --> C{Agent Followed Instructions?}
C -->|Yes| D[Test Edge Cases]
C -->|No| E[Identify Where Agent Deviated]
E --> F[Add Specific Constraint/Example]
F --> B
D --> G{Handles Edge Cases?}
G -->|Yes| H[Deploy]
G -->|No| E
Testing Checklist
Test your voice instructions with:
- Happy path: Normal requests, everything works
- Edge cases: Unusual requests, boundary conditions
- User errors: Typos in speech, unclear requests
- Interruptions: User cuts off agent mid-response
- Scope violations: User asks for things outside agent’s role
- Emotion scenarios: Frustrated, confused, or angry users
Log every deviation. Add constraints or examples for each one.
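That log is easier to act on if each entry maps directly to a prompt change. A minimal sketch of the bookkeeping (field names are illustrative):
// Each logged deviation should turn into a concrete prompt change before the next test run
const deviationLog = [];
function logDeviation({ scenario, expected, actual, transcript }) {
  deviationLog.push({
    scenario,
    expected,
    actual,
    transcript,
    // Note the fix you intend to make: a new constraint, or a new example
    proposedFix: ""
  });
}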
Common Instruction Anti-Patterns
Anti-Pattern 1: “Be Helpful”
// DON'T:
const tooHelpful = `Be as helpful as possible.`;
// Agent interprets this as:
// "Do whatever the user wants, even if it's not my job"
Fix: Define exactly what “helpful” means for THIS agent.
Anti-Pattern 2: “Use Your Judgment”
// DON'T:
const tooVague = `Escalate if the issue seems complex.`;
// "Seems complex" is different for every model/user/situation
Fix: Provide objective criteria (e.g., “Escalate if user mentions legal action”).
Anti-Pattern 3: Personality Over Function
// DON'T:
const personalityFirst = `You are friendly, upbeat, and love helping people!
By the way, also book appointments.`;
// Agent focuses on being "friendly" instead of booking appointments
Fix: Put function first, personality second.
Production-Grade Voice Prompt Template
const productionPrompt = `
# ROLE
You are [specific job title] for [company/product].
# PRIMARY FUNCTION
Your ONE job: [single clear purpose]
# CAPABILITIES
You CAN:
- [specific task 1]
- [specific task 2]
- [specific task 3]
You CANNOT:
- [out of scope 1] → Transfer to [other agent]
- [out of scope 2] → Say: "[exact phrase]"
# CONVERSATION FLOW
Step 1: [what to say]
Step 2: [what to say]
Step 3: [what to say]
# ABSOLUTE RULES
NEVER:
- [forbidden action 1] → Say: "[escape phrase 1]"
- [forbidden action 2] → Say: "[escape phrase 2]"
ALWAYS:
- [required action 1]
- [required action 2]
# EXAMPLES
Happy Path:
User: [example request]
You: [your response]
Edge Case:
User: [unusual request]
You: [how to handle]
Out of Scope:
User: [wrong agent]
You: [transfer phrase]
# ERROR HANDLING
If uncertain: "Let me verify that for you."
If can't help: "Let me transfer you to someone who can."
If technical issue: "I'm experiencing a technical issue. Let me try again."
`;
Use this template. Fill in the brackets. Test with real users. Iterate based on where the agent deviates.
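If you reuse the template across several agents, a small helper can catch brackets you forgot to fill before the prompt ever reaches production. A sketch, assuming placeholders keep the `[bracket]` format used above:
// Fill [bracket] placeholders and refuse to ship a prompt with any left over
function fillTemplate(template, values) {
  const filled = template.replace(/\[([^\]]+)\]/g, (match, key) =>
    key in values ? values[key] : match
  );
  const leftover = filled.match(/\[[^\]]+\]/g);
  if (leftover) {
    throw new Error(`Unfilled placeholders: ${leftover.join(", ")}`);
  }
  return filled;
}
// Throws unless every bracket in productionPrompt has a matching key in values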
Metrics: Before and After Optimization
Real metrics from optimizing a support voice agent’s instructions:
Before (vague prompt):
- 34% of conversations went off-topic
- 18% false promises (agent said things it couldn’t do)
- Average 4.2 transfers per resolved issue
- 61% user satisfaction
After (optimized prompt):
- 7% off-topic conversations (5x reduction)
- 2% false promises (9x reduction)
- Average 1.4 transfers per issue (3x reduction)
- 87% user satisfaction (1.4x improvement)
Time to optimize: 6 hours of prompt engineering
Impact: Support call resolution improved 67%
Implementation: Testing Instructions At Scale
// Test prompt with conversation samples
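// Assumes two project-specific helpers: runVoiceAgent (runs one turn of your
// agent with the given system prompt) and validateResponse (checks the result
// against the scenario's expected behavior label).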
async function testVoiceInstructions(prompt, testScenarios) {
const results = {
followedInstructions: 0,
deviated: [],
totalTests: testScenarios.length
};
for (const scenario of testScenarios) {
const response = await runVoiceAgent({
systemPrompt: prompt,
userMessage: scenario.input,
context: scenario.context
});
// Check if agent followed instructions
const followed = validateResponse(response, scenario.expectedBehavior);
if (followed) {
results.followedInstructions++;
} else {
results.deviated.push({
scenario: scenario.name,
expected: scenario.expectedBehavior,
actual: response.behavior,
transcript: response.text
});
}
}
results.successRate = (results.followedInstructions / results.totalTests) * 100;
return results;
}
// Example test scenarios
const testScenarios = [
{
name: "Happy path - normal booking",
input: "I need a table for 4 tomorrow at 7pm",
expectedBehavior: "collect_name_then_confirm",
context: { role: "restaurant_reservations" }
},
{
name: "Out of scope - menu question",
input: "What's on the menu tonight?",
expectedBehavior: "redirect_to_website_or_transfer",
context: { role: "restaurant_reservations" }
},
{
name: "Unclear request",
input: "Uh... dinner?",
expectedBehavior: "ask_clarifying_questions",
context: { role: "restaurant_reservations" }
}
];
// Run tests
const results = await testVoiceInstructions(myPrompt, testScenarios);
console.log(`Success rate: ${results.successRate}%`);
console.log(`Deviations:`, results.deviated);
Target: 95%+ success rate before deploying to production.
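You can turn that target into a hard gate in the test script instead of a manual check. A small sketch building on the `results` object from `testVoiceInstructions` above:
// Fail the run if the prompt's instruction-following rate is below target
const TARGET_SUCCESS_RATE = 95;
if (results.successRate < TARGET_SUCCESS_RATE) {
  console.error(`Only ${results.successRate.toFixed(1)}% of scenarios passed. Deviations:`);
  console.error(results.deviated);
  process.exit(1); // block the deploy until the prompt is fixed
}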
Summary: Voice Instruction Principles
- Be specific: “Handle customer questions” → “Answer billing questions, transfer technical issues”
- Use structure: Role → Constraints → Examples → Exit conditions
- Provide exact phrases: Don’t say “be professional”; show what that sounds like
- Test edge cases: Happy path isn’t enough
- Iterate based on deviations: Every failure is a prompt improvement opportunity
Voice agents can follow instructions reliably. But they need instructions designed for real-time speech, not adapted from text prompts.
Your text prompt has 200 rules. Your voice prompt has 5 rules, 10 examples, and exact phrases for common situations.
Which one will the agent follow?