Design Tools For Voice, Not Text
- ZH+
- Tool design
- January 6, 2026
Your voice agent makes 8 tool calls to book a flight. Eight.
Search availability. Filter by price. Sort by duration. Paginate results. Select option. Add to cart. Check out. Confirm.
The user spoke once. The agent spoke eight times. The conversation took 4 minutes.
The problem: Your tools were designed for text agents. Voice agents need different abstractions.
Why Text-Agent Tools Break Voice Conversations
Text agents can afford granularity:
Agent: I found 47 flights. Let me sort by price...
[tool_call: sort_flights("price")]
Okay, sorted. Now filtering for direct flights...
[tool_call: filter_flights("direct")]
Great. Showing top 5 results...
[tool_call: paginate(page=1, size=5)]
Users tolerate this in chat. They don’t tolerate it in voice.
In voice:
- Each tool call can add 1-2 seconds of latency
- Users hear the agent “thinking” between each call
- Multi-step workflows feel sluggish
- Conversations become transactional, not conversational
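To see how quickly granular calls add up, here is a back-of-the-envelope model. It is a sketch, not a measurement: `estimateDeadAir` is a hypothetical helper, and the default 1.5 s model round trip plus 0.5 s of tool time per call is an assumption chosen to land at 2 seconds per call, the upper end of the range above.

```javascript
// Rough model: each sequential tool call costs one model round trip plus the
// tool's own execution time, and the user hears only filler meanwhile.
// The per-call defaults are assumptions; replace them with your own measurements.
function estimateDeadAir(toolCalls, { roundTripMs = 1500, toolMs = 500 } = {}) {
  const perCallMs = roundTripMs + toolMs;
  return {
    totalMs: toolCalls * perCallMs,
    interruptions: toolCalls // assume one "let me..." update per call
  };
}

const lowLevel = estimateDeadAir(4);  // search, filter, sort, paginate
const highLevel = estimateDeadAir(1); // one workflow-level tool
console.log(lowLevel.totalMs, highLevel.totalMs); // 8000 2000
```

Because the calls are sequential, the wait grows linearly with the number of calls; cutting the call count is the only lever that shrinks it proportionally.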
The insight: Voice agents need tools that match how humans think—not how databases work.
The Difference: Low-Level vs High-Level Tools
Low-Level Tools (Designed For Text)
```javascript
// Low-level tools: each one maps to a single database or list operation.
// `db`, `meetsCriteria`, and `compare` stand in for your own data layer.

// Tool 1: Search
async function searchFlights(origin, destination, date) {
  return await db.flights.find({ origin, destination, date });
}

// Tool 2: Filter
async function filterFlights(results, criteria) {
  return results.filter(f => meetsCriteria(f, criteria));
}

// Tool 3: Sort (copy first so the caller's array is not mutated in place)
async function sortFlights(results, sortBy) {
  return [...results].sort((a, b) => compare(a, b, sortBy));
}

// Tool 4: Paginate (page is zero-based)
async function paginateFlights(results, page, size) {
  return results.slice(page * size, (page + 1) * size);
}

// Tool 5: Select
async function selectFlight(flightId) {
  return await db.flights.findById(flightId);
}
```
Voice agent behavior:
User: "Find me a flight to Chicago tomorrow"
Agent: "Searching flights..."
[tool_call: searchFlights]
"Found 47 options. Let me filter for morning departures..."
[tool_call: filterFlights]
"Okay, 12 morning flights. Sorting by price..."
[tool_call: sortFlights]
"Got it. Here are the top 3..."
[tool_call: paginateFlights]
Total: 4 tool calls, 8 seconds, and 4 progress announcements ("Searching...", "Let me filter...") before the user hears a single option
High-Level Tools (Designed For Voice)
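```javascript
// Single tool: find the best match
// `db`, `matchesPreferences`, `rankByRelevance`, and `explainRanking` are your own helpers.
async function findBestFlight(criteria) {
  // Encapsulates: search, filter, sort, rank, select
  const flights = await db.flights.find({
    origin: criteria.origin,
    destination: criteria.destination,
    date: criteria.date
  });
  const filtered = flights.filter(f =>
    matchesPreferences(f, criteria.preferences)
  );
  const ranked = rankByRelevance(filtered, criteria);
  // Handle the empty case here so the agent always has something to say
  if (ranked.length === 0) {
    return { best_match: null, alternatives: [], why_best: 'No flights matched those preferences.' };
  }
  return {
    best_match: ranked[0],
    alternatives: ranked.slice(1, 3),
    why_best: explainRanking(ranked[0], criteria)
  };
}
```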
Voice agent behavior:
User: "Find me a flight to Chicago tomorrow"
Agent: "Looking for morning flights to Chicago..."
[tool_call: findBestFlight]
"I found a United flight at 8:15 AM for $220.
It's direct and arrives by 10:30. Want this one?"
Total: 1 tool call, 2 seconds, natural conversation
Time saved: 75%. Conversation quality: dramatically better.
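The helpers in `findBestFlight` are left abstract above. To see the whole workflow run inside one call, here is a self-contained sketch over an in-memory array instead of `db.flights`. The flight records, the `departsBy` and `directOnly` preference fields, and the price-plus-$100-per-stop score in `rankByRelevance` are illustrative assumptions, not a recommended ranking.

```javascript
// Hypothetical in-memory data standing in for db.flights
const FLIGHTS = [
  { id: 'UA101', airline: 'United', origin: 'SFO', destination: 'ORD', date: '2026-01-07', departs: '08:15', arrives: '10:30', price: 220, stops: 0 },
  { id: 'AA202', airline: 'American', origin: 'SFO', destination: 'ORD', date: '2026-01-07', departs: '07:00', arrives: '11:45', price: 180, stops: 1 },
  { id: 'DL303', airline: 'Delta', origin: 'SFO', destination: 'ORD', date: '2026-01-07', departs: '13:10', arrives: '15:20', price: 205, stops: 0 }
];

// Keep flights that satisfy hard preferences ("HH:MM" strings compare correctly)
function matchesPreferences(f, prefs = {}) {
  if (prefs.directOnly && f.stops > 0) return false;
  if (prefs.departsBy && f.departs > prefs.departsBy) return false;
  return true;
}

// Cheaper is better; each stop costs the equivalent of $100 (arbitrary weight)
function rankByRelevance(flights) {
  const score = (f) => f.price + f.stops * 100;
  return [...flights].sort((a, b) => score(a) - score(b));
}

// One ready-to-speak sentence fragment explaining the pick
function explainRanking(f) {
  const stops = f.stops === 0 ? 'direct' : `${f.stops} stop(s)`;
  return `${f.airline} at ${f.departs} for $${f.price}, ${stops}, arrives ${f.arrives}`;
}

// Search, filter, rank, and select all happen inside this one tool call
async function findBestFlight(criteria) {
  const flights = FLIGHTS.filter(f =>
    f.origin === criteria.origin && f.destination === criteria.destination && f.date === criteria.date);
  const filtered = flights.filter(f => matchesPreferences(f, criteria.preferences));
  const ranked = rankByRelevance(filtered);
  if (ranked.length === 0) {
    return { best_match: null, alternatives: [], why_best: 'No matching flights.' };
  }
  return { best_match: ranked[0], alternatives: ranked.slice(1, 3), why_best: explainRanking(ranked[0]) };
}

findBestFlight({ origin: 'SFO', destination: 'ORD', date: '2026-01-07', preferences: { departsBy: '12:00' } })
  .then(r => console.log(r.why_best)); // United at 08:15 for $220, direct, arrives 10:30
```

With a noon cutoff, the sketch drops the afternoon Delta flight, returns the United direct flight as `best_match`, and keeps the one-stop American flight as the only alternative: one sentence to say and one choice to offer.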
Architecture: Voice-First Tool Design
Here’s how to structure tools for voice agents:
```mermaid
graph TB
    A[User Intent] --> B{Tool Design}
    B -->|Low-Level| C[Multiple Tool Calls]
    C --> D[search]
    C --> E[filter]
    C --> F[sort]
    C --> G[paginate]
    C --> H[select]
    D --> I[Agent Speaks Between Each]
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J[8 seconds, 4 interruptions]
    B -->|High-Level| K[Single Tool Call]
    K --> L[findBestMatch]
    L --> M[Internal: search → filter → sort → rank]
    M --> N[Agent Speaks Once]
    N --> O[2 seconds, 1 turn]
    J --> P[User Experience: Slow]
    O --> Q[User Experience: Fast]
    style A fill:#e1f5ff
    style K fill:#d4f4dd
    style L fill:#d4f4dd
    style C fill:#ffe1e1
    style J fill:#ffe1e1
    style Q fill:#d4f4dd
```
The pattern: Encapsulate workflows, not database operations.
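One way to apply this pattern without discarding existing code is to keep the low-level functions as internal steps and expose only a composed tool to the model. `composeTool` below is a hypothetical helper, not a library API: it runs the steps in order, passing each one the previous step's output plus the original arguments, so intermediate results never round-trip through the model.

```javascript
// Chain internal steps into one tool. Each step receives (previousResult, args).
function composeTool(...steps) {
  return async (args) => {
    let result;
    for (const step of steps) {
      result = await step(result, args);
    }
    return result;
  };
}

// Example: the low-level flight operations as internal steps (in-memory data)
const flights = [
  { id: 'A', price: 300, direct: true },
  { id: 'B', price: 150, direct: false },
  { id: 'C', price: 200, direct: true },
  { id: 'D', price: 250, direct: true }
];

const findFlights = composeTool(
  async () => flights,                                                  // search
  async (rows, args) => rows.filter(f => !args.directOnly || f.direct), // filter
  async (rows) => [...rows].sort((a, b) => a.price - b.price),          // sort by price
  async (rows, args) => rows.slice(0, args.limit ?? 3)                  // keep the top results
);

findFlights({ directOnly: true, limit: 2 })
  .then(r => console.log(r.map(f => f.id))); // [ 'C', 'D' ]
```

The model only ever sees `findFlights`; the step boundaries remain useful for testing and reuse on the server side.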
Implementation: Voice-Optimized Tools
Here’s how to refactor tools for the OpenAI Realtime API:
Before: Text-Agent Tools
```javascript
const tools = [
  {
    type: 'function',
    name: 'search_products',
    description: 'Search product catalog',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string' },
        category: { type: 'string' }
      }
    }
  },
  {
    type: 'function',
    name: 'filter_products',
    description: 'Filter product list by criteria',
    parameters: {
      type: 'object',
      properties: {
        // Array schemas need an `items` definition
        products: { type: 'array', items: { type: 'object' } },
        max_price: { type: 'number' },
        min_rating: { type: 'number' }
      }
    }
  },
  {
    type: 'function',
    name: 'sort_products',
    description: 'Sort product list',
    parameters: {
      type: 'object',
      properties: {
        products: { type: 'array', items: { type: 'object' } },
        sort_by: { type: 'string', enum: ['price', 'rating', 'popularity'] }
      }
    }
  }
];
// Agent makes 3+ tool calls for a simple request
```
After: Voice-Agent Tools
```javascript
const tools = [
  {
    type: 'function',
    name: 'find_product_recommendation',
    description: `Find the best product match for user needs.
      Handles search, filtering, sorting, and ranking internally.
      Returns: best match + 2 alternatives + explanation.`,
    parameters: {
      type: 'object',
      properties: {
        user_need: {
          type: 'string',
          description: 'What the user is looking for, in their own words'
        },
        constraints: {
          type: 'object',
          properties: {
            max_price: { type: 'number' },
            category: { type: 'string' },
            required_features: { type: 'array', items: { type: 'string' } }
          }
        },
        preferences: {
          type: 'object',
          properties: {
            prioritize: {
              type: 'string',
              enum: ['price', 'quality', 'speed', 'popularity'],
              description: 'What matters most to the user'
            }
          }
        }
      },
      required: ['user_need']
    }
  }
];
// Agent makes 1 tool call, gets complete answer
```
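The schema above nests `max_price`, `category`, and `required_features` under `constraints` and `prioritize` under `preferences`, while the implementation that follows registers a flat parameter list. If you keep the nested schema, a small adapter can flatten the model's arguments before they reach the handler. This is a sketch: `normalizeRecommendationArgs` is a hypothetical name, and the `'quality'` default mirrors the fallback used in the implementation's ranking step.

```javascript
// Accept either the nested schema or a flat argument object (or the raw JSON
// string the model produced) and return one flat shape for the handler.
function normalizeRecommendationArgs(raw) {
  const args = typeof raw === 'string' ? JSON.parse(raw) : (raw ?? {});
  const constraints = args.constraints ?? {};
  const preferences = args.preferences ?? {};
  return {
    user_need: args.user_need,
    max_price: constraints.max_price ?? args.max_price,
    category: constraints.category ?? args.category,
    required_features: constraints.required_features ?? args.required_features ?? [],
    prioritize: preferences.prioritize ?? args.prioritize ?? 'quality'
  };
}

const nested = normalizeRecommendationArgs(
  '{"user_need":"laptop for video editing","constraints":{"max_price":2000},"preferences":{"prioritize":"price"}}'
);
console.log(nested.max_price, nested.prioritize); // 2000 price
```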
Implementation
```javascript
import { RealtimeClient } from '@openai/realtime-api-beta';

// Assumes `db` is a connected database handle from the official `mongodb`
// Node.js driver, with a text index on the `products` collection.

class VoiceOptimizedTools {
  constructor() {
    this.client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });
  }

  async setupVoiceAgent() {
    // Session settings can be set before connecting; they are sent on connect
    this.client.updateSession({
      instructions: `
        You are a helpful shopping assistant. When users describe what they need,
        use find_product_recommendation ONCE to get a complete answer. Don't make
        multiple tool calls - the tool handles everything internally.
        After getting the recommendation, present it conversationally:
        "I found a great option for you: [product]. It's [why it's good].
        I also have [alternative 1] and [alternative 2] if you want to compare."
      `,
      voice: 'alloy',
      // The Realtime API accepts ['text'] or ['text', 'audio'], not audio alone
      modalities: ['text', 'audio'],
      // Needed for transcripts of the user's speech to be produced
      input_audio_transcription: { model: 'whisper-1' }
    });

    // Register the high-level tool together with its handler. When the model
    // calls it, the client parses the arguments, runs the handler, sends the
    // result back as a function_call_output item, and requests a response.
    this.client.addTool(
      {
        name: 'find_product_recommendation',
        description: `Find best product for user needs. Encapsulates:
          search, filter, rank, compare. Returns ready-to-speak
          recommendation with explanation.`,
        parameters: {
          type: 'object',
          properties: {
            user_need: { type: 'string' },
            max_price: { type: 'number' },
            category: { type: 'string' },
            prioritize: {
              type: 'string',
              enum: ['price', 'quality', 'speed', 'popularity']
            }
          },
          required: ['user_need']
        }
      },
      (args) => this.findProductRecommendation(args)
    );

    // Raw server events are re-emitted on client.realtime with a "server." prefix
    this.client.realtime.on(
      'server.conversation.item.input_audio_transcription.completed',
      (event) => console.log('User said:', event.transcript)
    );

    await this.client.connect();
  }

  async findProductRecommendation(params) {
    // HIGH-LEVEL TOOL: encapsulates the entire workflow
    // Step 1: Search (internal, user doesn't hear this)
    const allProducts = await this.searchProducts(params.user_need, params.category);
    // Step 2: Filter (internal)
    const filtered = this.filterProducts(allProducts, {
      max_price: params.max_price,
      min_rating: 4.0 // default quality threshold
    });
    // Step 3: Rank (internal)
    const ranked = this.rankProducts(filtered, params.prioritize || 'quality');
    // Nothing matched: still return something the agent can say
    if (ranked.length === 0) {
      return {
        recommendation: null,
        alternatives: [],
        search_summary: `Found ${allProducts.length} products, but none matched those constraints`
      };
    }
    // Step 4: Select best + alternatives
    const best = ranked[0];
    const alternatives = ranked.slice(1, 3);
    // Step 5: Generate explanation
    const explanation = this.explainRecommendation(best, params);
    // Return everything the agent needs to speak naturally
    return {
      recommendation: {
        name: best.name,
        price: best.price,
        rating: best.rating,
        key_features: best.features.slice(0, 3),
        why_recommended: explanation
      },
      alternatives: alternatives.map(p => ({
        name: p.name,
        price: p.price,
        key_difference: this.compareToRecommendation(p, best)
      })),
      search_summary: `Found ${allProducts.length} products, narrowed to ${filtered.length} matches`
    };
  }

  async searchProducts(query, category) {
    // Your actual search logic (MongoDB text search shown)
    const filter = { $text: { $search: query } };
    if (category) filter.category = category; // only filter by category when given
    return await db.collection('products').find(filter).limit(100).toArray();
  }

  filterProducts(products, constraints) {
    return products.filter(p =>
      (!constraints.max_price || p.price <= constraints.max_price) &&
      (!constraints.min_rating || p.rating >= constraints.min_rating)
    );
  }

  rankProducts(products, prioritize) {
    const scoreFunctions = {
      price: (p) => 1 / p.price, // cheaper scores higher
      quality: (p) => p.rating * p.review_count,
      speed: (p) => (p.shipping_days < 2 ? 10 : 1),
      popularity: (p) => -p.sales_rank // sales rank 1 is the best seller
    };
    const scoreFunc = scoreFunctions[prioritize] || scoreFunctions.quality;
    return products
      .map(p => ({ ...p, score: scoreFunc(p) }))
      .sort((a, b) => b.score - a.score);
  }

  explainRecommendation(product, params) {
    const reasons = [];
    const priority = params.prioritize || 'quality'; // same default as rankProducts
    if (priority === 'price') {
      reasons.push(`best value at $${product.price}`);
    } else if (priority === 'quality') {
      reasons.push(`highly rated (${product.rating} stars from ${product.review_count} reviews)`);
    } else if (priority === 'speed') {
      reasons.push(`ships in ${product.shipping_days} day(s)`);
    } else if (priority === 'popularity') {
      reasons.push('the top seller among the matches');
    }
    if (product.features.some(f => params.user_need.toLowerCase().includes(f.toLowerCase()))) {
      reasons.push('has the features you mentioned');
    }
    return reasons.join(', ');
  }

  compareToRecommendation(alternative, best) {
    if (alternative.price < best.price * 0.8) {
      return `much cheaper at $${alternative.price}`;
    } else if (alternative.rating > best.rating) {
      return `higher rated (${alternative.rating} stars)`;
    } else {
      return 'different feature set';
    }
  }
}

// Usage
const voiceTools = new VoiceOptimizedTools();
await voiceTools.setupVoiceAgent();
// User: "I need a laptop for video editing under $2000"
// Agent makes 1 tool call, gets complete recommendation, speaks naturally
```
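Whichever client you use, handling a tool call comes down to a dispatcher: look up the handler by function name, parse the model's JSON arguments, and wrap the result (or a speakable error) in a `function_call_output` item for the same `call_id`. The sketch below is transport-agnostic; `dispatchFunctionCall` and the stubbed handler map are hypothetical. Over the raw Realtime API you would send the returned item in a `conversation.item.create` event and then send `response.create` so the agent speaks.

```javascript
// Map function names to handlers; each handler receives parsed arguments.
// Returns the conversation item to send back for this call_id.
async function dispatchFunctionCall({ name, call_id, arguments: argsJson }, handlers) {
  let output;
  const handler = handlers[name];
  if (!handler) {
    output = { error: `Unknown tool: ${name}` };
  } else {
    try {
      output = await handler(JSON.parse(argsJson || '{}'));
    } catch {
      // Bad JSON or a failing handler: give the model something it can say
      output = { error: 'I ran into a problem looking that up. Want me to try again?' };
    }
  }
  return { type: 'function_call_output', call_id, output: JSON.stringify(output) };
}

// Example with a stubbed handler
const handlers = {
  find_product_recommendation: async (args) => ({
    recommendation: { name: 'Stub Laptop' },
    query: args.user_need
  })
};

dispatchFunctionCall(
  { name: 'find_product_recommendation', call_id: 'call_1', arguments: '{"user_need":"video editing laptop"}' },
  handlers
).then(item => console.log(item.output));
```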
Real-World Results
A retail company refactored their voice shopping assistant:
Before (low-level tools):
- Average conversation: 12 turns
- Average time: 4.5 minutes
- Tool calls per session: 8.3
- User satisfaction: 3.2/5
- “Agent feels slow”: 67% of feedback
After (high-level tools):
- Average conversation: 5 turns
- Average time: 2.1 minutes
- Tool calls per session: 2.1
- User satisfaction: 4.6/5
- “Agent feels slow”: 12% of feedback
Impact:
- 53% shorter conversations
- 75% fewer tool calls
- 44% improvement in satisfaction
- $180K saved annually (less compute time)
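The percentage impacts follow directly from the before/after figures above. A quick check with a hypothetical `pctChange` helper:

```javascript
// Percentage change from `before` to `after`, rounded to a whole percent
function pctChange(before, after) {
  return Math.round(((after - before) / before) * 100);
}

console.log(pctChange(4.5, 2.1)); // -53 (average conversation time)
console.log(pctChange(8.3, 2.1)); // -75 (tool calls per session)
console.log(pctChange(3.2, 4.6)); // 44 (user satisfaction)
```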
Design Patterns For Voice-First Tools
Pattern 1: Task-Based Not Operation-Based
```javascript
// ❌ Operation-based (text agent style)
await searchUsers();
await filterByRole();
await sortByActivity();
await selectTop5();

// ✅ Task-based (voice agent style)
await findRelevantTeamMembers({ task: 'code review', skills: ['TypeScript'] });
```
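A task-based tool still performs the search, filter, sort, and select steps; it just does them internally. Here is a minimal sketch of `findRelevantTeamMembers` over an in-memory roster. The roster fields (`role`, `skills`, `commitsLast30Days`) and the rule that only engineers take code reviews are assumptions for illustration.

```javascript
// Hypothetical in-memory roster standing in for searchUsers()
const TEAM = [
  { name: 'Ana', role: 'engineer', skills: ['TypeScript', 'React'], commitsLast30Days: 42 },
  { name: 'Ben', role: 'engineer', skills: ['Go'], commitsLast30Days: 55 },
  { name: 'Chen', role: 'designer', skills: ['TypeScript'], commitsLast30Days: 5 },
  { name: 'Dee', role: 'engineer', skills: ['TypeScript'], commitsLast30Days: 18 }
];

// One task-based call: filter by role and skills, sort by activity, take the top 5
async function findRelevantTeamMembers({ task, skills = [] }) {
  const roles = task === 'code review' ? ['engineer'] : ['engineer', 'designer'];
  const matches = TEAM
    .filter(m => roles.includes(m.role))
    .filter(m => skills.every(s => m.skills.includes(s)))
    .sort((a, b) => b.commitsLast30Days - a.commitsLast30Days)
    .slice(0, 5);
  return {
    top_match: matches[0]?.name ?? null,
    others: matches.slice(1).map(m => m.name),
    summary: `${matches.length} teammate(s) can take the ${task}`
  };
}

findRelevantTeamMembers({ task: 'code review', skills: ['TypeScript'] })
  .then(r => console.log(r.summary)); // 2 teammate(s) can take the code review
```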
Pattern 2: Return Speaking-Ready Data
```javascript
// ❌ Returns raw data
{
  results: [...],
  total: 47,
  page: 1
}

// ✅ Returns presentation-ready data
{
  top_match: { name: "...", why: "..." },
  alternatives: [ ... ],
  summary: "Found 3 great options out of 47 total",
  next_question: "Would you like to hear more about the top choice?"
}
```
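Converting raw results into that shape can be a thin layer at the end of the tool. A sketch, assuming the raw results are already ranked and each has `name`, `price`, and `rating`; `toSpeakingReady` is a hypothetical helper, and the `why` wording and question text are placeholders to tune.

```javascript
// Turn a ranked, paginated result set into something the agent can read aloud
function toSpeakingReady({ results, total }) {
  if (results.length === 0) {
    return {
      top_match: null,
      alternatives: [],
      summary: "I couldn't find a match.",
      next_question: 'Want me to loosen the filters?'
    };
  }
  const [top, ...rest] = results.slice(0, 3); // speak at most three options
  return {
    top_match: { name: top.name, why: `${top.rating} stars at $${top.price}` },
    alternatives: rest.map(r => ({ name: r.name, price: r.price })),
    summary: `Found ${Math.min(results.length, 3)} great options out of ${total} total`,
    next_question: 'Would you like to hear more about the top choice?'
  };
}

const spoken = toSpeakingReady({
  results: [
    { name: 'Headphones A', price: 179, rating: 4.7 },
    { name: 'Headphones B', price: 149, rating: 4.5 },
    { name: 'Headphones C', price: 199, rating: 4.6 },
    { name: 'Headphones D', price: 99, rating: 3.9 }
  ],
  total: 47
});
console.log(spoken.summary); // Found 3 great options out of 47 total
```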
Pattern 3: Include Context For Follow-Ups
```javascript
// ❌ Agent forgets what it found
{
  result: { id: 123, name: "Product A" }
}

// ✅ Agent remembers for follow-up questions
{
  result: { id: 123, name: "Product A" },
  context: {
    search_query: "wireless headphones under $200",
    alternatives_ids: [124, 125],
    why_chosen: "best battery life in price range"
  },
  follow_up_suggestions: [
    "Check shipping time",
    "Compare to alternatives",
    "Add to cart"
  ]
}
```
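The same context fields also let the server answer follow-ups without re-running the search. A sketch of one way to do that, assuming an in-memory per-session store and a hypothetical `compareToAlternatives` follow-up tool; in a multi-process deployment the store would need to be shared.

```javascript
// Per-session memory of the last result (in-memory; share it across processes in production)
const lastResults = new Map();

// Call this when a search tool returns, so follow-ups can reuse its context
function rememberResult(sessionId, response) {
  lastResults.set(sessionId, response);
  return response;
}

// Hypothetical follow-up tool for "how does it compare?", answered from stored context
function compareToAlternatives(sessionId, catalog) {
  const last = lastResults.get(sessionId);
  if (!last) {
    return { say: "I don't have an earlier search to compare against yet." };
  }
  const alternatives = last.context.alternatives_ids
    .map(id => catalog.get(id))
    .filter(Boolean); // skip ids no longer in the catalog
  return {
    chosen: last.result.name,
    why_chosen: last.context.why_chosen,
    alternatives: alternatives.map(p => p.name)
  };
}

// Example using the response shape above
const catalog = new Map([[124, { name: 'Product B' }], [125, { name: 'Product C' }]]);
rememberResult('session-1', {
  result: { id: 123, name: 'Product A' },
  context: {
    search_query: 'wireless headphones under $200',
    alternatives_ids: [124, 125],
    why_chosen: 'best battery life in price range'
  }
});
console.log(compareToAlternatives('session-1', catalog).alternatives); // [ 'Product B', 'Product C' ]
```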
Pattern 4: Anticipate Next Steps
```javascript
// ❌ Requires separate tool call for each action
await getProduct(id);
await checkInventory(id);
await getShipping(id);

// ✅ Returns everything user likely needs next
async function getProductDetails(id) {
  return {
    product: { ... },
    in_stock: true,
    ships_in: "2 days",
    related_products: [...],
    can_add_to_cart: true
  };
}
```
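Inside a bundled tool, independent lookups can also run concurrently, so the combined call costs roughly the slowest lookup rather than the sum of all three. A sketch with hypothetical stubbed services (`getProduct`, `checkInventory`, `getShipping`) that simulate latency with timers:

```javascript
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical backend calls, each with about 100 ms of simulated latency
async function getProduct(id) { await sleep(100); return { id, name: `Product ${id}` }; }
async function checkInventory(id) { await sleep(100); return { in_stock: true }; }
async function getShipping(id) { await sleep(100); return { ships_in: '2 days' }; }

// One tool call for the agent; the three lookups run in parallel inside it
async function getProductDetails(id) {
  const [product, inventory, shipping] = await Promise.all([
    getProduct(id),
    checkInventory(id),
    getShipping(id)
  ]);
  return {
    product,
    in_stock: inventory.in_stock,
    ships_in: shipping.ships_in,
    can_add_to_cart: inventory.in_stock
  };
}

const start = Date.now();
getProductDetails(42).then(details => {
  // Elapsed time is close to one lookup's delay, not the sum of three
  console.log(details.ships_in, `${Date.now() - start} ms`);
});
```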
Tool Guidelines For Voice Agents
| Do | Don’t |
|---|---|
| Encapsulate workflows | Expose database operations |
| Return explanation text | Return raw IDs or codes |
| Handle edge cases internally | Force agent to handle errors |
| Anticipate follow-up needs | Require multiple calls for related data |
| Include why you returned this result | Just return data without context |
| Make tools match human thinking | Make tools match database schema |
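The "handle edge cases internally" row is easy to enforce with a wrapper around every tool handler. This sketch (`speakable` is a hypothetical name) turns thrown errors and timeouts into a short sentence the agent can say, instead of passing a stack trace or an empty result to the model; the 3-second default timeout is an arbitrary assumption.

```javascript
// Wrap a tool handler so the agent always gets a speakable result.
function speakable(handler, {
  timeoutMs = 3000,
  fallback = "I couldn't get that right now. Want me to try again?"
} = {}) {
  return async (args) => {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
    });
    try {
      const result = await Promise.race([handler(args), timeout]);
      return { ok: true, result };
    } catch (err) {
      return { ok: false, say: fallback, reason: err.message };
    } finally {
      clearTimeout(timer);
    }
  };
}

// A handler that fails, and one that is slower than its timeout
const flaky = speakable(async () => { throw new Error('inventory service down'); });
const slow = speakable(
  () => new Promise(resolve => setTimeout(() => resolve({ done: true }), 50)),
  { timeoutMs: 10 }
);

flaky({}).then(r => console.log(r.say)); // I couldn't get that right now. Want me to try again?
```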
Implementation Timeline
Week 1: Audit existing tools
- List all tool calls made in typical conversations
- Identify sequential patterns (search → filter → sort)
- Find tools that require 3+ calls to complete a task
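The Week 1 audit can start from your tool-call logs. A sketch, assuming each session can be exported as an ordered list of tool names: count every run of consecutive calls of a given length and surface the runs that recur. `findSequentialPatterns` and the log shape are hypothetical.

```javascript
// sessions: arrays of tool names in call order.
// Counts every contiguous run of `length` calls across all sessions.
function findSequentialPatterns(sessions, { length = 3, minCount = 2 } = {}) {
  const counts = new Map();
  for (const calls of sessions) {
    for (let i = 0; i + length <= calls.length; i++) {
      const key = calls.slice(i, i + length).join(' -> ');
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .sort((a, b) => b[1] - a[1])
    .map(([pattern, count]) => ({ pattern, count }));
}

const sessions = [
  ['search_products', 'filter_products', 'sort_products', 'get_details'],
  ['search_products', 'filter_products', 'sort_products'],
  ['get_details', 'check_inventory']
];
console.log(findSequentialPatterns(sessions));
```

Recurring runs such as `search_products -> filter_products -> sort_products` are the candidates for a single high-level replacement in Week 2.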
Week 2: Design high-level replacements
- Group related operations into single tools
- Add explanation fields to responses
- Include context for follow-ups
Week 3: Test with voice agent
- Measure conversation length before/after
- Count tool calls per session
- Gather user feedback on speed
Week 4: Optimize and deploy
- Refine tool descriptions for better agent understanding
- Add caching for repeated queries
- Monitor latency and adjust
Cost Impact
Higher-level tools reduce costs:
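Realtime API pricing (approximate per-minute audio rates):
- Input audio: ~$0.06/minute
- Output audio: ~$0.24/minute
- Average conversation: 3 minutes × ($0.06 + $0.24) = $0.90, billing every minute as both input and output audio
Reducing tool calls:
- 8 tool calls → 2 tool calls = 75% fewer calls, and roughly 75% less time spent waiting on them
- 4.5 minute conversation → 2.1 minutes = 53% shorter
- Cost per conversation: applying that 53% reduction to the $0.90 baseline gives $0.90 → $0.42 = $0.48 saved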
At 10,000 conversations/month: $4,800/month savings
Plus: a better user experience can raise completion rates (and revenue).
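The cost arithmetic above can be reproduced in a few lines. The rates are the approximate figures quoted in this section, and, like the section, the sketch bills every minute as both input and output audio, which likely overstates real usage; `conversationCost` is a hypothetical helper.

```javascript
// Approximate per-minute audio rates quoted in this section (USD)
const RATES = { inputPerMin: 0.06, outputPerMin: 0.24 };

// Simplification used above: every minute is billed as both input and output audio
function conversationCost(minutes, rates = RATES) {
  return minutes * (rates.inputPerMin + rates.outputPerMin);
}

const before = conversationCost(3);        // 3-minute baseline
const after = before * (2.1 / 4.5);        // apply the 53% shorter duration
const monthlySavings = (before - after) * 10_000;

console.log(before.toFixed(2), after.toFixed(2), monthlySavings.toFixed(0)); // 0.90 0.42 4800
```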
When To Use High-Level Tools
| Use High-Level Tools When | Use Low-Level Tools When |
|---|---|
| Voice conversations | Text-based chat |
| Multi-step workflows are common | Operations are truly independent |
| Speed matters more than flexibility | Users need granular control |
| Agent decides the workflow | User directs each step explicitly |
Most voice agents should use high-level tools. Low-level tools make sense for power users who want control—not typical voice interactions.
What’s Next
Voice-optimized tools are likely to evolve toward:
- Adaptive complexity: Tool adjusts based on user expertise
- Streaming responses: Tool returns partial results as they’re ready
- Learning from usage: Tools refine based on which results users actually use
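Streaming is not something the examples in this post implement, but the idea can be sketched with an async generator: the tool yields a quick first answer the agent can start speaking, then a refined one when the slower source finishes. `streamRecommendations` and its two stub sources are hypothetical, and how partial results actually reach the model depends on your runtime.

```javascript
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical sources: a fast cached answer and a slower full search
async function cachedTopPick(need) { await delay(10); return { name: `Popular pick for ${need}`, partial: true }; }
async function fullSearch(need) { await delay(50); return { name: `Best match for ${need}`, partial: false }; }

// Yield the fast answer first, then the complete one
async function* streamRecommendations(need) {
  yield await cachedTopPick(need);
  yield await fullSearch(need);
}

(async () => {
  for await (const rec of streamRecommendations('running shoes')) {
    console.log(rec.partial ? 'Early:' : 'Final:', rec.name);
  }
})();
```

This version runs the two sources one after the other for simplicity; starting the full search in parallel would shorten the gap before the final answer.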
The end state: Tools that match the pace of human speech, not database queries.
If you want voice agents with optimized tool design, we can refactor your function calls for voice-first interactions. The result: faster conversations, fewer turns, better user experience.