Call Any Business With A Voice Agent
- ZH+
- Business automation
- January 3, 2026
Your dentist doesn’t have an API. Your local auto repair shop doesn’t either. Neither does the family-run restaurant down the street.
The problem: 80% of small businesses operate primarily over the phone. They have appointment systems, they take orders, they answer questions—but none of it is accessible programmatically.
The traditional solution: Hire someone to call them. Wait on hold. Navigate phone trees. Explain what you need. Get information back. It’s slow, it’s expensive, and it doesn’t scale.
The voice agent solution: Let AI make the call for you.
The Long Tail Of Phone-Only Services
HTTP APIs cover the big players:
- OpenTable for restaurant reservations
- Calendly for appointment booking
- Stripe for payments
But the long tail lives on the phone:
- Local dentist offices (booking cleanings)
- Independent repair shops (checking wait times)
- Small restaurants (placing catering orders)
- Medical clinics (prescription refills)
- Service providers (scheduling home visits)
Collectively, these businesses handle billions of calls per year. None of it is automatable because there’s no API.
Until now.
Voice As The Universal API
Here’s the idea: If a business answers the phone, it has an API—the API is speaking.
// Instead of this:
const reservation = await fetch('https://api.restaurant.com/bookings', {
  method: 'POST',
  body: JSON.stringify({
    party_size: 4,
    date: '2025-04-15',
    time: '19:00'
  })
});

// You do this:
const reservation = await voiceAgent.call({
  phone: '+1-555-RESTAURANT',
  task: 'Book a table for 4 on April 15 at 7pm',
  expected_duration: '90 seconds'
});
The voice agent:
- Calls the business
- Navigates the phone menu (if there is one)
- Speaks with staff (human or automated)
- Completes the task
- Returns structured data
It’s an API bridge. The business doesn’t need to change anything. You get programmatic access anyway.
Architecture: Voice Agent Phone Bridge
Here’s how it works with the OpenAI Realtime API:
graph TB
A[Your Application] --> B[Voice Agent Controller]
B --> C{What's at phone number?}
C -->|Human Staff| D[Conversational Voice Agent]
C -->|IVR System| E[DTMF + Voice Agent]
C -->|Voicemail| F[Leave Structured Message]
D --> G[Twilio Outbound Call]
E --> G
F --> G
G --> H[Business Phone Number]
H --> I[Business Answers]
I --> J{Conversation Type}
J -->|Simple| K[Single Turn Exchange]
J -->|Complex| L[Multi-Turn Conversation]
K --> M[Extract Information]
L --> M
M --> N[Return Structured Data]
N --> B
B --> O[Your Application Receives Result]
style A fill:#e1f5ff
style G fill:#fff4e1
style M fill:#f0f0f0
style O fill:#d4f4dd
The voice agent adapts to what it encounters: human, IVR, voicemail, or something unexpected.
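The branching in the diagram can be sketched as a small dispatch function. The labels (`human`, `ivr`, `voicemail`) and the mode names below are illustrative assumptions rather than values defined by Twilio or OpenAI, and how the callee gets classified is left open.

```javascript
// Hypothetical mapping from "what answered the phone" to an agent mode.
const MODES = new Map([
  ['human', 'conversational'],
  ['ivr', 'dtmf_and_voice'],
  ['voicemail', 'leave_message'],
]);

// Anything unrecognised falls back to conversational mode, on the
// assumption that talking is the safest default when detection is unsure.
function selectMode(answeredBy) {
  return MODES.get(answeredBy) ?? 'conversational';
}

console.log(selectMode('ivr'));      // dtmf_and_voice
console.log(selectMode('fax tone')); // conversational
```

A `Map` is used instead of a plain object so that inherited keys such as `constructor` cannot be mistaken for a known label.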
Implementation: Phone Bridge Voice Agent
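Here’s a sketch of the bridge using Twilio and the OpenAI Realtime API. It leaves out the media-stream relay that carries call audio between Twilio and the Realtime session: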
import { RealtimeClient } from '@openai/realtime-api-beta';
import twilio from 'twilio';

class VoiceAgentPhoneBridge {
  constructor() {
    this.twilioClient = twilio(
      process.env.TWILIO_ACCOUNT_SID,
      process.env.TWILIO_AUTH_TOKEN
    );
    this.realtimeClient = new RealtimeClient({
      apiKey: process.env.OPENAI_API_KEY
    });
  }
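  async callBusiness(phoneNumber, task, context = {}) {
    console.log(`Calling ${phoneNumber} to: ${task}`);

    // Start Twilio outbound call. The /voice-webhook handler must return TwiML
    // that streams the call audio to this process; that relay is not shown here.
    const call = await this.twilioClient.calls.create({
      from: process.env.TWILIO_PHONE_NUMBER,
      to: phoneNumber,
      url: `${process.env.BASE_URL}/voice-webhook`,
      statusCallback: `${process.env.BASE_URL}/call-status`,
      record: true, // Record for debugging
      timeout: 60   // Hang up if no answer after 60 seconds
    });

    // Per-call state read by the IVR, voicemail, and escalation handlers
    this.callSid = call.sid;
    this.phoneNumber = phoneNumber;
    this.task = task;

    // Open the Realtime API session (this alone does not carry call audio)
    await this.realtimeClient.connect();

    // Configure voice agent for this specific task
    await this.realtimeClient.updateSession({
      instructions: `
        You are calling a business on behalf of a user.
        Task: ${task}
        Context: ${JSON.stringify(context)}
        Guidelines:
        1. Be polite and professional
        2. State your purpose clearly and early
        3. If you reach an IVR, navigate it efficiently
        4. If you reach voicemail, leave a clear message with callback number
        5. Extract all relevant information
        6. Confirm details before ending call
        7. End call politely once task is complete
        8. After your goodbye, add the marker TASK_COMPLETE
        Return structured data:
        {
          "success": true|false,
          "result": <extracted information>,
          "conversation_summary": "<brief summary>",
          "call_duration": <seconds>
        }
      `,
      voice: 'alloy',
      modalities: ['text', 'audio'], // audio-only is not an accepted combination in the beta API
      input_audio_transcription: { model: 'whisper-1' }, // so callee speech gets a transcript
      turn_detection: { type: 'server_vad', threshold: 0.5 }
    });

    // Track call progress
    const callData = {
      callSid: call.sid,
      startTime: Date.now(),
      transcript: [],
      result: null
    };
    this.transcript = callData.transcript;
    const elapsed = () => Date.now() - callData.startTime;

    // Agent turns: log them and watch for the completion marker
    this.realtimeClient.on('conversation.item.completed', ({ item }) => {
      if (item.role !== 'assistant') return;
      // Audio turns carry a transcript; text turns carry text
      const content = item.formatted.transcript || item.formatted.text || '';
      callData.transcript.push({ role: item.role, content, timestamp: elapsed() });
      if (/task[_ ]complete/i.test(content)) {
        this.endCall(call.sid, callData);
      }
    });

    // Callee turns: the transcript arrives once input transcription finishes,
    // so log each 'user' item the first time it has one
    const logged = new Set();
    this.realtimeClient.on('conversation.updated', ({ item }) => {
      const content = item.formatted.transcript;
      if (item.role !== 'user' || !content || logged.has(item.id)) return;
      logged.add(item.id);
      callData.transcript.push({ role: item.role, content, timestamp: elapsed() });
    });

    // Wait for the call to finish. The Twilio REST client does not emit
    // events, so poll the call status (the statusCallback webhook is the
    // push-based alternative).
    const finalStatuses = ['completed', 'busy', 'failed', 'no-answer', 'canceled'];
    let status;
    do {
      await new Promise((resolve) => setTimeout(resolve, 2000));
      ({ status } = await this.twilioClient.calls(call.sid).fetch());
    } while (!finalStatuses.includes(status));

    callData.endTime = Date.now();
    return this.extractStructuredResult(callData);
  }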
  async extractStructuredResult(callData) {
    // Use a chat model to extract structured data from the conversation.
    // JSON mode (response_format) needs a model that supports it, such as
    // gpt-4o; the original gpt-4 model does not.
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'gpt-4o',
        messages: [{
          role: 'user',
          content: `
            Extract structured information from this phone call transcript:
            ${JSON.stringify(callData.transcript, null, 2)}
            Return JSON with:
            {
              "success": <did task complete successfully>,
              "result": <key information extracted>,
              "issues": <any problems encountered>,
              "follow_up_needed": <true if human needs to call back>
            }
          `
        }],
        response_format: { type: 'json_object' }
      })
    });
    if (!response.ok) {
      throw new Error(`Extraction request failed with status ${response.status}`);
    }
    const data = await response.json();
    return {
      ...JSON.parse(data.choices[0].message.content),
      call_duration: (callData.endTime - callData.startTime) / 1000,
      transcript: callData.transcript
    };
  }
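  async endCall(callSid, callData) {
    // Guard against the completion marker being seen more than once
    if (callData.ending) return;
    callData.ending = true;

    // Agent completed task, hang up politely
    await this.realtimeClient.sendUserMessageContent([{
      type: 'input_text', // typed user content uses 'input_text', not 'text'
      text: 'Thank the person and end the call now.'
    }]);

    // Give agent 3 seconds to say goodbye, then force hang up
    setTimeout(() => {
      this.twilioClient.calls(callSid)
        .update({ status: 'completed' })
        .catch((err) => console.error('Failed to hang up:', err.message));
    }, 3000);
  }
}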
// Usage examples
const bridge = new VoiceAgentPhoneBridge();

// Example 1: Book dentist appointment
const appointment = await bridge.callBusiness(
  '+1-555-DENTIST',
  'Book a teeth cleaning appointment',
  {
    patient_name: 'John Smith',
    preferred_dates: ['April 15', 'April 16', 'April 17'],
    preferred_times: ['morning', 'early afternoon'],
    insurance: 'Delta Dental'
  }
);
console.log('Appointment result:', appointment);
// {
//   success: true,
//   result: {
//     date: '2025-04-16',
//     time: '10:30 AM',
//     confirmation_number: 'DEN-4421',
//     location: '123 Main St'
//   },
//   call_duration: 87
// }

// Example 2: Check car repair status
const repairStatus = await bridge.callBusiness(
  '+1-555-AUTOSHOP',
  'Check repair status for vehicle',
  {
    customer_name: 'Jane Doe',
    vehicle: '2020 Toyota Camry',
    ticket_number: 'REP-8821'
  }
);
console.log('Repair status:', repairStatus);
// {
//   success: true,
//   result: {
//     status: 'ready for pickup',
//     total_cost: '$450',
//     work_completed: 'Brake pad replacement, oil change',
//     pickup_hours: 'Mon-Fri 8am-6pm, Sat 9am-2pm'
//   },
//   call_duration: 62
// }

// Example 3: Place catering order
const cateringOrder = await bridge.callBusiness(
  '+1-555-RESTAURANT',
  'Place catering order for office lunch',
  {
    company: 'Acme Corp',
    date: 'April 20',
    time: '12:00 PM',
    headcount: 25,
    budget: '$15 per person',
    dietary: 'vegetarian options needed'
  }
);
console.log('Catering order:', cateringOrder);
// {
//   success: true,
//   result: {
//     menu: 'Mixed sandwich platter + sides',
//     total_cost: '$375',
//     delivery_time: '11:45 AM',
//     order_number: 'CAT-9432'
//   },
//   call_duration: 124
// }
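The `/voice-webhook` URL passed to `calls.create` has to answer with TwiML, and for a live voice agent that TwiML would typically open a media stream so the call audio can be relayed to the Realtime session. The sketch below only builds that TwiML string. The function names and the `wss://` URL are hypothetical, and the WebSocket relay itself is not shown.

```javascript
// Escape characters that are special inside XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the TwiML a /voice-webhook handler could return. <Connect><Stream>
// asks Twilio to open a bidirectional media stream to the given WebSocket
// URL; relaying that audio to and from the Realtime session is up to the
// server behind that URL.
function buildStreamTwiml(streamUrl) {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<Response><Connect>' +
    `<Stream url="${escapeXml(streamUrl)}"/>` +
    '</Connect></Response>'
  );
}

console.log(buildStreamTwiml('wss://example.com/media-stream'));
```

Twilio media streams carry 8 kHz mu-law audio, so the Realtime session on the other side of the relay would usually be configured with the `g711_ulaw` input and output formats rather than the default PCM.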
Real-World Use Cases
1. Healthcare Appointment Scheduling
A healthcare tech company automated appointment booking for 200+ dental practices:
Before voice agents:
- Patients called offices directly
- Average hold time: 8 minutes
- 30% abandonment rate
- Offices missed 20% of calls (after hours, busy, etc.)
After voice agents:
- Platform calls offices on behalf of patients
- Agent waits on hold (patients don’t)
- Books appointments successfully: 85% of calls
- Offices get more bookings without changing systems
Impact:
- 15,000 appointments booked per month
- ~$200K additional revenue for practices (booked appointments that would have been abandoned)
- Patients book in 2 minutes instead of 10
2. Local Restaurant Aggregation
A food ordering platform aggregated 500 local restaurants without APIs:
The problem:
- Restaurants had no online ordering
- Phone-only menu and pricing
- Orders taken manually
Voice agent solution:
- Agent calls restaurant
- Reads menu items and prices
- Places orders on behalf of customers
- Confirms pickup time
Results:
- 500 restaurants added in 3 months (vs 2 years for API integrations)
- Average call time: 90 seconds
- Order accuracy: 92% (humans: 88%)
- Restaurants see 40% more orders without building tech
3. Service Provider Coordination
A home services marketplace coordinated plumbers, electricians, handymen:
Before:
- Customers called each provider
- Left voicemails, waited for callbacks
- Compared quotes manually
After:
- Voice agent calls 5 providers simultaneously
- Explains job, gets availability + quote
- Returns comparison to customer in 10 minutes
Impact:
- Job bookings up 3x (customers don’t give up)
- Providers get pre-qualified leads
- Average time-to-booking: 10 minutes vs 2 days
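Calling several providers at once and comparing what comes back maps naturally onto `Promise.allSettled`. The following is a minimal sketch, assuming a `callBusiness(phone, task, context)` function that resolves to `{ success, result }` like the bridge method shown earlier, and assuming each `result` carries a numeric `quote` field (a hypothetical shape, not something the platform described here is known to use).

```javascript
// Call every provider in parallel and return successful quotes, cheapest first.
// A rejected call or an unsuccessful result simply drops out of the comparison.
async function collectQuotes(callBusiness, providers, task, context = {}) {
  const settled = await Promise.allSettled(
    providers.map((provider) => callBusiness(provider.phone, task, context))
  );
  return settled
    .map((outcome, index) => ({ provider: providers[index].name, outcome }))
    .filter(({ outcome }) => outcome.status === 'fulfilled' && outcome.value.success)
    .map(({ provider, outcome }) => ({ provider, ...outcome.value.result }))
    .sort((a, b) => a.quote - b.quote);
}
```

`Promise.allSettled` is used rather than `Promise.all` so that one unanswered or failed call does not discard the quotes that did come back.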
Challenges & Solutions
Challenge 1: IVR Navigation
Many businesses have phone menus. Voice agents need to navigate them.
Solution: Dual-mode operation
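// Detect IVR vs human.
// 'audio.started' and analyzeAudio() are placeholders for your own audio
// classifier; they are not events or methods the Realtime client provides.
this.realtimeClient.on('audio.started', async (event) => {
  const audioAnalysis = await this.analyzeAudio(event.audio);
  if (audioAnalysis.is_ivr) {
    // Switch to DTMF mode
    await this.navigateIVR(audioAnalysis.menu_options);
  } else {
    // Continue with conversational mode
    await this.conversationalMode();
  }
});

async navigateIVR(menuOptions) {
  // Parse options: "Press 1 for appointments, 2 for billing..."
  const targetOption = this.matchOptionToTask(this.task, menuOptions);
  if (!targetOption) return; // Nothing matched: do not press a random key

  // Send DTMF tones. Updating a live call replaces the TwiML it is currently
  // running, so follow <Play> with whatever reconnects the audio to the agent.
  await this.twilioClient.calls(this.callSid).update({
    twiml: `<Response><Play digits="${targetOption}"/></Response>`
  });
}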
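The snippet above relies on a `matchOptionToTask` helper that is never shown. One way it could work, sketched here under the assumption that the menu arrives as transcribed text in the "press N for X" form, is to parse the options and pick the one whose label shares a word with the task. Both function names are hypothetical, and menus phrased the other way round ("for billing, press 2") would need a second pattern.

```javascript
// Parse prompts like "Press 1 for appointments, 2 for billing" into
// [{ digit: '1', label: 'appointments' }, { digit: '2', label: 'billing' }].
function parseMenuOptions(prompt) {
  const pattern = /(?:press\s+)?(\d)\s+for\s+([^,.;]+)/gi;
  return [...prompt.matchAll(pattern)].map((match) => ({
    digit: match[1],
    label: match[2].trim().toLowerCase(),
  }));
}

// Pick the option whose label shares the most words with the task.
// Words are compared by prefix so "appointment" matches "appointments";
// words of three letters or fewer are ignored as too generic.
function matchOptionToTask(task, options) {
  const taskWords = task.toLowerCase().match(/[a-z]+/g) ?? [];
  const related = (a, b) =>
    a.length > 3 && b.length > 3 && (a.startsWith(b) || b.startsWith(a));

  let best = null;
  let bestScore = 0;
  for (const option of options) {
    const labelWords = option.label.match(/[a-z]+/g) ?? [];
    const score = labelWords.filter((w) => taskWords.some((t) => related(w, t))).length;
    if (score > bestScore) {
      best = option.digit;
      bestScore = score;
    }
  }
  return best; // null when nothing matches, so the caller can fall back
}

const menu = parseMenuOptions(
  'Press 1 for appointments, 2 for billing, or press 3 for all other questions.'
);
console.log(matchOptionToTask('Book a teeth cleaning appointment', menu)); // 1
```

Returning `null` instead of guessing keeps the agent from pressing a key that routes the call somewhere unrelated.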
Challenge 2: Voicemail Detection
Sometimes you reach voicemail. The agent needs to adapt.
Solution: Detect + leave structured message
// Callee speech shows up as 'user' items. Their transcript is filled in only
// when input audio transcription is enabled on the session, so wait for it
// and screen each item once.
const screenedItems = new Set();
this.realtimeClient.on('conversation.updated', ({ item }) => {
  const transcript = item.formatted.transcript;
  if (item.role !== 'user' || !transcript || screenedItems.has(item.id)) return;
  screenedItems.add(item.id);
  // Detect voicemail patterns
  if (this.isVoicemail(transcript)) {
    this.leaveVoicemail();
  }
});

isVoicemail(transcript) {
  const voicemailIndicators = [
    'leave a message',
    'after the beep',
    'press pound',
    'record your message',
    'not available'
  ];
  return voicemailIndicators.some(indicator =>
    transcript.toLowerCase().includes(indicator)
  );
}

async leaveVoicemail() {
  await this.realtimeClient.sendUserMessageContent([{
    type: 'input_text', // typed user content uses 'input_text', not 'text'
    text: `
      This is a voicemail. Leave a clear, professional message:
      - State your name and purpose
      - Provide callback number: ${process.env.CALLBACK_NUMBER}
      - Keep it under 30 seconds
      - Speak clearly
    `
  }]);
}
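Keyword matching only works once the greeting has been transcribed. As a complementary signal, Twilio's answering machine detection can be requested on the outbound call (the `machineDetection` option of `calls.create`), after which the webhook request carries an `AnsweredBy` field. The helper below is a sketch that assumes the `machine_*` naming of those values; check the current Twilio documentation before relying on the exact strings.

```javascript
// Returns true when an AnsweredBy value indicates an answering machine.
// Assumes values shaped like 'human', 'machine_start', 'machine_end_beep',
// 'machine_end_silence', 'machine_end_other', 'fax', and 'unknown'.
function answeredByMachine(answeredBy) {
  return typeof answeredBy === 'string' && answeredBy.startsWith('machine_');
}

console.log(answeredByMachine('machine_end_beep')); // true
console.log(answeredByMachine('human'));            // false
```

Treating `unknown` as not-a-machine keeps the agent in conversational mode when detection is inconclusive, with the transcript check as the backstop.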
Challenge 3: Unexpected Responses
Businesses say unexpected things. Agents must handle gracefully.
Solution: Error recovery + human escalation
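// Inspect only what the callee said ('user' items), once per item, after
// input audio transcription has filled in the transcript.
const checkedItems = new Set();
this.realtimeClient.on('conversation.updated', ({ item }) => {
  const response = item.formatted.transcript;
  if (item.role !== 'user' || !response || checkedItems.has(item.id)) return;
  checkedItems.add(item.id);

  // Detect hard failures first so they are not retried
  // (isHardFailure is not shown; it is left to the implementer)
  if (this.isHardFailure(response)) {
    this.escalateToHuman('Hard failure detected');
  } else if (this.isConfused(response)) {
    // Detect confusion signals
    this.attemptClarification();
  }
});

isConfused(response) {
  const confusionSignals = [
    "i don't understand",
    "can you repeat",
    "what do you mean",
    "i'm not sure"
  ];
  return confusionSignals.some(signal =>
    response.toLowerCase().includes(signal)
  );
}

async attemptClarification() {
  // Initialise lazily so the first increment does not produce NaN
  this.clarificationAttempts = (this.clarificationAttempts ?? 0) + 1;
  if (this.clarificationAttempts > 2) {
    // Two rephrasings did not help; hand off instead of trying a third
    return this.escalateToHuman('Clarification failed after 2 attempts');
  }
  await this.realtimeClient.sendUserMessageContent([{
    type: 'input_text', // typed user content uses 'input_text', not 'text'
    text: 'Rephrase your request more simply and directly.'
  }]);
}

async escalateToHuman(reason = 'Agent could not complete the task') {
  // Agent can't complete task, needs human help
  await this.twilioClient.calls(this.callSid).update({
    url: `${process.env.BASE_URL}/human-handoff`
  });
  // Notify operations team (notifyOperations is not shown)
  await this.notifyOperations({
    phone: this.phoneNumber,
    task: this.task,
    transcript: this.transcript,
    reason
  });
}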
Cost Analysis
Per-call costs:
- Twilio outbound: $0.013/minute
- OpenAI Realtime API: $0.06/minute (input) + $0.24/minute (output)
- Average call: 90 seconds = $0.02 + $0.09 + $0.36 = $0.47/call
Alternative (human caller):
- Labor: $15/hour = $0.25/minute
- Average call + overhead: 5 minutes = $1.25/call
Voice agent is 62% cheaper than human callers.
At scale:
- 1,000 calls/month: $470 (voice agents) vs $1,250 (humans)
- 10,000 calls/month: $4,700 vs $12,500
- Break-even after ~200 calls
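The per-call figures can be reproduced with a few lines of arithmetic. The sketch below uses only the rates quoted above and assumes, as the text does, that input and output audio are both billed for the full length of the call.

```javascript
// Rates in dollars per minute, taken from the figures above.
const RATES = { twilio: 0.013, realtimeIn: 0.06, realtimeOut: 0.24, human: 0.25 };

function agentCostPerCall(seconds) {
  return (seconds / 60) * (RATES.twilio + RATES.realtimeIn + RATES.realtimeOut);
}

function humanCostPerCall(minutes) {
  return minutes * RATES.human;
}

const agent = agentCostPerCall(90); // 1.5 min x $0.313/min
const human = humanCostPerCall(5);
const savings = 1 - agent / human;

console.log(agent.toFixed(2));           // 0.47
console.log(human.toFixed(2));           // 1.25
console.log(Math.round(savings * 100));  // 62
```

The break-even figure depends on a fixed setup cost that the list does not state. At a saving of about $0.78 per call, breaking even after roughly 200 calls corresponds to a setup cost of around $156.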
When This Works (And When It Doesn’t)
| Works Well | Doesn’t Work |
|---|---|
| Appointment booking | Complex negotiations |
| Status checks | Sensitive medical discussions |
| Order placement | Legal/contractual conversations |
| Information gathering | Emergency situations |
| Quote requests | Highly emotional interactions |
Voice agents excel at structured, repeatable tasks. They struggle with nuanced human judgment calls.
The Business Opportunity
Voice-as-API opens up the long tail of phone-only businesses:
- 32 million small businesses in the US
- 80% operate primarily by phone
- Average business receives 50 calls/day
- ~1.3 billion calls/day to small businesses
If 10% of those calls could be automated:
- 130 million automated calls/day
- At $0.47/call: $61 million/day in market size
- $22 billion/year in automation opportunity
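The market estimate is a chain of multiplications, reproduced here from the stated inputs without intermediate rounding. The variable names are illustrative, and the inputs are the article's own figures rather than independently sourced data.

```javascript
const smallBusinesses = 32_000_000;
const phoneFirstPercent = 80;
const callsPerBusinessPerDay = 50;
const automatablePercent = 10;
const costPerCall = 0.47; // dollars, from the cost analysis

const callsPerDay = (smallBusinesses * phoneFirstPercent / 100) * callsPerBusinessPerDay;
const automatedPerDay = callsPerDay * automatablePercent / 100;
const dollarsPerDay = automatedPerDay * costPerCall;
const dollarsPerYear = dollarsPerDay * 365;

console.log(callsPerDay);                        // 1280000000
console.log((dollarsPerDay / 1e6).toFixed(1));   // 60.2
console.log((dollarsPerYear / 1e9).toFixed(1));  // 22.0
```

The daily figure comes out at about $60 million rather than $61 million because the list rounds 1.28 billion calls up to 1.3 billion before multiplying. The annual total still rounds to $22 billion.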
The businesses most likely to benefit:
- Healthcare (appointments, prescription refills)
- Home services (plumbers, electricians, cleaners)
- Local restaurants (orders, catering, reservations)
- Auto repair (status checks, appointment booking)
- Professional services (salons, spas, personal trainers)
What’s Next
Voice agents as API bridges will evolve toward:
- Multi-call workflows: Call multiple businesses, compare results
- Proactive outreach: Agent calls businesses to confirm details before user needs them
- Learning from failures: Agents improve based on which calls succeed
The end goal: Every phone number becomes an API endpoint.
If you want voice agents that call real-world businesses on your behalf, we can build phone bridge integrations with OpenAI Realtime API + Twilio. The result: programmatic access to services that don’t have HTTP APIs.