Before an AI voice reaches customers: 12 questions to answer

When an AI voice sounds natural, it is tempting to jump straight to the demo. That is the wrong starting point. The question is not whether the voice can speak nicely. The question is what it may say, ask, store and hand over when the call gets messy.
xAI wrote on June 3 that Grok is becoming the default engine for Vapi's 12 core voices and that Grok Speech-to-Text and Text-to-Speech are now available in the Vapi Dashboard. xAI's voice API documentation describes voice agents over WebSockets, tool use, text-to-speech, speech-to-text and short-lived client tokens. In plain terms: voice AI is moving from experiment to purchasable infrastructure.
Source: xAI, Grok Becomes the Voice of Vapi and xAI Voice APIs
What an AI voice agent actually is
An AI voice agent is a service that can listen, interpret, respond with synthetic speech and sometimes use tools during the call. That can be harmless, such as rescheduling an appointment or summarizing a support case. It can also become sensitive quickly: personal data, complaints, payments, health data, student matters or promises someone later has to stand behind.
That is why the first version should be narrow. A good first voice workflow has a clear task, a known stop rule and a human who can take over without drama. Reception, support, sales, and school offices can absolutely benefit from voice AI, but only if the call is designed as a workflow, not as a charming robot at the switchboard.
12 questions before the voice meets real people
- Which call may the AI handle, and which call must it never handle? Write the boundary in plain language. "Reschedule an appointment" is a different risk from "advise the customer on what to do".
- How does the voice say it is AI? The line should come early, without legal fog. Test it out loud. If it feels awkward in a real call, it will not be used consistently.
- Which data categories may the call touch? Names, phone numbers, order status, student information and health data do not belong in the same risk class. Perplexity Health is a useful example of why personal data needs clear sources, the option to disconnect and delete, and medical boundaries.
Source: Perplexity, Introducing Perplexity Health and Function for Perplexity
- What should the AI do when it is unsure? It needs a simple line that stops guessing: "I do not want to answer that incorrectly. I will hand you over."
- When does human handoff happen? Decide the triggers: anger, payment, sensitive data, legal questions, medical questions, HR issues, children or several misunderstandings in a row.
- Which tools may the voice use? Read access is often enough at the start. Write access, payment, booking and outbound email should come later, after logged testing.
- What gets saved after the call? Transcripts are useful for QA, but they are also data. Decide retention, access and who may review samples.
- How do you measure quality? Do not only count completed calls. Track wrong promises, unnecessary handoffs, missed handoffs, silence, interruptions and whether the customer had to repeat themselves.
- What does a failed call cost? A voice agent can look cheap per minute and still be expensive if it creates rework, bad bookings or angry customers. Set a cap for call length and test volume.
- Who owns the script, knowledge source and updates? If nobody owns the answers, the agent will slowly start giving outdated ones.
- How do you test the voice before customers do? Run internal calls with deliberately bad scenarios: noise, accents, interrupted sentences, angry questions, unclear names and customers trying to push the agent outside its job.
- What happens when the vendor changes the model, voice, price or service status? Store the model name, vendor, integration owner and fallback in the same document as the call flow. Otherwise, you first notice the change when someone complains.
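The handoff triggers and the read-only tool limit from the questions above can be written down as small, testable rules before any platform is chosen. The following is a minimal sketch in Python, assuming some upstream step has already tagged each caller turn with topic labels. Every name here (HANDOFF_TOPICS, READ_ONLY_TOOLS, should_hand_off, tool_allowed, the tool names and the threshold of two misunderstandings) is hypothetical and not part of any vendor API.

```python
from dataclasses import dataclass

# Topics that should send the call to a human. Hypothetical labels;
# a real system would map its own classifier output onto these.
HANDOFF_TOPICS = {
    "anger", "payment", "sensitive_data", "legal",
    "medical", "hr", "children",
}

# Tools the first version may call: read access only (assumed names).
READ_ONLY_TOOLS = {"lookup_order_status", "lookup_opening_hours"}

# Assumed threshold for "several misunderstandings in a row".
MAX_MISUNDERSTANDINGS = 2

# The line the agent speaks instead of guessing.
HANDOFF_LINE = "I do not want to answer that incorrectly. I will hand you over."


@dataclass
class CallState:
    """Per-call counter of consecutive turns the agent did not understand."""
    misunderstandings_in_a_row: int = 0


def should_hand_off(topics: set[str], understood: bool, state: CallState) -> bool:
    """Return True when the current turn should go to a human."""
    if understood:
        state.misunderstandings_in_a_row = 0
    else:
        state.misunderstandings_in_a_row += 1
    if topics & HANDOFF_TOPICS:
        return True
    return state.misunderstandings_in_a_row >= MAX_MISUNDERSTANDINGS


def tool_allowed(tool_name: str) -> bool:
    """Deny by default: only allowlisted read-only tools may run."""
    return tool_name in READ_ONLY_TOOLS
```

Keeping the rules in plain data like this makes them easy to review with the person who owns the script, and easy to replay against logged test calls.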
Start with one narrow call
I would not start with "AI that answers everything". Start with a call where failure is manageable: checking opening hours, collecting the right case number, booking a callback window, summarizing a support case or routing incoming questions to the right person.
Anthropic gives similar advice in its Cowork guide, although for knowledge work: choose the right kind of task, provide rich context and let the system repeat the assignment before work starts. The same habit fits voice workflows. Before the agent speaks to a customer, it should be able to state what it thinks it is doing, which sources it may use and when it should stop.
Source: Anthropic, Best practices for getting started with Claude Cowork
A simple start document
If you want to get moving without getting stuck in platform choices, create a document with seven rows: call purpose, forbidden topics, allowed data sources, handoff rules, transcript rules, QA sample and responsible owner.
It sounds boring. Good. Boring documents often stop early AI projects from becoming too large. Once the document exists, Tool Forge or an internal technical owner can choose platform, connectors and test setup with far less guessing.
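If the document should also be machine-checkable, the seven rows can be mirrored in a small structure. This is only a sketch under assumptions: the class name StartDocument, the field names and the helper missing_rows are illustrative, and free-text strings stand in for whatever format the team actually uses.

```python
from dataclasses import dataclass, fields


@dataclass
class StartDocument:
    """The seven rows of the start document, one free-text field each."""
    call_purpose: str = ""
    forbidden_topics: str = ""
    allowed_data_sources: str = ""
    handoff_rules: str = ""
    transcript_rules: str = ""
    qa_sample: str = ""
    responsible_owner: str = ""


def missing_rows(doc: StartDocument) -> list[str]:
    """Return the names of rows that are still empty or whitespace-only."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]
```

A check like missing_rows(doc) can run before any test call is placed, so that a workflow without an owner or without handoff rules never reaches the phone line.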