Best Practices
This page collects practical recommendations for building effective voice AI agents on SpeakNode. These tips are drawn from production experience and apply to most voice agent use cases.
Voice Settings
Stability and Speed
The TTS voice stability and speed settings have a significant impact on how natural your agent sounds.
- Speed: Keep between 0.9x and 1.1x for natural conversation. Going faster can sound rushed; going slower can feel sluggish.
- Stability: A value between 0.5 and 0.75 provides a good balance of expressiveness and consistency. Lower stability sounds more emotive but less predictable.
Tip
Test your voice settings by running a few conversations and listening back. Small adjustments make a big difference in perceived quality.
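As a rough illustration, the recommended ranges above can be encoded as a small pre-deployment check. This is a sketch only: `VoiceSettings` and `check_voice_settings` are hypothetical names, not part of any SpeakNode API, and the code assumes stability is expressed on a 0 to 1 scale.

```python
from dataclasses import dataclass

# Ranges taken from the recommendations above.
SPEED_RANGE = (0.9, 1.1)
STABILITY_RANGE = (0.5, 0.75)


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the two settings discussed above."""
    speed: float      # playback speed multiplier, 1.0 = normal
    stability: float  # assumed to be on a 0.0 to 1.0 scale


def check_voice_settings(settings: VoiceSettings) -> list[str]:
    """Return one warning per setting that falls outside the recommended range."""
    warnings = []
    if not SPEED_RANGE[0] <= settings.speed <= SPEED_RANGE[1]:
        warnings.append(
            f"speed {settings.speed} is outside {SPEED_RANGE[0]}x to {SPEED_RANGE[1]}x"
        )
    if not STABILITY_RANGE[0] <= settings.stability <= STABILITY_RANGE[1]:
        warnings.append(
            f"stability {settings.stability} is outside "
            f"{STABILITY_RANGE[0]} to {STABILITY_RANGE[1]}"
        )
    return warnings


if __name__ == "__main__":
    for warning in check_voice_settings(VoiceSettings(speed=1.3, stability=0.6)):
        print("warning:", warning)
```

A check like this only flags values outside the suggested ranges. Listening to real conversations is still the way to judge whether a voice sounds natural.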
Background Sounds
Adding a subtle background sound (e.g., office ambience) can make the agent sound more realistic and reduce the "uncanny valley" effect of pure silence between turns.
Use background sounds sparingly: they should be barely noticeable, not distracting.
Conversation Design
Turn Timeout
The turn timeout controls how long the agent waits after the caller stops speaking before it responds. The right value depends on your use case:
| Use Case | Recommended Timeout | Rationale |
|---|---|---|
| Customer support | 5 - 10 seconds | Callers may pause to think or look up information. |
| Data collection | 10 - 15 seconds | Callers need time to find account numbers, addresses, etc. |
| Quick Q&A | 3 - 5 seconds | Fast-paced interactions benefit from shorter timeouts. |
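One way to keep these recommendations close to your configuration is a small lookup table. The sketch below uses hypothetical use-case keys and helper names. Starting from the midpoint of each range is an assumption made for illustration, not a SpeakNode default.

```python
# Recommended ranges in seconds, copied from the table above.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
RECOMMENDED_TIMEOUTS = {
    "customer_support": (5.0, 10.0),
    "data_collection": (10.0, 15.0),
    "quick_qa": (3.0, 5.0),
}


def starting_timeout(use_case: str) -> float:
    """Return the midpoint of the recommended range as a first value to try."""
    low, high = RECOMMENDED_TIMEOUTS[use_case]
    return (low + high) / 2


def within_recommendation(use_case: str, seconds: float) -> bool:
    """Check whether a chosen timeout falls inside the recommended range."""
    low, high = RECOMMENDED_TIMEOUTS[use_case]
    return low <= seconds <= high


if __name__ == "__main__":
    print(starting_timeout("data_collection"))
    print(within_recommendation("quick_qa", 8.0))
```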
Interruption Handling
In voice conversations, callers will sometimes speak while the agent is still talking. This is natural behavior. Configure your agent to handle interruptions gracefully:
- Allow the agent to be interrupted during informational responses.
- Use shorter sentences so the agent yields naturally.
- In your system prompt, instruct the agent to acknowledge interruptions (e.g., "Sorry, go ahead.").
Silence Handling
Extended silence can confuse callers. Add guidance in your system prompt:
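The following is an illustrative sketch of what such guidance might look like when it is kept as a reusable prompt section. The wording of `SILENCE_GUIDANCE` and the `build_system_prompt` helper are assumptions made for this example, not official SpeakNode text. Adapt the phrasing and the number of check-ins to your use case.

```python
# Illustrative wording only; tune it for your own agent.
SILENCE_GUIDANCE = """\
If the caller is silent after you finish speaking:
- First silence: check in briefly, for example "Are you still there?"
- Second silence: offer to repeat or rephrase your last question.
- Third silence: say goodbye politely and end the call.
"""


def build_system_prompt(base_prompt: str, sections: list[str]) -> str:
    """Join a base prompt and extra guidance sections with blank lines between them."""
    parts = [base_prompt.strip()] + [section.strip() for section in sections]
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt("You are a support agent.", [SILENCE_GUIDANCE]))
```

Keeping the guidance as its own section makes it easy to reuse across agents and to adjust without touching the rest of the prompt.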
Model Selection
Choosing the right LLM affects response quality, latency, and cost.
| Model Type | Latency | Best For | Trade-offs |
|---|---|---|---|
| GPT-4o | Medium | General-purpose support, complex reasoning | Higher cost, slightly higher latency |
| GPT-4o Mini | Low | High-volume, straightforward tasks | Less capable with nuanced reasoning |
| Flash models | Very Low | Low-latency requirements, simple interactions | May struggle with multi-step reasoning |
| Claude (Sonnet/Haiku) | Medium/Low | Complex reasoning, detailed instructions, long prompts | Availability depends on configuration |
Tip
Start with a general-purpose model like GPT-4o during development. Once your agent is stable, test with faster models to see if quality remains acceptable at lower latency and cost.
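When comparing models, it helps to measure latency the same way for each one. The helper below is a minimal sketch: `respond` is a hypothetical callable that wraps whichever model you are testing, and nothing here calls a real SpeakNode or model-provider API. The clock is injectable so the function can be exercised without real delays.

```python
import statistics
import time
from typing import Callable, Iterable


def median_latency(
    respond: Callable[[str], str],
    prompts: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the median wall-clock time in seconds that `respond` takes per prompt.

    Raises statistics.StatisticsError if `prompts` is empty.
    """
    durations = []
    for prompt in prompts:
        start = clock()
        respond(prompt)  # the reply itself is ignored; only timing is measured
        durations.append(clock() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Stand-in for a real model call.
        time.sleep(0.05)
        return "ok"

    print(f"median latency: {median_latency(fake_model, ['hi', 'hello', 'hey']):.3f}s")
```

Latency is only half of the comparison. Review the replies from the faster model as well, since the tip above is about keeping quality acceptable, not only about reducing response time.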
Prompt Engineering for Voice
Voice agents require different prompting strategies than text-based chatbots. See the dedicated Prompting Guide for full details. Key highlights:
Multi-Agent Patterns
For complex workflows, use multiple agents connected by transfer rules.
Orchestrator + Specialists
A common pattern is to use one "front door" agent that routes callers to specialized agents:
- Orchestrator Agent: Greets the caller, identifies their intent, and transfers to the right specialist.
- Billing Agent: Handles billing inquiries, payment issues, and plan changes.
- Technical Support Agent: Handles product questions, troubleshooting, and bug reports.
- Scheduling Agent: Handles appointment booking and rescheduling.
Configure transfer rules on the orchestrator agent with clear conditions:
- "Transfer to the billing agent when the caller mentions invoices, payments, charges, or subscription changes."
- "Transfer to technical support when the caller describes a problem, error, or needs help using the product."
Tip
Keep each specialist agent's system prompt focused on its domain. This improves response quality compared to a single agent that tries to handle everything.
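To sanity-check routing expectations before configuring them, the transfer rules above can be modeled as plain data. The sketch below is an assumption-heavy stand-in: the `TransferRule` type, the agent identifiers, and the keyword matching are all hypothetical. The real rules are written as natural-language conditions, and this code does not reflect how SpeakNode evaluates them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferRule:
    target: str                # hypothetical agent identifier
    condition: str             # natural-language condition, as configured on the orchestrator
    keywords: tuple[str, ...]  # crude keyword stand-in used only by this offline check


RULES = [
    TransferRule(
        target="billing",
        condition="Transfer to the billing agent when the caller mentions "
                  "invoices, payments, charges, or subscription changes.",
        keywords=("invoice", "payment", "charge", "subscription"),
    ),
    TransferRule(
        target="technical_support",
        condition="Transfer to technical support when the caller describes a "
                  "problem, error, or needs help using the product.",
        keywords=("problem", "error", "help using"),
    ),
]


def route(utterance: str, rules: list[TransferRule] = RULES) -> Optional[str]:
    """Return the first matching target, or None to stay with the orchestrator."""
    text = utterance.lower()
    for rule in rules:
        if any(keyword in text for keyword in rule.keywords):
            return rule.target
    return None


if __name__ == "__main__":
    print(route("I was charged twice on my last invoice"))
    print(route("What are your opening hours?"))
```

Writing out a handful of sample utterances with their expected targets also gives you a ready-made script for the transfer checks in the testing checklist.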
Pre-Tool Speech
Always configure pre-tool speech for tools that involve network calls. Silence during a phone call feels much longer than it actually is.
Without pre-tool speech: The agent goes silent for 3 seconds while calling the API.
With pre-tool speech: The agent says, "Let me check that for you, one moment." The API call runs while the caller waits, knowing what is happening.
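Conceptually, pre-tool speech means the filler line and the network call overlap instead of running one after the other. The sketch below shows that idea with `asyncio`. The `speak` and `call_tool` callables are hypothetical placeholders, and on SpeakNode this behavior is configured on the tool instead of being written by hand.

```python
import asyncio
from typing import Any, Awaitable, Callable

FILLER = "Let me check that for you, one moment."


async def run_tool_with_pre_speech(
    speak: Callable[[str], Awaitable[None]],
    call_tool: Callable[[], Awaitable[Any]],
    filler: str = FILLER,
) -> Any:
    """Speak a filler line while the tool call runs, then return the tool result."""
    # Schedule the network call first so it runs while the filler is spoken.
    tool_task = asyncio.create_task(call_tool())
    await speak(filler)
    return await tool_task


async def _demo() -> None:
    async def speak(text: str) -> None:
        print(f"Agent says: {text}")
        await asyncio.sleep(0.1)  # stands in for TTS playback time

    async def call_tool() -> dict:
        await asyncio.sleep(0.3)  # stands in for a slow API call
        return {"order_status": "shipped"}

    print(await run_tool_with_pre_speech(speak, call_tool))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Because the task is created before the filler is spoken, the caller hears an acknowledgement during the wait and the total delay is no longer than the slower of the two operations.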
Testing Checklist
Before deploying your agent to production, verify:
- [ ] All configured languages work correctly with appropriate first messages.
- [ ] The agent handles interruptions without breaking flow.
- [ ] Tools trigger at the right moments and return expected results.
- [ ] Transfer rules route to the correct agents.
- [ ] The agent gracefully handles tool errors (e.g., API timeout).
- [ ] Silence and edge cases are handled (caller goes quiet, background noise).
- [ ] The agent stays within its guardrails (does not make up information, does not go off-topic).