Best Practices

This page collects practical recommendations for building effective voice AI agents on SpeakNode. These tips are drawn from production experience and apply to most voice agent use cases.

Voice Settings

Stability and Speed

The TTS voice stability and speed settings have a significant impact on how natural your agent sounds.

  • Speed: Keep between 0.9x and 1.1x for natural conversation. Going faster can sound rushed; going slower can feel sluggish.
  • Stability: A value between 0.5 and 0.75 provides a good balance of expressiveness and consistency. Lower stability sounds more emotive but less predictable.
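
The recommended ranges above can be checked before a configuration is deployed. This is a minimal sketch, assuming a hypothetical `VoiceSettings` structure; the class and field names are illustrative and are not taken from the SpeakNode API.

```python
from dataclasses import dataclass

# Recommended ranges from this guide. The names are illustrative only.
SPEED_RANGE = (0.9, 1.1)
STABILITY_RANGE = (0.5, 0.75)


@dataclass
class VoiceSettings:
    """Hypothetical container for the two TTS settings discussed above."""
    speed: float
    stability: float


def check_voice_settings(settings: VoiceSettings) -> list:
    """Return one warning string per value outside its recommended range."""
    warnings = []
    if not SPEED_RANGE[0] <= settings.speed <= SPEED_RANGE[1]:
        warnings.append(f"speed {settings.speed} is outside {SPEED_RANGE}")
    if not STABILITY_RANGE[0] <= settings.stability <= STABILITY_RANGE[1]:
        warnings.append(f"stability {settings.stability} is outside {STABILITY_RANGE}")
    return warnings
```

A value outside the range is not necessarily wrong, so the sketch returns warnings rather than raising an error.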

Tip

Test your voice settings by running a few conversations and listening back. Small adjustments make a big difference in perceived quality.

Background Sounds

Adding a subtle background sound (e.g., office ambience) can make the agent sound more realistic and reduce the "uncanny valley" effect of pure silence between turns.

Use background sounds sparingly: they should be barely noticeable, not distracting.

Conversation Design

Turn Timeout

The turn timeout controls how long the agent waits after the caller stops speaking before it responds. The right value depends on your use case:

| Use Case | Recommended Timeout | Rationale |
|---|---|---|
| Customer support | 5-10 seconds | Callers may pause to think or look up information. |
| Data collection | 10-15 seconds | Callers need time to find account numbers, addresses, etc. |
| Quick Q&A | 3-5 seconds | Fast-paced interactions benefit from shorter timeouts. |
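
One way to keep these recommendations next to your configuration is a small lookup that yields a starting value per use case. This is a sketch only: the use-case keys and the choice of the range midpoint as a default are assumptions, not SpeakNode behavior.

```python
# Recommended turn-timeout ranges in seconds, copied from the table above.
# The dictionary keys are hypothetical labels chosen for this example.
TURN_TIMEOUT_SECONDS = {
    "customer_support": (5, 10),
    "data_collection": (10, 15),
    "quick_qa": (3, 5),
}


def default_turn_timeout(use_case: str) -> float:
    """Return the midpoint of the recommended range as a starting value."""
    low, high = TURN_TIMEOUT_SECONDS[use_case]
    return (low + high) / 2
```

Treat the result as a starting point and tune it by listening to real calls.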

Interruption Handling

In voice conversations, callers will sometimes speak while the agent is still talking. This is natural behavior. Configure your agent to handle interruptions gracefully:

  • Allow the agent to be interrupted during informational responses.
  • Use shorter sentences so the agent yields naturally.
  • In your system prompt, instruct the agent to acknowledge interruptions (e.g., "Sorry, go ahead.").

Silence Handling

Extended silence can confuse callers. Add guidance in your system prompt:

Wait for the user to respond.
If the user is silent for more than 10 seconds, gently prompt them:
"Are you still there? Take your time."
If they remain silent after two prompts, say:
"It seems like you may have stepped away. I'll end the call for now.
Feel free to call back anytime."
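
The escalation in the prompt above (wait, prompt twice, then end the call) can also be written as a small decision function, which is useful when scripting test scenarios. This is a sketch under the assumption that your test harness can measure how long the caller has been silent. The function and constant names are hypothetical.

```python
from typing import Optional, Tuple

SILENCE_THRESHOLD_SECONDS = 10  # from the prompt guidance above
MAX_PROMPTS = 2                 # prompts allowed before ending the call

REPROMPT = "Are you still there? Take your time."
GOODBYE = (
    "It seems like you may have stepped away. I'll end the call for now. "
    "Feel free to call back anytime."
)


def next_silence_action(silent_seconds: float, prompts_given: int) -> Tuple[str, Optional[str]]:
    """Return (action, utterance) for the current silence state.

    silent_seconds is the time since anyone last spoke; prompts_given is the
    number of "still there?" prompts already spoken in this silent stretch.
    """
    if silent_seconds <= SILENCE_THRESHOLD_SECONDS:
        return ("wait", None)
    if prompts_given < MAX_PROMPTS:
        return ("prompt", REPROMPT)
    return ("end_call", GOODBYE)
```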

Model Selection

Your choice of LLM affects response quality, latency, and cost.

| Model Type | Latency | Best For | Trade-offs |
|---|---|---|---|
| GPT-4o | Medium | General-purpose support, complex reasoning | Higher cost, slightly higher latency |
| GPT-4o Mini | Low | High-volume, straightforward tasks | Less capable with nuanced reasoning |
| Flash models | Very Low | Low-latency requirements, simple interactions | May struggle with multi-step reasoning |
| Claude (Sonnet/Haiku) | Medium/Low | Complex reasoning, detailed instructions, long prompts | Availability depends on configuration |

Tip

Start with a general-purpose model like GPT-4o during development. Once your agent is stable, test with faster models to see if quality remains acceptable at lower latency and cost.

Prompt Engineering for Voice

Voice agents require different prompting strategies than text-based chatbots. See the dedicated Prompting Guide for full details. Key highlights:

Less effective (written for a text chatbot):

You are a helpful assistant. Answer the user's questions thoroughly
and provide detailed explanations with examples.

More effective (written for voice):

You are a friendly phone support agent. Keep responses to 2-3
sentences. Speak naturally and conversationally. If a topic needs
a longer explanation, break it into steps and confirm understanding
after each step.

Multi-Agent Patterns

For complex workflows, use multiple agents connected by transfer rules.

Orchestrator + Specialists

A common pattern is to use one "front door" agent that routes callers to specialized agents:

  1. Orchestrator Agent: Greets the caller, identifies their intent, and transfers to the right specialist.
  2. Billing Agent: Handles billing inquiries, payment issues, and plan changes.
  3. Technical Support Agent: Handles product questions, troubleshooting, and bug reports.
  4. Scheduling Agent: Handles appointment booking and rescheduling.

Configure transfer rules on the orchestrator agent with clear conditions:

  • "Transfer to the billing agent when the caller mentions invoices, payments, charges, or subscription changes."
  • "Transfer to technical support when the caller describes a problem, error, or needs help using the product."
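
When testing routing, it can help to write down which specialist you expect for a given utterance. The sketch below uses plain keyword matching as a stand-in for the model's intent detection. It is not how the orchestrator evaluates transfer rules, and the agent names and keyword lists are assumptions derived from the two example rules above.

```python
from typing import Optional

# Keywords derived from the example transfer rules above.
# Keyword matching is only a rough proxy for building test expectations.
TRANSFER_KEYWORDS = {
    "billing": ("invoice", "payment", "charge", "subscription"),
    "technical_support": ("problem", "error", "help using"),
}


def expected_transfer(utterance: str) -> Optional[str]:
    """Return the specialist a test case expects, or None to stay with the orchestrator."""
    text = utterance.lower()
    for agent, keywords in TRANSFER_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return None
```

Comparing these expectations against the transfers observed in test calls is one way to spot rules whose conditions are too broad or too narrow.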

Tip

Keep each specialist agent's system prompt focused on its domain. This improves response quality compared to a single agent that tries to handle everything.

Pre-Tool Speech

Always configure pre-tool speech for tools that involve network calls. Silence during a phone call feels much longer than it actually is.

Without pre-tool speech: The agent goes silent for 3 seconds while calling the API.

With pre-tool speech: The agent says, "Let me check that for you, one moment." The API call runs while the caller waits, knowing what is happening.
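
The idea can be sketched with `asyncio`: start the network call, speak the filler line while the request is in flight, then deliver the result. Everything here is a stand-in. `speak`, `lookup_order`, and the order data are hypothetical, and on SpeakNode itself pre-tool speech is configured on the tool rather than coded by hand.

```python
import asyncio


async def speak(text: str, transcript: list) -> None:
    """Stand-in for TTS playback: records what the agent said."""
    transcript.append(text)
    await asyncio.sleep(0)  # yield control, as real audio playback would


async def lookup_order(order_id: str) -> dict:
    """Stand-in for a slow network call (hypothetical order API)."""
    await asyncio.sleep(0.1)
    return {"order_id": order_id, "status": "shipped"}


async def call_tool_with_pre_speech(order_id: str, transcript: list) -> dict:
    # Start the tool call first so the request is in flight while the agent speaks.
    tool_task = asyncio.create_task(lookup_order(order_id))
    await speak("Let me check that for you, one moment.", transcript)
    result = await tool_task
    await speak(f"Your order status is {result['status']}.", transcript)
    return result
```

Because the filler line and the request overlap, the caller hears speech during most of the wait instead of silence.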

Testing Checklist

Before deploying your agent to production, verify:

  • [ ] All configured languages work correctly with appropriate first messages.
  • [ ] The agent handles interruptions without breaking flow.
  • [ ] Tools trigger at the right moments and return expected results.
  • [ ] Transfer rules route to the correct agents.
  • [ ] The agent gracefully handles tool errors (e.g., API timeout).
  • [ ] Silence and edge cases are handled (caller goes quiet, background noise).
  • [ ] The agent stays within its guardrails (does not make up information, does not go off-topic).
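
For the tool-error item in particular, it helps to simulate a slow tool and confirm that the caller hears a fallback line rather than silence. This is a minimal sketch using `asyncio.wait_for`; the tool, the timeout value, and the fallback wording are all assumptions made for illustration.

```python
import asyncio

FALLBACK = "I'm having trouble looking that up right now. Let me try another way."


async def slow_tool() -> str:
    """Stand-in for a tool whose API takes too long to answer."""
    await asyncio.sleep(1.0)
    return "result"


async def fast_tool() -> str:
    """Stand-in for a tool that answers promptly."""
    return "result"


async def run_tool_with_fallback(tool, timeout_seconds: float) -> str:
    """Run a tool coroutine, returning a spoken fallback if it times out."""
    try:
        return await asyncio.wait_for(tool(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return FALLBACK
```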