Voice & Communication

SpeakNode uses WebRTC for real-time voice communication between users and AI agents.

How It Works

  1. A LiveKit room is created when an agent is dispatched
  2. The AI agent (Python worker) joins the room and starts listening
  3. The user connects via browser (widget or client) or phone
  4. Audio streams in both directions in real time
  5. The session is recorded automatically (audio egress)
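The five steps above can be sketched as a small concurrent flow. This is an illustrative stub only: `dispatch_session` and `join` are hypothetical names, not the SpeakNode or LiveKit API, and the real connect handshakes are replaced with no-ops.

```python
import asyncio
import uuid

async def join(role: str, room: str) -> str:
    """Stand-in for a participant's real connect handshake."""
    await asyncio.sleep(0)
    return f"{role}@{room}"

async def dispatch_session() -> dict:
    room = f"room-{uuid.uuid4().hex[:8]}"             # 1. create the LiveKit room
    agent = asyncio.create_task(join("agent", room))  # 2. agent worker joins and listens
    user = asyncio.create_task(join("user", room))    # 3. user connects (browser or phone)
    await asyncio.gather(agent, user)                 # 4. audio now flows both ways
    return {"room": room, "recording": True}          # 5. egress recording is on
```

In the real system the agent and user join independently over WebRTC; the `gather` here just models that both must be present before the conversation is live.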

Voice Pipeline

User speaks → WebRTC audio → STT (speech-to-text) → 
LLM processes text → generates response → 
TTS (text-to-speech) → WebRTC audio → User hears
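One turn of the pipeline above can be expressed as a function composition. The stub providers here are placeholders for the real STT/LLM/TTS integrations:

```python
from typing import Callable

def voice_pipeline(audio_in: bytes,
                   stt: Callable[[bytes], str],
                   llm: Callable[[str], str],
                   tts: Callable[[str], bytes]) -> bytes:
    """One conversational turn: user audio in, agent audio out."""
    text = stt(audio_in)    # speech-to-text
    reply = llm(text)       # LLM generates a response
    return tts(reply)       # text-to-speech back to the user

# Stub providers for illustration only:
stub_stt = lambda audio: "hello"
stub_llm = lambda text: f"You said: {text}"
stub_tts = lambda text: text.encode("utf-8")
```

Swapping a provider (say, a different TTS vendor) means substituting one callable without touching the rest of the pipeline.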

Speech-to-Text (STT)

Converts user speech to text for the LLM. Supported providers:

  • Azure Speech Services
  • Additional providers, enabled via configuration
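Streaming STT services, including Azure Speech, typically emit interim hypotheses that are later replaced by a final transcript; only the final text should be handed to the LLM. A minimal sketch of that aggregation (the class and method names are illustrative, not a real SDK):

```python
class TranscriptAggregator:
    """Collects interim STT results and commits text only on final results."""

    def __init__(self):
        self.committed: list[str] = []
        self.interim: str = ""

    def on_result(self, text: str, is_final: bool) -> None:
        if is_final:
            self.committed.append(text)  # final hypothesis: commit it
            self.interim = ""
        else:
            self.interim = text          # interim hypothesis: may still change

    def transcript(self) -> str:
        return " ".join(self.committed)
```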

Large Language Model (LLM)

Processes the conversation and generates responses. Supported providers:

  • OpenAI (GPT-4, etc.)
  • OpenRouter (access to multiple models)
  • Cloudflare AI
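All three providers accept the OpenAI-style chat message format (a list of role/content dicts), which is why they are interchangeable here. A sketch of assembling one request payload, assuming that format:

```python
def build_messages(system_prompt: str,
                   history: list[tuple[str, str]],
                   user_text: str) -> list[dict]:
    """Assemble a chat payload: system prompt, prior turns, new user input."""
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in history:
        messages.append({"role": role, "content": content})
    messages.append({"role": "user", "content": user_text})
    return messages
```

The resulting list would be passed to whichever chat-completions endpoint is configured (OpenAI, OpenRouter, or Cloudflare AI).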

Text-to-Speech (TTS)

Converts agent responses to speech. Supported providers:

  • ElevenLabs
  • Azure Speech Services
  • Minimax
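A common latency optimization for voice agents is to split the LLM response into sentences so TTS can start synthesizing the first one while the rest is still being generated. A simple sketch of that chunking (the splitting rule is illustrative; production systems handle abbreviations and numbers more carefully):

```python
import re

def sentence_chunks(text: str) -> list[str]:
    """Split a response at sentence-ending punctuation for incremental TTS."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```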

Voice Activity Detection (VAD)

Detects when the user starts and stops speaking. Controls turn-taking behavior — when the agent should start or stop talking.
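The idea can be illustrated with a toy energy-based detector plus a "hangover" counter, so short pauses do not cut the user off mid-sentence. Real VADs (typically model-based) are far more robust; this sketch only shows the turn-taking logic:

```python
import math

def is_speech(frame: list[float], threshold: float = 0.01) -> bool:
    """A frame counts as speech when its RMS energy exceeds a threshold."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

class TurnDetector:
    """Ends the user's turn after `hangover` consecutive silent frames."""

    def __init__(self, hangover: int = 15):
        self.hangover = hangover
        self.silent = 0
        self.speaking = False

    def feed(self, speech: bool) -> bool:
        """Feed one frame's VAD result; returns True when the turn just ended."""
        if speech:
            self.speaking = True
            self.silent = 0
            return False
        if self.speaking:
            self.silent += 1
            if self.silent >= self.hangover:
                self.speaking = False
                self.silent = 0
                return True
        return False
```

When `feed` returns True, the agent knows it may start talking.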

Audio Recording

Every conversation is automatically recorded via LiveKit Egress. Recordings include:

  • A composite recording of the full session audio
  • Individual per-participant audio tracks

Recordings are stored in S3-compatible object storage.

Real-Time Notifications

The platform uses SignalR to notify the frontend about session events:

  • Agent Ready — agent has joined the room
  • Session Completed — conversation ended, audio available
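On the frontend side these amount to named events with payloads. A tiny publish/subscribe sketch mirroring the two events above (the event names match the list; the class itself is illustrative, not the SignalR client):

```python
from collections import defaultdict
from typing import Callable

class SessionEvents:
    """Minimal event bus: register handlers per event name, then emit payloads."""

    def __init__(self):
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)
```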

Session Lifecycle

  • Pending — Session created, room not yet ready
  • Dispatching — LiveKit room created, waiting for the agent to join
  • Active — Agent joined, conversation in progress
  • Completed — Conversation ended normally
  • Failed — An error occurred during the session
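The lifecycle can be modeled as a small state machine. The exact transition set below is an assumption inferred from the status descriptions, not a documented contract:

```python
# Assumed legal transitions between session statuses.
TRANSITIONS = {
    "Pending":     {"Dispatching", "Failed"},
    "Dispatching": {"Active", "Failed"},
    "Active":      {"Completed", "Failed"},
    "Completed":   set(),   # terminal
    "Failed":      set(),   # terminal
}

def advance(current: str, target: str) -> str:
    """Move a session to `target`, rejecting illegal transitions."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```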