I recently built a voice agent that runs flawlessly on my local machine but fails in production. This post covers the mistakes I made and the lessons I learned when deploying voice agents.
Observations
Local tests pass using a WebSocket runner (localhost:8000) with a basic FastAPI setup. The agent can also connect to a React Native client running on the local network.
After deployment, the voice agent refuses to establish a connection. The client console shows cross-site scripting errors, while the server returns a 200 status code. Both the client and server deployments complete without any errors.
I later switched to a WebRTC implementation, but that didn't work either.
Root Cause
Looking under the hood, the original implementation used WebSockets. WebSockets are heavyweight connections designed for reliable delivery of general-purpose data, which introduces additional risks for real-time audio in production:
WebSockets are built on TCP, so audio streams are subject to head-of-line blocking. One lost packet stalls the whole audio stream, ballooning latency.
The Opus audio codec can lower its bitrate the moment the network reports congestion, assuming it receives timely feedback, but a WebSocket connection provides no such signal. When bandwidth drops, packets simply pile up in the socket queue, inflating end-to-end delay until you hear your own words echoed back seconds later.
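The two effects above can be put into rough numbers. This is a toy model, not a network simulator, and every figure in it is an illustrative assumption: 20 ms audio frames, a fixed 50 ms one-way delay, a single lost frame retransmitted after 300 ms, and an encoder stuck at 32 kbps while the link drops to 16 kbps. `in_order_delivery` mimics TCP's rule that nothing reaches the application until every earlier byte has arrived, and `queue_delay_s` is the arithmetic for a send queue that keeps growing because nothing lowers the bitrate.

```python
"""Back-of-the-envelope models of why TCP-based transport hurts live audio.

Toy assumptions only: fixed frame size, fixed one-way delay, one retransmission,
and a constant-bitrate encoder that never adapts.
"""

FRAME_MS = 20  # one audio frame every 20 ms (assumed)


def arrival_times(num_frames, one_way_ms, lost_frame, retransmit_ms):
    """Time (ms) each frame reaches the receiver; the lost frame arrives late via retransmission."""
    return [
        i * FRAME_MS + one_way_ms + (retransmit_ms if i == lost_frame else 0)
        for i in range(num_frames)
    ]


def in_order_delivery(arrivals):
    """TCP-like: a frame is released to the app only after every earlier frame."""
    released, latest = [], 0
    for t in arrivals:
        latest = max(latest, t)
        released.append(latest)
    return released


def queue_delay_s(send_kbps, link_kbps, seconds):
    """Extra delay (s) a FIFO send queue adds after `seconds` of congestion.

    A non-adapting sender pushes send_kbps into a link that now carries only
    link_kbps, so the backlog grows by (send_kbps - link_kbps) kbit per second,
    and a new frame waits for that whole backlog to drain at link_kbps.
    """
    if send_kbps <= link_kbps:
        return 0.0
    return (send_kbps - link_kbps) * seconds / link_kbps


if __name__ == "__main__":
    arrivals = arrival_times(10, one_way_ms=50, lost_frame=2, retransmit_ms=300)
    released = in_order_delivery(arrivals)
    # Frame 3 is sent at 60 ms and arrives at 110 ms, but in-order delivery holds it.
    print("frame 3 latency, in order:", released[3] - 3 * FRAME_MS, "ms")
    print("frame 3 latency, unordered:", arrivals[3] - 3 * FRAME_MS, "ms")
    for t in (1, 5, 10):
        print(f"after {t} s at 32 kbps into 16 kbps: +{queue_delay_s(32, 16, t):.1f} s delay")
```

With these numbers, frame 3 is held until the retransmitted frame 2 lands at 390 ms, so its latency jumps from 50 ms to 330 ms, and every later frame waits too. The queue grows by 16 kbit each second and drains at 16 kbps, so each second of congestion adds about one more second of delay. WebRTC carries media over RTP on UDP, so a late frame can be skipped (Opus can conceal the gap) instead of holding up the frames behind it.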
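Next, I tried a lightweight WebRTC transport intended for testing and prototyping, but in a production environment NAT and firewall restrictions can block peer-to-peer connections. Without properly configured STUN or TURN servers, WebRTC connections fail to establish. Manually adding a public STUN server (such as stun:stun.l.google.com:19302) will also fail when the network blocks direct connections, because STUN only helps a peer discover its public address; a TURN server is needed to relay the media.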
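A minimal sketch of serving ICE configuration that includes a TURN relay alongside STUN. The dictionary shape (`urls`, `username`, `credential`) follows the `iceServers` entries a browser's `RTCPeerConnection` accepts; the environment variable names `TURN_URL`, `TURN_USERNAME` and `TURN_CREDENTIAL` and the `require_turn` switch are hypothetical, so adapt them to whatever your TURN provider issues.

```python
import json
import os


def build_ice_servers(env=None, require_turn=False):
    """Build the `iceServers` list for an RTCPeerConnection configuration.

    STUN alone only reveals a peer's public address; behind symmetric NATs or
    strict firewalls the media must be relayed through TURN. The TURN_* variable
    names are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    servers = [{"urls": "stun:stun.l.google.com:19302"}]
    turn_url = env.get("TURN_URL")  # e.g. "turn:turn.example.com:3478" (placeholder)
    if turn_url:
        servers.append({
            "urls": turn_url,
            "username": env["TURN_USERNAME"],
            "credential": env["TURN_CREDENTIAL"],  # keep this out of source control
        })
    elif require_turn:
        raise RuntimeError("TURN_URL is not set; clients behind strict NATs will fail to connect")
    return servers


if __name__ == "__main__":
    # Serve this JSON to the client, which passes it to its RTCPeerConnection constructor.
    print(json.dumps({"iceServers": build_ice_servers()}, indent=2))
```

Setting `require_turn=True` in production turns a silent connection failure on the client into a loud startup error on the server.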
Preventative Measures
WebSockets are quick and easy for local testing, but do NOT use them for client-to-server connections; prefer WebRTC. The exception is server-to-server connections.
Tip: Set up a local runner with WebSockets.
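The rule of thumb above can live in one small function so the same code runs a WebSocket runner locally and WebRTC in production. This is a sketch: the `APP_ENV` variable and the `"local"` environment name are hypothetical conventions, not part of any framework.

```python
import os
from enum import Enum


class Transport(str, Enum):
    WEBSOCKET = "websocket"
    WEBRTC = "webrtc"


def pick_transport(app_env, server_to_server=False):
    """WebSockets for local runs and server-to-server links;
    WebRTC for client-to-server audio everywhere else."""
    if app_env == "local" or server_to_server:
        return Transport.WEBSOCKET
    return Transport.WEBRTC


if __name__ == "__main__":
    # APP_ENV is a hypothetical variable; default to the local WebSocket runner.
    print(pick_transport(os.environ.get("APP_ENV", "local")).value)
```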
Echo cancellation, which is critical for any app that uses a speaker and a microphone simultaneously, is built into most WebRTC implementations.
Set up the deployment pipeline in the following order to avoid issues: build -> tag -> push -> secret set -> deploy -> keys create -> keys use.
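One way to enforce that order is to script it so a failed step stops everything after it; presumably the image has to be pushed before a deploy can reference it, and secrets should exist before the agent first starts. In this sketch only the `docker build`, `docker tag` and `docker push` commands are real; the image name and every `your-provider-cli` command are hypothetical placeholders for your host's CLI, and the script dry-runs by default.

```python
import subprocess

# Placeholder image reference; replace with your private registry path.
IMAGE = "registry.example.com/my-org/voice-agent:0.1.0"

# Steps in the required order. Only the docker commands are real; the
# "your-provider-cli" commands are HYPOTHETICAL placeholders.
STEPS = [
    ("build", ["docker", "build", "-t", "voice-agent:local", "."]),
    ("tag", ["docker", "tag", "voice-agent:local", IMAGE]),
    ("push", ["docker", "push", IMAGE]),
    ("secret set", ["your-provider-cli", "secrets", "set"]),
    ("deploy", ["your-provider-cli", "deploy", IMAGE]),
    ("keys create", ["your-provider-cli", "keys", "create"]),
    ("keys use", ["your-provider-cli", "keys", "use"]),
]


def run_pipeline(steps, run=subprocess.run):
    """Run each step in order and stop at the first failure."""
    completed = []
    for name, cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step '{name}' failed; later steps were skipped")
        completed.append(name)
    return completed


def dry_run(cmd):
    """Print a command instead of executing it."""
    print("would run:", " ".join(cmd))
    return subprocess.CompletedProcess(cmd, 0)


if __name__ == "__main__":
    # Dry run by default so the placeholder commands are never executed.
    run_pipeline(STEPS, run=dry_run)
```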
After a successful local run, do a sanity check to make sure new files, dependencies and other artifacts are included in the Dockerfile.
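Part of that sanity check can be automated. The sketch below uses a deliberately simplified parser: it reads single-line `COPY`/`ADD` instructions and reports project files that no source path covers. It ignores line continuations, JSON-array syntax and `.dockerignore`, and it does not check dependencies; the example file list is hypothetical, and in practice you might feed it the output of `git ls-files`.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def copied_sources(dockerfile_text):
    """Source paths from single-line COPY/ADD instructions (simplified parser).

    Skips `COPY --from=<stage>` (it copies from a build stage, not the local
    context) and drops other flags such as --chown.
    """
    sources = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() not in ("COPY", "ADD"):
            continue
        args = tokens[1:]
        if any(arg.startswith("--from=") for arg in args):
            continue
        args = [arg for arg in args if not arg.startswith("--")]
        sources.extend(args[:-1])  # the last argument is the destination
    return sources


def missing_from_image(dockerfile_text, required_paths):
    """Return project paths that no COPY/ADD source covers."""
    sources = [PurePosixPath(s) for s in copied_sources(dockerfile_text)]
    missing = []
    for raw in required_paths:
        path = PurePosixPath(raw)
        covered = any(
            path == src or src in path.parents or fnmatch(str(path), str(src))
            for src in sources
        )
        if not covered:
            missing.append(raw)
    return missing


if __name__ == "__main__":
    dockerfile = (
        "FROM python:3.12-slim\n"
        "WORKDIR /app\n"
        "COPY requirements.txt .\n"
        "RUN pip install -r requirements.txt\n"
        "COPY bot.py ./\n"
    )
    # Hypothetical project files; the prompt file was added after the Dockerfile was written.
    print(missing_from_image(dockerfile, ["requirements.txt", "bot.py", "prompts/system.txt"]))
```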
File-based data storage works locally but fails in production. Filesystem persistence doesn't work in a serverless environment: instances are ephemeral, so anything written to local disk disappears when an instance is recycled. Use a serverless database instead to stay lean.
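A sketch of keeping storage behind a small interface, so local runs can use files while a serverless deployment refuses to start without a database. `K_SERVICE` (Cloud Run) and `AWS_LAMBDA_FUNCTION_NAME` (AWS Lambda) are real variables those platforms set, used here only as examples; `DATABASE_URL`, `LOCAL_DATA_DIR` and `connect_database_store` are hypothetical names, and the database hook is left unimplemented.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional, Protocol

# Variables set by two well-known serverless platforms; add your provider's marker.
SERVERLESS_MARKERS = ("K_SERVICE", "AWS_LAMBDA_FUNCTION_NAME")


class SessionStore(Protocol):
    def save(self, session_id: str, data: dict) -> None: ...
    def load(self, session_id: str) -> Optional[dict]: ...


class FileStore:
    """Local development only: data lives on the instance's disk and is lost
    when a serverless instance is recycled."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, session_id: str, data: dict) -> None:
        (self.root / f"{session_id}.json").write_text(json.dumps(data))

    def load(self, session_id: str) -> Optional[dict]:
        path = self.root / f"{session_id}.json"
        return json.loads(path.read_text()) if path.exists() else None


def connect_database_store(url: str) -> SessionStore:
    """HYPOTHETICAL: wire this to your serverless database's client library."""
    raise NotImplementedError("connect to your hosted database here")


def make_store(env=None) -> SessionStore:
    env = os.environ if env is None else env
    if any(marker in env for marker in SERVERLESS_MARKERS):
        if "DATABASE_URL" not in env:
            raise RuntimeError("serverless environment detected: set DATABASE_URL instead of using local files")
        return connect_database_store(env["DATABASE_URL"])
    return FileStore(env.get("LOCAL_DATA_DIR", ".data"))


if __name__ == "__main__":
    store = make_store({"LOCAL_DATA_DIR": tempfile.mkdtemp()})
    store.save("demo", {"turns": 3})
    print(store.load("demo"))
```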
NEVER pull Docker images from Docker Hub without authentication. Unauthenticated pulls count against Docker Hub's very low quota for unauthenticated pulls, and too many pulls over a short time get rate limited. Keep your images private and use image pull secrets to avoid hitting rate limits.
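A CI check can flag image references that will be pulled from Docker Hub, so you can confirm a pull secret is configured for them or move them to a private registry. The sketch follows Docker's simplified normalization rule: the first path component is treated as a registry host only if it contains `.` or `:`, is `localhost`, or contains uppercase letters; otherwise the reference resolves to Docker Hub. The example references are hypothetical.

```python
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def registry_of(image_ref: str) -> str:
    """Registry host an image reference resolves to (simplified Docker rules)."""
    first, sep, _ = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost" or first != first.lower()):
        return first
    return "docker.io"  # bare names like "python:3.12-slim" default to Docker Hub


def uses_docker_hub(image_ref: str) -> bool:
    return registry_of(image_ref) in DOCKER_HUB_HOSTS


if __name__ == "__main__":
    for ref in ("python:3.12-slim", "myorg/voice-agent:1.0", "ghcr.io/myorg/voice-agent:1.0"):
        print(f"{ref}: {'Docker Hub' if uses_docker_hub(ref) else 'other registry'}")
```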
Maintain at least one warm agent instance to serve incoming requests, which avoids cold starts and allows rapid testing. Most providers default the minimum instance count to 0 unless you set it explicitly. Revert it to 0 when you are not actively testing.
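If you want the "warm while testing, zero otherwise" policy to be automatic, a scheduled job could compute the desired minimum and pass it to your provider's setting. This is a sketch: the 09:00 to 18:00 window is a hypothetical example, and applying the value is provider-specific, so that call is left out.

```python
from datetime import datetime, time

# Hypothetical testing window (local time); adjust to when you actually iterate.
ACTIVE_WINDOWS = [(time(9, 0), time(18, 0))]


def desired_min_instances(now: datetime, windows=ACTIVE_WINDOWS, warm: int = 1) -> int:
    """Keep `warm` instances during active windows; scale to zero otherwise."""
    current = now.time()
    for start, end in windows:
        if start <= current < end:
            return warm
    return 0


if __name__ == "__main__":
    # Feed this into your provider's minimum-instances setting (provider-specific).
    print(desired_min_instances(datetime.now()))
```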
On Windsurf, start in chat mode, then switch to write mode. It tends to propose a better, often bug-free solution on the second try, and it sometimes helps you see where you might be going wrong.
Keep an eye out for zombie agents. They stay idle yet still disrupt sessions that others try to join.
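A periodic reaper is one way to keep zombie agents in check. In this sketch, `AgentSession`, the 120-second idle timeout and the `stop` callback are all hypothetical; `stop` stands in for whatever your framework uses to end an agent and leave its room, and the timestamps are assumed to come from a monotonic clock.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentSession:
    session_id: str
    last_activity: float  # e.g. a time.monotonic() timestamp of the last audio or message


def reap_zombies(sessions: List[AgentSession], now: float,
                 stop: Callable[[str], None], idle_timeout_s: float = 120.0) -> List[AgentSession]:
    """Stop sessions idle longer than idle_timeout_s and return the ones still alive."""
    alive = []
    for session in sessions:
        if now - session.last_activity > idle_timeout_s:
            stop(session.session_id)  # hypothetical hook: end the agent and leave the room
        else:
            alive.append(session)
    return alive
```

Run it from a background task every few seconds; a session that has been silent far longer than any real conversation pause is a good candidate for a zombie.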