How OpenAI scales low-latency voice AI

5/4/2026

OpenAI WebRTC infrastructure for low-latency voice at scale. See how its relay-transceiver design cuts latency and improves voice quality.

OpenAI has described how it rebuilt part of its WebRTC infrastructure to support low-latency voice interactions at large scale. The system is used for ChatGPT voice, the Realtime API, and other real-time AI workflows where quick turn-taking and stable audio quality are important. The company said standard WebRTC helps browser and mobile clients connect reliably, but its previous setup did not fit well with Kubernetes and high-concurrency traffic. To address that, OpenAI moved to a split relay-plus-transceiver architecture, in which a lightweight relay forwards packets and a transceiver service owns the full WebRTC session state. According to the post, this approach reduces the public UDP footprint, simplifies scaling, and helps route traffic to nearby infrastructure around the world. OpenAI said the design improves first-packet routing and keeps latency, jitter, and packet loss low enough to make voice conversations feel more natural.
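The split between a thin relay and a stateful transceiver can be illustrated with a toy model. The sketch below is not OpenAI's implementation: the class names `Relay` and `Transceiver`, the `route_by_first_byte` policy, and the transceiver names are all hypothetical, and real packets would carry ICE, DTLS, and SRTP traffic rather than plain bytes. It only shows the general idea that the relay decides a route on the first packet from a client and then forwards blindly, while the transceiver is the only component holding per-session state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Addr = Tuple[str, int]  # (ip, port) of a client as seen by the relay


@dataclass
class Relay:
    """Lightweight forwarder: remembers only client address -> transceiver name."""
    pick_transceiver: Callable[[bytes], str]  # hypothetical first-packet routing policy
    routes: Dict[Addr, str] = field(default_factory=dict)

    def forward(self, client: Addr, packet: bytes) -> str:
        # First packet from this client: decide once which transceiver owns the session.
        if client not in self.routes:
            self.routes[client] = self.pick_transceiver(packet)
        # Later packets follow the stored route; the relay never inspects them.
        return self.routes[client]


@dataclass
class Transceiver:
    """Owns per-session state (here just a packet counter as a stand-in)."""
    name: str
    sessions: Dict[Addr, int] = field(default_factory=dict)

    def receive(self, client: Addr, packet: bytes) -> None:
        self.sessions[client] = self.sessions.get(client, 0) + 1


def route_by_first_byte(packet: bytes) -> str:
    # Hypothetical policy for illustration only: pick a backend from the first byte.
    if not packet or packet[0] % 2 == 0:
        return "transceiver-a"
    return "transceiver-b"


transceivers = {n: Transceiver(n) for n in ("transceiver-a", "transceiver-b")}
relay = Relay(pick_transceiver=route_by_first_byte)


def handle(client: Addr, packet: bytes) -> str:
    """Pass one packet through the relay to the transceiver that owns the session."""
    target = relay.forward(client, packet)
    transceivers[target].receive(client, packet)
    return target
```

In this model, calling `handle(("203.0.113.5", 40000), b"\x00first")` fixes the route for that client, and every later packet from the same address reaches the same transceiver regardless of its contents. Because the relay holds only a small address-to-name map, it is cheap to run many copies of it, while the heavier session logic can scale separately.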