Voice Agents

Realtime Agents

Voice Agents use OpenAI speech-to-speech models to provide realtime voice chat. These models support streaming audio, text, and tool calls and are great for applications like voice/phone customer support, mobile app experiences, and voice chat.

The Voice Agents SDK provides a TypeScript client for the OpenAI Realtime API.

Voice Agents Quickstart: Build your first realtime voice assistant using the OpenAI Agents SDK in minutes.

Key features

Connect over WebSocket or WebRTC
Works both in the browser and in backend connections
Audio and interruption handling
Multi-agent orchestration through handoffs
Tool definition and calling
Custom guardrails to monitor model output
Callbacks for streamed events
Reuse the same components for both text and voice agents

By using speech-to-speech models, we can leverage the model’s ability to process audio in realtime, without having to transcribe the speech to text first and then convert the model’s text response back to audio.
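To make the difference concrete, here is a minimal conceptual sketch that models both approaches as lists of stages and counts how many times the data changes modality. Everything in it (`Stage`, `chainedPipeline`, `speechToSpeechPipeline`, `countModalityConversions`) is a hypothetical name invented for illustration; none of it is part of the Agents SDK or the Realtime API, and it does not call any model.

```typescript
// Illustrative model only: these types and names are assumptions,
// not part of the Agents SDK or the Realtime API.
type Modality = 'audio' | 'text';

interface Stage {
  name: string;
  input: Modality;
  output: Modality;
}

// Chained approach: transcribe the audio, run a text model, synthesize speech.
const chainedPipeline: Stage[] = [
  { name: 'speech-to-text', input: 'audio', output: 'text' },
  { name: 'text model', input: 'text', output: 'text' },
  { name: 'text-to-speech', input: 'text', output: 'audio' },
];

// Speech-to-speech approach: a single model consumes and produces audio.
const speechToSpeechPipeline: Stage[] = [
  { name: 'speech-to-speech model', input: 'audio', output: 'audio' },
];

// Counts the stages whose output modality differs from their input modality.
function countModalityConversions(pipeline: Stage[]): number {
  return pipeline.filter((stage) => stage.input !== stage.output).length;
}

console.log('chained conversions:', countModalityConversions(chainedPipeline));
console.log(
  'speech-to-speech conversions:',
  countModalityConversions(speechToSpeechPipeline),
);
```

In this toy model the chained pipeline converts between audio and text twice, while the speech-to-speech pipeline never leaves the audio modality. Those two conversions are the transcription and re-synthesis steps that the speech-to-speech approach avoids.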

Speech-to-speech model