Voice Agents
Voice Agents use OpenAI speech-to-speech models to provide realtime voice chat. These models support streaming audio, text, and tool calls, and are well suited to applications such as voice or phone customer support, mobile app experiences, and voice chat.
The Voice Agents SDK provides a TypeScript client for the OpenAI Realtime API.
Voice Agents Quickstart: Build your first realtime voice assistant using the OpenAI Agents SDK in minutes.
Key features
- Connect over WebSocket or WebRTC
- Works both in the browser and for backend connections
- Audio and interruption handling
- Multi-agent orchestration through handoffs
- Tool definition and calling
- Custom guardrails to monitor model output
- Callbacks for streamed events
- Reuse the same components for both text and voice agents
Because Voice Agents use speech-to-speech models, the model processes audio directly in realtime. There is no need to transcribe the user's speech to text first and then convert the text response back to audio after the model has acted.
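To make the difference concrete, the following is a minimal, library-free sketch that contrasts a chained pipeline (speech-to-text, then a text model, then text-to-speech) with a single speech-to-speech turn. Every name in it (`Audio`, `Text`, `TurnResult`, `chainedTurn`, `speechToSpeechTurn`) is a hypothetical stand-in invented for illustration and is not part of the Voice Agents SDK or the Realtime API; the "models" are stubs that only record which stages a turn passes through.

```typescript
// Hypothetical types and stubs for illustration only; not SDK APIs.
type Audio = { kind: "audio"; content: string };
type Text = { kind: "text"; content: string };
type TurnResult = { output: Audio; stages: string[] };

// Chained pipeline: each turn passes through three separate model stages.
function chainedTurn(input: Audio): TurnResult {
  const stages: string[] = [];

  stages.push("speech-to-text");
  const transcript: Text = { kind: "text", content: input.content };

  stages.push("text model");
  const reply: Text = { kind: "text", content: `reply to: ${transcript.content}` };

  stages.push("text-to-speech");
  const output: Audio = { kind: "audio", content: reply.content };

  return { output, stages };
}

// Speech-to-speech: a single model stage consumes audio and produces audio.
function speechToSpeechTurn(input: Audio): TurnResult {
  return {
    output: { kind: "audio", content: `reply to: ${input.content}` },
    stages: ["speech-to-speech model"],
  };
}

const userAudio: Audio = { kind: "audio", content: "hello" };
const chained = chainedTurn(userAudio);
const direct = speechToSpeechTurn(userAudio);

console.log(chained.stages.length); // 3
console.log(direct.stages.length);  // 1
```

Both stubs produce the same reply, but the chained version needs three model stages per turn while the speech-to-speech version needs one. The sketch only counts stages; it makes no claim about actual latency or quality.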