Quickstart
Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.
Beta feature
Realtime agents are in beta. Expect some breaking changes as we improve the implementation.
Prerequisites
- Python 3.9 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
Installation
If you haven't already, install the OpenAI Agents SDK:
pip install openai-agents
Creating your first realtime agent
1. Import required components
import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner
2. Create a realtime agent
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
3. Set up the runner
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-4o-realtime-preview",
            "voice": "alloy",
            "modalities": ["text", "audio"],
        }
    }
)
4. Start a session
async def main():
    # Start the realtime session
    session = await runner.run()
    async with session:
        # Send a text message to start the conversation
        await session.send_message("Hello! How are you today?")
        # The agent will stream back audio in real-time (not shown in this example)
        # Listen for events from the session
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")

# Run the session
asyncio.run(main())
Complete example
Here's a complete working example:
import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )
    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-4o-realtime-preview",
                "voice": "alloy",
                "modalities": ["text", "audio"],
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 200
                }
            }
        }
    )
    # Start the session
    session = await runner.run()
    async with session:
        print("Session started! The agent will stream audio responses in real-time.")
        # Process events
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")
            elif event.type == "error":
                print(f"Error: {event.error}")
                break

if __name__ == "__main__":
    asyncio.run(main())
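The event loop in the example can be tried out without a microphone or network connection. The sketch below is only an illustration: `fake_session` is a hypothetical stand-in that yields canned events with the same `type`, `transcript`, and `error` attributes used above, and `SPEAKERS` and `collect_transcript` are names invented for this example rather than part of the SDK.

```python
import asyncio
from types import SimpleNamespace


async def fake_session():
    """Hypothetical stand-in for a realtime session: yields canned events."""
    yield SimpleNamespace(
        type="conversation.item.input_audio_transcription.completed",
        transcript="What's the weather like?",
    )
    yield SimpleNamespace(
        type="response.audio_transcript.done",
        transcript="I can't check live weather, but I can help you plan.",
    )
    yield SimpleNamespace(type="error", error="connection lost")
    # Never consumed: the loop below stops at the first error event.
    yield SimpleNamespace(type="response.audio_transcript.done", transcript="unreachable")


# Map each transcript event type to the speaker label used when printing.
SPEAKERS = {
    "response.audio_transcript.done": "Assistant",
    "conversation.item.input_audio_transcription.completed": "User",
}


async def collect_transcript(events):
    """Consume events until the stream ends or an error event arrives."""
    lines = []
    async for event in events:
        if event.type == "error":
            lines.append(f"Error: {event.error}")
            break
        speaker = SPEAKERS.get(event.type)
        if speaker is not None:  # skip event types this sketch does not handle
            lines.append(f"{speaker}: {event.transcript}")
    return lines


transcript = asyncio.run(collect_transcript(fake_session()))
for line in transcript:
    print(line)
```

Keeping the type-to-label mapping in a dictionary means a new transcript event can be handled by adding one entry instead of another `elif` branch.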
Configuration options
Model settings
- model_name: Choose from available realtime models (e.g., gpt-4o-realtime-preview)
- voice: Select voice (alloy, echo, fable, onyx, nova, shimmer)
- modalities: Enable text and/or audio (["text", "audio"])
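Because these options are plain dictionary entries, a typo in a voice name may only surface once the session starts. One possible guard, sketched here as an assumption and not as part of the SDK, is a small hypothetical helper (`build_model_settings`) that checks values against the options listed on this page before building the dictionary. The allowed values are copied from this page and may lag behind what the API accepts.

```python
from typing import Dict, Iterable, List, Union

# Values copied from the list above; treat them as a snapshot, not a guarantee.
ALLOWED_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
ALLOWED_MODALITIES = ("text", "audio")


def build_model_settings(
    model_name: str,
    voice: str = "alloy",
    modalities: Iterable[str] = ("text", "audio"),
) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: validate the options, then return a settings dict."""
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {ALLOWED_VOICES}")
    chosen = list(modalities)
    unknown = [m for m in chosen if m not in ALLOWED_MODALITIES]
    if not chosen or unknown:
        raise ValueError(f"modalities must be a non-empty subset of {ALLOWED_MODALITIES}")
    return {"model_name": model_name, "voice": voice, "modalities": chosen}


settings = build_model_settings("gpt-4o-realtime-preview", voice="echo")
print(settings)
```

The returned dictionary has the same shape as the "model_settings" entry used in the runner configuration earlier in this guide.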
Audio settings
- input_audio_format: Format for input audio (pcm16, g711_ulaw, g711_alaw)
- output_audio_format: Format for output audio
- input_audio_transcription: Transcription configuration
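To make the pcm16 option concrete, the sketch below generates a short test tone as raw 16-bit PCM using only the standard library. It assumes mono, little-endian samples at 24 kHz, which is worth confirming against the Realtime API reference; `sine_pcm16` and `pcm16_duration_ms` are hypothetical helper names, not SDK functions.

```python
import math
import struct

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz mono; confirm against the API reference


def sine_pcm16(freq_hz: float, length_ms: int, amplitude: float = 0.5) -> bytes:
    """Return a sine tone as little-endian signed 16-bit mono PCM bytes."""
    n_samples = SAMPLE_RATE_HZ * length_ms // 1000
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)


def pcm16_duration_ms(pcm: bytes) -> float:
    """Length of a mono pcm16 buffer in milliseconds (2 bytes per sample)."""
    return len(pcm) / 2 / SAMPLE_RATE_HZ * 1000


tone = sine_pcm16(440.0, 100)
print(len(tone), pcm16_duration_ms(tone))
```

A buffer like this is handy for checking chunk sizes and timing before wiring up a real microphone.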
Turn detection
- type: Detection method (server_vad, semantic_vad)
- threshold: Voice activity threshold (0.0-1.0)
- silence_duration_ms: Silence duration to detect turn end
- prefix_padding_ms: Audio padding before speech
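The way threshold, silence_duration_ms, and prefix_padding_ms interact can be illustrated with a toy loudness-based detector. This is a conceptual sketch only: the server's voice activity detection is not implemented this way, semantic_vad is not covered, and `detect_turn` with its per-frame loudness input is a hypothetical construction for this example.

```python
from typing import Optional, Sequence, Tuple


def detect_turn(
    levels: Sequence[float],
    frame_ms: int = 100,
    threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 200,
) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) of the first completed turn, or None.

    levels holds one normalized loudness value (0.0-1.0) per frame.
    """
    speech_start = None  # ms offset where the captured turn begins
    silence_ms = 0       # length of the current run of quiet frames
    for index, level in enumerate(levels):
        now = index * frame_ms
        if level >= threshold:
            if speech_start is None:
                # Keep some audio from just before speech was detected.
                speech_start = max(0, now - prefix_padding_ms)
            silence_ms = 0
        elif speech_start is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                # The turn ends at the end of this quiet frame.
                return speech_start, now + frame_ms
    return None


levels = [0.1, 0.1, 0.1, 0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.1, 0.1]
print(detect_turn(levels))                           # (100, 1000)
print(detect_turn(levels, silence_duration_ms=100))  # (100, 700)
```

With the default 200 ms of required silence, the brief dip in the middle of the utterance does not end the turn. Lowering the value to 100 ms cuts the turn off at that dip, which is the trade-off between responsiveness and interrupting the speaker.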
Next steps
- Learn more about realtime agents
- Check out working examples in the examples/realtime folder
- Add tools to your agent
- Implement handoffs between agents
- Set up guardrails for safety
Authentication
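Make sure your OpenAI API key is set in your environment:
export OPENAI_API_KEY="your-api-key-here"
Or pass it directly when creating the session:
session = await runner.run(model_config={"api_key": "your-api-key"})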
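A minimal sketch of handling the key explicitly: read it from the environment and fail early with a clear message if it is missing. `model_config_from_env` is a hypothetical helper, and the `{"api_key": ...}` shape is an assumption about what the runner accepts, so check it against the SDK reference before relying on it.

```python
import os


def model_config_from_env(env=None, var_name="OPENAI_API_KEY"):
    """Build a {"api_key": ...} dict, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; export it before starting a session")
    # Assumption: the runner accepts a config mapping with an "api_key" entry.
    return {"api_key": api_key}


# Demo with an explicit mapping so the snippet runs without a real key.
demo_config = model_config_from_env({"OPENAI_API_KEY": "sk-example"})
print(demo_config)
```

Under that assumption, the result could be passed as `runner.run(model_config=model_config_from_env())`, so a missing key surfaces before any connection is attempted.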