# Quickstart
Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.
> **Beta feature:** Realtime agents are in beta. Expect some breaking changes as we improve the implementation.
## Prerequisites
- Python 3.10 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
## Installation
If you haven't already, install the OpenAI Agents SDK:
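```shell
pip install openai-agents
```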
## Creating your first realtime agent
### 1. Import required components
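The quickstart only needs `asyncio` and the two realtime classes (these match the imports used in the full example below):

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```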
### 2. Create a realtime agent
```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
```
### 3. Set up the runner
```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)
```
### 4. Start a session
```python
# Start the session
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")

    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata
                # (non-blocking put; the queue is unbounded, so drops won't occur).
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild the jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
```
## Full example (same flow in one file)
This is the same quickstart flow rewritten as a single script.
```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # (non-blocking put; the queue is unbounded, so drops won't occur).
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild the jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s


if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
```
## Configuration and deployment notes
Use these options after you have a basic session running.
### Model settings
- `model_name`: Choose from the available realtime models (e.g., `gpt-realtime`)
- `voice`: Select a voice (the example above uses `ash`; others include `alloy`, `echo`, and `shimmer`)
- `modalities`: Enable text or audio (`["text"]` or `["audio"]`)
- `output_modalities`: Optionally constrain output to text and/or audio (`["text"]`, `["audio"]`, or both)
### Audio settings
- `input_audio_format`: Format for input audio (`pcm16`, `g711_ulaw`, `g711_alaw`)
- `output_audio_format`: Format for output audio
- `input_audio_transcription`: Transcription configuration
- `input_audio_noise_reduction`: Input noise-reduction config (`near_field` or `far_field`)
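Here, `pcm16` means raw 16-bit signed little-endian PCM. As a standalone illustration (these helper names are not part of the SDK), converting between float samples and `pcm16` bytes looks like:

```python
import struct


def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_floats(data: bytes) -> list[float]:
    """Inverse conversion, e.g. for local playback or analysis."""
    count = len(data) // 2
    return [v / 32767 for v in struct.unpack(f"<{count}h", data)]
```

Each sample occupies two bytes, so a 24 kHz mono stream produces 48,000 bytes per second.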
### Turn detection
- `type`: Detection method (`server_vad`, `semantic_vad`)
- `threshold`: Voice activity threshold (0.0–1.0)
- `silence_duration_ms`: Silence duration used to detect the end of a turn
- `prefix_padding_ms`: Audio padding kept before detected speech
### Run settings
- `async_tool_calls`: Whether function tools run asynchronously (defaults to `True`)
- `guardrails_settings.debounce_text_length`: Minimum accumulated transcript size before output guardrails run (defaults to `100`)
- `tool_error_formatter`: Callback to customize the model-visible tool error messages
For the full schema, see the API reference for `RealtimeRunConfig` and `RealtimeSessionModelSettings`.
## Authentication
Make sure your OpenAI API key is set in your environment:
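```shell
export OPENAI_API_KEY="your-api-key-here"
```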
Or pass it directly when creating the session:
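For example (the `api_key` entry in `model_config` is shown here as an assumption; check the `RealtimeRunner` API reference for the exact schema):

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```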
### Azure OpenAI endpoint format
If you connect to Azure OpenAI instead of OpenAI's default endpoint, pass a GA Realtime URL in `model_config["url"]` and set the auth headers explicitly.
```python
session = await runner.run(
    model_config={
        "url": "wss://<your-resource>.openai.azure.com/openai/v1/realtime?model=<deployment-name>",
        "headers": {"api-key": "<your-azure-api-key>"},
    }
)
```
You can also use a bearer token:
```python
session = await runner.run(
    model_config={
        "url": "wss://<your-resource>.openai.azure.com/openai/v1/realtime?model=<deployment-name>",
        "headers": {"authorization": f"Bearer {token}"},
    }
)
```
Avoid using the legacy beta path (`/openai/realtime?api-version=...`) with realtime agents. The SDK expects the GA Realtime interface.
## Next steps
- Learn more about realtime agents
- Check out working examples in the `examples/realtime` folder
- Add tools to your agent
- Implement handoffs between agents
- Set up guardrails for safety