Quickstart
Realtime agents in the Python SDK are server-side, low-latency agents built on the OpenAI Realtime API over WebSocket transport.
Python SDK boundary
The Python SDK does not provide a browser WebRTC transport. This page only covers Python-managed realtime sessions over server-side WebSockets. Use this SDK for server-side orchestration, tools, approvals, and telephony integrations. See also Realtime transport.
Prerequisites
- Python 3.10 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
Installation
If you haven't already, install the OpenAI Agents SDK:
pip install openai-agents
Create a server-side realtime session
1. Import the realtime components
import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner
2. Define the starting agent
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses short and conversational.",
)
3. Configure the runner
Prefer the nested audio.input / audio.output session settings shape for new code. For new realtime agents, start with gpt-realtime-2.
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime-2",
            "audio": {
                "input": {
                    "format": "pcm16",
                    "transcription": {"model": "gpt-4o-mini-transcribe"},
                    "turn_detection": {
                        "type": "semantic_vad",
                        "interrupt_response": True,
                    },
                },
                "output": {
                    "format": "pcm16",
                    "voice": "ash",
                },
            },
        }
    },
)
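If several runners share the same audio settings, the nested shape can be produced by a small helper. The following is a minimal sketch: `build_realtime_config` is a hypothetical convenience function, not part of the SDK, and it only assembles the dictionary shown above with defaults copied from this page.

```python
from typing import Any


def build_realtime_config(
    model_name: str = "gpt-realtime-2",
    voice: str = "ash",
    audio_format: str = "pcm16",
    transcription_model: str = "gpt-4o-mini-transcribe",
) -> dict[str, Any]:
    """Return a runner config in the nested audio.input / audio.output shape.

    Hypothetical helper for illustration; not an SDK function.
    """
    return {
        "model_settings": {
            "model_name": model_name,
            "audio": {
                "input": {
                    "format": audio_format,
                    "transcription": {"model": transcription_model},
                    "turn_detection": {
                        "type": "semantic_vad",
                        "interrupt_response": True,
                    },
                },
                "output": {"format": audio_format, "voice": voice},
            },
        }
    }


config = build_realtime_config()
# The result could then be passed as:
#     RealtimeRunner(starting_agent=agent, config=config)
```

Keeping the settings in one function makes it harder for the input and output formats to drift apart between call sites.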
4. Start the session and send input
runner.run() returns a RealtimeSession. The connection is opened when you enter the session context.
async def main() -> None:
    session = await runner.run()
    async with session:
        await session.send_message("Say hello in one short sentence.")
        async for event in session:
            if event.type == "audio":
                # Forward or play event.audio.data.
                pass
            elif event.type == "history_added":
                print(event.item)
            elif event.type == "agent_end":
                # One assistant turn finished.
                break
            elif event.type == "error":
                print(f"Error: {event.error}")

if __name__ == "__main__":
    asyncio.run(main())
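The `audio` branch above is left as a placeholder. One way to fill it is to buffer the audio for a whole turn before playing or storing it. The sketch below assumes only the event fields used in this quickstart (`event.type` and `event.audio.data`). `collect_turn_audio` and `fake_events` are hypothetical names, and `fake_events` is a stand-in so the example runs without a live session.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator


async def collect_turn_audio(events: AsyncIterator) -> bytes:
    """Concatenate audio bytes until the first agent_end event."""
    buffer = bytearray()
    async for event in events:
        if event.type == "audio":
            buffer.extend(event.audio.data)
        elif event.type == "agent_end":
            break
    return bytes(buffer)


async def fake_events():
    """Stand-in for a session: yields objects shaped like the events above."""
    yield SimpleNamespace(type="audio", audio=SimpleNamespace(data=b"\x01\x02"))
    yield SimpleNamespace(type="history_added", item="(transcript item)")
    yield SimpleNamespace(type="audio", audio=SimpleNamespace(data=b"\x03\x04"))
    yield SimpleNamespace(type="agent_end")


turn_audio = asyncio.run(collect_turn_audio(fake_events()))
```

With a real session, the same function could presumably be called as `await collect_turn_audio(session)` inside the `async with session:` block. Buffering a full turn adds latency, so streaming each chunk to the speaker as it arrives is usually the better fit for live conversations.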
session.send_message() accepts either a plain string or a structured realtime message. For raw audio chunks, use session.send_audio().
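Raw audio is typically streamed in small chunks rather than sent as one large buffer. The sketch below splits a PCM16 byte string into fixed-duration chunks. `chunk_pcm16` is a hypothetical helper, and the 24 kHz mono sample rate and 100 ms chunk length are assumptions, so adjust them to match your capture pipeline.

```python
def chunk_pcm16(pcm: bytes, sample_rate: int = 24_000, chunk_ms: int = 100) -> list[bytes]:
    """Split mono 16-bit PCM into chunks of chunk_ms milliseconds each.

    The final chunk may be shorter if the input does not divide evenly.
    """
    bytes_per_sample = 2  # 16-bit samples
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_sample
    return [pcm[i : i + chunk_size] for i in range(0, len(pcm), chunk_size)]


# One second of silence at the assumed 24 kHz sample rate.
one_second_of_silence = bytes(24_000 * 2)
chunks = chunk_pcm16(one_second_of_silence)

# Inside an open session, each chunk could then be sent with:
#     await session.send_audio(chunk)
```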
What this quickstart does not include
- Microphone capture and speaker playback code. See the realtime examples in examples/realtime.
- SIP / telephony attach flows. See Realtime transport and the SIP section.
Key settings
Once the basic session works, the settings most people reach for next are:
- model_name
- audio.input.format, audio.output.format
- audio.input.transcription
- audio.input.noise_reduction
- audio.input.turn_detection for automatic turn detection
- audio.output.voice
- tool_choice, prompt, tracing
- async_tool_calls, guardrails_settings.debounce_text_length, tool_error_formatter
The older flat aliases such as input_audio_format, output_audio_format, input_audio_transcription, and turn_detection still work, but nested audio settings are preferred for new code.
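When migrating existing settings, the relationship between the flat aliases and the nested shape can be written down as a small mapping. The sketch below is illustrative only: `nest_audio_settings` is a hypothetical helper, not an SDK function, and the alias-to-path mapping is inferred from the setting names listed on this page. It also assumes the input has no existing `audio` key.

```python
from typing import Any

# Flat alias -> (side, name) under the nested "audio" settings (assumed mapping).
_FLAT_TO_NESTED = {
    "input_audio_format": ("input", "format"),
    "output_audio_format": ("output", "format"),
    "input_audio_transcription": ("input", "transcription"),
    "turn_detection": ("input", "turn_detection"),
}


def nest_audio_settings(flat: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of flat with known aliases moved under "audio"."""
    nested: dict[str, Any] = {}
    audio: dict[str, dict[str, Any]] = {}
    for key, value in flat.items():
        if key in _FLAT_TO_NESTED:
            side, name = _FLAT_TO_NESTED[key]
            audio.setdefault(side, {})[name] = value
        else:
            # Settings that are not audio aliases pass through unchanged.
            nested[key] = value
    if audio:
        nested["audio"] = audio
    return nested


legacy = {
    "model_name": "gpt-realtime-2",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "turn_detection": {"type": "semantic_vad"},
}
migrated = nest_audio_settings(legacy)
```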
For manual turn control, use a raw session.update / input_audio_buffer.commit / response.create flow as described in the Realtime agents guide.
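As a rough illustration of that flow, the sketch below builds the three raw payloads in the order they would be sent. The payload shapes are assumptions based on the Realtime API client event names mentioned above, in particular the `session.update` body that sets turn detection to `None`. `manual_turn_events` is a hypothetical helper, and how raw events are sent through a session is covered in the guide, not here.

```python
import json
from typing import Any


def manual_turn_events() -> list[dict[str, Any]]:
    """Return raw client events for one manually controlled turn (assumed shapes)."""
    return [
        # 1. Sent once: turn off automatic turn detection.
        {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "audio": {"input": {"turn_detection": None}},
            },
        },
        # 2. After the user's audio has been streamed: close the input buffer.
        {"type": "input_audio_buffer.commit"},
        # 3. Ask the model to respond to the committed audio.
        {"type": "response.create"},
    ]


events = manual_turn_events()
for event in events:
    print(json.dumps(event))
```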
For the full schema, see RealtimeRunConfig and RealtimeSessionModelSettings.
Connection options
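Set your API key in the environment:
export OPENAI_API_KEY="your-api-key-here"
Or pass it directly when starting the session:
session = await runner.run(model_config={"api_key": "your-api-key"})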
model_config also supports:
- url: Custom WebSocket endpoint
- headers: Custom request headers
- call_id: Attach to an existing realtime call. In this repo, the documented attach flow is SIP.
- playback_tracker: Report how much audio the user has actually heard
If you pass headers explicitly, the SDK will not inject an Authorization header for you.
When connecting to Azure OpenAI, pass a GA Realtime endpoint URL in model_config["url"] and explicit headers. Avoid the legacy beta path (/openai/realtime?api-version=...) with realtime agents. See the Realtime agents guide for details.
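Because explicit headers disable the automatic Authorization header, a custom endpoint needs its credentials spelled out. The sketch below builds a `model_config` dictionary using the `url` and `headers` keys listed above. `explicit_headers_config` is a hypothetical helper, the URL is a placeholder, and the header names (`api-key` for Azure OpenAI, a bearer `Authorization` header otherwise) are assumptions to check against your provider's documentation.

```python
from typing import Any


def explicit_headers_config(url: str, api_key: str, azure: bool = False) -> dict[str, Any]:
    """Build a model_config with a custom URL and explicit auth headers.

    Hypothetical helper; the SDK does not add Authorization when headers are set.
    """
    if azure:
        headers = {"api-key": api_key}
    else:
        headers = {"Authorization": f"Bearer {api_key}"}
    return {"url": url, "headers": headers}


model_config = explicit_headers_config(
    url="wss://example.invalid/realtime",  # placeholder, not a real endpoint
    api_key="your-api-key",
    azure=True,
)
# The result could then be passed when starting the session:
#     session = await runner.run(model_config=model_config)
```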
Next steps
- Read Realtime transport to choose between server-side WebSocket and SIP.
- Read the Realtime agents guide for lifecycle, structured input, approvals, handoffs, guardrails, and low-level control.
- Browse the examples in examples/realtime.