Quickstart

Realtime agents in the Python SDK are server-side, low-latency agents built on the OpenAI Realtime API over WebSocket transport.

Python SDK boundary

The Python SDK does not provide a browser WebRTC transport. This page covers only Python-managed realtime sessions over server-side WebSockets. Use this SDK for server-side orchestration, tools, approvals, and telephony integrations. See also Realtime transport.

Prerequisites

Python 3.10 or higher
OpenAI API key
Basic familiarity with the OpenAI Agents SDK

Installation

If you haven't already, install the OpenAI Agents SDK:

pip install openai-agents

Create a server-side realtime session

1. Import the realtime components

import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner

2. Define the starting agent

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep responses short and conversational.",
)

3. Configure the runner

Prefer the nested audio.input / audio.output session settings shape for new code. For new realtime agents, start with gpt-realtime-2.1.

runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime-2.1",
            "audio": {
                "input": {
                    "format": "pcm16",
                    "transcription": {"model": "gpt-4o-mini-transcribe"},
                    "turn_detection": {
                        "type": "semantic_vad",
                        "interrupt_response": True,
                    },
                },
                "output": {
                    "format": "pcm16",
                    "voice": "ash",
                },
            },
        }
    },
)
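
If you build this configuration in more than one place, a small helper can keep the nested shape consistent. The following is a minimal sketch, not part of the SDK: build_model_settings is a hypothetical name, and its defaults simply mirror the values used in the runner configuration above.

```python
from typing import Any


def build_model_settings(
    model_name: str = "gpt-realtime-2.1",
    voice: str = "ash",
    audio_format: str = "pcm16",
    transcription_model: str = "gpt-4o-mini-transcribe",
    interrupt_response: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: return model_settings in the nested audio shape."""
    return {
        "model_name": model_name,
        "audio": {
            "input": {
                "format": audio_format,
                "transcription": {"model": transcription_model},
                "turn_detection": {
                    "type": "semantic_vad",
                    "interrupt_response": interrupt_response,
                },
            },
            "output": {"format": audio_format, "voice": voice},
        },
    }


# Plain dict; intended to be passed as config={"model_settings": settings}.
settings = build_model_settings()
```

Because the result is a plain dictionary, it can be inspected or unit-tested before it is handed to RealtimeRunner.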

4. Start the session and send input

runner.run() returns a RealtimeSession. The connection is opened when you enter the session context.

async def main() -> None:
    session = await runner.run()

    async with session:
        await session.send_message("Say hello in one short sentence.")

        async for event in session:
            if event.type == "audio":
                # Forward or play event.audio.data.
                pass
            elif event.type == "history_added":
                print(event.item)
            elif event.type == "agent_end":
                # One assistant turn finished.
                break
            elif event.type == "error":
                print(f"Error: {event.error}")


if __name__ == "__main__":
    asyncio.run(main())
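
The event loop above can be exercised without a live connection by moving the branching into a function that accepts any async iterable. This is a sketch under stated assumptions: collect_turn and fake_events are hypothetical names, and the stub events only imitate the attributes used in the loop above (type, audio.data, item, error).

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from types import SimpleNamespace
from typing import Any


async def collect_turn(events: AsyncIterable[Any]) -> tuple[bytes, list[Any], list[Any]]:
    """Consume events until agent_end; return (audio bytes, history items, errors)."""
    audio = bytearray()
    items: list[Any] = []
    errors: list[Any] = []
    async for event in events:
        if event.type == "audio":
            audio.extend(event.audio.data)
        elif event.type == "history_added":
            items.append(event.item)
        elif event.type == "error":
            errors.append(event.error)
        elif event.type == "agent_end":
            break  # One assistant turn finished.
    return bytes(audio), items, errors


async def fake_events() -> AsyncIterator[Any]:
    """Hypothetical stand-in for a session; yields objects shaped like the events above."""
    yield SimpleNamespace(type="audio", audio=SimpleNamespace(data=b"\x01\x00"))
    yield SimpleNamespace(type="history_added", item="hello")
    yield SimpleNamespace(type="audio", audio=SimpleNamespace(data=b"\x02\x00"))
    yield SimpleNamespace(type="agent_end")
    # Never consumed, because collect_turn stops at agent_end.
    yield SimpleNamespace(type="audio", audio=SimpleNamespace(data=b"\xff\xff"))


audio, items, errors = asyncio.run(collect_turn(fake_events()))
```

In a real program, the same function could presumably be called with the session object inside the async with block, since the quickstart iterates over it with async for.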

session.send_message() accepts either a plain string or a structured realtime message. For raw audio chunks, use session.send_audio().
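
As a rough illustration of what "raw audio chunks" can look like, the sketch below generates a short PCM16 tone and splits it into sample-aligned chunks. It makes several assumptions: a 24 kHz mono sample rate (check this against your session's audio format), a 100 ms chunk size chosen arbitrarily, and a hypothetical FakeSession class that only records what would be passed to session.send_audio().

```python
import asyncio
import math
import struct

SAMPLE_RATE = 24_000  # Assumption: 24 kHz mono PCM16; confirm for your session.


def sine_pcm16(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Generate a mono sine tone as little-endian signed 16-bit PCM."""
    n = int(seconds * rate)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def chunk_pcm16(pcm: bytes, chunk_ms: int = 100, rate: int = SAMPLE_RATE) -> list[bytes]:
    """Split PCM16 bytes into chunks of about chunk_ms, aligned to 2-byte samples."""
    step = (rate * chunk_ms // 1000) * 2
    return [pcm[i : i + step] for i in range(0, len(pcm), step)]


class FakeSession:
    """Hypothetical stand-in that records the chunks it is given."""

    def __init__(self) -> None:
        self.sent: list[bytes] = []

    async def send_audio(self, audio: bytes) -> None:
        self.sent.append(audio)


async def stream(session: FakeSession, pcm: bytes) -> None:
    """Send the audio chunk by chunk, in order."""
    for chunk in chunk_pcm16(pcm):
        await session.send_audio(chunk)
```

Keeping chunk boundaries on whole samples avoids splitting a 16-bit value across two sends.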

What this quickstart does not include

Microphone capture and speaker playback code. See the realtime examples in examples/realtime.
SIP / telephony attach flows. See Realtime transport and the SIP section.

Key settings

Once the basic session works, the settings most people reach for next are:

model_name
audio.input.format, audio.output.format
audio.input.transcription
audio.input.noise_reduction
audio.input.turn_detection for automatic turn detection
audio.output.voice
tool_choice, prompt, tracing
async_tool_calls, tool_execution.pre_approval_tool_input_guardrails, guardrails_settings.debounce_text_length, tool_error_formatter

The older flat aliases such as input_audio_format, output_audio_format, input_audio_transcription, and turn_detection still work, but nested audio settings are preferred for new code.
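
When migrating older configuration, the four flat aliases named above map onto the nested keys listed under Key settings. The helper below is a hypothetical migration sketch, not an SDK function. Letting an existing nested value win over a flat alias is a choice made for this example and is not documented SDK behavior.

```python
from typing import Any

# Flat alias -> (side, key) inside the nested "audio" settings.
FLAT_TO_NESTED = {
    "input_audio_format": ("input", "format"),
    "output_audio_format": ("output", "format"),
    "input_audio_transcription": ("input", "transcription"),
    "turn_detection": ("input", "turn_detection"),
}


def to_nested_audio_settings(settings: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of settings with the flat aliases moved under "audio"."""
    result = {k: v for k, v in settings.items() if k not in FLAT_TO_NESTED}
    # Copy one level deep so the caller's nested dicts are not mutated.
    audio = {side: dict(values) for side, values in result.get("audio", {}).items()}
    for flat_key, (side, name) in FLAT_TO_NESTED.items():
        if flat_key in settings:
            # Assumption for this sketch: an existing nested value takes precedence.
            audio.setdefault(side, {}).setdefault(name, settings[flat_key])
    if audio:
        result["audio"] = audio
    return result
```

The input dictionary is left unchanged, so the old and new shapes can be compared side by side during a migration.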

For manual turn control, use a raw session.update / input_audio_buffer.commit / response.create flow as described in the Realtime agents guide.
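
To make the order of that flow concrete, the sketch below builds the three client events as JSON payloads. The event type names come from the sentence above. The body of the session.update event (disabling turn detection under audio.input) is an assumption about the payload shape, and how the payloads are sent through the SDK is deliberately left out; follow the Realtime agents guide for that.

```python
import json
from typing import Any


def manual_turn_events() -> list[dict[str, Any]]:
    """Return the three client events for one manually controlled turn, in order."""
    return [
        # Assumed body: turn off automatic turn detection for the session.
        {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "audio": {"input": {"turn_detection": None}},
            },
        },
        # After the audio has been appended, commit the input buffer.
        {"type": "input_audio_buffer.commit"},
        # Then explicitly ask the model to respond.
        {"type": "response.create"},
    ]


# Serialized frames, as they would appear on the wire (None becomes null).
frames = [json.dumps(event) for event in manual_turn_events()]
```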

For the full schema, see RealtimeRunConfig and RealtimeSessionModelSettings.

Connection options

Set your API key in the environment:

export OPENAI_API_KEY="your-api-key-here"

Or pass it directly when starting the session:

session = await runner.run(model_config={"api_key": "your-api-key"})
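
If you prefer to fail early when the key is missing rather than at connection time, one option is to build model_config from the environment yourself. This is a minimal sketch; model_config_from_env is a hypothetical helper name, and only the api_key field shown above is used.

```python
import os
from collections.abc import Mapping
from typing import Any


def model_config_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Hypothetical helper: build model_config from OPENAI_API_KEY or raise."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"api_key": api_key}
```

The returned dictionary could then be passed as runner.run(model_config=...), in the same way as the literal shown above.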

model_config also supports:

url: Custom WebSocket endpoint
headers: Custom request headers
call_id: Attach to an existing realtime call. In this repo, the documented attach flow is SIP.
playback_tracker: Report how much audio the user has actually heard

If you pass headers explicitly, the SDK will not inject an Authorization header for you, so include any authentication header your endpoint requires.

When connecting to Azure OpenAI, pass a GA Realtime endpoint URL in model_config["url"] and explicit headers. Avoid the legacy beta path (/openai/realtime?api-version=...) with realtime agents. See the Realtime agents guide for details.
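
A small guard can catch the legacy beta path before a connection is attempted. The sketch below is an assumption-laden illustration: azure_model_config is a hypothetical helper, the check simply matches the /openai/realtime?api-version=... pattern mentioned above, and the URLs and header names you pass in are your own.

```python
from typing import Any
from urllib.parse import parse_qs, urlsplit


def is_legacy_beta_realtime_url(url: str) -> bool:
    """Return True if the URL looks like /openai/realtime?api-version=..."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    return parts.path.rstrip("/") == "/openai/realtime" and "api-version" in query


def azure_model_config(url: str, headers: dict[str, str]) -> dict[str, Any]:
    """Hypothetical helper: build model_config with a url and explicit headers."""
    if is_legacy_beta_realtime_url(url):
        raise ValueError("legacy beta Realtime path; use a GA Realtime endpoint URL")
    if not headers:
        # With explicit headers, authentication must be supplied by the caller.
        raise ValueError("explicit headers are required")
    return {"url": url, "headers": dict(headers)}
```

The check is intentionally narrow: it rejects only the legacy pattern and does not validate that a URL is a correct GA endpoint.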

Next steps

Read Realtime transport to choose between server-side WebSocket and SIP.
Read the Realtime agents guide for lifecycle, structured input, approvals, handoffs, guardrails, and low-level control.
Browse the examples in examples/realtime.