Quickstart

Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

Prerequisites

Python 3.9 or higher
OpenAI API key
Basic familiarity with the OpenAI Agents SDK

Installation

If you haven't already, install the OpenAI Agents SDK:

pip install openai-agents
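To confirm the environment before continuing, a small check like the one below can help. It is a sketch using only the standard library: the helper names are illustrative and not part of the SDK, and the only assumption about the SDK is that the package is importable as `agents`, as in the import statements in this guide.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version listed under Prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def is_installed(module_name):
    """Return True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("Python version OK:", python_is_supported())
print("agents package found:", is_installed("agents"))
```

If either line prints False, upgrade Python or rerun the install command before moving on.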

Creating your first realtime agent

1. Import required components

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

2. Create a realtime agent

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)

3. Set up the runner

runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-4o-realtime-preview",
            "voice": "alloy",
            "modalities": ["text", "audio"],
        }
    }
)

4. Start a session

async def main():
    # Start the realtime session
    session = await runner.run()

    async with session:
        # Send a text message to start the conversation
        await session.send_message("Hello! How are you today?")

        # The agent will stream back audio in real-time (not shown in this example)
        # Listen for events from the session
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")

# Run the session
asyncio.run(main())

Complete example

Here's a complete working example:

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-4o-realtime-preview",
                "voice": "alloy",
                "modalities": ["text", "audio"],
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 200
                }
            }
        }
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")
            elif event.type == "error":
                print(f"Error: {event.error}")
                break

if __name__ == "__main__":
    asyncio.run(main())
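As the event loop grows, the if/elif chain can be replaced with a lookup table of formatters, which is also easy to test without a live session. The sketch below is illustrative only: the events are simulated with `SimpleNamespace`, the event type strings and attribute names are copied from the example above, and `format_event` is a hypothetical helper rather than an SDK function.

```python
from types import SimpleNamespace


def format_event(event):
    """Return a printable line for a known event type, or None to ignore it."""
    formatters = {
        "response.audio_transcript.done":
            lambda e: f"Assistant: {e.transcript}",
        "conversation.item.input_audio_transcription.completed":
            lambda e: f"User: {e.transcript}",
        "error":
            lambda e: f"Error: {e.error}",
    }
    formatter = formatters.get(event.type)
    return formatter(event) if formatter else None


# Simulated events standing in for what a session would yield.
events = [
    SimpleNamespace(
        type="conversation.item.input_audio_transcription.completed",
        transcript="Hello!",
    ),
    SimpleNamespace(type="response.audio_transcript.done", transcript="Hi there."),
    SimpleNamespace(type="some.other.event"),
]

for event in events:
    line = format_event(event)
    if line is not None:
        print(line)
```

Inside the real loop, the same function could be called on each event yielded by the session, with unknown event types silently skipped.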

Configuration options

Model settings

model_name: Choose from available realtime models (e.g., gpt-4o-realtime-preview)
voice: Select a voice (e.g., alloy, echo, shimmer; the available set depends on the model)
modalities: Enable text and/or audio (["text", "audio"])

Audio settings

input_audio_format: Format for input audio (pcm16, g711_ulaw, g711_alaw)
output_audio_format: Format for output audio (pcm16, g711_ulaw, g711_alaw)
input_audio_transcription: Transcription configuration
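To make the pcm16 format concrete, the sketch below converts floating-point samples into 16-bit little-endian PCM bytes using only the standard library. The 24 kHz mono sample rate is an assumption about how the Realtime API interprets pcm16, and the helper names are illustrative rather than part of the SDK.

```python
import math
import struct

SAMPLE_RATE = 24_000  # assumption: pcm16 audio is 24 kHz, mono


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Generate a test tone as a list of floats."""
    count = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


def pcm16_duration_ms(data, sample_rate=SAMPLE_RATE):
    """Duration of mono PCM16 audio, at two bytes per sample."""
    return len(data) / 2 / sample_rate * 1000


chunk = floats_to_pcm16(sine_wave(440, 0.1))
print(len(chunk), "bytes,", round(pcm16_duration_ms(chunk)), "ms")
```

Out-of-range samples are clipped rather than wrapped, which avoids loud artifacts if the source audio peaks above full scale.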

Turn detection

type: Detection method (server_vad, semantic_vad)
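threshold: Voice activity threshold (0.0-1.0); higher values require louder or clearer speech to trigger detection
silence_duration_ms: Duration of silence, in milliseconds, that marks the end of a turn
prefix_padding_ms: Amount of audio, in milliseconds, to include before the detected start of speech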
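The interaction between these three parameters can be illustrated with a toy model. The sketch below is not the server's actual voice activity detection algorithm: it assumes a hypothetical stream of per-frame speech probabilities at a fixed 20 ms frame length, and `detect_turn` is an illustrative helper, not an SDK function.

```python
def detect_turn(probabilities, frame_ms=20, threshold=0.5,
                prefix_padding_ms=300, silence_duration_ms=200):
    """Return (start_ms, end_ms) of the first detected turn, or None.

    start_ms includes the prefix padding; end_ms is the moment the
    required silence has elapsed after the last speech frame.
    """
    speech_start = None
    silence_ms = 0
    for index, prob in enumerate(probabilities):
        t = index * frame_ms
        if prob >= threshold:
            if speech_start is None:
                # Reach back so the start of the utterance is not clipped.
                speech_start = max(0, t - prefix_padding_ms)
            silence_ms = 0  # any speech frame resets the silence timer
        elif speech_start is not None:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                return speech_start, t + frame_ms
    return None


# 400 ms of silence, 200 ms of speech, then 300 ms of silence.
frames = [0.1] * 20 + [0.9] * 10 + [0.1] * 15
print(detect_turn(frames))
print(detect_turn(frames, silence_duration_ms=500))
```

In this model, raising silence_duration_ms makes the detector wait longer before ending a turn, which tolerates pauses but adds latency; raising threshold makes it less likely that background noise starts a turn.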

Next steps

Learn more about realtime agents
Check out working examples in the examples/realtime folder
Add tools to your agent
Implement handoffs between agents
Set up guardrails for safety

Authentication

Make sure your OpenAI API key is set in your environment:

export OPENAI_API_KEY="your-api-key-here"

Or pass it directly when creating the session:

session = await runner.run(model_config={"api_key": "your-api-key"})
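To avoid hardcoding the key in source files, the two approaches can be combined: prefer an explicitly supplied key and otherwise fall back to the environment variable. The sketch below is one way to do that; `build_model_config` is a hypothetical helper, and the only SDK detail it relies on is the `api_key` entry in `model_config` shown above.

```python
import os


def build_model_config(api_key=None, environ=os.environ):
    """Return a model_config dict, preferring an explicit key over the environment."""
    key = api_key or environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY or pass api_key explicitly.")
    return {"api_key": key}


# Usage inside main(), assuming the runner from the steps above:
#     session = await runner.run(model_config=build_model_config())
```

Failing fast with a clear message is usually easier to debug than an authentication error raised later, when the session connects.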