
Quickstart

Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

Prerequisites

  • Python 3.9 or higher
  • OpenAI API key
  • Basic familiarity with the OpenAI Agents SDK

Installation

If you haven't already, install the OpenAI Agents SDK:

pip install openai-agents

Creating your first realtime agent

1. Import required components

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

2. Create a realtime agent

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)

3. Set up the runner

runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-4o-realtime-preview",
            "voice": "alloy",
            "modalities": ["text", "audio"],
        }
    }
)

4. Start a session

async def main():
    # Start the realtime session
    session = await runner.run()

    async with session:
        # Send a text message to start the conversation
        await session.send_message("Hello! How are you today?")

        # The agent will stream back audio in real-time (not shown in this example)
        # Listen for events from the session
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")

# Run the session
asyncio.run(main())

Complete example

Here's a complete working example:

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-4o-realtime-preview",
                "voice": "alloy",
                "modalities": ["text", "audio"],
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 200
                }
            }
        }
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            if event.type == "response.audio_transcript.done":
                print(f"Assistant: {event.transcript}")
            elif event.type == "conversation.item.input_audio_transcription.completed":
                print(f"User: {event.transcript}")
            elif event.type == "error":
                print(f"Error: {event.error}")
                break

if __name__ == "__main__":
    asyncio.run(main())
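The event-handling branch of the loop above can be pulled into a small function so it can be exercised without a live session. The following is a minimal sketch, not part of the SDK: `describe_event`, `fake_session`, and `collect_lines` are hypothetical names, and the events are simulated with `SimpleNamespace` objects that carry only the attributes the example above reads (`type`, `transcript`, `error`).

```python
import asyncio
from types import SimpleNamespace


def describe_event(event):
    """Map an event to a printable line, or None if it should be ignored."""
    if event.type == "response.audio_transcript.done":
        return f"Assistant: {event.transcript}"
    if event.type == "conversation.item.input_audio_transcription.completed":
        return f"User: {event.transcript}"
    if event.type == "error":
        return f"Error: {event.error}"
    return None


async def fake_session():
    """Stand-in async event stream for offline testing (not a real session)."""
    yield SimpleNamespace(
        type="conversation.item.input_audio_transcription.completed",
        transcript="Hi there",
    )
    yield SimpleNamespace(type="response.audio_transcript.done", transcript="Hello!")
    yield SimpleNamespace(type="error", error="connection closed")
    yield SimpleNamespace(type="response.audio_transcript.done", transcript="unreached")


async def collect_lines(events):
    """Consume events the same way the example loop does, stopping on error."""
    lines = []
    async for event in events:
        line = describe_event(event)
        if line is not None:
            lines.append(line)
        if event.type == "error":
            break  # mirror the example: stop at the first error event
    return lines


lines = asyncio.run(collect_lines(fake_session()))
for line in lines:
    print(line)
```

Keeping the formatting logic separate from the `async for` loop means the same function could be passed events from a real session, assuming those events expose the same attributes.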

Configuration options

Model settings

  • model_name: Choose from available realtime models (e.g., gpt-4o-realtime-preview)
  • voice: Select a voice (alloy, echo, fable, onyx, nova, shimmer)
  • modalities: Enable text and/or audio (["text", "audio"])
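As an illustration of the model settings listed above, the dictionary can be assembled by a small helper that rejects obvious typos before the runner is created. This is only a sketch: `build_model_settings` is a hypothetical helper, not an SDK function, and the allowed values are copied from the lists in this guide rather than queried from the API.

```python
# Values copied from this guide; the API may accept others.
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_MODALITIES = {"text", "audio"}


def build_model_settings(model_name, voice="alloy", modalities=("text", "audio")):
    """Return a model_settings dict after basic sanity checks (hypothetical helper)."""
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not modalities:
        raise ValueError("at least one modality is required")
    unknown = set(modalities) - ALLOWED_MODALITIES
    if unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown)}")
    return {
        "model_name": model_name,
        "voice": voice,
        "modalities": list(modalities),
    }


settings = build_model_settings("gpt-4o-realtime-preview", voice="alloy")
config = {"model_settings": settings}
print(config)
```

The resulting `config` has the same shape as the one passed to `RealtimeRunner` earlier in this guide.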

Audio settings

  • input_audio_format: Format for input audio (pcm16, g711_ulaw, g711_alaw)
  • output_audio_format: Format for output audio
  • input_audio_transcription: Transcription configuration
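To make the pcm16 format more concrete, the sketch below converts floating-point samples to 16-bit little-endian PCM bytes and estimates the buffer size for a given duration. The 24 kHz mono sample rate is an assumption that should be checked against the Realtime API reference, and both function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # assumption: verify against the Realtime API reference
BYTES_PER_SAMPLE = 2     # 16-bit PCM uses two bytes per sample


def pcm16_byte_length(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of bytes needed for mono 16-bit audio of the given duration."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


chunk = floats_to_pcm16([0.0, 0.5, -0.5, 1.0, -1.0])
print(len(chunk), pcm16_byte_length(100))
```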

Turn detection

  • type: Detection method (server_vad, semantic_vad)
  • threshold: Voice activity threshold (0.0-1.0)
  • silence_duration_ms: How long silence must last, in milliseconds, before the turn is considered ended
  • prefix_padding_ms: How much audio, in milliseconds, to include before the detected start of speech
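The interaction between these three numeric settings is easier to see in a toy model. The sketch below is not the server's actual algorithm: it assumes audio arrives as fixed-length frames with a per-frame speech probability, and `detect_turn` and `frame_ms` are hypothetical names used only for illustration.

```python
def detect_turn(frame_probs, frame_ms=20, threshold=0.5,
                prefix_padding_ms=300, silence_duration_ms=200):
    """Return (start_ms, end_ms) of the first detected turn, or None.

    frame_probs holds one speech probability (0.0-1.0) per frame.
    """
    speech_start = None
    silence_run_ms = 0
    for i, prob in enumerate(frame_probs):
        t = i * frame_ms
        if prob >= threshold:
            if speech_start is None:
                # Keep some audio from before speech was first detected.
                speech_start = max(0, t - prefix_padding_ms)
            silence_run_ms = 0  # any voiced frame resets the silence timer
        elif speech_start is not None:
            silence_run_ms += frame_ms
            if silence_run_ms >= silence_duration_ms:
                return (speech_start, t + frame_ms)
    return None  # speech never started, or the turn has not ended yet


# 400 ms of silence, 200 ms of speech, then 300 ms of silence.
probs = [0.1] * 20 + [0.9] * 10 + [0.1] * 15
print(detect_turn(probs))
print(detect_turn(probs, silence_duration_ms=500))
```

In this model a lower `threshold` makes detection more sensitive to quiet speech and noise, while a larger `silence_duration_ms` tolerates longer pauses at the cost of slower turn endings.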

Next steps

Authentication

Make sure your OpenAI API key is set in your environment:

export OPENAI_API_KEY="your-api-key-here"

Or pass it directly when creating the session:

session = await runner.run(model_config={"api_key": "your-api-key"})
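Either way, it can help to fail early with a clear message when no key is available. The following is a minimal sketch using only the standard library: `resolve_api_key` is a hypothetical helper, not an SDK function, and its `env` parameter exists only to make it easy to test.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Prefer an explicitly passed key, then fall back to OPENAI_API_KEY."""
    key = explicit_key or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set OPENAI_API_KEY or pass a key explicitly"
        )
    return key


# Example with a fake environment so the sketch runs without a real key.
print(resolve_api_key(env={"OPENAI_API_KEY": "your-api-key-here"}))
```

The returned value could then be supplied through `model_config={"api_key": ...}` as shown above.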