Voice Agents Quickstart

  1. Create a project

    In this quickstart we will create a voice agent you can use in the browser. If you want to scaffold a new project, you can start with Next.js or Vite.

    npm create vite@latest my-project -- --template vanilla-ts
  2. Install the recommended package (requires Zod v4)

    npm install @openai/agents zod
  3. Generate a client ephemeral token

    As this application will run in the user’s browser, we need a secure way to connect to the model through the Realtime API. The recommended flow matches the official Realtime API with WebRTC guide: your backend creates a short-lived ephemeral client token, then your browser uses that token to establish the WebRTC connection. For testing purposes you can also generate a token using curl and your regular OpenAI API key.

    export OPENAI_API_KEY="sk-proj-...(your own key here)"

    curl -X POST https://api.openai.com/v1/realtime/client_secrets \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "session": {
          "type": "realtime",
          "model": "gpt-realtime"
        }
      }'

    The response contains a top-level value field that starts with the ek_ prefix, plus the effective session object. Use value as the client secret when establishing the WebRTC connection. This token is short-lived, so your backend should mint a fresh one when needed. If your browser session needs hosted MCP tools with authorization or custom headers, include that hosted MCP configuration in the server-side session payload you send to POST /v1/realtime/client_secrets instead of exposing those credentials in browser code.
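    The server-side flow above can be sketched in TypeScript as follows. This is a minimal sketch, not part of the SDK: `mintClientSecret` and `extractClientSecret` are illustrative names, and it assumes Node.js 18+ for the built-in `fetch`.

```typescript
// Shape of the POST /v1/realtime/client_secrets response we rely on.
interface ClientSecretResponse {
  value: string;       // the ephemeral client token, prefixed with "ek_"
  expires_at?: number; // when the short-lived token stops working
  session?: unknown;   // the effective session object echoed back
}

// Validate and pull out the ek_-prefixed client secret.
function extractClientSecret(body: ClientSecretResponse): string {
  if (!body.value || !body.value.startsWith('ek_')) {
    throw new Error('Unexpected response: missing ek_-prefixed value');
  }
  return body.value;
}

// Server-side only: mint a fresh ephemeral token with your regular API key.
// Never ship your regular API key to the browser.
async function mintClientSecret(apiKey: string): Promise<string> {
  const res = await fetch('https://api.openai.com/v1/realtime/client_secrets', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ session: { type: 'realtime', model: 'gpt-realtime' } }),
  });
  if (!res.ok) throw new Error(`Token minting failed: ${res.status}`);
  return extractClientSecret((await res.json()) as ClientSecretResponse);
}
```

    Your browser code would then call your own backend route (which wraps `mintClientSecret`) whenever it needs a fresh token, and pass the returned `ek_...` string to `session.connect()`.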

  4. Create your first Agent

    Creating a new RealtimeAgent is very similar to creating a regular Agent.

    import { RealtimeAgent } from '@openai/agents/realtime';

    const agent = new RealtimeAgent({
      name: 'Assistant',
      instructions: 'You are a helpful assistant.',
    });
  5. Create a session

    Unlike a regular agent, a voice agent is continuously running inside a RealtimeSession that handles the conversation and connection to the model over time. This session also manages audio processing, interruptions, and the broader conversation lifecycle that you will configure later on.

    import { RealtimeSession } from '@openai/agents/realtime';

    const session = new RealtimeSession(agent, {
      model: 'gpt-realtime',
    });

    The RealtimeSession constructor takes an agent as the first argument. This agent will be the first one your user interacts with.

  6. Connect to the session

    To connect to the session you need to pass the client ephemeral token you generated earlier.

    await session.connect({ apiKey: 'ek_...(put your own key here)' });

    In the browser, this connects to the Realtime API using WebRTC and automatically configures microphone capture and audio playback for you. On that default WebRTC path, the SDK sends the initial session configuration as soon as the data channel opens and tries to wait for the matching session.updated acknowledgement before connect() resolves, with a timeout fallback if that acknowledgement never arrives. If you run RealtimeSession in a server runtime such as Node.js, the SDK automatically falls back to WebSocket instead; on the WebSocket path, connect() resolves after the socket opens and the initial config has been sent, so session.updated may arrive slightly later. You can learn more about the transport choices in the Realtime Transport Layer guide.
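    For the Node.js case, the options you would pass to the `RealtimeSession` constructor can be sketched as below. `buildNodeSessionOptions` is an illustrative helper, not part of the SDK; it assumes the documented `transport: 'websocket'` option, which you can also set explicitly rather than relying on the automatic fallback.

```typescript
// Options for running a RealtimeSession in a server runtime such as Node.js,
// where the SDK uses WebSocket instead of WebRTC.
interface NodeSessionOptions {
  transport: 'websocket';
  model: string;
}

// Illustrative helper: build the constructor options explicitly.
function buildNodeSessionOptions(model = 'gpt-realtime'): NodeSessionOptions {
  return { transport: 'websocket', model };
}

// Usage sketch (requires @openai/agents and a server-side key):
// const session = new RealtimeSession(agent, buildNodeSessionOptions());
// await session.connect({ apiKey: process.env.OPENAI_API_KEY! });
```

    On the server you can connect with a regular API key instead of an ephemeral `ek_` token, since the key never leaves your backend.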

  7. Putting it all together

    import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

    export async function setupCounter(element: HTMLButtonElement) {
      // ....
      // For a quick start, you can append the following code to the auto-generated TS code.
      const agent = new RealtimeAgent({
        name: 'Assistant',
        instructions: 'You are a helpful assistant.',
      });
      const session = new RealtimeSession(agent);
      // Automatically connects your microphone and audio output in the browser via WebRTC.
      try {
        await session.connect({
          // To get this ephemeral key string, you can run the following command or implement the equivalent on the server side:
          // curl -s -X POST https://api.openai.com/v1/realtime/client_secrets -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" -d '{"session": {"type": "realtime", "model": "gpt-realtime"}}' | jq .value
          apiKey: 'ek_...(put your own key here)',
        });
        console.log('You are connected!');
      } catch (e) {
        console.error(e);
      }
    }
  8. Fire up the app and start talking

    Start your web server and open the page that includes your new Realtime Agent code. You should see a microphone permission request. Once you grant access, you should be able to start talking to your agent.

    npm run dev

From here you can start designing and building your own voice agent: