Voice Agents Quickstart

  1. Create a project

    In this quickstart we will create a voice agent you can use in the browser. If you want to start from a new project, you can scaffold one with Next.js or Vite.

    Terminal window
    npm create vite@latest my-project --template vanilla-ts
  2. Install the Agents SDK

    Terminal window
    npm install @openai/agents

    Alternatively you can install @openai/agents-realtime for a standalone browser package.

  3. Generate a client ephemeral token

    Because this application runs in the user's browser, we need a secure way to connect to the model through the Realtime API. For this we can use an ephemeral client key, which should be generated on your backend server. For testing purposes you can also generate a key using curl and your regular OpenAI API key.

    Terminal window
    curl -X POST https://api.openai.com/v1/realtime/sessions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-realtime-preview-2025-06-03"
      }'

    The response contains a client_secret.value field that you can use to connect later on. This key is only valid for a short period of time, so you will need to generate a new one once it expires.
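    On a backend server, the same request as the curl command above could be sketched in TypeScript as follows. This is a minimal sketch, not part of the Agents SDK: createEphemeralKey and FetchLike are hypothetical names introduced here, and the only response field it relies on is the client_secret.value mentioned above. The fetch implementation is passed in so that it can be stubbed in tests.

```typescript
// Minimal structural type for the parts of `fetch` this sketch uses, so a
// stub can be injected in tests. The global `fetch` in Node 18+ should satisfy it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper (not part of the Agents SDK): runs on your backend,
// where the regular API key is available, and returns the ephemeral key.
async function createEphemeralKey(
  apiKey: string,
  fetchImpl: FetchLike,
  model = 'gpt-4o-realtime-preview-2025-06-03',
): Promise<string> {
  // Same endpoint, headers, and body as the curl command.
  const response = await fetchImpl('https://api.openai.com/v1/realtime/sessions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model }),
  });
  if (!response.ok) {
    throw new Error(`Realtime session request failed with status ${response.status}`);
  }
  // Only client_secret.value is read; other response fields are ignored.
  const data = (await response.json()) as { client_secret?: { value?: unknown } };
  const value = data.client_secret?.value;
  if (typeof value !== 'string') {
    throw new Error('Response did not contain client_secret.value');
  }
  return value;
}
```

    With Node 18 or later you could call it as createEphemeralKey(process.env.OPENAI_API_KEY ?? '', fetch) inside a route handler and return the result to the browser. Keep the regular API key on the server; only the ephemeral key should reach the client.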

  4. Create your first Agent

    Creating a new RealtimeAgent is very similar to creating a regular Agent.

    import { RealtimeAgent } from '@openai/agents/realtime';

    const agent = new RealtimeAgent({
      name: 'Assistant',
      instructions: 'You are a helpful assistant.',
    });

  5. Create a session

    Unlike a regular agent, a Voice Agent is continuously running and listening inside a RealtimeSession that handles the conversation and the connection to the model over time. This session also handles audio processing, interruptions, and much of the other lifecycle functionality we will cover later on.

    import { RealtimeSession } from '@openai/agents/realtime';

    const session = new RealtimeSession(agent, {
      model: 'gpt-4o-realtime-preview-2025-06-03',
    });

    The RealtimeSession constructor takes an agent as the first argument. This agent will be the first agent that your user will be able to interact with.

  6. Connect to the session

    To connect to the session you need to pass the client ephemeral token you generated earlier on.

    await session.connect({ apiKey: '<client-api-key>' });

    This will connect to the Realtime API using WebRTC in the browser and automatically configure your microphone and speaker for audio input and output. If you are running your RealtimeSession on a backend server (like Node.js), the SDK will automatically use WebSocket as the transport instead. You can learn more about the different transport layers in the Realtime Transport Layer guide.
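    Because the ephemeral token is short-lived, one option is to request a fresh one immediately before every connect call instead of hard-coding it. The following is a minimal sketch under that assumption: connectWithFreshKey and Connectable are hypothetical names, not SDK exports, and the helper relies only on the connect({ apiKey }) call shown above.

```typescript
// Structural type covering only the `connect({ apiKey })` call shown above;
// a RealtimeSession is expected to satisfy it.
interface Connectable {
  connect(options: { apiKey: string }): Promise<unknown>;
}

// Hypothetical helper: asks your backend for a fresh ephemeral key right
// before connecting, so a previously issued key is never reused.
async function connectWithFreshKey(
  session: Connectable,
  fetchKey: () => Promise<string>,
): Promise<void> {
  const apiKey = await fetchKey();
  if (!apiKey) {
    throw new Error('Backend returned an empty ephemeral key');
  }
  await session.connect({ apiKey });
}
```

    In the browser, fetchKey could wrap a request to your own backend, for example a hypothetical /api/realtime-key route that returns the key as plain text: connectWithFreshKey(session, () => fetch('/api/realtime-key').then((r) => r.text())).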

  7. Putting it all together

    import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

    const agent = new RealtimeAgent({
      name: 'Assistant',
      instructions: 'You are a helpful assistant.',
    });

    const session = new RealtimeSession(agent);

    // Automatically connects your microphone and audio output
    // in the browser via WebRTC.
    await session.connect({
      apiKey: '<client-api-key>',
    });

  8. Fire up the engines and start talking

    Start up your webserver and navigate to the page that includes your new Realtime Agent code. You should see a request for microphone access. Once you grant access you should be able to start talking to your agent.

    Terminal window
    npm run dev

From here you can start designing and building your own voice agent. Voice agents include many of the same features as regular agents, but also have some unique features of their own.