Voice Agents Quickstart
Project setup and credentials
Create a project
In this quickstart we will create a voice agent you can use in the browser. If you want to scaffold a new project, you can start with Next.js or Vite.

```shell
npm create vite@latest my-project -- --template vanilla-ts
```
Install the recommended package (requires Zod v4):

```shell
npm install @openai/agents zod
```
Generate a client ephemeral token
As this application will run in the user's browser, we need a secure way to connect to the model through the Realtime API. The recommended flow matches the official Realtime API with WebRTC guide: your backend creates a short-lived ephemeral client token, then your browser uses that token to establish the WebRTC connection. For testing purposes you can also generate a token using `curl` and your regular OpenAI API key.

```shell
export OPENAI_API_KEY="sk-proj-...(your own key here)"
curl -X POST https://api.openai.com/v1/realtime/client_secrets \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"session": {"type": "realtime", "model": "gpt-realtime"}}'
```

The response contains a top-level `value` field that starts with the `ek_` prefix, plus the effective `session` object. Use `value` as the client secret when establishing the WebRTC connection. This token is short-lived, so your backend should mint a fresh one when needed. If your browser session needs hosted MCP tools with `authorization` or custom `headers`, include that hosted MCP configuration in the server-side `session` payload you send to `POST /v1/realtime/client_secrets` instead of exposing those credentials in browser code.
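To sketch what the server-side minting step might look like in code, here is a hedged TypeScript example that mirrors the `curl` command above using Node's built-in `fetch`. The helper names (`buildClientSecretRequest`, `mintClientSecret`) are illustrative and not part of `@openai/agents`; only the endpoint URL and payload come from this guide.

```typescript
// Sketch of a server-side token mint, mirroring the curl command above.
// Helper names are illustrative, not part of the SDK.

// Pure helper: builds the request your backend sends to OpenAI.
export function buildClientSecretRequest(apiKey: string, model: string) {
  return {
    url: 'https://api.openai.com/v1/realtime/client_secrets',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ session: { type: 'realtime', model } }),
    },
  };
}

// Calls OpenAI and returns the short-lived client secret (the `value`
// field, which starts with "ek_"). Run this on your backend only:
// the regular API key must never reach the browser.
export async function mintClientSecret(
  apiKey: string,
  model = 'gpt-realtime',
): Promise<string> {
  const { url, init } = buildClientSecretRequest(apiKey, model);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`client_secrets request failed: ${res.status}`);
  }
  const json = (await res.json()) as { value: string };
  return json.value;
}
```

Your backend would expose the result of `mintClientSecret` to the browser through a route of your choosing, minting a fresh token per session since each one is short-lived.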
Create and connect the voice agent
Create your first Agent
Creating a new `RealtimeAgent` is very similar to creating a regular `Agent`.

```typescript
import { RealtimeAgent } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Assistant',
  instructions: 'You are a helpful assistant.',
});
```
Create a session
Unlike a regular agent, a voice agent is continuously running inside a `RealtimeSession` that handles the conversation and connection to the model over time. This session also manages audio processing, interruptions, and the broader conversation lifecycle that you will configure later on.

```typescript
import { RealtimeSession } from '@openai/agents/realtime';

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});
```

The `RealtimeSession` constructor takes an `agent` as the first argument. This agent will be the first one your user interacts with.
Connect to the session
To connect to the session you need to pass the client ephemeral token you generated earlier.

```typescript
await session.connect({ apiKey: 'ek_...(put your own key here)' });
```

In the browser, this connects to the Realtime API using WebRTC and automatically configures microphone capture and audio playback for you. On that default WebRTC path, the SDK sends the initial session configuration as soon as the data channel opens and tries to wait for the matching `session.updated` acknowledgement before `connect()` resolves, with a timeout fallback if that acknowledgement never arrives. If you run `RealtimeSession` in a server runtime such as Node.js, the SDK automatically falls back to WebSocket instead; on the WebSocket path, `connect()` resolves after the socket opens and the initial config has been sent, so `session.updated` may arrive slightly later. You can learn more about the transport choices in the Realtime Transport Layer guide.
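In a real deployment the `apiKey` passed to `connect()` comes from your own backend rather than being hard-coded. Here is a hedged sketch of that browser-side glue; the `/token` route name is hypothetical, and the `extractClientSecret` helper is illustrative rather than part of the SDK. It simply pulls the `value` field out of the `client_secrets` response and sanity-checks the `ek_` prefix described above.

```typescript
// Illustrative helper, not part of @openai/agents: validates the
// client_secrets response shape described earlier in this guide.
export function extractClientSecret(json: unknown): string {
  const value = (json as { value?: unknown })?.value;
  if (typeof value !== 'string' || !value.startsWith('ek_')) {
    throw new Error('Expected a top-level "value" field starting with "ek_"');
  }
  return value;
}

// In the browser, you would fetch the token from your backend and connect:
//
//   const res = await fetch('/token'); // hypothetical backend route
//   const apiKey = extractClientSecret(await res.json());
//   await session.connect({ apiKey });
```

Keeping the extraction in a small helper makes the failure mode explicit: if your backend route returns an error page or a malformed body, you get a clear exception instead of a confusing connection failure.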
Run and test the app
Putting it all together
```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

export async function setupCounter(element: HTMLButtonElement) {
  // ...
  // For a quick start, you can append the following code to the auto-generated TS code.
  const agent = new RealtimeAgent({
    name: 'Assistant',
    instructions: 'You are a helpful assistant.',
  });
  const session = new RealtimeSession(agent);
  // Automatically connects your microphone and audio output in the browser via WebRTC.
  try {
    await session.connect({
      // To get this ephemeral key string, you can run the following command or implement the equivalent on the server side:
      // curl -s -X POST https://api.openai.com/v1/realtime/client_secrets -H "Authorization: Bearer $OPENAI_API_KEY" -H "Content-Type: application/json" -d '{"session": {"type": "realtime", "model": "gpt-realtime"}}' | jq .value
      apiKey: 'ek_...(put your own key here)',
    });
    console.log('You are connected!');
  } catch (e) {
    console.error(e);
  }
}
```
Fire up the app and start talking
Start your web server and open the page that includes your new Realtime Agent code. You should see a microphone permission request. Once you grant access, you should be able to start talking to your agent.
```shell
npm run dev
```
Next Steps
From here you can start designing and building your own voice agent:
- Add tools, handoffs, and guardrails.
- Learn how turn detection and voice activity detection, interruptions, and manual response control affect the conversation loop.
- Add text input, image input, and session history management.
- Choose the right transport for your deployment: WebRTC, WebSocket, or a custom transport.