# Realtime Transport Layer
Choose the transport based on where the session runs and how much raw media or event control you need.
| Scenario | Recommended transport | Why |
|---|---|---|
| Browser speech-to-speech app | OpenAIRealtimeWebRTC | Lowest-friction path. The SDK manages microphone capture, playback, and the WebRTC connection for you. |
| Server-side voice loop or custom audio pipeline | OpenAIRealtimeWebSocket | Works well when you already control audio capture/playback and want direct event access. |
| SIP or telephony bridge | OpenAIRealtimeSIP | Attaches a RealtimeSession to an existing SIP-initiated Realtime call by callId. |
| Cloudflare Workers / workerd | Cloudflare extension transport | workerd cannot open outbound WebSockets with the global WebSocket constructor. |
| Provider-specific phone flow on Twilio | Twilio extension transport | Handles Twilio audio forwarding and interruption behavior for you. |
## Default transport layers
### WebRTC is the default browser choice
The default browser transport uses WebRTC. Audio is captured from the microphone and played back automatically, which is why the Quickstart can connect with just an ephemeral token and session.connect(...).
On this path, session.connect() tries to wait until the initial session configuration has been acknowledged with session.updated before it resolves, so your instructions and tools are applied before audio starts flowing. There is still a timeout fallback if that acknowledgement never arrives.
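The acknowledge-or-timeout behavior can be sketched as a generic promise race. This is illustrative only; waitForAck is not an SDK API, just the general shape of "resolve on ack, but never hang forever":

```typescript
// Illustrative sketch of the acknowledge-or-timeout pattern described above:
// resolve as soon as the acknowledgement arrives, but fall back after a
// deadline so a lost ack cannot stall the connect() promise indefinitely.
// (Not the SDK's implementation; names and timings are hypothetical.)
function waitForAck(
  ack: Promise<void>,
  timeoutMs: number,
): Promise<'acked' | 'timed-out'> {
  const timeout = new Promise<'timed-out'>((resolve) => {
    setTimeout(() => resolve('timed-out'), timeoutMs);
  });
  return Promise.race([ack.then(() => 'acked' as const), timeout]);
}
```

Promise.race settles on whichever side finishes first, so a missing session.updated acknowledgement degrades into a delayed resolve rather than a hang.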
To use your own media stream or audio element, provide an OpenAIRealtimeWebRTC instance when creating the session.
```typescript
import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

async function main() {
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });

  const customSession = new RealtimeSession(agent, { transport });
}
```

For lower-level customization, OpenAIRealtimeWebRTC also accepts changePeerConnection, which lets you inspect or replace the freshly created RTCPeerConnection before the offer is generated.
### WebSocket is the default server choice
Pass transport: 'websocket' or an instance of OpenAIRealtimeWebSocket when creating the session to use a WebSocket connection instead of WebRTC. This works well for server-side use cases, telephony bridges, and custom audio pipelines.
On the WebSocket path, session.connect() resolves once the socket is open and the initial config has been sent. The matching session.updated event may arrive slightly later, so do not assume connect() means that update has already been echoed back.
```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const myRecordedArrayBuffer = new ArrayBuffer(0);

const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
  model: 'gpt-realtime',
});
await wsSession.connect({ apiKey: process.env.OPENAI_API_KEY! });

wsSession.on('audio', (event) => {
  // event.data is a chunk of PCM16 audio
});

wsSession.sendAudio(myRecordedArrayBuffer);
```

Use any recording/playback library to handle the raw PCM16 audio bytes.
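If your capture pipeline produces Float32 samples in the -1..1 range (as the Web Audio API does), a small conversion helper is enough to produce the little-endian PCM16 bytes shown above. This is a minimal sketch for mono audio; floatTo16BitPCM is a hypothetical helper name, not an SDK export:

```typescript
// Convert Float32 audio samples (range -1..1) to 16-bit little-endian PCM,
// the byte format used for raw audio on the WebSocket path.
// (Hypothetical helper for illustration, not part of the SDK.)
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, sample));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}
```

The resulting ArrayBuffer can be passed straight to sendAudio().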
For advanced integrations, OpenAIRealtimeWebSocket accepts createWebSocket() so you can supply your own socket implementation, and skipOpenEventListeners when that custom connector is responsible for transitioning the socket into the connected state. The Cloudflare transport in @openai/agents-extensions is built on these hooks.
### SIP is for call providers and telephony bridges
Use OpenAIRealtimeSIP when you want a RealtimeSession to attach to an existing SIP-initiated Realtime call. It is a thin SIP-aware transport: audio is handled by the SIP call itself, and you connect the SDK session by callId.
- Accept the incoming call by generating an initial session configuration with OpenAIRealtimeSIP.buildInitialConfig(). This ensures the SIP invitation and the later SDK session start from the same defaults.
- Attach a RealtimeSession that uses the OpenAIRealtimeSIP transport and connect with the callId issued by the provider webhook.
- If you need provider-specific media forwarding or event bridging, use an integration transport such as the Twilio extension.
```typescript
import OpenAI from 'openai';
import {
  OpenAIRealtimeSIP,
  RealtimeAgent,
  RealtimeSession,
  type RealtimeSessionOptions,
} from '@openai/agents/realtime';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  webhookSecret: process.env.OPENAI_WEBHOOK_SECRET!,
});

const agent = new RealtimeAgent({
  name: 'Receptionist',
  instructions:
    'Welcome the caller, answer scheduling questions, and hand off if the caller requests a human.',
});

const sessionOptions: Partial<RealtimeSessionOptions> = {
  model: 'gpt-realtime',
  config: {
    audio: {
      input: {
        turnDetection: { type: 'semantic_vad', interruptResponse: true },
      },
    },
  },
};

export async function acceptIncomingCall(callId: string): Promise<void> {
  const initialConfig = await OpenAIRealtimeSIP.buildInitialConfig(
    agent,
    sessionOptions,
  );
  await openai.realtime.calls.accept(callId, initialConfig);
}

export async function attachRealtimeSession(
  callId: string,
): Promise<RealtimeSession> {
  const session = new RealtimeSession(agent, {
    transport: new OpenAIRealtimeSIP(),
    ...sessionOptions,
  });

  session.on('history_added', (item) => {
    console.log('Realtime update:', item.type);
  });

  await session.connect({
    apiKey: process.env.OPENAI_API_KEY!,
    callId,
  });

  return session;
}
```

### Cloudflare Workers and workerd
Cloudflare Workers and other workerd runtimes cannot open outbound WebSockets using the global WebSocket constructor. Use the Cloudflare transport from the extensions package, which performs the fetch()-based upgrade internally.
```typescript
import { CloudflareRealtimeTransportLayer } from '@openai/agents-extensions';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'My Agent',
});

// Create a transport that connects to OpenAI Realtime via Cloudflare/workerd's fetch-based upgrade.
const cfTransport = new CloudflareRealtimeTransportLayer({
  url: 'wss://api.openai.com/v1/realtime?model=gpt-realtime',
});

const session = new RealtimeSession(agent, {
  // Set your own transport.
  transport: cfTransport,
});
```

For the full setup, read Realtime Agents on Cloudflare.
### Twilio phone calls
You can connect a RealtimeSession to Twilio using either raw WebSockets or the dedicated Twilio transport in @openai/agents-extensions. The dedicated transport is the better default when you want the SDK to handle interruption timing and audio forwarding for Twilio Media Streams.
For the full setup, read Realtime Agents on Twilio.
## Bring your own transport
If you want to use a different speech-to-speech API or have your own custom transport mechanism, you can implement the RealtimeTransportLayer interface and emit the RealtimeTransportEventTypes events yourself.
## Access raw Realtime events when you need them
If you want more direct access to the underlying Realtime API, you have two options.
### Option 1 - Accessing the transport layer
If you still want to benefit from all of the capabilities of the RealtimeSession, you can access the transport layer through session.transport.
The transport layer re-emits every event it receives under the * event, and you can send raw events using sendEvent(). This is the escape hatch for low-level operations such as session.update, response.create, or response.cancel.
```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});

session.transport.on('*', (event) => {
  // JSON parsed version of the event received on the connection
});

// Send any valid event as JSON. For example, triggering a new response
session.transport.sendEvent({
  type: 'response.create',
  // ...
});
```

### Option 2 - Only using the transport layer
If you do not need automatic tool execution, guardrails, or local history management, you can also use the transport layer as a “thin” client that just manages connection and interruptions.
```typescript
import { OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const client = new OpenAIRealtimeWebRTC();
const audioBuffer = new ArrayBuffer(0);

await client.connect({
  apiKey: '<api key>',
  model: 'gpt-realtime',
  initialSessionConfig: {
    instructions: 'Speak like a pirate',
    outputModalities: ['audio'],
    audio: {
      input: {
        format: 'pcm16',
      },
      output: {
        format: 'pcm16',
        voice: 'ash',
      },
    },
  },
});

// optionally for WebSockets
client.on('audio', (newAudio) => {});

client.sendAudio(audioBuffer);
```