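Transport mechanisms

Default transport layer

WebRTC connection

The default transport layer uses WebRTC. Audio is recorded from the microphone and played back automatically.

To use your own media stream or audio element, pass an OpenAIRealtimeWebRTC instance when you create the session.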

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

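async function main() {
  // Supply your own microphone stream and audio element instead of the defaults.
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });

  // Hand the custom transport to the session; connect() it as usual afterwards.
  const customSession = new RealtimeSession(agent, { transport });
}

main().catch(console.error);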

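WebSocket connection

Pass transport: 'websocket' or an instance of OpenAIRealtimeWebSocket when creating the session to connect over WebSocket instead of WebRTC. This is well suited to server-side use cases, such as building a phone agent with Twilio.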

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

// Placeholder: replace with your own recorded PCM16 audio.
const myRecordedArrayBuffer = new ArrayBuffer(0);

const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
  model: 'gpt-realtime',
});
await wsSession.connect({ apiKey: process.env.OPENAI_API_KEY! });

wsSession.on('audio', (event) => {
  // event.data is a chunk of PCM16 audio
});

wsSession.sendAudio(myRecordedArrayBuffer);

Use any recording or playback library to handle the raw PCM16 audio bytes.
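As one illustration, the helpers below convert between Float32 samples (the form the Web Audio API and many recording libraries use) and the raw PCM16 bytes that sendAudio() takes and the 'audio' event delivers. This is a minimal sketch assuming mono, little-endian 16-bit PCM at the Realtime API's default 24 kHz rate; floatTo16BitPCM, pcm16ToFloat, and concatChunks are hypothetical helper names, not part of the SDK.

```typescript
// Assumption: mono, little-endian PCM16 (the Realtime API's default 24 kHz format).

// Convert Float32 samples in [-1, 1] into PCM16 bytes suitable for sendAudio().
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, sample));
    // Scale so that -1 maps to -32768 and 1 maps to 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}

// Convert a PCM16 chunk (e.g. event.data from the 'audio' event) back to Float32.
function pcm16ToFloat(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff;
  }
  return out;
}

// Join streamed chunks into one buffer, e.g. to save or replay a whole response.
function concatChunks(chunks: ArrayBuffer[]): ArrayBuffer {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new ArrayBuffer(total);
  const bytes = new Uint8Array(out);
  let offset = 0;
  for (const c of chunks) {
    bytes.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out;
}
```

For example, you could push event.data from each 'audio' event into an array and pass it to concatChunks once the response finishes. Resampling is not handled here: if your source records at 48 kHz, it would need to be downsampled to 24 kHz first.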

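SIP connection

Bridge SIP calls from providers such as Twilio by using the OpenAIRealtimeSIP transport. The transport keeps the Realtime session in sync with the SIP events emitted by your telephony provider.

1. Accept the incoming call by generating the initial session configuration with OpenAIRealtimeSIP.buildInitialConfig(). This ensures the SIP invitation and the Realtime session share consistent defaults.
2. Attach a RealtimeSession that uses the OpenAIRealtimeSIP transport, and connect it with the callId issued by the provider webhook.
3. Listen to session events to drive call analytics, transcription, or escalation logic.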

import OpenAI from 'openai';
import {
  OpenAIRealtimeSIP,
  RealtimeAgent,
  RealtimeSession,
  type RealtimeSessionOptions,
} from '@openai/agents/realtime';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  webhookSecret: process.env.OPENAI_WEBHOOK_SECRET!,
});

const agent = new RealtimeAgent({
  name: 'Receptionist',
  instructions:
    'Welcome the caller, answer scheduling questions, and hand off if the caller requests a human.',
});

const sessionOptions: Partial<RealtimeSessionOptions> = {
  model: 'gpt-realtime',
  config: {
    audio: {
      input: {
        turnDetection: { type: 'semantic_vad', interruptResponse: true },
      },
    },
  },
};

export async function acceptIncomingCall(callId: string): Promise<void> {
  const initialConfig = await OpenAIRealtimeSIP.buildInitialConfig(
    agent,
    sessionOptions,
  );
  await openai.realtime.calls.accept(callId, initialConfig);
}

export async function attachRealtimeSession(
  callId: string,
): Promise<RealtimeSession> {
  const session = new RealtimeSession(agent, {
    transport: new OpenAIRealtimeSIP(),
    ...sessionOptions,
  });

  session.on('history_added', (item) => {
    console.log('Realtime update:', item.type);
  });

  await session.connect({
    apiKey: process.env.OPENAI_API_KEY!,
    callId,
  });

  return session;
}
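The two functions above still need something to call them. A minimal sketch of a webhook dispatcher follows, assuming the webhook body has already been verified (for example using the webhookSecret configured on the client) and parsed, and that an incoming call arrives as an event of type realtime.call.incoming carrying the call id in data.call_id. The IncomingCallEvent and CallHandlers types and handleWebhookEvent are hypothetical; in a real handler you would pass acceptIncomingCall and attachRealtimeSession as the two handlers.

```typescript
// Hypothetical payload shape; only the fields this sketch reads.
interface IncomingCallEvent {
  type: string;
  data?: { call_id?: string };
}

// The accept/attach steps, injected so the dispatcher stays testable.
interface CallHandlers {
  accept(callId: string): Promise<void>;
  attach(callId: string): Promise<unknown>;
}

// Accept the SIP invite first, then attach a RealtimeSession to the same call.
// Returns true if the event was an incoming call that was handled.
async function handleWebhookEvent(
  event: IncomingCallEvent,
  handlers: CallHandlers,
): Promise<boolean> {
  if (event.type !== 'realtime.call.incoming') {
    return false; // Some other webhook event; ignore it here.
  }
  const callId = event.data?.call_id;
  if (!callId) {
    throw new Error('realtime.call.incoming event is missing data.call_id');
  }
  await handlers.accept(callId);
  await handlers.attach(callId);
  return true;
}
```

Accepting before attaching mirrors the order of the steps listed earlier: the session can only connect once the call has been accepted with its initial configuration.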

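Cloudflare Workers (workerd) considerations

Cloudflare Workers and other workerd runtimes cannot open outbound WebSockets with the global WebSocket constructor. Use the Cloudflare transport from the extensions package instead; it performs a fetch()-based upgrade internally.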

import { CloudflareRealtimeTransportLayer } from '@openai/agents-extensions';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'My Agent',
});

// Create a transport that connects to OpenAI Realtime via Cloudflare/workerd's fetch-based upgrade.
const cfTransport = new CloudflareRealtimeTransportLayer({
  url: 'wss://api.openai.com/v1/realtime?model=gpt-realtime',
});

const session = new RealtimeSession(agent, {
  // Set your own transport.
  transport: cfTransport,
});
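For context on what a "fetch()-based upgrade" means, here is a sketch of the general workerd pattern: call fetch() with an Upgrade: websocket header, read the socket from the response's webSocket property, and accept() it. This is not the extension's implementation. The wss:// to https:// scheme swap and the Authorization header are assumptions, and fetch is injected as a parameter with hypothetical minimal types (FetchLike, UpgradeResponse, UpgradableSocket) so the sketch type-checks and can run outside workerd.

```typescript
// Minimal structural stand-ins for workerd's Response/WebSocket types.
interface UpgradableSocket {
  accept(): void;
}
interface UpgradeResponse {
  status: number;
  webSocket?: UpgradableSocket | null;
}
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<UpgradeResponse>;

// Open an outbound WebSocket the workerd way: a fetch() carrying an Upgrade header.
async function openWorkerdSocket(
  wssUrl: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<UpgradableSocket> {
  // Assumption: fetch() is given an http(s) URL, so swap the ws(s) scheme.
  const httpUrl = wssUrl.replace(/^wss:/, 'https:').replace(/^ws:/, 'http:');
  const response = await fetchImpl(httpUrl, {
    headers: {
      Upgrade: 'websocket',
      Authorization: `Bearer ${apiKey}`, // Assumed auth header for this sketch.
    },
  });
  const ws = response.webSocket;
  if (!ws) {
    throw new Error(`WebSocket upgrade failed (status ${response.status})`);
  }
  // In workerd the socket must be accepted before it can send or receive.
  ws.accept();
  return ws;
}
```

Inside a Worker you would pass the global fetch; in practice CloudflareRealtimeTransportLayer takes care of this step for you.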

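Custom transport mechanisms

If you want to use a different speech-to-speech API, or have your own custom transport mechanism, you can create your own transport layer by implementing the RealtimeTransportLayer interface and emitting RealtimeTransportEventTypes events.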
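To give a feel for the event-emitter shape involved, here is a deliberately tiny sketch. ToyTransport, its ToyEvent types, and the sendToBackend hook are hypothetical; the real RealtimeTransportLayer interface has more members (for example for connecting and sending events), so treat this only as an illustration of routing inbound events to per-type and * listeners while forwarding outbound audio.

```typescript
// Hypothetical event shapes for this sketch only; the SDK defines its own
// RealtimeTransportEventTypes.
type ToyEvent =
  | { type: 'audio'; data: ArrayBuffer }
  | { type: 'error'; error: unknown };

type ToyListener = (event: ToyEvent) => void;

class ToyTransport {
  private readonly listeners = new Map<string, Set<ToyListener>>();
  // Hypothetical hook into whatever speech-to-speech API you are wrapping.
  private readonly sendToBackend: (chunk: ArrayBuffer) => void;

  constructor(sendToBackend: (chunk: ArrayBuffer) => void) {
    this.sendToBackend = sendToBackend;
  }

  // Subscribe to one event type, or to '*' for every event.
  on(type: ToyEvent['type'] | '*', listener: ToyListener): void {
    let set = this.listeners.get(type);
    if (!set) {
      set = new Set();
      this.listeners.set(type, set);
    }
    set.add(listener);
  }

  off(type: ToyEvent['type'] | '*', listener: ToyListener): void {
    this.listeners.get(type)?.delete(listener);
  }

  // Outbound: forward microphone audio to the backend.
  sendAudio(chunk: ArrayBuffer): void {
    this.sendToBackend(chunk);
  }

  // Inbound: the backend adapter calls this; listeners registered for the
  // specific type are notified first, then the '*' listeners.
  handleIncoming(event: ToyEvent): void {
    this.listeners.get(event.type)?.forEach((listener) => listener(event));
    this.listeners.get('*')?.forEach((listener) => listener(event));
  }
}
```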

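Interacting with the Realtime API more directly

If you want to use the OpenAI Realtime API but with more direct access to it, you have two options:

Option 1 - Accessing the transport layer

If you still want to benefit from everything RealtimeSession offers, you can access your transport layer through session.transport.

The transport layer emits every event it receives under the * event, and you can send raw events with the sendEvent() method.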

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});

session.transport.on('*', (event) => {
  // JSON parsed version of the event received on the connection
});

// Send any valid client event as JSON, for example to trigger a new response.
session.transport.sendEvent({
  type: 'response.create',
  // ...
});
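As a concrete example of sendEvent(), the helper below builds the raw events for a typed user turn: a conversation.item.create event holding an input_text message, followed by response.create. It is a sketch: userTextTurn and RawClientEvent are hypothetical names, and the loose typing is an assumption about what sendEvent() accepts.

```typescript
// Loose typing for this sketch; the SDK's own client message type may differ.
type RawClientEvent = { type: string; [key: string]: unknown };

// Build the raw events that add a user text message to the conversation and
// then ask the model to respond. Optional per-response instructions are passed
// through on response.create.
function userTextTurn(text: string, instructions?: string): RawClientEvent[] {
  const itemEvent: RawClientEvent = {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text }],
    },
  };
  const responseEvent: RawClientEvent = instructions
    ? { type: 'response.create', response: { instructions } }
    : { type: 'response.create' };
  return [itemEvent, responseEvent];
}
```

With a session like the one above, you might send them in order with for (const event of userTextTurn('What are your opening hours?')) session.transport.sendEvent(event);. For plain text turns, RealtimeSession's own sendMessage() may be simpler; raw events are mainly useful when you need fields the session API does not expose.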

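Option 2 - Using only the transport layer

If you don't need automatic tool execution, guardrails, and the like, you can also use the transport layer on its own as a "thin" client that only manages the connection and interruptions.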

import { OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const client = new OpenAIRealtimeWebRTC();
const audioBuffer = new ArrayBuffer(0);

await client.connect({
  apiKey: '<api key>',
  model: 'gpt-4o-mini-realtime-preview',
  initialSessionConfig: {
    instructions: 'Speak like a pirate',
    voice: 'ash',
    modalities: ['text', 'audio'],
    inputAudioFormat: 'pcm16',
    outputAudioFormat: 'pcm16',
  },
});

// Optional: with a WebSocket transport, output audio arrives through this event.
client.on('audio', (newAudio) => {});

client.sendAudio(audioBuffer);
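The transport handles interruptions for you, but when you play WebSocket audio yourself it can help to see what that bookkeeping involves. The Realtime API's conversation.item.truncate client event tells the server how much of an assistant audio item the user actually heard, since audio is usually generated faster than it is played. The sketch below tracks played PCM16 bytes and builds that event. PlaybackTracker is hypothetical, and it assumes 24 kHz mono PCM16 output, that the item id comes from the server events carrying the audio, and that the audio is the item's first content part (content_index 0).

```typescript
// Assumptions for this sketch: mono PCM16 output at 24 kHz.
const SAMPLE_RATE_HZ = 24_000;
const BYTES_PER_SAMPLE = 2;

interface TruncateEvent {
  type: 'conversation.item.truncate';
  item_id: string;
  content_index: number;
  audio_end_ms: number;
}

// Tracks how much of the current assistant audio item has actually been played.
class PlaybackTracker {
  private itemId: string | null = null;
  private playedBytes = 0;

  // Call when audio for a new assistant item starts arriving.
  startItem(itemId: string): void {
    this.itemId = itemId;
    this.playedBytes = 0;
  }

  // Call once a chunk has really been handed to the speaker, not just received.
  markPlayed(chunk: ArrayBuffer): void {
    this.playedBytes += chunk.byteLength;
  }

  // On barge-in, build the event that cuts the item at what the user heard.
  buildTruncate(): TruncateEvent | null {
    if (this.itemId === null) return null;
    const samples = this.playedBytes / BYTES_PER_SAMPLE;
    return {
      type: 'conversation.item.truncate',
      item_id: this.itemId,
      content_index: 0,
      audio_end_ms: Math.floor((samples / SAMPLE_RATE_HZ) * 1000),
    };
  }
}
```

Counting played rather than received bytes is the key design choice: truncating at the received position would leave the model believing the user heard audio that was still buffered.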