
Transport mechanisms

The default transport layer uses WebRTC. Audio is recorded from the microphone and played back automatically.

To use your own media stream or audio element, provide an OpenAIRealtimeWebRTC instance when creating the session.

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

async function main() {
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });
  const customSession = new RealtimeSession(agent, { transport });
}

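Pass transport: 'websocket' or an OpenAIRealtimeWebSocket instance when creating the session to use a WebSocket connection instead of WebRTC. This approach suits server-side use cases, for example building a phone agent with Twilio.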

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const myRecordedArrayBuffer = new ArrayBuffer(0);

const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
  model: 'gpt-realtime',
});
await wsSession.connect({ apiKey: process.env.OPENAI_API_KEY! });

wsSession.on('audio', (event) => {
  // event.data is a chunk of PCM16 audio
});

wsSession.sendAudio(myRecordedArrayBuffer);

Use any recording/playback library to handle the raw PCM16 audio bytes.
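Many audio sources, such as the Web Audio API, produce Float32 samples rather than PCM16, so a conversion step is often needed before calling sendAudio() and after receiving an 'audio' event. The following is a minimal sketch, assuming little-endian 16-bit samples; the helper names floatToPcm16 and pcm16ToFloat are hypothetical and are not part of the SDK.

```typescript
// Hypothetical helpers (not part of @openai/agents): convert between
// Float32 samples in [-1, 1] and little-endian PCM16 bytes.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range so out-of-range input cannot overflow.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return buffer;
}

function pcm16ToFloat(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    const value = view.getInt16(i * 2, true);
    out[i] = value < 0 ? value / 0x8000 : value / 0x7fff;
  }
  return out;
}
```

With helpers like these, recorded samples could be passed along as wsSession.sendAudio(floatToPcm16(samples)), and each event.data chunk could be decoded with pcm16ToFloat before playback. Sample rate and channel layout are not handled here and must match what the session expects.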

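Bridge SIP calls from providers such as Twilio by using the OpenAIRealtimeSIP transport. The transport keeps the Realtime session synchronized with the SIP events emitted by your telephony provider.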

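  1. Accept the incoming call by generating an initial session configuration with OpenAIRealtimeSIP.buildInitialConfig(). This ensures the SIP invitation and the Realtime session share the same defaults.
  2. Attach a RealtimeSession that uses the OpenAIRealtimeSIP transport, and connect with the callId issued by the provider webhook.
  3. Listen for session events to drive call analytics, transcription, or escalation logic.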
import OpenAI from 'openai';
import {
  OpenAIRealtimeSIP,
  RealtimeAgent,
  RealtimeSession,
  type RealtimeSessionOptions,
} from '@openai/agents/realtime';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  webhookSecret: process.env.OPENAI_WEBHOOK_SECRET!,
});

const agent = new RealtimeAgent({
  name: 'Receptionist',
  instructions:
    'Welcome the caller, answer scheduling questions, and hand off if the caller requests a human.',
});

const sessionOptions: Partial<RealtimeSessionOptions> = {
  model: 'gpt-realtime',
  config: {
    audio: {
      input: {
        turnDetection: { type: 'semantic_vad', interruptResponse: true },
      },
    },
  },
};

export async function acceptIncomingCall(callId: string): Promise<void> {
  const initialConfig = await OpenAIRealtimeSIP.buildInitialConfig(
    agent,
    sessionOptions,
  );
  await openai.realtime.calls.accept(callId, initialConfig);
}

export async function attachRealtimeSession(
  callId: string,
): Promise<RealtimeSession> {
  const session = new RealtimeSession(agent, {
    transport: new OpenAIRealtimeSIP(),
    ...sessionOptions,
  });

  session.on('history_added', (item) => {
    console.log('Realtime update:', item.type);
  });

  await session.connect({
    apiKey: process.env.OPENAI_API_KEY!,
    callId,
  });

  return session;
}

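Cloudflare Workers and other workerd runtimes cannot open outbound WebSockets with the global WebSocket constructor. Use the Cloudflare transport from the extensions package, which performs the fetch()-based upgrade internally.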

import { CloudflareRealtimeTransportLayer } from '@openai/agents-extensions';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'My Agent',
});

// Create a transport that connects to OpenAI Realtime via Cloudflare/workerd's fetch-based upgrade.
const cfTransport = new CloudflareRealtimeTransportLayer({
  url: 'wss://api.openai.com/v1/realtime?model=gpt-realtime',
});

const session = new RealtimeSession(agent, {
  // Set your own transport.
  transport: cfTransport,
});

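If you want to use a different speech-to-speech API or a custom transport mechanism, you can create your own transport layer by implementing the RealtimeTransportLayer interface and emitting the RealtimeTransportEventTypes events.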

If you want to use the OpenAI Realtime API but need more direct access to it, you have two options:

If you still want to benefit from all the capabilities of RealtimeSession, you can access the transport layer through session.transport.

The transport layer emits every event it receives under the * event, and you can send raw events with the sendEvent() method.

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});

session.transport.on('*', (event) => {
  // JSON parsed version of the event received on the connection
});

// Send any valid event as JSON. For example triggering a new response
session.transport.sendEvent({
  type: 'response.create',
  // ...
});
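Because the * listener receives every raw event, it usually helps to dispatch on the event's type field instead of handling everything in one callback. The sketch below shows one possible approach; createEventRouter, RawEvent, and Handler are hypothetical names introduced for illustration and are not part of the SDK, and the event shape is assumed to be a parsed JSON object with a string type.

```typescript
// Hypothetical helper (not part of @openai/agents): route raw events by `type`.
type RawEvent = { type: string; [key: string]: unknown };
type Handler = (event: RawEvent) => void;

function createEventRouter(
  handlers: Record<string, Handler>,
  fallback?: Handler,
): Handler {
  return (event: RawEvent): void => {
    // Only use own keys so inherited names like "toString" never match.
    const hasOwn = Object.prototype.hasOwnProperty.call(handlers, event.type);
    const handler = hasOwn ? handlers[event.type] : fallback;
    handler?.(event);
  };
}

// Example: collect a log line per event type of interest.
const log: string[] = [];
const route = createEventRouter(
  {
    'response.done': () => { log.push('response finished'); },
    error: (event) => { log.push(`error: ${String(event.message)}`); },
  },
  (event) => { log.push(`unhandled: ${event.type}`); },
);
```

A router built this way could be registered as session.transport.on('*', route), leaving the rest of the session wiring unchanged.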

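Alternatively, if you don't need automatic tool execution, guardrails, and similar features, you can use the transport layer on its own as a "thin" client that only manages the connection and interruptions.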

import { OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const client = new OpenAIRealtimeWebRTC();
const audioBuffer = new ArrayBuffer(0);

await client.connect({
  apiKey: '<api key>',
  model: 'gpt-4o-mini-realtime-preview',
  initialSessionConfig: {
    instructions: 'Speak like a pirate',
    voice: 'ash',
    modalities: ['text', 'audio'],
    inputAudioFormat: 'pcm16',
    outputAudioFormat: 'pcm16',
  },
});

// optionally for WebSockets
client.on('audio', (newAudio) => {});

client.sendAudio(audioBuffer);