Transport mechanisms

The default transport layer uses WebRTC. Audio is recorded from the microphone and played back automatically.

To use your own media stream or audio element, provide an OpenAIRealtimeWebRTC instance when creating the session.

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

async function main() {
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });
  const customSession = new RealtimeSession(agent, { transport });
}

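Pass transport: 'websocket' or an OpenAIRealtimeWebSocket instance when creating the session to use a WebSocket connection instead of WebRTC. This works well for server-side use cases, such as building a phone agent with Twilio.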

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const myRecordedArrayBuffer = new ArrayBuffer(0);

const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
  model: 'gpt-realtime',
});
await wsSession.connect({ apiKey: process.env.OPENAI_API_KEY! });

wsSession.on('audio', (event) => {
  // event.data is a chunk of PCM16 audio
});

wsSession.sendAudio(myRecordedArrayBuffer);

Use any recording/playback library to handle the raw PCM16 audio bytes.
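
Many capture and playback libraries work with floating-point samples rather than PCM16, so a conversion step is often needed before calling sendAudio() or after receiving an audio event. The following is a minimal sketch of that conversion, assuming mono samples in the range [-1, 1] and little-endian 16-bit output. The helper names floatToPcm16 and pcm16ToFloat are hypothetical and not part of the SDK, and the sample rate your session expects should be checked against the Realtime API documentation.

```typescript
// Hypothetical helpers (not part of the SDK): convert between float samples
// in [-1, 1] and little-endian signed 16-bit PCM bytes.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input saturates instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true);
  });
  return buffer;
}

function pcm16ToFloat(data: ArrayBuffer): Float32Array {
  const view = new DataView(data);
  // Any trailing odd byte is ignored.
  const out = new Float32Array(Math.floor(data.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    const value = view.getInt16(i * 2, true);
    out[i] = value < 0 ? value / 0x8000 : value / 0x7fff;
  }
  return out;
}
```

With helpers like these, a recorded Float32Array could be passed on as wsSession.sendAudio(floatToPcm16(samples)), and an incoming chunk could be decoded with pcm16ToFloat(event.data) before handing it to a playback library.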

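Bridge SIP calls from providers such as Twilio by using the OpenAIRealtimeSIP transport. The transport keeps the Realtime session synchronized with the SIP events emitted by your telephony provider.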

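  1. Accept the incoming call by generating an initial session configuration with OpenAIRealtimeSIP.buildInitialConfig(). This ensures the SIP invitation and the Realtime session share the same defaults.
  2. Attach a RealtimeSession that uses the OpenAIRealtimeSIP transport, and connect with the callId delivered by the provider webhook.
  3. Listen for session events to drive call analytics, transcripts, or escalation logic.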

import OpenAI from 'openai';
import {
  OpenAIRealtimeSIP,
  RealtimeAgent,
  RealtimeSession,
  type RealtimeSessionOptions,
} from '@openai/agents/realtime';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  webhookSecret: process.env.OPENAI_WEBHOOK_SECRET!,
});

const agent = new RealtimeAgent({
  name: 'Receptionist',
  instructions:
    'Welcome the caller, answer scheduling questions, and hand off if the caller requests a human.',
});

const sessionOptions: Partial<RealtimeSessionOptions> = {
  model: 'gpt-realtime',
  config: {
    audio: {
      input: {
        turnDetection: { type: 'semantic_vad', interruptResponse: true },
      },
    },
  },
};

// Step 1: accept the call with a config derived from the agent and session options.
export async function acceptIncomingCall(callId: string): Promise<void> {
  const initialConfig = await OpenAIRealtimeSIP.buildInitialConfig(
    agent,
    sessionOptions,
  );
  await openai.realtime.calls.accept(callId, initialConfig);
}

// Steps 2 and 3: attach a session over the SIP transport and listen for events.
export async function attachRealtimeSession(
  callId: string,
): Promise<RealtimeSession> {
  const session = new RealtimeSession(agent, {
    transport: new OpenAIRealtimeSIP(),
    ...sessionOptions,
  });

  session.on('history_added', (item) => {
    console.log('Realtime update:', item.type);
  });

  await session.connect({
    apiKey: process.env.OPENAI_API_KEY!,
    callId,
  });

  return session;
}
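
The example above defines the accept and attach steps but not the code that calls them when the provider webhook fires. The following is a rough sketch of that glue, written without SDK imports so it can be read in isolation. The routeIncomingCall function, the CallHandlers interface, and the event shape (a type of 'realtime.call.incoming' with data.call_id) are assumptions made for illustration; verify the actual webhook payload against your provider's and OpenAI's webhook documentation.

```typescript
// Assumed webhook payload shape; check the real payload before relying on it.
interface IncomingCallEvent {
  type: string;
  data: { call_id: string };
}

// Hypothetical handler pair, e.g. acceptIncomingCall and attachRealtimeSession.
interface CallHandlers {
  accept: (callId: string) => Promise<void>;
  attach: (callId: string) => Promise<unknown>;
}

// Returns true when the event was an incoming call and both steps completed.
async function routeIncomingCall(
  event: IncomingCallEvent,
  handlers: CallHandlers,
): Promise<boolean> {
  if (event.type !== 'realtime.call.incoming') {
    return false; // Not an incoming call; leave it to other handlers.
  }
  const callId = event.data.call_id;
  // Accept first so the call exists before a session tries to attach to it.
  await handlers.accept(callId);
  await handlers.attach(callId);
  return true;
}
```

In a webhook endpoint this might be invoked as routeIncomingCall(event, { accept: acceptIncomingCall, attach: attachRealtimeSession }) once the request signature has been verified.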

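Cloudflare Workers and other workerd runtimes cannot open outbound WebSockets using the global WebSocket constructor. Use the Cloudflare transport from the extensions package, which performs the fetch()-based upgrade internally.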

import { CloudflareRealtimeTransportLayer } from '@openai/agents-extensions';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'My Agent',
});

// Create a transport that connects to OpenAI Realtime via Cloudflare/workerd's fetch-based upgrade.
const cfTransport = new CloudflareRealtimeTransportLayer({
  url: 'wss://api.openai.com/v1/realtime?model=gpt-realtime',
});

const session = new RealtimeSession(agent, {
  // Set your own transport.
  transport: cfTransport,
});

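If you want to use a different speech-to-speech API or have your own custom transport mechanism, you can create your own transport layer by implementing the RealtimeTransportLayer interface and emitting the RealtimeTransportEventTypes events.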

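If you want to use the OpenAI Realtime API but need more direct access to it, you have two options: access the transport layer of an existing session, or use the transport layer on its own.

If you still want to benefit from all of the capabilities of RealtimeSession, you can access the transport layer through session.transport.

The transport layer emits every event it receives under the * event, and you can send raw events using the sendEvent() method.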

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});

session.transport.on('*', (event) => {
  // JSON parsed version of the event received on the connection
});

// Send any valid event as JSON. For example triggering a new response
session.transport.sendEvent({
  type: 'response.create',
  // ...
});
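
Because a single * listener receives every raw event, it usually needs some way to dispatch on the event type. The following is one possible sketch of such a dispatcher, assuming each parsed event carries a string type field, as the response.create example above does. The createEventRouter function and the RawEvent type are hypothetical names introduced here and are not part of the SDK.

```typescript
// Assumed minimal shape of a parsed raw event: a string `type` plus any payload.
type RawEvent = { type: string; [key: string]: unknown };
type RawEventHandler = (event: RawEvent) => void;

// Hypothetical helper: builds one listener that dispatches on event.type.
function createEventRouter(
  handlers: Record<string, RawEventHandler>,
  fallback?: RawEventHandler,
): RawEventHandler {
  // A Map avoids accidental matches on inherited keys such as "constructor".
  const table = new Map<string, RawEventHandler>(Object.entries(handlers));
  return (event) => {
    const handler = table.get(event.type) ?? fallback;
    if (handler) {
      handler(event);
    }
  };
}
```

A router built this way could then be registered with session.transport.on('*', createEventRouter({ ... })), keyed by whichever Realtime event types your application cares about.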

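If you do not need automatic tool execution, guardrails, and similar features, you can also use the transport layer as a "thin" client that only manages the connection and interruptions.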

import { OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const client = new OpenAIRealtimeWebRTC();
const audioBuffer = new ArrayBuffer(0);

await client.connect({
  apiKey: '<api key>',
  model: 'gpt-4o-mini-realtime-preview',
  initialSessionConfig: {
    instructions: 'Speak like a pirate',
    voice: 'ash',
    modalities: ['text', 'audio'],
    inputAudioFormat: 'pcm16',
    outputAudioFormat: 'pcm16',
  },
});

// optionally for WebSockets
client.on('audio', (newAudio) => {});

client.sendAudio(audioBuffer);