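Realtime Transport

Default transport layers

Connecting over WebRTC

The default transport layer uses WebRTC. Audio is recorded from the microphone and played back automatically.

To use your own media stream or audio element, pass an OpenAIRealtimeWebRTC instance when creating the session.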

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

async function main() {
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });

  const customSession = new RealtimeSession(agent, { transport });
}

Connecting over WebSocket

To use a WebSocket connection instead of WebRTC, pass transport: 'websocket' or an instance of OpenAIRealtimeWebSocket when creating the session. This suits server-side use cases, for example building a phone agent with Twilio.

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const myRecordedArrayBuffer = new ArrayBuffer(0);

const wsSession = new RealtimeSession(agent, {
  transport: 'websocket',
  model: 'gpt-realtime',
});
await wsSession.connect({ apiKey: process.env.OPENAI_API_KEY! });

wsSession.on('audio', (event) => {
  // event.data is a chunk of PCM16 audio
});

wsSession.sendAudio(myRecordedArrayBuffer);

You can use any recording/playback library to handle the raw PCM16 audio bytes.
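Many audio libraries, including the Web Audio API, work with 32-bit float samples rather than PCM16, so a conversion step is often needed before calling sendAudio() or after receiving an audio event. The helpers below are a minimal sketch of that conversion, assuming mono, little-endian 16-bit samples. The function names floatToPcm16 and pcm16ToFloat are hypothetical and not part of the SDK, and the sketch does not resample, so the sample rate must already match what your session expects.

```typescript
// Hypothetical helpers (not part of the SDK): convert between Float32 samples
// in the range [-1, 1] and little-endian PCM16 bytes.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp so out-of-range samples do not wrap around when stored as int16.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale by 32768, positive values by 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, scaled, true);
  });
  return buffer;
}

function pcm16ToFloat(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    const value = view.getInt16(i * 2, true);
    out[i] = value < 0 ? value / 0x8000 : value / 0x7fff;
  }
  return out;
}
```

With helpers like these, captured samples could be forwarded with wsSession.sendAudio(floatToPcm16(samples)), and the event.data chunk from the audio event could be passed through pcm16ToFloat before playback.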

Connecting over SIP

Use the OpenAIRealtimeSIP transport to bridge SIP calls from providers such as Twilio. The transport keeps the Realtime session synchronized with the SIP events emitted by your telephony provider.

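1. Accept the incoming call by generating an initial session configuration with OpenAIRealtimeSIP.buildInitialConfig(). This ensures the SIP invitation and the Realtime session share the same defaults.
2. Attach a RealtimeSession that uses the OpenAIRealtimeSIP transport and connect with the callId issued by the provider's webhook.
3. Listen for session events to drive call analytics, transcripts, or escalation logic.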

import OpenAI from 'openai';
import {
  OpenAIRealtimeSIP,
  RealtimeAgent,
  RealtimeSession,
  type RealtimeSessionOptions,
} from '@openai/agents/realtime';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  webhookSecret: process.env.OPENAI_WEBHOOK_SECRET!,
});

const agent = new RealtimeAgent({
  name: 'Receptionist',
  instructions:
    'Welcome the caller, answer scheduling questions, and hand off if the caller requests a human.',
});

const sessionOptions: Partial<RealtimeSessionOptions> = {
  model: 'gpt-realtime',
  config: {
    audio: {
      input: {
        turnDetection: { type: 'semantic_vad', interruptResponse: true },
      },
    },
  },
};

export async function acceptIncomingCall(callId: string): Promise<void> {
  const initialConfig = await OpenAIRealtimeSIP.buildInitialConfig(
    agent,
    sessionOptions,
  );
  await openai.realtime.calls.accept(callId, initialConfig);
}

export async function attachRealtimeSession(
  callId: string,
): Promise<RealtimeSession> {
  const session = new RealtimeSession(agent, {
    transport: new OpenAIRealtimeSIP(),
    ...sessionOptions,
  });

  session.on('history_added', (item) => {
    console.log('Realtime update:', item.type);
  });

  await session.connect({
    apiKey: process.env.OPENAI_API_KEY!,
    callId,
  });

  return session;
}

Note on Cloudflare Workers (workerd)

Cloudflare Workers and other workerd runtimes cannot open outbound WebSockets using the global WebSocket constructor. Use the Cloudflare transport from the extensions package instead, which performs a fetch()-based upgrade internally.

import { CloudflareRealtimeTransportLayer } from '@openai/agents-extensions';
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'My Agent',
});

// Create a transport that connects to OpenAI Realtime via Cloudflare/workerd's fetch-based upgrade.
const cfTransport = new CloudflareRealtimeTransportLayer({
  url: 'wss://api.openai.com/v1/realtime?model=gpt-realtime',
});

const session = new RealtimeSession(agent, {
  // Set your own transport.
  transport: cfTransport,
});

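Building your own transport mechanism

If you want to use a different speech-to-speech API or have your own transport mechanism, you can build one by implementing the RealtimeTransportLayer interface and emitting the RealtimeTransportEventTypes events.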
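To illustrate the event-emitting half of such a transport, here is a minimal, self-contained sketch. It does not implement the SDK's RealtimeTransportLayer interface, which requires more members than shown here. The SketchTransport class, its receive() method, and the RawEvent type are all hypothetical names used only to show how incoming messages could be fanned out to a wildcard listener and to type-specific listeners.

```typescript
// Hypothetical shapes for illustration; the SDK's RealtimeTransportEventTypes
// defines the real event names and payloads.
type RawEvent = { type: string; [key: string]: unknown };
type Listener = (event: RawEvent) => void;

class SketchTransport {
  private listeners = new Map<string, Listener[]>();

  // Register a listener for one event type, or '*' for every event.
  on(type: string, listener: Listener): void {
    const list = this.listeners.get(type) ?? [];
    list.push(listener);
    this.listeners.set(type, list);
  }

  private emit(type: string, event: RawEvent): void {
    for (const listener of this.listeners.get(type) ?? []) {
      listener(event);
    }
  }

  // Call this from your own connection code whenever the backend
  // delivers a JSON message.
  receive(raw: string): void {
    const event = JSON.parse(raw) as RawEvent;
    this.emit('*', event); // wildcard listeners see every event
    this.emit(event.type, event); // then listeners for the specific type
  }
}
```

A real implementation would also need to cover connecting, sending events and audio, and closing the connection, as required by the interface definition in the SDK.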

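Interacting with the Realtime API more directly

If you want to use the OpenAI Realtime API but need more direct access to it, you have two options.

Option 1 - Accessing the transport layer

If you still want to benefit from all of the capabilities of RealtimeSession, you can access the transport layer through session.transport.

The transport layer emits every event it receives under the * event, and you can send raw events using the sendEvent() method.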

import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

const session = new RealtimeSession(agent, {
  model: 'gpt-realtime',
});

session.transport.on('*', (event) => {
  // JSON parsed version of the event received on the connection
});

// Send any valid event as JSON. For example triggering a new response
session.transport.sendEvent({
  type: 'response.create',
  // ...
});
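Because the * listener receives every event, it is often convenient to dispatch on the event's type field. The router below is a minimal sketch of that pattern. createEventRouter and RealtimeEventLike are hypothetical names, not SDK exports, and the only assumption about the payload is that it is a parsed JSON object with a string type property.

```typescript
// Hypothetical helper (not part of the SDK): route parsed events by `type`.
type RealtimeEventLike = { type: string; [key: string]: unknown };
type EventHandler = (event: RealtimeEventLike) => void;

function createEventRouter(
  handlers: Record<string, EventHandler>,
  fallback: EventHandler = () => {},
): EventHandler {
  return (event) => {
    // Only use handlers defined directly on the table, so inherited keys
    // such as "constructor" are never treated as handlers.
    const own = Object.prototype.hasOwnProperty.call(handlers, event.type);
    const handler = (own ? handlers[event.type] : undefined) ?? fallback;
    handler(event);
  };
}
```

The returned function could be registered with session.transport.on('*', createEventRouter({ ... })), keyed by whichever Realtime API event types you care about, such as 'response.done' or 'error'.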

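Option 2 - Using only the transport layer

If you don't need automatic tool execution, guardrails, and similar features, you can use the transport layer as a "thin" client that only manages the connection and interruptions.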

import { OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const client = new OpenAIRealtimeWebRTC();
const audioBuffer = new ArrayBuffer(0);

await client.connect({
  apiKey: '<api key>',
  model: 'gpt-4o-mini-realtime-preview',
  initialSessionConfig: {
    instructions: 'Speak like a pirate',
    voice: 'ash',
    modalities: ['text', 'audio'],
    inputAudioFormat: 'pcm16',
    outputAudioFormat: 'pcm16',
  },
});

// Optional: when using a WebSocket transport, handle incoming audio chunks yourself
client.on('audio', (newAudio) => {});

client.sendAudio(audioBuffer);