Connecting Realtime Agents to Twilio
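Twilio offers a Media Streams API that sends the raw audio from a phone call to a WebSocket server. This setup can be used to connect your voice agents to Twilio. You can use the default Realtime Session transport in `websocket` mode to connect the events coming from Twilio to your Realtime Session. However, this requires you to set the right audio format and to adjust your own interruption timing, because phone calls naturally introduce more latency than a web-based conversation.
To improve the setup experience, we have created a dedicated transport layer that handles the connection to Twilio for you, including interruption handling and audio forwarding.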
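To make the "raw audio over WebSocket" part concrete, the sketch below parses the JSON messages a Media Streams connection delivers. The field names (`event`, `streamSid`, `media.payload`) follow Twilio's documented message format, in which `media.payload` carries base64-encoded audio. The type `TwilioStreamMessage` and the helper `parseTwilioMessage` are hypothetical names introduced only for illustration, and only a subset of each message's fields is modelled. The dedicated transport layer does this work for you, so treat it as background rather than required code.

```typescript
// Illustrative subset of the Twilio Media Streams message shapes.
// Real messages carry more fields (sequence numbers, track info, and so on).
type TwilioStreamMessage =
  | { event: 'connected' }
  | { event: 'start'; streamSid: string }
  | { event: 'media'; streamSid: string; media: { payload: string } }
  | { event: 'stop'; streamSid: string };

// Hypothetical helper: turn one raw WebSocket text frame into a typed message.
// Returns undefined for malformed or unrecognised input instead of throwing.
function parseTwilioMessage(raw: string): TwilioStreamMessage | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof data !== 'object' || data === null) return undefined;

  const msg = data as Record<string, unknown>;
  const event = msg.event;
  const streamSid = msg.streamSid;

  // The initial handshake message has no stream identifier yet.
  if (event === 'connected') return { event: 'connected' };

  // Every other message handled here must identify its stream.
  if (typeof streamSid !== 'string') return undefined;
  if (event === 'start') return { event: 'start', streamSid };
  if (event === 'stop') return { event: 'stop', streamSid };

  if (event === 'media') {
    const media = msg.media;
    if (typeof media === 'object' && media !== null) {
      // The payload stays base64-encoded; decoding is left to the caller.
      const payload = (media as Record<string, unknown>).payload;
      if (typeof payload === 'string') {
        return { event: 'media', streamSid, media: { payload } };
      }
    }
  }
  return undefined;
}
```

Returning `undefined` rather than throwing keeps a single malformed frame from tearing down the call, which is one reasonable choice for a long-lived audio stream.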
- Make sure you have a Twilio account and a Twilio phone number.
- Set up a WebSocket server that can receive events from Twilio. If you are developing locally, you will need to configure a local tunnel such as ngrok or Cloudflare Tunnel so that Twilio can reach your local server. You can use the `TwilioRealtimeTransportLayer` to connect to Twilio.
- Install the Twilio adapter by installing the extensions package:

  ```bash
  npm install @openai/agents-extensions
  ```

- Import the adapter and model to connect to your `RealtimeSession`:

  ```typescript
  import { TwilioRealtimeTransportLayer } from '@openai/agents-extensions';
  import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

  const agent = new RealtimeAgent({
    name: 'My Agent',
  });

  // Create a new transport mechanism that will bridge the connection between Twilio and
  // the OpenAI Realtime API.
  // `websocketConnection` is the WebSocket connection your server received from Twilio.
  const twilioTransport = new TwilioRealtimeTransportLayer({
    twilioWebSocket: websocketConnection,
  });

  const session = new RealtimeSession(agent, {
    // set your own transport
    transport: twilioTransport,
  });
  ```

- Connect your `RealtimeSession` to Twilio:

  ```typescript
  session.connect({ apiKey: 'your-openai-api-key' });
  ```
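The steps above leave implicit how Twilio finds your WebSocket server: the webhook configured for your phone number returns TwiML whose `<Connect><Stream>` element points at the server's `wss://` URL. The sketch below builds such a response. `buildStreamTwiml` and `escapeXmlAttribute` are hypothetical helper names, and the `/media-stream` path is an assumed route that you would replace with your own.

```typescript
// Escape characters that may not appear verbatim inside an XML attribute value.
function escapeXmlAttribute(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build TwiML that connects an incoming call to a media stream.
// `host` is the public hostname of your server or tunnel (assumption: served over TLS,
// so the stream URL uses the wss:// scheme). `path` is an assumed WebSocket route.
function buildStreamTwiml(host: string, path = '/media-stream'): string {
  const url = escapeXmlAttribute(`wss://${host}${path}`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${url}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```

When developing locally, `host` would be the hostname your tunnel assigns, for example the value of the incoming request's `Host` header.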
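Every event and behavior that you would expect from a `RealtimeSession` works as expected, including tool calls, guardrails, and more. Read the voice agents overview for more information on how to use the `RealtimeSession` with voice agents.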
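Tips and considerations
- Speed matters. To receive all the necessary events and audio from Twilio, create your `TwilioRealtimeTransportLayer` instance as soon as you have a reference to the WebSocket connection, and call `session.connect()` immediately afterwards.
- Access the raw Twilio events. To access the raw events sent by Twilio, listen to the `transport_event` event on your `RealtimeSession` instance. Every event from Twilio has the type `twilio_message` and a `message` property that contains the raw event data.
- Watch the debug logs. Sometimes you may run into issues where you need more information. Setting the environment variable `DEBUG=openai-agents*` shows all debug logs from the Agents SDK. Alternatively, you can enable debug logs for the Twilio adapter only with `DEBUG=openai-agents:extensions:twilio*`.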
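As a small sketch of the raw-event tip above, the helper below picks the Twilio payload out of a transport event. It relies only on the two facts stated in the text: Twilio events have the type `twilio_message` and carry a `message` property. `TransportEventLike` and `getTwilioMessage` are hypothetical names, and the SDK's real event type is likely richer than this structural stand-in.

```typescript
// Minimal structural stand-in for a transport event; only the two fields
// named in the text are assumed here.
type TransportEventLike = { type: string; message?: unknown };

// Return the raw Twilio event data, or undefined for any other transport event.
function getTwilioMessage(event: TransportEventLike): unknown {
  return event.type === 'twilio_message' ? event.message : undefined;
}

// Hypothetical wiring against an existing RealtimeSession instance:
//
// session.on('transport_event', (event) => {
//   const message = getTwilioMessage(event);
//   if (message !== undefined) {
//     console.log('Raw Twilio event:', message);
//   }
// });
```

Keeping the filter in a standalone function makes it easy to unit test without a live call.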
Full example server
Below is an end-to-end example of a WebSocket server that receives requests from Twilio and forwards them to a `RealtimeSession`.
```typescript
import Fastify from 'fastify';
import type { FastifyInstance, FastifyReply, FastifyRequest } from 'fastify';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';
import {
  RealtimeAgent,
  RealtimeSession,
  backgroundResult,
  tool,
} from '@openai/agents/realtime';
import { TwilioRealtimeTransportLayer } from '@openai/agents-extensions';
import { hostedMcpTool } from '@openai/agents';
import { z } from 'zod';
import process from 'node:process';

// Load environment variables from .env file
dotenv.config();

// Retrieve the OpenAI API key from environment variables. You must have OpenAI Realtime API access.
const { OPENAI_API_KEY } = process.env;
if (!OPENAI_API_KEY) {
  console.error('Missing OpenAI API key. Please set it in the .env file.');
  process.exit(1);
}
const PORT = +(process.env.PORT || 5050);

// Initialize Fastify
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);

const weatherTool = tool({
  name: 'weather',
  description: 'Get the weather in a given location.',
  parameters: z.object({
    location: z.string(),
  }),
  execute: async ({ location }: { location: string }) => {
    return backgroundResult(`The weather in ${location} is sunny.`);
  },
});

const secretTool = tool({
  name: 'secret',
  description: 'A secret tool to tell the special number.',
  parameters: z.object({
    question: z
      .string()
      .describe(
        'The question to ask the secret tool; mainly about the special number.',
      ),
  }),
  execute: async ({ question }: { question: string }) => {
    return `The answer to ${question} is 42.`;
  },
  needsApproval: true,
});

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions:
    'You are a friendly assistant. When you use a tool always first say what you are about to do.',
  tools: [
    hostedMcpTool({
      serverLabel: 'dnd',
    }),
    hostedMcpTool({
      serverLabel: 'deepwiki',
    }),
    secretTool,
    weatherTool,
  ],
});

// Root Route
fastify.get('/', async (_request: FastifyRequest, reply: FastifyReply) => {
  reply.send({ message: 'Twilio Media Stream Server is running!' });
});

// Route for Twilio to handle incoming and outgoing calls
// <Say> punctuation to improve text-to-speech translation
fastify.all(
  '/incoming-call',
  async (request: FastifyRequest, reply: FastifyReply) => {
    const twimlResponse = `
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>O.K. you can start talking!</Say>
    <Connect>
        <Stream url="wss://${request.headers.host}/media-stream" />
    </Connect>
</Response>`.trim();
    reply.type('text/xml').send(twimlResponse);
  },
);

// WebSocket route for media-stream
fastify.register(async (scopedFastify: FastifyInstance) => {
  scopedFastify.get(
    '/media-stream',
    { websocket: true },
    async (connection: any) => {
      const twilioTransportLayer = new TwilioRealtimeTransportLayer({
        twilioWebSocket: connection,
      });

      const session = new RealtimeSession(agent, {
        transport: twilioTransportLayer,
        model: 'gpt-realtime',
        config: {
          audio: {
            output: {
              voice: 'verse',
            },
          },
        },
      });

      session.on('mcp_tools_changed', (tools: { name: string }[]) => {
        const toolNames = tools.map((tool) => tool.name).join(', ');
        console.log(`Available MCP tools: ${toolNames || 'None'}`);
      });

      session.on(
        'tool_approval_requested',
        (_context: unknown, _agent: unknown, approvalRequest: any) => {
          console.log(
            `Approving tool call for ${approvalRequest.approvalItem.rawItem.name}.`,
          );
          session
            .approve(approvalRequest.approvalItem)
            .catch((error: unknown) =>
              console.error('Failed to approve tool call.', error),
            );
        },
      );

      session.on(
        'mcp_tool_call_completed',
        (_context: unknown, _agent: unknown, toolCall: unknown) => {
          console.log('MCP tool call completed.', toolCall);
        },
      );

      await session.connect({
        apiKey: OPENAI_API_KEY,
      });
      console.log('Connected to the OpenAI Realtime API');
    },
  );
});

fastify.listen({ port: PORT }, (err: Error | null) => {
  if (err) {
    console.error(err);
    process.exit(1);
  }
  console.log(`Server is listening on port ${PORT}`);
});

process.on('SIGINT', () => {
  fastify.close();
  process.exit(0);
});
```