
RealtimeAgent

A specialized agent instance that is meant to be used within a RealtimeSession to build voice agents. Due to the nature of this agent, some configuration options supported by regular Agent instances are not available here. For example:

  • model choice is not supported as all RealtimeAgents will be handled by the same model within a RealtimeSession
  • modelSettings is not supported as all RealtimeAgents will be handled by the same model within a RealtimeSession
  • outputType is not supported as RealtimeAgents do not support structured outputs
  • toolUseBehavior is not supported as all RealtimeAgents will be handled by the same model within a RealtimeSession
  • voice can be configured per agent; however, it cannot be changed after the first agent in a RealtimeSession has spoken
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant that can answer questions and help with tasks.',
});

const session = new RealtimeSession(agent);
Type Parameter   Default type
TContext         UnknownContext

new RealtimeAgent<TContext>(config): RealtimeAgent<TContext>

Parameter   Type
config      RealtimeAgentConfiguration<TContext>

Returns: RealtimeAgent<TContext>

Overrides: Agent<RealtimeContextData<TContext>, TextOutput>.constructor
handoffDescription: string;

A description of the agent. This is used when the agent is used as a handoff, so that an LLM knows what it does and when to invoke it.

Agent.handoffDescription

handoffs: (Agent<any, "text"> | Handoff<any, "text">)[];

Handoffs are sub-agents that the agent can delegate to. You can provide a list of handoffs, and the agent can choose to delegate to them if relevant. Allows for separation of concerns and modularity.

Agent.handoffs

inputGuardrails: InputGuardrail[];

A list of checks that run in parallel to the agent’s execution, before generating a response. Runs only if the agent is the first agent in the chain.

Agent.inputGuardrails

instructions: string | (runContext, agent) => string | Promise<string>;

The instructions for the agent. Will be used as the “system prompt” when this agent is invoked. Describes what the agent should do, and how it responds.

Can either be a string, or a function that dynamically generates instructions for the agent. If you provide a function, it will be called with the context and the agent instance. It must return a string.

Agent.instructions
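Since instructions can be a function, the system prompt can be derived from the run context at invocation time. The sketch below illustrates the documented contract; the RunContextLike shape and the buildInstructions helper are illustrative stand-ins, not SDK exports:

```typescript
// Illustrative shape of the run context passed to an instructions function;
// the real RunContext type comes from the SDK.
type RunContextLike = { context: { userName: string } };

// Hypothetical dynamic-instructions function: given the run context (and the
// agent instance), it must return a string or a Promise<string>.
function buildInstructions(runContext: RunContextLike): string {
  return `You are a helpful assistant. Address the user as ${runContext.context.userName}.`;
}

// Wiring it up (sketch):
// const agent = new RealtimeAgent({ name: 'greeter', instructions: buildInstructions });
```

Because the function receives the context, the same agent definition can produce different system prompts per session.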

mcpServers: MCPServer[];

A list of Model Context Protocol servers the agent can use. Every time the agent runs, it will include tools from these servers in the list of available tools.

NOTE: You are expected to manage the lifecycle of these servers. Specifically, you must call server.connect() before passing it to the agent, and server.cleanup() when the server is no longer needed.

Agent.mcpServers
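The lifecycle rule above (connect() before use, cleanup() after) can be enforced with a small wrapper. A minimal sketch; FakeMCPServer and withMcpServer are illustrative stand-ins, not SDK classes:

```typescript
// Stand-in for an MCP server, for illustration only; the real servers come
// from the SDK and talk to an actual Model Context Protocol backend.
class FakeMCPServer {
  connected = false;
  async connect(): Promise<void> { this.connected = true; }
  async cleanup(): Promise<void> { this.connected = false; }
}

// Hypothetical helper that brackets a unit of work with the documented
// lifecycle: connect before the agent uses the server, cleanup afterwards.
async function withMcpServer<T>(
  server: FakeMCPServer,
  work: (server: FakeMCPServer) => Promise<T>,
): Promise<T> {
  await server.connect(); // must happen before passing the server to the agent
  try {
    return await work(server);
  } finally {
    await server.cleanup(); // release resources even if work() throws
  }
}
```

The try/finally guarantees cleanup() runs even when the agent run fails, which is the failure mode the NOTE above is warning about.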

model: string | Model;

The model implementation to use when invoking the LLM. By default, if not set, the agent will use the default model configured in modelSettings.defaultModel

Agent.model

modelSettings: ModelSettings;

Configures model-specific tuning parameters (e.g. temperature, top_p, etc.)

Agent.modelSettings

name: string;

The name of the agent.

Agent.name

outputGuardrails: OutputGuardrail<AgentOutputType<unknown>>[];

A list of checks that run on the final output of the agent, after generating a response. Runs only if the agent produces a final output.

Agent.outputGuardrails

outputType: "text";

The type of the output object. If not provided, the output will be a string.

Agent.outputType

resetToolChoice: boolean;

Whether to reset the tool choice to the default value after a tool has been called. Defaults to true. This ensures that the agent doesn’t enter an infinite loop of tool usage.

Agent.resetToolChoice

tools: Tool<RealtimeContextData<TContext>>[];

A list of tools the agent can use.

Agent.tools

toolUseBehavior: ToolUseBehavior;

This lets you configure how tool use is handled.

  • run_llm_again: The default behavior. Tools are run, and then the LLM receives the results and gets to respond.
  • stop_on_first_tool: The output of the first tool call is used as the final output. This means that the LLM does not process the result of the tool call.
  • A list of tool names: The agent will stop running if any of the tools in the list are called. The final output will be the output of the first matching tool call. The LLM does not process the result of the tool call.
  • A function: if you pass a function, it will be called with the run context and the list of tool results. It must return a ToolsToFinalOutputResult, which determines whether the tool call resulted in a final output.

NOTE: This configuration is specific to FunctionTools. Hosted tools, such as file search, web search, etc. are always processed by the LLM.

Agent.toolUseBehavior

readonly voice: string;

The voice intended to be used by the agent. If another agent has already spoken during the RealtimeSession, changing the voice during a handoff will fail.

get outputSchemaName(): string

Output schema name.

Returns: string

Agent.outputSchemaName
asTool(options): FunctionTool

Transform this agent into a tool, callable by other agents.

This is different from handoffs in two ways:

  1. In handoffs, the new agent receives the conversation history. In this tool, the new agent receives generated input.
  2. In handoffs, the new agent takes over the conversation. In this tool, the new agent is called as a tool, and the conversation is continued by the original agent.
options: { customOutputExtractor: (output) => string | Promise<string>; toolDescription: string; toolName: string; }

Options for the tool.

options.customOutputExtractor?: (output) => string | Promise<string>

A function that extracts the output text from the agent. If not provided, the last message from the agent will be used.

options.toolDescription?: string

The description of the tool, which should indicate what the tool does and when to use it.

options.toolName?: string

The name of the tool. If not provided, the name of the agent will be used.

Returns: FunctionTool

A tool that runs the agent and returns the output text.

Agent.asTool
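The customOutputExtractor option above takes the agent's output and reduces it to the string the calling agent sees. A minimal sketch of such an extractor; MessageLike, lastAssistantMessage, and the translator agent names are illustrative, not SDK exports:

```typescript
// Illustrative message shape; the real output items come from the SDK.
type MessageLike = { role: 'user' | 'assistant'; content: string };

// Hypothetical custom output extractor: mirrors the documented default of
// using the agent's last message, here restricted to assistant messages.
function lastAssistantMessage(output: MessageLike[]): string {
  for (let i = output.length - 1; i >= 0; i--) {
    const item = output[i];
    if (item.role === 'assistant') return item.content;
  }
  return ''; // no assistant message produced
}

// Wiring it up (sketch, names are illustrative):
// const tool = translator.asTool({
//   toolName: 'translate_to_spanish',
//   toolDescription: 'Translate the provided text to Spanish.',
//   customOutputExtractor: lastAssistantMessage,
// });
```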

clone(config): Agent<RealtimeContextData<TContext>, "text">

Makes a copy of the agent, with the given arguments changed. For example, you could do:

const newAgent = agent.clone({ instructions: 'New instructions' })
Parameter   Type
config      Partial<AgentConfiguration<RealtimeContextData<TContext>, "text">>

A partial configuration to change.

Returns: Agent<RealtimeContextData<TContext>, "text">

A new agent with the given changes.

Agent.clone

emit<K>(type, ...args): boolean
Type Parameter

K extends keyof AgentHookEvents<RealtimeContextData<TContext>, "text">

Parameter   Type
type        K
args        AgentHookEvents<RealtimeContextData<TContext>, "text">[K]

Returns: boolean

Agent.emit

getAllTools(): Promise<Tool<RealtimeContextData<TContext>>[]>

All agent tools, including the MCP tools and function tools.

Returns: Promise<Tool<RealtimeContextData<TContext>>[]>

all configured tools

Agent.getAllTools

getMcpTools(): Promise<Tool<RealtimeContextData<TContext>>[]>

Fetches the available tools from the MCP servers.

Returns: Promise<Tool<RealtimeContextData<TContext>>[]>

the MCP powered tools

Agent.getMcpTools

getSystemPrompt(runContext): Promise<undefined | string>

Returns the system prompt for the agent.

If the agent has a function as its instructions, this function will be called with the runContext and the agent instance.

Parameter    Type
runContext   RunContext<RealtimeContextData<TContext>>

Returns: Promise<undefined | string>

Agent.getSystemPrompt
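The string-versus-function resolution described above can be sketched as a small helper. InstructionsLike and resolveInstructions are illustrative names, not SDK exports; the real method also passes the agent instance to the function:

```typescript
// Sketch of the documented resolution rule: a string is returned as-is,
// a function is invoked with the run context.
type InstructionsLike<C> = string | ((ctx: C) => string | Promise<string>);

async function resolveInstructions<C>(
  instructions: InstructionsLike<C> | undefined,
  ctx: C,
): Promise<string | undefined> {
  if (instructions === undefined) return undefined; // no instructions configured
  return typeof instructions === 'function' ? instructions(ctx) : instructions;
}
```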

off<K>(type, listener): EventEmitter<EventTypes>
Type Parameter

K extends keyof AgentHookEvents<RealtimeContextData<TContext>, "text">

Parameter   Type
type        K
listener    (…args) => void

Returns: EventEmitter<EventTypes>

Agent.off

on<K>(type, listener): EventEmitter<EventTypes>
Type Parameter

K extends keyof AgentHookEvents<RealtimeContextData<TContext>, "text">

Parameter   Type
type        K
listener    (…args) => void

Returns: EventEmitter<EventTypes>

Agent.on

once<K>(type, listener): EventEmitter<EventTypes>
Type Parameter

K extends keyof AgentHookEvents<RealtimeContextData<TContext>, "text">

Parameter   Type
type        K
listener    (…args) => void

Returns: EventEmitter<EventTypes>

Agent.once

processFinalOutput(output): string

Processes the final output of the agent.

Parameter   Type
output      string

The output of the agent.

Returns: string

The parsed output.

Agent.processFinalOutput

toJSON(): object

Returns a JSON representation of the agent, which is serializable.

Returns: object

A JSON object containing the agent’s name.

name: string;
Agent.toJSON

static create<TOutput, Handoffs>(config): Agent<unknown, TOutput | HandoffsOutputUnion<Handoffs>>

Create an Agent with handoffs and automatically infer the union type for TOutput from the handoff agents’ output types.

Type Parameter                                                      Default type
TOutput extends AgentOutputType<unknown>                            "text"
Handoffs extends readonly (Agent<any, any> | Handoff<any, any>)[]   []

Parameter   Type
config      AgentConfigWithHandoffs<TOutput, Handoffs>

Returns: Agent<unknown, TOutput | HandoffsOutputUnion<Handoffs>>

Agent.create