Guardrails
Guardrails run in parallel to your agents, allowing you to perform checks and validations on user input or agent output. For example, you may run a lightweight model as a guardrail before invoking an expensive model. If the guardrail detects malicious usage, it can trigger an error and stop the costly model from running.
There are two kinds of guardrails:
- Input guardrails run on the initial user input.
- Output guardrails run on the final agent output.
Input guardrails
Input guardrails run in three steps:
- The guardrail receives the same input passed to the agent.
- The guardrail function executes and returns a GuardrailFunctionOutput wrapped inside an InputGuardrailResult.
- If tripwireTriggered is true, an InputGuardrailTripwireTriggered error is thrown.
Note: Input guardrails are intended for user input, so they only run if the agent is the first agent in the workflow. Guardrails are configured on the agent itself because different agents often require different guardrails.
Output guardrails
Output guardrails follow the same pattern:
- The guardrail receives the output produced by the agent.
- The guardrail function executes and returns a GuardrailFunctionOutput wrapped inside an OutputGuardrailResult.
- If tripwireTriggered is true, an OutputGuardrailTripwireTriggered error is thrown.
Note: Output guardrails only run if the agent is the last agent in the workflow. For realtime voice interactions, see the voice agents guide.
Tripwires
When a guardrail fails, it signals this via a tripwire. As soon as a tripwire is triggered, the runner throws the corresponding error and halts execution.
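The tripwire flow can be sketched without the SDK. The block below is a minimal illustration, not the runner's actual implementation: SketchGuardrail, SketchResult, SketchTripwireError, checkGuardrails, and the tooLong guardrail are all hypothetical names invented for this example, and the guardrails are synchronous for brevity, whereas the SDK's guardrail functions are async.

```typescript
// Local stand-ins for illustration only; these are NOT the SDK's types.
interface SketchFunctionOutput {
  tripwireTriggered: boolean;
  outputInfo?: unknown;
}

interface SketchGuardrail {
  name: string;
  execute: (input: string) => SketchFunctionOutput;
}

// The function output wrapped together with the guardrail that produced it.
interface SketchResult {
  guardrailName: string;
  output: SketchFunctionOutput;
}

// Hypothetical error type, playing the role of the tripwire errors.
class SketchTripwireError extends Error {
  result: SketchResult;

  constructor(result: SketchResult) {
    super('Guardrail "' + result.guardrailName + '" triggered its tripwire');
    this.name = 'SketchTripwireError';
    this.result = result;
  }
}

// Runs each guardrail on the input and throws at the first tripwire,
// so nothing after the failing guardrail gets to run.
function checkGuardrails(
  input: string,
  guardrails: SketchGuardrail[],
): SketchResult[] {
  const results: SketchResult[] = [];
  for (const guardrail of guardrails) {
    const result: SketchResult = {
      guardrailName: guardrail.name,
      output: guardrail.execute(input),
    };
    if (result.output.tripwireTriggered) {
      throw new SketchTripwireError(result);
    }
    results.push(result);
  }
  return results;
}

// Example guardrail (hypothetical): trips on inputs longer than 40 characters.
const tooLong: SketchGuardrail = {
  name: 'Length guardrail',
  execute: (input) => ({
    tripwireTriggered: input.length > 40,
    outputInfo: { length: input.length },
  }),
};
```

Calling checkGuardrails('Hi there', [tooLong]) returns one result, while a longer input throws SketchTripwireError before any later guardrail runs. Attaching the result to the error lets a caller inspect outputInfo to see why the tripwire fired.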
Implementing a guardrail
A guardrail is simply a function that returns a GuardrailFunctionOutput. Below is a minimal example that checks whether the user is asking for math homework help by running another agent under the hood.
import {
  Agent,
  run,
  InputGuardrailTripwireTriggered,
  InputGuardrail,
} from '@openai/agents';
import { z } from 'zod';

const guardrailAgent = new Agent({
  name: 'Guardrail check',
  instructions: 'Check if the user is asking you to do their math homework.',
  outputType: z.object({
    isMathHomework: z.boolean(),
    reasoning: z.string(),
  }),
});

const mathGuardrail: InputGuardrail = {
  name: 'Math Homework Guardrail',
  execute: async ({ input, context }) => {
    const result = await run(guardrailAgent, input, { context });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.isMathHomework ?? false,
    };
  },
};

const agent = new Agent({
  name: 'Customer support agent',
  instructions:
    'You are a customer support agent. You help customers with their questions.',
  inputGuardrails: [mathGuardrail],
});

async function main() {
  try {
    await run(agent, 'Hello, can you help me solve for x: 2x + 3 = 11?');
    console.log("Guardrail didn't trip - this is unexpected");
  } catch (e) {
    if (e instanceof InputGuardrailTripwireTriggered) {
      console.log('Math homework guardrail tripped');
    }
  }
}

main().catch(console.error);
Output guardrails work the same way.
import {
  Agent,
  run,
  OutputGuardrailTripwireTriggered,
  OutputGuardrail,
} from '@openai/agents';
import { z } from 'zod';

// The output by the main agent
const MessageOutput = z.object({ response: z.string() });
type MessageOutput = z.infer<typeof MessageOutput>;

// The output by the math guardrail agent
const MathOutput = z.object({ reasoning: z.string(), isMath: z.boolean() });

// The guardrail agent
const guardrailAgent = new Agent({
  name: 'Guardrail check',
  instructions: 'Check if the output includes any math.',
  outputType: MathOutput,
});

// An output guardrail using an agent internally
const mathGuardrail: OutputGuardrail<typeof MessageOutput> = {
  name: 'Math Guardrail',
  async execute({ agentOutput, context }) {
    const result = await run(guardrailAgent, agentOutput.response, {
      context,
    });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.isMath ?? false,
    };
  },
};

const agent = new Agent({
  name: 'Support agent',
  instructions:
    'You are a user support agent. You help users with their questions.',
  outputGuardrails: [mathGuardrail],
  outputType: MessageOutput,
});

async function main() {
  try {
    const input = 'Hello, can you help me solve for x: 2x + 3 = 11?';
    await run(agent, input);
    console.log("Guardrail didn't trip - this is unexpected");
  } catch (e) {
    if (e instanceof OutputGuardrailTripwireTriggered) {
      console.log('Math output guardrail tripped');
    }
  }
}

main().catch(console.error);
- guardrailAgent is used inside the guardrail functions.
- The guardrail function receives the agent input or output and returns the result.
- Extra information can be included in the guardrail result.
- agent defines the actual workflow where guardrails are applied.
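Because a guardrail is just a function that returns an object with tripwireTriggered and outputInfo, it does not have to call a model at all. The following is a minimal sketch of a keyword-based check, written against a local stand-in type so it runs on its own. BLOCKED_TERMS, findBlockedTerms, and SketchKeywordOutput are hypothetical names for this example, and it assumes the input arrives as a plain string; whether such an object can be passed directly to inputGuardrails should be checked against the SDK's InputGuardrail type.

```typescript
// Structural stand-in for the value a guardrail returns; the field names
// follow the examples above, but this is not the SDK's own type.
interface SketchKeywordOutput {
  tripwireTriggered: boolean;
  outputInfo: { matchedTerms: string[] };
}

// Hypothetical block list, for illustration only.
const BLOCKED_TERMS = ['solve for x', 'homework'];

// Returns every blocked term found in the input, ignoring case.
function findBlockedTerms(input: string): string[] {
  const lowered = input.toLowerCase();
  return BLOCKED_TERMS.filter((term) => lowered.includes(term));
}

// Shaped like the guardrails above: a name plus an async execute function.
// Assumption: the input is a plain string.
const keywordGuardrail = {
  name: 'Keyword guardrail',
  execute: async ({ input }: { input: string }): Promise<SketchKeywordOutput> => {
    const matchedTerms = findBlockedTerms(input);
    return {
      // Extra information for whoever handles the tripwire error.
      outputInfo: { matchedTerms },
      tripwireTriggered: matchedTerms.length > 0,
    };
  },
};
```

A cheap check like this trades accuracy for speed and cost: it catches only exact phrases, so it would more plausibly sit in front of a model-based guardrail than replace one.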