Guardrails

Guardrails run in parallel to your agents, allowing you to perform checks and validations on user input or agent output. For example, you may run a lightweight model as a guardrail before invoking an expensive model. If the guardrail detects malicious usage, it can trigger an error and stop the costly model from running.

There are two kinds of guardrails:

Input guardrails run on the initial user input.
Output guardrails run on the final agent output.

Input guardrails

Input guardrails run in three steps:

The guardrail receives the same input passed to the agent.
The guardrail function executes and returns a GuardrailFunctionOutput wrapped inside an InputGuardrailResult.
If tripwireTriggered is true, an InputGuardrailTripwireTriggered error is thrown.

Note Input guardrails are intended for user input, so they only run if the agent is the first agent in the workflow. Guardrails are configured on the agent itself because different agents often require different guardrails.

Output guardrails

Output guardrails follow the same pattern:

The guardrail receives the same input passed to the agent.
The guardrail function executes and returns a GuardrailFunctionOutput wrapped inside an OutputGuardrailResult.
If tripwireTriggered is true, an OutputGuardrailTripwireTriggered error is thrown.

Note Output guardrails only run if the agent is the last agent in the workflow. For realtime voice interactions see the voice agents guide.

Tripwires

When a guardrail fails, it signals this via a tripwire. As soon as a tripwire is triggered, the runner throws the corresponding error and halts execution.

Implementing a guardrail

A guardrail is simply a function that returns a GuardrailFunctionOutput. Below is a minimal example that checks whether the user is asking for math homework help by running another agent under the hood.

import {
  Agent,
  run,
  InputGuardrailTripwireTriggered,
  InputGuardrail,
} from '@openai/agents';
import { z } from 'zod';

const guardrailAgent = new Agent({
  name: 'Guardrail check',
  instructions: 'Check if the user is asking you to do their math homework.',
  outputType: z.object({
    isMathHomework: z.boolean(),
    reasoning: z.string(),
  }),
});

const mathGuardrail: InputGuardrail = {
  name: 'Math Homework Guardrail',
  execute: async ({ input, context }) => {
    const result = await run(guardrailAgent, input, { context });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.isMathHomework ?? false,
    };
  },
};

const agent = new Agent({
  name: 'Customer support agent',
  instructions:
    'You are a customer support agent. You help customers with their questions.',
  inputGuardrails: [mathGuardrail],
});

async function main() {
  try {
    await run(agent, 'Hello, can you help me solve for x: 2x + 3 = 11?');
    console.log("Guardrail didn't trip - this is unexpected");
  } catch (e) {
    if (e instanceof InputGuardrailTripwireTriggered) {
      console.log('Math homework guardrail tripped');
    }
  }
}

main().catch(console.error);

Output guardrails work the same way.

import {
  Agent,
  run,
  OutputGuardrailTripwireTriggered,
  OutputGuardrail,
} from '@openai/agents';
import { z } from 'zod';

// The output by the main agent
const MessageOutput = z.object({ response: z.string() });
type MessageOutput = z.infer<typeof MessageOutput>;

// The output by the math guardrail agent
const MathOutput = z.object({ reasoning: z.string(), isMath: z.boolean() });

// The guardrail agent
const guardrailAgent = new Agent({
  name: 'Guardrail check',
  instructions: 'Check if the output includes any math.',
  outputType: MathOutput,
});

// An output guardrail using an agent internally
const mathGuardrail: OutputGuardrail<typeof MessageOutput> = {
  name: 'Math Guardrail',
  async execute({ agentOutput, context }) {
    const result = await run(guardrailAgent, agentOutput.response, {
      context,
    });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.isMath ?? false,
    };
  },
};

const agent = new Agent({
  name: 'Support agent',
  instructions:
    'You are a user support agent. You help users with their questions.',
  outputGuardrails: [mathGuardrail],
  outputType: MessageOutput,
});

async function main() {
  try {
    const input = 'Hello, can you help me solve for x: 2x + 3 = 11?';
    await run(agent, input);
    console.log("Guardrail didn't trip - this is unexpected");
  } catch (e) {
    if (e instanceof OutputGuardrailTripwireTriggered) {
      console.log('Math output guardrail tripped');
    }
  }
}

main().catch(console.error);

guardrailAgent is used inside the guardrail functions.
The guardrail function receives the agent input or output and returns the result.
Extra information can be included in the guardrail result.
agent defines the actual workflow where guardrails are applied.