Quickstart
Get started with Guardrails in minutes. Guardrails provides drop-in replacements for OpenAI clients that automatically validate inputs and outputs using configurable safety checks.
Install
pip install openai-guardrails
Set API Key
export OPENAI_API_KEY=sk-...
Create Pipeline Configuration
The fastest way is to use the Guardrails Wizard, a no-code tool for creating configurations.
Or define manually:
{
  "version": 1,
  "input": {
    "version": 1,
    "guardrails": [
      {"name": "URL Filter", "config": {}},
      {"name": "Moderation", "config": {"categories": ["hate", "violence"]}}
    ]
  },
  "output": {
    "version": 1,
    "guardrails": [
      {"name": "Contains PII", "config": {"entities": ["EMAIL_ADDRESS", "PHONE_NUMBER"]}}
    ]
  }
}
Pipeline Stages
Guardrails uses a pipeline configuration with one to three stages:
- Preflight - Runs before the LLM call (e.g., mask PII, moderation)
- Input - Runs in parallel with the LLM call (e.g., jailbreak detection)
- Output - Runs over the LLM generated content (e.g., fact checking, compliance)
Not all stages are required: you can use just input, just output, or any combination. A sketch of a configuration that uses all three stages is shown below.
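For illustration only, a configuration that also runs a pre-flight check might look like the following. This is a hedged sketch: the pre_flight key name and the choice of reusing Contains PII at that stage are assumptions not confirmed by this quickstart, so check the configuration reference for the exact schema.
{
  "version": 1,
  "pre_flight": {
    "version": 1,
    "guardrails": [
      {"name": "Contains PII", "config": {"entities": ["EMAIL_ADDRESS"]}}
    ]
  },
  "input": {
    "version": 1,
    "guardrails": [
      {"name": "Moderation", "config": {"categories": ["hate", "violence"]}}
    ]
  },
  "output": {
    "version": 1,
    "guardrails": [
      {"name": "Contains PII", "config": {"entities": ["EMAIL_ADDRESS", "PHONE_NUMBER"]}}
    ]
  }
}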
Use as Drop-in Replacement
Replace your OpenAI client with the Guardrails version (GuardrailsAsyncOpenAI or GuardrailsOpenAI):
Both chat.completions.create and responses.create are supported, along with responses.parse for structured outputs.
import asyncio
from pathlib import Path

from guardrails import GuardrailsAsyncOpenAI, GuardrailTripwireTriggered


async def main():
    # Use GuardrailsAsyncOpenAI instead of AsyncOpenAI
    client = GuardrailsAsyncOpenAI(config=Path("guardrails_config.json"))

    try:
        response = await client.responses.create(
            model="gpt-5",
            input="Hello world"
        )
        # Access OpenAI response attributes directly
        print(response.output_text)
    except GuardrailTripwireTriggered as exc:
        print(f"Guardrail triggered: {exc.guardrail_result.info}")

asyncio.run(main())
That's it! Your existing OpenAI code now includes automatic guardrail validation based on your pipeline configuration. The response object acts as a drop-in replacement for OpenAI responses with added guardrail results.
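responses.parse is handled the same way for structured outputs. A minimal sketch, assuming a hypothetical CalendarEvent Pydantic model and that the wrapped response exposes output_parsed like the underlying OpenAI parsed response:
from pydantic import BaseModel


class CalendarEvent(BaseModel):
    # Hypothetical schema used only for illustration
    name: str
    date: str


response = await client.responses.parse(
    model="gpt-5",
    input="Alice and Bob are meeting on Friday.",
    text_format=CalendarEvent,
)
print(response.output_parsed)  # parsed CalendarEvent instance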
Multi-Turn Conversations
When maintaining conversation history across multiple turns, only append messages after guardrails pass. This prevents blocked input messages from polluting your conversation context.
messages: list[dict] = []

while True:
    user_input = input("You: ")

    try:
        # ✅ Pass user input inline (don't mutate messages first)
        response = await client.chat.completions.create(
            messages=messages + [{"role": "user", "content": user_input}],
            model="gpt-4o"
        )

        response_content = response.choices[0].message.content
        print(f"Assistant: {response_content}")

        # ✅ Only append AFTER guardrails pass
        messages.append({"role": "user", "content": user_input})
        messages.append({"role": "assistant", "content": response_content})

    except GuardrailTripwireTriggered:
        # ❌ Guardrail blocked - message NOT added to history
        print("Message blocked by guardrails")
        continue
Why this matters: If you append the user message before the guardrail check, blocked messages remain in your conversation history and get sent on every subsequent turn, even though they violated your safety policies.
Guardrail Execution Error Handling
Guardrails supports two error handling modes for guardrail execution failures:
Fail-Safe Mode (Default)
If a guardrail fails to execute (e.g., an invalid model name), execution continues and the failed check reports tripwire_triggered=False:
# Default: raise_guardrail_errors=False
client = GuardrailsAsyncOpenAI(config="config.json")
# Execution continues even if a guardrail fails to run
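In fail-safe mode it can still be useful to audit which checks ran. A rough sketch, assuming each result exposes a tripwire_triggered flag and a guardrail_name entry in its info dict (both appear elsewhere in this guide):
response = await client.responses.create(model="gpt-5", input="Hello")

# Log every guardrail result; in fail-safe mode a check that errored still
# reports tripwire_triggered=False, so keep the info dict around for debugging.
for gr in response.guardrail_results.all_results:
    print(gr.info.get("guardrail_name"), gr.tripwire_triggered, gr.info)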
Fail-Secure Mode
Enable strict mode to raise exceptions on guardrail execution failures:
# Strict mode: raise_guardrail_errors=True
client = GuardrailsAsyncOpenAI(
    config="config.json",
    raise_guardrail_errors=True
)
# Raises exceptions if guardrails fail to execute properly
Note: This only affects guardrail execution errors. Safety violations (tripwires) are handled separately - see Tripwires for details.
Agents SDK Integration
For OpenAI Agents SDK users, we provide GuardrailAgent as a drop-in replacement:
from pathlib import Path
from guardrails import GuardrailAgent
from agents import Runner
# Create agent with guardrails automatically configured
agent = GuardrailAgent(
    config=Path("guardrails_config.json"),
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
)
# Use exactly like a regular Agent
result = await Runner.run(agent, "Hello, can you help me?")
GuardrailAgent automatically configures guardrails from your pipeline:
- Prompt Injection Detection: Applied at the tool level (before and after each tool call) to ensure tool actions align with user intent
- Other guardrails (PII, Jailbreak, etc.): Applied at the agent level (input/output boundaries)
Tool Violation Handling
By default (block_on_tool_violations=False), tool guardrail violations use reject_content to block the violative tool call/output while allowing the agent to continue execution and explain the issue to the user.
Set block_on_tool_violations=True to raise exceptions and halt execution immediately when tool guardrails are violated.
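As a sketch, assuming block_on_tool_violations is accepted as a GuardrailAgent constructor argument (verify against your installed version):
from pathlib import Path

from guardrails import GuardrailAgent

# Assumption: the flag is passed alongside the usual Agent arguments.
strict_agent = GuardrailAgent(
    config=Path("guardrails_config.json"),
    name="Customer support agent",
    instructions="You are a customer support agent.",
    block_on_tool_violations=True,  # raise and halt instead of rejecting the tool call
)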
Azure OpenAI
Use the Azure-specific client (GuardrailsAsyncAzureOpenAI or GuardrailsAzureOpenAI):
from pathlib import Path
from guardrails import GuardrailsAsyncAzureOpenAI
client = GuardrailsAsyncAzureOpenAI(
    config=Path("guardrails_config.json"),
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-azure-key",
    api_version="2025-01-01-preview"
)
Third-Party Models
Works with any OpenAI-compatible API:
from pathlib import Path
from guardrails import GuardrailsAsyncOpenAI
# Local Ollama model
client = GuardrailsAsyncOpenAI(
    config=Path("guardrails_config.json"),
    base_url="http://127.0.0.1:11434/v1/",
    api_key="ollama"
)
Token Usage Tracking
LLM-based guardrails (Jailbreak, Custom Prompt Check, etc.) consume tokens. You can track token usage across all guardrail calls using the unified total_guardrail_token_usage function:
from guardrails import GuardrailsAsyncOpenAI, total_guardrail_token_usage
client = GuardrailsAsyncOpenAI(config="config.json")
response = await client.responses.create(model="gpt-4o", input="Hello")
# Get aggregated token usage from all guardrails
tokens = total_guardrail_token_usage(response)
print(f"Guardrail tokens used: {tokens['total_tokens']}")
# Output: Guardrail tokens used: 425
The function returns a dictionary:
{
    "prompt_tokens": 300,      # Sum of prompt tokens across all LLM guardrails
    "completion_tokens": 125,  # Sum of completion tokens
    "total_tokens": 425,       # Total tokens used by guardrails
}
Works Across All Surfaces
total_guardrail_token_usage works with any guardrails result type:
# OpenAI client responses
response = await client.responses.create(...)
tokens = total_guardrail_token_usage(response)

# Streaming (use the last chunk)
async for chunk in stream:
    last_chunk = chunk
tokens = total_guardrail_token_usage(last_chunk)

# Agents SDK
result = await Runner.run(agent, input)
tokens = total_guardrail_token_usage(result)
Per-Guardrail Token Usage
Each guardrail result includes its own token usage in the info dict:
OpenAI Clients (GuardrailsAsyncOpenAI, etc.):
response = await client.responses.create(model="gpt-4.1", input="Hello")
for gr in response.guardrail_results.all_results:
usage = gr.info.get("token_usage")
if usage:
print(f"{gr.info['guardrail_name']}: {usage['total_tokens']} tokens")
Agents SDK - access token usage per stage via RunResult:
result = await Runner.run(agent, "Hello")
# Input guardrails
for gr in result.input_guardrail_results:
usage = gr.output.output_info.get("token_usage") if gr.output.output_info else None
if usage:
print(f"Input: {usage['total_tokens']} tokens")
# Output guardrails
for gr in result.output_guardrail_results:
usage = gr.output.output_info.get("token_usage") if gr.output.output_info else None
if usage:
print(f"Output: {usage['total_tokens']} tokens")
# Tool guardrails: result.tool_input_guardrail_results, result.tool_output_guardrail_results
Non-LLM guardrails (URL Filter, Moderation, PII) don't consume tokens and won't have token_usage in their info.
Next Steps
- Explore examples for advanced patterns
- Learn about streaming considerations