Custom Prompt Check
Implements custom content checks using configurable LLM prompts. This check uses your custom prompt to perform specialized validation, lets you define exactly what constitutes a violation, provides flexibility for business-specific validation rules, and returns structured results based on your prompt design.
Configuration
{
  "name": "Custom Prompt Check",
  "config": {
    "model": "gpt-5",
    "confidence_threshold": 0.7,
    "system_prompt_details": "Determine if the user's request needs to be escalated to a senior support agent. Indications of escalation include: ...",
    "include_reasoning": false,
    "max_turns": 10
  }
}
Parameters
- model (required): Model to use for the check (e.g., "gpt-5")
- confidence_threshold (required): Minimum confidence score required to trigger the tripwire (0.0 to 1.0); see the sketch after this list
- system_prompt_details (required): Custom instructions defining the content detection criteria
- include_reasoning (optional): Whether to include reasoning/explanation fields in the guardrail output (default: false)
  - When false: The LLM only generates the essential fields (flagged and confidence), reducing token generation costs
  - When true: Additionally returns detailed reasoning for its decisions
  - Performance: In our evaluations, disabling reasoning reduces median latency by 40% on average (ranging from 18% to 67% depending on the model) while maintaining detection performance
  - Use case: Keep disabled for production to minimize costs; enable for development and debugging
- max_turns (optional): Maximum number of conversation turns to include for multi-turn analysis (default: 10)
  - Set to 1 for single-turn mode
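The tripwire decision combines the flagged verdict with confidence_threshold. Below is a minimal sketch of that logic; the should_trip helper is hypothetical and only illustrates the assumption that the check trips when the LLM flags the content and its confidence meets or exceeds the configured threshold (the library handles this internally).

# Hypothetical helper illustrating how flagged and confidence_threshold combine.
def should_trip(flagged: bool, confidence: float, confidence_threshold: float) -> bool:
    """Return True when the guardrail tripwire should fire for this check."""
    return flagged and confidence >= confidence_threshold

# Using the sample values from the output shown later on this page:
print(should_trip(flagged=True, confidence=0.85, confidence_threshold=0.7))  # True
print(should_trip(flagged=True, confidence=0.55, confidence_threshold=0.7))  # False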
Implementation Notes
- LLM Required: Uses an LLM for analysis
- Business Scope: system_prompt_details should clearly define your policy and acceptable topics. Effective prompt engineering is essential for optimal LLM performance and detection accuracy; an illustrative example follows below.
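As an illustration only (the policy wording below is made up for demonstration and is not taken from the library), a system_prompt_details value for the escalation scenario in the configuration above might look like this:

# Hypothetical system_prompt_details value for a support-escalation policy.
# The wording is illustrative; define criteria that match your own business rules.
SYSTEM_PROMPT_DETAILS = (
    "Determine if the user's request needs to be escalated to a senior support agent. "
    "Indications of escalation include: an explicit request for a manager or a human agent, "
    "threats of legal action or chargebacks, a safety or security incident, or an issue the "
    "user reports as unresolved after multiple previous contacts. Do not flag routine "
    "questions about billing, shipping, or product features."
)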
What It Returns
Returns a GuardrailResult with the following info dictionary:
{
  "guardrail_name": "Custom Prompt Check",
  "flagged": true,
  "confidence": 0.85,
  "threshold": 0.7,
  "token_usage": {
    "prompt_tokens": 110,
    "completion_tokens": 18,
    "total_tokens": 128
  }
}
- flagged: Whether the custom validation criteria were met
- confidence: Confidence score (0.0 to 1.0) for the validation
- threshold: The confidence threshold that was configured
- reason: Explanation of why the input was flagged (or not flagged) - only included when include_reasoning=true
- token_usage: Token usage details from the LLM call
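Downstream code typically inspects this info dictionary to log or act on the result. A minimal sketch follows, assuming the dictionary shown above is available as a plain Python dict; the summarize helper is hypothetical, and only the dictionary keys listed on this page are taken from the source.

# Minimal sketch of consuming the info dictionary returned by the check.
def summarize(info: dict) -> str:
    """Build a short log line from the check's info dictionary."""
    verdict = "FLAGGED" if info["flagged"] else "ok"
    line = (
        f"{info['guardrail_name']}: {verdict} "
        f"(confidence={info['confidence']:.2f}, threshold={info['threshold']})"
    )
    # The reason field is only present when include_reasoning is true in the config.
    if "reason" in info:
        line += f" reason={info['reason']}"
    return line

info = {
    "guardrail_name": "Custom Prompt Check",
    "flagged": True,
    "confidence": 0.85,
    "threshold": 0.7,
    "token_usage": {"prompt_tokens": 110, "completion_tokens": 18, "total_tokens": 128},
}
print(summarize(info))
# -> Custom Prompt Check: FLAGGED (confidence=0.85, threshold=0.7)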