Contains PII
Detects personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, and email addresses using Guardrails' built-in TypeScript regex engine. The check can automatically mask detected spans or block the request based on configuration.
Advanced Security Features:
- Unicode normalization: Prevents bypasses using fullwidth characters (e.g., ＠ in place of @) or zero-width spaces
- Encoded PII detection: Optionally detects PII hidden in Base64, URL-encoded, or hex strings
- URL context awareness: Detects emails in query parameters (e.g., GET /api?user=john@example.com)
- Custom patterns: Extends the default entity list with CVV/CVC codes, BIC/SWIFT identifiers, and other global formats
Configuration
{
"name": "Contains PII",
"config": {
"entities": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD", "PHONE_NUMBER", "CVV", "BIC_SWIFT"],
"block": false,
"detect_encoded_pii": false
}
}
Parameters
- entities (optional): List of PII entity types to detect. Defaults to all entities except NRP and PERSON (see note below). See the PIIEntity enum in src/checks/pii.ts for the full list, including custom entities such as CVV (credit card security codes) and BIC_SWIFT (bank identification codes).
- block (optional): Whether to block content or just mask PII (default: false)
- detect_encoded_pii (optional): If true, detects PII in Base64/URL-encoded/hex strings (default: false)
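The defaults listed above can be expressed as a small resolver. This is a minimal sketch, not the library's code: PiiConfig, resolveConfig, ILLUSTRATIVE_ENTITIES, and EXCLUDED_BY_DEFAULT are hypothetical names, and the entity list is only a subset chosen for illustration (the authoritative list is the PIIEntity enum in src/checks/pii.ts).

```typescript
// Hypothetical shape of the "config" object shown above.
interface PiiConfig {
  entities?: string[];
  block?: boolean;
  detect_encoded_pii?: boolean;
}

// Illustrative subset only; the authoritative list is the PIIEntity enum.
const ILLUSTRATIVE_ENTITIES: string[] = [
  "EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD", "PHONE_NUMBER",
  "CVV", "BIC_SWIFT", "NRP", "PERSON",
];

// Entities left out of the default list because of false positives.
const EXCLUDED_BY_DEFAULT: string[] = ["NRP", "PERSON"];

// Fill in the documented defaults for any omitted field.
function resolveConfig(config: PiiConfig = {}): Required<PiiConfig> {
  const defaults = ILLUSTRATIVE_ENTITIES.filter(
    (entity) => EXCLUDED_BY_DEFAULT.indexOf(entity) === -1
  );
  return {
    entities: config.entities !== undefined ? config.entities : defaults,
    block: config.block !== undefined ? config.block : false,
    detect_encoded_pii:
      config.detect_encoded_pii !== undefined ? config.detect_encoded_pii : false,
  };
}
```

Calling resolveConfig({ block: true }) in this sketch keeps the default entity list (without NRP and PERSON) while switching to blocking mode.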
Important: NRP and PERSON Entity Deprecation
As of v0.2.0, the NRP and PERSON entities have been removed from the default entity list due to their high false positive rates. These patterns are overly broad and cause issues in production:
- NRP matches any two consecutive words (e.g., "nuevo cliente", "crea un", "the user")
- PERSON matches any two capitalized words (e.g., "New York", "The User", "European Union")
Impact:
- ❌ Causes false positives in natural language conversation
- ❌ Particularly problematic for non-English languages (Spanish, French, etc.)
- ❌ Breaks normal text in pre-flight masking mode
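The over-matching described above is easy to reproduce. The sketch below uses a hypothetical regex, NAIVE_PERSON, that approximates "any two capitalized words"; it is not the pattern shipped in the library, only an illustration of why this class of pattern misfires on ordinary text.

```typescript
// Hypothetical approximation of a "two capitalized words" pattern.
const NAIVE_PERSON = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g;

// Return every span the naive pattern would flag as a person name.
function naivePersonMatches(text: string): string[] {
  return text.match(NAIVE_PERSON) || [];
}

// None of these spans is a person name, yet all three are flagged.
const flagged = naivePersonMatches(
  "The User flew from New York to the European Union."
);
console.log(flagged); // ["The User", "New York", "European Union"]
```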
Future Improvement: More robust implementations of NRP and PERSON detection are planned for a future release. Stay tuned for updates.
Migration Path:
If you need to detect person names or national registration numbers, consider these alternatives:
- For National Registration Numbers: Use region-specific patterns instead:
  - SG_NRIC_FIN (Singapore)
  - UK_NINO (UK National Insurance Number)
  - FI_PERSONAL_IDENTITY_CODE (Finland)
  - KR_RRN (Korea Resident Registration Number)
- For Person Names: Consider using a dedicated NER (Named Entity Recognition) service or LLM-based detection for more accurate results.
- If you still need these patterns: You can explicitly include them in your configuration, but be aware of the false positives:
{
"entities": ["NRP", "PERSON", "EMAIL_ADDRESS"],
"block": false
}
A deprecation warning will be logged when these entities are used.
Reference: Issue #47
Implementation Notes
Under the hood the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When detect_encoded_pii is enabled the check also decodes Base64, URL-encoded, and hexadecimal substrings before rescanning them for matches, remapping any findings back to the original encoded content.
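The normalization step described above can be sketched as follows. This is an illustration under stated assumptions, not the guardrail's source: normalizeForScan, findEmails, the ZERO_WIDTH character set, and the simplified EMAIL_PATTERN are hypothetical; only the use of Unicode NFKC and zero-width stripping comes from the text.

```typescript
// Zero-width characters to strip: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
// (Assumed set; the library may strip more or fewer code points.)
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Simplified email pattern for illustration; curated patterns are stricter.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// NFKC folds compatibility forms (e.g. fullwidth U+FF20 becomes "@"),
// then zero-width characters are removed so they cannot split a match.
function normalizeForScan(text: string): string {
  return text.normalize("NFKC").replace(ZERO_WIDTH, "");
}

// Scan the normalized text and return every email-like span.
function findEmails(text: string): string[] {
  return normalizeForScan(text).match(EMAIL_PATTERN) || [];
}
```

With this sketch, an input that hides an address behind a fullwidth at sign and a zero-width space, such as "john\uFF20exam\u200Bple.com", normalizes to john@example.com before the regex runs.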
Stage-specific behavior is critical:
- Pre-flight stage: Use block=false (default) for automatic PII masking of user input
- Output stage: Use block=true to prevent PII exposure in LLM responses
- Masking in output stage is not supported and will not work as expected
PII masking mode (default, block=false):
- Automatically replaces detected PII with placeholder tokens like <EMAIL_ADDRESS>, <US_SSN>
- Does not trigger tripwire - allows content through with PII masked
Blocking mode (block=true):
- Triggers tripwire when PII is detected
- Prevents content from being delivered to users
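The difference between the two modes can be sketched in a few lines. The names below (PolicyOutcome, POLICY_PATTERNS, applyPiiPolicy) and the two simplified patterns are hypothetical; in particular, what text the real check returns in blocking mode is an assumption here. The sketch only shows that the same detection feeds either masking or a tripwire.

```typescript
// Hypothetical result shape for this sketch only.
interface PolicyOutcome {
  tripwireTriggered: boolean;
  checkedText: string;
}

// Simplified illustrative patterns, not the library's curated ones.
const POLICY_PATTERNS: { [entity: string]: RegExp } = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
};

function applyPiiPolicy(text: string, block: boolean): PolicyOutcome {
  let masked = text;
  // Replace every match with its <ENTITY_TYPE> placeholder.
  for (const entity of Object.keys(POLICY_PATTERNS)) {
    masked = masked.replace(POLICY_PATTERNS[entity], "<" + entity + ">");
  }
  const piiDetected = masked !== text;
  // block=false: content passes through masked, tripwire stays off.
  // block=true: any detection trips the wire so the caller can stop delivery.
  return { tripwireTriggered: block && piiDetected, checkedText: masked };
}
```

For example, applyPiiPolicy("Contact me at user@email.com, SSN: 123-45-6789", false) yields the masked text "Contact me at <EMAIL_ADDRESS>, SSN: <US_SSN>" without triggering the tripwire, while the same call with block set to true triggers it.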
What It Returns
Returns a GuardrailResult with the following info dictionary:
Basic Example (Plain PII)
{
"guardrail_name": "Contains PII",
"detected_entities": {
"EMAIL_ADDRESS": ["user@email.com"],
"US_SSN": ["123-45-6789"]
},
"entity_types_checked": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD"],
"checked_text": "Contact me at <EMAIL_ADDRESS>, SSN: <US_SSN>",
"block_mode": false,
"pii_detected": true
}
With Encoded PII Detection Enabled
When detect_encoded_pii: true, the guardrail also detects and masks encoded PII:
{
"guardrail_name": "Contains PII",
"detected_entities": {
"EMAIL_ADDRESS": [
"user@email.com",
"am9obkBleGFtcGxlLmNvbQ==",
"%6a%6f%65%40domain.com",
"6a6f686e406578616d706c652e636f6d"
]
},
"entity_types_checked": ["EMAIL_ADDRESS"],
"checked_text": "Contact <EMAIL_ADDRESS> or <EMAIL_ADDRESS_ENCODED> or <EMAIL_ADDRESS_ENCODED>",
"block_mode": false,
"pii_detected": true
}
Note: Encoded PII is masked with <ENTITY_TYPE_ENCODED> to distinguish it from plain text PII.
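A rough sketch of how encoded detection could work for emails follows. It is a simplification under stated assumptions, not the library's algorithm: it inspects whitespace-delimited tokens only, masks the whole token rather than remapping exact spans, and every name in it (decodeHex, decodePercent, decodeBase64, maskEncodedEmails, ENCODED_EMAIL) is hypothetical. Base64 is decoded by hand so the block does not depend on runtime-specific helpers.

```typescript
// Simplified whole-string email check for decoded candidates.
const ENCODED_EMAIL = /^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/;
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeHex(token: string): string | null {
  if (!/^(?:[0-9a-fA-F]{2})+$/.test(token)) return null;
  let out = "";
  for (let i = 0; i < token.length; i += 2) {
    out += String.fromCharCode(parseInt(token.slice(i, i + 2), 16));
  }
  return out;
}

function decodePercent(token: string): string | null {
  if (!/%[0-9a-fA-F]{2}/.test(token)) return null;
  try {
    return decodeURIComponent(token);
  } catch (_err) {
    return null; // malformed escape sequence
  }
}

function decodeBase64(token: string): string | null {
  if (token.length % 4 !== 0 || !/^[A-Za-z0-9+\/]+={0,2}$/.test(token)) return null;
  const body = token.replace(/=+$/, "");
  let bits = 0;
  let bitCount = 0;
  let out = "";
  for (let i = 0; i < body.length; i++) {
    // Append 6 bits per character, emit a byte whenever 8 are available.
    bits = ((bits << 6) | B64_ALPHABET.indexOf(body[i])) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
    }
  }
  return out;
}

// Mask any token that decodes to an email with <EMAIL_ADDRESS_ENCODED>.
function maskEncodedEmails(text: string): string {
  const decoders = [decodeHex, decodePercent, decodeBase64];
  return text
    .split(/(\s+)/) // keep whitespace separators so the text can be rejoined
    .map((token) => {
      for (const decode of decoders) {
        const decoded = decode(token);
        if (decoded !== null && ENCODED_EMAIL.test(decoded)) {
          return "<EMAIL_ADDRESS_ENCODED>";
        }
      }
      return token;
    })
    .join("");
}
```

Plain-text addresses are left untouched by this sketch; they would be handled by the regular scan and masked with <EMAIL_ADDRESS>.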
Field Descriptions
- detected_entities: Detected entities and their values (includes both plain and encoded forms when detect_encoded_pii is enabled)
- entity_types_checked: List of entity types that were configured for detection
- checked_text: Text with PII masked. Plain PII uses <ENTITY_TYPE>, encoded PII uses <ENTITY_TYPE_ENCODED>
- block_mode: Whether the check was configured to block or mask
- pii_detected: Boolean indicating if any PII was found (plain or encoded)
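For callers that consume this dictionary, the fields above map onto a simple type. The PiiInfo interface and countDetections helper below are hypothetical names written for this example; the library may export its own types, so treat this as a sketch of the documented shape only.

```typescript
// Hypothetical type mirroring the documented info dictionary.
interface PiiInfo {
  guardrail_name: string;
  detected_entities: { [entity: string]: string[] };
  entity_types_checked: string[];
  checked_text: string;
  block_mode: boolean;
  pii_detected: boolean;
}

// Count how many values were detected per entity type.
function countDetections(info: PiiInfo): { [entity: string]: number } {
  const counts: { [entity: string]: number } = {};
  for (const entity of Object.keys(info.detected_entities)) {
    counts[entity] = info.detected_entities[entity].length;
  }
  return counts;
}

// The basic example from above, typed as PiiInfo.
const exampleInfo: PiiInfo = {
  guardrail_name: "Contains PII",
  detected_entities: {
    EMAIL_ADDRESS: ["user@email.com"],
    US_SSN: ["123-45-6789"],
  },
  entity_types_checked: ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD"],
  checked_text: "Contact me at <EMAIL_ADDRESS>, SSN: <US_SSN>",
  block_mode: false,
  pii_detected: true,
};
```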