Skip to content

Contains PII

Detects personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, and email addresses using Guardrails' built-in TypeScript regex engine. The check can automatically mask detected spans or block the request based on configuration.

Advanced Security Features:

  • Unicode normalization: Prevents bypasses using fullwidth characters (@) or zero-width spaces
  • Encoded PII detection: Optionally detects PII hidden in Base64, URL-encoded, or hex strings
  • URL context awareness: Detects emails in query parameters (e.g., GET /api?user=john@example.com)
  • Custom patterns: Extends the default entity list with CVV/CVC codes, BIC/SWIFT identifiers, and other global formats

Configuration

{
    "name": "Contains PII",
    "config": {
        "entities": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD", "PHONE_NUMBER", "CVV", "BIC_SWIFT"],
        "block": false,
        "detect_encoded_pii": false
    }
}

Parameters

  • entities (optional): List of PII entity types to detect. Defaults to all entities except NRP and PERSON (see note below). See the PIIEntity enum in src/checks/pii.ts for the full list, including custom entities such as CVV (credit card security codes) and BIC_SWIFT (bank identification codes).
  • block (optional): Whether to block content or just mask PII (default: false)
  • detect_encoded_pii (optional): If true, detects PII in Base64/URL-encoded/hex strings (default: false)

Important: NRP and PERSON Entity Deprecation

As of v0.2.0, the NRP and PERSON entities have been removed from the default entity list due to their high false positive rates. These patterns are overly broad and cause issues in production:

  • NRP matches any two consecutive words (e.g., "nuevo cliente", "crea un", "the user")
  • PERSON matches any two capitalized words (e.g., "New York", "The User", "European Union")

Impact:

  • ❌ Causes false positives in natural language conversation
  • ❌ Particularly problematic for non-English languages (Spanish, French, etc.)
  • ❌ Breaks normal text in pre-flight masking mode

Future Improvement: More robust implementations of NRP and PERSON detection are planned for a future release. Stay tuned for updates.

Migration Path:

If you need to detect person names or national registration numbers, consider these alternatives:

  1. For National Registration Numbers: Use region-specific patterns instead:
  2. SG_NRIC_FIN (Singapore)
  3. UK_NINO (UK National Insurance Number)
  4. FI_PERSONAL_IDENTITY_CODE (Finland)
  5. KR_RRN (Korea Resident Registration Number)

  6. For Person Names: Consider using a dedicated NER (Named Entity Recognition) service or LLM-based detection for more accurate results.

  7. If you still need these patterns: You can explicitly include them in your configuration, but be aware of the false positives:

    {
        "entities": ["NRP", "PERSON", "EMAIL_ADDRESS"],
        "block": false
    }
    
    A deprecation warning will be logged when these entities are used.

Reference: Issue #47

Implementation Notes

Under the hood the TypeScript guardrail normalizes text (Unicode NFKC), strips zero-width characters, and runs curated regex patterns for each configured entity. When detect_encoded_pii is enabled the check also decodes Base64, URL-encoded, and hexadecimal substrings before rescanning them for matches, remapping any findings back to the original encoded content.

Stage-specific behavior is critical:

  • Pre-flight stage: Use block=false (default) for automatic PII masking of user input
  • Output stage: Use block=true to prevent PII exposure in LLM responses
  • Masking in output stage is not supported and will not work as expected

PII masking mode (default, block=false):

  • Automatically replaces detected PII with placeholder tokens like <EMAIL_ADDRESS>, <US_SSN>
  • Does not trigger tripwire - allows content through with PII masked

Blocking mode (block=true):

  • Triggers tripwire when PII is detected
  • Prevents content from being delivered to users

What It Returns

Returns a GuardrailResult with the following info dictionary:

Basic Example (Plain PII)

{
    "guardrail_name": "Contains PII",
    "detected_entities": {
        "EMAIL_ADDRESS": ["user@email.com"],
        "US_SSN": ["123-45-6789"]
    },
    "entity_types_checked": ["EMAIL_ADDRESS", "US_SSN", "CREDIT_CARD"],
    "checked_text": "Contact me at <EMAIL_ADDRESS>, SSN: <US_SSN>",
    "block_mode": false,
    "pii_detected": true
}

With Encoded PII Detection Enabled

When detect_encoded_pii: true, the guardrail also detects and masks encoded PII:

{
    "guardrail_name": "Contains PII",
    "detected_entities": {
        "EMAIL_ADDRESS": [
            "user@email.com",
            "am9obkBleGFtcGxlLmNvbQ==",
            "%6a%6f%65%40domain.com",
            "6a6f686e406578616d706c652e636f6d"
        ]
    },
    "entity_types_checked": ["EMAIL_ADDRESS"],
    "checked_text": "Contact <EMAIL_ADDRESS> or <EMAIL_ADDRESS_ENCODED> or <EMAIL_ADDRESS_ENCODED>",
    "block_mode": false,
    "pii_detected": true
}

Note: Encoded PII is masked with <ENTITY_TYPE_ENCODED> to distinguish it from plain text PII.

Field Descriptions

  • detected_entities: Detected entities and their values (includes both plain and encoded forms when detect_encoded_pii is enabled)
  • entity_types_checked: List of entity types that were configured for detection
  • checked_text: Text with PII masked. Plain PII uses <ENTITY_TYPE>, encoded PII uses <ENTITY_TYPE_ENCODED>
  • block_mode: Whether the check was configured to block or mask
  • pii_detected: Boolean indicating if any PII was found (plain or encoded)