Scrubber Module

The Scrubber module provides operations for security and privacy, including automatic PII redaction.

RedactPII

Redact sensitive personally identifiable information (PII) from text using regex patterns.

prompt_refiner.scrubber.RedactPII

RedactPII(
    redact_types=None,
    custom_patterns=None,
    custom_keywords=None,
)

Bases: Refiner

Redact sensitive information from text using regex patterns.

Initialize the PII redaction operation.

Parameters:

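  • redact_types (Optional[Set[str]], default None): Set of PII types to redact. None, or an empty set, redacts all built-in types. Options: "email", "phone", "ip", "credit_card", "ssn", "url"
  • custom_patterns (Optional[dict[str, str]], default None): Dictionary of custom regex patterns to apply, in the format {"name": "regex_pattern"}. Each match is replaced with the upper-cased name in brackets, such as [NAME].
  • custom_keywords (Optional[Set[str]], default None): Set of custom keywords to redact. Matching is case-insensitive and on whole words only; each match is replaced with [REDACTED].
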
Source code in src/prompt_refiner/scrubber/pii.py
def __init__(
    self,
    redact_types: Optional[Set[str]] = None,
    custom_patterns: Optional[dict[str, str]] = None,
    custom_keywords: Optional[Set[str]] = None,
):
    """
    Initialize the PII redaction operation.

    Args:
        redact_types: Set of PII types to redact (default: all)
            Options: "email", "phone", "ip", "credit_card", "ssn", "url"
        custom_patterns: Dictionary of custom regex patterns to apply
            Format: {"name": "regex_pattern"}
        custom_keywords: Set of custom keywords to redact (case-insensitive)
    """
    self.redact_types = redact_types or set(self.PATTERNS.keys())
    self.custom_patterns = custom_patterns or {}
    self.custom_keywords = custom_keywords or set()
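
One consequence of the `or` fallbacks above is easy to miss: any falsy argument, including an empty set, resolves to the default. The sketch below mirrors that expression with a stand-in; `ALL_TYPES` and `resolve_types` are hypothetical names used only for illustration, not part of the library.

```python
from typing import Optional, Set

# Hypothetical stand-in for the keys of RedactPII.PATTERNS.
ALL_TYPES = {"email", "phone", "ip", "credit_card", "ssn", "url"}


def resolve_types(redact_types: Optional[Set[str]] = None) -> Set[str]:
    """Mirror the `redact_types or set(...)` expression from __init__."""
    return redact_types or set(ALL_TYPES)


print(resolve_types(None) == ALL_TYPES)   # True: None falls back to every type
print(resolve_types(set()) == ALL_TYPES)  # True: an empty set is falsy, so it falls back too
print(resolve_types({"email"}))           # {'email'}: a non-empty set is used as given
```

If this reading is right, passing `redact_types=set()` would not switch off the built-in patterns; it would enable all of them.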

Functions

process
process(text)

Redact PII from the input text.

Parameters:

  • text (str, required): The input text

Returns:

  • str: Text with PII redacted

Source code in src/prompt_refiner/scrubber/pii.py
def process(self, text: str) -> str:
    """
    Redact PII from the input text.

    Args:
        text: The input text

    Returns:
        Text with PII redacted
    """
    result = text

    # Apply standard PII patterns
    for pii_type in self.redact_types:
        if pii_type in self.PATTERNS:
            pattern = self.PATTERNS[pii_type]
            replacement = self.REPLACEMENTS.get(pii_type, "[REDACTED]")
            result = re.sub(pattern, replacement, result)

    # Apply custom patterns
    for name, pattern in self.custom_patterns.items():
        replacement = f"[{name.upper()}]"
        result = re.sub(pattern, replacement, result)

    # Apply custom keywords (case-insensitive)
    for keyword in self.custom_keywords:
        # Use word boundaries to avoid partial matches
        pattern = rf"\b{re.escape(keyword)}\b"
        result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)

    return result
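
The listing above refers to `PATTERNS` and `REPLACEMENTS`, which are not shown on this page. To try the three passes in isolation, here is a minimal standalone sketch. The two regexes are simplified assumptions, not the library's actual patterns, and `redact` is a hypothetical free function that follows the same order as `process`.

```python
import re
from typing import Dict, Optional, Set

# Simplified, hypothetical patterns; the library's real PATTERNS are not shown here.
PATTERNS: Dict[str, str] = {
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}
REPLACEMENTS: Dict[str, str] = {"email": "[EMAIL]", "ssn": "[SSN]"}


def redact(
    text: str,
    redact_types: Optional[Set[str]] = None,
    custom_patterns: Optional[Dict[str, str]] = None,
    custom_keywords: Optional[Set[str]] = None,
) -> str:
    """Apply built-in types, then custom patterns, then keywords."""
    result = text
    # Pass 1: built-in types, each with its own placeholder.
    for pii_type in redact_types or set(PATTERNS):
        if pii_type in PATTERNS:
            result = re.sub(PATTERNS[pii_type], REPLACEMENTS.get(pii_type, "[REDACTED]"), result)
    # Pass 2: custom patterns, replaced by the upper-cased pattern name.
    for name, pattern in (custom_patterns or {}).items():
        result = re.sub(pattern, f"[{name.upper()}]", result)
    # Pass 3: whole-word, case-insensitive keywords.
    for keyword in custom_keywords or set():
        result = re.sub(rf"\b{re.escape(keyword)}\b", "[REDACTED]", result, flags=re.IGNORECASE)
    return result


sample = "Mail a@b.com, SSN 123-45-6789, id EMP-12345, Secret plan"
print(redact(sample, custom_patterns={"employee_id": r"EMP-\d{5}"}, custom_keywords={"secret"}))
# Mail [EMAIL], SSN [SSN], id [EMPLOYEE_ID], [REDACTED] plan
```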

Supported PII Types

  • email: Email addresses → [EMAIL]
  • phone: Phone numbers (US format) → [PHONE]
  • ip: IP addresses → [IP]
  • credit_card: Credit card numbers → [CARD]
  • ssn: Social Security Numbers → [SSN]
  • url: URLs → [URL]

Examples

from prompt_refiner import RedactPII

# Redact all PII types
redactor = RedactPII()
result = redactor.process("Contact me at john@example.com or 555-123-4567")
# Output: "Contact me at [EMAIL] or [PHONE]"

# Redact specific types only
redactor = RedactPII(redact_types={"email", "phone"})
result = redactor.process("Email: john@example.com, IP: 192.168.1.1")
# Output: "Email: [EMAIL], IP: 192.168.1.1"

# Custom patterns
redactor = RedactPII(
    custom_patterns={"employee_id": r"EMP-\d{5}"}
)
result = redactor.process("Employee EMP-12345 accessed the system")
# Output: "Employee [EMPLOYEE_ID] accessed the system"

# Custom keywords (case-insensitive)
redactor = RedactPII(
    custom_keywords={"confidential", "secret"}
)
result = redactor.process("This is Confidential information")
# Output: "This is [REDACTED] information"
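
Keyword matching is on whole words, so inflected forms of a keyword are left alone. The sketch below rebuilds the keyword pattern from the `process` source using only the standard `re` module; `redact_keyword` is a hypothetical helper, not a library function.

```python
import re


def redact_keyword(text: str, keyword: str) -> str:
    """Apply the same whole-word, case-insensitive pattern that process() builds."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)


print(redact_keyword("A SECRET plan", "secret"))
# A [REDACTED] plan
print(redact_keyword("Two secrets, kept secretly", "secret"))
# Two secrets, kept secretly   (no whole-word match, so nothing changes)
print(redact_keyword("Ask about Project X today", "project x"))
# Ask about [REDACTED] today   (multi-word keywords work as a unit)
```

If plurals or other variants matter for your data, one option is to list them as separate keywords or to express them as a custom pattern instead.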

Combining Options

from prompt_refiner import RedactPII

# Redact standard PII + custom patterns + keywords
redactor = RedactPII(
    redact_types={"email", "phone", "ssn"},
    custom_patterns={"employee_id": r"EMP-\d{5}"},
    custom_keywords={"internal", "confidential"}
)

text = """
Employee EMP-12345
Email: john@example.com
Phone: 555-123-4567
SSN: 123-45-6789
This is Confidential information for internal use only.
"""

result = redactor.process(text)
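
When combining options, keep in mind that `process` runs the built-in types first, then custom patterns, then keywords, so later passes see the placeholders written by earlier ones. The sketch below reproduces two of those passes with plain `re` calls; the email regex is a simplified assumption standing in for the built-in pattern.

```python
import re

text = "Email: john@example.com"

# Pass 1: a simplified, hypothetical email pattern stands in for the built-in one.
after_types = re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]", text)
print(after_types)
# Email: [EMAIL]

# Pass 3: the keyword "email" is matched case-insensitively on whole words,
# which also hits the "EMAIL" inside the placeholder written by pass 1.
after_keywords = re.sub(r"\bemail\b", "[REDACTED]", after_types, flags=re.IGNORECASE)
print(after_keywords)
# [REDACTED]: [[REDACTED]]
```

Keywords such as "internal" and "confidential" do not collide with any placeholder, but a keyword like "email", "phone", or "url" would.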

Common Use Cases

Before Sending to LLM APIs

from prompt_refiner import Refiner, RedactPII

secure_pipeline = (
    Refiner()
    .pipe(RedactPII(redact_types={"email", "phone", "ssn", "credit_card"}))
)

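# Example input; in practice this comes from your application
user_input = "My SSN is 123-45-6789, reach me at john@example.com"

# Redacted text to send to the external API
secure_text = secure_pipeline.run(user_input)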

Logging and Monitoring

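import logging

from prompt_refiner import Refiner, RedactPII

logger = logging.getLogger(__name__)

log_redactor = (
    Refiner()
    .pipe(RedactPII())  # Redact all PII types
)

# Example record; in practice this is whatever you are about to log
sensitive_data = "Login from 192.168.1.1 by john@example.com"

# Log the redacted text, never the original
safe_log = log_redactor.run(sensitive_data)
logger.info(safe_log)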
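
Calling the redactor by hand before every log statement is easy to forget. One alternative, sketched here with the standard `logging` module only, is to attach a filter that redacts each record on its way to a handler. `RedactingFilter` and the `redact` stand-in are hypothetical names; in a real setup `redact` could be replaced by a callable such as `RedactPII().process`.

```python
import io
import logging
import re


def redact(text: str) -> str:
    """Hypothetical stand-in for a redaction callable; swap in your own."""
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message with the redacted text before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message with its args first, then redact the final string.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record


stream = io.StringIO()  # capture output so the result is easy to inspect
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())

logger = logging.getLogger("redaction_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("login by %s", "john@example.com")
print(stream.getvalue(), end="")
# login by [EMAIL]
```

The filter is attached to the handler rather than the logger so that it also applies to records reaching that handler from child loggers.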

Data Export Compliance

from prompt_refiner import Refiner, RedactPII

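# Custom redaction for specific compliance needs.
# Note: custom_keywords redacts the listed words themselves (for example a
# "dob" field label), not the values that follow them.
gdpr_redactor = (
    Refiner()
    .pipe(RedactPII(
        redact_types={"email", "phone", "ip"},
        custom_keywords={"customer_name", "address", "dob"}
    ))
)

# Example record; in practice this is the data being exported
user_data = "Contact: john@example.com, last login from 192.168.1.1"

export_data = gdpr_redactor.run(user_data)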

Security Best Practices

Regex Limitations

PII redaction relies on regex patterns, which can miss formats they were not written for (the phone pattern, for example, targets US formats). For production use:

  • Test thoroughly with your specific data
  • Consider using specialized PII detection services for critical applications
  • Add custom patterns for domain-specific PII
  • Review redacted output before sending to external services
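
The first point in the list above can be turned into a small regression check: run known sensitive strings through the redactor and report any that survive. This is a minimal sketch; `find_leaks` and `naive_redact` are hypothetical helpers, and the deliberately narrow phone regex is an assumption chosen to show leaks, not the library's pattern.

```python
import re
from typing import Callable, List, Sequence, Tuple

Case = Tuple[str, Sequence[str]]  # (input text, substrings that must not survive)


def find_leaks(redact: Callable[[str], str], cases: Sequence[Case]) -> List[Tuple[str, str]]:
    """Return (input, secret) pairs where the secret is still present after redaction."""
    leaks = []
    for text, secrets in cases:
        output = redact(text)
        for secret in secrets:
            if secret in output:
                leaks.append((text, secret))
    return leaks


def naive_redact(text: str) -> str:
    """Hypothetical redactor that only knows the 555-123-4567 phone layout."""
    return re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "[PHONE]", text)


cases = [
    ("Call 555-123-4567", ["555-123-4567"]),
    ("Call (555) 123-4567", ["123-4567"]),
    ("Call 555.123.4567", ["555.123.4567"]),
]

leaks = find_leaks(naive_redact, cases)
for text, secret in leaks:
    print(f"LEAK: {secret!r} survived in {text!r}")
# LEAK: '123-4567' survived in 'Call (555) 123-4567'
# LEAK: '555.123.4567' survived in 'Call 555.123.4567'
```

To check the library itself, pass `RedactPII().process` in place of `naive_redact` and build the cases from samples of your own data.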

Defense in Depth

PII redaction is one layer of security. Always:

  • Validate and sanitize user input
  • Use proper authentication and authorization
  • Encrypt data in transit and at rest
  • Follow your organization's security policies