OpenAI Launches Privacy Filter to Redact Sensitive Data in Text
OpenAI has released Privacy Filter, an open-weight model that identifies and masks personally identifiable information (PII) in text.
Why it matters: The launch gives legal and compliance teams new tools to manage sensitive information, as privacy becomes a top concern in AI-powered workflows. Open, developer-friendly access means organizations can deploy protections locally and adapt them to in-house needs.
- Privacy Filter can detect eight types of PII, including names, addresses, emails, and secrets.
- The model supports a 128,000-token context window and has 1.5 billion total parameters.
- It achieved an F1 score of up to 97.43% on the PII-Masking-300k benchmark after annotation corrections.
- Released under the Apache 2.0 license, Privacy Filter can be fine-tuned and run on local systems.
On April 22, 2026, OpenAI announced Privacy Filter, a new open-weight AI model built to identify and redact personal information in text before downstream processing. The move responds to the legal sector’s rising need for robust privacy controls as AI tools become standard in case management, e-discovery, and document review.
- Privacy Filter is designed to spot eight types of PII: private persons, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets such as passwords or API keys.
- It has 1.5 billion total parameters, of which 50 million are active, and handles context windows of up to 128,000 tokens for large-scale documents and chat logs.
- On the PII-Masking-300k benchmark, the model achieved a 96% F1 score, increasing to 97.43% after correcting annotation errors, underscoring its accuracy in real-world PII detection.
- Licensed under Apache 2.0, Privacy Filter is available on Hugging Face and GitHub for local deployment and fine-tuning, allowing compliance or in-house legal teams to customize protections (see the sketch below).
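For teams weighing local deployment, the sketch below shows what a redaction pass might look like through the Hugging Face transformers library. It is a sketch only: the repo ID "openai/privacy-filter", the text-generation interface, and the bracketed placeholder output are all assumptions, since the announcement does not spell out usage details.

```python
# Minimal sketch of a local PII-redaction pass via Hugging Face transformers.
# Assumptions (not confirmed by the announcement): the repo ID below, the
# text-generation interface, and output that rewrites the input with
# placeholder tags such as [NAME] or [EMAIL]. Consult the actual model card.
from transformers import pipeline

redactor = pipeline(
    "text-generation",
    model="openai/privacy-filter",  # hypothetical repo ID
    device_map="auto",
)

def redact(text: str) -> str:
    """Return `text` with detected PII spans replaced by placeholder tags."""
    out = redactor(text, max_new_tokens=1024, return_full_text=False)
    return out[0]["generated_text"]

sample = "Email jane.roe@example.com or call 555-0142 about the Smith deposition."
print(redact(sample))
# Assumed output: "Email [EMAIL] or call [PHONE] about the [NAME] deposition."
```

Because the whole pass runs on local hardware, client text never has to leave the machine during redaction, which is the property the release emphasizes.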
OpenAI privacy engineer Charles de Bourcy said, "We think a strong ecosystem is one where more builders have usable tools and clear guidance and the ability to improve protections in their own environments." The company emphasized its model’s role in helping AI learn “about the world, not about private individuals.”
For legal teams tasked with client confidentiality, this kind of on-premises data redaction could strengthen compliance with privacy regulations, while reducing risk in generative AI workflows.
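To make that workflow concrete, the sketch below gates an outbound model call on a local redaction step, so the downstream model only ever sees masked text. Both the regex stand-in for the filter and the downstream model name are placeholders, not details from the announcement.

```python
# Illustrative pattern: redact locally, then call a downstream model.
# redact() here is a crude regex stand-in so the example is self-contained;
# in practice it would be the local Privacy Filter pass sketched above.
import re

from openai import OpenAI

def redact(text: str) -> str:
    # Stand-in: mask anything shaped like an email address.
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]", text)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def safe_summarize(document: str) -> str:
    """Mask PII locally before any text crosses a network boundary."""
    masked = redact(document)
    response = client.chat.completions.create(
        model="gpt-4o",  # example downstream model
        messages=[{"role": "user", "content": f"Summarize this filing:\n\n{masked}"}],
    )
    return response.choices[0].message.content
```

The design point is the ordering: redaction happens on-premises before anything is sent downstream, so the confidentiality guarantee does not depend on the downstream provider.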
By the numbers:
- 1.5B — total parameters in the Privacy Filter model
- 128,000 — number of tokens in the model's context window
- 97.43% — F1 score achieved after annotation corrections
Yes, but: The model's performance across languages and its adaptability to industry-specific privacy requirements are not yet detailed.