How to redact personal data from a PDF for GDPR
- Step 1: Map your PII against the four patterns. List the personal data in the document. Emails, phones, dash-delimited SSNs, and card numbers are auto-detected. Names, addresses, DOBs, NI numbers, and any free-text identifiers are NOT; flag those for manual redaction.
- Step 2: Ensure a text layer exists. Detection reads text via pdf.js. Scanned correspondence needs PDF OCR first to create a text layer; verify the OCR result.
- Step 3: Open the redactor and drop the file. Load the document into the PDF PII Redactor. It is browser-only, uploads nothing, and has no options panel, so it runs immediately.
- Step 4: Redact the undetected identifiers manually. Because the tool cannot box names, addresses, or DOBs, handle those separately (for example by editing the source before export, or by overlaying covers in your PDF editor), then run this tool for the contact/account fields.
- Step 5: Flatten / rasterise the output. The boxes are visual; the data survives beneath them. Rasterise via PDF to PNG + Image to PDF, or use PDF Flatten, so the personal data is actually destroyed.
- Step 6: Scrub metadata and verify. Run Metadata Scrubber to strip author/producer/dates, then PDF to Text to confirm no detected PII remains extractable. Keep a record of your manual redactions for your DSAR audit trail.
What is detected vs. what you must redact manually
The engine has exactly four patterns. Anything not marked 'Yes' in the 'Auto-detected?' column is your responsibility: there is no name, address, or DOB detection.
| PII type | Auto-detected? | How to handle it |
|---|---|---|
| Email address | Yes | Boxed by the email pattern (canonical ASCII addresses) |
| Phone / fax number | Yes | Boxed by the shape-based numeric pattern (UK/US/intl groupings) |
| US SSN (nnn-nn-nnnn) | Yes (dash form only) | Boxed; un-hyphenated and space forms are missed |
| Credit/debit card number | Yes (13–16 digits) | Boxed by the card pattern |
| Personal name | No | Manual — no name pattern exists in the tool |
| Postal address | No | Manual — no address pattern |
| Date of birth | No | Manual — no date pattern |
| UK National Insurance number | No | Manual — no NI pattern (despite older claims) |
| Passport / driving-licence number | No | Manual — not in the four patterns |
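The split in the table can be sketched with a handful of regular expressions. The patterns below are illustrative assumptions, not the tool's actual expressions (those are not published here), and `detect` is a hypothetical helper. The point is only that contact and account fields have a recognisable shape, while names and addresses do not.

```python
import re

# Hypothetical patterns, checked in this order. Each matched span is blanked
# before the next pattern runs so one value is not reported twice.
PATTERNS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ssn", re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")),        # dash form only
    ("card", re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")),    # 13-16 digits
    ("phone", re.compile(r"(?<!\d)\+?\d(?:[ -]?\d){8,11}(?!\d)")), # 9-12 digits
]


def detect(text):
    """Return (label, matched_text) pairs for the four assumed patterns."""
    found = []
    remaining = text
    for label, pattern in PATTERNS:
        def blank(match, label=label):
            found.append((label, match.group()))
            return "*" * len(match.group())
        remaining = pattern.sub(blank, remaining)
    return found


line = "Dear Mr John Smith, 14 Rowan Ave, London, tel 020 7946 0991, jsmith@acme.com"
for label, value in detect(line):
    print(label, "->", value)
```

Nothing in the sketch reacts to "John Smith" or "14 Rowan Ave", which is the same gap the table describes: those identifiers need a manual pass.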
Why a visual box is not yet GDPR-safe
Disclosure under GDPR must not leak the data. A drawn box that hides but does not delete is not enough on its own.
| Step | State of the personal data | Disclosure-safe? |
|---|---|---|
| After running the redactor | Hidden under black boxes; glyphs remain in the content stream | No — recoverable by extraction |
| After flattening / rasterising | Glyphs destroyed; only an image of a black box remains | Yes, for the detected fields |
| After manual redaction of names/addresses | Un-detected identifiers covered and flattened too | Yes, for those you handled |
| After metadata scrub | Author/producer/date fields cleared | Yes — removes identity leaks in properties |
Cookbook
DSAR / erasure scenarios showing exactly what the tool does and what you still owe. 'Detected' fields are boxed automatically; 'manual' fields are not.
A DSAR letter — what the tool catches
Contact details are boxed automatically; the data subject's name and address are not.
Source line: Dear Mr John Smith, 14 Rowan Ave, London — tel 020 7946 0991, jsmith@acme.com
After auto-redaction: Dear Mr John Smith, 14 Rowan Ave, London — tel [ ████████ ], [ ██████████ ]
^ name + address NOT boxed (no name/address pattern) -> redact manually
Article 17 erasure of a third party
You must erase a third party's details from a record you keep. The tool boxes their email/phone; their name must be handled separately.
Before: Complaint about Jane Doe (jane.doe@x.com, 07700 900123)
After: Complaint about Jane Doe ([ ██████████ ], [ ████████ ])
^ "Jane Doe" still visible -> manual redaction required
Card number on an invoice in the bundle
A 13–16 digit card number is boxed by the card pattern in the same pass.
Before: Paid by card 4111 1111 1111 1111
After: Paid by card [ ████████████████ ]
(13-16 digit run matched the credit-card pattern)
The two-stage workflow
Manual first for the un-detected identifiers, then automated for the contact/account fields, then flatten.
1. Cover names/addresses/DOBs in your editor (un-detected types)
2. pdf-pii-redactor -> boxes emails/phones/SSNs/cards
3. pdf-to-png + image-to-pdf (or pdf-flatten) -> destroy glyphs
4. pdf-metadata-scrubber -> clear author/producer/dates
5. pdf-to-text -> confirm no detected PII extracts
Proving the redaction held
The verification step for your DSAR audit trail.
After flattening, run pdf-to-text on the output:
- search for '@' -> no email survives
- search for digits -> no phone/SSN/card survives
- the boxed regions now extract as nothing (rasterised)
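The checks above can be scripted once the output of PDF to Text has been saved as a string. This is a minimal sketch under assumptions: `residual_hits` and `redaction_held` are hypothetical helper names, and the threshold of seven digits for a "suspicious" run is an arbitrary choice so that page numbers and years do not raise an alert.

```python
import re

# Assumption: 7 or more digits, optionally separated by single spaces or
# dashes, is long enough to be a phone, SSN, or card fragment worth reviewing.
DIGIT_RUN = re.compile(r"\d(?:[ -]?\d){6,}")


def residual_hits(extracted_text):
    """Collect lines containing '@' and long digit runs from extracted text."""
    return {
        "at_sign_lines": [
            ln.strip() for ln in extracted_text.splitlines() if "@" in ln
        ],
        "digit_runs": DIGIT_RUN.findall(extracted_text),
    }


def redaction_held(extracted_text):
    """True when neither check finds anything in the extracted text."""
    hits = residual_hits(extracted_text)
    return not hits["at_sign_lines"] and not hits["digit_runs"]


leaky = "Dear Mr Smith\ntel 020 7946 0991, jsmith@acme.com\n"
clean = "Dear Mr Smith\nPage 3 of 12\n"
print(residual_hits(leaky))
print(redaction_held(clean))
```

A passing result only speaks for the shapes it searches for. Names, addresses, and dates of birth still need a visual check, and the output of each run is worth keeping alongside the log of manual redactions.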
Edge cases and what actually happens
Names, addresses, and DOBs are not detected
Not covered: The engine has no name, address, or date pattern; it has only email, phone, dash-SSN, and card. The bulk of GDPR-relevant identifiers in prose (names, postal addresses, dates of birth) must be redacted manually. Treat this tool as the contact/account-number step, not the whole job.
UK National Insurance number not detected
Not covered: There is no NI-number pattern in the tool, despite any older documentation that implied one. NI numbers, passport numbers, and licence numbers must be redacted manually.
Boxes hide but do not delete
Recoverable: Until you flatten or rasterise, the personal data sits in the content stream beneath each box and can be extracted. A DSAR disclosure that still contains extractable third-party data is a breach, so always finish with PDF Flatten or a PDF to PNG round-trip.
Scanned correspondence with no text layer
0 matches: Detection needs extractable text. Run PDF OCR first, then verify the result, because OCR errors can leave PII visible.
Un-hyphenated SSN / space-delimited numbers
Partial: The SSN rule matches only nnn-nn-nnnn. Compact or space-delimited identifiers slip through. Review numeric identifiers manually.
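A dash-only rule is easy to illustrate in isolation. The expression below is an assumed stand-in for the SSN rule, written from the nnn-nn-nnnn description rather than taken from the tool, and `matched_by_ssn_rule` is a hypothetical helper.

```python
import re

# Assumed equivalent of a rule that accepts only the nnn-nn-nnnn form.
SSN_DASH_ONLY = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def matched_by_ssn_rule(value):
    """True when the value contains an SSN in the hyphenated form."""
    return SSN_DASH_ONLY.search(value) is not None


for sample in ["123-45-6789", "123456789", "123 45 6789"]:
    print(sample, "->", matched_by_ssn_rule(sample))
```

Only the first sample matches; the compact and space-delimited forms are the ones to look for by hand.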
Over-redaction of surrounding text
By design: Each match boxes the whole pdf.js text item, so adjacent words are covered. For disclosure this is usually acceptable, but check you did not hide context the requester is entitled to.
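The effect of boxing at text-item granularity can be modelled with plain strings. In this sketch each list entry stands in for one pdf.js text item, the email expression is an assumption, and `boxed_items` and `collateral_text` are hypothetical helpers used only to show which surrounding words would be hidden along with the match.

```python
import re

# Assumed email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def boxed_items(text_items):
    """Return the indices of text items that would be boxed in full."""
    return [i for i, item in enumerate(text_items) if EMAIL.search(item)]


def collateral_text(item):
    """Return the non-matching words hidden when the whole item is boxed."""
    return " ".join(EMAIL.sub(" ", item).split())


items = ["Contact: jane.doe@x.com for details", "Regards, the complaints team"]
for index in boxed_items(items):
    print(index, "also hides:", collateral_text(items[index]))
```

Listing the collateral text this way is one option for checking whether anything the requester is entitled to see has been covered.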
Encrypted disclosure file
Fails to parse: An encrypted PDF cannot be read by pdf.js without the open password, so the redaction will not run. Unlock with PDF Unlock or Remove Password first.
PII in metadata, not the page
Not covered: Author, producer, and title fields can carry names and software identity. The redactor scans page text only. Run Metadata Scrubber to clear those.
PII inside annotations or form fields
Not covered: Comments and form-field values are not page content text. Remove them with Annotation Remover or Flatten before redacting.
Disclosure file over the tier cap
Rejected: The free tier rejects PDFs over 2 MB / 50 pages. Upgrade (Pro: 50 MB / 500 pages; Pro Media: 500 MB / 2000 pages) or split with PDF Split by Range.
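The caps can be checked before loading a file. The limits in this sketch come from the figures above; the `smallest_tier` helper and the assumption that a file exactly at a limit is accepted are illustrative only.

```python
# (tier name, max size in MB, max pages), smallest tier first.
TIERS = [
    ("Free", 2, 50),
    ("Pro", 50, 500),
    ("Pro Media", 500, 2000),
]


def smallest_tier(size_mb, pages):
    """Return the first tier whose caps fit the file, or None if none do."""
    for name, max_mb, max_pages in TIERS:
        # Assumption: a file exactly at the cap is still accepted.
        if size_mb <= max_mb and pages <= max_pages:
            return name
    return None


print(smallest_tier(1.5, 40))
print(smallest_tier(10, 600))
```

A `None` result means the file has to be split before any tier will take it.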
Frequently asked questions
Is this a one-click GDPR redaction tool?
No, and it is important to be clear about that. It auto-detects four PII types — email, phone, dash-delimited SSN, and 13–16 digit card numbers — and boxes them. It does not detect names, postal addresses, dates of birth, or NI/passport numbers, which are the bulk of GDPR-relevant identifiers in prose. Use it as a fast pass for contact and account details, then redact the rest manually and flatten the result.
Which PII types does it actually detect?
Exactly four: email addresses, phone-shaped numbers (UK/US/international groupings), US SSNs in nnn-nn-nnnn form, and credit/debit card numbers of 13–16 digits. There are no other patterns — no name, address, date, or national-ID detection.
Does it detect UK National Insurance numbers?
No. There is no NI-number pattern in the tool. NI numbers, passport numbers, and driving-licence numbers must be redacted manually. (Earlier marketing copy that implied NI detection was inaccurate — the engine has only the four patterns.)
Does it detect names and addresses?
No. There is no name or address pattern. Names like 'Jane Doe' and postal addresses are left fully visible. You must redact those manually before disclosure — for example by editing the source document, or by covering them in a PDF editor and flattening.
Is a redaction from this tool safe to disclose?
Not until you flatten it. The tool draws boxes but leaves the underlying glyphs in the content stream, so the data is recoverable by text extraction, which would be a breach in a DSAR. Rasterise the output (PDF to PNG + Image to PDF) or run PDF Flatten, then verify with PDF to Text.
Does running this tool count as 'processing' under GDPR?
It runs entirely in your browser — no upload, no third-party processor, no transfer — so it does not add a new processing location or sub-processor. The personal data never leaves your device. That makes it a privacy-friendly default compared to upload-based redaction services, but your overall DSAR handling is still your responsibility.
Can I select which PII categories to redact?
No. There is no category selector or options panel; all four patterns auto-run together the moment you drop the file. You cannot, for example, box only emails.
How do I handle the identifiers it cannot detect?
Redact them manually before or after running the tool. A common approach: cover names, addresses, and dates of birth in your PDF editor and flatten, then run this tool for emails/phones/SSNs/cards, then flatten again so everything is destroyed. Keep a log of manual redactions for your audit trail.
Does it scrub metadata that might identify someone?
No — it only scans the page text layer. Author, producer, and title fields can leak a name or the software used. Run Metadata Scrubber as a separate step to clear those.
What about a scanned DSAR bundle?
Detection reads text, so a scan must be OCR'd first with PDF OCR. Verify the OCR output, because recognition errors can leave PII unmatched and therefore unredacted.
What is the limit on file size?
Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2000 pages. Split a large disclosure file with PDF Split by Range, redact each part, then recombine.
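Working out the page ranges for a split is simple arithmetic. The sketch below assumes 1-indexed, inclusive ranges and uses a hypothetical `split_ranges` helper; it addresses the page cap only, so a part with heavy scans may still exceed the size cap and need a further split.

```python
def split_ranges(total_pages, max_pages=50):
    """Return 1-indexed inclusive (start, end) ranges of at most max_pages."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


# A 120-page bundle under the 50-page free-tier cap.
for start, end in split_ranges(120):
    print(f"{start}-{end}")
```

Recording the ranges used makes it easier to recombine the parts in the right order after redaction.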
What is a defensible end-to-end GDPR redaction workflow?
Map your PII against the four detectable types → OCR if scanned → manually redact names, addresses, DOBs, and national IDs the tool cannot detect → run the PII Redactor for contact/account fields → flatten or rasterise so all boxed data is destroyed → scrub document properties with Metadata Scrubber → confirm with PDF to Text that nothing extracts → record what you did for the DSAR audit trail.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.