GDPR PII Redaction for PDFs — What This Tool Does (and Doesn't)

How to redact personal data from a PDF for GDPR

Step 1
Map your PII against the four patterns — List the personal data in the document. Emails, phones, dash-SSNs, and card numbers are auto-detected. Names, addresses, DOBs, NI numbers, and any free-text identifiers are NOT — flag those for manual redaction.
Step 2
Ensure a text layer exists — Detection reads text via pdf.js. Scanned correspondence needs PDF OCR first to create a text layer; verify the OCR result.
Step 3
Open the redactor and drop the file — Load the document into the PDF PII Redactor. Browser-only, nothing uploaded, no options panel. It runs immediately.
Step 4
Redact the un-detected identifiers manually — Because the tool cannot box names, addresses, or DOBs, handle those separately, before or after the automated pass: for example by editing the source before export, or by overlaying covers in your PDF editor.
Step 5
Flatten / rasterise the output — The boxes are visual; the data survives beneath them. Rasterise via PDF to PNG + Image to PDF, or use PDF Flatten, so the personal data is actually destroyed.
Step 6
Scrub metadata and verify — Run Metadata Scrubber to strip author/producer/dates, then PDF to Text to confirm no detected PII remains extractable. Keep a record of your manual redactions for your DSAR audit trail.

What is detected vs. what you must redact manually

The engine has exactly four patterns. Anything not marked 'Yes' in the 'Auto-detected?' column is your responsibility: there is no name, address, or DOB detection.

PII type	Auto-detected?	How to handle it
Email address	Yes	Boxed by the email pattern (canonical ASCII addresses)
Phone / fax number	Yes	Boxed by the shape-based numeric pattern (UK/US/intl groupings)
US SSN (`nnn-nn-nnnn`)	Yes (dash form only)	Boxed; un-hyphenated and space forms are missed
Credit/debit card number	Yes (13–16 digits)	Boxed by the card pattern
Personal name	No	Manual — no name pattern exists in the tool
Postal address	No	Manual — no address pattern
Date of birth	No	Manual — no date pattern
UK National Insurance number	No	Manual — no NI pattern (despite older claims)
Passport / driving-licence number	No	Manual — not in the four patterns
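
The four detected shapes can be approximated with regular expressions. The sketch below is illustrative only: the expressions and the names `PATTERNS` and `detect` are assumptions, not the tool's actual patterns. It shows why contact and account fields match while names, dates of birth, and NI numbers do not.

```python
import re

# Hypothetical approximations of the four pattern shapes described in the table.
# These are NOT the tool's real expressions.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # dash form only
    "card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),    # 13 to 16 digits
    "phone": re.compile(r"(?<![\w-])\+?\d[\d ()-]{7,}\d(?![\w-])"),
}


def detect(text):
    """Return the sorted names of the patterns that match anywhere in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

A shape-based phone pattern like this one also fires on other long digit groupings, so a single string can match more than one pattern. Text with no email or long digit run, such as a name or a written date, matches nothing and is left for manual redaction.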

Why a visual box is not yet GDPR-safe

Disclosure under GDPR must not leak the data. A drawn box that hides but does not delete is not enough on its own.

Step	State of the personal data	Disclosure-safe?
After running the redactor	Hidden under black boxes; glyphs remain in the content stream	No — recoverable by extraction
After flattening / rasterising	Glyphs destroyed; only an image of a black box remains	Yes, for the detected fields
After manual redaction of names/addresses	Un-detected identifiers covered and flattened too	Yes, for those you handled
After metadata scrub	Author/producer/date fields cleared	Yes — removes identity leaks in properties
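
The first two rows of the table can be illustrated with a toy model. This is a simplified sketch, not a real PDF parser: the `Page` class and the three functions are hypothetical stand-ins for a content stream, a drawn overlay, and a rasterised page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    text_items: list  # strings still present in the content stream
    boxes: list       # indices of text items covered by a drawn rectangle


def extract_text(page):
    """Text extraction reads the content stream and ignores drawn rectangles."""
    return " ".join(page.text_items)


def draw_boxes(page, covered):
    """Visual redaction: records rectangles but leaves every text item in place."""
    return Page(text_items=list(page.text_items), boxes=list(covered))


def rasterise(page):
    """Rasterising keeps only pixels, so no text items survive."""
    return Page(text_items=[], boxes=[])
```

In this model, extracting from a boxed page still returns the phone number or email, while extracting from the rasterised page returns an empty string.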

Cookbook

DSAR / erasure scenarios showing exactly what the tool does and what you still owe. 'Detected' fields are boxed automatically; 'manual' fields are not.

A DSAR letter — what the tool catches

Contact details are boxed automatically; the data subject's name and address are not.

Source line:
  Dear Mr John Smith, 14 Rowan Ave, London — tel 020 7946 0991, jsmith@acme.com
After auto-redaction:
  Dear Mr John Smith, 14 Rowan Ave, London — tel [ ████████ ], [ ██████████ ]
  ^ name + address NOT boxed (no name/address pattern) -> redact manually

Article 17 erasure of a third party

You must erase a third party's details from a record you keep. The tool boxes their email/phone; their name must be handled separately.

Before: Complaint about Jane Doe (jane.doe@x.com, 07700 900123)
After:  Complaint about Jane Doe ([ ██████████ ], [ ████████ ])
        ^ "Jane Doe" still visible -> manual redaction required

Card number on an invoice in the bundle

A 13–16 digit card number is boxed by the card pattern in the same pass.

Before: Paid by card 4111 1111 1111 1111
After:  Paid by card [ ████████████████ ]
(13-16 digit run matched the credit-card pattern)

The two-stage workflow

Manual first for the un-detected identifiers, then automated for the contact/account fields, then flatten.

1. Cover names/addresses/DOBs in your editor (un-detected types)
2. pdf-pii-redactor -> boxes emails/phones/SSNs/cards
3. pdf-to-png + image-to-pdf (or pdf-flatten) -> destroy glyphs
4. pdf-metadata-scrubber -> clear author/producer/dates
5. pdf-to-text -> confirm no detected PII extracts

Proving the redaction held

The verification step for your DSAR audit trail.

After flattening, run pdf-to-text on the output:
  - search for '@'      -> no email survives
  - search for digits   -> no phone/SSN/card survives
  - the boxed regions now extract as nothing (rasterised)
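
The two searches above can be scripted against the text that PDF to Text returns. This is a minimal sketch: the function name `leftover_pii_hints` and the threshold of seven digits are assumptions chosen for illustration, and a clean result only covers the detected types, not names or addresses.

```python
import re


def leftover_pii_hints(extracted_text):
    """Return a list of findings; an empty list means neither check fired."""
    findings = []
    if "@" in extracted_text:
        findings.append("'@' found: an email may have survived")
    # Any run of 7 or more digits, allowing single spaces or dashes between them.
    for match in re.finditer(r"\d(?:[ -]?\d){6,}", extracted_text):
        findings.append(f"digit run found: {match.group(0)}")
    return findings
```

Saving the returned list alongside the output file gives a simple record for the audit trail.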

Edge cases and what actually happens

Names, addresses, and DOBs are not detected

Not covered

The engine has no name, address, or date pattern — only email, phone, dash-SSN, and card. The bulk of GDPR-relevant identifiers in prose (names, postal addresses, dates of birth) must be redacted manually. Treat this tool as the contact/account-number step, not the whole job.

UK National Insurance number not detected

Not covered

There is no NI-number pattern in the tool, despite any older documentation that implied one. NI numbers, passport numbers, and licence numbers must be redacted manually.

Boxes hide but do not delete

Recoverable

Until you flatten or rasterise, the personal data sits in the content stream beneath each box and can be extracted. A DSAR disclosure that still contains extractable third-party data is a breach — always finish with PDF Flatten or a PDF to PNG round-trip.

Scanned correspondence with no text layer

0 matches

Detection needs extractable text. Run PDF OCR first, then verify — OCR errors can leave PII visible.

Un-hyphenated SSN / space-delimited numbers

Partial

The SSN rule matches only nnn-nn-nnnn. Compact or space-delimited identifiers slip through. Review numeric identifiers manually.
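
For the manual review, a wider net can list the SSN-shaped strings a dash-only rule would skip. Both expressions below are assumptions for illustration: `DASH_SSN` mimics the described rule and `LOOSE_SSN` is a hypothetical reviewer-side pattern, not part of the tool.

```python
import re

DASH_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # assumed shape of the tool's rule
LOOSE_SSN = re.compile(r"\b\d{3}[ -]?\d{2}[ -]?\d{4}\b")  # reviewer's wider net


def missed_candidates(text):
    """Return SSN-shaped strings that the dash-only rule would not match."""
    return [
        m.group(0)
        for m in LOOSE_SSN.finditer(text)
        if not DASH_SSN.fullmatch(m.group(0))
    ]
```

The loose pattern will also flag unrelated nine-digit numbers, so treat its output as a review list rather than a verdict.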

Over-redaction of surrounding text

By design

Each match boxes the whole pdf.js text item, so adjacent words are covered. For disclosure this is usually acceptable, but check you did not hide context the requester is entitled to.
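
To see which words get hidden along with a match, the item-level boxing can be modelled on a list of strings. This is a sketch under assumptions: the list stands in for pdf.js text items, only a hypothetical email pattern is used, and `items_to_box` and `hidden_context` are illustrative names.

```python
import re

# Hypothetical email pattern, used only to pick which items get boxed.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def items_to_box(text_items):
    """Return indices of items containing a match; the whole item is boxed."""
    return [i for i, item in enumerate(text_items) if EMAIL.search(item)]


def hidden_context(text_items):
    """For each boxed item, list the covered words that are not part of a match."""
    report = {}
    for i in items_to_box(text_items):
        remainder = EMAIL.sub(" ", text_items[i])
        report[i] = remainder.split()
    return report
```

A non-empty word list for an item is a prompt to check whether the requester was entitled to that context.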

Encrypted disclosure file

Fails to parse

An encrypted PDF cannot be read by pdf.js without the open password, so the redaction will not run. Unlock with PDF Unlock or Remove Password first.

PII in metadata, not the page

Not covered

Author, producer, and title fields can carry names and software identity. The redactor scans page text only. Run Metadata Scrubber to clear those.

PII inside annotations or form fields

Not covered

Comments and form-field values are not page content text. Remove them with Annotation Remover or Flatten before redacting.

Disclosure file over the tier cap

Rejected

Free tier rejects PDFs over 2 MB / 50 pages. Upgrade (Pro: 50 MB / 500 pages; Pro Media: 500 MB / 2000 pages) or split with PDF Split by Range.
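
The caps can be checked before you drop a file. The sketch below encodes the limits quoted above; the tier keys, the function name `smallest_tier`, and the reading of "over" as strictly greater than the cap are assumptions.

```python
# (max size in MB, max pages) per tier, as quoted in the text.
# Tier key names are hypothetical.
TIER_LIMITS = {
    "free": (2, 50),
    "pro": (50, 500),
    "pro_media": (500, 2000),
}


def smallest_tier(size_mb, pages):
    """Return the first tier whose caps fit the file, or None if none does."""
    for name, (max_mb, max_pages) in TIER_LIMITS.items():
        if size_mb <= max_mb and pages <= max_pages:
            return name
    return None
```

A result of `None` means the file exceeds every tier and has to be split.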

Frequently asked questions

Is this a one-click GDPR redaction tool?

No, and it is important to be clear about that. It auto-detects four PII types — email, phone, dash-delimited SSN, and 13–16 digit card numbers — and boxes them. It does not detect names, postal addresses, dates of birth, or NI/passport numbers, which are the bulk of GDPR-relevant identifiers in prose. Use it as a fast pass for contact and account details, then redact the rest manually and flatten the result.

Which PII types does it actually detect?

Exactly four: email addresses, phone-shaped numbers (UK/US/international groupings), US SSNs in nnn-nn-nnnn form, and credit/debit card numbers of 13–16 digits. There are no other patterns — no name, address, date, or national-ID detection.

Does it detect UK National Insurance numbers?

No. There is no NI-number pattern in the tool. NI numbers, passport numbers, and driving-licence numbers must be redacted manually. (Earlier marketing copy that implied NI detection was inaccurate — the engine has only the four patterns.)

Does it detect names and addresses?

No. There is no name or address pattern. Names like 'Jane Doe' and postal addresses are left fully visible. You must redact those manually before disclosure — for example by editing the source document, or by covering them in a PDF editor and flattening.

Is a redaction from this tool safe to disclose?

Not until you flatten it. The tool draws boxes but leaves the underlying glyphs in the content stream, so the data is recoverable by text extraction, which would be a breach in a DSAR. Rasterise the output (PDF to PNG + Image to PDF) or run PDF Flatten, then verify with PDF to Text.

Does running this tool count as 'processing' under GDPR?

Redaction is itself a processing operation under GDPR, since Article 4(2) covers alteration and erasure of personal data. What the tool avoids is adding a new processing location or sub-processor: it runs entirely in your browser, with no upload, no third-party processor, and no transfer, so the document never leaves your device. That makes it a privacy-friendly default compared to upload-based redaction services, but your overall DSAR handling is still your responsibility.

Can I select which PII categories to redact?

No. There is no category selector or options panel; all four patterns auto-run together the moment you drop the file. You cannot, for example, box only emails.

How do I handle the identifiers it cannot detect?

Redact them manually before or after running the tool. A common approach: cover names, addresses, and dates of birth in your PDF editor and flatten, then run this tool for emails/phones/SSNs/cards, then flatten again so everything is destroyed. Keep a log of manual redactions for your audit trail.
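
One possible shape for that log is a line of JSON per manual redaction. This is a sketch only: the `ManualRedaction` fields and the `to_log_lines` helper are hypothetical, and the record deliberately stores the category and reason rather than the redacted value itself.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ManualRedaction:
    page: int
    category: str     # e.g. "name", "address", "dob"
    reason: str       # why it was withheld
    redacted_by: str  # reviewer identifier


def to_log_lines(entries):
    """Serialise each entry as one JSON line with stable key order."""
    return [json.dumps(asdict(entry), sort_keys=True) for entry in entries]
```

Writing the lines to a file stored with the case record keeps the log separate from the disclosed PDF.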

Does it scrub metadata that might identify someone?

No — it only scans the page text layer. Author, producer, and title fields can leak a name or the software used. Run Metadata Scrubber as a separate step to clear those.

What about a scanned DSAR bundle?

Detection reads text, so a scan must be OCR'd first with PDF OCR. Verify the OCR output, because recognition errors can leave PII unmatched and therefore unredacted.

What is the limit on file size?

Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2000 pages. Split a large disclosure file with PDF Split by Range, redact each part, then recombine.
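
Page ranges for the split can be worked out ahead of time. The helper below is a sketch with a hypothetical name, `split_ranges`; it covers the page cap only, so each part's file size still needs checking against the tier's MB limit.

```python
def split_ranges(total_pages, max_pages=50):
    """Return 1-based inclusive (start, end) ranges, each at most max_pages long."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]
```

The default of 50 matches the Free tier page cap quoted above; pass 500 or 2000 for the other tiers.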

What is a defensible end-to-end GDPR redaction workflow?

Map your PII against the four detectable types → OCR if scanned → manually redact names, addresses, DOBs, and national IDs the tool cannot detect → run the PII Redactor for contact/account fields → flatten or rasterise so all boxed data is destroyed → scrub document properties with Metadata Scrubber → confirm with PDF to Text that nothing extracts → record what you did for the DSAR audit trail.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.

