Auto-Redact Email, Phone & SSN from a PDF — Free Browser Tool

How to auto-detect and black-box PII across a PDF

Step 1
Open the canonical PDF PII redactor — This Security entry routes through to the real engine at /pdf-tools/pdf-pii-redactor. It is Pro-tier (minTier: pro). Free accounts can run other PDF privacy tools but this redactor needs Pro.
Step 2
Drop a text-layer PDF — Upload a single PDF that contains a real text layer (one file at a time — acceptsMultiple: false). Born-digital exports from Word, Google Docs, accounting software, and most government forms have a text layer. Scanned or photographed pages do not — OCR them first.
Step 3
Let the scanner walk every page — pdfjs reads each page's getTextContent() items; pdf-lib loads the same document. For each text item, the four PII regexes run in order — email, phone, SSN, credit-card — and the first match flags the item.
Step 4
Boxes are drawn over each matched item — When an item matches, a black rectangle is drawn at that item's x/y position spanning its full width and height (plus 2 pt). One box per text item is enough, so the whole span is covered — not just the matched substring.
Step 5
Download the redacted PDF — The result is saved as a new PDF blob and downloaded. Page count and the rest of the layout are preserved; only black boxes are added on top of matched text.
Step 6
Make it permanent before sharing — Critical: open the downloaded file and verify with Ctrl+A → copy — if redacted text still pastes out, the glyphs are still in the stream. Flatten/rasterise the PDF (print-to-PDF as image, or a flatten tool) so the boxes become pixels and the text is gone for good.
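The scan-and-box loop in steps 3 and 4 can be sketched as a pure function. This is a minimal sketch, not the tool's source. The item shape mirrors the `str`, `transform`, `width`, and `height` fields that pdfjs text items carry; `PageTextItem`, `RedactionBox`, `boxesForPage`, and the `matchesPii` callback are hypothetical names, and adding the 2 pt to both width and height is an assumption about how the padding is applied.

```typescript
// Minimal shape of a pdfjs text item: only the fields this sketch reads.
interface PageTextItem {
  str: string;         // the text run
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x/y origin
  width: number;
  height: number;
}

interface RedactionBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumption: the documented "plus 2 pt" is added to width and height.
const BOX_PADDING_PT = 2;

// Returns one box per text item that the (hypothetical) matcher flags.
function boxesForPage(
  items: PageTextItem[],
  matchesPii: (text: string) => boolean,
): RedactionBox[] {
  const boxes: RedactionBox[] = [];
  for (const item of items) {
    if (!matchesPii(item.str)) continue; // unmatched items are left alone
    boxes.push({
      x: item.transform[4] ?? 0,
      y: item.transform[5] ?? 0,
      width: item.width + BOX_PADDING_PT,
      height: item.height + BOX_PADDING_PT,
    });
  }
  return boxes;
}
```

Each returned box would then be handed to pdf-lib's `page.drawRectangle` with a black fill. Note that the box spans the whole item, which is why neighbouring words inside the same run are covered too.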

What the redactor detects (the four built-in patterns)

These are the exact PII_PATTERNS the PDF redactor runs against each text item, in order. There are no toggles or custom patterns — the set is fixed in code. Note these differ from the richer text-scrubber set (no IBAN, no UK NI, no Luhn check, no name detection).

PII class	What it matches	Validation	Notes / gotchas
Email	`local@domain.tld` — letters, digits, `._%+-` in the local part; a domain with a 2+ letter TLD	Regex shape only	Catches almost all real addresses; very long or unusual TLDs are fine. No DNS/validity check
Phone	Optional `+` country code, optional area code in parentheses, then 3–4 + 3–4 digit groups separated by space, dot, or dash	Regex shape only	Deliberately loose. Can also match other digit strings shaped like phones (invoice numbers, IDs) — see edge cases
US SSN	`NNN-NN-NNNN` — exactly 3-2-4 digits with literal dashes	Format only (no SSA invalid-block exclusion here)	Requires the dashes. `123456789` (no dashes) is NOT caught as an SSN by this pattern
Card number	A run of 13–16 digits, optionally separated by spaces or dashes	No Luhn check in the PDF redactor	Any 13–16 digit run matches, so long order/account numbers can be flagged. Strict 19-digit cards are not the target here
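To make the ordered, first-match-wins behaviour concrete, here is a minimal sketch. The regexes are approximations written from the table's descriptions, not the tool's actual `PII_PATTERNS`; `APPROX_PATTERNS` and `firstPiiClass` are hypothetical names, and the phone approximation assumes a separator is required between the two digit groups.

```typescript
type PiiClass = "email" | "phone" | "ssn" | "card";

// Approximations of the four patterns, in the documented order.
// These are NOT copied from the tool; they only follow the table's wording.
const APPROX_PATTERNS: Array<[PiiClass, RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/],
  ["phone", /(?:\+\d{1,3}[\s.-]?)?(?:\(\d{2,4}\)[\s.-]?)?\d{3,4}[\s.-]\d{3,4}/],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/],
  ["card", /\b(?:\d[ -]?){12,15}\d\b/],
];

// Returns the first class whose pattern matches, or null when none does.
// The regexes carry no "g" flag, so test() keeps no state between calls.
function firstPiiClass(text: string): PiiClass | null {
  for (const [piiClass, pattern] of APPROX_PATTERNS) {
    if (pattern.test(text)) return piiClass;
  }
  return null;
}
```

Which class fires first for a given string depends on the real regexes; the sketch only illustrates that one match of any class is enough to flag the whole item.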

Redaction behaviour — what it does vs. what it does NOT do

This is the most important table on this page. Redaction here is visual only: a filled rectangle is drawn over the text, and the characters underneath are not removed.

Behaviour	Reality in this tool	Why it matters
Redaction method	Filled black `drawRectangle` over the matched text item (pdf-lib), at the item's coordinates from pdfjs	It is real ink on the page, visible in every viewer — but it is an overlay, not a deletion
Text removal	Not removed. The glyphs stay in the content stream	`Ctrl+A → copy` and text-extraction tools can still recover the "redacted" text — flatten/rasterise to fix
Redaction granularity	Whole text item (the run pdfjs returns), not the exact matched substring	An item like `Call 555-123-4567 now` gets one box over the whole run — adjacent words are covered too
Scanned / image PDFs	No text layer → zero text items → zero matches → nothing redacted	Image-only pages pass through untouched; OCR first or use a manual region tool
Review / preview UI	None surfaced — the tool returns the redacted PDF directly	There is no per-match confirm step or confidence list to approve; verify the output yourself
Options / settings	None (`needsOptions: false`) — patterns and box style are fixed	You cannot add a pattern, change the box colour, or redact only some classes

Tier and file limits (PDF family)

This redactor is gated at Pro (minTier: pro) and runs through the PDF tool family, so PDF-family file/page limits apply. One file at a time.

Tier	Max file size	Max pages	Files per run
Free	Tool gated — Pro required to run this redactor	—	—
Pro	50 MB	500 pages	5 (this tool: 1 at a time)
Developer	2 GB	10,000 pages	Unlimited (this tool: 1 at a time)
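The table above can be read as a simple pre-flight check. The sketch below is illustrative only: `AccountTier`, `REDACTOR_LIMITS`, and `rejectionReason` are hypothetical names, and treating "MB" and "GB" as binary units is an assumption.

```typescript
type AccountTier = "free" | "pro" | "developer";

interface PdfLimit {
  maxBytes: number;
  maxPages: number;
}

const MIB = 1024 * 1024; // assumption: "MB" and "GB" are binary units

// null means the tier cannot run this redactor at all.
const REDACTOR_LIMITS: Record<AccountTier, PdfLimit | null> = {
  free: null,
  pro: { maxBytes: 50 * MIB, maxPages: 500 },
  developer: { maxBytes: 2 * 1024 * MIB, maxPages: 10000 },
};

// Returns a short reason when the file would be rejected, else null.
function rejectionReason(
  tier: AccountTier,
  sizeBytes: number,
  pageCount: number,
): string | null {
  const limit = REDACTOR_LIMITS[tier];
  if (limit === null) return "Pro tier required";
  if (sizeBytes > limit.maxBytes) return "file too large";
  if (pageCount > limit.maxPages) return "too many pages";
  return null;
}
```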

Cookbook

Representative before/after page snippets from the kinds of documents FOIA and compliance teams redact. All PII values are fabricated. "Before" is the page text; "After" shows what a viewer displays once boxes are drawn, and the verify notes show what copy/paste still recovers underneath.

A benefits letter with an SSN and email

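Born-digital PDF from HR software with a full text layer. The SSN is in dashed NNN-NN-NNNN form and the email is standard, so both are caught. Each matched text item is boxed in full; the labels below stay visible only because this example assumes pdfjs returned each label and its value as separate items.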

Before (page text):
  Member: Dana Cole
  SSN: 532-19-4471
  Contact: dana.cole@example.org
  Plan: Gold PPO

After (what the viewer shows):
  Member: Dana Cole
  SSN: ███████████
  Contact: ███████████████████████
  Plan: Gold PPO

Verify: Ctrl+A -> copy still pastes:
  SSN: 532-19-4471
  Contact: dana.cole@example.org
  -> flatten/rasterise to remove the text for real.

An invoice where a long account number gets boxed

The card pattern matches any 13–16 digit run. A 16-digit purchase-order or account number on an invoice will be flagged as a card. This is over-redaction, not a card leak — but it shows why you should eyeball the output.

Before:
  PO Number: 4002 8812 3456 7890
  Card on file: ending 0042
  Amount: $1,240.00

After:
  PO Number: █████████████████████
  Card on file: ending 0042
  Amount: $1,240.00

The PO (16 digits) matched the card pattern and was boxed.
The masked 'ending 0042' was NOT (only 4 digits).

Phone pattern also covering nearby words

Redaction is per text item, not per substring. If pdfjs returns a phone number inside a longer run, the entire run is covered — useful when context itself is sensitive, surprising when it hides wanted text.

Before (single text item from pdfjs):
  'Reach the case officer at (202) 555-0148 ext 6'

After:
  '███████████████████████████████████████████'

The whole item is boxed because it contained a phone match,
not just the digits. Copying that item from the output still recovers all of its text.

Scanned FOIA page with no text layer

A photocopied, scanned packet has only images — no text items for pdfjs to read. The auto-redactor finds nothing and the page is returned untouched. OCR it first to add a text layer, or use a manual region tool.

Input: scanned_complaint_packet.pdf (image-only)

Scan result: 0 text items -> 0 matches -> 0 boxes
Output: identical pages, no redactions.

Fix path (either option):
  A. Run OCR via /pdf-tools/pdf-ocr to add a text layer, then re-run this redactor
  B. Burn manual rectangles with /security-tools/signature-burner

Undashed SSN slips through

The SSN pattern requires the literal dashes (NNN-NN-NNNN). A 9-digit string with no separators is not matched as an SSN, and 9 digits is below the 13-digit card threshold, so it is missed entirely. Normalise SSNs to dashed form in the source document before redacting, or pre-scrub the text with email-phone-scrubber.

Before:
  Taxpayer ID: 532194471
  SSN: 532-19-4471

After:
  Taxpayer ID: 532194471        <- NOT redacted (no dashes, < 13 digits)
  SSN: ███████████             <- redacted (dashed form matched)

Mitigation: search/replace IDs into dashed form first, or pre-scrub
the text with /security-tools/email-phone-scrubber.
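The search/replace mitigation can be sketched as a single regex pass over the source text before the PDF is exported. `dashNineDigitIds` is a hypothetical helper, and it assumes every standalone 9-digit run is an SSN-like ID, which real data may not guarantee.

```typescript
// Rewrites standalone 9-digit runs as NNN-NN-NNNN.
// Assumption: any bare 9-digit run is an SSN-like ID. Longer digit runs and
// already-dashed values are left untouched by the word-boundary anchors.
function dashNineDigitIds(text: string): string {
  return text.replace(/\b(\d{3})(\d{2})(\d{4})\b/g, "$1-$2-$3");
}
```

Because it rewrites every 9-digit run, review the result for other identifiers of the same length before re-exporting the document.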

Edge cases and what actually happens

Redacted text is still copy-pasteable

By design (visual only)

This is the headline caveat. The tool draws a filled rectangle over the text; it does NOT delete glyphs from the content stream. The code comment is explicit: "the glyphs underneath are still in the file's content stream." So Ctrl+A → copy, text extraction, and accessibility readers can recover the redacted values. For genuine removal, flatten or rasterise the downloaded PDF (e.g. print-to-PDF as an image) so the boxes become pixels.
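The copy-paste check can also be scripted once the output's text has been extracted, for example by running pdfjs `getTextContent()` over the redacted file. A minimal sketch, assuming you kept a list of the values that were meant to be redacted; `survivingValues` is a hypothetical helper name.

```typescript
// Collapses runs of whitespace so line breaks in extracted text do not hide a value.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ");
}

// Returns the values that can still be found in text extracted from the output.
// An empty result is what you want to see after flattening or rasterising.
function survivingValues(extractedText: string, redactedValues: string[]): string[] {
  const haystack = collapseWhitespace(extractedText);
  return redactedValues.filter(
    (value) => haystack.indexOf(collapseWhitespace(value)) !== -1,
  );
}
```

Run straight after the overlay-only redaction, this should report every boxed value as still present; run after rasterising, it should report none.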

Scanned / image-only PDF produces no redactions

No matches

Detection relies on pdfjs reading a text layer (getTextContent()). A scanned or photographed document has only images, so there are zero text items and zero matches — the file comes back unchanged. Add a text layer with PDF OCR first, then re-run, or burn manual rectangles with signature-burner.
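A quick way to tell in advance whether a file will come back unchanged is to count the text items per page. This is a sketch under the assumption that you already hold one array of text items per page (as pdfjs `getTextContent()` returns in its `items` field); `ExtractedItem` and `pagesWithoutText` are hypothetical names.

```typescript
// Only the field this sketch reads from a pdfjs text item.
interface ExtractedItem {
  str: string;
}

// Returns the 1-based numbers of pages that have no non-blank text items.
// Those pages will produce zero matches and need OCR or manual boxes.
function pagesWithoutText(pages: ExtractedItem[][]): number[] {
  const result: number[] = [];
  pages.forEach((items, index) => {
    const hasText = items.some((item) => item.str.trim().length > 0);
    if (!hasText) result.push(index + 1);
  });
  return result;
}
```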

Whole text item is boxed, not just the matched value

Expected

Redaction granularity is one box per matched text item. If pdfjs returns a phone or email inside a longer run (Reach us at (202) 555-0148 today), the entire run is covered. This over-covers neighbouring words. It is intentional ("one redaction box per item is enough") and usually safer, but check the output if you needed adjacent text to stay visible.

Long account / order number flagged as a card

Over-redaction

The card pattern matches any 13–16 digit run with optional spaces/dashes and performs no Luhn check. Purchase-order numbers, account IDs, and tracking numbers in that length range get boxed even though they aren't cards. That's a false positive in the safe direction (it hides, doesn't leak), but it can obscure wanted data — review the result.
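When reviewing the output, a Luhn checksum is one way to tell whether a boxed number could have been a payment card at all. The redactor itself does not do this; the sketch below is a standalone triage helper, and `passesLuhn` is a hypothetical name.

```typescript
// Standard Luhn checksum over a digit string; spaces and dashes are ignored.
// Not part of the PDF redactor, which performs no Luhn check.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit and double every second one.
    let value = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      value *= 2;
      if (value > 9) value -= 9;
    }
    sum += value;
  }
  return sum % 10 === 0;
}
```

For example, the fabricated PO number `4002 8812 3456 7890` from the cookbook fails the checksum, so a Luhn-aware detector would not have treated it as a card, while the common test number `4111 1111 1111 1111` passes.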

SSN without dashes is not detected

Missed match

The SSN regex requires the dashed form NNN-NN-NNNN. A bare 532194471 is not matched as an SSN, and at 9 digits it is below the 13-digit card threshold, so it slips through entirely. Normalise IDs to dashed form before redacting, or pre-scrub the text content with email-phone-scrubber.

Names, addresses, dates of birth are not redacted

Out of scope

Only four classes are detected: email, phone, SSN, and card-number runs. There is no name, address, DOB, IBAN, or UK National Insurance detection in this PDF redactor (despite a registry FAQ mentioning "name patterns" — the code does not implement that). Redact those manually, or use signature-burner for arbitrary regions.

Text split across multiple items isn't matched

Missed match

Regexes run against each text item independently. If a PDF's text engine split an email or phone across two items (john.doe@ in one item, example.com in the next), neither fragment matches and nothing is boxed. This happens with justified text and certain export pipelines. Spot-check critical pages; flatten + re-OCR can re-flow text into single items.
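One way to spot-check for this is to join the items that share a baseline and see whether a pattern matches the joined line even though no single item matched. A minimal sketch, assuming items arrive in reading order and that a shared `transform[5]` value (within a tolerance) means the same line; `RunItem` and `findSplitMatches` are hypothetical names, and the pattern must not use the `g` flag.

```typescript
// Only the fields this sketch reads from a pdfjs text item.
interface RunItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; f is the baseline y
}

// Returns joined lines where the pattern matches across items but inside none.
function findSplitMatches(
  items: RunItem[],
  pattern: RegExp, // no "g" flag, so test() keeps no state
  tolerancePt = 1,
): string[] {
  const hits: string[] = [];
  let lineItems: RunItem[] = [];
  const flush = (): void => {
    if (lineItems.length === 0) return;
    const joined = lineItems.map((it) => it.str).join("");
    const matchedAlone = lineItems.some((it) => pattern.test(it.str));
    if (!matchedAlone && pattern.test(joined)) hits.push(joined);
    lineItems = [];
  };
  for (const item of items) {
    const prev = lineItems[lineItems.length - 1];
    const y = item.transform[5] ?? 0;
    // A baseline jump larger than the tolerance starts a new line.
    if (prev && Math.abs(y - (prev.transform[5] ?? 0)) > tolerancePt) flush();
    lineItems.push(item);
  }
  flush();
  return hits;
}
```

Any line it returns contains a value the per-item scan would have missed, so those pages are the ones to redact manually.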

Encrypted / password-protected PDF

Loaded with ignoreEncryption

The redactor loads with ignoreEncryption: true, so many lightly-protected PDFs open and process. Strongly encrypted files (those pdfjs/pdf-lib can't parse) will error out before scanning. Remove the password first with pdf-password-protect / an unlock tool, then redact.

Box doesn't perfectly cover rotated or skewed text

Visual mismatch possible

The rectangle is drawn axis-aligned at the item's x/y with its width/height. For rotated pages or text with unusual transforms, the box may sit slightly off the glyphs. Always visually inspect the rendered output before treating any page as redacted.
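Items at risk can be flagged from the item's transform matrix before you inspect the pages. This is a sketch, assuming the usual `[a, b, c, d, e, f]` layout of a pdfjs text item transform; `rotationDegrees` and `needsManualReview` are hypothetical names, and the 0.5 degree tolerance is arbitrary. It does not look at page-level rotation, which would need a separate check.

```typescript
// Rotation of the text's x-axis, in degrees, from the [a, b, ...] terms.
function rotationDegrees(transform: number[]): number {
  const a = transform[0] ?? 1;
  const b = transform[1] ?? 0;
  return (Math.atan2(b, a) * 180) / Math.PI;
}

// True when the item is rotated or skewed enough that an axis-aligned box
// drawn at its x/y origin may not sit on the glyphs.
function needsManualReview(transform: number[], toleranceDeg = 0.5): boolean {
  const c = transform[2] ?? 0;
  const d = transform[3] ?? 1;
  // For an unrotated, unskewed item both angles are zero.
  const yAxisDegrees = (Math.atan2(-c, d) * 180) / Math.PI;
  return (
    Math.abs(rotationDegrees(transform)) > toleranceDeg ||
    Math.abs(yAxisDegrees) > toleranceDeg
  );
}
```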

Free tier can't run this tool

Pro required

The redactor is gated at minTier: pro. On the Free tier the run is blocked before processing. Pro allows up to 50 MB / 500 pages; Developer raises that to 2 GB / 10,000 pages. The tool processes one PDF at a time.

Frequently asked questions

Is this real redaction — is the text actually removed?

No, and this is the most important thing to know. The tool draws a black rectangle over each matched text item with pdf-lib, but the underlying characters stay in the PDF content stream. The code comment says so directly. That means Ctrl+A → copy or any text-extraction tool can still recover the "redacted" values. Treat this as a fast visual pass, then flatten or rasterise the file (print-to-PDF as an image, or a flatten step) to delete the text for real before you share it.

What PII does it detect?

Four fixed patterns: emails, phone numbers, US Social Security Numbers in dashed NNN-NN-NNNN form, and runs of 13–16 digits (treated as card numbers). These are the exact PII_PATTERNS in the redactor. There is no IBAN, UK National Insurance, name, address, or date-of-birth detection in this PDF tool — that richer set lives in the text-based email-phone-scrubber.

Does it work on scanned PDFs?

No. Detection reads the PDF text layer via pdfjs. A scanned or photographed document is just images — zero text items, zero matches, nothing redacted. Run PDF OCR to add a text layer first, then re-run this redactor, or draw your own rectangles with signature-burner.

Why did it black out a whole sentence instead of just the email?

Redaction is per text item, not per substring. pdfjs returns text in runs, and if a match lands inside a longer run the whole run gets one box ("one redaction box per item is enough"). It over-covers neighbouring words, which is usually safer. If you need surrounding text visible, you'll have to redact that region manually instead.

Why was a long order number redacted as a card?

The card pattern matches any 13–16 digit run with optional spaces/dashes and does not run a Luhn check. Purchase-order numbers, account IDs, and tracking numbers in that length range get boxed too. It's a false positive in the safe direction — it hides rather than leaks — but eyeball the output if you needed those numbers to stay readable.

My SSN wasn't redacted — why?

The SSN pattern requires the literal dashes (NNN-NN-NNNN). A bare 9-digit string like 532194471 doesn't match, and 9 digits is below the 13-digit card threshold, so it's missed entirely. Reformat SSNs into dashed form before redacting, or pre-scrub the text with email-phone-scrubber first.

Can I choose which PII classes to redact, or add my own pattern?

No. The tool has no options (needsOptions: false). All four patterns always run, the box is always black, and you can't add a custom pattern or change the colour. If you need configurable masking with [REDACTED_*] labels, use the text-based email-phone-scrubber or csv-json-data-scrambler.

Is there a review step before I download?

No per-match review or confidence list is surfaced on this path — the tool returns the redacted PDF directly. You should open the result yourself, page through it, and verify (including a copy-paste check) before treating any document as redacted.

Does the file get uploaded anywhere?

No. The whole pipeline runs in your browser — pdfjs reads the pages, pdf-lib draws the boxes, and the result is saved locally. The PDF and its contents never leave your device, which is what makes it usable for HIPAA, GDPR, and FOIA source material.

What file size and page limits apply?

The redactor is gated at Pro. Pro allows up to 50 MB and 500 pages per PDF; Developer raises that to 2 GB and 10,000 pages. It processes one file at a time (acceptsMultiple: false). Free accounts can't run this particular tool.

How do I make the redaction permanent for a FOIA release?

Run this tool to place the boxes, then flatten or rasterise the output so the glyphs are destroyed: print-to-PDF as an image, or use a flatten tool, so each page becomes pixels with no recoverable text. Re-verify with copy-paste afterward — if nothing pastes from the boxed areas, the text is gone. Only then is the document safe to release.

What if my data is in a CSV, JSON, or plain text file instead of a PDF?

Use the text-native siblings. email-phone-scrubber replaces PII in pasted text or .txt with [REDACTED_*] labels (and supports a richer set including IBAN and UK NI), and csv-json-data-scrambler handles structured rows. Those genuinely remove/replace the values rather than covering them, since text formats have no glyph-layer problem.

Privacy first

Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.

