How to auto-redact PII in the browser vs. manual Acrobat redaction
- Step 1: Open the canonical PDF PII redactor. This Security entry routes to the real engine at /pdf-tools/pdf-pii-redactor. It is Pro-tier (minTier: pro); Free accounts can't run this redactor.
- Step 2: Upload one text-layer PDF. Drop a single file (acceptsMultiple: false). Born-digital documents have a real text layer for the auto-detector to scan. Scanned pages don't; those are exactly the case where Acrobat's manual marking (or OCR first) still wins.
- Step 3: Let auto-detection do the find. pdfjs reads each page's getTextContent() items; pdf-lib loads the same document. The four patterns (email, phone, SSN, card) run per item, and the first match flags the item. This is the step Acrobat makes you do by hand.
- Step 4: Black boxes are drawn at each match. A filled rectangle is drawn at each matched item's x/y, spanning its width and height (plus 2 pt). This is a genuine filled box, unlike the recoverable 'black highlight' a careless reader applies.
- Step 5: Download and review the boxed PDF. The result is saved as a new PDF blob with layout preserved. Page through it and confirm the matches; there is no per-match approval UI on this path, so the review is on you.
- Step 6: Flatten / rasterise to match Acrobat's permanence. This is the step that closes the gap. Run Ctrl+A → copy; if text still pastes, flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) so the boxes become pixels. At that point the text is gone, like Acrobat's true redaction.
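Steps 3 and 4 can be pictured with a minimal sketch. It is not the tool's source: the `TextItem` and `Box` shapes, the `boxesForPage` name, and the choice to add the 2 pt to both dimensions are assumptions made for illustration. The only behaviour taken from the steps above is that patterns run per text item, the first match flags the item, and the box covers the whole item.

```typescript
// Hypothetical shapes: the real tool's types are not shown in this guide.
interface TextItem {
  str: string;
  x: number; // left edge, PDF points
  y: number; // bottom edge, PDF points
  width: number;
  height: number;
}

interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumption: the "plus 2 pt" is added to both width and height.
const PADDING_PT = 2;

// Returns one box per text item that matches at least one pattern.
// The whole item is boxed, not just the matching substring.
function boxesForPage(items: TextItem[], patterns: RegExp[]): Box[] {
  const boxes: Box[] = [];
  for (const item of items) {
    // String.search ignores the "g" flag and lastIndex, so it is stateless.
    const flagged = patterns.some((p) => item.str.search(p) >= 0);
    if (flagged) {
      boxes.push({
        x: item.x,
        y: item.y,
        width: item.width + PADDING_PT,
        height: item.height + PADDING_PT,
      });
    }
  }
  return boxes;
}
```

In a real pipeline each returned box would be handed to a drawing call (pdf-lib exposes `page.drawRectangle` for this); the sketch stops at the geometry so it stays dependency-free.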
This browser tool vs. Acrobat's Redact vs. a fake black highlight
Three things people call 'redaction'. The key column is permanence: only true content-stream removal (Acrobat, or this tool after a flatten) actually deletes the text.
| Capability | This tool (auto + flatten) | Acrobat Redact | Black highlight in a reader |
|---|---|---|---|
| Finds PII automatically | Yes — 4 fixed patterns | No — you mark each item | No |
| Text removed from stream | Only after you flatten/rasterise | Yes, immediately | No — fully recoverable |
| Cost | Pro tier on JAD (not available on Free) | Paid licence | Free |
| Uploads your file | No — runs in your browser | No (desktop app) | No |
| Works on scanned PDFs | No (needs text layer) | Manual marking works | Manual works |
What the auto-detector catches (the four fixed patterns)
The exact PII_PATTERNS this tool runs, in order. No toggles, no custom patterns — this is the trade for 'automatic'. Acrobat lets you search any pattern; this tool gives you these four with zero setup.
| PII class | What it matches | Validation | Gotcha |
|---|---|---|---|
| Email | local@domain.tld, 2+ letter TLD | Regex shape only | No DNS check; split-across-items emails missed |
| Phone | Optional country/area code, loose digit grouping | Regex shape only | Loose — can catch phone-shaped IDs |
| US SSN | NNN-NN-NNNN with literal dashes | Format only | Undashed 9-digit SSN is missed |
| Card run | 13–16 digit run, optional spaces/dashes | No Luhn check | Long PO/account numbers boxed too |
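To make the gotchas in the table concrete, here is a sketch with four stand-in regexes. They are illustrative approximations written for this example, not the tool's actual PII_PATTERNS, and the `SKETCH_PATTERNS` and `firstMatch` names are hypothetical. The order (email, phone, SSN, card) follows the table.

```typescript
// Illustrative stand-ins only: NOT the tool's real PII_PATTERNS.
const SKETCH_PATTERNS: { name: string; re: RegExp }[] = [
  // local@domain.tld with a 2+ letter TLD; shape only, no DNS check
  { name: "email", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  // optional country code, then 3-3-4 digits with loose separators
  { name: "phone", re: /(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}/ },
  // dashed form only: NNN-NN-NNNN
  { name: "ssn", re: /\b\d{3}-\d{2}-\d{4}\b/ },
  // 13 to 16 digits with optional single spaces or dashes; no Luhn check
  { name: "card", re: /\b(?:\d[ -]?){12,15}\d\b/ },
];

// Returns the name of the first pattern that matches, or null if none do.
function firstMatch(text: string): string | null {
  for (const { name, re } of SKETCH_PATTERNS) {
    if (re.test(text)) return name; // no "g" flag, so test() is stateless
  }
  return null;
}
```

With these stand-ins, `firstMatch("123-45-6789")` reports `"ssn"` while `firstMatch("123456789")` returns `null`, and a 16-digit PO number is reported as `"card"`, mirroring the last two rows of the table.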
Tier and file limits (PDF family)
Gated at Pro (minTier: pro); runs through the PDF tool family. One file at a time — there is no multi-file batch on this path.
| Tier | Max file size | Max pages | Files per run |
|---|---|---|---|
| Free | Tool gated — Pro required | — | — |
| Pro | 50 MB | 500 pages | 5 (this tool: 1 at a time) |
| Pro-media | 500 MB | 2,000 pages | 50 (this tool: 1 at a time) |
| Developer | 2 GB | 10,000 pages | Unlimited (this tool: 1 at a time) |
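The limits table can be read as a simple pre-flight check. The sketch below is a hypothetical restatement of the table, not the tool's gating code: the tier keys other than `pro`, the `whyBlocked` name, and the use of binary megabytes are all assumptions.

```typescript
// Tier keys other than "pro" are hypothetical labels for the table rows.
type PaidTier = "pro" | "pro-media" | "developer";
type Tier = "free" | PaidTier;

interface Limits {
  maxBytes: number;
  maxPages: number;
}

const MB = 1024 * 1024; // assumption: the table's MB/GB are binary units

const LIMITS: Record<PaidTier, Limits> = {
  "pro": { maxBytes: 50 * MB, maxPages: 500 },
  "pro-media": { maxBytes: 500 * MB, maxPages: 2000 },
  "developer": { maxBytes: 2048 * MB, maxPages: 10000 },
};

// Returns null when the run is allowed, otherwise a short reason.
function whyBlocked(tier: Tier, sizeBytes: number, pages: number): string | null {
  if (tier === "free") return "Pro required";
  const limits = LIMITS[tier];
  if (sizeBytes > limits.maxBytes) return "file too large";
  if (pages > limits.maxPages) return "too many pages";
  return null;
}
```

A 60 MB file would be rejected on `pro` for size but accepted on `pro-media`, assuming its page count is within that tier's limit.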
Cookbook
Side-by-side outcomes that show where the auto-pass wins, where Acrobat still wins, and how the flatten step closes the gap. Values are fabricated. "Before" is the page text; "After" is the boxed result; the note explains the permanence state.
The auto-pass beats manual marking on a long form
A 40-page born-digital form full of emails and phones. The auto-detector boxes them all in one pass; in Acrobat you'd mark each by hand. This is the tool's real advantage — speed of find.
    Input: vendor_intake_40pp.pdf (text layer)
    Auto-pass result: every email + phone item boxed across 40 pages
    Manual Acrobat: mark each match by hand, page by page
    Then: flatten the output so the boxes become permanent.
    Verify with Ctrl+A -> copy (nothing should paste).
Boxes are recoverable until you flatten
Right after running the tool, the document looks redacted but isn't — the same trap as a fake black highlight. The flatten step is what makes it real.
    After auto-pass (NOT yet flattened):
    Contact: ███████████████████
    Ctrl+A -> copy still yields: jane.doe@acme.com

    After flatten/rasterise:
    Contact: ███████████████████
    Ctrl+A -> copy yields: (nothing)

    Only the second state matches Acrobat's true redaction.
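The copy-paste check can also be scripted on the reviewer's side. This is a minimal sketch, assuming you have already pulled the text out of the output PDF by some means (pasted from Ctrl+A, or extracted with a library). The `residualMatches` helper and the email regex in the usage note are hypothetical and not part of the tool.

```typescript
// Re-runs the given patterns over text extracted from the OUTPUT pdf and
// returns every value that is still recoverable. An empty array means the
// patterns found nothing left to leak in that text.
function residualMatches(extractedText: string, patterns: RegExp[]): string[] {
  const hits: string[] = [];
  for (const p of patterns) {
    // Rebuild with the global flag so exec() walks through every match.
    const re = new RegExp(p.source, "g" + (p.ignoreCase ? "i" : ""));
    let m: RegExpExecArray | null;
    while ((m = re.exec(extractedText)) !== null) {
      hits.push(m[0]);
      if (m[0].length === 0) re.lastIndex++; // guard against zero-length matches
    }
  }
  return hits;
}
```

Run against the pre-flatten text with an email pattern such as `/[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}/`, this would return the address that still pastes; run against the flattened output it should return an empty array. It only checks the patterns you pass in, so it complements a manual page-through rather than replacing it.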
Where Acrobat still wins: scanned pages
A scanned packet has no text layer, so the auto-detector finds nothing. Acrobat's manual marking (or OCR first here) handles it. Know this boundary before you choose the tool.
    Input: scanned_packet.pdf (image-only)
    Auto-pass: 0 text items -> 0 matches -> 0 boxes (unchanged)
    Fix here: OCR via /pdf-tools/pdf-ocr, then re-run
    Or: box regions manually with /security-tools/signature-burner
Over-redaction the auto-pass introduces
No Luhn check means a 16-digit PO is boxed like a card. Acrobat's manual marking wouldn't touch it unless you told it to. A reviewer must eyeball the auto output.
    Before:
    PO Number: 4002 8812 3456 7890
    Card: 5555 4444 3333 1111

    After (auto-pass):
    PO Number: █████████████████████ <- false positive
    Card: ███████████████████

    Safe direction (hides, doesn't leak) but review before release.
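The tool itself runs no Luhn check, but a reviewer triaging boxed numbers could apply one by hand or with a small helper. The sketch below is a standard Luhn checksum; the `luhnValid` name is hypothetical and nothing here is part of the tool.

```typescript
// Standard Luhn checksum over a 13 to 16 digit run.
// Spaces and dashes are stripped first; anything else fails the check.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, "");
  if (!/^\d{13,16}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

For the fabricated values above, `luhnValid("5555 4444 3333 1111")` is `true` and `luhnValid("4002 8812 3456 7890")` is `false`, which is the signal that the PO number was a false positive. A passing checksum does not prove a number is a card (about one in ten arbitrary digit runs pass), so treat it as a triage hint only.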
Migration: PDF auto-pass + text scrub for the rest
A real migration off Acrobat usually isn't only PDFs. Use this tool for the PDFs and the text-native siblings for exports, so each format is handled by the right engine.
    PDFs: this tool -> flatten/rasterise to finalise
    Pasted text: /security-tools/email-phone-scrubber ([REDACTED_*])
    CSV / JSON: /security-tools/csv-json-data-scrambler

    Text siblings genuinely replace values (no glyph-layer problem),
    so no flatten step is needed for those.
Edge cases and what actually happens
Output looks redacted but text is recoverable
By design (visual only): Straight after the auto-pass the boxes are an overlay; the code comment notes "the glyphs underneath are still in the file's content stream." Ctrl+A → copy recovers the values, the same failure as a fake black highlight. Flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) to match Acrobat's permanence, then re-verify with copy-paste.
No per-match approval UI
Manual review needed: Unlike Acrobat's redaction review pane, this path returns the boxed PDF directly with no confidence list or per-match confirm step. You must page through the output and verify it yourself before treating any document as redacted.
Only four patterns — no custom search
Scope limit: Acrobat lets you search arbitrary patterns or words; this tool runs four fixed PII_PATTERNS only (email, phone, dashed SSN, 13–16 digit runs) with no toggles (needsOptions: false). Names, addresses, DOBs, and custom terms aren't detected; box those manually with signature-burner.
Over-redaction from the no-Luhn card rule
Over-redaction: The card pattern matches any 13–16 digit run with no Luhn check, so POs and account IDs get boxed too. It hides rather than leaks, but it's noise Acrobat's manual marking wouldn't introduce. Review the auto output.
Scanned / image-only PDF produces no redactions
No matches: Detection needs a text layer via pdfjs. A scanned PDF has zero text items, so the auto-pass does nothing; this is a case where manual marking or OCR is required. Add a text layer with PDF OCR first, then re-run, or box regions with signature-burner.
Undashed SSN is missed
Missed match: The SSN pattern requires the dashed form NNN-NN-NNNN. A bare 9-digit string isn't matched and is below the 13-digit card floor, so it slips through. Reformat to dashed form, or pre-scrub the text with email-phone-scrubber.
Match split across two text items
Missed match: Regexes run per item. If an email or phone is split across two runs by the PDF's text engine, neither fragment matches. Spot-check important pages; flatten + re-OCR can re-flow text into single items.
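The split-item miss is easy to reproduce in isolation. In this sketch the email regex is an illustrative stand-in and both helper names are hypothetical; the joined check is a reviewer-side spot check, not something the tool does.

```typescript
// Illustrative email shape; not the tool's actual pattern.
const EMAIL_SKETCH = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Mirrors the per-item behaviour described above: each item is tested alone.
function matchesPerItem(items: string[], re: RegExp): boolean {
  return items.some((s) => re.test(s));
}

// Reviewer-side check: test the items concatenated in reading order.
function matchesJoined(items: string[], re: RegExp): boolean {
  return re.test(items.join(""));
}

// A PDF text engine may emit one visible email as two separate items.
const splitItems = ["Contact: jane.doe@", "acme.com"];
const perItem = matchesPerItem(splitItems, EMAIL_SKETCH); // false: neither fragment is a full email
const joined = matchesJoined(splitItems, EMAIL_SKETCH);   // true: the joined text is
```

When the two results disagree for a page, that page holds a value the per-item pass would leave unboxed and is worth checking by eye.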
No multi-file batch
One file at a time: This path processes a single PDF per run (acceptsMultiple: false). If you came looking to 'bulk redact a folder', you run each file in turn; there's no queue. The speed advantage is in the auto-detection within each file, not cross-file batching.
Free tier can't run this tool
Pro required: Gated at minTier: pro. On Free the run is blocked before processing. Pro allows up to 50 MB / 500 pages; Developer raises that to 2 GB / 10,000 pages.
Frequently asked questions
Is this a true replacement for Acrobat's redaction?
For the find, yes — it auto-detects four PII classes Acrobat makes you mark by hand. For permanence, only after a flatten: out of the box the boxes are visual (text stays in the content stream, Ctrl+A → copy recovers it). Run this to find and cover fast, then flatten (/pdf-tools/pdf-flatten) or rasterise to match Acrobat's content-stream removal.
How is this different from a black highlight in a PDF reader?
Two ways. First, this tool finds the PII automatically; a highlight is fully manual. Second, it draws a real filled rectangle (not a recoverable highlight annotation). But both leave the text in the stream until you flatten — so the permanence gap is the same until that final step.
Does my file get uploaded, like some online redactors?
No. pdfjs reads the pages, pdf-lib draws the boxes, and the result is saved locally — entirely in your browser. The PDF and its contents never leave your device, which is a key reason to use it over upload-based web redactors.
What does the auto-detector actually find?
Four fixed patterns: emails, phone numbers, US SSNs in dashed NNN-NN-NNNN form, and runs of 13–16 digits treated as cards. That's the whole set — no names, addresses, DOBs, or custom search terms. Acrobat is more flexible on what you can search; this tool trades that for zero-setup automatic detection of these four.
Can I bulk-redact a whole folder?
Not in one run — it processes a single PDF at a time (acceptsMultiple: false). The 'bulk' advantage is within each file (every match across every page in one pass), not across files. Run each PDF in turn.
Is there a review step before I finalise?
No per-match approval UI is surfaced on this path — the tool returns the boxed PDF directly, unlike Acrobat's review pane. Page through the output and verify it yourself (including a copy-paste check) before treating it as redacted.
Why did it black out a number that wasn't a card?
The card pattern matches any 13–16 digit run with no Luhn check, so purchase-order and account numbers in that range get boxed too. It's over-redaction in the safe direction (hides, doesn't leak) — noise Acrobat's manual marking wouldn't add. Review the output.
Does it work on scanned documents?
No. The auto-detector needs a text layer via pdfjs; a scanned PDF has none, so nothing is boxed. This is exactly where Acrobat's manual marking or an OCR step still wins. Run PDF OCR to add a text layer, then re-run, or box regions with signature-burner.
How do I make the redaction permanent?
Flatten (/pdf-tools/pdf-flatten) or rasterise (/pdf-tools/pdf-to-image-strip) the output so each page becomes pixels and the boxed text is destroyed. Re-verify with Ctrl+A → copy — if nothing pastes from the boxed areas, you've matched Acrobat's true redaction.
Can I change the box colour or add a custom pattern?
No. The tool has no options (needsOptions: false); the four patterns always run and the box is always black. For configurable, label-based masking, use the text-native email-phone-scrubber or csv-json-data-scrambler instead.
What file size and page limits apply?
Gated at Pro. Pro allows up to 50 MB and 500 pages per PDF; Pro-media 500 MB / 2,000 pages; Developer 2 GB / 10,000 pages. Free accounts can't run this redactor. One file at a time.
I'm migrating off Acrobat for more than PDFs — what handles the rest?
Use the text-native siblings: email-phone-scrubber for pasted text or .txt (replaces PII with [REDACTED_*] labels, richer pattern set), and csv-json-data-scrambler for structured rows. Those genuinely replace values, so no flatten step is needed for them.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.