How to black-box SSN and phone on patient record PDFs
- Step 1: Open the canonical PDF PII redactor. This Security entry routes to the real engine at /pdf-tools/pdf-pii-redactor. It is Pro-tier (minTier: pro); Free accounts can't run this redactor.
- Step 2: Upload one record PDF with a text layer. Drop a single file (acceptsMultiple: false). EHR exports, lab reports, and discharge summaries are usually born-digital with a real text layer. A scanned paper chart is image-only, so the redactor finds nothing in it; OCR it first.
- Step 3: Let the scanner walk every page. pdfjs reads each page's getTextContent() items; pdf-lib loads the same document. For each item the four patterns run in order (email, phone, SSN, card), and the first match flags the item.
- Step 4: SSN and phone get boxed. When a dashed SSN or a phone matches, a black rectangle is drawn at the item's x/y spanning its full width and height (plus 2 pt). The patient name, MRN, and DOB are NOT detected, so handle those separately.
- Step 5: Download the partially de-identified record. The result is saved as a new PDF blob. Page count and clinical layout are preserved; only matched identifier items are boxed.
- Step 6: Box the rest, then flatten before release. Critical: box names, MRNs, DOBs, and addresses manually with signature-burner, then flatten (/pdf-tools/pdf-flatten) or rasterise so all boxes become pixels. Verify with Ctrl+A → copy before the record leaves your hands.
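The per-item scan in Step 3 can be sketched as below. This is a minimal illustration, not the tool's code: the four regexes are hypothetical stand-ins (the real PII_PATTERNS are not reproduced on this page), and `firstMatch` is a name invented for the example. Only the order (email, phone, SSN, card) and the first-match-wins rule come from the steps above.

```typescript
type PatternName = "email" | "phone" | "ssn" | "card";

interface PiiPattern {
  name: PatternName;
  re: RegExp;
}

// ASSUMPTION: illustrative regexes only. The tool's real PII_PATTERNS are not
// shown in this guide and may be looser or stricter than these.
const PII_PATTERNS: PiiPattern[] = [
  { name: "email", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { name: "phone", re: /(?:^|[^\d])(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)/ },
  { name: "ssn", re: /\b\d{3}-\d{2}-\d{4}\b/ },
  { name: "card", re: /(?:^|[^\d])\d{13,16}(?!\d)/ },
];

// Run the patterns in order and return the name of the first one that
// matches the text item, or null when nothing matches.
function firstMatch(itemText: string): PatternName | null {
  for (const pattern of PII_PATTERNS) {
    if (pattern.re.test(itemText)) {
      return pattern.name;
    }
  }
  return null;
}

// Fabricated text items, one string per pdfjs item.
const sampleItems: string[] = [
  "Patient: Maria Alvarez",
  "MRN: 00481922",
  "SSN: 532-19-4471",
  "Phone: (312) 555-0148",
  "DOB: 1984-03-09",
  "Member ID: 532194471",
  "MRN (legacy): 4002881234567",
];

for (const text of sampleItems) {
  const hit = firstMatch(text);
  console.log(text + " -> " + (hit === null ? "not boxed" : "boxed (" + hit + ")"));
}
```

With these stand-in regexes, only the dashed SSN, the phone, and the 13-digit legacy MRN are flagged. The name, the 8-digit MRN, the DOB, and the bare 9-digit ID pass through untouched, which is why Step 6 is not optional.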
PHI fields: what this tool catches vs. what you must do by hand
HIPAA Safe Harbor lists 18 identifier types. This tool detects only four PII_PATTERNS; on a medical record the SSN (dashed) and phone are the two that match. Everything else is your responsibility.
| PHI identifier | Detected here? | How to handle it |
|---|---|---|
| SSN in dashed form NNN-NN-NNNN | Yes (SSN pattern) | Auto-boxed; verify and flatten |
| Phone / fax number | Yes (phone pattern) | Auto-boxed; loose pattern may also catch other digit groups |
| Email address | Yes (email pattern) | Auto-boxed if present |
| SSN without dashes (9 digits) | No | Reformat to dashed, or box manually |
| Patient name | No (no name detection) | Box manually with signature-burner |
| Medical record number (MRN) | No | Box manually; only an MRN that happens to be 13–16 digits long incidentally matches the card pattern |
| Date of birth | No (no date pattern) | Box manually |
| Address | No (no address pattern) | Box manually |
Redaction behaviour — what it does vs. does NOT do
"Visual" means a filled rectangle is drawn over the text; the characters underneath are not removed. This is the table to read before any record is released.
| Behaviour | Reality in this tool | Why it matters for PHI |
|---|---|---|
| Redaction method | Filled black drawRectangle over the matched item (pdf-lib) | Real ink, visible everywhere — but an overlay, not a deletion |
| Text removal | Not removed — glyphs stay in the content stream | Ctrl+A → copy recovers the SSN/phone until you flatten — a HIPAA breach risk if you skip it |
| SSN form | Requires dashes NNN-NN-NNNN | A bare 9-digit SSN is missed entirely — reformat first |
| Name / MRN / DOB / address | Not detected | These remain visible — box them by hand before release |
| Granularity | Whole text item, not the exact value | An identifier inside a longer run boxes the whole run |
| Options | None (needsOptions: false) | No way to add an MRN or DOB pattern here |
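The overlay behaviour in the table can be modelled with a toy page that keeps text items and boxes in separate lists. Everything here is an illustrative assumption: `PageModel`, `boxFor`, `redactVisually`, and `extractText` are invented names, and applying the extra 2 pt to the box height is a guess about what "plus 2 pt" means. The sketch only shows why drawing a box leaves the text extractable.

```typescript
interface TextItem { str: string; x: number; y: number; width: number; height: number }
interface Box { x: number; y: number; width: number; height: number }
interface PageModel { textItems: TextItem[]; boxes: Box[] }

// ASSUMPTION: the "plus 2 pt" is modelled as extra box height.
const PAD_PT = 2;

// One axis-aligned box per matched item, covering the whole item.
function boxFor(item: TextItem): Box {
  return { x: item.x, y: item.y, width: item.width, height: item.height + PAD_PT };
}

// Visual redaction: boxes are added, text items are left exactly as they were.
function redactVisually(page: PageModel, isMatch: (text: string) => boolean): PageModel {
  const newBoxes = page.textItems.filter((item) => isMatch(item.str)).map(boxFor);
  return { textItems: page.textItems, boxes: page.boxes.concat(newBoxes) };
}

// Stand-in for Ctrl+A → copy: reads the text layer and ignores the boxes.
function extractText(page: PageModel): string {
  return page.textItems.map((item) => item.str).join("\n");
}

const page: PageModel = {
  textItems: [
    { str: "Patient: Maria Alvarez", x: 72, y: 720, width: 130, height: 12 },
    { str: "SSN: 532-19-4471", x: 72, y: 700, width: 96, height: 12 },
  ],
  boxes: [],
};

// Hypothetical matcher: dashed SSN only, to keep the example short.
const redacted = redactVisually(page, (text) => /\d{3}-\d{2}-\d{4}/.test(text));

console.log(redacted.boxes.length);                                  // one box drawn
console.log(extractText(redacted).indexOf("532-19-4471") !== -1);    // text still extractable
```

In this model the SSN survives extraction because nothing ever touches `textItems`. Flattening or rasterising corresponds to discarding the text list entirely and keeping only pixels.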
Tier and file limits (PDF family)
Gated at Pro (minTier: pro); runs through the PDF tool family. One file at a time.
| Tier | Max file size | Max pages | Files per run |
|---|---|---|---|
| Free | Tool gated — Pro required | — | — |
| Pro | 50 MB | 500 pages | 5 (this tool: 1 at a time) |
| Pro-media | 500 MB | 2,000 pages | 50 (this tool: 1 at a time) |
| Developer | 2 GB | 10,000 pages | Unlimited (this tool: 1 at a time) |
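The limits table can be read as a simple pre-flight check. The sketch below is hypothetical: `checkRun` and `limitsFor` are invented names, and treating MB/GB as decimal units is an assumption, since the page does not say whether the limits are decimal or binary.

```typescript
type Tier = "free" | "pro" | "pro-media" | "developer";

interface TierLimits { maxBytes: number; maxPages: number }

// ASSUMPTION: decimal units (1 MB = 1,000,000 bytes).
const MB = 1000 * 1000;
const GB = 1000 * MB;

// Limits copied from the table; null means the tool is gated for that tier.
function limitsFor(tier: Tier): TierLimits | null {
  if (tier === "pro") return { maxBytes: 50 * MB, maxPages: 500 };
  if (tier === "pro-media") return { maxBytes: 500 * MB, maxPages: 2000 };
  if (tier === "developer") return { maxBytes: 2 * GB, maxPages: 10000 };
  return null; // free
}

// Returns null when the run is allowed, otherwise a short reason.
function checkRun(tier: Tier, fileBytes: number, pageCount: number, fileCount: number): string | null {
  const limits = limitsFor(tier);
  if (limits === null) return "Pro required";
  if (fileCount !== 1) return "this tool takes one file at a time";
  if (fileBytes > limits.maxBytes) return "file too large for tier";
  if (pageCount > limits.maxPages) return "too many pages for tier";
  return null;
}

console.log(checkRun("free", 1 * MB, 10, 1));
console.log(checkRun("pro", 60 * MB, 100, 1));
console.log(checkRun("developer", 60 * MB, 600, 1));
```

The single-file rule is checked independently of the tier's "files per run" column, because this redactor accepts one record at a time on every tier.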
Cookbook
Before/after snippets from born-digital record layouts. All patient data is fabricated. "Before" is the page text; "After" shows the boxed result — and what copy/paste still recovers until you flatten. Remember names/MRNs/DOBs are NOT auto-detected.
A patient header with SSN and phone
EHR export with a full text layer. The dashed SSN and the phone both match and get boxed. The patient name and MRN are not detected, so they stay visible until you box them by hand.
Before (header text):
    Patient: Maria Alvarez
    MRN: 00481922
    SSN: 532-19-4471
    Phone: (312) 555-0148
    DOB: 1984-03-09
After (what the recipient sees):
    Patient: Maria Alvarez
    MRN: 00481922
    SSN: ███████████
    Phone: ███████████████
    DOB: 1984-03-09
Name, MRN, and DOB are STILL visible; box them manually with /security-tools/signature-burner before release.
An undashed SSN slips through
The SSN pattern requires the literal dashes. A 9-digit string with no separators isn't matched as an SSN, and 9 digits is below the 13-digit card floor — so it's missed entirely. Reformat before redacting.
Before:
    Member ID: 532194471    <- NOT boxed (no dashes, < 13 digits)
    SSN: 532-19-4471        <- boxed (dashed form matched)
After:
    Member ID: 532194471    <- still visible
    SSN: ███████████
Mitigation: reformat IDs to dashed SSN form first, or pre-scrub the text with /security-tools/email-phone-scrubber.
Phone inside a longer line is fully covered
Redaction is per text item. If the phone is inside a longer run, the whole run is boxed — which here also hides the clinic name in that item.
Before (single pdfjs item):
    'Call the referral coordinator at (312) 555-0148 (Westside Clinic)'
After:
    '████████████████████████████████████████████████████████████████'
The whole item is boxed because it contained a phone match, so the clinic name got covered too.
MRN that happens to be 14 digits gets boxed
An MRN is not a recognised identifier here, but if it happens to be 13–16 digits it matches the card pattern and is boxed. Shorter MRNs are missed. Don't rely on this for MRNs — box them deliberately.
Before:
    MRN: 00481922                  <- 8 digits, NOT boxed
    MRN (legacy): 4002881234567    <- 13 digits, boxed as 'card'
After:
    MRN: 00481922                  <- still visible
    MRN (legacy): █████████████
Incidental: treat MRN redaction as a manual step, not automatic.
Scanned paper chart with no text layer
A chart scanned to PDF is just images — no text items. The redactor finds nothing and returns the file unchanged. OCR it first, or box identifiers by hand.
Input: scanned_chart.pdf (image-only)
Scan result: 0 text items -> 0 matches -> 0 boxes
Output: identical pages, no redactions.
Fix path:
    1. Run OCR via /pdf-tools/pdf-ocr to add a text layer
    2. Re-run this redactor, OR
    3. Burn manual rectangles with /security-tools/signature-burner
Edge cases and what actually happens
Boxed SSN/phone is still copy-pasteable
By design (visual only): The tool draws a rectangle over the text but does not delete it; the code comment says "the glyphs underneath are still in the file's content stream." Ctrl+A → copy recovers the SSN or phone, which would be a HIPAA breach in a released record. Flatten (/pdf-tools/pdf-flatten) or rasterise the output so the boxes become pixels, then re-verify with copy-paste before release.
Names, MRNs, DOBs, addresses are not redacted
Out of scope: Only email, phone, SSN, and 13–16 digit runs are detected; there is no name, MRN, DOB, or address pattern. These PHI fields stay visible. Box them manually with signature-burner before treating a record as de-identified. This tool is a partial pass, not HIPAA Safe Harbor.
Undashed (9-digit) SSN is missed
Missed match: The SSN regex requires the dashed form NNN-NN-NNNN. A bare 532194471 isn't matched, and at 9 digits it's below the 13-digit card floor, so it slips through entirely. Reformat to dashed form before redacting, or pre-scrub the text with email-phone-scrubber.
Whole line is boxed, not just the identifier
Expected: Redaction is one box per matched text item. If a phone or SSN sits inside a longer run (Call the coordinator at (312) 555-0148), the whole run is covered ("one redaction box per item is enough"). It can hide a clinic name or note you wanted visible, so review the output.
MRN incidentally boxed (or missed) by the card pattern
Unreliable for MRNs: An MRN is not a recognised identifier here. If it happens to be 13–16 digits it matches the card pattern and is boxed; shorter MRNs are missed. Don't rely on this for MRNs; box them deliberately with signature-burner.
Identifier split across two text items
Missed match: Regexes run per item. If the export split an SSN or phone across two runs, neither fragment matches and nothing is boxed. This happens with justified text and some EHR export pipelines. Spot-check; flatten + re-OCR can re-flow text into single items.
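A small sketch of the split-item failure, assuming a hypothetical dashed-SSN regex (the tool's actual pattern is not shown here). `hitsPerItem` mirrors the per-item behaviour described above; `hitsWhenJoined` is only a spot-check idea for reviewing an export and is not something the redactor does.

```typescript
// ASSUMPTION: illustrative dashed-SSN regex, not the tool's own.
const SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/;

// Per-item matching: each text run is tested in isolation.
function hitsPerItem(items: string[]): string[] {
  return items.filter((item) => SSN_RE.test(item));
}

// Spot-check helper: test the runs again after gluing them together.
function hitsWhenJoined(items: string[]): boolean {
  return SSN_RE.test(items.join(""));
}

// Fabricated export where the SSN was split across two runs.
const splitRuns: string[] = ["SSN: 532-19", "-4471"];

console.log(hitsPerItem(splitRuns).length);  // no single run contains a full SSN
console.log(hitsWhenJoined(splitRuns));      // the joined text does
```

When the per-item pass finds nothing but the joined text matches, the identifier was split by the export and needs a manual box.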
Scanned chart produces no redactions
No matches: Detection reads the text layer via pdfjs. An image-only chart has zero text items and nothing is boxed. Add a text layer with PDF OCR first, then re-run, or box identifiers manually with signature-burner.
Encrypted record PDF
Loaded with ignoreEncryption: The redactor loads with ignoreEncryption: true, so many lightly-protected records open and process. Strongly encrypted files the libraries can't parse error out before scanning. Remove the password first with pdf-remove-password, then redact.
Free tier can't run this tool
Pro required: Gated at minTier: pro. On Free the run is blocked before processing. Pro allows up to 50 MB / 500 pages; Developer raises that to 2 GB / 10,000 pages. One file at a time.
Box sits slightly off rotated text
Visual mismatch possible: The rectangle is axis-aligned at the item's x/y with its width/height. On a rotated page or unusual transform the box can land slightly off the glyphs. Always inspect the rendered output before treating a record as redacted.
Frequently asked questions
Is this a full HIPAA de-identification?
No. It detects only four patterns (email, phone, dashed SSN, 13–16 digit runs). HIPAA Safe Harbor lists 18 identifier types — names, MRNs, dates of birth, addresses, and more are NOT detected here. Treat this as a partial pass that handles SSN and phone, then box the remaining identifiers manually with signature-burner.
Does the SSN actually get removed from the file?
No — it gets covered. The tool draws a black box over each matched item with pdf-lib, but the characters stay in the content stream (the code comment says so). Ctrl+A → copy recovers the SSN, which would be a breach in a released record. Flatten (/pdf-tools/pdf-flatten) or rasterise the output so the boxes become pixels, then verify with copy-paste.
Why wasn't my patient's SSN redacted?
The SSN pattern requires the dashed form NNN-NN-NNNN. A bare 9-digit string like 532194471 isn't matched, and 9 digits is below the 13-digit card floor, so it's missed entirely. Reformat SSNs to dashed form before redacting, or pre-scrub the text with email-phone-scrubber.
Will it redact the patient's name?
No. There is no name detection in this tool. The patient name stays visible. Box it manually with signature-burner after running this pass — names are a required Safe Harbor identifier.
What about the MRN or date of birth?
Neither is reliably handled. There's no MRN or DOB pattern. An MRN that happens to be 13–16 digits may incidentally match the card pattern and get boxed; shorter MRNs are missed, and DOBs aren't detected at all. Box MRNs and DOBs deliberately with signature-burner.
Does PHI get uploaded to a server?
No. pdfjs reads the pages, pdf-lib draws the boxes, and the result is saved locally — entirely in your browser. PHI and the record never leave your device, which is what makes it usable for HIPAA-sensitive material.
Why did it black out a whole sentence with the phone in it?
Redaction is per text item, not per substring. If the phone is inside a longer run, the whole run gets one box ("one redaction box per item is enough"). It can cover a clinic name or note in that item — check the output if you needed surrounding text visible.
Does it work on a scanned paper chart?
No. Detection reads the PDF text layer via pdfjs; a scanned chart is just images with zero text items, so nothing is boxed. Run PDF OCR to add a text layer first, then re-run, or box identifiers by hand with signature-burner.
Can I add a pattern for our MRN format?
No. The tool has no options (needsOptions: false) and the four patterns are fixed in code. For configurable, label-based redaction of text or structured data, use email-phone-scrubber (pasted text/.txt) or csv-json-data-scrambler (rows).
How do I make the record safe to release?
Run this tool for SSN/phone, box the remaining identifiers (name, MRN, DOB, address) with signature-burner, then flatten (/pdf-tools/pdf-flatten) or rasterise so every box becomes pixels. Re-verify with Ctrl+A → copy — if nothing pastes from the boxed areas, the text is gone. Only then release the record.
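The final copy-paste check can be made less error-prone with a small helper. This is a hypothetical sketch: `leftoverIdentifiers` is an invented name, and it assumes you paste the Ctrl+A → copy text from the final PDF in by hand and list the identifiers you intended to remove. Whitespace is stripped and case is ignored on both sides because copied text often differs in spacing.

```typescript
// Strip all whitespace and lowercase, so spacing differences in copied text
// do not hide a surviving identifier.
function normalize(text: string): string {
  return text.replace(/\s+/g, "").toLowerCase();
}

// Returns every identifier that can still be found in the extracted text.
function leftoverIdentifiers(extractedText: string, identifiers: string[]): string[] {
  const haystack = normalize(extractedText);
  return identifiers.filter((id) => {
    const needle = normalize(id);
    return needle.length > 0 && haystack.indexOf(needle) !== -1;
  });
}

// Fabricated identifiers from the record being released.
const mustBeGone: string[] = [
  "Maria Alvarez",
  "00481922",
  "532-19-4471",
  "(312) 555-0148",
  "1984-03-09",
];

// Text copied from a boxed-but-not-flattened PDF: the text layer is intact.
const overlayOnlyText =
  "Patient: Maria Alvarez MRN: 00481922 SSN: 532-19-4471 Phone: (312) 555-0148 DOB: 1984-03-09";

// Text copied from a rasterised PDF: there is no text layer left to copy.
const rasterisedText = "";

console.log(leftoverIdentifiers(overlayOnlyText, mustBeGone));
console.log(leftoverIdentifiers(rasterisedText, mustBeGone));
```

An empty result is necessary but not sufficient: it confirms the listed strings are gone from the text layer, but you still need to inspect the rendered pages to confirm every box covers what it should.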
What file size and page limits apply?
Gated at Pro. Pro allows up to 50 MB and 500 pages; Pro-media 500 MB / 2,000 pages; Developer 2 GB / 10,000 pages. Free accounts can't run this redactor. One record at a time.
I need to de-identify a whole patient dataset, not one PDF — what then?
Use csv-json-data-scrambler for structured patient rows — it genuinely replaces values with [REDACTED_*] labels rather than covering them. For pasted clinical notes or .txt, email-phone-scrubber does the same with a richer pattern set (including IBAN and UK NI).
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.