How to redact Social Security numbers from a PDF
- Step 1: Confirm a text layer exists. Detection reads text via pdf.js. If you can select an SSN in your reader, you are set. If the form is a scan, run PDF OCR first, then verify the result, since OCR misreads digits.
- Step 2: Normalise un-hyphenated SSNs if present. The SSN rule matches only nnn-nn-nnnn. If your documents store SSNs as nine bare digits, the SSN pattern will not catch them. There is no setting to change this, so plan a manual check for compact SSNs.
- Step 3: Open the redactor and drop the file. Load the document into the PDF PII Redactor. It is browser-only: nothing is uploaded, and there is no options panel.
- Step 4: Let it auto-run. On drop, all four patterns fire. SSNs in nnn-nn-nnnn form are boxed; any emails, phones, and 13–16 digit card-shaped runs present are also boxed.
- Step 5: Download and verify. Save the output. Then open it and select text under each box to confirm coverage, and scan for any un-hyphenated SSN the pattern would have missed.
- Step 6: Flatten for HIPAA-grade removal. The digits survive beneath the box. Rasterise via PDF to PNG + Image to PDF, or use PDF Flatten, then re-extract to confirm no SSN remains in the file.
What the SSN pattern matches
The SSN rule is \b\d{3}-\d{2}-\d{4}\b — dash-delimited only. Other digit shapes fall to other patterns or are missed.
| Value in the PDF | Matched by SSN rule? | Notes |
|---|---|---|
| 123-45-6789 | Yes | The exact 3-2-4 dash form; the \b word boundaries keep it from grabbing longer runs |
| 123 45 6789 (spaces) | No | The SSN rule requires hyphens; space-delimited SSNs are not matched by it |
| 123456789 (no separators) | No | Nine bare digits do not match the SSN rule; usually missed entirely (too short for the 13–16 digit card rule) |
| SSN: 123-45-6789 | Yes (the run containing the number) | The dash form matches; the whole text item containing it is boxed |
| 1234567890123 (13+ digits) | No (SSN) / Maybe (card) | Too long for SSN; the 13–16 digit credit-card pattern may box it |
| 123-45-67890 (extra digit) | No | Five trailing digits break the \d{4}\b boundary |
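The rows of the table above can be checked directly against the quoted pattern. The sketch below uses Python's `re` module purely for illustration; the tool itself runs in the browser, and its regex engine may treat unusual inputs differently. `SSN_RULE` and `matches_ssn` are names chosen here, not part of the tool.

```python
import re

# The SSN pattern as quoted in this section; re.ASCII limits \d and \b to ASCII.
SSN_RULE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b", re.ASCII)

def matches_ssn(value: str) -> bool:
    """Return True if the SSN pattern finds a match anywhere in value."""
    return SSN_RULE.search(value) is not None

samples = [
    "123-45-6789",       # canonical dash form
    "123 45 6789",       # spaces instead of hyphens
    "123456789",         # no separators
    "SSN: 123-45-6789",  # label sharing the run
    "1234567890123",     # 13 bare digits
    "123-45-67890",      # one digit too many
]
for sample in samples:
    print(f"{sample!r}: {matches_ssn(sample)}")
```

Only the first and fourth samples report a match, which agrees with the "Matched by SSN rule?" column.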
Redaction behaviour vs. expectations
Critical for HR / HIPAA use: a visual box is not the same as deletion.
| Aspect | What actually happens | Implication |
|---|---|---|
| SSN format scope | Only nnn-nn-nnnn (hyphenated) is matched by the SSN rule | Un-hyphenated and space-delimited SSNs are not caught — manual review needed |
| Underlying digits | Remain in the content stream beneath the black box | Recoverable by extraction; flatten/rasterise for true removal |
| Box scope | Whole pdf.js text item is boxed | An SSN: label sharing the run is covered too |
| Trigger | Auto-runs on drop, all four patterns | Cannot run SSN-only; expect emails/phones/cards to be boxed too |
| Reporting | Count discarded by the processor | No 'N SSNs redacted' figure |
Cookbook
SSN-redaction cases on HR, tax, and insurance documents. 'Box' = opaque rectangle; 'recoverable' = digits remain extractable until the file is flattened.
Standard hyphenated SSN
The canonical form is boxed cleanly.
Before: Employee SSN: 123-45-6789
Text items: ["Employee SSN: "]["123-45-6789"]
After: Employee SSN: [ ██████████ ]
Un-hyphenated SSN slips through
Nine bare digits are not the dash form, so the SSN rule misses them, and they are too short for the card rule. The number stays visible.
Before: SSN 123456789
After: SSN 123456789 <- NOT boxed
Fix: manually redact compact SSNs, or reformat to nnn-nn-nnnn first.
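If the source text can be edited before the PDF is produced, the "reformat" fix might look like the sketch below. It assumes every standalone nine-digit run is an SSN, which is not true in general (bank routing numbers are also nine digits), so review the result. `hyphenate_compact_ssns` is a hypothetical helper, not a feature of the redactor.

```python
import re

# A standalone run of exactly nine digits, assumed here to be a compact SSN.
COMPACT = re.compile(r"\b(\d{3})(\d{2})(\d{4})\b", re.ASCII)

def hyphenate_compact_ssns(text: str) -> str:
    """Rewrite nine bare digits as nnn-nn-nnnn so a dash-only rule can match."""
    return COMPACT.sub(r"\1-\2-\3", text)

print(hyphenate_compact_ssns("SSN 123456789"))  # SSN 123-45-6789
```

Longer digit runs are left alone because the word boundaries require exactly nine digits.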
Space-delimited SSN missed
Some forms print SSNs with spaces. The SSN rule requires hyphens, so this is not matched.
Before: Soc. Sec. No. 123 45 6789
After: Soc. Sec. No. 123 45 6789 <- NOT boxed
(the space-delimited form does not match \b\d{3}-\d{2}-\d{4}\b)
Mixed identifiers on a benefits form
All four patterns run, so a form with an SSN, an email, and a phone gets all three boxed in one pass.
Page: SSN 123-45-6789 | jane.doe@acme.com | (212) 555-0143
After: SSN [ ██████████ ] | [ ████████████ ] | [ ████████████ ]
HIPAA-grade unrecoverable removal
The tool does not perform this step itself: destroying the digits.
1. pdf-pii-redactor -> SSN boxed, digits still present
2. pdf-to-png -> rasterise each page
3. image-to-pdf -> rebuild glyph-free PDF
4. pdf-to-text -> confirm SSN no longer extractable
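The final confirmation in the sequence above can be scripted once the text has been extracted. The sketch assumes you already hold the extracted text as a string (from pdf-to-text or any other extractor). The `ANY_SSN_SHAPE` pattern is an assumption made here for verification and is deliberately wider than the tool's own rule.

```python
import re

# Wider than the tool's rule on purpose: hyphen, space, or no separator.
ANY_SSN_SHAPE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", re.ASCII)

def surviving_ssns(extracted_text: str) -> list[str]:
    """Return every SSN-shaped string still present in the extracted text."""
    return ANY_SSN_SHAPE.findall(extracted_text)

boxed_only = "Employee SSN: 123-45-6789"  # text still extractable under a box
rasterised = ""                           # a glyph-free PDF yields no text
print(surviving_ssns(boxed_only))  # ['123-45-6789']
print(surviving_ssns(rasterised))  # []
```

An empty list is the result to look for before disclosure; anything else means digits are still recoverable.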
Edge cases and what actually happens
Un-hyphenated SSN not detected
Not matched: The SSN rule is \b\d{3}-\d{2}-\d{4}\b, so hyphens are required. A nine-digit SSN written as 123456789 does not match it and is too short for the 13–16 digit card rule, so it is missed entirely. This is the most important caveat for SSN work: review compact SSNs manually or reformat them with hyphens before redacting.
Space-delimited SSN not detected
Not matched: 123 45 6789 uses spaces, not hyphens, so the SSN rule skips it. Forms that print SSNs with spaces need a manual pass.
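A manual pass is easier with a list of candidates. The sketch below scans already-extracted text for the two shapes the dash-only rule skips. Both patterns are review aids assumed here, not rules the tool ships, and the nine-digit pattern will also flag numbers that are not SSNs.

```python
import re

# Review-aid patterns (assumptions): shapes the dash-only rule does not match.
SPACED = re.compile(r"\b\d{3} \d{2} \d{4}\b", re.ASCII)
COMPACT = re.compile(r"\b\d{9}\b", re.ASCII)

def needs_manual_review(text: str) -> list[tuple[str, str]]:
    """List (shape, value) pairs for SSN-like values the tool would not box."""
    found = [("space-delimited", value) for value in SPACED.findall(text)]
    found += [("compact", value) for value in COMPACT.findall(text)]
    return found

page = "A: 123-45-6789  B: 123 45 6789  C: 123456789"
print(needs_manual_review(page))
# [('space-delimited', '123 45 6789'), ('compact', '123456789')]
```

The hyphenated value is not reported because the tool's own rule already covers it.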
Digits survive under the box
Recoverable: The redaction is a drawn rectangle; the SSN digits remain in the content stream and can be extracted until you flatten. For HR/HIPAA disclosure, always finish with PDF Flatten or a PDF to PNG round-trip and re-verify.
Scanned tax form with no text layer
0 matches: Detection needs extractable text. Run PDF OCR first; verify carefully because OCR routinely misreads digits, which can leave an SSN partially or wholly visible.
Over-redaction of the label
By design: The whole text item is boxed, so an SSN: label or a name in the same run is covered. Safer for redaction, but confirm you did not hide a field you needed to keep.
SSN split across lines
Partial: If the number wraps, the two halves are separate items; neither half is the full nnn-nn-nnnn form, so neither is boxed. Review wrapped SSNs manually.
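Per-item matching explains why a wrapped number escapes. In the sketch below, each string stands in for one pdf.js text item (the item boundaries are made up for illustration), and the rule is applied to each item on its own, as this section describes. Joining neighbouring items is shown only as a way to flag wrapped SSNs for manual review; it is not something the tool does.

```python
import re

SSN_RULE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b", re.ASCII)

def boxed_items(items: list[str]) -> list[str]:
    """Items that match on their own; the whole item would be boxed."""
    return [item for item in items if SSN_RULE.search(item)]

def wrapped_candidates(items: list[str]) -> list[str]:
    """SSNs that only appear when two adjacent items are joined."""
    hits = []
    for left, right in zip(items, items[1:]):
        for match in SSN_RULE.finditer(left + right):
            # Keep only matches that straddle the boundary between the items.
            if match.start() < len(left) < match.end():
                hits.append(match.group())
    return hits

items = ["SSN: 123-45-", "6789", "Alt SSN: 987-65-4321"]
print(boxed_items(items))         # ['Alt SSN: 987-65-4321']
print(wrapped_candidates(items))  # ['123-45-6789']
```

The first function also illustrates the over-redaction case: the label "Alt SSN:" shares an item with the number, so it is covered too.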
Encrypted / password-protected PDF
Fails to parse: pdf.js cannot read an encrypted PDF without the open password, so redaction will not run. Unlock first with PDF Unlock or Remove Password.
SSN in a form field rather than page text
Not covered: Only the page content text layer is scanned. An SSN typed into an interactive form field is not page text. Flatten the form first so field values become page content, then redact.
No count of redactions shown
Expected: The processor discards the match count, so there is no '3 SSNs redacted' summary. Verify by selecting text under each box.
File over the tier cap
Rejected: The free tier rejects PDFs over 2 MB / 50 pages. Upgrade (Pro: 50 MB / 500 pages) or split with PDF Split by Range.
Frequently asked questions
Does it detect SSNs without dashes (123456789)?
No. The SSN pattern is \b\d{3}-\d{2}-\d{4}\b, which requires the two hyphens. A nine-digit SSN with no separators is not matched by the SSN rule, and at nine digits it is also too short for the 13–16 digit credit-card rule, so it is typically missed altogether. Reformat compact SSNs to nnn-nn-nnnn before redacting, or redact them manually.
What about SSNs written with spaces (123 45 6789)?
Those are not matched either — the rule requires hyphens, not spaces. Space-delimited SSNs need a manual review pass. There is no setting to widen the pattern.
Are the SSNs truly removed?
By default, no. The tool draws a black box over the text run; the SSN digits stay in the content stream and can be recovered by copy-paste or text extraction. For HIPAA-grade removal, rasterise the output (PDF to PNG + Image to PDF) or run PDF Flatten, then confirm the SSN no longer extracts.
Does this satisfy HIPAA Safe Harbor de-identification?
SSN is one of the 18 HIPAA Safe Harbor identifiers, and removing it is necessary but not sufficient. You must also handle names, dates, addresses, record numbers, and the other identifiers — which this tool does NOT detect (it has no name/date/address pattern). Treat this as the SSN/contact step of a broader de-identification process, not a complete Safe Harbor solution, and always flatten so the digits are actually gone.
Will it detect names or dates of birth on the form?
No. The redactor has exactly four patterns: email, phone, dash-delimited SSN, and 13–16 digit card numbers. It has no name, date-of-birth, or address detection. Those identifiers must be redacted manually for HR/HIPAA work.
Can I redact only SSNs?
No. There is no per-category option — all four patterns auto-run together on drop. On a benefits or claims form, expect emails, phone numbers, and any card-shaped digit runs to be boxed alongside the SSNs.
What happens with an SSN inside a fillable form field?
It is not detected — only the page content text layer is scanned, and field values live in the form, not the page text. Flatten the form first so the values become page content, then run the redactor.
Does it work on a scanned SSN form?
Only after OCR. Run PDF OCR to add a text layer first, then redact — but verify carefully, because OCR commonly misreads digits and could leave an SSN visible or boxed incompletely.
How much of the line gets covered?
The whole pdf.js text item containing the SSN, which may include an SSN: label or an employee name in the same run. This over-redacts slightly. Check you did not hide a field you needed.
Is the document uploaded anywhere?
No. Everything runs in your browser via pdf.js and pdf-lib, so SSNs never leave your device, which is what HR and HIPAA workflows require. Only an anonymous usage counter is recorded when you are signed in.
What file sizes are supported?
Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2000 pages. For a large HR bundle, split with PDF Split by Range, redact each part, and recombine.
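These limits translate into a simple pre-flight check. The sketch below hard-codes the three tiers quoted in this answer and assumes that 1 MB means 1,000,000 bytes and that the caps are inclusive, neither of which the source specifies. `smallest_tier` is a hypothetical helper.

```python
from typing import Optional

# (tier name, max megabytes, max pages) as listed in this section.
TIERS = [("Free", 2, 50), ("Pro", 50, 500), ("Pro Media", 500, 2000)]

def smallest_tier(size_bytes: int, pages: int) -> Optional[str]:
    """Return the first tier whose size and page caps both fit, else None."""
    for name, max_mb, max_pages in TIERS:
        if size_bytes <= max_mb * 1_000_000 and pages <= max_pages:
            return name
    return None  # too large for every tier: split the PDF first

print(smallest_tier(1_500_000, 12))  # Free
print(smallest_tier(8_000_000, 40))  # Pro
print(smallest_tier(900_000, 120))   # Pro
```

The third call shows that page count alone can push a small file into a higher tier.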
What is the safest SSN-redaction workflow before disclosure?
Reformat un-hyphenated SSNs to nnn-nn-nnnn (or note them for manual redaction) → OCR if scanned → run the PII Redactor → manually redact names, DOBs, and addresses the tool does not detect → flatten or rasterise so all boxed digits are destroyed → scrub properties with Metadata Scrubber → extract text once more and confirm no SSN survives.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.