How to redact phone numbers from a PDF
- Step 1: Check for a text layer. The redactor reads text with pdf.js. If you can select a number in your reader, there is a text layer. If the page is a scan, run PDF OCR first; OCR can misread digits, so verify the result afterwards.
- Step 2: Open the redactor and drop the file. Load the document into the PDF PII Redactor. Everything runs in your browser; nothing is uploaded. There is no options panel.
- Step 3: Let it auto-run. On drop, all four patterns fire together. The phone pattern boxes number-shaped runs; the email, SSN, and card patterns also run, so any of those present will be boxed too.
- Step 4: Download the redacted PDF. Save the output from the result panel. The filename is yourfile.pii-redactor.pdf. No count of boxed numbers is shown.
- Step 5: Verify matches and misses. Open the output and confirm each phone number is fully covered, then scan for over-matches: long reference codes or dates that fit the digit grouping may also be boxed, while extension-only or oddly formatted numbers may be missed.
- Step 6: Flatten for forensic safety. The digits survive under the box. Rasterise via PDF to PNG + Image to PDF, or use PDF Flatten, then re-extract text to confirm no number remains.
What the phone pattern matches
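The pattern `(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}` is shape-based: an optional country code of 1–3 digits, an area group of 2–4 digits (optionally parenthesised), then two groups of 3–4 digits. Groups may be separated by a single space, dot, or hyphen, but every separator is optional, so the pattern needs at least eight digits and also fits unseparated digit runs.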
| Number in the PDF | Matched? | Why |
|---|---|---|
| 020 7946 0991 | Yes | Area group + two digit groups, space-separated: UK landline shape |
| +44 20 7946 0991 | Yes | Leading +44 country code is the optional first group |
| (212) 555-0143 | Yes | Parenthesised area code, hyphen separator: US shape |
| +1 212 555 0143 | Yes | +1 country code plus three groups |
| +49.30.123456 | Yes | Dot separators are allowed; the six trailing digits split into two 3-digit groups because the separator between them is optional |
| 07700 900123 | Yes | UK mobile: separators are optional, so the digits split as 077 + 00 + 900 + 123 |
| ext. 4821 (extension only) | No | Four bare digits fall short of the eight digits the pattern needs at minimum |
| INV-2024-0099143 (reference) | Yes (over-match) | The run 2024-0099143 fits the phone shape (20 + 24 + 0099 + 143), so it can be boxed |
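The rows above can be checked against the quoted pattern with a short script. This is a minimal sketch in Python, not the tool's own code: it assumes the pattern is applied as an unanchored search, and that Python's `re` handles this particular pattern the same way the browser's regex engine does. `PHONE` and `matched_text` are illustrative names.

```python
import re
from typing import Optional

# The phone pattern as quoted on this page (assumed to run as an unanchored search).
PHONE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}"
)

def matched_text(s: str) -> Optional[str]:
    """Return the first phone-shaped substring of s, or None if nothing matches."""
    m = PHONE.search(s)
    return m.group(0) if m else None

samples = [
    "020 7946 0991",
    "+44 20 7946 0991",
    "(212) 555-0143",
    "+1 212 555 0143",
    "+49.30.123456",
    "07700 900123",
    "ext. 4821",
    "INV-2024-0099143",
]

for s in samples:
    print(f"{s!r:24} -> {matched_text(s)!r}")
```

In this sketch every sample except the bare extension yields a match, and the invoice reference matches from `2024` onward, which is the over-match the last row describes.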
Redaction behaviour vs. expectations
Redaction behaviour is identical for all four PII patterns. The phone pattern is the most prone to both over- and under-matching because it keys purely on digit shape.
| Aspect | What actually happens | Implication |
|---|---|---|
| Validation | No dialling-plan lookup — matches by digit grouping, not real number validity | Valid-looking non-phone digit strings can be boxed; unusual real numbers can be missed |
| Box scope | Whole pdf.js text item is boxed, not just the digits | Adjacent words (Tel:, a name) sharing the run are covered too |
| Underlying text | Digits remain in the content stream under the box | Recoverable by copy-paste / extraction until flattened |
| Trigger | Auto-runs on drop, all four patterns at once | Cannot scope to phone numbers only |
| Reporting | Match count discarded by the processor | No 'N numbers redacted' figure; verify visually |
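The box-scope row can be pictured as item-level matching. The sketch below is a hypothetical model in Python, not the tool's source: `TextItem` stands in for a pdf.js text item, the coordinates are made up, and `items_to_box` flags the whole item whenever the quoted pattern matches anywhere inside it.

```python
import re
from dataclasses import dataclass
from typing import List

# The phone pattern as quoted on this page.
PHONE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}"
)

@dataclass
class TextItem:
    """Hypothetical stand-in for one pdf.js text item: its string and bounding box."""
    text: str
    x: float
    y: float
    width: float
    height: float

def items_to_box(items: List[TextItem]) -> List[TextItem]:
    """Return every item containing a phone-shaped run; the whole item is boxed."""
    return [item for item in items if PHONE.search(item.text)]

page = [
    TextItem("Tel: ", 72, 700, 24, 12),                               # label in its own item
    TextItem("020 7946 0991", 96, 700, 80, 12),                       # number in its own item
    TextItem("Contact Jane Doe on 020 7946 0991", 72, 680, 200, 12),  # name shares the run
]

for item in items_to_box(page):
    print(f"box covers the full item: {item.text!r}")
```

Under this model the standalone `Tel: ` label survives, while the name in the third item is hidden along with the number because they share one item.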
Cookbook
Real phone-redaction cases. 'Box' = opaque rectangle over the text item; 'recoverable' = digits still extract until you flatten.
A UK landline in a contact line
A standard UK number in its own run is cleanly boxed.
    Before:     Tel: 020 7946 0991
    Text items: ["Tel: "]["020 7946 0991"]
    After:      Tel: [ ███████████ ]   (only the number item is boxed)
International number with country code
The optional leading +44 / +1 group is part of the match, so the full international form is covered.
    Before: Call +44 20 7946 0991 or +1 212 555 0143
    After:  Call [ ██████████████ ] or [ ████████████ ]
Over-match on a reference code
Because the pattern is shape-based, a separated digit run that looks phone-like can be boxed even though it is an invoice or case reference. Review for false positives.
    Page text: Case ref 020-2024-0991 | Tel 020 7946 0991
    After:     Case ref [ ████████████ ] | Tel [ ███████████ ]
               (the case reference matched the grouping and was boxed)
Missed extension
A bare four-digit extension has fewer than the eight digits the pattern needs, so it is left visible. Add it to a manual pass.
    Page text: Reception ext. 4821
    After:     Reception ext. 4821   <- NOT boxed (too few digits)
    Fix:       review extensions and short internal numbers by eye.
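For the manual pass, a small helper can list likely extensions from text you have already extracted, so you know where to look. This is a sketch under stated assumptions: the `EXTENSION` pattern is my own illustration and is not part of the redactor, and it only recognises the spellings `ext`, `ext.`, and `extension` followed by 2 to 6 digits.

```python
import re
from typing import List

# Illustrative pattern, NOT one of the tool's four patterns:
# "ext", "ext." or "extension", optional whitespace, then 2-6 digits.
EXTENSION = re.compile(r"\bext(?:ension)?\.?\s*(\d{2,6})\b", re.IGNORECASE)

def extension_candidates(page_text: str) -> List[str]:
    """Return extension-like snippets to check by eye in the redacted output."""
    return [m.group(0) for m in EXTENSION.finditer(page_text)]

print(extension_candidates("Reception ext. 4821, Tel 020 7946 0991"))
```

The helper only points at candidates; covering them still has to happen outside the redactor, since it accepts no custom patterns.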
Making it unrecoverable
Destroy the glyphs the tool leaves behind.
1. pdf-pii-redactor -> visual boxes over numbers
2. pdf-to-png -> flatten each page to an image
3. image-to-pdf -> rebuild glyph-free PDF
4. pdf-to-text -> confirm no phone digits remain
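The last step can be made mechanical once you have the extracted text as a string. The sketch below assumes you obtained that text from the final PDF by some other means; `leftover_numbers` is a hypothetical helper that reuses the pattern quoted on this page, so it only finds phone-shaped runs and says nothing about other digits.

```python
import re
from typing import List

# The phone pattern as quoted on this page.
PHONE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}"
)

def leftover_numbers(extracted_text: str) -> List[str]:
    """Return phone-shaped runs still present in text extracted from a PDF."""
    return [m.group(0) for m in PHONE.finditer(extracted_text)]

# Text pulled from a boxed-only output still carries the digits.
print(leftover_numbers("Tel: 020 7946 0991"))

# A rasterised page should extract as empty text, leaving nothing to find.
print(leftover_numbers(""))
```

An empty list is the result to aim for; anything else means the digits are still in the file.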
Edge cases and what actually happens
Scanned PDF with no text layer
0 matches. No extractable text means no matches. Run PDF OCR first. Note that OCR frequently misreads digits (0/O, 1/l, 5/S), which can cause both misses and garbled output, so verify the result.
Digits survive under the box
Recoverable. The box is a drawn rectangle; the digits stay in the content stream. Copy-paste or extraction recovers the number until you flatten or rasterise the output. Always finish with PDF Flatten or a PNG round-trip for anything sensitive.
False positive on a non-phone digit run
Over-match. The pattern keys on digit shape, not real numbers. Invoice references, dates with separators, or account numbers that fit the grouping (020-2024-0991) can be boxed. Review the output so you do not redact data you needed to keep.
Extension-only or short internal number
Not matched. A bare ext. 4821 has only four digits, fewer than the eight the pattern needs at minimum, so it falls outside the pattern. Short internal extensions are not detected; add them to a manual review pass.
Over-redaction of the surrounding run
By design. The whole text item is boxed, so a Tel: label or a name sharing the run with the number is covered too. Safer for redaction, but check nothing you needed was hidden.
Number split across two lines
Partial. If a number wraps, pdf.js emits two items; each half may fall short of the full pattern and stay visible. Review wrapped numbers manually.
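The wrapped-number case is easy to reproduce outside the tool. This Python sketch assumes matching happens one text item at a time, as described above; `per_item_hits` models that, and `joined_hits` is a hypothetical review aid that joins neighbouring items with a space to reveal numbers spanning the seam.

```python
import re
from typing import List

# The phone pattern as quoted on this page.
PHONE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}"
)

def per_item_hits(items: List[str]) -> List[str]:
    """Match each text item on its own (the assumed item-level behaviour)."""
    return [text for text in items if PHONE.search(text)]

def joined_hits(items: List[str]) -> List[str]:
    """Review aid: join consecutive items with a space and match across the seam."""
    return [m.group(0) for m in PHONE.finditer(" ".join(items))]

# One number wrapped over two lines, so it arrives as two items.
wrapped = ["Call 020 7946", "0991 after 9am"]

print(per_item_hits(wrapped))  # neither half has the eight digits the pattern needs
print(joined_hits(wrapped))    # the joined text exposes the full number
```

Comparing the two lists gives you a checklist of wrapped numbers to inspect by hand in the output.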
Encrypted / password-protected PDF
Fails to parse. pdf.js cannot read text in an encrypted PDF without the open password, so the redaction will not run. Remove the password with PDF Unlock or Remove Password first.
Number inside an annotation or form field
Not covered. Only the page content text layer is scanned. Numbers in comments or form fields are not page text. Strip them with Annotation Remover or Flatten first.
No count of redactions shown
Expected. The match count is discarded before the UI; you will not see how many numbers were boxed. Confirm coverage by selecting under each box in the output.
File over the tier cap
Rejected. The free tier rejects PDFs over 2 MB or 50 pages. Upgrade (Pro: 50 MB / 500 pages) or split the file with PDF Split by Range before redacting.
Frequently asked questions
Which phone formats does it detect?
It is shape-based, recognising common UK (020 7946 0991, 07700 900123), US ((212) 555-0143), and international (+44 …, +1 …) groupings with space, dot, hyphen, parentheses, and an optional leading country code. It does not validate against real dialling plans, so it keys on how the digits are grouped rather than whether the number is genuine.
Will it redact fax numbers too?
Yes — a fax number has the same digit grouping as a phone number, so it matches the same pattern and is boxed. There is no separate fax category; both are caught by the one numeric pattern.
Can I add a custom pattern for internal extensions?
No. The tool has no options or custom-pattern field; it runs four fixed patterns and auto-runs on drop. Bare extensions like ext. 4821 are too short for the phone pattern, which needs at least eight digits, so they are not matched and you must redact them manually (for example by covering them before export, or by removing the surrounding text).
Are the numbers actually deleted?
No. They are covered with a black box while the digits remain in the content stream underneath, recoverable by copy-paste or text extraction. For unrecoverable removal, rasterise the output (PDF to PNG + Image to PDF) or run PDF Flatten, then verify extraction returns nothing.
Why did it black out a reference code that is not a phone number?
Because the pattern matches digit shape, not real phone numbers. A separated digit run like 020-2024-0991 can resemble the grouping and be boxed. This is a known trade-off of shape-based matching — review the output and, if it over-redacted data you need, note that this tool cannot be scoped to skip it.
Will it catch a number written without spaces, like 02079460991?
Going by the pattern as quoted above, yes: every separator in it is optional, so an unseparated run of 8 to 15 digits, such as 02079460991, fits the group structure and is matched. The flip side is more over-matching, because compact account numbers, order IDs, or dates written as 20240115 fit too. Still check the output, since a number that pdf.js splits across text items can be missed.
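Whether compact numbers match can be checked directly against the quoted pattern. In this Python sketch (my own test harness, not the tool's code), `fully_matched` asks whether a string is covered end to end; because each separator in the pattern is optional, unseparated runs of 8 to 15 digits are covered in full.

```python
import re

# The phone pattern as quoted on this page.
PHONE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?(?:\(?\d{2,4}\)?[\s.-]?)\d{3,4}[\s.-]?\d{3,4}"
)

def fully_matched(digits: str) -> bool:
    """True if the pattern covers the whole string, not just part of it."""
    return PHONE.fullmatch(digits) is not None

print(fully_matched("02079460991"))  # True

# Which plain digit-run lengths are covered end to end?
lengths = [n for n in range(1, 21) if fully_matched("7" * n)]
print(lengths)  # [8, 9, 10, 11, 12, 13, 14, 15]
```

Runs longer than 15 digits are not covered end to end, but an unanchored search would still match their first 15 digits, so long IDs can be boxed as well.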
Does it work on scanned case files?
Only after OCR. Detection reads the text layer; a scan has none. Run PDF OCR first, but be aware OCR misreads digits often (0↔O, 1↔l), so verify the redacted output carefully.
Can I redact only phone numbers and keep emails visible?
No. There is no per-category control. All four patterns (email, phone, SSN, card) run together automatically. If your document also contains emails or card-shaped numbers, those will be boxed as well.
How much of the line is covered?
The whole pdf.js text item containing the number — which may include a Tel: label or a name in the same run. This over-redacts slightly, which is safer, but check you did not lose a label or value you needed.
Is the document uploaded anywhere?
No. Detection and redaction run entirely in your browser via pdf.js and pdf-lib. The numbers never leave your device; only an anonymous usage counter is stored when you are signed in.
How big a file can I process?
Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2000 pages. Split larger case files with PDF Split by Range, redact each, and recombine.
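The caps can be written down as a small lookup to decide up front whether a file needs splitting. The figures come from the answer above; everything else in this Python sketch is an assumption, including the `Tier` type, the `smallest_tier` helper, and the reading of 1 MB as 1024 × 1024 bytes, which the page does not specify.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: the page does not say whether MB means 10^6 or 2^20 bytes

@dataclass(frozen=True)
class Tier:
    """Hypothetical record of one plan's limits."""
    name: str
    max_bytes: int
    max_pages: int

# Limits as listed on this page, smallest plan first.
TIERS = [
    Tier("Free", 2 * MB, 50),
    Tier("Pro", 50 * MB, 500),
    Tier("Pro Media", 500 * MB, 2000),
]

def smallest_tier(size_bytes: int, pages: int) -> Optional[str]:
    """Return the first tier whose caps fit the file, or None if it must be split."""
    for tier in TIERS:
        if size_bytes <= tier.max_bytes and pages <= tier.max_pages:
            return tier.name
    return None

print(smallest_tier(3 * MB, 120))    # too big for Free, fits Pro
print(smallest_tier(600 * MB, 100))  # exceeds every cap, so split first
```

A `None` result is the cue to run PDF Split by Range before redacting.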
What is a safe end-to-end redaction process for a case file?
OCR if scanned → run the PII Redactor → verify every number is covered and no reference code was wrongly boxed → flatten or rasterise so the digits are gone → scrub document properties with Metadata Scrubber → extract text once more and confirm no number survives.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.