Redact Email Addresses from a PDF Document — Free Browser Tool

How to redact email addresses from a PDF

Step 1: Confirm the PDF has a real text layer
The redactor reads text via pdf.js getTextContent(). Select an email in your reader; if it highlights, there is a text layer. If the page is a scan (nothing selects), run PDF OCR first to add a text layer, then come back.
Step 2: Open the redactor and drop the PDF
Load the file into the PDF PII Redactor. Processing happens in your browser; 0 bytes are uploaded. Note that this tool has no options panel.
Step 3: Let it auto-run
The tool runs immediately on drop, and the email, phone, SSN, and credit-card patterns all fire together. You cannot restrict it to emails only; the other patterns simply find nothing in an email-only document, or box additional matches if present.
Step 4: Download the redacted PDF
The result panel shows output size and page count. Click Download to save yourfile.pii-redactor.pdf. There is no on-screen list of how many addresses were boxed.
Step 5: Verify coverage in a reader
Open the output and try to select text under each black box. Because the box covers the whole text run, check that no fragment of an address peeks out at the edges. Also look for addresses that wrap across two lines: each line is tested separately, so a wrapped address is usually left unboxed and needs manual attention.
Step 6: Flatten to make it unrecoverable
The glyphs still exist beneath the box. To destroy them, rasterise via PDF to PNG then Image to PDF, or open the result in PDF Flatten. Then re-verify that copy-paste yields nothing.

What the email pattern matches (and misses)

The single email regex is `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`, applied to each pdf.js text item. It covers the common ASCII address shapes but is not exhaustive.

Address in the PDF	Matched?	Why
`jane.doe@example.com`	Yes	Local part, `@`, domain, 2+ letter TLD — the canonical case
`info+newsletter@sub.domain.co.uk`	Yes	`+` and `.` are in the local-part class; multi-label domains match
`J.Smith@NHS.NET`	Yes	The character classes list both `A-Z` and `a-z`, so upper-case addresses match
`user@localhost`	No	No dot-TLD, so the `\.[A-Za-z]{2,}` tail fails; bare hostnames are skipped
`"odd name"@example.com`	No	The closing quote sits directly before `@` and is outside the local-part class, so the pattern finds no local part and nothing matches
`josé@example.com`	No	The accented `é` sits directly before `@` and is outside `[A-Za-z0-9._%+-]`; the pattern needs at least one class character immediately before `@`, so nothing matches. An address like `josé.smith@example.com` does match, from `.smith@example.com` onward
Address split across two lines by wrapping	Per-line	pdf.js emits each line as a separate item; only the part on a line that forms a full pattern is boxed — see edge cases
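
The rows above can be checked by running the quoted pattern directly. The sketch below is a minimal illustration, not the tool's source: `findEmails` is a hypothetical helper, and the global flag is an assumption added so that every address in a string is returned.

```javascript
// The email pattern as quoted in this guide. The "g" flag is an assumption
// so that all addresses in a string are found, not only the first.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: every substring of `text` that the pattern matches.
function findEmails(text) {
  return text.match(EMAIL_RE) ?? [];
}

const samples = [
  "jane.doe@example.com",
  "info+newsletter@sub.domain.co.uk",
  "J.Smith@NHS.NET",
  "user@localhost",
  "a.very.long.name@",       // first half of a wrapped address
  "department.example.com",  // second half of a wrapped address
];

for (const s of samples) {
  const hits = findEmails(s);
  console.log(s, "->", hits.length > 0 ? hits.join(", ") : "no match");
}
// The first three samples match in full; the last three print "no match".
```

Pasting a suspicious address from your own document into `samples` is a quick way to predict whether the redactor is likely to box it.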

Redaction behaviour vs. expectations

Read this before you treat the output as safe. The behaviour is identical for all four PII patterns; only the regex differs.

Aspect	What actually happens	Implication
Box scope	The whole pdf.js text item containing the match is boxed (x, y, item width, item height + 2pt), not just the address substring	Surrounding words on the same run get covered too — usually fine, occasionally over-redacts a label
Underlying text	The original glyphs remain in the content stream beneath the black rectangle	Copy-paste and text-extraction tools can still recover the address until you flatten / rasterise
Match reporting	The engine counts matches internally but the processor discards the count	The UI shows no '12 emails redacted' figure — verify visually
Trigger	Runs automatically on file drop; all four patterns fire	You cannot scope it to emails only; expect phone/SSN/card matches to be boxed too if present
Encryption	pdf.js needs to read the text; an encrypted PDF without its password fails to parse	Unlock first with PDF Unlock or Remove Password
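
The "Box scope" row can be pictured as a small geometry calculation. This is a sketch under assumptions: the item shape mirrors what pdf.js text items expose (`str`, `width`, `height`, and a six-number `transform` whose last two entries are the origin), and `boxForItem` is a hypothetical name, not the tool's actual function.

```javascript
// Hypothetical helper: the rectangle described in the "Box scope" row,
// i.e. the item's origin, its full width, and its height plus 2pt.
function boxForItem(item) {
  const [, , , , x, y] = item.transform; // entries 4 and 5 are the x/y origin
  return { x, y, width: item.width, height: item.height + 2 };
}

// A mock text item: the label and the address share one run.
const item = {
  str: "Contact: jane.doe@example.com",
  transform: [10, 0, 0, 10, 72, 700],
  width: 150,
  height: 10,
};

console.log(boxForItem(item));
// { x: 72, y: 700, width: 150, height: 12 }
```

Note that the rectangle is derived from the item alone and never from the position of the match inside `str`, which is why the "Contact:" label in this example would be covered along with the address.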

Cookbook

Concrete before/after cases for email-heavy documents. 'Box' means an opaque black rectangle over the text item; 'recoverable' means the glyphs still extract until you flatten.

A staff directory line

A directory row where the address sits in its own text run gets cleanly boxed. Because the run is just the address, nothing extra is covered.

Before (page text item):  jane.doe@example.com
After (visual):           [ ████████████████ ]
Under the box (extract):  jane.doe@example.com   <- still there until flattened

Address embedded in a sentence

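When the address shares a text run with surrounding words, the whole run is boxed and the sentence fragment is lost too, which is usually acceptable for a redaction. When the layout emits the address as its own text item, as in the example below, only that item is boxed.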

Before:  Please contact jane.doe@example.com for access.
Text items: ["Please contact "]["jane.doe@example.com"][" for access."]
After:   Please contact [ ██████████████ ] for access.
(only the address item is boxed; layout decides item boundaries)
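
The effect of item boundaries can be sketched by testing each item string on its own. `indexesToBox` is a hypothetical helper written for this illustration; it assumes the per-item matching described in this guide and nothing more.

```javascript
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical helper: indexes of the items whose own text contains a match.
function indexesToBox(items) {
  return items
    .map((str, i) => (EMAIL_RE.test(str) ? i : -1))
    .filter((i) => i !== -1);
}

// Same sentence, two possible layouts.
const threeItems = ["Please contact ", "jane.doe@example.com", " for access."];
const oneItem = ["Please contact jane.doe@example.com for access."];

console.log(indexesToBox(threeItems)); // [ 1 ]  only the address item
console.log(indexesToBox(oneItem));    // [ 0 ]  the whole sentence is one item
```

You do not control which layout a given PDF uses, so the same sentence can lose only the address in one document and the full line in another.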

Wrapped address split across two lines

A long address that wraps becomes two text items. Each line is tested independently, so a fragment that does not itself form a full user@domain.tld is left visible.

Visible in PDF:
   a.very.long.name@
   department.example.com
pdf.js items: ["a.very.long.name@"] ["department.example.com"]
Neither line alone is a full pattern -> NEITHER is boxed.
Fix: review wrapped addresses manually before sharing.
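
Manual review of wrapped addresses can be narrowed down with a simple heuristic. The sketch below is not a feature of the redactor: `findWrappedCandidates` is a hypothetical helper that joins each pair of adjacent item strings and flags the pair when a match spans the join. It can over-flag (for example when two unrelated items sit next to each other without a space), so treat its output as a list of places to look.

```javascript
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical review helper: returns [i, i + 1] index pairs where the
// joined text of two adjacent items contains an address crossing the join.
function findWrappedCandidates(items) {
  const pairs = [];
  for (let i = 0; i + 1 < items.length; i++) {
    const boundary = items[i].length;
    const joined = items[i] + items[i + 1];
    for (const m of joined.matchAll(EMAIL_RE)) {
      const end = m.index + m[0].length;
      // The match must start before the join and end after it.
      if (m.index < boundary && end > boundary) {
        pairs.push([i, i + 1]);
        break;
      }
    }
  }
  return pairs;
}

console.log(findWrappedCandidates(["a.very.long.name@", "department.example.com"]));
// [ [ 0, 1 ] ]
```

An address that fits inside a single item is never flagged, because its match does not cross a join.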

Mixed PII on the same page

Because all four patterns fire together, an email document that also contains a phone number gets both boxed in the single pass — there is no way to box only the email.

Page text:  Email jane.doe@example.com / Tel 020 7946 0991
After:      Email [ ██████████ ] / Tel [ ████████ ]
(email AND phone patterns matched; both items boxed)

Making it unrecoverable

This is the glyph-destroying step that the tool itself does not perform. Rasterise the pages to images and rebuild the PDF, or flatten it, then confirm that text extraction returns nothing.

1. pdf-pii-redactor  -> visual boxes, glyphs still present
2. pdf-to-png        -> each page becomes a flat image
3. image-to-pdf      -> rebuild a glyph-free PDF
4. pdf-to-text       -> should now return NO email addresses
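
The final check in step 4 can be sketched as a scan over the extracted text of each page. This assumes you already have one string per page from whatever extraction tool you use; `pagesWithSurvivors` is a hypothetical helper, and it only looks for the email pattern quoted in this guide.

```javascript
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical check: `pageTexts[0]` is the text extracted from page 1, etc.
// Returns the pages on which an address can still be extracted.
function pagesWithSurvivors(pageTexts) {
  const leaks = [];
  pageTexts.forEach((text, i) => {
    const found = text.match(EMAIL_RE);
    if (found) leaks.push({ page: i + 1, addresses: found });
  });
  return leaks;
}

// After boxing only: the glyphs are still extractable.
console.log(pagesWithSurvivors(["Staff list", "Email jane.doe@example.com"]));
// After rasterising: the pages carry no text at all.
console.log(pagesWithSurvivors(["", ""]));
```

An empty result covers only addresses this pattern recognises; wrapped or unusual addresses still need a visual check.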

Edge cases and what actually happens

Scanned PDF with no text layer

0 matches

The redactor reads text via pdf.js getTextContent(). A photographed or scanned page has no extractable text, so the email pattern finds nothing and no boxes are drawn. Run PDF OCR first to add a text layer, then redact.
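
A missing text layer shows up as pages whose text items are empty. The sketch below assumes each page has already been reduced to an array of items shaped like pdf.js output (`{ str: "..." }`); `pagesWithoutText` is a hypothetical pre-check, not part of the tool.

```javascript
// Hypothetical pre-check: returns the 1-based numbers of pages that carry
// no non-whitespace text, which suggests a scan that needs OCR first.
function pagesWithoutText(pages) {
  return pages
    .map((items, i) => ({
      page: i + 1,
      chars: items.reduce((n, item) => n + item.str.trim().length, 0),
    }))
    .filter((p) => p.chars === 0)
    .map((p) => p.page);
}

const pages = [
  [{ str: "Hello" }], // page 1 has text
  [],                 // page 2 has no items at all
  [{ str: "  " }],    // page 3 has whitespace only
];

console.log(pagesWithoutText(pages)); // [ 2, 3 ]
```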

Glyphs survive under the black box

Recoverable

This is the single most important caveat. The tool draws a rectangle; it does not delete the underlying text. Anyone can copy-paste or run text extraction on the output and recover the address. To make it forensically safe, rasterise (via PDF to PNG + Image to PDF) or flatten the result.

Address wraps across two lines

Usually missed

pdf.js emits each visual line as a separate text item. A wrapped address (name@ on one line, domain.com on the next) is tested per line; neither half forms a complete user@domain.tld, so neither is boxed. Review wrapped addresses by eye.

Internationalised local part (accents)

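Often missed

Characters like é, ü, or non-Latin scripts are outside the local-part class [A-Za-z0-9._%+-]. The pattern needs at least one character from that class directly before the @, so josé@example.com produces no match and is not boxed at all. An address whose local part ends in ASCII, such as josé.smith@example.com, matches from .smith@example.com onward, and the whole text item is then boxed. Check international addresses manually.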

Over-redaction of the surrounding run

By design

Because the whole text item is boxed, words sharing the run with the address are covered too. This errs on the side of removing more, not less — generally desirable for redaction, but verify a label or amount you needed to keep was not hidden.

Encrypted / password-protected PDF

Fails to parse

pdf.js must read the text to find matches. An encrypted PDF cannot be parsed without its open password, so the redaction will not run. Remove the password first with PDF Unlock or Remove Password, redact, then re-protect if needed.

Email inside an annotation or form field

Not covered

The redactor scans the page content text layer only. An address typed into a comment, sticky note, or form field is not page text and will not be boxed. Strip those first with Annotation Remover or Flatten.

Bare hostname address (no TLD)

Not matched

The pattern requires a .tld of two or more letters. Intranet addresses like user@mailhost or admin@localhost have no dotted TLD and are skipped. These are rare in shared documents but worth a manual scan if your org uses them.

No on-screen count of redactions

Expected

The processor returns only the redacted PDF; the internal match count is discarded. You will not see a '7 emails redacted' summary. Confirm coverage by trying to select text under each box in the output.

Free-tier size or page cap exceeded

Rejected

Free tier caps PDFs at 2 MB and 50 pages. A larger correspondence bundle is rejected until you upgrade (Pro: 50 MB / 500 pages) or split it first with PDF Split by Range.

Frequently asked questions

Are the email addresses actually removed, or just covered?

Just covered. The tool draws an opaque black rectangle over the text run; the original glyphs remain in the PDF content stream beneath it. That means the address can still be recovered by copy-paste or text extraction. To make it unrecoverable, rasterise the output (PDF to PNG then Image to PDF) or run it through PDF Flatten, then confirm extraction returns nothing.

Can I redact only emails and leave phone numbers visible?

No. The redactor has no options panel — it auto-runs all four patterns (email, phone, SSN, credit card) together the moment you drop the file. In an email-only document the other patterns simply find no matches, but if a phone number or card-shaped digit string is present it will also be boxed. There is no per-category toggle.

Does it catch every email format?

It catches canonical ASCII addresses: first.last@sub.domain.co.uk, info+tag@example.org, upper-case variants. It misses bare hostnames with no dot-TLD (user@localhost), quoted local parts with spaces, and internationalised addresses whose character directly before the @ is non-ASCII (josé@example.com). Review unusual addresses by eye.

Will it work on a scanned PDF?

Not on its own. Detection reads the text layer via pdf.js; a scan has no extractable text, so nothing is boxed. Run PDF OCR first to create a text layer, then redact. OCR mistakes can also cause misses, so verify visually.

What happens to an address that wraps across two lines?

pdf.js treats each line as a separate text item. A wrapped address (name@ then domain.com) is tested line by line; neither half is a complete user@domain.tld, so neither is boxed. Wrapped addresses need manual review before sharing.

How much of the line gets blacked out?

The whole pdf.js text item containing the match — not just the address characters. If the address sits in its own run, only it is covered. If it shares a run with surrounding words, those are covered too. This over-redacts slightly, which is safer for a redaction but can hide a label you wanted to keep.

Does it tell me how many emails it redacted?

No. The engine counts matches internally but the count is discarded before it reaches the UI. The result panel shows only output size and page count. Verify by trying to select text under each black box in the downloaded file.

Is anything uploaded to a server?

No. Detection and redaction both run in your browser using pdf.js and pdf-lib. The document and the addresses in it never leave your device — only an anonymous usage counter is recorded when you are signed in. This is what makes it suitable for privileged correspondence and GDPR work.

Can I redact an email inside a PDF comment or form field?

No — the scan is limited to the page content text layer. Addresses in annotations, sticky notes, or form fields are not page text. Remove those first with Annotation Remover or Flatten, then run the redactor.

My PDF is password-protected — can I redact it?

Not while it is encrypted. pdf.js needs to read the text, which it cannot do without the open password. Remove the password with PDF Unlock or Remove Password, redact, then re-apply protection with Password Protect if you still need it.

What size of file can I redact?

Free tier: up to 2 MB and 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2000 pages. For a bundle over your limit, split it with PDF Split by Range, redact each part, and recombine.
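
The limits above can be turned into a quick pre-flight check. This is a sketch under assumptions: the tier names in `TIERS` and the helper `planForTier` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is a guess about how the tool counts size.

```javascript
// Limits as quoted in this guide. The byte size of "1 MB" is an assumption.
const MB = 1024 * 1024;
const TIERS = {
  free: { maxBytes: 2 * MB, maxPages: 50 },
  pro: { maxBytes: 50 * MB, maxPages: 500 },
  proMedia: { maxBytes: 500 * MB, maxPages: 2000 },
};

// Hypothetical helper: does the file fit the tier, and how many page-range
// parts would be needed to bring each part under the page cap?
function planForTier(tier, bytes, pages) {
  const { maxBytes, maxPages } = TIERS[tier];
  const fits = bytes <= maxBytes && pages <= maxPages;
  const partsByPages = Math.ceil(pages / maxPages);
  return { fits, partsByPages };
}

console.log(planForTier("free", 1.5 * MB, 120));
// { fits: false, partsByPages: 3 }
```

Splitting by page count does not guarantee that each part is also under the size cap, so check the size of every part after splitting.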

What is a complete safe-to-share workflow?

Add a text layer if scanned (OCR) → run the PII Redactor → verify no fragments peek out and wrapped addresses are covered → flatten or rasterise so the glyphs are gone → scrub document properties with Metadata Scrubber. Then extract text one last time and confirm no address survives.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
