How to strip PDF metadata, author, and edit history
- Step 1: Check what your PDF is actually exposing first. In Acrobat, open File → Properties → Description; in macOS Preview, use Tools → Show Inspector. The /Author field is the usual offender: on Windows and macOS it is often filled in automatically from the OS account name when the PDF is first created or printed. Note the /Producer string too, because it fingerprints your exact toolchain (for example Microsoft: Print To PDF, Adobe PDF Library 17.0, or Skia/PDF m120).
- Step 2: Drop the PDF onto the sanitizer. Add your file to the tool above. It routes to the canonical PDF Metadata Scrubber, which processes the document with pdf-lib in your browser. There are no options to configure: the field set is fixed, so there is nothing to forget or misconfigure.
- Step 3: Let the single pass run. pdf-lib loads the document with ignoreEncryption: true, calls the setters for Title, Author, Subject, Keywords, Producer, and Creator with empty values, sets both dates to new Date(0), and re-saves with updateFieldAppearances: false. Pages are not re-rendered or re-encoded.
- Step 4: Download the sanitized PDF. Save the result. Page count and visible content are identical to the input; only the /Info dictionary entries change.
- Step 5: Verify in Document Properties. Re-open File → Properties. Author, Creator, Producer, Title, Subject, and Keywords now read blank. The two dates show as 1 January 1970 (the Unix epoch) rather than disappearing; a viewer that displays local time may show the same instant as 31 December 1969 in time zones west of UTC. That is expected, and the value is privacy-neutral because every scrubbed file shares it.
- Step 6: Chain the tools that cover the rest of the document. Metadata is one layer. For names still visible in the page text, use pdf-pii-redactor; for reviewer comments and sticky notes, use pdf-annotation-remover; to bake in form-field values, use pdf-flatten; and to confirm that a release build is byte-identical to the one you reviewed, fingerprint both with multi-hash-fingerprinter.
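The single pass in Step 3 can be sketched in TypeScript. This is a minimal sketch, not the tool's source: `InfoDictWriter` and `scrubInfoDict` are hypothetical names, and the interface only mirrors the pdf-lib `PDFDocument` setters named in Step 3 so the logic can run against a stand-in object. A document loaded with pdf-lib should satisfy the same shape.

```typescript
// Structural subset of pdf-lib's PDFDocument: only the setters the scrub needs.
// Hypothetical interface; a real PDFDocument exposes methods with these names.
interface InfoDictWriter {
  setTitle(title: string): void;
  setAuthor(author: string): void;
  setSubject(subject: string): void;
  setKeywords(keywords: string[]): void;
  setProducer(producer: string): void;
  setCreator(creator: string): void;
  setCreationDate(date: Date): void;
  setModificationDate(date: Date): void;
}

// One pass over the fixed field set: six text fields emptied,
// both timestamps set to the Unix epoch instead of being removed.
function scrubInfoDict(doc: InfoDictWriter): void {
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]); // empty list: no keywords
  doc.setProducer("");
  doc.setCreator("");
  const epoch = new Date(0); // 1970-01-01T00:00:00Z
  doc.setCreationDate(epoch);
  doc.setModificationDate(epoch);
}
```

With pdf-lib available, the surrounding calls would be along the lines of `const doc = await PDFDocument.load(bytes, { ignoreEncryption: true }); scrubInfoDict(doc); const out = await doc.save({ updateFieldAppearances: false });`. Keeping pdf-lib behind a small interface also makes the fixed field set easy to unit-test.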
Document-info fields after sanitizing
Every field below is handled in the single pdf-lib pass. The six text fields become empty strings; the two dates are reset to the Unix epoch rather than removed.
| Field | What it typically leaks | After sanitizing |
|---|---|---|
| /Author | The named drafter, usually the OS account name on the machine that made the PDF | Empty string
| /Creator | The authoring application (Microsoft Word, LibreOffice, Acrobat) | Empty string
| /Producer | The PDF library and version: a precise toolchain fingerprint | Empty string
| /Title | Internal working title, matter number, or codename | Empty string
| /Subject | Classification line or short internal brief | Empty string
| /Keywords | Project tags, client names, or routing labels | Empty (cleared to no keywords)
| /CreationDate | When the document was first created, which anchors the drafting timeline | Reset to 1970-01-01T00:00:00Z (epoch)
| /ModDate | Last save time, which reveals last-minute edits before release | Reset to 1970-01-01T00:00:00Z (epoch)
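To check the "After sanitizing" column programmatically rather than by eye, one option is to read the fields back and list any that still carry a value. A sketch under stated assumptions: the `InfoDictReader` interface mirrors pdf-lib's `getTitle()`, `getAuthor()`, and related getters, which return `undefined` when a key is absent, and `findLeakingFields` is a hypothetical helper.

```typescript
// Read-side subset of pdf-lib's PDFDocument getters (assumed shape).
interface InfoDictReader {
  getTitle(): string | undefined;
  getAuthor(): string | undefined;
  getSubject(): string | undefined;
  getKeywords(): string | undefined;
  getProducer(): string | undefined;
  getCreator(): string | undefined;
  getCreationDate(): Date | undefined;
  getModificationDate(): Date | undefined;
}

// Returns the names of fields that are NOT in the scrubbed state:
// text fields must be empty or absent; dates must be the epoch or absent.
function findLeakingFields(doc: InfoDictReader): string[] {
  const textFields: Array<[string, string | undefined]> = [
    ["Title", doc.getTitle()],
    ["Author", doc.getAuthor()],
    ["Subject", doc.getSubject()],
    ["Keywords", doc.getKeywords()],
    ["Producer", doc.getProducer()],
    ["Creator", doc.getCreator()],
  ];
  const dateFields: Array<[string, Date | undefined]> = [
    ["CreationDate", doc.getCreationDate()],
    ["ModDate", doc.getModificationDate()],
  ];
  const leaks: string[] = [];
  for (const [name, value] of textFields) {
    if (value !== undefined && value.trim() !== "") leaks.push(name);
  }
  for (const [name, value] of dateFields) {
    if (value !== undefined && value.getTime() !== 0) leaks.push(name);
  }
  return leaks;
}
```

An empty result means the document-info dictionary matches the table; it says nothing about XMP, page text, or annotations.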
What a metadata scrub does and does NOT cover
A document-info scrub is one layer of a real pre-release SOP. Each row this tool does not own points to the sibling tool that does.
| Risk in the document | Owned by this tool? | Tool to use |
|---|---|---|
| /Author /Creator /Producer in Document Properties | Yes | This tool (pdf-metadata-scrubber)
| /Title /Subject /Keywords naming internal matters | Yes | This tool
| Creation / modification timeline | Yes (reset to epoch) | This tool
| XMP metadata packet (dc:creator, xmp:CreateDate) | No (not rewritten) | Re-save via pdf-compress-lossless
| Names / emails still readable in page text | No | pdf-pii-redactor |
| Comments, sticky notes, reviewer markup | No | pdf-annotation-remover |
| Interactive form-field values | No | pdf-flatten |
| Embedded preview thumbnail that pre-dates a crop/edit | No | hidden-thumbnail-extractor to inspect |
File-size limits by tier (PDF input)
PDF input is file-based, so Security-family tier limits apply. The metadata scrubber itself processes one file per pass, whatever the tier allows.
| Tier | Max file size | Files per pass |
|---|---|---|
| Free | 10 MB | 1 |
| Pro | 100 MB | 5 (this tool processes one at a time) |
| Pro-media | 500 MB | 50 |
| Developer | 2 GB | Unlimited |
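The caps in the table can be turned into a pre-flight check that picks the smallest tier able to take a file. A sketch only: it assumes "MB" and "GB" mean binary units (MiB, GiB), which the table does not state, and `smallestTierFor` is a hypothetical helper rather than part of the product.

```typescript
type Tier = "Free" | "Pro" | "Pro-media" | "Developer";

// ASSUMPTION: limits are binary units (1 MB = 1024 * 1024 bytes, 1 GB = 1024 MB).
const MIB = 1024 * 1024;

// Caps from the table, ordered smallest to largest.
const TIER_CAPS: ReadonlyArray<[Tier, number]> = [
  ["Free", 10 * MIB],
  ["Pro", 100 * MIB],
  ["Pro-media", 500 * MIB],
  ["Developer", 2 * 1024 * MIB],
];

// Returns the smallest tier whose cap admits a file of this size, or null if none does.
function smallestTierFor(sizeBytes: number): Tier | null {
  for (const [tier, cap] of TIER_CAPS) {
    if (sizeBytes <= cap) return tier;
  }
  return null;
}
```

A `null` result is the cue to compress first with pdf-compress-lossless before trying again.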
Cookbook
Representative Document Properties before and after a sanitizing pass, plus the leaks a metadata-only scrub does not touch. Names and matter numbers are illustrative.
OS account name leaking through /Author on a settlement draft
The most common single leak: the drafting lawyer's machine wrote their Windows account name into /Author when Word exported the PDF. The opposing side opens Properties and learns who actually drafted the 'firm' document.
Before (File -> Properties):
Author: j.okafor
Creator: Microsoft Word for Microsoft 365
Producer: Microsoft: Print To PDF
Title: SETTLEMENT_v7_FINAL_use-this-one
Created: 2026-05-28 16:41:09
Modified: 2026-06-02 09:12:55
After sanitizing:
Author: (blank)
Creator: (blank)
Producer: (blank)
Title: (blank)
Created: 1970-01-01 00:00:00
Modified: 1970-01-01 00:00:00
Producer string fingerprinting the toolchain on a regulatory filing
Even with the author removed, /Producer betrays the exact software stack — useful to an adversary profiling your office. Sanitizing empties it so a published filing reveals nothing about how it was made.
Before:
Producer: Adobe PDF Library 17.0
Creator: Adobe InDesign 19.5 (Macintosh)
After:
Producer: (blank)
Creator: (blank)
Note: the /Producer and /Creator version pairing is a reliable fingerprint. Empty is the safe state for public release.
Dates reset to epoch, not deleted — and why that is fine
A frequent surprise: after scrubbing, Document Properties still shows a date — 1 January 1970. The tool sets both stamps to the Unix epoch rather than removing the keys. Because every sanitized file shares the same epoch value, it carries no information about your actual timeline.
Before:
CreationDate: D:20260528164109+01'00'
ModDate: D:20260602091255+01'00'
After:
CreationDate: D:19700101000000Z
ModDate: D:19700101000000Z
The drafting/revision window (28 May -> 02 Jun) is gone; all that remains is the privacy-neutral epoch constant.
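The date strings above use the PDF date syntax `D:YYYYMMDDHHmmSS` followed by a UTC offset. A minimal sketch of both directions, useful for reading a leaked timeline out of a file or confirming the epoch value: `toPdfDate` and `fromPdfDate` are hypothetical helpers, not pdf-lib's internal formatter, and the parser only accepts the full-length form.

```typescript
// Formats a Date as a PDF date string in UTC, e.g. D:19700101000000Z.
function toPdfDate(d: Date): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  return (
    "D:" +
    pad(d.getUTCFullYear(), 4) + pad(d.getUTCMonth() + 1) + pad(d.getUTCDate()) +
    pad(d.getUTCHours()) + pad(d.getUTCMinutes()) + pad(d.getUTCSeconds()) +
    "Z"
  );
}

// Parses D:YYYYMMDDHHmmSS followed by Z, +HH'mm' or -HH'mm'.
// Truncated forms that the PDF spec also allows are rejected (returns null).
function fromPdfDate(s: string): Date | null {
  const m = /^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:(Z)|([+-])(\d{2})'(\d{2})'?)?$/.exec(s);
  if (!m) return null;
  const [, y, mo, day, h, mi, se, z, sign, oh, om] = m;
  const asUtc = Date.UTC(+y, +mo - 1, +day, +h, +mi, +se);
  // With no offset the spec leaves the zone unknown; this sketch simply treats it as UTC.
  if (z || !sign) return new Date(asUtc);
  const offsetMinutes = (+oh * 60 + +om) * (sign === "+" ? 1 : -1);
  // Local time minus its offset gives UTC.
  return new Date(asUtc - offsetMinutes * 60000);
}
```

For instance, `fromPdfDate("D:20260528164109+01'00'")` resolves to 15:41:09 UTC on 28 May 2026, which is exactly the kind of timestamp the scrub replaces.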
What a metadata scrub leaves visible — the name still on the page
Critical for privacy work: emptying /Author does nothing to a signature block or a name typed into the body text. Those are page content, not metadata. Chain a redactor.
Document Properties after sanitizing: all clean.
But page 3 still reads:
"Prepared by Jane Okafor, Senior Associate"
"Contact: j.okafor@firm.example"
These survive a metadata scrub. Run /pdf-tools/pdf-pii-redactor to remove the name and email from the visible text stream.
Proving the published file is the reviewed file: confirm with a hash
After sanitizing, you may want proof that the version you publish is the version you reviewed. Fingerprint the file so any later byte change is detectable.
Workflow:
1. Sanitize report-2026Q2.pdf -> report-2026Q2.clean.pdf
2. Drop the clean file on
/security-tools/multi-hash-fingerprinter
SHA-256: 9f2c...b1e4
3. Record the digest in your release log.
If the published file's SHA-256 ever differs from the
recorded value, the file was altered after sign-off.
Edge cases and what actually happens
Dates show as 1970, not blank
Expected: The sanitizer sets CreationDate and ModDate to new Date(0), the Unix epoch (1970-01-01T00:00:00Z), written as D:19700101000000Z. It does not delete the date keys. This is by design and is privacy-safe: a fixed constant shared by every scrubbed file carries no information about your real timeline. If your downstream system requires the keys to be absent rather than epoch-valued, this tool does not provide that.
XMP metadata packet still names the author
Not covered: Modern PDFs carry a second metadata channel, an XMP packet with fields like dc:creator and xmp:CreateDate, in addition to the legacy /Info dictionary. This tool empties /Info but does not rewrite the XMP stream. Some viewers read XMP in preference to /Info, so an author name can survive there. To clear it, re-serialize the file through pdf-compress-lossless, which rebuilds the document structure.
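Before and after re-serializing, you can check whether an XMP packet is still present and what it names. This is a heuristic sketch under stated assumptions: it only finds packets stored uncompressed (common for XMP, but not guaranteed), it decodes bytes one-to-one so non-ASCII names may appear garbled, and `findXmpPackets` and `xmpCreators` are hypothetical names. Finding nothing is not proof that no XMP exists.

```typescript
// Maps each byte to one char code so ASCII markers can be searched with regexes.
function bytesToLatin1(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Returns every uncompressed XMP packet, from <?xpacket begin=...?> through <?xpacket end=...?>.
function findXmpPackets(bytes: Uint8Array): string[] {
  const text = bytesToLatin1(bytes);
  return text.match(/<\?xpacket begin=[\s\S]*?<\?xpacket end=[^>]*>/g) ?? [];
}

// Extracts the rdf:li entries inside the first dc:creator element of a packet.
function xmpCreators(packet: string): string[] {
  const block = /<dc:creator>([\s\S]*?)<\/dc:creator>/.exec(packet);
  if (!block) return [];
  const names: string[] = [];
  const li = /<rdf:li[^>]*>([\s\S]*?)<\/rdf:li>/g;
  let m: RegExpExecArray | null;
  while ((m = li.exec(block[1])) !== null) names.push(m[1].trim());
  return names;
}
```

If `xmpCreators` still returns a name after the /Info scrub, that is the leak this edge case describes.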
Encrypted / password-protected PDF
Loaded with ignoreEncryption: The scrubber loads with ignoreEncryption: true, so it can open many protected files to clear metadata, but it does not decrypt content for you and is not a password-removal tool. If the PDF requires a password to view its pages, remove protection first with pdf-remove-password (you must own and know the password), then sanitize the unlocked copy.
Annotations and comments survive
Not covered: Reviewer comments, sticky notes, and highlight markup are stored as page annotations, not as document metadata. A metadata scrub leaves every one of them in place, and they frequently name reviewers and quote internal discussion. Strip them with pdf-annotation-remover as a separate step before release.
Visible page text is unchanged
Not covered: Emptying the /Info dictionary changes nothing you can read on the page. Names, emails, SSNs, and account numbers printed in the body text are page content and remain fully visible and selectable. For those, use pdf-pii-redactor, which removes the underlying text rather than covering it.
Form-field values persist
Not covered: Interactive AcroForm field values (a name typed into a fillable field) are not metadata and are not touched. The save uses updateFieldAppearances: false, so field appearances are left as-is. To lock and bake in field values so they cannot be edited or read as form data, run pdf-flatten.
Embedded preview thumbnail predates a crop
Not covered: Some PDFs, and the images inside them, carry an embedded preview thumbnail that can reflect an earlier, uncropped or uncensored state. The metadata scrubber does not search for or strip these. Inspect a file for hidden previews with hidden-thumbnail-extractor before publishing sensitive imagery.
Digital signature is invalidated by the re-save
Likely broken: A PDF signature's /ByteRange covers every byte of the file as it stood at signing, except the signature value itself, so the /Info dictionary is normally inside the signed range. pdf-lib writes a complete new file rather than appending an incremental update, which changes both the metadata bytes and the byte offsets the signature depends on. Expect an existing signature to fail validation after scrubbing. Sanitize before signing wherever possible; if you must scrub a signed file, verify signature status afterwards and re-sign if a valid signature is required.
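A quick structural check shows whether a signature's byte range still reaches the end of the file, which a full rewrite or a later update usually disturbs. This sketch only locates uncompressed `/ByteRange [a b c d]` arrays and compares `c + d` with the file length; it does not verify any digest or certificate, so it cannot replace a signature-aware validator. `byteRangeReports` is a hypothetical helper.

```typescript
interface ByteRangeReport {
  range: [number, number, number, number];
  coversWholeFile: boolean; // true when the signed range runs from byte 0 to end-of-file
}

// Finds uncompressed /ByteRange arrays and reports whether each one reaches EOF.
// Heuristic only: in a multiply signed file only the last signature is expected to reach EOF,
// and a range packed into a compressed object stream will not be found at all.
function byteRangeReports(bytes: Uint8Array): ByteRangeReport[] {
  let text = "";
  for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i]);
  const re = /\/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]/g;
  const reports: ByteRangeReport[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const range: [number, number, number, number] = [+m[1], +m[2], +m[3], +m[4]];
    reports.push({ range, coversWholeFile: range[0] === 0 && range[2] + range[3] === bytes.length });
  }
  return reports;
}
```

A range that no longer ends at the file length, or one that can no longer be found after scrubbing, is a strong hint to re-verify the signature in a signature-aware viewer.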
File exceeds the tier size cap
Rejected: PDF input is file-based, so Security-family limits apply: Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. A scanned, image-heavy contract can exceed the Free cap quickly. Either upgrade the tier or reduce size first with pdf-compress-lossless, which also has the side benefit of re-serializing the XMP packet.
Corrupt or truncated PDF fails to load
Error: If pdf-lib cannot parse the file (a truncated download, a non-PDF renamed to .pdf, or a structurally broken document), the load throws and no output is produced. Confirm the file really is a valid PDF first; magic-byte-validator will tell you whether the bytes actually match the .pdf extension before you try to sanitize.
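A similar pre-check can be approximated locally before handing bytes to a parser. This is a sketch, not the magic-byte-validator tool: `looksLikePdf` is hypothetical, it accepts the `%PDF-` header anywhere in the first 1024 bytes (an assumption matching common reader tolerance), and it reports whether `%%EOF` appears in the last 1024 bytes, which a truncated download usually lacks.

```typescript
// Cheap structural sanity check. Passing does not prove the file will parse;
// failing strongly suggests it will not.
function looksLikePdf(bytes: Uint8Array): { header: boolean; eofMarker: boolean } {
  const slice = (from: number, to: number): string => {
    let s = "";
    for (let i = from; i < to; i++) s += String.fromCharCode(bytes[i]);
    return s;
  };
  const head = slice(0, Math.min(1024, bytes.length));
  const tail = slice(Math.max(0, bytes.length - 1024), bytes.length);
  return {
    header: head.indexOf("%PDF-") !== -1, // magic bytes of a PDF file
    eofMarker: tail.indexOf("%%EOF") !== -1, // end-of-file marker written after the last xref
  };
}
```

A ZIP or DOCX renamed to .pdf fails on `header`; an interrupted download typically passes `header` but fails `eofMarker`.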
Frequently asked questions
Exactly which metadata fields are cleared?
Eight fields in the document-information dictionary. Six text fields are emptied to blank strings: /Title, /Author, /Subject, /Keywords, /Producer, and /Creator. The two timestamps — /CreationDate and /ModDate — are reset to the Unix epoch (1970-01-01T00:00:00Z) rather than removed. That fixed eight-field set is processed on every run; there is nothing to configure.
Why do the dates show 1970 instead of being deleted?
The tool sets both date stamps to new Date(0), the Unix epoch. It is privacy-neutral on purpose: because every sanitized file shares the same 1970 value, the date no longer reveals anything about when your document was actually created or last edited. If you specifically need the date keys to be absent, this tool does not offer that — re-serializing through pdf-compress-lossless is the closest workaround.
Does this remove the edit history from the PDF?
It removes the metadata that exposes timeline and authorship — the dates and the /Author / /Creator / /Producer fields. It does not perform a forensic strip of incremental-update revision data; pdf-lib re-saves the document, which collapses the working structure, but treat this as a metadata sanitizer, not a guaranteed revision-history eraser. For maximum assurance, re-serialize through pdf-compress-lossless so the file is rebuilt from a single clean structure.
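One rough way to see whether a file carries incremental-update revisions is to count its `startxref` keywords, since each update appends its own cross-reference section, `startxref` pointer, and `%%EOF` marker. A heuristic sketch: `countStartxref` is a hypothetical helper, linearized files can show an extra pair near the start, and a count of 1 after re-saving shows the revision sections were collapsed, not that every stale object is gone.

```typescript
// Counts occurrences of the "startxref" keyword as a rough proxy for saved revisions.
// The keyword could in principle also occur inside uncompressed stream data.
function countStartxref(bytes: Uint8Array): number {
  let text = "";
  for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i]);
  const matches = text.match(/startxref/g);
  return matches ? matches.length : 0;
}
```

Comparing the count before and after sanitizing is more informative than either number alone.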
Will sanitizing break the PDF or change the pages?
No. Only the /Info dictionary entries change. Pages, text, images, embedded fonts, and bookmarks carry over unchanged: pdf-lib re-saves the document's objects rather than re-rendering or re-encoding page content. The page count of the output matches the input.
Does my file get uploaded to a server?
No. The processing runs entirely in your browser via pdf-lib. A sensitive draft, exhibit, or filing never leaves your device. That is the whole point of running it locally rather than through an online converter that you would have to trust with the document.
What about the XMP metadata — is that cleared too?
No, and this is important to get right. Modern PDFs store metadata twice: in the legacy /Info dictionary (which this tool empties) and in an XMP packet (dc:creator, xmp:CreateDate, and more) which this tool does not rewrite. Some viewers prefer XMP, so an author name can survive there. To clear the XMP packet, re-save the file through pdf-compress-lossless, which rebuilds the document and drops the stale packet.
Will names that are visible on the page be removed?
No. A signature block, a 'Prepared by' line, or an email address printed in the body is page content, not metadata — it stays fully visible and selectable after a metadata scrub. Use the PDF PII Redactor (/pdf-tools/pdf-pii-redactor) to remove the underlying text, or the Signature Burner (/security-tools/signature-burner) to pixel-redact a handwritten signature.
Are reviewer comments and annotations stripped?
No. Comments, sticky notes, and highlights are page annotations, not document metadata, so they survive a metadata scrub — and they often name reviewers and quote internal discussion. Remove them separately with the PDF Annotation Remover (/pdf-tools/pdf-annotation-remover) before the document goes out.
Can I sanitize an encrypted or password-protected PDF?
The scrubber loads with encryption ignored, so it can often open a protected file to clear its metadata, but it is not a decryption or password-removal tool. If the PDF needs a password just to view its pages, remove protection first with PDF Remove Password (/pdf-tools/pdf-remove-password) — for a file you own and whose password you know — then sanitize the unlocked copy.
Will this invalidate a digital signature?
In most cases, yes. A PDF signature covers the byte range of the whole file as it was signed (everything except the signature value), and that range normally includes the /Info dictionary. Because pdf-lib writes a new file rather than appending an incremental update, clearing metadata changes bytes and offsets inside the signed range, so an existing signature will typically fail validation. Where a valid signature is required, sanitize before signing, and verify the signature after any scrub.
How is this different from the PDF Metadata Scrubber in the PDF suite?
It is the same engine. The PDF History Sanitizer is the Security-suite entry point; it routes to the canonical PDF Metadata Scrubber (/pdf-tools/pdf-metadata-scrubber), which does the actual pdf-lib work. Use whichever surface fits your workflow — the field set, behaviour, and browser-local processing are identical.
How big a PDF can I sanitize?
PDF input is file-based, so Security-family tier limits apply: Free handles up to 10 MB and one file; Pro up to 100 MB; Pro-media up to 500 MB; Developer up to 2 GB. Scanned, image-heavy documents hit these caps fastest. If you are over the limit, compress first with pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) or move up a tier.
What is the complete pre-publication scrub checklist?
Five steps for a genuinely clean release. (1) Sanitize metadata with this tool. (2) Re-serialize via pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) to clear the XMP packet. (3) Redact visible PII with pdf-pii-redactor (/pdf-tools/pdf-pii-redactor). (4) Strip comments with pdf-annotation-remover (/pdf-tools/pdf-annotation-remover) and flatten forms with pdf-flatten (/pdf-tools/pdf-flatten). (5) Fingerprint the final file with multi-hash-fingerprinter (/security-tools/multi-hash-fingerprinter) and log the digest so any later change is detectable. Because any tool that re-saves a PDF may write its own /Producer or /ModDate, re-open Document Properties after steps 2 to 4 and repeat step 1 if anything has reappeared before you fingerprint.
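Step (5) can also be reproduced offline, for example in a release script, to cross-check the digest you logged. A sketch assuming Node.js and its built-in `node:crypto` module; `sha256Hex` and `matchesLoggedDigest` are hypothetical helpers, not part of multi-hash-fingerprinter.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the exact bytes, as lowercase hex.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True only if the candidate file is byte-for-byte the file whose digest was logged.
function matchesLoggedDigest(bytes: Uint8Array, loggedHex: string): boolean {
  return sha256Hex(bytes) === loggedHex.trim().toLowerCase();
}
```

To use it on a real file, read it with `readFileSync` from `node:fs` and pass the result directly; a Node `Buffer` is a `Uint8Array`, so it fits both signatures.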
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.