How to scrub metadata from a PDF before public release
- Step 1: Redact visible content first. Metadata scrubbing comes near the end, not first. Black out names, emails, and sensitive text on the page with pdf-pii-redactor before you scrub the metadata; redaction changes content and is what regulators actually inspect.
- Step 2: Remove comments and markup. Run pdf-annotation-remover to strip sticky notes, highlights, and reviewer names. These carry author identities the metadata scrubber does not reach.
- Step 3: Flatten form fields, if any. If the document has interactive fields, flatten them with pdf-flatten so field values become static content and can't be edited or leak default values.
- Step 4: Drop the prepared PDF onto the scrubber. Now add the file here. It loads locally with pdf-lib and the scrub runs automatically; there is no options panel to set. All eight document-info fields are cleared.
- Step 5: Drop XMP by re-saving losslessly. Because this tool does not rewrite the XMP packet, finish by re-saving through pdf-compress-lossless, which rebuilds the document and removes the stale XMP metadata. Then download.
- Step 6: Audit before you publish. Open the final file in Acrobat (File → Properties → Description and Custom tabs) or run exiftool final.pdf. Confirm Author/Creator/Producer/Title are blank, dates read 1970-01-01, and no XMP author/date remains.
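The ordering of these steps can be checked mechanically if you keep a log of which tools were run on a file. The following is a minimal sketch in Python, not part of any of the tools: `RELEASE_ORDER`, `OPTIONAL`, and `check_order` are hypothetical names, and treating only pdf-flatten as optional is an assumption based on the "if any" wording in Step 3.

```python
# Hypothetical helper: verify that logged steps follow the release order above.
RELEASE_ORDER = [
    "pdf-pii-redactor",
    "pdf-annotation-remover",
    "pdf-flatten",
    "pdf-metadata-scrubber",
    "pdf-compress-lossless",
]

# Assumption: only flattening is skippable (documents without form fields).
OPTIONAL = {"pdf-flatten"}


def check_order(steps_run):
    """Return a list of problems; an empty list means the chain looks right."""
    problems = []
    unknown = [s for s in steps_run if s not in RELEASE_ORDER]
    if unknown:
        problems.append(f"unknown steps: {unknown}")
    known = [s for s in steps_run if s in RELEASE_ORDER]
    positions = [RELEASE_ORDER.index(s) for s in known]
    if positions != sorted(positions):
        problems.append("steps ran out of order")
    missing = [s for s in RELEASE_ORDER if s not in known and s not in OPTIONAL]
    if missing:
        problems.append(f"missing required steps: {missing}")
    return problems
```

Calling `check_order` with the list of tool names in the order they were run returns an empty list when the chain matches; the final audit in Step 6 is a manual check and is deliberately left out of the list.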
The pre-publication checklist — and which tool owns each item
A metadata scrub alone does not make a document safe to publish. This is the full chain; the metadata scrubber owns one row.
| Risk before release | Owned by this tool? | Tool to use |
|---|---|---|
| Author / Creator / Producer in Document Properties | Yes | This tool (pdf-metadata-scrubber) |
| Title / Subject / Keywords naming internal projects | Yes | This tool |
| Creation / modification timeline | Yes (reset to epoch) | This tool |
| XMP metadata packet (dc:creator, xmp:CreateDate) | No | Re-save via pdf-compress-lossless |
| Names/emails in visible page text | No | pdf-pii-redactor |
| Comments, sticky notes, reviewer markup | No | pdf-annotation-remover |
| Interactive form field values | No | pdf-flatten |
Document-info fields after a public-release scrub
Every field below is processed in the single pass; text fields are emptied and dates are reset.
| Field | What it typically leaks | After scrubbing |
|---|---|---|
| /Author | The official who drafted the document | Empty |
| /Creator | Authoring app / template owner | Empty |
| /Producer | PDF library and version (toolchain fingerprint) | Empty |
| /Title | Internal working title or codename | Empty |
| /Subject | Classification line or internal brief text | Empty |
| /Keywords | Project tags | Cleared |
| /CreationDate + /ModDate | Drafting and last-edit timeline | Reset to 1970-01-01T00:00:00Z |
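The effect of the table can be modelled on a plain dictionary. This is an illustrative sketch only: the tool itself runs in the browser with pdf-lib, and `scrub_info` is a hypothetical name, not the tool's code. The epoch is written in PDF date syntax (`D:YYYYMMDDHHmmSS` followed by `Z` for UTC), and the time of day in the sample "before" date is made up for the example.

```python
# Model of the document-info dictionary as a plain dict; NOT the tool's code.
TEXT_FIELDS = ("Title", "Author", "Subject", "Keywords", "Producer", "Creator")
DATE_FIELDS = ("CreationDate", "ModDate")
EPOCH_PDF_DATE = "D:19700101000000Z"  # 1970-01-01T00:00:00Z in PDF date syntax


def scrub_info(info):
    """Return a copy with text fields emptied and dates reset to the epoch."""
    cleaned = dict(info)
    for key in TEXT_FIELDS:
        cleaned[key] = ""
    for key in DATE_FIELDS:
        cleaned[key] = EPOCH_PDF_DATE
    return cleaned


before = {
    "Author": "Dr A. Researcher",
    "Creator": "LabReport-Template-v4",
    "Title": "Q2-internal-draft",
    "CreationDate": "D:20260409000000Z",  # illustrative time component
}
after = scrub_info(before)
```

In this model all eight fields end up present in `after`, with the six text fields empty and both dates set to the epoch, while `before` is left unchanged.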
Cookbook
Practical release workflows. The metadata-scrubber step is shown in context with the redaction, annotation, and XMP steps that surround it.
FOI response — full release chain
A document released under freedom-of-information law must hide both the visible exemptions and the hidden metadata. Metadata scrubbing comes after redaction and annotation removal, and is followed only by the lossless re-save and the final audit.
1. pdf-pii-redactor -> black out exempt names/addresses
2. pdf-annotation-remover -> remove caseworker comments
3. pdf-metadata-scrubber -> clear Author/Creator/dates (this tool)
4. pdf-compress-lossless -> drop XMP packet, finalise
5. exiftool final.pdf -> audit: all fields blank?
Before/after Document Properties for a published report
A research report's pre-release metadata named the lead author and the lab's template. After scrubbing, Document Properties is empty.
Before (Acrobat → Description):
  Author: Dr A. Researcher
  Creator: LabReport-Template-v4
  Producer: Microsoft: Print To PDF
  Title: Q2-internal-draft
  Created: 2026-04-09
After scrubbing:
  Author/Creator/Producer/Title: (all empty)
  Created/Modified: 1970-01-01 UTC
The trap: scrubbed metadata but the dossier author is in XMP
The classic public-release mistake. The info-dictionary Author is blank, but the XMP packet still carries dc:creator with the real name — which is exactly what investigators read first.
After metadata-scrubber only:
  Info dict Author: (empty) ✓
  XMP dc:creator: 'A. Civil Servant' ✗ STILL THERE
Fix: re-save via pdf-compress-lossless to drop XMP, then confirm with:
  exiftool -XMP-dc:Creator final.pdf
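If ExifTool is not to hand, a rough check for a surviving XMP author can be done on the raw bytes. The sketch below is a heuristic under stated assumptions: the XMP packet is stored uncompressed and in UTF-8 (common, but not guaranteed), and `dc:creator` uses the usual `rdf:li` element form with the `dc` and `rdf` prefixes. `xmp_creators` is a hypothetical helper name.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=[^>]*\?>", re.DOTALL)
CREATOR = re.compile(rb"<dc:creator>.*?</dc:creator>", re.DOTALL)
LI_TEXT = re.compile(rb"<rdf:li[^>]*>(.*?)</rdf:li>", re.DOTALL)


def xmp_creators(pdf_bytes):
    """Return dc:creator names found in any uncompressed XMP packet."""
    names = []
    for packet in XPACKET.finditer(pdf_bytes):
        for block in CREATOR.finditer(packet.group(0)):
            for item in LI_TEXT.finditer(block.group(0)):
                names.append(item.group(1).decode("utf-8", "replace").strip())
    return names
```

An empty result from this scan does not prove the file is clean, since a compressed or differently encoded packet would be missed; the ExifTool check remains the sign-off.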
Consultation paper with reviewer comments
Internal review comments were left on the draft. Metadata scrubbing won't remove them — they carry reviewer names and must be stripped separately before publication.
Metadata after scrub: clean ✓
Sticky notes: 'Legal: soften para 12 - JR' ✗
Fix order: pdf-annotation-remover -> pdf-metadata-scrubber
Never publish before removing the markup layer.
Final audit command
One ExifTool command surfaces both the info-dictionary and XMP metadata so you can sign off the release.
$ exiftool -G1 -a -s final.pdf | grep -Ei 'author|creator|date|title|subject'
[PDF] Author: (blank)
[PDF] Creator: (blank)
[PDF] CreateDate: 1970:01:01 00:00:00Z
[XMP-dc] (no Creator line) <- good, XMP also clean
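For a repeatable sign-off, the same audit can be run on ExifTool's JSON output (`exiftool -j -G1 final.pdf`), where each key has the form `Group:Tag`. The sketch below takes that JSON as a string and returns anything that still looks like a leak. The tag lists and the `audit` function are assumptions for illustration, not an exhaustive rule set.

```python
import json

# Tag names (lower-cased) that should be blank after a full scrub.
SENSITIVE = ("author", "creator", "creatortool", "producer",
             "title", "subject", "keywords")
# Date tags that should read as the Unix epoch.
DATE_TAGS = ("createdate", "modifydate", "metadatadate")


def audit(exiftool_json):
    """Return leftover metadata from `exiftool -j -G1` output (a JSON string)."""
    record = json.loads(exiftool_json)[0]
    leftovers = {}
    for key, value in record.items():
        group, _, tag = key.partition(":")
        if group != "PDF" and not group.startswith("XMP"):
            continue  # skip SourceFile and the System/File/ExifTool groups
        name = tag.lower()
        text = str(value).strip()
        if name in SENSITIVE and text:
            leftovers[key] = value
        elif name in DATE_TAGS and not text.startswith("1970:01:01"):
            leftovers[key] = value
    return leftovers
```

An empty dictionary means neither the info dictionary nor any XMP group reported a non-blank author-type field or a non-epoch date.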
Edge cases and what actually happens
XMP author/date survives the scrub
XMP not rewritten. This is the single most important caveat for public release: the tool clears the classic document-info dictionary but does not rewrite the XMP metadata packet. A name in XMP dc:creator or a real xmp:CreateDate will survive, and these are exactly what journalists and analysts inspect. Finish the chain by re-saving through pdf-compress-lossless.
Reviewer names left in comments
Out of scope. Annotations carry their own author names and are not part of the metadata scrub. Always run pdf-annotation-remover before publishing a document that went through internal review.
Sensitive text still visible on the page
Not redacted. Scrubbing metadata does nothing to text or images on the page. Names, addresses, and exempt content must be redacted with pdf-pii-redactor. True redaction must remove the underlying content, not just draw a black box over it; that is a separate concern this tool does not address.
Form fields hold default values
Not flattened. Interactive form fields can leak default values and field names. Flatten them with pdf-flatten before scrubbing so the values become static, non-editable content.
Dates display as 1970-01-01, not blank
Expected. The two date fields are reset to the Unix epoch rather than deleted, so Document Properties shows 01/01/1970. The real timeline is gone; the epoch value is the intended output.
Incremental-update history embedded in the file
May persist. PDFs saved with incremental updates can retain earlier content layers. A plain metadata scrub does not collapse them. Re-saving through pdf-compress-lossless or pdf-flatten rebuilds the document and drops the historical layers before release.
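A quick way to spot a file that may carry incremental-update history is to count its end-of-file markers, since each incremental save appends a new cross-reference section ending in `%%EOF`. This is a heuristic sketch only: linearized PDFs can also contain two markers, so more than one marker is a prompt to re-save, not proof of hidden history. The function names are hypothetical.

```python
def count_eof_markers(pdf_bytes):
    """Count %%EOF markers; each incremental save appends another one."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_updated(pdf_bytes):
    """Heuristic flag: more than one %%EOF marker in the file."""
    # Linearized ("fast web view") files can trip this check as well.
    return count_eof_markers(pdf_bytes) > 1
```

Running this on the bytes of the file before and after the lossless re-save gives a rough indication of whether the rebuild collapsed the earlier revisions.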
File over the tier size or page limit
Blocked. Free handles 2 MB / 50 pages; Pro 50 MB / 500 pages; Pro+Media 500 MB / 2,000 pages. A large publication-ready PDF may exceed the Free limits; the tool blocks before processing and shows an upgrade prompt.
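The tier limits can be pre-checked before you queue a batch of files. A minimal sketch, assuming 1 MB means 1,000,000 bytes (the page does not say whether decimal or binary megabytes are used); `smallest_tier` is a hypothetical helper, not part of the tool.

```python
# Limits as stated above: (max megabytes, max pages) per tier, smallest first.
TIER_LIMITS = {
    "Free": (2, 50),
    "Pro": (50, 500),
    "Pro+Media": (500, 2000),
}
MB = 1_000_000  # assumption: decimal megabytes


def smallest_tier(size_bytes, pages):
    """Return the first tier whose size and page limits both fit, else None."""
    for tier, (max_mb, max_pages) in TIER_LIMITS.items():
        if size_bytes <= max_mb * MB and pages <= max_pages:
            return tier
    return None
```

Both limits must be satisfied, so a small file with many pages can still need a higher tier.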
Document is digitally signed
Signature breaks. Scrubbing re-saves the file and invalidates any existing signature. For a public release you usually want to scrub first and (if required) re-sign the final clean version. Verify with pdf-signature-verify.
PDF/A archival copy
Conflicts with PDF/A. PDF/A requires certain metadata to be present and consistent. A scrubbed copy is for distribution, not archiving. Keep an unscrubbed PDF/A master if you also need to archive; see the remove-software-info guide for the PDF/A interaction.
Frequently asked questions
Does this tool remove ALL metadata before I publish?
It removes all of the classic document-information dictionary: Title, Author, Subject, Keywords, Producer, Creator, and the two dates (reset to epoch). It does NOT rewrite the XMP metadata packet. For a true pre-publication scrub, follow up by re-saving through pdf-compress-lossless to drop the XMP, then audit with ExifTool.
What's the correct order of steps before public release?
Redact visible content (pdf-pii-redactor) → remove comments (pdf-annotation-remover) → flatten forms (pdf-flatten) → scrub metadata (this tool) → drop XMP and finalise (pdf-compress-lossless) → audit with ExifTool. Metadata is near the end because earlier steps re-save the file.
How do government and FOI teams verify a clean release?
Open Acrobat's File → Properties → Description and Custom tabs and confirm everything is blank, then run exiftool -G1 -a -s file.pdf to catch any residual XMP author or date. The Custom tab and XMP are where forensic checks find leaks.
Why might the author name still appear after scrubbing?
Because it's in the XMP packet (which this tool doesn't rewrite), in a comment, or printed on the page. The famous public-release leaks were XMP and annotation leaks, not info-dictionary ones — so always finish with the lossless re-save and the annotation removal step.
Does scrubbing change the published document's appearance?
No. Metadata is invisible to readers. The pages look identical; only the hidden document-info fields and the date stamps change.
Is the document uploaded to a server?
No. The scrub runs in your browser with pdf-lib. A sensitive pre-release document never leaves your device. Only an anonymous run counter is recorded for signed-in users.
Does it remove tracked changes or revision history?
Not directly. Tracked-change names usually live in annotations (use pdf-annotation-remover); incremental-update history is collapsed by re-saving through pdf-compress-lossless or pdf-flatten.
Can I publish a scrubbed file as PDF/A?
No — PDF/A requires certain metadata to be present, so a scrubbed copy is for distribution rather than archiving. Keep a separate PDF/A master if you need to archive the document long-term.
What metadata does a published research PDF typically need stripped?
Author, Creator (template/app), Producer, Title (often a working draft name), and the dates — all handled here — plus the XMP equivalents, which need the lossless re-save. Subject and Keywords sometimes carry internal classification text and are cleared too.
What's the largest document I can scrub before release?
Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro+Media: 500 MB / 2,000 pages. Large publication PDFs may need Pro; the metadata operation itself is fast.
Can I make this part of an automated publishing pipeline?
Yes. Pair the @jadapps/runner and POST files to 127.0.0.1:9789/v1/tools/pdf-metadata-scrubber/run (the tool takes no options). Chain it with the redactor and compressor endpoints for a repeatable release pipeline that runs entirely on your own machine.
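As a starting point for scripting, the sketch below builds (but does not send) a POST request to the endpoint named above, using only the Python standard library. Several details are assumptions, because the request format is not documented here: the `http://` scheme, sending the raw PDF bytes as the body, and the `application/pdf` content type. The real runner may expect multipart form data or other headers, so check its documentation before relying on this.

```python
import urllib.request

# Endpoint from the text above; the http:// scheme is an assumption.
RUN_URL = "http://127.0.0.1:9789/v1/tools/pdf-metadata-scrubber/run"


def build_scrub_request(pdf_bytes):
    """Build (but do not send) a POST to the local runner endpoint."""
    return urllib.request.Request(
        RUN_URL,
        data=pdf_bytes,
        method="POST",
        # Assumption: raw PDF bytes in the body; the API may expect multipart.
        headers={"Content-Type": "application/pdf"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to the runner on your own machine; how the scrubbed file comes back in the response is likewise not specified here.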
Does the tool guarantee anonymity of the document?
No single tool can. Metadata scrubbing removes the document-info fingerprint, but anonymity also depends on redacted content, removed annotations, dropped XMP, and even writing style. Treat this as one verified, deterministic layer in a broader checklist.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.