Anonymise a PDF by Clearing All Document Metadata

How to anonymise a PDF by clearing its document metadata

Step 1
Redact identifying content first — Black out participant names, IDs, emails, and addresses on the page with pdf-pii-redactor. Content de-identification is the part ethics committees and GDPR actually scrutinise; metadata is secondary.
Step 2
Remove comments and markup — Run pdf-annotation-remover to strip reviewer notes, which carry annotator names the metadata scrubber cannot reach.
Step 3
Flatten interactive fields — If the document has form fields (a survey, a consent form), flatten them with pdf-flatten so values become static content and any incremental layers are collapsed.
Step 4
Drop the prepared PDF onto the scrubber — Add the file here. It loads locally with pdf-lib and the scrub runs automatically — no options to set. All eight document-info fields are cleared.
Step 5
Drop the XMP packet — Because this tool does not rewrite XMP, re-save through pdf-compress-lossless to rebuild the document and remove any residual XMP author or date. Then download.
Step 6
Audit the final file — Run exiftool -G1 -a -s final.pdf and confirm there is no Author, Creator, or real date in either the PDF or XMP groups before you share the dataset.
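
The Step 6 audit can be partly automated. The sketch below is a minimal example, assuming the text captured from exiftool -G1 -a -s uses a "[Group] Tag : value" line layout and prints dates as YYYY:MM:DD. The helper name findLeaks and the list of watched tags are illustrative choices, not part of any tool named here.

```typescript
// Hypothetical helper: scan text captured from `exiftool -G1 -a -s final.pdf`
// and report tags that may still identify the source.
interface Leak {
  group: string;
  tag: string;
  value: string;
}

// Tags treated as identifying in this sketch (an assumption; extend as needed).
const WATCHED_NAME_TAGS: readonly string[] = ["Author", "Creator"];

function findLeaks(exiftoolOutput: string): Leak[] {
  const leaks: Leak[] = [];
  // Assumed line layout: "[PDF]   Author   : Dr Researcher"
  const linePattern = /^\[([^\]]+)\]\s+(\S+)\s*:\s?(.*)$/;
  for (const raw of exiftoolOutput.split(/\r?\n/)) {
    const m = linePattern.exec(raw);
    if (m === null) continue;
    const group = m[1];
    const tag = m[2];
    const value = m[3];
    // Only the PDF info dictionary and the XMP groups matter for this audit;
    // file-system groups are skipped.
    if (group !== "PDF" && group.indexOf("XMP") !== 0) continue;
    const isName = WATCHED_NAME_TAGS.indexOf(tag) !== -1 && value.trim() !== "";
    // Any date that is not the Unix epoch is treated as a real timestamp.
    const isRealDate = /Date$/.test(tag) && value.indexOf("1970:01:01") !== 0;
    if (isName || isRealDate) leaks.push({ group, tag, value });
  }
  return leaks;
}
```

An empty result is only a hint that the two metadata layers are clean. It says nothing about visible content, comments, or form fields.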

The anonymisation layers — and which tool owns each

Metadata scrubbing is one layer. A document that is anonymous in metadata but not in content is not anonymous. This is the full chain.

Identifying layer	Owned by this tool?	Tool
Document-info Author / Creator / Producer	Yes	This tool (pdf-metadata-scrubber)
Document-info Title / Subject / Keywords	Yes	This tool
Creation / modification timestamps	Yes (reset to epoch)	This tool
Names / IDs / emails in visible text	No	pdf-pii-redactor
Comments / annotation authors	No	pdf-annotation-remover
Form field values / incremental layers	No	pdf-flatten
XMP packet (dc:creator, dates)	No	Re-save via pdf-compress-lossless

What the metadata step clears

The single pass applied to the document-information dictionary during anonymisation.

Field	Identifying risk	After scrubbing
`/Author`	Names the researcher / preparer	Empty
`/Creator`	Authoring app or template owner	Empty
`/Producer`	Toolchain fingerprint	Empty
`/Title`	Often a participant ID or study codename	Empty
`/Subject` + `/Keywords`	Study tags, classification	Cleared
`/CreationDate` + `/ModDate`	Re-identifiable timing	Reset to 1970-01-01T00:00:00Z
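
As a rough illustration of the table above, the pass could look like the sketch below. It is not the scrubber's actual source. The InfoWritable interface is a hypothetical name that lists only the setters the sketch needs; pdf-lib's PDFDocument exposes setters with these names, so a loaded document could be passed in directly.

```typescript
// The subset of setters this sketch relies on (hypothetical interface name).
interface InfoWritable {
  setTitle(title: string): void;
  setAuthor(author: string): void;
  setSubject(subject: string): void;
  setKeywords(keywords: string[]): void;
  setCreator(creator: string): void;
  setProducer(producer: string): void;
  setCreationDate(date: Date): void;
  setModificationDate(date: Date): void;
}

// 1970-01-01T00:00:00Z
const UNIX_EPOCH = new Date(0);

// Illustrative only: empties the six text fields and resets both dates.
function scrubInfoDictionary(doc: InfoWritable): void {
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]);
  doc.setCreator("");
  doc.setProducer("");
  doc.setCreationDate(UNIX_EPOCH);
  doc.setModificationDate(UNIX_EPOCH);
}
```

With pdf-lib, a call like this would sit between PDFDocument.load(bytes) and doc.save(). The XMP packet is untouched by these setters in this sketch, which matches the limitation described elsewhere on this page.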

Cookbook

Anonymisation workflows for real data-sharing situations. The metadata-scrubber step is shown in its proper place within the chain.

Research dataset PDF — full de-identification chain

A consent form or case report shared with collaborators must be anonymous in both content and metadata. Metadata scrubbing comes after content redaction and form flattening.

1. pdf-pii-redactor        -> redact participant name, DOB, ID
2. pdf-annotation-remover  -> remove reviewer notes
3. pdf-flatten             -> flatten consent-form fields
4. pdf-metadata-scrubber   -> clear Author/Title/dates  (this tool)
5. pdf-compress-lossless   -> drop XMP, finalise
6. exiftool -G1 -a -s final.pdf  -> audit

Title field leaked a participant ID

The PDF's Title was 'Subject-0427-interview' — a re-identifier on its own. The scrubber empties Title along with Author and dates.

Before (Acrobat → Description):
  Title:    Subject-0427-interview
  Author:   Dr Researcher
  Created:  2026-03-18 14:02

After scrubbing:
  Title:    (empty)
  Author:   (empty)
  Created:  1970-01-01 00:00 UTC

Metadata clean, but the name is still on the page

The most dangerous false sense of security: Document Properties is blank, but the participant's name is printed in the body. Metadata scrubbing is not content redaction.

Author (metadata):  (empty after scrub)  ✓
Page 1 body:        'Interview with Jane D., 42'  ✗ STILL VISIBLE

Fix: pdf-pii-redactor must run BEFORE you call the doc anonymous.
Metadata scrub alone does not de-identify content.

Annotator name survives in a comment

A coder's initials are attached to a margin comment. The metadata scrub leaves it; the annotation remover clears it.

Metadata: clean  ✓
Comment:  'coded as theme 3 - RP'  ✗

Fix: run pdf-annotation-remover before the metadata scrub.

GDPR data-minimisation: nothing leaves your device

Because the scrub is browser-local, the personal data in the source PDF never transits a server during anonymisation — supporting a data-minimisation posture for the processing step.

Processing model:
  file -> browser (pdf-lib) -> scrubbed file
  No upload of document content.

Only an anonymous run counter is recorded for signed-in users
(opt-out in account settings).

Edge cases and what actually happens

Visible PII still on the page after metadata scrub

Not anonymised

Clearing metadata does nothing to text or images you can see. A document with blank Document Properties but a participant name on page 1 is NOT anonymous. Redact content with pdf-pii-redactor before treating the file as de-identified; skipping that step is the most common anonymisation mistake.

Annotation author names remain

Out of scope

Reviewer or coder names attached to comments live in the annotation layer, not the metadata. Remove them with pdf-annotation-remover as part of the chain.

XMP author/date survives

XMP not rewritten

The tool clears the document-info dictionary but not the XMP packet. A residual dc:creator or real xmp:CreateDate can re-identify the source. Re-save through pdf-compress-lossless to drop the XMP, then audit with ExifTool.
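
A quick pre-check for a leftover XMP packet can be sketched as a byte scan for the usual XMP markers. This is a heuristic under a stated assumption: it only finds packets stored uncompressed and unencrypted, so ExifTool remains the authoritative audit. The function names are hypothetical.

```typescript
// Returns true if the ASCII string `needle` occurs anywhere in `bytes`.
function containsAscii(bytes: Uint8Array, needle: string): boolean {
  for (let i = 0; i + needle.length <= bytes.length; i++) {
    let match = true;
    for (let j = 0; j < needle.length; j++) {
      if (bytes[i + j] !== needle.charCodeAt(j)) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

// Heuristic: looks for the XMP wrapper element or the xpacket header.
// A false result does not prove the file has no XMP (see assumption above).
function looksLikeXmpPresent(pdfBytes: Uint8Array): boolean {
  return (
    containsAscii(pdfBytes, "<x:xmpmeta") ||
    containsAscii(pdfBytes, "<?xpacket begin")
  );
}
```

A true result after the metadata scrub is a prompt to run the lossless re-save before sharing.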

Form field values reveal identity

Not flattened

A filled consent form or survey can carry the respondent's entries in interactive fields. Flatten with pdf-flatten so the values become static page content, then redact any that are visible.

Incremental-update history retains earlier content

May persist

PDFs saved incrementally can keep prior, pre-redaction content layers. A plain metadata scrub does not remove them. Flatten or re-save through pdf-compress-lossless to rebuild the file and drop the history before sharing.
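
One rough way to spot a file that may carry incremental-update history is to count its %%EOF markers, since each incremental save appends a new one. This is only a hint: linearised files legitimately contain two markers, and the same bytes can occur inside stream data. The function names below are hypothetical.

```typescript
// Counts non-overlapping occurrences of the ASCII marker "%%EOF".
function countEofMarkers(pdfBytes: Uint8Array): number {
  const marker = [0x25, 0x25, 0x45, 0x4f, 0x46]; // "%%EOF"
  let count = 0;
  for (let i = 0; i + marker.length <= pdfBytes.length; i++) {
    let match = true;
    for (let j = 0; j < marker.length; j++) {
      if (pdfBytes[i + j] !== marker[j]) {
        match = false;
        break;
      }
    }
    if (match) {
      count++;
      i += marker.length - 1; // skip past this marker
    }
  }
  return count;
}

// Heuristic: more than one marker suggests the file was saved incrementally.
function mayHaveIncrementalHistory(pdfBytes: Uint8Array): boolean {
  return countEofMarkers(pdfBytes) > 1;
}
```

If the check returns true, rebuilding the file as described above is the safer path before sharing.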

Dates show 1970-01-01 rather than blank

Expected

The two date fields are reset to the Unix epoch, not deleted, so a viewer shows 01/01/1970. The real, potentially re-identifying timestamp is gone — the epoch value is the intended output.
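
For reference, PDF date strings use the D:YYYYMMDDHHmmSS form followed by a timezone designator. The sketch below formats a date that way in UTC to show what the epoch value looks like inside the file. Whether the scrubber serialises dates with exactly this layout is an assumption, and toPdfDateString is a hypothetical helper.

```typescript
// Left-pads a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  return ("0000" + String(n)).slice(-width);
}

// Formats a Date as a PDF date string in UTC, e.g. D:19700101000000Z.
function toPdfDateString(date: Date): string {
  return (
    "D:" +
    zeroPad(date.getUTCFullYear(), 4) +
    zeroPad(date.getUTCMonth() + 1, 2) +
    zeroPad(date.getUTCDate(), 2) +
    zeroPad(date.getUTCHours(), 2) +
    zeroPad(date.getUTCMinutes(), 2) +
    zeroPad(date.getUTCSeconds(), 2) +
    "Z"
  );
}

// The Unix epoch, the value both date fields are reset to.
const EPOCH_PDF_DATE = toPdfDateString(new Date(0)); // "D:19700101000000Z"
```

A viewer renders that value in its own locale format, which is why the properties dialog shows 01/01/1970 instead of an empty field.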

File over the free 2 MB / 50-page limit

Blocked

Free handles files up to 2 MB and 50 pages; Pro up to 50 MB and 500 pages; Pro+Media up to 500 MB and 2,000 pages. Large research scans may exceed the Free limits, in which case the tool shows an upgrade prompt and does not process the file.
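
The tier limits can be expressed as a small lookup. This sketch assumes that 1 MB means 1,048,576 bytes, that the limits are inclusive, and that a file must satisfy both the size and the page limit of a tier. The source states none of these, and the names are hypothetical.

```typescript
interface TierLimit {
  name: string;
  maxBytes: number;
  maxPages: number;
}

// Assumption: MB is taken as 2^20 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Limits as listed on this page, ordered from smallest to largest tier.
const TIER_LIMITS: TierLimit[] = [
  { name: "Free", maxBytes: 2 * BYTES_PER_MB, maxPages: 50 },
  { name: "Pro", maxBytes: 50 * BYTES_PER_MB, maxPages: 500 },
  { name: "Pro+Media", maxBytes: 500 * BYTES_PER_MB, maxPages: 2000 },
];

// Returns the smallest tier that accepts the file, or null if none does.
function smallestTierFor(sizeBytes: number, pageCount: number): string | null {
  for (const tier of TIER_LIMITS) {
    if (sizeBytes <= tier.maxBytes && pageCount <= tier.maxPages) {
      return tier.name;
    }
  }
  return null;
}
```

For example, a 1 MB scan of 60 pages fits the Free size limit but not its page limit, so under these assumptions it needs Pro.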

Document is digitally signed

Signature breaks

Anonymising re-saves the file, which invalidates any existing signature. A signature can itself name the signer, so for anonymisation an invalidated signature is usually acceptable; verify the resulting file with pdf-signature-verify if signature state matters.

Frequently asked questions

Does scrubbing metadata make a PDF fully anonymous?

No. It anonymises the document-information metadata layer (Author, Creator, Producer, Title, Subject, Keywords, and the dates). Real anonymisation also requires redacting visible content (pdf-pii-redactor), removing comments (pdf-annotation-remover), flattening form fields (pdf-flatten), and dropping the XMP packet (a lossless re-save). This tool owns one verified layer of that chain.

What's the correct order for anonymising a research PDF?

Redact content → remove comments → flatten fields → scrub metadata (this tool) → re-save losslessly to drop XMP → audit with ExifTool. Metadata is near the end because earlier steps re-save the file and could otherwise reintroduce metadata.
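
If the chain is scripted, the ordering rule can be checked mechanically. The sketch below is one possible check, assuming steps may be skipped but never run out of order, and that tools outside the chain (such as the ExifTool audit) are ignored. The names are hypothetical.

```typescript
// The tool order described above, from first to last.
const ANONYMISATION_CHAIN: readonly string[] = [
  "pdf-pii-redactor",
  "pdf-annotation-remover",
  "pdf-flatten",
  "pdf-metadata-scrubber",
  "pdf-compress-lossless",
];

// Returns true if the listed steps never go backwards in the chain.
function followsChainOrder(stepsRun: readonly string[]): boolean {
  let lastIndex = -1;
  for (const step of stepsRun) {
    const index = ANONYMISATION_CHAIN.indexOf(step);
    if (index === -1) continue; // not a chain step, ignore it
    if (index < lastIndex) return false; // an earlier step ran after a later one
    lastIndex = index;
  }
  return true;
}
```

A run that redacts content after the metadata scrub fails this check, which reflects the concern that a later re-save could reintroduce metadata.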

Does it read or redact the visible text on the page?

No. The scrubber only touches the hidden document-info fields. Any name, ID, or email printed on the page survives — redact those with pdf-pii-redactor before calling the document anonymous.

What about revision history embedded in the PDF?

Linearised/rebuilt PDFs don't retain history, but incremental-update files can keep earlier content layers. A metadata scrub doesn't collapse them. For the most thorough result, flatten with pdf-flatten before scrubbing or re-save through pdf-compress-lossless afterwards so the file is rebuilt without its history.

Should I flatten before scrubbing?

Yes, if the document has form fields or you suspect incremental layers. Flatten with pdf-flatten first to turn field values into static content and collapse the file, then scrub the metadata.

Does this support GDPR data-minimisation?

The processing step does: the scrub runs in your browser via pdf-lib, so the document's personal data never transits a server. Note the tool doesn't track consent or legal basis — those remain your responsibility — but it gives you a no-upload way to strip the metadata fingerprint.

Are the dates removed or reset?

Reset. Both /CreationDate and /ModDate are set to the Unix epoch (1970-01-01T00:00:00Z), so a viewer shows that date rather than a blank. The original, potentially re-identifying timestamp is unrecoverable from the info dictionary.

Is the file uploaded anywhere?

No. Everything runs locally in your browser. The document and its metadata never leave your device; only an anonymous run counter is recorded for signed-in users, which you can opt out of.

Can I anonymise a whole folder of documents at once?

This tool is single-file in the browser. For a batch, pair it with @jadapps/runner and POST each file to 127.0.0.1:9789/v1/tools/pdf-metadata-scrubber/run (no options needed). Processing stays on your machine, which suits sensitive datasets.
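
A batch run could start from a plan of one POST per file, as in the sketch below. Only the host, port, and path come from the text above. The http scheme, sending the raw file bytes as the request body, and the output naming are assumptions, so check the runner's own documentation for the real request format. All names here are hypothetical.

```typescript
// Endpoint from the text above; the http:// scheme is an assumption.
const RUNNER_ENDPOINT =
  "http://127.0.0.1:9789/v1/tools/pdf-metadata-scrubber/run";

interface InputFile {
  name: string;
  bytes: Uint8Array;
}

interface PlannedRequest {
  url: string;
  method: "POST";
  body: Uint8Array; // assumption: raw PDF bytes as the request body
  outputName: string; // hypothetical naming scheme for the scrubbed copy
}

// Builds one request description per PDF; non-PDF names are skipped.
function planBatch(files: InputFile[]): PlannedRequest[] {
  const plan: PlannedRequest[] = [];
  for (const file of files) {
    if (!/\.pdf$/i.test(file.name)) continue;
    plan.push({
      url: RUNNER_ENDPOINT,
      method: "POST",
      body: file.bytes,
      outputName: file.name.replace(/\.pdf$/i, ".scrubbed.pdf"),
    });
  }
  return plan;
}
```

Each planned request would then be sent one at a time with whatever HTTP client the script uses. Keeping the plan separate from the sending makes it easy to review which files will be touched before anything runs.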

Will the document look different after anonymising the metadata?

No — metadata is invisible to readers. Only the hidden fields and date stamps change. Visible content is unchanged, which is exactly why content redaction is a separate, essential step.

What's the largest PDF I can anonymise?

Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro+Media: 500 MB / 2,000 pages. The metadata operation is fast; for image-heavy scans you may need a higher tier or to compress first with pdf-compress-lossy.

Does it break a digital signature?

Yes. Re-saving invalidates an existing signature. Because a signature can itself name the signer, that is normally acceptable for anonymisation; if you need to confirm signature state, check the result with pdf-signature-verify.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only usage counters are saved for signed-in dashboard stats.
