Sanitize a PDF Before a FOIA or Public-Records Release

How to clean PDF metadata before a FOIA / public-records disclosure

Step 1
Read the document properties of the responsive record first. Before the file goes anywhere near the public record, open it in Acrobat (File → Properties → Description) or in macOS Preview (Tools → Show Inspector). The /Author field often carries the analyst's OS or network account name, and /Producer identifies the PDF software the agency used. Check /Title as well: internal tracking numbers are routinely parked there.
Step 2
Drop the responsive PDF onto the sanitizer. Add the file to the tool above; it routes to the canonical PDF Metadata Scrubber, which processes the document with pdf-lib in your browser. There are no options to configure. The field set is fixed, so a records team cannot accidentally skip a field.
Step 3
Run the single metadata pass. pdf-lib loads the document with ignoreEncryption: true, empties Title, Author, Subject, Keywords, Producer, and Creator, sets both dates to new Date(0), and re-saves with updateFieldAppearances: false. Pages are not re-rendered or re-encoded.
Step 4
Confirm exemption redactions are already on the page. Metadata scrubbing is independent of substantive redaction: a metadata pass never touches the visible page content, so names, addresses, or other exempt material in the page text must already have been removed with pdf-pii-redactor. If they have not, redact first and run the metadata pass on the redacted output.
Step 5
Verify the cleaned properties. Re-open the document properties. Author, Creator, Producer, Title, Subject, and Keywords now read blank, and the two dates show the Unix epoch (1 January 1970, 00:00 UTC) rather than vanishing. Viewers display dates in local time, so in time zones west of UTC the same value may appear as 31 December 1969. That fixed constant is privacy-neutral because every sanitized file shares it.
Step 6
Fingerprint the release version. Record proof that the posted file is the file you cleared: drop the result on multi-hash-fingerprinter and log its SHA-256 in the release record, so any later alteration to the posted PDF is detectable.
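Steps 3 and 5 can be modelled without touching a real PDF. The sketch below is a minimal TypeScript model, assuming a plain object (the hypothetical `InfoDict`) stands in for the /Info dictionary: `scrubInfo` mirrors the single pass and `leakingFields` mirrors the Step 5 check. It is not the tool's source. In pdf-lib the corresponding `PDFDocument` methods are `setTitle('')`, `setAuthor('')`, `setSubject('')`, `setKeywords([])`, `setProducer('')`, `setCreator('')`, `setCreationDate(new Date(0))` and `setModificationDate(new Date(0))`; the model leaves pdf-lib out so it runs on its own.

```typescript
// Hypothetical stand-in for the PDF document-information (/Info) dictionary.
// Keys mirror the PDF names; this is a model, not pdf-lib's object.
interface InfoDict {
  Title?: string;
  Author?: string;
  Subject?: string;
  Keywords?: string;
  Producer?: string;
  Creator?: string;
  CreationDate?: Date;
  ModDate?: Date;
}

const TEXT_FIELDS = ["Title", "Author", "Subject", "Keywords", "Producer", "Creator"] as const;
const DATE_FIELDS = ["CreationDate", "ModDate"] as const;
const EPOCH_MS = 0; // new Date(0) is 1970-01-01T00:00:00Z

// Step 3 model: blank the six text fields and pin both dates to the epoch.
// Returns a copy so the "before" record stays available for comparison.
function scrubInfo(info: InfoDict): InfoDict {
  const out: InfoDict = { ...info };
  for (const key of TEXT_FIELDS) out[key] = "";
  for (const key of DATE_FIELDS) out[key] = new Date(EPOCH_MS);
  return out;
}

// Step 5 model: report every field that still carries information.
// A missing key counts as clean; an epoch date counts as clean.
function leakingFields(info: InfoDict): string[] {
  const leaks: string[] = [];
  for (const key of TEXT_FIELDS) {
    if ((info[key] ?? "") !== "") leaks.push(key);
  }
  for (const key of DATE_FIELDS) {
    const value = info[key];
    if (value !== undefined && value.getTime() !== EPOCH_MS) leaks.push(key);
  }
  return leaks;
}
```

Fed the Author, Creator, Title, and creation date from the memo in the cookbook, `leakingFields` flags all four before the pass and returns an empty list after it.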

Document-info fields after sanitizing a responsive record

All eight fields are handled in the single pdf-lib pass. The six text fields become empty strings; the two dates reset to the Unix epoch rather than being removed.

Field	What it leaks in a public-records context	After sanitizing
`/Author`	The analyst or staffer who drafted the record — usually their network account name	Empty string
`/Creator`	The authoring application used inside the agency	Empty string
`/Producer`	The exact PDF library and version — an agency toolchain fingerprint	Empty string
`/Title`	Internal matter, case, or request-tracking number	Empty string
`/Subject`	Classification or routing note added during review	Empty string
`/Keywords`	Internal tags, program names, or distribution labels	Empty (cleared to no keywords)
`/CreationDate`	When the record was first drafted — anchors the pre-release timeline	Reset to 1970-01-01T00:00:00Z (epoch)
`/ModDate`	Last save before release — reveals last-minute edits	Reset to 1970-01-01T00:00:00Z (epoch)

What a metadata scrub covers vs. the rest of a disclosure

Metadata is one layer of a defensible release. Exempt page content, reviewer comments, the XMP packet, and Office source files are handled by other tools.

Risk in the released file	Owned by this tool?	Tool to use
`/Author` `/Creator` `/Producer` in Document Properties	Yes	This tool (pdf-metadata-scrubber)
`/Title` `/Subject` `/Keywords` with internal tracking data	Yes	This tool
Drafting / review timeline (the two date stamps)	Yes — reset to epoch	This tool
Exempt names / addresses still readable on the page	No	pdf-pii-redactor
Reviewer comments and sticky notes about the request	No	pdf-annotation-remover
XMP metadata packet (`dc:creator`, `xmp:CreateDate`)	No — not rewritten	Re-save via pdf-compress-lossless
An Office source doc released alongside the PDF	No	office-doc-property-wiper

File-size limits by tier (PDF input)

PDF input is file-based, so Security-family tier limits apply. The metadata scrubber handles one file at a time; on tiers that accept several files per pass, they are processed sequentially.

Tier	Max file size	Files per pass
Free	10 MB	1
Pro	100 MB	5 (processed one at a time)
Pro-media	500 MB	50
Developer	2 GB	Unlimited
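A pre-check against the table can save a failed run on a large scanned packet. This is a hypothetical client-side sketch: the limits are transcribed from the table, and reading MB and GB as binary units (MiB, GiB) is an assumption, since the page does not say which it means. Developer's "Unlimited" is modelled as Infinity.

```typescript
type Tier = "Free" | "Pro" | "Pro-media" | "Developer";

// Assumption: "MB" and "GB" in the table are binary units (MiB, GiB).
const MIB = 1024 * 1024;

// Limits transcribed from the tier table; "Unlimited" is modelled as Infinity.
const LIMITS: Record<Tier, { maxBytes: number; maxFiles: number }> = {
  Free: { maxBytes: 10 * MIB, maxFiles: 1 },
  Pro: { maxBytes: 100 * MIB, maxFiles: 5 },
  "Pro-media": { maxBytes: 500 * MIB, maxFiles: 50 },
  Developer: { maxBytes: 2048 * MIB, maxFiles: Infinity },
};

// Hypothetical pre-check: returns a rejection reason, or null if the batch fits.
function checkTier(tier: Tier, fileSizes: number[]): string | null {
  const { maxBytes, maxFiles } = LIMITS[tier];
  if (fileSizes.length > maxFiles) {
    return `${tier} accepts ${maxFiles} file(s) per pass; got ${fileSizes.length}`;
  }
  for (let i = 0; i < fileSizes.length; i++) {
    if (fileSizes[i] > maxBytes) {
      return `file ${i + 1} is ${fileSizes[i]} bytes, over the ${tier} cap of ${maxBytes} bytes`;
    }
  }
  return null;
}
```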

Cookbook

Representative Document Properties before and after a sanitizing pass for a public-records release, plus the leaks a metadata-only scrub does not touch. Names and tracking numbers are illustrative.

Analyst account name leaking through /Author on a responsive memo

A staffer exported a memo to PDF and Word wrote their agency network account into /Author. A requester opens Properties and learns exactly who drafted the released record — information that was never part of the request.

Before (File -> Properties):
  Author:   a.santos
  Creator:  Microsoft Word for Microsoft 365
  Producer: Microsoft: Print To PDF
  Title:    REQ-2026-0481_response_DRAFT
  Created:  2026-05-19 14:02:33
  Modified: 2026-06-04 10:48:17

After sanitizing:
  Author:   (blank)
  Creator:  (blank)
  Producer: (blank)
  Title:    (blank)
  Created:  1970-01-01 00:00:00
  Modified: 1970-01-01 00:00:00

Internal tracking number parked in /Title

Records teams often store the request-tracking number in the document title. Released as-is, it maps the file to internal case-management data. Sanitizing empties it.

Before:
  Title:    REQ-2026-0481_response_DRAFT
  Subject:  FOR REVIEW - legal cleared 06/03

After:
  Title:    (blank)
  Subject:  (blank)

The tracking number and the internal review note
are gone from Document Properties.

The pre-release timeline collapses to the epoch

The two date stamps show that the record was drafted on 19 May and edited again as late as 4 June, shortly before release. After sanitizing, both show 1 January 1970, a constant shared by every cleaned file.

Before:
  CreationDate: D:20260519140233-04'00'
  ModDate:      D:20260604104817-04'00'

After:
  CreationDate: D:19700101000000Z
  ModDate:      D:19700101000000Z

The 19 May -> 04 Jun drafting/edit window is no longer
readable from the file.
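Parsing the Before stamps makes the leak concrete. The sketch below is a minimal TypeScript parser, assuming the full `D:YYYYMMDDHHmmSS` form followed by `Z` or an offset such as `-04'00'`, as in the example above; the shorter forms the PDF date syntax also permits are rejected rather than guessed at. `parsePdfDate` is a hypothetical helper, not part of any tool on this page.

```typescript
// Hypothetical parser for the full PDF date form D:YYYYMMDDHHmmSS followed by
// "Z" or an offset such as -04'00'. Shorter forms the syntax allows are rejected.
// Assumption: a stamp with no zone at all is treated as UTC.
function parsePdfDate(s: string): Date {
  const m = /^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(Z|[+-]\d{2}'\d{2}'?)?$/.exec(s);
  if (m === null) throw new Error(`unsupported PDF date: ${s}`);
  const [, year, month, day, hour, minute, second, zone] = m;

  // Treat the wall-clock fields as UTC first...
  let ms = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);

  // ...then undo the stated offset: 14:02 at -04'00' is 18:02 UTC.
  if (zone !== undefined && zone !== "Z") {
    const sign = zone[0] === "-" ? -1 : 1;
    const offsetMinutes = Number(zone.slice(1, 3)) * 60 + Number(zone.slice(4, 6));
    ms -= sign * offsetMinutes * 60000;
  }
  return new Date(ms);
}
```

Applied to the Before values, it yields 18:02:33 UTC on 19 May and 14:48:17 UTC on 4 June, a drafting-to-last-edit window of just under 16 days that anyone can read from the unsanitized file.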

Metadata is clean but an exempt name is still on the page

Critical: emptying /Author does nothing to an exempt name printed in the body. That is page content, not metadata, and it remains fully visible and selectable. Redact it before running the metadata pass.

Document Properties after sanitizing: all clean.

But page 2 still reads:
  "Complainant: Maria Alvarez, 14 Birch Lane"

This survives a metadata scrub. Run
/pdf-tools/pdf-pii-redactor to remove the exempt
detail from the visible text stream.
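Because a clean Properties panel says nothing about the page, a last QA pass over the extracted text is a cheap safety net. The sketch below is hypothetical: it assumes you already have one string of extracted text per page from whatever extractor you use, and it does plain case-insensitive substring matching. It will miss a name broken across lines or hyphenated, and text that exists only as pixels in a scanned image has nothing to match, so a hit means "still present" while an empty result is not proof of a clean page.

```typescript
interface ResidualHit {
  page: number; // 1-based page number
  term: string; // the exempt term as supplied
}

// Hypothetical QA helper. pageTexts[i] is the already-extracted text of page i + 1.
// Matching is plain, case-insensitive substring search.
function findResidualTerms(pageTexts: string[], exemptTerms: string[]): ResidualHit[] {
  const hits: ResidualHit[] = [];
  pageTexts.forEach((text, i) => {
    const haystack = text.toLowerCase();
    for (const term of exemptTerms) {
      if (haystack.indexOf(term.toLowerCase()) !== -1) {
        hits.push({ page: i + 1, term });
      }
    }
  });
  return hits;
}
```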

Logging the SHA-256 of the posted release

After sanitizing, fingerprint the file so the agency can prove the posted version matches the version that was cleared for disclosure.

Workflow:
  1. Sanitize REQ-2026-0481.pdf -> REQ-2026-0481.clean.pdf
  2. Drop the clean file on
     /security-tools/multi-hash-fingerprinter
     SHA-256: 4ad9...c70b
  3. Record the digest in the release log.

If the public copy's SHA-256 later differs, the file
was altered after it was cleared.
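The digest itself is standard SHA-256, so any independent tool can reproduce the logged value. A minimal sketch, assuming Node.js and its built-in `node:crypto` module (in a browser, `crypto.subtle.digest('SHA-256', bytes)` is the asynchronous equivalent); `sha256Hex` and `matchesReleaseLog` are hypothetical names. Hash the exact bytes that were posted, because re-saving the PDF in any tool changes them.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the exact bytes that were posted, as lowercase hex.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True if the posted copy still matches the digest logged at sign-off.
// Case and surrounding whitespace in the logged value are ignored.
function matchesReleaseLog(postedBytes: Uint8Array, loggedHex: string): boolean {
  return sha256Hex(postedBytes) === loggedHex.trim().toLowerCase();
}
```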

Edge cases and what actually happens

Dates show 1970, not blank

Expected

The sanitizer sets CreationDate and ModDate to new Date(0) — the Unix epoch (1970-01-01T00:00:00Z), written as D:19700101000000Z. It does not delete the date keys. This is by design and privacy-safe: a fixed constant shared by every released file carries no information about the real drafting timeline.
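The `D:19700101000000Z` string is simply the epoch written in PDF date syntax, with every component taken in UTC and a trailing `Z` in place of a local offset. A minimal formatter, assuming UTC output is wanted; the hypothetical `toPdfDateUtc` is illustrative, not the tool's code.

```typescript
// Hypothetical formatter: writes a Date in PDF date syntax, in UTC, with a
// trailing "Z" (no local offset), i.e. D:YYYYMMDDHHmmSSZ.
function toPdfDateUtc(date: Date): string {
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  return (
    "D:" +
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) +
    pad(date.getUTCDate(), 2) +
    pad(date.getUTCHours(), 2) +
    pad(date.getUTCMinutes(), 2) +
    pad(date.getUTCSeconds(), 2) +
    "Z"
  );
}
```

`toPdfDateUtc(new Date(0))` returns `D:19700101000000Z`, the value shown in the After block of the cookbook.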

Exempt material on the page is unchanged

Not covered

A metadata scrub never touches the visible page stream. Names, addresses, and other exempt details printed in the body remain fully visible and selectable. Substantive redaction is a separate step — use pdf-pii-redactor to remove the underlying text rather than relying on the metadata pass.

XMP packet still names the author

Not covered

Modern PDFs carry a second metadata channel — an XMP packet with fields like dc:creator and xmp:CreateDate — alongside the legacy /Info dictionary. This tool empties /Info but does not rewrite the XMP stream, and some viewers prefer XMP. To clear it, re-serialize through pdf-compress-lossless, which rebuilds the document structure.
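Before and after the re-serialization step, you can spot-check whether an XMP packet still names an author. The sketch below is a heuristic, not an XMP parser: it assumes the packet sits in an uncompressed stream, which is common for XMP but not guaranteed, so a packet inside a compressed stream goes undetected. `xmpStillNamesCreator` is a hypothetical helper.

```typescript
// Hypothetical heuristic: does an uncompressed XMP packet still name a creator?
// A packet inside a compressed (e.g. FlateDecode) stream is not detected.
function xmpStillNamesCreator(pdfBytes: Uint8Array): boolean {
  // Latin-1 decode: one character per byte, so ASCII XML markers survive intact.
  let text = "";
  for (let i = 0; i < pdfBytes.length; i++) text += String.fromCharCode(pdfBytes[i]);

  const start = text.indexOf("<x:xmpmeta");
  if (start === -1) return false;
  const end = text.indexOf("</x:xmpmeta>", start);
  const packet = end === -1 ? text.slice(start) : text.slice(start, end);

  // dc:creator holds an rdf:Seq of names; any non-blank rdf:li is a leak.
  const creator = /<dc:creator>([\s\S]*?)<\/dc:creator>/.exec(packet);
  if (creator === null) return false;
  return /<rdf:li[^>]*>\s*[^<\s][^<]*<\/rdf:li>/.test(creator[1]);
}
```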

Reviewer comments about the request survive

Not covered

Comments and sticky notes added during legal or exemption review are stored as page annotations, not metadata, so a metadata scrub leaves every one of them in place — and they often quote internal deliberation. Strip them with pdf-annotation-remover before posting.

Encrypted / password-protected record

Loaded with ignoreEncryption

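The scrubber loads with ignoreEncryption: true, so it can often open a protected file to clear its metadata, but it does not decrypt content and is not a password-removal tool. If the PDF needs a password just to view its pages, remove protection first with pdf-remove-password (only for a file the agency owns), then sanitize the unlocked copy. pdf-lib does not support encrypted documents, so if you do sanitize a protected file directly, open the output once before posting to confirm it still renders.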
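To know in advance whether a record is protected at all, a rough byte-level check is enough for triage. An encrypted PDF points to its encryption dictionary through an `/Encrypt` entry in the trailer or cross-reference stream dictionary, and those dictionaries are stored as plain, uncompressed syntax. The hypothetical `mayBeEncrypted` below only searches for that token, so treat a hit as "inspect this file", not as proof.

```typescript
// Hypothetical triage check: does the file reference an encryption dictionary?
// "/Encrypt" must be followed by whitespace or "<" (an indirect reference or an
// inline dictionary), which excludes the unrelated /EncryptMetadata key.
// A byte search can still misfire, so a hit means "inspect this file".
function mayBeEncrypted(pdfBytes: Uint8Array): boolean {
  let text = "";
  for (let i = 0; i < pdfBytes.length; i++) text += String.fromCharCode(pdfBytes[i]);
  return /\/Encrypt[\s<]/.test(text);
}
```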

Office source document released alongside the PDF

Not covered

If a request also calls for the native Word or Excel source, this PDF tool does nothing to that file's own metadata. Office documents carry their own author and revision properties — wipe them with office-doc-property-wiper before disclosure.

Embedded preview thumbnail predates a redaction

Not covered

Some PDFs and embedded images carry a preview thumbnail that can reflect an earlier, un-redacted state. The metadata scrubber does not search for or strip these. Inspect a file for hidden previews with hidden-thumbnail-extractor before posting sensitive imagery.

File exceeds the tier size cap

Rejected

PDF input is file-based, so Security-family limits apply: Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. A scanned, image-heavy responsive packet can blow past the Free cap quickly. Upgrade the tier or reduce size first with pdf-compress-lossless, which also re-serializes the XMP packet.

Corrupt or truncated PDF fails to load

Error

If pdf-lib cannot parse the file (a truncated scan, a non-PDF renamed to .pdf, or a structurally broken document), the load throws and no output is produced. Check the file with magic-byte-validator before trying to sanitize: a PDF signature rules out a renamed non-PDF, though it does not prove the rest of the file is intact.
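A quick pre-flight check catches the two most common failures before pdf-lib is invoked. A minimal sketch with hypothetical names: it looks for the `%PDF-` signature in the first 1024 bytes and for an `%%EOF` marker in the last 1024 bytes, mirroring the tolerance Acrobat has historically applied to both markers; a missing `%%EOF` suggests truncation. Passing both checks does not prove the file will parse, and this is not how magic-byte-validator is implemented.

```typescript
interface PreflightResult {
  hasSignature: boolean; // "%PDF-" found in the first 1024 bytes
  hasEof: boolean; // "%%EOF" found in the last 1024 bytes
}

// Hypothetical pre-flight check; passing it does not prove the file will parse.
function preflightPdf(bytes: Uint8Array): PreflightResult {
  // Decode a byte range as Latin-1 so each byte maps to exactly one character.
  const latin1 = (from: number, to: number): string => {
    let s = "";
    for (let i = Math.max(0, from); i < Math.min(bytes.length, to); i++) {
      s += String.fromCharCode(bytes[i]);
    }
    return s;
  };
  const head = latin1(0, 1024);
  const tail = latin1(bytes.length - 1024, bytes.length);
  return {
    hasSignature: head.indexOf("%PDF-") !== -1,
    hasEof: tail.indexOf("%%EOF") !== -1,
  };
}
```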

Frequently asked questions

Which metadata fields does the sanitizer clear?

Eight fields in the document-information dictionary. Six text fields are emptied to blank strings: /Title, /Author, /Subject, /Keywords, /Producer, and /Creator. The two timestamps — /CreationDate and /ModDate — are reset to the Unix epoch (1970-01-01T00:00:00Z). That fixed eight-field set is processed on every run; there is nothing to configure.

Is metadata scrubbing the same as redacting exemptions?

No, and conflating them is a serious mistake in records work. This tool only clears the metadata layer in Document Properties. Exempt names, addresses, or other material visible on the page are page content and remain fully readable after a metadata pass. Substantive redaction is a separate step with the PDF PII Redactor (/pdf-tools/pdf-pii-redactor).

Why do the dates show 1970 instead of disappearing?

The tool sets both date stamps to new Date(0), the Unix epoch. It is privacy-neutral on purpose: because every sanitized file shares the same 1970 value, the date no longer reveals when the record was actually drafted or last edited before release. If you specifically need the date keys absent, this tool does not offer that.

Does the responsive file get uploaded anywhere?

No. Processing runs entirely in your browser via pdf-lib. A record being prepared for disclosure never leaves the workstation — which is the whole reason to run it locally rather than through an online converter you would have to trust with pre-release material.

Will sanitizing change the released pages?

No. Only the /Info dictionary entries are deliberately changed. The document is re-saved, not re-rendered: pdf-lib writes a new file structure, so the bytes differ, but pages, text, images, embedded fonts, and bookmarks are carried over as they were. The released file reads identically to the version you reviewed, and the page count matches the input.

Is the XMP metadata cleared too?

No. PDFs store metadata twice: in the legacy /Info dictionary (which this tool empties) and in an XMP packet (dc:creator, xmp:CreateDate) which this tool does not rewrite. Some viewers prefer XMP, so an author name can survive there. Re-serialize through pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) to drop the stale packet.

Are reviewer comments and exemption notes removed?

No. Comments and sticky notes are page annotations, not metadata, so they survive a metadata scrub — and they frequently quote internal deliberation about the request. Remove them separately with the PDF Annotation Remover (/pdf-tools/pdf-annotation-remover) before the record is posted.

Can I sanitize an encrypted record?

The scrubber loads with encryption ignored, so it can often open a protected file to clear its metadata, but it is not a decryption or password-removal tool. If the PDF needs a password to view its pages, remove protection first with PDF Remove Password (/pdf-tools/pdf-remove-password) on a file the agency owns, then sanitize the unlocked copy.

What about a native Office source released with the PDF?

This tool only handles PDFs. If the request also covers the Word or Excel source, that file carries its own author and revision properties — wipe them with the Office Doc Property Wiper (/security-tools/office-doc-property-wiper) before disclosure.

How do I prove the posted file matches what I cleared?

Fingerprint it. After sanitizing, drop the file on multi-hash-fingerprinter (/security-tools/multi-hash-fingerprinter) and log the SHA-256 in the release record. If the public copy's digest ever differs from the logged value, the file was altered after sign-off.

How big a PDF can I sanitize?

PDF input is file-based, so Security-family tier limits apply: Free handles up to 10 MB and one file; Pro up to 100 MB; Pro-media up to 500 MB; Developer up to 2 GB. Scanned responsive packets hit these caps fastest. If you are over the limit, compress first with pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) or move up a tier.

What is a defensible pre-release checklist?

Five steps. (1) Redact exempt page content with pdf-pii-redactor (/pdf-tools/pdf-pii-redactor). (2) Strip reviewer comments with pdf-annotation-remover (/pdf-tools/pdf-annotation-remover). (3) Sanitize metadata with this tool. (4) Re-serialize via pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) to clear the XMP packet. (5) Fingerprint the final file with multi-hash-fingerprinter (/security-tools/multi-hash-fingerprinter) and log the digest in the release record.

Privacy first

Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.

