How to clean PDF metadata before a FOIA / public-records disclosure
- Step 1: Read the document properties of the responsive record first. Open the file in Acrobat or Preview and check File → Properties → Description before it goes near the public record. The /Author field usually carries the analyst's OS or network account name, and /Producer fingerprints the exact agency software stack. Note /Title too: internal tracking numbers are routinely parked there.
- Step 2: Drop the responsive PDF onto the sanitizer. Add the file to the tool above. It routes to the canonical PDF Metadata Scrubber, which processes the document with pdf-lib in your browser. There are no options to configure; the field set is fixed, so a records team cannot accidentally leave a field selected.
- Step 3: Run the single metadata pass. pdf-lib loads the document with ignoreEncryption: true, calls the empty-string setters for Title, Author, Subject, Keywords, Producer, and Creator, sets both dates to new Date(0), and re-saves with updateFieldAppearances: false. Pages are not re-rendered or re-encoded.
- Step 4: Confirm exemption redactions are already on the page. Metadata scrubbing is independent of substantive redaction. Names, addresses, or other exempt material visible in the page text must be removed with pdf-pii-redactor first, because a metadata pass never touches the visible page stream.
- Step 5: Verify the cleaned properties. Re-open File → Properties. Author, Creator, Producer, Title, Subject, and Keywords now read blank, and the two dates show 1 January 1970 (the epoch) rather than vanishing. That fixed constant is privacy-neutral because every sanitized file shares it.
- Step 6: Fingerprint the release version for the file. Record proof that the posted file is the file you cleared. Drop the result on multi-hash-fingerprinter and log the SHA-256 in the release record so any later alteration to the posted PDF is detectable.
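The metadata pass in Step 3 can be sketched roughly as follows. This is an illustration assembled from the calls named in that step, not the tool's actual source. The function names `clearInfoFields` and `scrubPdf` are hypothetical, and the pdf-lib `PDFDocument` class is passed in as a parameter so the snippet carries no import of its own.

```javascript
// Sketch of the single metadata pass described in Step 3.
// Assumption: `doc` is a pdf-lib PDFDocument (or any object with the same setters).
function clearInfoFields(doc) {
  // Six text fields become empty; pdf-lib's setKeywords takes an array.
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]);
  doc.setProducer("");
  doc.setCreator("");
  // Both dates are reset to the Unix epoch rather than removed.
  const epoch = new Date(0);
  doc.setCreationDate(epoch);
  doc.setModificationDate(epoch);
  return doc;
}

// Assumption: `PDFDocument` is the class exported by pdf-lib, supplied by the caller.
async function scrubPdf(PDFDocument, inputBytes) {
  const doc = await PDFDocument.load(inputBytes, { ignoreEncryption: true });
  clearInfoFields(doc);
  // Re-save without regenerating form-field appearances; pages are not re-rendered.
  return doc.save({ updateFieldAppearances: false });
}
```

On a page that already loads the pdf-lib browser bundle, a call such as `scrubPdf(PDFLib.PDFDocument, bytes)` would resolve to the cleaned bytes as a `Uint8Array`.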
Document-info fields after sanitizing a responsive record
All eight fields are handled in the single pdf-lib pass. The six text fields become empty strings; the two dates reset to the Unix epoch rather than being removed.
| Field | What it leaks in a public-records context | After sanitizing |
|---|---|---|
| /Author | The analyst or staffer who drafted the record, usually their network account name | Empty string |
| /Creator | The authoring application used inside the agency | Empty string |
| /Producer | The exact PDF library and version, an agency toolchain fingerprint | Empty string |
| /Title | Internal matter, case, or request-tracking number | Empty string |
| /Subject | Classification or routing note added during review | Empty string |
| /Keywords | Internal tags, program names, or distribution labels | Empty (cleared to no keywords) |
| /CreationDate | When the record was first drafted; anchors the pre-release timeline | Reset to 1970-01-01T00:00:00Z (epoch) |
| /ModDate | Last save before release; reveals last-minute edits | Reset to 1970-01-01T00:00:00Z (epoch) |
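The expected end state in this table can also be checked in code rather than by eye. The following is a minimal sketch, assuming the cleaned file has been loaded with pdf-lib; `listResidualInfo` is a hypothetical helper name. When loading a file only to inspect it, passing pdf-lib's `updateMetadata: false` load option is advisable, because by default the library stamps its own producer and modification date on load.

```javascript
// Hypothetical helper: lists Info fields that still carry data after a scrub.
// Assumption: `doc` exposes pdf-lib's metadata getters (getTitle, getAuthor, ...).
function listResidualInfo(doc) {
  const residual = [];
  const textGetters = ["getTitle", "getAuthor", "getSubject", "getKeywords", "getProducer", "getCreator"];
  for (const getter of textGetters) {
    const value = doc[getter]();
    // A missing key (undefined) and an empty string both count as clean.
    if (value !== undefined && value !== "") residual.push(getter.slice(3));
  }
  for (const getter of ["getCreationDate", "getModificationDate"]) {
    const value = doc[getter]();
    // Dates are expected to sit exactly on the Unix epoch.
    if (value !== undefined && value.getTime() !== 0) residual.push(getter.slice(3));
  }
  return residual;
}
```

An empty array means every field matches the "After sanitizing" column; any names returned point at fields worth a second look.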
What a metadata scrub covers vs. the rest of a disclosure
Metadata is one layer of a defensible release. Exemptions and comments are owned by other tools.
| Risk in the released file | Owned by this tool? | Tool to use |
|---|---|---|
| /Author /Creator /Producer in Document Properties | Yes | This tool (pdf-metadata-scrubber) |
| /Title /Subject /Keywords with internal tracking data | Yes | This tool |
| Drafting / review timeline (the two date stamps) | Yes — reset to epoch | This tool |
| Exempt names / addresses still readable on the page | No | pdf-pii-redactor |
| Reviewer comments and sticky notes about the request | No | pdf-annotation-remover |
| XMP metadata packet (dc:creator, xmp:CreateDate) | No (not rewritten) | Re-save via pdf-compress-lossless |
| An Office source doc released alongside the PDF | No | office-doc-property-wiper |
File-size limits by tier (PDF input)
PDF input is file-based, so Security-family tier limits apply. One file per pass on the metadata scrubber.
| Tier | Max file size | Files per pass |
|---|---|---|
| Free | 10 MB | 1 |
| Pro | 100 MB | 5 (processed one at a time) |
| Pro-media | 500 MB | 50 |
| Developer | 2 GB | Unlimited |
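A pre-flight size check against these caps might look like the sketch below. The cap values come from the table; the tier keys, the `fitsTier` name, and the choice of 1 MB = 1024 × 1024 bytes are assumptions, since the page does not say which convention the tool uses.

```javascript
// Caps copied from the tier table; treating 1 MB as 1024 * 1024 bytes is an assumption.
const MB = 1024 * 1024;
const TIER_CAPS_BYTES = {
  free: 10 * MB,
  pro: 100 * MB,
  "pro-media": 500 * MB,
  developer: 2048 * MB,
};

// Hypothetical pre-flight check: returns true when the file fits the tier's cap.
function fitsTier(sizeBytes, tier) {
  const cap = TIER_CAPS_BYTES[tier];
  if (cap === undefined) throw new Error(`Unknown tier: ${tier}`);
  return sizeBytes <= cap;
}
```

In a browser, `fitsTier(file.size, "free")` could run on the selected `File` before any bytes are read.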
Cookbook
Real Document Properties before and after a sanitizing pass for a public-records release, plus the leaks a metadata-only scrub does not touch. Names and tracking numbers are illustrative.
Analyst account name leaking through /Author on a responsive memo
A staffer exported a memo to PDF and Word wrote their agency network account into /Author. A requester opens Properties and learns exactly who drafted the released record — information that was never part of the request.
Before (File -> Properties):
  Author: a.santos
  Creator: Microsoft Word for Microsoft 365
  Producer: Microsoft: Print To PDF
  Title: REQ-2026-0481_response_DRAFT
  Created: 2026-05-19 14:02:33
  Modified: 2026-06-04 10:48:17

After sanitizing:
  Author: (blank)
  Creator: (blank)
  Producer: (blank)
  Title: (blank)
  Created: 1970-01-01 00:00:00
  Modified: 1970-01-01 00:00:00
Internal tracking number parked in /Title
Records teams often store the request-tracking number in the document title. Released as-is, it maps the file to internal case-management data. Sanitizing empties it.
Before:
  Title: REQ-2026-0481_response_DRAFT
  Subject: FOR REVIEW - legal cleared 06/03

After:
  Title: (blank)
  Subject: (blank)

The tracking number and the internal review note are gone from Document Properties.
The pre-release timeline collapses to the epoch
The two date stamps reveal that the record was drafted on 19 May and quietly edited as late as 4 June — minutes before release. After sanitizing, both show 1 January 1970, a constant shared by every cleaned file.
Before:
  CreationDate: D:20260519140233-04'00'
  ModDate: D:20260604104817-04'00'

After:
  CreationDate: D:19700101000000Z
  ModDate: D:19700101000000Z

The 19 May -> 04 Jun drafting/edit window is no longer readable from the file.
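The date strings in this example follow the PDF date format. The sketch below shows how `new Date(0)` maps to `D:19700101000000Z` and how an offset-bearing stamp can be read back. Both function names are hypothetical, and the parser handles only the full `D:YYYYMMDDHHmmSS` form with an optional `Z` or `+HH'mm'` / `-HH'mm'` suffix; it is not a complete implementation of the format.

```javascript
// Formats a Date as a PDF date string in UTC.
function toPdfDateUtc(date) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return (
    "D:" +
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds()) +
    "Z"
  );
}

// Parses the full-length form into a Date; returns null for anything else.
// Partial dates (for example year-only) are out of scope for this sketch.
function parsePdfDate(text) {
  const m = /^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:Z|([+-])(\d{2})'(\d{2})'?)?$/.exec(text);
  if (!m) return null;
  const [, y, mo, d, h, mi, s, sign, oh, om] = m;
  const utcMs = Date.UTC(+y, +mo - 1, +d, +h, +mi, +s);
  if (!sign) return new Date(utcMs);
  const offsetMs = (+oh * 60 + +om) * 60000;
  // Local time = UTC + offset, so UTC = local - offset.
  return new Date(sign === "+" ? utcMs - offsetMs : utcMs + offsetMs);
}
```

Reading a stamp back this way makes it easy to confirm that a cleaned file's dates land exactly on the epoch, whatever offset the original carried.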
Metadata is clean but an exempt name is still on the page
Critical: emptying /Author does nothing to an exempt name printed in the body. That is page content, not metadata, and it remains fully visible and selectable. Redact it before — or alongside — the metadata pass.
Document Properties after sanitizing: all clean.

But page 2 still reads:
  "Complainant: Maria Alvarez, 14 Birch Lane"

This survives a metadata scrub. Run /pdf-tools/pdf-pii-redactor to remove the exempt detail from the visible text stream.
Logging the SHA-256 of the posted release
After sanitizing, fingerprint the file so the agency can prove the posted version matches the version that was cleared for disclosure.
Workflow:
1. Sanitize REQ-2026-0481.pdf -> REQ-2026-0481.clean.pdf
2. Drop the clean file on
/security-tools/multi-hash-fingerprinter
SHA-256: 4ad9...c70b
3. Record the digest in the release log.
If the public copy's SHA-256 later differs, the file
was altered after it was cleared.

Edge cases and what actually happens
Dates show 1970, not blank
Expected: The sanitizer sets CreationDate and ModDate to new Date(0), the Unix epoch (1970-01-01T00:00:00Z), written as D:19700101000000Z. It does not delete the date keys. This is by design and privacy-safe: a fixed constant shared by every released file carries no information about the real drafting timeline.
Exempt material on the page is unchanged
Not covered: A metadata scrub never touches the visible page stream. Names, addresses, and other exempt details printed in the body remain fully visible and selectable. Substantive redaction is a separate step: use pdf-pii-redactor to remove the underlying text rather than relying on the metadata pass.
XMP packet still names the author
Not covered: Modern PDFs carry a second metadata channel, an XMP packet with fields like dc:creator and xmp:CreateDate, alongside the legacy /Info dictionary. This tool empties /Info but does not rewrite the XMP stream, and some viewers prefer XMP. To clear it, re-serialize through pdf-compress-lossless, which rebuilds the document structure.
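Whether a file still carries an XMP packet can be spot-checked by scanning its raw bytes for the usual markers. This is a heuristic sketch only: `findXmpMarkers` is a hypothetical name, and it assumes the XMP stream is stored uncompressed, which is common but not guaranteed. An empty result therefore does not prove the packet is absent.

```javascript
// Heuristic only: looks for XMP markers in the raw bytes of a PDF.
// Assumption: the XMP stream is stored uncompressed (common, not guaranteed).
function findXmpMarkers(bytes) {
  // latin1 maps every byte to one character, so binary data cannot break decoding.
  const text = new TextDecoder("latin1").decode(bytes);
  const markers = ["<?xpacket begin", "<x:xmpmeta", "dc:creator", "xmp:CreateDate"];
  return markers.filter((marker) => text.includes(marker));
}
```

Decoding the whole file into one string is fine for typical records but is not suited to inputs near the larger tier caps; a chunked scan would be the safer choice there.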
Reviewer comments about the request survive
Not covered: Comments and sticky notes added during legal or exemption review are stored as page annotations, not metadata, so a metadata scrub leaves every one of them in place, and they often quote internal deliberation. Strip them with pdf-annotation-remover before posting.
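A quick way to see whether any annotations remain is to count the entries in each page's annotation array. The sketch below assumes a document loaded with pdf-lib and uses its low-level `page.node.Annots()` accessor; `countAnnotationsPerPage` is a hypothetical helper. Links and form widgets are annotations too, so a non-zero count is a prompt to look, not proof of reviewer comments.

```javascript
// Hypothetical pre-posting check: counts annotation entries per page.
// Assumption: `doc` is a pdf-lib PDFDocument; page.node.Annots() returns the
// page's /Annots array, or undefined when the page has none.
function countAnnotationsPerPage(doc) {
  return doc.getPages().map((page, index) => {
    const annots = page.node.Annots();
    return { page: index + 1, annotations: annots ? annots.size() : 0 };
  });
}
```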
Encrypted / password-protected record
Loaded with ignoreEncryption: The scrubber loads with ignoreEncryption: true, so it can often open a protected file to clear metadata, but it does not decrypt content and is not a password-removal tool. If the PDF needs a password just to view its pages, remove protection first with pdf-remove-password for a file the agency owns, then sanitize the unlocked copy.
Office source document released alongside the PDF
Not covered: If a request also calls for the native Word or Excel source, this PDF tool does nothing to that file's own metadata. Office documents carry their own author and revision properties; wipe them with office-doc-property-wiper before disclosure.
Embedded preview thumbnail predates a redaction
Not covered: Some PDFs and embedded images carry a preview thumbnail that can reflect an earlier, un-redacted state. The metadata scrubber does not search for or strip these. Inspect a file for hidden previews with hidden-thumbnail-extractor before posting sensitive imagery.
File exceeds the tier size cap
Rejected: PDF input is file-based, so Security-family limits apply: Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. A scanned, image-heavy responsive packet can blow past the Free cap quickly. Upgrade the tier or reduce size first with pdf-compress-lossless, which also re-serializes the XMP packet.
Corrupt or truncated PDF fails to load
Error: If pdf-lib cannot parse the file (a truncated scan, a non-PDF renamed to .pdf, or a structurally broken document), the load throws and no output is produced. Confirm the bytes really are a valid PDF first with magic-byte-validator before trying to sanitize.
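The kind of check a magic-byte validator performs can be approximated in a few lines. This sketch is not that tool's implementation: `looksLikePdf` is a hypothetical name, and the rule it applies (a `%PDF-` header within the first 1024 bytes and an `%%EOF` marker within the last 1024 bytes) is an assumed heuristic for "plausibly a complete PDF".

```javascript
// Hypothetical pre-check: does the buffer look like a PDF at all?
// Assumption: header in the first 1024 bytes, %%EOF in the last 1024 bytes.
function looksLikePdf(bytes) {
  const decoder = new TextDecoder("latin1");
  const head = decoder.decode(bytes.subarray(0, 1024));
  const tail = decoder.decode(bytes.subarray(Math.max(0, bytes.length - 1024)));
  return {
    hasHeader: head.includes("%PDF-"),
    hasEofMarker: tail.includes("%%EOF"),
  };
}
```

A file with a header but no end marker suggests truncation; a file with neither is probably not a PDF, whatever its extension says.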
Frequently asked questions
Which metadata fields does the sanitizer clear?
Eight fields in the document-information dictionary. Six text fields are emptied to blank strings: /Title, /Author, /Subject, /Keywords, /Producer, and /Creator. The two timestamps — /CreationDate and /ModDate — are reset to the Unix epoch (1970-01-01T00:00:00Z). That fixed eight-field set is processed on every run; there is nothing to configure.
Is metadata scrubbing the same as redacting exemptions?
No, and conflating them is a serious mistake in records work. This tool only clears the metadata layer in Document Properties. Exempt names, addresses, or other material visible on the page are page content and remain fully readable after a metadata pass. Substantive redaction is a separate step with the PDF PII Redactor (/pdf-tools/pdf-pii-redactor).
Why do the dates show 1970 instead of disappearing?
The tool sets both date stamps to new Date(0), the Unix epoch. It is privacy-neutral on purpose: because every sanitized file shares the same 1970 value, the date no longer reveals when the record was actually drafted or last edited before release. If you specifically need the date keys absent, this tool does not offer that.
Does the responsive file get uploaded anywhere?
No. Processing runs entirely in your browser via pdf-lib. A record being prepared for disclosure never leaves the workstation — which is the whole reason to run it locally rather than through an online converter you would have to trust with pre-release material.
Will sanitizing change the released pages?
No. Only the /Info dictionary entries change. Pages, text, images, embedded fonts, and bookmarks are preserved exactly as pdf-lib leaves them — the document is re-saved, not re-rendered. The released file reads identically to the version you reviewed, and the page count matches the input.
Is the XMP metadata cleared too?
No. PDFs store metadata twice: in the legacy /Info dictionary (which this tool empties) and in an XMP packet (dc:creator, xmp:CreateDate) which this tool does not rewrite. Some viewers prefer XMP, so an author name can survive there. Re-serialize through pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) to drop the stale packet.
Are reviewer comments and exemption notes removed?
No. Comments and sticky notes are page annotations, not metadata, so they survive a metadata scrub — and they frequently quote internal deliberation about the request. Remove them separately with the PDF Annotation Remover (/pdf-tools/pdf-annotation-remover) before the record is posted.
Can I sanitize an encrypted record?
The scrubber loads with encryption ignored, so it can often open a protected file to clear its metadata, but it is not a decryption or password-removal tool. If the PDF needs a password to view its pages, remove protection first with PDF Remove Password (/pdf-tools/pdf-remove-password) on a file the agency owns, then sanitize the unlocked copy.
What about a native Office source released with the PDF?
This tool only handles PDFs. If the request also covers the Word or Excel source, that file carries its own author and revision properties — wipe them with the Office Doc Property Wiper (/security-tools/office-doc-property-wiper) before disclosure.
How do I prove the posted file matches what I cleared?
Fingerprint it. After sanitizing, drop the file on multi-hash-fingerprinter (/security-tools/multi-hash-fingerprinter) and log the SHA-256 in the release record. If the public copy's digest ever differs from the logged value, the file was altered after sign-off.
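For teams that want to re-check a posted copy themselves, the comparison can be sketched with the Web Crypto API. This is not the fingerprinter's code; `sha256Hex` and `digestsMatch` are hypothetical helper names, and the fallback to Node's module exists only so the snippet also runs outside a browser.

```javascript
// Web Crypto is global in browsers and recent Node; older Node exposes it on the module.
const subtle = globalThis.crypto?.subtle ?? require("node:crypto").webcrypto.subtle;

// Returns the SHA-256 digest of a Uint8Array as lowercase hex.
async function sha256Hex(bytes) {
  const digest = await subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical comparison helper: tolerant of case and surrounding whitespace in the log.
function digestsMatch(logged, actual) {
  return logged.trim().toLowerCase() === actual.trim().toLowerCase();
}
```

Hashing the downloaded public copy and passing the result to `digestsMatch` alongside the logged value gives a yes/no answer on whether the file changed after sign-off.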
How big a PDF can I sanitize?
PDF input is file-based, so Security-family tier limits apply: Free handles up to 10 MB and one file; Pro up to 100 MB; Pro-media up to 500 MB; Developer up to 2 GB. Scanned responsive packets hit these caps fastest. If you are over the limit, compress first with pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) or move up a tier.
What is a defensible pre-release checklist?
Five steps. (1) Redact exempt page content with pdf-pii-redactor (/pdf-tools/pdf-pii-redactor). (2) Strip reviewer comments with pdf-annotation-remover (/pdf-tools/pdf-annotation-remover). (3) Sanitize metadata with this tool. (4) Re-serialize via pdf-compress-lossless (/pdf-tools/pdf-compress-lossless) to clear the XMP packet. (5) Fingerprint the final file with multi-hash-fingerprinter (/security-tools/multi-hash-fingerprinter) and log the digest in the release record.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.