How to subset embedded fonts in a PDF for long-term archiving
- Step 1: Confirm the source is already self-contained. Open the PDF's font list in Acrobat (File → Properties → Fonts) or a preflight tool. Every font should show as Embedded or Embedded Subset. This tool optimises embedded fonts; it does not embed fonts that are missing.
- Step 2: Open the PDF Font Subsetter. Go to the PDF Font Subsetter. All processing is local, so confidential archive documents never leave your browser.
- Step 3: Drop in one PDF. One file per run on the free tier (2 MB / 50 pages). Pro raises that to 50 MB / 500 pages and allows 5 files per batch, which suits bulk archival passes better.
- Step 4: Press Process. No options to set. The tool collects used codepoints, analyses each embedded font, and re-saves with packed object streams. Output preserves the document's text and structure.
- Step 5: Tag for PDF/A if your policy requires it. This tool does not produce a validator-passing PDF/A. If your retention policy mandates PDF/A, run PDF to PDF/A: it adds the XMP pdfaid identifier, an output intent, and forces the PDF 1.4 header.
- Step 6: Validate before committing to the archive. Run the output through your archive's preflight/veraPDF check. Confirm fonts still show as embedded and text still extracts before the document enters long-term storage.
Archival self-containment: what each tool contributes
Self-contained archiving is a checklist. This tool covers the font-size part; other tools cover the rest. Pick the combination your retention policy needs.
| Archival requirement | Tool that addresses it | Notes |
|---|---|---|
| Reduce size of already-embedded fonts | PDF Font Subsetter (this tool) | Font analysis + object-stream re-save |
| Add PDF/A identifier + output intent | PDF to PDF/A | Writes XMP pdfaid, output intent, forces 1.4 header |
| Strip privacy metadata before archiving | Metadata Scrubber | Removes author, dates, producer history |
| Flatten interactive form fields | PDF Flatten | Bakes filled values into static page content |
| Lock a specific PDF version | Version Converter | Set the PDF header version explicitly |
What survives the archival re-save
Behaviour of the object-stream round-trip on the elements an archivist cares about.
| Element | Behaviour | Archival impact |
|---|---|---|
| Embedded fonts | Analysed; left intact if unparseable | Self-containment preserved |
| Text + ToUnicode | Preserved | Documents stay searchable long-term |
| Bookmarks / outlines | Carried through the round-trip | Navigation retained |
| XMP / PDF/A markers | Not added by this tool | Use PDF/A converter separately |
| Document metadata | Carried through unchanged | Scrub separately if required |
PDF tier limits for archival batches
Limits enforced before processing. Pro is the practical tier for bulk archival work because of the batch allowance.
| Tier | Max file size | Max pages | Batch files |
|---|---|---|---|
| Free | 2 MB | 50 | 1 |
| Pro | 50 MB | 500 | 5 |
| Pro Media | 500 MB | 500 | 5 |
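The limits in the table can be checked before a batch is queued. The following is a minimal sketch, not the tool's own code: the `Tier` class and `smallest_tier` function are hypothetical names, the numbers are copied from the table, and the caps are assumed to be inclusive (a file of exactly 2 MB or 50 pages is treated as accepted).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_mb: float
    max_pages: int
    batch_files: int


# Values copied from the tier table; ordered from smallest to largest.
TIERS = [
    Tier("Free", 2, 50, 1),
    Tier("Pro", 50, 500, 5),
    Tier("Pro Media", 500, 500, 5),
]


def smallest_tier(size_mb: float, pages: int) -> Optional[str]:
    """Return the first tier whose size AND page caps both admit the file."""
    for tier in TIERS:
        # Assumption: caps are inclusive; only files over a cap are rejected.
        if size_mb <= tier.max_mb and pages <= tier.max_pages:
            return tier.name
    return None  # no tier accepts this file


print(smallest_tier(1.8, 40))   # Free
print(smallest_tier(1.8, 60))   # Pro (under the size cap, over the page cap)
print(smallest_tier(10, 800))   # None
```

Both caps apply at once, so a small file with many pages can still need a higher tier.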
Cookbook
Archival workflows, with the honest result for each. The tool has no settings, so the variables are the source file and the tools you pair it with.
Shrinking a self-contained legal archive PDF
A 60-page deposition transcript embeds two full serif weights. It is already self-contained; the goal is just to reduce storage.
Input: deposition.pdf 1.8 MB (2 embedded weights, all text)
Process: Font Subsetter
Output: deposition.pdf 1.4 MB
Fonts after: still Embedded Subset (verify in Acrobat)
Text: still extracts verbatim
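The saving in this example works out to roughly a fifth of the original size. A small sketch of the arithmetic, using a hypothetical helper named `reduction_percent` and the before/after figures from the example:

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Size reduction as a percentage of the original size."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return (before_mb - after_mb) / before_mb * 100


# Figures from the deposition example: 1.8 MB in, 1.4 MB out.
print(f"{reduction_percent(1.8, 1.4):.1f}% smaller")  # 22.2% smaller
```

The same calculation applies to the before/after sizes of any other file; the percentage for this transcript says nothing about what other sources will achieve.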
Full archival pipeline: subset → tag → scrub
A document destined for a 30-year retention archive needs size reduction, PDF/A tagging, and metadata removal. Chain three tools in order.
1. /pdf-tools/pdf-font-subsetter → size pass
2. /pdf-tools/pdf-to-pdfa → adds XMP pdfaid + output intent
3. /pdf-tools/pdf-metadata-scrubber → strips author/dates
Result: smaller, PDF/A-tagged, privacy-clean archive file
Source that isn't actually self-contained
A PDF references Calibri without embedding it. This tool can't make it archival — the font data isn't present.
Check: File → Properties → Fonts
Calibri ...... (NOT embedded)
Font Subsetter cannot embed it. Re-generate the PDF from the source application with 'embed all fonts' enabled, THEN run the subsetter to optimise the embedded result.
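The check above amounts to a simple gate: if any font in the list is not embedded, the file goes back to the source application before the size pass. A minimal sketch of that gate follows. It assumes the font list has been transcribed by hand from the Properties dialog; `FontEntry` and `fonts_blocking_archive` are hypothetical names, and no PDF parsing happens here.

```python
from typing import List, NamedTuple


class FontEntry(NamedTuple):
    name: str
    status: str  # as shown in the font list, e.g. "Embedded Subset"


# The two statuses the guide treats as self-contained.
READY = {"embedded", "embedded subset"}


def fonts_blocking_archive(fonts: List[FontEntry]) -> List[str]:
    """Names of fonts whose status is neither Embedded nor Embedded Subset."""
    return [f.name for f in fonts if f.status.strip().lower() not in READY]


fonts = [
    FontEntry("Times New Roman", "Embedded Subset"),
    FontEntry("Calibri", "Not embedded"),
]
print(fonts_blocking_archive(fonts))  # ['Calibri']
```

An empty result means the file is ready for the size pass; any name in the list means the PDF should be regenerated with font embedding enabled first.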
Validating the output against veraPDF
After the size pass and PDF/A tagging, run a strict validator. Note that the bundled PDF/A converter targets broad acceptance, not strict veraPDF conformance.
After: /pdf-tools/pdf-to-pdfa
Run: verapdf --flavour 1b output.pdf
Note: the PDF/A tagger writes the identifier + output intent that most archive systems accept; strict veraPDF may still flag the stub ICC profile. Use a desktop tool for hard conformance.
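For a scripted archive pass, the validator call can be wrapped so that it is skipped cleanly when veraPDF is not installed. This is a sketch under assumptions: the veraPDF command-line launcher is assumed to be on the PATH under the name `verapdf`, and `verapdf_command` and `run_verapdf` are hypothetical helper names. The sketch returns the raw process result and does not interpret the report.

```python
import shutil
import subprocess
from typing import List, Optional


def verapdf_command(pdf_path: str, flavour: str = "1b") -> List[str]:
    """Build the argument list for a veraPDF run against one file."""
    # Assumption: the launcher is named "verapdf" on this system.
    return ["verapdf", "--flavour", flavour, pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "1b") -> Optional[subprocess.CompletedProcess]:
    """Run veraPDF if it is installed; return None when it is not found."""
    if shutil.which("verapdf") is None:
        return None
    return subprocess.run(
        verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
    )


print(verapdf_command("output.pdf"))
# ['verapdf', '--flavour', '1b', 'output.pdf']
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.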
Archiving a form after flattening
Interactive forms should be flattened before archiving so values are permanent. Flatten first, then run the size pass.
1. /pdf-tools/pdf-flatten → bakes filled values into the page
2. /pdf-tools/pdf-font-subsetter → size pass on the flattened result
Why: a flattened form has no live fields to drift; the static text is then optimised by the re-save.
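The cookbook recipes can be merged into one ordered plan. The sketch below assumes that combining "flatten, then subset" with "subset, then tag, then scrub" gives flatten → subset → tag → scrub; that merged order is an inference from the recipes, not something the tools enforce. `plan_pipeline` is a hypothetical name, and the paths are the ones used in the recipes.

```python
from typing import List


def plan_pipeline(has_form: bool, needs_pdfa: bool, needs_scrub: bool) -> List[str]:
    """Ordered list of tool paths for one document."""
    steps = []
    if has_form:
        steps.append("/pdf-tools/pdf-flatten")         # make field values permanent
    steps.append("/pdf-tools/pdf-font-subsetter")      # size pass, always run
    if needs_pdfa:
        steps.append("/pdf-tools/pdf-to-pdfa")         # XMP pdfaid + output intent
    if needs_scrub:
        steps.append("/pdf-tools/pdf-metadata-scrubber")  # strip author/dates
    return steps


for step in plan_pipeline(has_form=True, needs_pdfa=True, needs_scrub=False):
    print(step)
```

Whatever plan is produced, the final output should still go through the archive's own validation step before storage.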
Edge cases and what actually happens
The source PDF isn't self-contained (fonts not embedded)
Cannot embed: This tool optimises fonts that are already embedded; it cannot add a font that was never in the file. A PDF that references unembedded system fonts will not become archival here. Regenerate it from the source application with font embedding enabled, then run the size pass.
You expected validator-passing PDF/A from this tool
Not PDF/A: The font subsetter does not write the XMP pdfaid identifier, output intent, or 1.4 header that PDF/A validators require. Use PDF to PDF/A for tagging. Even that converter targets broad archive-system acceptance rather than strict veraPDF conformance; see its own guide.
Output is barely smaller than the input
By design: Many archive-grade PDFs are already produced with subsetted fonts and tight structure. If the size barely moves, the document was already efficient. That is a correct result, not a fault.
A proprietary licensed font is embedded
Check licence: Embedding for archival can be restricted by some commercial font EULAs. This tool does not change licensing and does not convert text to outlines (no tool in this suite outlines text). Confirm your font licence permits embedding before committing the document to a long-term archive.
A CID/Type 0 font program won't parse
Preserved: If fontkit can't parse an unusual embedded program, that font is left intact and the rest of the document is still re-saved. Self-containment is never broken by a parse failure; at worst one font isn't analysed.
Privacy metadata is still in the file
Scrub separately: The re-save carries document metadata (author, creation/modification dates, producer) through unchanged. For an archive that must not leak who created the file or when, run Metadata Scrubber as a separate step.
File exceeds the tier limit
Rejected: Free rejects PDFs over 2 MB or 50 pages before processing. Bulk archival work is better on Pro (50 MB / 500 pages, 5-file batches) or Pro Media (500 MB).
Encrypted archive document
Supported: The tool reads with encryption ignored, so most protected files process. If a strongly encrypted file fails to load, remove the password first with Remove Password, run the size pass, then re-apply protection if your retention policy requires it.
Frequently asked questions
Does this tool make my PDF PDF/A-compliant?
No. It is a font-aware size pass — it analyses embedded fonts and re-saves with object streams. It does not add the XMP identifier, output intent, or PDF version header that PDF/A validators check. For tagging, use the dedicated PDF to PDF/A converter, and for strict conformance verify with a desktop tool or veraPDF afterwards.
Will my archived document still look identical in 30 years?
If the fonts were embedded in the source, yes — embedded subsetted fonts render the document from its own data regardless of what's installed on a future reader. This tool preserves that self-containment. What it cannot do is create self-containment that wasn't already there; a PDF with unembedded fonts stays non-self-contained.
Does it reduce my archive's storage footprint?
Often, yes — the object-stream re-save cuts structural overhead, which adds up across thousands of documents. The amount depends on how the source was produced; files already optimised will see little change. The result panel reports exact before/after sizes per file.
Can it embed fonts that my archive PDFs are missing?
No. It only works with fonts already embedded in the file. To fix a missing-font situation, regenerate the PDF from its source application with 'embed all fonts' enabled, then run this tool to optimise the embedded result.
Is processing private enough for sensitive records?
Yes. Everything runs in your browser: pdf-lib, pdf.js, and fontkit load into the tab and the file is never uploaded. That makes it suitable for HR files, legal holds, and medical archives where the document must not touch a third-party server.
Should I flatten forms before archiving?
Yes, if the document is an interactive form. Use PDF Flatten first so filled values become permanent static text, then run this size pass. A flattened form has no live fields that could change in a future reader.
Will it strip the metadata I don't want in my archive?
No — metadata (author, dates, producer) is carried through unchanged. If your archival policy requires removing identifying metadata, run the Metadata Scrubber as a separate step before or after this tool.
What about proprietary fonts with restrictive licences?
This tool doesn't change licensing and doesn't outline text. If a font's EULA restricts embedding for archival, that's a legal question to resolve with the font vendor — none of the tools in this suite convert text to vector outlines as an escape hatch.
Can I batch-process a whole archive folder?
On the free tier it's one file at a time. Pro allows 5 PDFs per batch. For a large archive, Pro or Pro Media is the practical choice; otherwise process sequentially.
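Splitting an archive folder into runs that respect the batch allowance is a simple chunking step. A minimal sketch, assuming the file list has already been collected; `batches` is a hypothetical helper, and the default of 5 is the Pro batch allowance stated above.

```python
from typing import List


def batches(paths: List[str], batch_size: int = 5) -> List[List[str]]:
    """Split a list of file paths into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


files = [f"record-{n:03d}.pdf" for n in range(1, 13)]  # 12 example file names
for group in batches(files):
    print(len(group), group[0], "...", group[-1])
# 5 record-001.pdf ... record-005.pdf
# 5 record-006.pdf ... record-010.pdf
# 2 record-011.pdf ... record-012.pdf
```

With `batch_size=1` the same helper describes the free tier's one-file-at-a-time flow.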
Does the round-trip keep bookmarks and the text layer?
Yes. Outlines/bookmarks, page content, and ToUnicode maps are preserved, so archived documents remain navigable and searchable after the re-save.
Is the largest archive file I can process really 500 MB?
On Pro Media, yes — the PDF family cap is 500 MB. Pro is 50 MB and free is 2 MB. The page cap is 500 on Pro and Pro Media, 50 on free. Files over the limit are rejected before processing.
Why would I run this if my files are already PDF/A?
Purely for storage efficiency. An existing PDF/A with full or loosely-packed font structure can still shrink on the object-stream re-save while keeping its embedded fonts. Re-validate afterwards to confirm the PDF/A tagging survived, since strict conformance is checked separately.
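Before a full re-validation, a quick marker check can show whether the PDF/A identifier is still present after the re-save. The sketch below is a crude byte scan, not a validator: it assumes the XMP packet is stored as plain, uncompressed UTF-8 text in the file, and `pdfa_part_marker` is a hypothetical helper name. A missing marker is a reason to re-tag; a present marker proves nothing about conformance.

```python
import re
from typing import Optional

# Matches both XMP spellings of the identifier:
#   attribute form:  pdfaid:part="1"
#   element form:    <pdfaid:part>1</pdfaid:part>
PDFAID_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>\s*)(\d)")


def pdfa_part_marker(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A part number found in the XMP, or None if absent."""
    match = PDFAID_PART.search(pdf_bytes)
    return match.group(1).decode("ascii") if match else None


print(pdfa_part_marker(b'<rdf:Description pdfaid:part="1" pdfaid:conformance="B"/>'))  # 1
print(pdfa_part_marker(b"%PDF-1.4 with no XMP packet"))  # None
```

For a real file, pass the result of `pathlib.Path("output.pdf").read_bytes()` and compare the value before and after the size pass.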
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.