Subset Embedded Fonts for PDF Archiving — Free Online

How to subset embedded fonts in a PDF for long-term archiving

Step 1
Confirm the source is already self-contained — Open the PDF's font list in Acrobat (File → Properties → Fonts) or a preflight tool. Every font should show as Embedded or Embedded Subset. This tool optimises embedded fonts; it does not embed fonts that are missing.
Step 2
Open the PDF Font Subsetter — Go to the PDF Font Subsetter. All processing is local — confidential archive documents never leave your browser.
Step 3
Drop in one PDF — One file per run on the free tier (2 MB / 50 pages). Pro raises that to 50 MB / 500 pages and allows 5 files per batch, which suits bulk archival passes better.
Step 4
Press Process — No options to set. The tool collects used codepoints, analyses each embedded font, and re-saves with packed object streams. Output preserves the document's text and structure.
Step 5
Tag for PDF/A if your policy requires it — This tool does not produce a validator-passing PDF/A. If your retention policy mandates PDF/A, run PDF to PDF/A — it adds the XMP pdfaid identifier, an output intent, and forces the PDF 1.4 header.
Step 6
Validate before committing to the archive — Run the output through your archive's preflight/veraPDF check. Confirm fonts still show as embedded and text still extracts before the document enters long-term storage.
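Step 4's first stage, collecting used codepoints, amounts to a per-font set union over the document's text. Below is a minimal Python sketch. It assumes text runs arrive as hypothetical (font name, decoded text) pairs, and it is not the tool's actual code. A real subsetter would then map these codepoints to glyph IDs through each font's cmap.

```python
from collections import defaultdict


def collect_used_codepoints(runs):
    """Map each font name to the sorted Unicode codepoints drawn with it.

    `runs` is a hypothetical input shape: (font_name, decoded_text) pairs,
    such as a text extractor might report for each run of text.
    """
    used = defaultdict(set)
    for font_name, text in runs:
        for ch in text:
            used[font_name].add(ord(ch))
    # Sorted lists make the result stable and easy to compare.
    return {name: sorted(cps) for name, cps in used.items()}


runs = [("F1", "Hello"), ("F2", "Hi"), ("F1", "lo")]
print(collect_used_codepoints(runs))  # {'F1': [72, 101, 108, 111], 'F2': [72, 105]}
```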

Archival self-containment: what each tool contributes

Self-contained archiving is a checklist. This tool covers the font-size part; other tools cover the rest. Pick the combination your retention policy needs.

Archival requirement	Tool that addresses it	Notes
Reduce size of already-embedded fonts	PDF Font Subsetter (this tool)	Font analysis + object-stream re-save
Add PDF/A identifier + output intent	PDF to PDF/A	Writes XMP `pdfaid`, output intent, forces 1.4 header
Strip privacy metadata before archiving	Metadata Scrubber	Removes author, dates, producer history
Flatten interactive form fields	PDF Flatten	Bakes filled values into static page content
Lock a specific PDF version	Version Converter	Set the PDF header version explicitly

What survives the archival re-save

Behaviour of the object-stream round-trip on the elements an archivist cares about.

Element	Behaviour	Archival impact
Embedded fonts	Analysed; left intact if unparseable	Self-containment preserved
Text + ToUnicode	Preserved	Documents stay searchable long-term
Bookmarks / outlines	Carried through the round-trip	Navigation retained
XMP / PDF/A markers	Not added by this tool	Use PDF/A converter separately
Document metadata	Carried through unchanged	Scrub separately if required

PDF tier limits for archival batches

Limits are enforced before processing. Pro is the practical tier for bulk archival work because of the batch allowance.

Tier	Max file size	Max pages	Batch files
Free	2 MB	50	1
Pro	50 MB	500	5
Pro Media	500 MB	500	5
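The pre-processing check behind this table can be sketched in Python. The dictionary and function names are hypothetical. The sketch also assumes 1 MB means 1,048,576 bytes, because the tool's exact rounding isn't documented here.

```python
# Limits from the tier table; names are hypothetical, and
# 1 MB is assumed to mean 1,048,576 bytes.
MB = 1024 * 1024
TIER_LIMITS = {
    "free": {"max_bytes": 2 * MB, "max_pages": 50, "batch_files": 1},
    "pro": {"max_bytes": 50 * MB, "max_pages": 500, "batch_files": 5},
    "pro_media": {"max_bytes": 500 * MB, "max_pages": 500, "batch_files": 5},
}


def rejection_reasons(tier, size_bytes, pages):
    """Return why a file would be rejected before processing (empty list = accepted)."""
    limits = TIER_LIMITS[tier]
    reasons = []
    if size_bytes > limits["max_bytes"]:
        reasons.append("over size limit")
    if pages > limits["max_pages"]:
        reasons.append("over page limit")
    return reasons


deposition = (int(1.8 * MB), 60)  # the cookbook's deposition transcript
print(rejection_reasons("free", *deposition))  # ['over page limit']
print(rejection_reasons("pro", *deposition))   # []
```

The cookbook's 60-page deposition transcript is a useful test case. It fits the free size cap but not the free page cap.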

Cookbook

Archival workflows, with the honest result for each. The tool has no settings, so the variables are the source file and the tools you pair it with.

Shrinking a self-contained legal archive PDF

A 60-page deposition transcript embeds two full serif weights. It is already self-contained; the goal is only to reduce storage. At 60 pages it exceeds the free tier's 50-page cap, so this run needs Pro or Pro Media.

Input:   deposition.pdf   1.8 MB  (2 embedded weights, all text)
Process: Font Subsetter
Output:  deposition.pdf   1.4 MB

Fonts after: still embedded (check Embedded vs Embedded Subset in Acrobat)
Text:        still extracts verbatim
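The deposition figures correspond to a saving of roughly 22%. The Python helper below does the same arithmetic for a single file and summed across an archive. It is useful when recording the before/after sizes shown in the result panel. The function names are hypothetical.

```python
def size_change(before_mb, after_mb):
    """Return (megabytes saved, percent saved) for one file."""
    saved = before_mb - after_mb
    return saved, 100.0 * saved / before_mb


def archive_change(pairs):
    """Sum (before_mb, after_mb) pairs across many files and report the total change."""
    before = sum(b for b, _ in pairs)
    after = sum(a for _, a in pairs)
    return size_change(before, after)


saved, pct = size_change(1.8, 1.4)
print(f"saved {saved:.1f} MB ({pct:.1f}%)")  # saved 0.4 MB (22.2%)
```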

Full archival pipeline: subset → tag → scrub

A document destined for a 30-year retention archive needs size reduction, PDF/A tagging, and metadata removal. Chain three tools in order.

1. /pdf-tools/pdf-font-subsetter   → size pass
2. /pdf-tools/pdf-to-pdfa           → adds XMP pdfaid + output intent
3. /pdf-tools/pdf-metadata-scrubber → strips author/dates

Result: smaller, PDF/A-tagged, privacy-clean archive file (re-validate after the scrub to confirm the PDF/A markers survived)

Source that isn't actually self-contained

A PDF references Calibri without embedding it. This tool can't make it archival — the font data isn't present.

Check: File → Properties → Fonts
  Calibri ...... (NOT embedded)

Font Subsetter cannot embed it. Re-generate the PDF from the
source application with 'embed all fonts' enabled, THEN run
the subsetter to optimise the embedded result.
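The three states in the Fonts tab can be approximated from two facts in the PDF specification. First, an embedded font's descriptor carries its program under /FontFile, /FontFile2, or /FontFile3. Second, a subset font's name starts with a tag of six uppercase letters followed by a plus sign (for example ABCDEF+Calibri).

The Python sketch below assumes you have already read the BaseFont name and the font-descriptor keys with a PDF parser of your choice. For a Type 0 font, the descriptor lives on the descendant CIDFont. Type 3 fonts, whose glyphs are drawn by content streams, fall outside this check. The function name is hypothetical.

```python
import re

# Subset fonts are named with six uppercase letters and a "+", e.g. "ABCDEF+Calibri".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# A font descriptor stores its embedded program under one of these keys.
EMBED_KEYS = {"FontFile", "FontFile2", "FontFile3"}


def font_status(base_font, descriptor_keys):
    """Approximate Acrobat's Fonts-tab label from a BaseFont name and descriptor keys."""
    if not EMBED_KEYS & set(descriptor_keys):
        return "Not embedded"
    if SUBSET_TAG.match(base_font):
        return "Embedded Subset"
    return "Embedded"


print(font_status("Calibri", {"FontName", "Flags"}))             # Not embedded
print(font_status("ABCDEF+Calibri", {"FontName", "FontFile2"}))  # Embedded Subset
```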

Validating the output against veraPDF

After the size pass and PDF/A tagging, run a strict validator. Note that the bundled PDF/A converter targets broad acceptance, not strict veraPDF conformance.

After: /pdf-tools/pdf-to-pdfa
Run:   verapdf --flavour 1b output.pdf

Note: the PDF/A tagger writes the identifier + output intent that
most archive systems accept; strict veraPDF may still flag the
stub ICC profile. Use a desktop tool for hard conformance.
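To validate a batch of files before they enter storage, a thin Python wrapper can call the same check once per file. This sketch assumes the veraPDF command-line launcher is installed and on PATH as `verapdf`. It returns the exit code and raw report without interpreting them, because the report format depends on your veraPDF options. The function names are hypothetical.

```python
import shutil
import subprocess


def verapdf_command(pdf_path, flavour="1b", exe="verapdf"):
    """Build the veraPDF CLI call for one file."""
    return [exe, "--flavour", flavour, pdf_path]


def run_verapdf(pdf_path, flavour="1b"):
    """Run veraPDF on one file; return (exit code, report text)."""
    exe = shutil.which("verapdf")  # resolves verapdf.bat on Windows too
    if exe is None:
        raise FileNotFoundError("verapdf launcher not found on PATH")
    result = subprocess.run(
        verapdf_command(pdf_path, flavour, exe),
        capture_output=True,
        text=True,
        check=False,  # a failed validation is a result to record, not a crash
    )
    return result.returncode, result.stdout
```

Call `run_verapdf("output.pdf")` for each file and keep the reports alongside the archive record.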

Archiving a form after flattening

Interactive forms should be flattened before archiving so values are permanent. Flatten first, then run the size pass.

1. /pdf-tools/pdf-flatten         → bakes filled values into the page
2. /pdf-tools/pdf-font-subsetter  → size pass on the flattened result

Why: a flattened form has no live fields to drift; the static
text is then optimised by the re-save.
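The two recipes above (subset → tag → scrub, and flatten → subset) can be combined into one ordering rule. The Python sketch below returns the tool paths in run order. The function and flag names are hypothetical. Placing the flatten step first overall is an assumption drawn from the flatten recipe.

```python
# Tool paths as they appear in the recipes; function and flag names are hypothetical.
FLATTEN = "/pdf-tools/pdf-flatten"
SUBSET = "/pdf-tools/pdf-font-subsetter"
TAG_PDFA = "/pdf-tools/pdf-to-pdfa"
SCRUB = "/pdf-tools/pdf-metadata-scrubber"


def archival_pipeline(is_form, needs_pdfa, needs_scrub):
    """Return the tools to run, in order, for one archive document."""
    steps = []
    if is_form:
        steps.append(FLATTEN)  # bake field values in before anything else
    steps.append(SUBSET)  # the size pass always runs
    if needs_pdfa:
        steps.append(TAG_PDFA)  # tag after the size pass
    if needs_scrub:
        steps.append(SCRUB)  # scrub last, as in the subset -> tag -> scrub recipe
    return steps


print(archival_pipeline(is_form=True, needs_pdfa=True, needs_scrub=False))
# ['/pdf-tools/pdf-flatten', '/pdf-tools/pdf-font-subsetter', '/pdf-tools/pdf-to-pdfa']
```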

Edge cases and what actually happens

The source PDF isn't self-contained (fonts not embedded)

Cannot embed

This tool optimises fonts that are already embedded; it cannot add a font that was never in the file. A PDF that references unembedded system fonts will not become archival here — regenerate it from the source application with font embedding enabled, then run the size pass.

You expected validator-passing PDF/A from this tool

Not PDF/A

The font subsetter does not write the XMP pdfaid identifier, output intent, or 1.4 header that PDF/A validators require. Use PDF to PDF/A for tagging. Even that converter targets broad archive-system acceptance rather than strict veraPDF conformance — see its own guide.

Output is barely smaller than the input

By design

Many archive-grade PDFs are already produced with subsetted fonts and tight structure. If the size barely moves, the document was already efficient — that is a correct result, not a fault.

A proprietary licensed font is embedded

Check licence

Some commercial font EULAs restrict embedding, including in archival copies. This tool does not change licensing and does not convert text to outlines (no tool in this suite outlines text). Confirm your font licence permits embedding before committing the document to a long-term archive.
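Alongside the EULA, TrueType and OpenType fonts carry a machine-readable embedding-permission field: `fsType` in the OS/2 table. It is a technical signal rather than a licence, but its "no subsetting" flag is directly relevant to this tool. The Python sketch below decodes the flags. It assumes you have already read the integer value from the original font file; embedded copies sometimes omit the OS/2 table. The function name is hypothetical.

```python
def describe_fstype(fs_type):
    """Decode the embedding-permission flags of an OS/2 fsType value."""
    notes = []
    usage = fs_type & 0x000F  # bits 0-3 hold the usage permission
    if usage == 0:
        notes.append("installable embedding")
    if usage & 0x0002:
        notes.append("restricted-licence embedding")
    if usage & 0x0004:
        notes.append("preview & print embedding")
    if usage & 0x0008:
        notes.append("editable embedding")
    if fs_type & 0x0100:
        notes.append("no subsetting")  # the font must be embedded whole
    if fs_type & 0x0200:
        notes.append("bitmap embedding only")
    return notes


print(describe_fstype(0x0104))  # ['preview & print embedding', 'no subsetting']
```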

A CID/Type 0 font program won't parse

Preserved

If fontkit can't parse an unusual embedded program, that font is left intact and the rest of the document is still re-saved. Self-containment is never broken by a parse failure — at worst one font isn't analysed.
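The parse-or-preserve behaviour is a simple fallback pattern. The tool does this with fontkit in the browser. The Python sketch below substitutes a hypothetical `parse` callable to show why a parse failure can never remove a font: whenever analysis raises, the original program bytes are kept.

```python
def analyse_fonts(fonts, parse):
    """Try to analyse each embedded font program; keep the original on failure.

    fonts: hypothetical mapping of font name -> embedded program bytes.
    parse: stand-in for a real font parser; raises on programs it can't read.
    """
    results = {}
    for name, program in fonts.items():
        try:
            results[name] = ("analysed", parse(program))
        except Exception:
            # Unparseable programs pass through byte-for-byte, so the
            # document never loses a font it already embedded.
            results[name] = ("preserved", program)
    return results


def demo_parse(program):
    # Toy parser for illustration only: rejects anything starting with b"CID".
    if program.startswith(b"CID"):
        raise ValueError("unsupported font program")
    return len(program)


print(analyse_fonts({"F1": b"OTTOdata", "F2": b"CIDdata"}, demo_parse))
# {'F1': ('analysed', 8), 'F2': ('preserved', b'CIDdata')}
```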

Privacy metadata is still in the file

Scrub separately

The re-save carries document metadata (author, creation/modification dates, producer) through unchanged. For an archive that must not leak who created the file or when, run Metadata Scrubber as a separate step.

File exceeds the tier limit

Rejected

Free rejects PDFs over 2 MB or 50 pages before processing. Bulk archival work is better on Pro (50 MB / 500 pages, 5-file batches) or Pro Media (500 MB).

Encrypted archive document

Usually supported

The tool reads with encryption ignored, so most protected files process. If a strongly encrypted file fails to load, remove the password first with Remove Password, run the size pass, then re-apply protection if your retention policy requires it. Note that PDF/A does not permit encryption, so a file you re-protect cannot also be a conforming PDF/A file.
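Before a bulk run, it can help to set encrypted files aside so they can be unlocked first. Below is a rough Python heuristic. It assumes that an encrypted PDF's trailer or cross-reference stream dictionary contains an /Encrypt entry, and that this entry stays readable as plain bytes even when object streams are used. The function names are hypothetical, and a full PDF parser gives a definitive answer.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes):
    """Heuristic: encrypted PDFs carry an /Encrypt entry in the trailer
    (or cross-reference stream dictionary), which is never compressed."""
    return b"/Encrypt" in pdf_bytes


def split_by_encryption(paths):
    """Return (plain, encrypted) lists so encrypted files can be unlocked first."""
    plain, encrypted = [], []
    for path in paths:
        target = encrypted if looks_encrypted(Path(path).read_bytes()) else plain
        target.append(path)
    return plain, encrypted
```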

Frequently asked questions

Does this tool make my PDF PDF/A-compliant?

No. It is a font-aware size pass — it analyses embedded fonts and re-saves with object streams. It does not add the XMP identifier, output intent, or PDF version header that PDF/A validators check. For tagging, use the dedicated PDF to PDF/A converter, and for strict conformance verify with a desktop tool or veraPDF afterwards.

Will my archived document still look identical in 30 years?

If the fonts were embedded in the source, yes — embedded subsetted fonts render the document from its own data regardless of what's installed on a future reader. This tool preserves that self-containment. What it cannot do is create self-containment that wasn't already there; a PDF with unembedded fonts stays non-self-contained.

Does it reduce my archive's storage footprint?

Often, yes — the object-stream re-save cuts structural overhead, which adds up across thousands of documents. The amount depends on how the source was produced; files already optimised will see little change. The result panel reports exact before/after sizes per file.

Can it embed fonts that my archive PDFs are missing?

No. It only works with fonts already embedded in the file. To fix a missing-font situation, regenerate the PDF from its source application with 'embed all fonts' enabled, then run this tool to optimise the embedded result.

Is processing private enough for sensitive records?

Yes. Everything runs in your browser: pdf-lib, pdf.js, and fontkit load into the tab, and the file is never uploaded. That makes it suitable for HR files, legal holds, and medical archives where the document must not touch a third-party server.

Should I flatten forms before archiving?

Yes, if the document is an interactive form. Use PDF Flatten first so filled values become permanent static text, then run this size pass. A flattened form has no live fields that could change in a future reader.

Will it strip the metadata I don't want in my archive?

No — metadata (author, dates, producer) is carried through unchanged. If your archival policy requires removing identifying metadata, run the Metadata Scrubber as a separate step before or after this tool.

What about proprietary fonts with restrictive licences?

This tool doesn't change licensing and doesn't outline text. If a font's EULA restricts embedding for archival, that's a legal question to resolve with the font vendor — none of the tools in this suite convert text to vector outlines as an escape hatch.

Can I batch-process a whole archive folder?

On the free tier it's one file at a time. Pro allows 5 PDFs per batch. For a large archive, Pro or Pro Media is the practical choice; otherwise process sequentially.
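Planning runs for a folder means splitting the file list into chunks of the tier's batch size: 1 on free, 5 on Pro and Pro Media. Below is a Python sketch with a hypothetical function name.

```python
def plan_batches(files, batch_size):
    """Split files into consecutive runs of at most batch_size (1 free, 5 Pro)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


runs = plan_batches([f"record-{n:02d}.pdf" for n in range(12)], batch_size=5)
print([len(r) for r in runs])  # [5, 5, 2]
```

Twelve files on Pro therefore take three runs: 5, 5, and 2.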

Does the round-trip keep bookmarks and the text layer?

Yes. Outlines/bookmarks, page content, and ToUnicode maps are preserved, so archived documents remain navigable and searchable after the re-save.
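A quick way to confirm the text layer survived is to compare text extracted before and after the re-save. This Python sketch only does the comparison, and it assumes you extract the text with a tool of your choice. It normalises whitespace and Unicode composition so that layout-only differences don't count as losses. The function names are hypothetical.

```python
import unicodedata


def normalise(text):
    """NFC-normalise and collapse whitespace so layout-only changes are ignored."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def text_survived(before, after):
    """True if the extracted text is unchanged apart from whitespace/composition."""
    return normalise(before) == normalise(after)


print(text_survived("Deposition of\n  J. Doe", "Deposition of J. Doe"))  # True
print(text_survived("Caf\u00e9", "Cafe"))  # False
```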

Is the largest archive file I can process really 500 MB?

On Pro Media, yes: the cap for the PDF tools is 500 MB. Pro is 50 MB and free is 2 MB. The page cap is 500 on Pro and Pro Media and 50 on free. Files over the limit are rejected before processing.

Why would I run this if my files are already PDF/A?

Purely for storage efficiency. An existing PDF/A with full fonts or loosely packed structure can still shrink on the object-stream re-save while keeping its embedded fonts. Re-validate afterwards to confirm the PDF/A tagging survived. Object streams were introduced in PDF 1.5 and PDF/A-1 is based on PDF 1.4, so a PDF/A-1 file in particular may no longer pass strict validation after this re-save.

Privacy first

All PDF processing runs locally in your browser using pdf-lib, pdf.js, and fontkit. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
