Lossless PDF Compression for Long-Term Archiving

How to losslessly compress PDFs for document archiving

Step 1
Decide the order: OCR, then PDF/A, then compress. For archival scans, add the searchable text layer with OCR and the conformance marker with the PDF/A converter first. Lossless compression is the final structural step, so a later re-save by another tool doesn't undo the packing.
Step 2
Open the lossless compressor. Go to PDF Compress (Lossless) and drop one record. The header shows its size and page count.
Step 3
Let the rebuild run. There are no settings: it runs automatically, copying pages into a new document, clearing Producer/Creator, and re-saving with compressed object streams.
Step 4
Verify fidelity before filing. Open the output and the source at high zoom on a critical element such as a signature, a stamp, or fine print. They should be pixel-identical, because page content is copied rather than re-encoded.
Step 5
Re-confirm PDF/A status if required. If the record must be PDF/A, validate the output after compressing. Compression re-saves the file, which is a structural change, so where strict conformance is mandated run your validator, or re-run the PDF/A converter, on the compressed copy.
Step 6
Ingest into your archive. Store the optimised PDF in your DMS or cold storage. It's a standard, widely readable PDF; record the new size in your storage ledger if you track it.

What lossless preserves vs. what archival standards add

Compression optimises structure; PDF/A conformance is a separate concern. Don't conflate the two — this tool does the former, not the latter.

Concern	Handled by lossless compress?	Handled by	Notes
Faithful appearance over time	Yes	This tool	Pages copied, never re-encoded
Selectable / searchable text retained	Yes (if already present)	This tool	Add it first with OCR for scans
Smaller storage footprint	Yes	This tool	Object packing + dead-object pruning
PDF/A conformance marker	No	PDF/A converter	Adds the tag; re-validate after compressing
Fonts fully embedded	Preserved as-is, not added	PDF/A converter / font subsetter	Lossless doesn't embed missing fonts
Thorough metadata removal	Producer/Creator only	Metadata Scrubber	Use the scrubber for full governance

Expected archival savings by record type

Approximate. Edited and merged records hold the most reclaimable structure; image-only scans barely move because lossless never re-encodes pixels.

Record type	Typical lossless reduction	Archival note
Born-digital report (edited/merged)	15-35%	Best candidate — dead objects dominate the slack
Born-digital, single-source export	5-20%	Mostly object packing + metadata
OCR'd scan (image + text layer)	0-10%	Bytes are in the image; lossless can't re-encode it
Image-only scan (no OCR)	0-10%	Add OCR first for retrieval; compress for structure only
Already PDF/A + optimised	0-5%	Near its lossless floor; little to reclaim
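
To turn the table into a rough footprint estimate, the ranges can be applied to the total size held for each record type. The following is a minimal sketch, assuming you already know how many megabytes of each type your repository holds; the dictionary keys and the `estimate_savings` name are illustrative and not part of the tool.

```python
# Reduction ranges copied from the table above, as (low, high) fractions.
REDUCTION_RANGES = {
    "edited_or_merged": (0.15, 0.35),
    "single_source_export": (0.05, 0.20),
    "ocr_scan": (0.00, 0.10),
    "image_only_scan": (0.00, 0.10),
    "already_optimised_pdfa": (0.00, 0.05),
}


def estimate_savings(sizes_mb):
    """Return (low, high) megabytes saved for a {record_type: total_mb} mapping."""
    low = sum(mb * REDUCTION_RANGES[kind][0] for kind, mb in sizes_mb.items())
    high = sum(mb * REDUCTION_RANGES[kind][1] for kind, mb in sizes_mb.items())
    return low, high


# Hypothetical repository: 400 GB of merged case files, 600 GB of image-only scans.
low, high = estimate_savings({"edited_or_merged": 400_000, "image_only_scan": 600_000})
print(f"Estimated saving: {low / 1000:.0f} to {high / 1000:.0f} GB")
```

The spread between the two figures is wide on purpose: the table gives approximate ranges, so a representative sample run is still the better estimate.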

Cookbook

Archival workflows and outcomes. Sizes are illustrative; the tool reports exact figures.

A repository of merged case files

Each file was assembled by merging correspondence, leaving duplicate fonts and orphaned objects. Lossless prunes them across the batch without altering any page.

Input:  case-2019-0457.pdf   31 MB, 240 pages
Output: case-2019-0457.pdf   20 MB, 240 pages
Result: ~35% off, pages visually identical, text searchable
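
The percentage in the result line is plain arithmetic on the two sizes. A small sketch for a storage ledger follows; the function name is illustrative.

```python
def reduction_percent(before, after):
    """Percentage by which the file shrank; negative if it grew."""
    if before <= 0:
        raise ValueError("source size must be positive")
    return (before - after) / before * 100


# Sizes from the example above, in MB.
print(f"{reduction_percent(31, 20):.1f}%")  # prints 35.5%
```

A negative result flags the rare case where the output is slightly larger than the source.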

OCR then PDF/A then compress (correct order)

An archival scan gets its text layer and conformance marker first; lossless compression is the last structural step so nothing undoes it.

1. /pdf-tools/pdf-ocr        → adds searchable text layer
2. /pdf-tools/pdf-to-pdfa    → adds PDF/A conformance marker
3. /pdf-tools/pdf-compress-lossless → structural shrink (final)
Result: searchable, conformant, smaller — re-validate PDF/A after

An image-only scan that barely shrinks

A pure scan has its bytes in the pixels; lossless can't re-encode them. Add OCR for retrieval, but don't expect a size drop from this tool.

Input:  ledger-1987.pdf   18 MB, 12 pages (image-only)
Lossless output: 17.2 MB  (~4%)
Note: run /pdf-tools/pdf-ocr for searchability; size stays put

Re-validating PDF/A after compression

Because compression re-saves the file, strict archives should re-check conformance afterward rather than assume it carried through.

Before: report.pdf  PDF/A-2b valid
After compress: re-run /pdf-tools/pdf-to-pdfa or your validator
Why: re-saving changes structure; confirm the marker survived
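
As a quick pre-check before running a full validator, the PDF/A identification entry in the XMP metadata can be looked for directly in the file bytes. This is a heuristic sketch only: it assumes the XMP packet is stored uncompressed and in a single-byte encoding, and finding the marker does not prove conformance. The function name is illustrative.

```python
import re

# The marker appears either as an element (<pdfaid:part>2</pdfaid:part>)
# or as an attribute (pdfaid:part="2"), so both forms are matched.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def pdfa_marker(pdf_bytes: bytes):
    """Return a string such as '2B' if a PDF/A marker is visible, else None."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    level = conf.group(1).decode().upper() if conf else ""
    return part.group(1).decode() + level
```

Comparing the value for the source and the compressed output shows whether the marker survived; a real validator is still needed to confirm the file conforms.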

Confirming fidelity for the archive of record

For records of legal or regulatory significance, verify pixel-identity at high zoom before ingesting the compressed copy.

1. Open source + output at 500% on the official seal
2. Pixels match exactly; the stamp is unchanged
3. Text still selects → retrieval indexing preserved
Approved for ingest
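
The manual zoom check can be complemented by an automated comparison. The sketch below assumes each page of both files has already been rendered to raw pixel bytes at the same resolution by a renderer of your choice (rendering itself is not shown); `raster_digest` and `differing_pages` are hypothetical helper names.

```python
import hashlib


def raster_digest(pixels: bytes) -> str:
    """SHA-256 of one rendered page's raw pixel buffer."""
    return hashlib.sha256(pixels).hexdigest()


def differing_pages(source_pages, output_pages):
    """Return indexes of pages whose rendered pixels differ (empty list = identical)."""
    if len(source_pages) != len(output_pages):
        raise ValueError("page counts differ")
    return [
        index
        for index, (src, out) in enumerate(zip(source_pages, output_pages))
        if raster_digest(src) != raster_digest(out)
    ]
```

Storing the digests alongside the record also gives a later audit something to compare against.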

Edge cases and what actually happens

Expecting compression to make a file PDF/A-compliant

Not this tool

Lossless compression optimises structure; it does not add the embedded fonts, the conformance marker, or (for level A conformance) the tagged structure that PDF/A requires. Use the PDF/A converter for compliance. And because compression re-saves the file, re-validate conformance afterward rather than assuming a PDF/A status survives a structural rewrite.

Image-only scan barely shrinks

By design

Lossless never re-encodes image pixels, so a pure scan stays near its original size. That's the archival guarantee, not a fault — the stored image is preserved exactly. Add searchable text with OCR for retrieval, but expect the size to stay put. Genuine scan size reduction requires the lossy tool, which trades away fidelity.

Record carries a digital signature / seal

Signature invalidated

Compressing rewrites the file's bytes, so the hash that the signature covers no longer matches and a sealed record will fail validation. Where the signature is the proof of authenticity, archive the original signed file and compress only unsigned working copies. Check status with Verify Signature.

Encrypted / access-controlled record

Loaded with ignoreEncryption

The engine opens encrypted files (encryption ignored on load), but the compressed output is not re-encrypted. If archive policy requires the file to remain protected, remove the password with Remove Password, compress, then re-apply protection per your retention rules.
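
Both this case and the signed-record case above can be flagged before a file is compressed. The sketch below is a byte-level heuristic, not a parser: it relies on signature dictionaries carrying a `/ByteRange` entry and on encrypted files referencing an `/Encrypt` dictionary. The function name is illustrative, and Verify Signature remains the authoritative check.

```python
def preflight_flags(pdf_bytes: bytes) -> dict:
    """Heuristic byte scan; a False result is not proof of absence."""
    return {
        # Signature dictionaries describe the signed bytes with /ByteRange.
        "signed": b"/ByteRange" in pdf_bytes,
        # Encrypted files point to their encryption dictionary with /Encrypt.
        "encrypted": b"/Encrypt" in pdf_bytes,
    }
```

A migration script could route any record with a true flag to manual review instead of compressing it.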

Record exceeds your tier's size limit

Blocked

PDF caps by plan: Free 2 MB, Pro 50 MB, Pro+Media 500 MB, Developer 2 GB. Large case files and bound volumes often exceed lower tiers and are blocked with an upgrade prompt. Split with Split by Range or choose a plan sized for your archive.

Record exceeds your tier's page limit

Blocked

Page caps: Free 50, Pro 500, Pro+Media 2,000, Developer 10,000. Long bound records over the cap won't process. Break them into logical volumes with Split by Range before compressing.
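
The size and page caps can be checked together before a batch is queued. This is a minimal sketch, assuming decimal megabytes (1 MB = 1,000,000 bytes, 2 GB = 2,000 MB) and that a record exactly at a cap is accepted; neither detail is stated by the tool, and the names are illustrative.

```python
from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass(frozen=True)
class Plan:
    max_bytes: int
    max_pages: int


# Caps as listed above, lowest tier first.
PLANS = {
    "Free": Plan(2 * MB, 50),
    "Pro": Plan(50 * MB, 500),
    "Pro+Media": Plan(500 * MB, 2_000),
    "Developer": Plan(2_000 * MB, 10_000),
}


def smallest_plan(size_bytes, pages):
    """Name of the lowest tier that accepts the record, or None if none does."""
    for name, plan in PLANS.items():
        if size_bytes <= plan.max_bytes and pages <= plan.max_pages:
            return name
    return None


# The merged case file from the cookbook: 31 MB, 240 pages.
print(smallest_plan(31 * MB, 240))  # prints Pro
```

A result of `None` means the record needs splitting with Split by Range before any tier will process it.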

Producer/Creator cleared, but you needed full metadata removal

Producer/Creator only

This tool blanks only the Producer and Creator fields; other metadata copied with the pages survives. For archives with strict privacy or chain-of-custody metadata rules, run the Metadata Scrubber to control author, title, keywords, and XMP comprehensively.

Corrupt or partially-downloaded record

Error

If pdf-lib can't parse the file, compression can't run. Damaged archival files should go through Repair PDF first — which is good practice for any record showing read errors before long-term storage.
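
Truncated downloads can often be spotted before they reach the compressor. The sketch below checks for the `%PDF-` header near the start and the `%%EOF` marker near the end of the file; it is a coarse heuristic, and a file that passes can still fail to parse. The function name is illustrative.

```python
def looks_complete(pdf_bytes: bytes) -> bool:
    """True if a PDF header appears near the start and an EOF marker near the end."""
    has_header = b"%PDF-" in pdf_bytes[:1024]
    has_eof = b"%%EOF" in pdf_bytes[-1024:]
    return has_header and has_eof
```

Records that fail this check are candidates for Repair PDF or for re-fetching from the source system.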

Already-optimised PDF/A shows almost no reduction

Expected

A record already optimised and conformant is near its lossless floor; you may see only a percent or two from clearing metadata. The output is still valid and faithful — there was simply no structural slack left to reclaim.

Output marginally larger than the source

Possible

On a small, already-tight file the object-stream overhead can net out slightly larger. The pages are preserved exactly, so it remains archival-safe — but if size is your goal and the file is image-heavy, lossless isn't the right lever.

Frequently asked questions

Is lossless compression safe for documents of record?

Yes. It copies every page into a new document without re-encoding images or rasterising text, so the stored copy is a faithful, pixel-identical reproduction of the source — exactly what an archive of record requires. The size reduction comes only from packing the structure and dropping unused objects. Verify by comparing source and output at high zoom before ingesting.

Does this make my PDF PDF/A-compliant?

No. This tool optimises structure; it does not add PDF/A's required embedded fonts or conformance marker, nor the tagged structure that level A conformance requires. Use the PDF/A converter for compliance. And since compression re-saves the file, re-validate PDF/A status afterward: confirm the conformance marker survived the structural rewrite rather than assuming it did.

What order should I run OCR, PDF/A, and compression?

For archival scans: OCR first to add the searchable text layer, then the PDF/A converter for conformance, then lossless compression last as the final structural step. Doing compression last means a later step won't undo it; doing it before re-validation means you confirm conformance on the file you'll actually store.

How much storage will lossless compression save across an archive?

Typically 15-35% on born-digital records that were edited or merged, where orphaned objects dominate the reclaimable slack. Single-source exports save less (5-20%), and image-only scans barely move because lossless can't re-encode pixels. Run it across a representative sample to estimate the footprint reduction for your repository.

Will full-text search still work on the compressed records?

Yes. Lossless preserves the existing text objects, so any selectable/searchable text — including an OCR layer added beforehand — stays intact, keeping your DMS full-text indexing and retrieval working. This is a decisive advantage over lossy compression, which flattens pages to images and destroys the searchable layer.

Why didn't my scanned record get smaller?

Because a scan stores its content as image pixels, and lossless compression never re-encodes images — by design, to keep the archived copy faithful. The bytes simply aren't structural slack the tool can reclaim. Add OCR for retrieval, accept that the size stays put, and reserve image re-encoding (the lossy tool) for non-archival copies where fidelity isn't required.

Does compressing remove sensitive metadata for compliance?

Only the Producer and Creator fields are cleared. Other metadata copied with the pages — author, title, keywords, XMP — survives. For archives with privacy or chain-of-custody requirements, run the Metadata Scrubber, which gives comprehensive control over what metadata is retained or removed.

Is anything uploaded when I compress an archival record?

No. Compression runs entirely in your browser, so confidential or regulated records never leave the archivist's machine — important for legal, medical, and government archives. Only an anonymous usage counter is recorded server-side if you're signed in, never document content.

What about signed or sealed records?

Don't compress them if signature validity must be preserved — re-saving the file breaks the cryptographic hash and invalidates the signature. Archive the original signed file as the record of authenticity, and apply compression only to unsigned working copies. Check a signature's status with Verify Signature.

What are the size and page limits for archival files?

By plan: Free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages. Large bound volumes and case files frequently exceed lower tiers and are blocked with an upgrade prompt; split them with Split by Range or use a plan sized for your collection.

Will the output open in 10 or 20 years?

It uses compressed object streams, a core PDF feature since version 1.5 (2003) supported universally today, so longevity is good. For maximum archival durability, store it as PDF/A via the PDF/A converter — PDF/A is the ISO standard purpose-built for long-term readability, and lossless compression is compatible with it as long as you re-validate after compressing.

Can I compress an entire archive in batch?

Yes, through the @jadapps/runner. Fetch the schema with GET /api/v1/tools/pdf-compress-lossless, pair the runner once, then POST each record to 127.0.0.1:9789/v1/tools/pdf-compress-lossless/run. Processing is local, so nothing leaves your network — ideal for a migration script that optimises a repository before moving it to cold storage.
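
A migration script along those lines might look like the sketch below. The endpoint is the one given above, but the request and response shapes are assumptions: the sketch posts the raw PDF as the body and treats the response body as the compressed PDF, which may not match the real schema. Fetch the schema and complete the pairing step first; `build_request` and `compress_archive` are hypothetical names.

```python
import urllib.request
from pathlib import Path

RUNNER_URL = "http://127.0.0.1:9789/v1/tools/pdf-compress-lossless/run"


def build_request(pdf_path: Path) -> urllib.request.Request:
    # Assumption: the runner accepts the raw PDF as the POST body.
    # Confirm against the schema from GET /api/v1/tools/pdf-compress-lossless.
    return urllib.request.Request(
        RUNNER_URL,
        data=pdf_path.read_bytes(),
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def compress_archive(src_dir: Path, dst_dir: Path, send=urllib.request.urlopen):
    """POST every PDF in src_dir and write each response body to dst_dir.

    Returns {filename: (source_bytes, output_bytes)} for the storage ledger.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for pdf_path in sorted(src_dir.glob("*.pdf")):
        with send(build_request(pdf_path)) as response:
            body = response.read()  # assumption: the body is the compressed PDF
        (dst_dir / pdf_path.name).write_bytes(body)
        results[pdf_path.name] = (pdf_path.stat().st_size, len(body))
    return results
```

Calling `compress_archive(Path("archive/in"), Path("archive/out"))` would process one folder; writing to a separate output folder keeps the originals untouched, which matters for signed records.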

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only usage counters are saved for signed-in dashboard stats.

