How to use lossless PDF compression for document archiving
- Step 1: Decide the order (OCR, then PDF/A, then compress). For archival scans, add the searchable text layer with OCR and the conformance marker with the PDF/A converter first. Run lossless compression last so that no later step rewrites the file and undoes it.
- Step 2: Open the lossless compressor. Go to PDF Compress (Lossless) and drop in one record. The header shows its size and page count.
- Step 3: Let the rebuild run. There are no settings. The tool runs automatically: it copies the pages into a new document, clears the Producer and Creator fields, and re-saves with compressed object streams.
- Step 4: Verify fidelity before filing. Open the output and the source at high zoom on a critical element such as a signature, a stamp, or fine print. They should be pixel-identical because nothing was re-encoded.
- Step 5: Re-confirm PDF/A status if required. Compression re-saves the file, which is a structural change. If the record must be PDF/A, re-run the PDF/A converter or your validator on the compressed output.
- Step 6: Ingest into your archive. Store the optimised PDF in your DMS or cold storage. It is a standard, widely readable PDF. Record the new size in your storage ledger if you track it.
What lossless preserves vs. what archival standards add
Compression optimises structure; PDF/A conformance is a separate concern. Don't conflate the two — this tool does the former, not the latter.
| Concern | Handled by lossless compress? | Handled by | Notes |
|---|---|---|---|
| Faithful appearance over time | Yes | This tool | Pages copied, never re-encoded |
| Selectable / searchable text retained | Yes (if already present) | This tool | Add it first with OCR for scans |
| Smaller storage footprint | Yes | This tool | Object packing + dead-object pruning |
| PDF/A conformance marker | No | PDF/A converter | Adds the tag; re-validate after compressing |
| Fonts fully embedded | Preserved as-is, not added | PDF/A converter / font subsetter | Lossless doesn't embed missing fonts |
| Thorough metadata removal | Producer/Creator only | Metadata Scrubber | Use the scrubber for full governance |
Expected archival savings by record type
Approximate. Edited and merged records hold the most reclaimable structure; image-only scans barely move because lossless never re-encodes pixels.
| Record type | Typical lossless reduction | Archival note |
|---|---|---|
| Born-digital report (edited/merged) | 15-35% | Best candidate — dead objects dominate the slack |
| Born-digital, single-source export | 5-20% | Mostly object packing + metadata |
| OCR'd scan (image + text layer) | 0-10% | Bytes are in the image; lossless can't re-encode it |
| Image-only scan (no OCR) | 0-10% | Add OCR first for retrieval; compress for structure only |
| Already PDF/A + optimised | 0-5% | Near its lossless floor; little to reclaim |
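For rough capacity planning, the ranges in the table can be applied to the total size held under each record type. The sketch below does only that arithmetic. The category keys and the `estimate_savings` helper are hypothetical names introduced for illustration, and the percentages are copied from the table, so the result is as approximate as the table itself.

```python
# Reduction ranges (low, high) as fractions, copied from the table above.
# The dictionary keys are illustrative labels, not identifiers used by the tool.
REDUCTION_RANGES = {
    "born_digital_edited": (0.15, 0.35),
    "born_digital_single_source": (0.05, 0.20),
    "ocr_scan": (0.00, 0.10),
    "image_only_scan": (0.00, 0.10),
    "pdfa_optimised": (0.00, 0.05),
}


def estimate_savings(sizes_mb):
    """Return (low, high) estimated savings in MB.

    sizes_mb maps a record type to the total size in MB stored under it.
    """
    low = sum(size * REDUCTION_RANGES[kind][0] for kind, size in sizes_mb.items())
    high = sum(size * REDUCTION_RANGES[kind][1] for kind, size in sizes_mb.items())
    return low, high


if __name__ == "__main__":
    # Example archive: 1,000 MB of merged reports, 4,000 MB of image-only scans.
    archive = {"born_digital_edited": 1000.0, "image_only_scan": 4000.0}
    low, high = estimate_savings(archive)
    print(f"Estimated savings: {low:.0f} to {high:.0f} MB")
```

A representative sample run through the tool will give a tighter figure than these ranges.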
Cookbook
Archival workflows and outcomes. Sizes are illustrative; the tool reports exact figures.
A repository of merged case files
Each file was assembled by merging correspondence, leaving duplicate fonts and orphaned objects. Lossless prunes them across the batch without altering any page.
- Input: case-2019-0457.pdf, 31 MB, 240 pages
- Output: case-2019-0457.pdf, 20 MB, 240 pages
- Result: ~35% off, content byte-identical, text searchable
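The reduction figure in this example is plain arithmetic on the two sizes. A minimal sketch, using a hypothetical `reduction_percent` helper that is not part of the tool, shows how a storage ledger might compute it:

```python
def reduction_percent(source_mb, output_mb):
    """Percentage by which the output is smaller than the source.

    The result is negative when the output is larger than the source.
    """
    if source_mb <= 0:
        raise ValueError("source size must be positive")
    return (source_mb - output_mb) / source_mb * 100.0


if __name__ == "__main__":
    # The merged case file above: 31 MB in, 20 MB out.
    print(f"{reduction_percent(31, 20):.1f}%")  # prints 35.5%
```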
OCR then PDF/A then compress (correct order)
An archival scan gets its text layer and conformance marker first; lossless compression is the last structural step so nothing undoes it.
1. /pdf-tools/pdf-ocr → adds searchable text layer
2. /pdf-tools/pdf-to-pdfa → adds PDF/A conformance marker
3. /pdf-tools/pdf-compress-lossless → structural shrink (final)

Result: searchable, conformant, smaller. Re-validate PDF/A after.
An image-only scan that barely shrinks
A pure scan has its bytes in the pixels; lossless can't re-encode them. Add OCR for retrieval, but don't expect a size drop from this tool.
- Input: ledger-1987.pdf, 18 MB, 12 pages (image-only)
- Lossless output: 17.2 MB (~4%)
- Note: run /pdf-tools/pdf-ocr for searchability; size stays put
Re-validating PDF/A after compression
Because compression re-saves the file, strict archives should re-check conformance afterward rather than assume it carried through.
- Before: report.pdf, PDF/A-2b valid
- After compress: re-run /pdf-tools/pdf-to-pdfa or your validator
- Why: re-saving changes structure; confirm the marker survived
Confirming fidelity for the archive of record
For records of legal or regulatory significance, verify pixel-identity at high zoom before ingesting the compressed copy.
1. Open source + output at 500% on the official seal
2. Pixels match exactly; the stamp is unchanged
3. Text still selects → retrieval indexing preserved

Approved for ingest
Edge cases and what actually happens
Expecting compression to make a file PDF/A-compliant
Not this tool: Lossless compression optimises structure; it does not add the embedded fonts, tagged structure, or conformance marker that PDF/A requires. Use the PDF/A converter for compliance. And because compression re-saves the file, re-validate conformance afterward; don't assume a PDF/A status survives a structural rewrite unchecked.
Image-only scan barely shrinks
By design: Lossless never re-encodes image pixels, so a pure scan stays near its original size. That's the archival guarantee, not a fault: the stored image is preserved exactly. Add searchable text with OCR for retrieval, but expect the size to stay put. Genuine scan size reduction requires the lossy tool, which trades away fidelity.
Record carries a digital signature / seal
Signature invalidated: Compressing rewrites the bytes and breaks the cryptographic hash a signature covers, so a sealed record will no longer validate. For signed records of authenticity, archive the original signed file (compress only unsigned working copies). Check status with Verify Signature.
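A batch script can flag probable signed records before they reach the compressor. The sketch below is a crude heuristic, not how Verify Signature works: it looks for the `/ByteRange` key that PDF signature dictionaries carry. The `looks_signed` and `file_looks_signed` helpers are hypothetical. A match means "treat as signed and archive the original"; the absence of a match is not proof that a file is unsigned.

```python
from pathlib import Path


def looks_signed(pdf_bytes):
    """Heuristic check for a digital signature.

    PDF signature dictionaries include a /ByteRange entry that lists the
    byte spans covered by the signature hash. Finding that key anywhere in
    the file is treated as a reason to skip compression.
    """
    return b"/ByteRange" in pdf_bytes


def file_looks_signed(path):
    """Apply the same heuristic to a file on disk."""
    return looks_signed(Path(path).read_bytes())


if __name__ == "__main__":
    sample = b"%PDF-1.7\n5 0 obj\n<< /Type /Sig /ByteRange [0 100 200 50] >>\nendobj\n"
    print(looks_signed(sample))
```

The check errs on the side of caution: a false positive only means a record is archived uncompressed.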
Encrypted / access-controlled record
Loaded with ignoreEncryption: The engine opens encrypted files (encryption ignored on load), but the compressed output is not re-encrypted. If archive policy requires the file to remain protected, remove the password with Remove Password, compress, then re-apply protection per your retention rules.
Record exceeds your tier's size limit
Blocked: PDF size caps by plan are Free 2 MB, Pro 50 MB, Pro+Media 500 MB, Developer 2 GB. Large case files and bound volumes often exceed lower tiers and are blocked with an upgrade prompt. Split with Split by Range or choose a plan sized for your archive.
Record exceeds your tier's page limit
Blocked: Page caps by plan are Free 50, Pro 500, Pro+Media 2,000, Developer 10,000. Long bound records over the cap won't process. Break them into logical volumes with Split by Range before compressing.
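The size and page caps for each plan can be checked before a record is dropped into the tool. The sketch below encodes the published caps and returns the smallest plan that accepts a given record. Two points are assumptions: that the caps are inclusive, and that 2 GB means 2,048 MB. The `smallest_plan` helper is a hypothetical name, not part of the product.

```python
# (plan name, max size in MB, max pages), in ascending order.
# ASSUMPTION: "2 GB" is treated as 2048 MB and all caps are inclusive.
PLAN_LIMITS = [
    ("Free", 2, 50),
    ("Pro", 50, 500),
    ("Pro+Media", 500, 2000),
    ("Developer", 2048, 10000),
]


def smallest_plan(size_mb, pages):
    """Return the first plan whose size and page caps both fit, else None."""
    for name, max_mb, max_pages in PLAN_LIMITS:
        if size_mb <= max_mb and pages <= max_pages:
            return name
    return None  # exceeds every tier; split the record first


if __name__ == "__main__":
    # The 31 MB, 240-page case file from the cookbook.
    print(smallest_plan(31, 240))
```

A `None` result is the cue to split the record with Split by Range before compressing.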
Producer/Creator cleared, but you needed full metadata removal
Producer/Creator only: This tool blanks only the Producer and Creator fields; other metadata copied with the pages survives. For archives with strict privacy or chain-of-custody metadata rules, run the Metadata Scrubber to control author, title, keywords, and XMP comprehensively.
Corrupt or partially-downloaded record
Error: If pdf-lib can't parse the file, compression can't run. Damaged archival files should go through Repair PDF first, which is good practice for any record showing read errors before long-term storage.
Already-optimised PDF/A shows almost no reduction
Expected: A record already optimised and conformant is near its lossless floor; you may see only a percent or two from clearing metadata. The output is still valid and faithful; there was simply no structural slack left to reclaim.
Output marginally larger than the source
Possible: On a small, already-tight file the object-stream overhead can net out slightly larger. The pages are preserved exactly, so it remains archival-safe. If size is your goal and the file is image-heavy, lossless isn't the right lever.
Frequently asked questions
Is lossless compression safe for documents of record?
Yes. It copies every page into a new document without re-encoding images or rasterising text, so the stored copy is a faithful, pixel-identical reproduction of the source — exactly what an archive of record requires. The size reduction comes only from packing the structure and dropping unused objects. Verify by comparing source and output at high zoom before ingesting.
Does this make my PDF PDF/A-compliant?
No. This tool optimises structure; it does not add PDF/A's required embedded fonts, tagged structure, or conformance marker. Use the PDF/A converter for compliance. And since compression re-saves the file, re-validate PDF/A status afterward — a structural rewrite means you should confirm the conformance marker survived rather than assume it.
What order should I run OCR, PDF/A, and compression?
For archival scans: OCR first to add the searchable text layer, then the PDF/A converter for conformance, then lossless compression last as the final structural step. Doing compression last means a later step won't undo it; doing it before re-validation means you confirm conformance on the file you'll actually store.
How much storage will lossless compression save across an archive?
Typically 15-35% on born-digital records that were edited or merged, where orphaned objects dominate the reclaimable slack. Single-source exports save less (5-20%), and image-only scans barely move because lossless can't re-encode pixels. Run it across a representative sample to estimate the footprint reduction for your repository.
Will full-text search still work on the compressed records?
Yes. Lossless preserves the existing text objects, so any selectable/searchable text — including an OCR layer added beforehand — stays intact, keeping your DMS full-text indexing and retrieval working. This is a decisive advantage over lossy compression, which flattens pages to images and destroys the searchable layer.
Why didn't my scanned record get smaller?
Because a scan stores its content as image pixels, and lossless compression never re-encodes images — by design, to keep the archived copy faithful. The bytes simply aren't structural slack the tool can reclaim. Add OCR for retrieval, accept that the size stays put, and reserve image re-encoding (the lossy tool) for non-archival copies where fidelity isn't required.
Does compressing remove sensitive metadata for compliance?
Only the Producer and Creator fields are cleared. Other metadata copied with the pages — author, title, keywords, XMP — survives. For archives with privacy or chain-of-custody requirements, run the Metadata Scrubber, which gives comprehensive control over what metadata is retained or removed.
Is anything uploaded when I compress an archival record?
No. Compression runs entirely in your browser, so confidential or regulated records never leave the archivist's machine — important for legal, medical, and government archives. Only an anonymous usage counter is recorded server-side if you're signed in, never document content.
What about signed or sealed records?
Don't compress them if signature validity must be preserved — re-saving the file breaks the cryptographic hash and invalidates the signature. Archive the original signed file as the record of authenticity, and apply compression only to unsigned working copies. Check a signature's status with Verify Signature.
What are the size and page limits for archival files?
By plan: Free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages. Large bound volumes and case files frequently exceed lower tiers and are blocked with an upgrade prompt; split them with Split by Range or use a plan sized for your collection.
Will the output open in 10 or 20 years?
It uses compressed object streams, a core PDF feature since version 1.5 (2003) supported universally today, so longevity is good. For maximum archival durability, store it as PDF/A via the PDF/A converter — PDF/A is the ISO standard purpose-built for long-term readability, and lossless compression is compatible with it as long as you re-validate after compressing.
Can I compress an entire archive in batch?
Yes, through the @jadapps/runner. Fetch the schema with GET /api/v1/tools/pdf-compress-lossless, pair the runner once, then POST each record to 127.0.0.1:9789/v1/tools/pdf-compress-lossless/run. Processing is local, so nothing leaves your network — ideal for a migration script that optimises a repository before moving it to cold storage.
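A migration script along these lines might look like the sketch below. Only the runner URL comes from the text above. The request and response shapes are assumptions made for illustration: the sketch sends the raw PDF bytes as the body and treats the response body as the compressed PDF. The real contract is whatever the schema from GET /api/v1/tools/pdf-compress-lossless specifies, and any credentials issued by the pairing step are omitted here. `build_request` and `compress_folder` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# URL taken from the answer above; the http:// scheme is an assumption.
RUNNER_URL = "http://127.0.0.1:9789/v1/tools/pdf-compress-lossless/run"


def build_request(pdf_bytes):
    """Build the POST for one record.

    ASSUMPTION: the runner accepts raw PDF bytes with an application/pdf
    content type. Check the fetched schema for the actual request shape.
    """
    return urllib.request.Request(
        RUNNER_URL,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def compress_folder(src_dir, dst_dir):
    """POST every PDF in src_dir to the local runner and save the results."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        request = build_request(pdf.read_bytes())
        with urllib.request.urlopen(request) as response:
            # ASSUMPTION: the response body is the compressed PDF itself.
            (dst / pdf.name).write_bytes(response.read())
```

Writing outputs to a separate folder keeps the originals untouched, which matters for any signed records that slip into the batch.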
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.