How to linearize a PDF for optimal CDN delivery
- Step 1: Compress first to cut egress and cache footprint. Linearization won't reduce size, and CDN egress is billed by bytes. Run lossless compression (text) or aggressive compression (scans) before this step.
- Step 2: Open the linearizer and drop your PDF. Load the (compressed) file into PDF Linearize (Fast Web View). One file per run.
- Step 3: Set the password only if the asset is encrypted. Most CDN-hosted PDFs are public and unencrypted, so leave it blank. For a protected asset, type the password so qpdf can round-trip it.
- Step 4: Run and download the linearized PDF. qpdf-wasm writes the hint dictionary at the front of the file, locally. Leave Force off unless you're forcing a rebuild on an already-linearized asset.
- Step 5: Upload to your CDN origin. Push the linearized PDF to your origin (an S3 bucket, Cloudflare R2, or your origin server) and set a long Cache-Control (e.g. public, max-age=86400 or longer) so the edge caches it.
- Step 6: Verify byte-range delivery from the edge. Run curl -I -H 'Range: bytes=0-1023' against the CDN URL and confirm a 206 Partial Content response with Accept-Ranges: bytes. That proves the edge is serving ranges for the cached object.
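The Step 6 check can also be scripted. The sketch below uses only the Python standard library to build the same HEAD request with a Range header and to judge the response the way the step describes (206 plus Accept-Ranges: bytes). The function names and the example URL are illustrative assumptions, not part of the tool.

```python
import urllib.request


def build_range_probe(url: str, first: int = 0, last: int = 1023) -> urllib.request.Request:
    """Build a HEAD request for bytes first..last, like curl -I -H 'Range: ...'."""
    return urllib.request.Request(
        url, method="HEAD", headers={"Range": f"bytes={first}-{last}"}
    )


def edge_serves_ranges(status: int, headers: dict) -> bool:
    """True for a 206 response that also advertises Accept-Ranges: bytes."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    return status == 206 and lowered.get("accept-ranges") == "bytes"


def probe(url: str) -> bool:
    """Send the probe to a live URL and evaluate the response (needs network)."""
    with urllib.request.urlopen(build_range_probe(url), timeout=10) as resp:
        return edge_serves_ranges(resp.status, dict(resp.headers.items()))
```

Calling probe("https://cdn.example/datasheet.pdf") (a hypothetical URL) would return True only when the edge answers the partial request as expected.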
Byte-range and cache behaviour by CDN
All major CDNs serve byte ranges for cached objects. For your own setup, confirm two things: that range requests are honoured, and which cache TTL applies.
| CDN / origin | Byte ranges for cached objects | Cache header to set |
|---|---|---|
| Cloudflare (Pages / proxy) | Yes, by default | Cache-Control: public, max-age=86400 |
| Cloudflare R2 | Yes | Set on object metadata or via Worker |
| S3 + CloudFront | Yes (S3 Range, CloudFront passes through) | Cache-Control on the S3 object |
| Fastly | Yes | Surrogate-Control / Cache-Control |
| Generic origin behind a CDN | Depends on origin | Ensure origin sends Accept-Ranges: bytes |
CDN workflow: order of operations
Linearization is structure-only and must be re-applied after any edit, so its place in the pipeline matters.
| Step | Tool / action | Why this order |
|---|---|---|
| 1. Compress | Lossless / aggressive | Smaller object = less egress, less cache; linearize won't shrink it |
| 2. Linearize | PDF Linearize | Must be last content-touching step so the byte order survives |
| 3. Upload + cache | CDN origin + Cache-Control | Edge caches the linearized object and serves ranges |
| 4. Re-do on update | Re-compress + re-linearize | Linearization is invalidated by any content change |
Input limits by tier (PDF family)
These caps apply to the file you load; linearization itself runs in your browser.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 |
| Pro | 50 MB | 500 |
| Pro Media | 500 MB | 2,000 |
| Developer | 2 GB | 10,000 |
| Enterprise | unlimited | unlimited |
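As a pre-flight check, the table above can be expressed as a small lookup. This is a sketch only: it assumes the table's MB and GB are binary units (2**20 and 2**30 bytes) and that the caps are inclusive, neither of which the table states, and the names TIER_CAPS, fits_tier and smallest_tier are hypothetical.

```python
from typing import NamedTuple, Optional

MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB    # assumption: binary gigabytes


class Cap(NamedTuple):
    max_bytes: Optional[int]  # None means unlimited
    max_pages: Optional[int]  # None means unlimited


# Values copied from the tier table, smallest tier first.
TIER_CAPS = {
    "Free": Cap(2 * MB, 50),
    "Pro": Cap(50 * MB, 500),
    "Pro Media": Cap(500 * MB, 2000),
    "Developer": Cap(2 * GB, 10000),
    "Enterprise": Cap(None, None),
}


def fits_tier(tier: str, size_bytes: int, pages: int) -> bool:
    """True when the file is within both caps of the tier (caps assumed inclusive)."""
    cap = TIER_CAPS[tier]
    size_ok = cap.max_bytes is None or size_bytes <= cap.max_bytes
    pages_ok = cap.max_pages is None or pages <= cap.max_pages
    return size_ok and pages_ok


def smallest_tier(size_bytes: int, pages: int) -> str:
    """Return the first tier in table order that accepts the file."""
    for tier in TIER_CAPS:
        if fits_tier(tier, size_bytes, pages):
            return tier
    raise ValueError("no tier accepts this file")
```

For example, a 6 MB, 40-page datasheet exceeds the Free size cap, so smallest_tier(6 * MB, 40) picks Pro.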
Cookbook
CDN-focused runs: linearize, push to edge, verify ranges, and handle updates.
Compress, linearize, push to S3 + CloudFront
The full pipeline for a public datasheet served globally.
1) pdf-compress-lossless datasheet.pdf → 6 MB → 4 MB
2) pdf-linearize datasheet.pdf → Fast Web View: Yes
3) aws s3 cp datasheet.pdf s3://bucket/ \
--cache-control 'public, max-age=86400'
4) CloudFront serves it; edge answers byte ranges
Verify the edge is serving ranges
Confirm the cached object answers a partial request — the proof that linearization pays off on the CDN.
curl -I -H 'Range: bytes=0-1023' https://cdn.site/datasheet.pdf
→ HTTP/2 206
→ accept-ranges: bytes
→ content-range: bytes 0-1023/4194304
→ cf-cache-status: HIT (served from edge)
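The content-range value in that response carries both the slice that was served and the full object size. A minimal sketch of reading it follows; the ContentRange type and parse_content_range function are illustrative names, and the pattern only handles the single-range "bytes first-last/total" form.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    first: int
    last: int
    total: Optional[int]  # None when the server reports "*" (size unknown)


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> ContentRange:
    """Parse a single-range Content-Range header value."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return ContentRange(int(match.group(1)), int(match.group(2)), total)


cr = parse_content_range("bytes 0-1023/4194304")
print(cr.last - cr.first + 1)      # 1024 bytes served
print(cr.total / (1024 * 1024))    # 4.0, i.e. the 4 MB compressed datasheet
```

The total of 4194304 bytes matches the 4 MB file from the compression step, which is a quick sanity check that the edge is serving the object you uploaded.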
Re-linearize after a content update
You edited the datasheet and re-exported it. The new file is no longer linearized — re-run before re-uploading or the CDN serves a non-linearized object.
New export: datasheet-v2.pdf (Fast Web View: No)
Run: pdf-linearize → Fast Web View: Yes
Upload + purge the old object from the CDN cache.
Cloudflare R2 public bucket
R2 serves byte ranges for objects. Set cache metadata so the edge keeps the linearized file.
1) pdf-linearize brochure.pdf → Fast Web View: Yes
2) Upload to R2 with Cache-Control: public, max-age=604800
3) Range requests served from edge → fast first page worldwide
Already-linearized asset in a deploy pipeline
A CI step linearizes every PDF before sync. Assets already linearized on a prior run pass through untouched with Force off.
Options: password blank, force off
→ /Linearized detected
→ output == input
→ CI proceeds (no needless byte churn that would bust the CDN cache key)
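A CI job can approximate the same decision before it invokes the linearizer. The sketch below is a heuristic, not the tool's actual detection logic: it relies on the PDF rule that the linearization parameter dictionary sits within the first 1024 bytes, and simply looks for the /Linearized key there. The function names are hypothetical.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: a PDF header plus a /Linearized key in the first 1024 bytes.

    This does not validate the hint data, and a file that was saved
    incrementally after linearization can still match. Treat it as a
    cheap pre-check, not as proof.
    """
    head = pdf_bytes[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def plan_step(pdf_bytes: bytes, force: bool = False) -> str:
    """Mirror the Force option: rebuild when forced or when not yet linearized."""
    if force or not looks_linearized(pdf_bytes):
        return "linearize"
    return "pass-through"
```

In a pipeline you would read each asset with open(path, "rb").read(1024) and skip the ones where plan_step returns "pass-through", so their bytes (and any content-derived cache key) stay identical.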
Edge cases and what actually happens
Edge returns 200, not 206, for a range request
No edge benefit: If curl -I -H 'Range: bytes=0-1023' returns 200 instead of 206, the CDN or origin isn't serving byte ranges for that object, so the linearized first-page benefit never reaches readers. Check that the origin sends Accept-Ranges: bytes and that no transform/compression layer is stripping range support.
CDN gzips the PDF on the wire
Breaks ranges: Some CDNs apply Content-Encoding: gzip to documents. Byte offsets in the encoded stream no longer match offsets in the PDF itself, so range requests stop mapping to real positions and progressive loading breaks. PDF streams are already compressed internally, so on-the-wire gzip gains little; exclude application/pdf from compression at the CDN.
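One way to catch this is to inspect the response headers for a PDF fetched with Accept-Encoding: gzip. The sketch below flags any non-identity Content-Encoding on an application/pdf response; encoding_problem is an illustrative name, and the headers are passed in as a plain dict so the check works with whatever HTTP client you use.

```python
from typing import Optional


def encoding_problem(headers: dict) -> Optional[str]:
    """Return a warning when a PDF response is content-encoded, else None."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    encoding = lowered.get("content-encoding", "identity")
    if content_type == "application/pdf" and encoding not in ("", "identity"):
        return f"PDF served with Content-Encoding: {encoding}; exclude application/pdf from compression"
    return None
```

A None result means the header set shows no on-the-wire compression for the PDF; anything else is worth fixing in the CDN's compression rules.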
Forgot to re-linearize after updating the file
Stale optimisation: Any content edit invalidates the linearization, so the new export is not linearized. If you push the updated file without re-running this tool, the CDN serves a non-linearized object and the first-page benefit silently disappears. Re-linearize on every update, then purge the old cached object.
Already linearized, Force off
Passed through: The tool detects the existing hint dictionary and returns the file unchanged, avoiding needless byte churn that would change the cache key. Tick Force only when you intend a rebuild.
Encrypted asset, no password
Error: qpdf can't process an encrypted file without the password; it exits 2 and the tool reports it could not process the PDF. Supply the open password to do the decrypt-linearize-re-encrypt round trip before uploading.
qpdf finished with warnings
Supported: Exit code 3 (warnings only) still yields a valid linearized object. The tool treats codes 0 and 3 as success, so the file is safe to push to the CDN.
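The same exit-code handling can be reproduced outside the browser with the desktop qpdf command-line tool. This sketch assumes qpdf is installed and on PATH, and uses it as a stand-in for the in-browser qpdf-wasm build; the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def qpdf_linearize_cmd(src: str, dst: str, password: Optional[str] = None) -> List[str]:
    """Build the qpdf command line for linearizing src into dst."""
    cmd = ["qpdf", "--linearize"]
    if password:
        # Needed only for encrypted input; leave unset for public assets.
        cmd.append(f"--password={password}")
    cmd += [src, dst]
    return cmd


def outcome(exit_code: int) -> str:
    """Map qpdf exit codes: 0 is clean, 3 is warnings only, anything else failed."""
    if exit_code == 0:
        return "ok"
    if exit_code == 3:
        return "ok-with-warnings"
    return "failed"


def linearize_with_qpdf(src: str, dst: str, password: Optional[str] = None) -> str:
    """Run qpdf and classify the result (requires a local qpdf install)."""
    result = subprocess.run(qpdf_linearize_cmd(src, dst, password), capture_output=True)
    return outcome(result.returncode)
```

A CI step would upload dst only when the result is "ok" or "ok-with-warnings", matching the rule that codes 0 and 3 both count as success.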
You expected linearization to cut CDN egress
By design: It won't. Size is roughly unchanged, so egress and cache footprint are the same. Compress first with lossless or aggressive compression to actually reduce bytes, then linearize last.
Large asset exceeds the Free input cap
Limit: Free caps PDF input at 2 MB / 50 pages, which a marketing catalogue will exceed. Compress it under the cap first, or upgrade: Pro raises the limit to 50 MB / 500 pages, with higher tiers above that. Processing itself runs in your browser via WebAssembly.
Frequently asked questions
Why linearize a PDF before putting it on a CDN?
A CDN gets the file close to the reader, but with a non-linearized PDF the viewer typically still has to fetch most or all of the file before it can render the first page. Linearization lets an edge node answer a byte-range request for just the first page, so the reader sees content almost immediately while the rest streams from the same edge cache.
Do CDNs support byte-range requests by default?
Yes — Cloudflare (including Pages and R2), S3 + CloudFront, and Fastly all serve byte ranges for cached objects by default. Confirm for your setup with curl -I -H 'Range: bytes=0-1023' <cdn-url> and look for a 206 Partial Content response with Accept-Ranges: bytes.
Should I set caching headers on the linearized PDF?
Yes. Set a long-lived Cache-Control: public, max-age=86400 (a day) or longer on the PDF's CDN path so the linearized object is cached at the edge and range requests are served without origin hits. Increase the TTL for assets that rarely change.
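The max-age value is a TTL in seconds, so a day is 24 × 60 × 60 = 86400 and a week is 604800. A small helper, sketched here with the hypothetical name cache_control, keeps that arithmetic out of deploy scripts:

```python
from datetime import timedelta


def cache_control(ttl: timedelta, public: bool = True) -> str:
    """Build a Cache-Control value with max-age expressed in whole seconds."""
    scope = "public" if public else "private"
    return f"{scope}, max-age={int(ttl.total_seconds())}"


print(cache_control(timedelta(days=1)))  # public, max-age=86400
print(cache_control(timedelta(days=7)))  # public, max-age=604800
```

The resulting string is what you would pass as the Cache-Control value when uploading the linearized object to your origin.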
Do I need to re-linearize after every update?
Yes. Any content edit produces a fresh, non-linearized export. Re-run the linearizer before re-uploading, and purge the old cached object from the CDN so readers get the linearized version. Forgetting this is the most common way the optimisation silently disappears.
Does linearization reduce CDN egress costs?
No — it's a structure reorder, not compression, so the byte count is roughly unchanged. To cut egress, compress first with lossless or aggressive compression, then linearize the smaller file last.
Can a CDN's on-the-wire gzip break this?
Yes. If the CDN applies Content-Encoding: gzip to the PDF, byte-range offsets stop matching and progressive loading breaks. PDFs are already internally compressed, so exclude application/pdf from the CDN's compression rules.
Is the file uploaded anywhere during linearization?
No. qpdf-wasm runs in your browser tab — the document is never uploaded to a processing server. It only reaches your CDN origin when you push the downloaded output yourself.
Does Cloudflare Pages serve byte ranges for PDFs?
Yes — Cloudflare Pages and R2 serve byte-range requests (Accept-Ranges: bytes) for cached objects by default, which makes a linearized PDF particularly effective there. Set Cache-Control so the object stays at the edge.
What does the Force checkbox do?
It rebuilds the hint stream even on a file that's already linearized. Normally the tool passes already-linearized files through unchanged — which is good in a CI pipeline because it avoids byte churn that would change the cache key. Use Force only when you deliberately want a rebuild.
Does it work on password-protected assets?
Yes — type the open password and qpdf decrypts, linearizes, and re-encrypts in one pass, leaving the file encrypted with the same password. Without the password, an encrypted file fails to process.
How do I confirm the edge is actually serving the first page fast?
Send a range request to the CDN URL: curl -I -H 'Range: bytes=0-1023' <cdn-url>. A working setup returns 206 Partial Content, Accept-Ranges: bytes, and a Content-Range header — and the CDN's cache-status header should show a HIT once the object is warm at the edge.
How large a PDF can I linearize for CDN delivery?
Free allows 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages, Enterprise unlimited. Linearization runs in WebAssembly in your tab, so on the larger tiers the practical limit is browser memory rather than the cap.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.