Normalise PDF Version for a Processing Pipeline

How to normalise the PDF version across many documents

Step 1
Pick the one version your pipeline standardises on — Choose based on the most restrictive consumer in your pipeline. If a modern library is downstream, 1.7 is the natural baseline. If a legacy parser can't read compressed cross-references, normalise to 1.4 (the only target with object streams off). Document the choice so the whole pipeline agrees.
Step 2
Spot-check one representative file in the browser tool — Before automating, run a typical document through the PDF Version Converter, select your target version, and confirm the output ingests cleanly into your pipeline. The browser tool is one file at a time — use it to validate the target, not to process the batch.
Step 3
Fetch the converter's schema from the API — Call GET /api/v1/tools/pdf-version-converter to get the option schema (the single version field). This confirms the accepted values: 1.4, 1.5, 1.6, 1.7.
Step 4
Pair the @jadapps/runner once — Install and pair the @jadapps/runner (a paid-tier feature). It runs the same converter locally and exposes it at 127.0.0.1:9789 — your documents never leave the machine.
Step 5
Script the batch over your directory — For each file in the folder, POST it with { "version": "1.7" } (or your target) to 127.0.0.1:9789/v1/tools/pdf-version-converter/run and write the normalised output to your pipeline's intake folder.
Step 6
Handle what the rebuild drops: version conversion drops form fields, bookmarks, and encryption, so plan stages around it. If downstream stages need form values, flatten them with PDF Flatten before normalisation; if the archive needs encryption, re-encrypt with PDF Password Protect after it. Bookmarks can't be restored, so route bookmark-dependent files on a separate path.

In-browser vs. runner — how to actually batch this

The browser tool is for validation and one-offs; the runner is for pipeline batch.

Capability	In-browser tool	@jadapps/runner
Files per run	One	One per POST — script the loop yourself
Where it runs	Your browser tab (pdf-lib)	Locally at `127.0.0.1:9789`
Folder / directory batch	No (no multi-file intake for this tool)	Yes — drive it from your own script
Tier	Free works for small files	Paid tier (runner pairing required)
Upload exposure	None — local	None — local

Choosing a pipeline target version

Normalise to the version your most restrictive consumer needs.

Pipeline consumer	Recommended target	Why
Modern PDF library (pdf.js, PDFBox 3.x, MuPDF)	1.7	ISO 32000-1 baseline; the structure these expect
Legacy parser that fails on compressed xref	1.4	Only target with object streams OFF — plain xref/trailer
Mixed/unknown consumers	1.7	Broadest modern support; smallest output
PDF/A archival downstream	1.4 then PDF/A pass	1.4 is the PDF/A-1b structural baseline; run PDF to PDF/A after
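
In code, the rule in this table is that the most restrictive consumer wins. Here is a minimal Python sketch, assuming you can describe each consumer with two yes/no capabilities. The `Consumer` class, its fields, and `choose_target` are hypothetical helpers, not part of the converter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """Hypothetical description of one downstream consumer (illustrative fields)."""
    name: str
    reads_compressed_xref: bool = True   # can it parse xref streams / object streams?
    feeds_pdfa_archive: bool = False     # is a PDF/A-1b conversion pass downstream?


def choose_target(consumers: list[Consumer]) -> str:
    """Pick one pipeline-wide target version following the table above.

    Any consumer that can't read compressed cross-references, or a PDF/A-1b
    archival pass, forces 1.4. Otherwise (modern, mixed, or unknown) use 1.7.
    """
    if any(not c.reads_compressed_xref or c.feeds_pdfa_archive for c in consumers):
        return "1.4"
    return "1.7"
```

For example, `choose_target([Consumer("pdf.js"), Consumer("old DMS", reads_compressed_xref=False)])` returns `"1.4"`. A single legacy parser is enough to pull the whole pipeline down to the plain-xref target.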

Per-tier limits for normalisation at scale

Each file still goes through the PDF tier limits. Plan the runner around them.

Tier	Max file size	Max pages	Notes
Free	2 MB	50	Fine for validating; too small for most pipelines
Pro	50 MB	500	Runner available; suits moderate document volumes
Pro + Media	500 MB	2,000	Large scans / long documents
Developer	2 GB	10,000	Heaviest automation; per-file ceiling rarely hit

Cookbook

How to normalise versions across a real pipeline, and what to watch for at volume.

Standardise mixed supplier PDFs to 1.7

Suppliers send everything from 1.3 to 1.7. Normalise all to 1.7 so the ingestion parser sees one structure.

Incoming folder: invoices_1.3.pdf, scan_1.4.pdf, export_1.7.pdf …

for f in *.pdf; do
  POST $f with { "version": "1.7" }
    → 127.0.0.1:9789/v1/tools/pdf-version-converter/run
  → write to ./normalised/
done

Result: every file in ./normalised/ is %PDF-1.7
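
The loop above can be sketched in Python using only the standard library. This is a sketch under stated assumptions. The endpoint and the `{"version": ...}` payload come from this guide, but several details are hypothetical: the multipart part names (`file`, `options`), the assumption that the response body is the converted PDF, and the quarantine folder. Any pairing token the runner expects is not shown. Check the runner's documentation before relying on any of these. The HTTP call is passed in as `send`, so you can swap in the real request format without touching the loop.

```python
import json
import shutil
import urllib.request
import uuid
from pathlib import Path
from typing import Callable

# Endpoint and payload from this guide. The multipart layout below ("file" and
# "options" parts) and the raw-PDF response are ASSUMPTIONS; check the runner docs.
RUNNER_URL = "http://127.0.0.1:9789/v1/tools/pdf-version-converter/run"
TARGET = "1.7"  # one target for the whole run


def post_to_runner(pdf: bytes, options: dict) -> bytes:
    """POST one PDF plus its options to the local runner; return the response body."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="file"; filename="input.pdf"\r\n',
        b"Content-Type: application/pdf\r\n\r\n",
        pdf,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="options"\r\n',
        b"Content-Type: application/json\r\n\r\n",
        json.dumps(options).encode(),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    request = urllib.request.Request(
        RUNNER_URL,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()


def normalise_directory(
    src: Path,
    dst: Path,
    quarantine: Path,
    send: Callable[[bytes, dict], bytes] = post_to_runner,
) -> dict:
    """Convert every *.pdf in src to TARGET, writing results to dst.

    A file that fails (runner error, non-PDF response) is copied to quarantine
    and the loop moves on instead of halting the batch.
    """
    dst.mkdir(parents=True, exist_ok=True)
    quarantine.mkdir(parents=True, exist_ok=True)
    report: dict = {"ok": [], "failed": []}
    for path in sorted(src.glob("*.pdf")):
        try:
            converted = send(path.read_bytes(), {"version": TARGET})
            if not converted.startswith(b"%PDF-"):
                raise ValueError("runner response is not a PDF")
            (dst / path.name).write_bytes(converted)
            report["ok"].append(path.name)
        except Exception as exc:  # deliberately broad: one bad file must not stop the run
            shutil.copy2(path, quarantine / path.name)
            report["failed"].append((path.name, str(exc)))
    return report
```

Typical use is `report = normalise_directory(Path("incoming"), Path("normalised"), Path("quarantine"))`, followed by a look at `report["failed"]` to see which files need PDF Repair.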

Legacy ingester needs 1.4 (object streams off)

An older DMS parser chokes on compressed cross-references. Normalise to 1.4 so every file uses the plain xref form.

Target: 1.4 for all files
Payload: { "version": "1.4" }

Effect: object streams OFF, plain xref/trailer on every output
        → legacy parser ingests them all consistently
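
Before pointing the legacy parser at the batch, you can spot-check the outputs. Below is a rough byte-level heuristic in Python. It relies on the fact that stream dictionaries are stored uncompressed, so object streams and cross-reference streams expose `/Type /ObjStm` and `/Type /XRef` as plain bytes even when their contents are Flate-compressed. `check_plain_xref` is a hypothetical helper, and the legacy parser itself remains the real test:

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
CLASSIC_XREF_RE = re.compile(rb"(?:^|[\r\n])xref[\r\n ]")  # "xref" at line start, not "startxref"


def check_plain_xref(data: bytes, expected: str = "1.4") -> list:
    """Heuristic checks on one normalised file; an empty list means it looks OK."""
    problems = []
    header = HEADER_RE.match(data)
    if not header or header.group(1).decode() != expected:
        problems.append(f"header is not %PDF-{expected}")
    # Stream dictionaries are never compressed, so these /Type names show up as raw bytes.
    if b"/ObjStm" in data:
        problems.append("contains object streams (/ObjStm)")
    if b"/XRef" in data:  # also matches the /XRefStm key of hybrid-reference files
        problems.append("contains a cross-reference stream")
    if not CLASSIC_XREF_RE.search(data):
        problems.append("no classic xref table found")
    return problems
```

Run it over `./normalised/` with `check_plain_xref(path.read_bytes())` and hold back any file that reports problems.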

Fetch the schema before scripting

Confirm the option contract first so your automation sends the right payload shape.

GET /api/v1/tools/pdf-version-converter
→ schema: { version: enum["1.4","1.5","1.6","1.7"], default "1.7" }

Now POST per file:
  127.0.0.1:9789/v1/tools/pdf-version-converter/run
  body: the PDF + { "version": "1.7" }
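
A small guard can stop a run before any file is sent if the target isn't an accepted value. This sketch assumes the GET response parses to a JSON-Schema-like dict with the allowed values under `properties.version.enum`. That shape is an assumption based on the summary above, so adjust the lookup to whatever the endpoint actually returns. `validate_target` is a hypothetical name:

```python
def validate_target(schema: dict, target: str) -> dict:
    """Return the per-file payload, or raise ValueError if target isn't accepted.

    ASSUMPTION: the parsed schema looks like
    {"properties": {"version": {"enum": ["1.4", ...], "default": "1.7"}}}.
    """
    allowed = schema["properties"]["version"]["enum"]
    if target not in allowed:
        raise ValueError(f"version {target!r} is not one of {allowed}")
    return {"version": target}
```

Call it once per run, for example `payload = validate_target(json.loads(schema_text), "1.7")`, and reuse that `payload` for every file.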

Add a flatten stage for form-heavy intake

If incoming PDFs are fillable forms and the pipeline needs the values, flatten before normalising — version conversion drops AcroForm fields.

Stage 1: PDF Flatten   (bake field values into the page)
Stage 2: Version Converter → { "version": "1.7" }

Wrong order loses the field data: normalising first drops the
AcroForm, so there's nothing left to flatten.

Re-encrypt as a final stage

Normalisation outputs unencrypted files. If the archive requires encryption, add a protect stage at the end.

Stage 1: Version Converter → { "version": "1.7" }   (unencrypted out)
Stage 2: PDF Password Protect → re-apply owner/user password

Keep this order: converting strips encryption, so protect LAST.
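
Both ordering rules above (flatten before converting, protect after) can be enforced when a pipeline is assembled. Here is a minimal sketch, assuming your orchestrator represents stages as a list of names. The labels `"flatten"`, `"convert"`, and `"protect"` are placeholders, not tool identifiers:

```python
def check_stage_order(stages: list) -> None:
    """Raise ValueError if the stage order would lose form data or encryption.

    Stage labels are placeholders: "flatten" (PDF Flatten), "convert"
    (Version Converter), "protect" (PDF Password Protect).
    """
    if "convert" not in stages:
        raise ValueError("pipeline has no version-conversion stage")
    convert_at = stages.index("convert")
    if "flatten" in stages and stages.index("flatten") > convert_at:
        raise ValueError("flatten must run before convert: conversion drops AcroForm fields")
    if "protect" in stages and stages.index("protect") < convert_at:
        raise ValueError("protect must run after convert: conversion strips encryption")
```

`check_stage_order(["flatten", "convert", "protect"])` passes silently. Swapping any pair raises an error before a single document is processed.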

Edge cases and what actually happens

Expecting to drop a folder of PDFs into the web tool

One file at a time

The in-browser version converter doesn't accept multiple files for a single run — it processes one PDF per run. For directory-level batch, drive the @jadapps/runner from your own script (POST each file to the local endpoint). Use the browser tool only to validate the target version before automating.

Form fields disappear across the batch

Dropped by design

Every normalised file loses its AcroForm fields, because the page-copy rebuild doesn't carry the AcroForm dictionary from the document catalog. If your pipeline reads form data downstream, extract or flatten it before the normalisation stage: add PDF Flatten as a prior step so the values become page content.

Bookmarks / document JavaScript lost

Dropped by design

Outlines and document-level JavaScript are catalog structures outside the copied pages, so they don't survive normalisation. There's no option to preserve them. If a downstream consumer relies on bookmarks, normalisation isn't the right transform for those files — handle them on a separate path.

Files come out unencrypted

Decrypted

The converter rebuilds each file without a security handler, so every output is unencrypted regardless of the input password. Good if your pipeline needs full read access; problematic if the archive mandates encryption. Add a PDF Password Protect stage at the end of the pipeline.

Some files exceed the tier size or page cap

400 limit

Each file is still bound by the PDF tier limits (e.g. Pro: 50 MB / 500 pages, Pro+Media: 500 MB / 2,000 pages). A few oversized scans in the batch will fail. Pre-filter by size, compress them first with PDF Compress (Lossless), or route them to a higher tier.
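
You can pre-filter by size before the batch starts. The Python sketch below uses the per-file caps from the tier table. It reads MB and GB as decimal units, which is an assumption but the conservative one: binary units would allow slightly larger files. It does not check page counts, because that needs a PDF parser. `split_by_size` and the tier keys are hypothetical:

```python
from pathlib import Path

# Per-file size caps from the tier table, read as decimal MB/GB (conservative).
TIER_MAX_BYTES = {
    "free": 2_000_000,
    "pro": 50_000_000,
    "pro_media": 500_000_000,
    "developer": 2_000_000_000,
}


def split_by_size(paths: list[Path], tier: str) -> tuple[list[Path], list[Path]]:
    """Split paths into (fits, oversized) for the given tier's per-file size cap."""
    limit = TIER_MAX_BYTES[tier]
    fits, oversized = [], []
    for path in paths:
        (fits if path.stat().st_size <= limit else oversized).append(path)
    return fits, oversized
```

Send `fits` to the runner. Route `oversized` to PDF Compress (Lossless) or to a higher-tier path.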

A corrupt or truly malformed PDF in the batch

Error / skip

pdf-lib reads tolerantly (it even ignores encryption), but a genuinely broken file can throw on load. In a batch, catch the error per file and route failures to a quarantine folder rather than halting the run. Try PDF Repair on quarantined files before re-normalising.

Producer / ModDate changed on every file

Expected

The save rewrites document metadata, so every normalised file gets a fresh Producer and modification date. If your pipeline keys off original metadata, capture it before normalising. To standardise or strip metadata too, add a PDF Metadata Scrubber stage.

Digital signatures invalidated across the batch

Invalidated

Re-serializing changes the bytes, so any signed file loses its valid signature during normalisation (and the signature object isn't carried over). Don't normalise files whose signatures must remain valid; route signed documents to a path that preserves them, or re-sign with PDF Digital Signature after.

Mixing targets within one batch by mistake

Inconsistent

The whole point of normalisation is one target. If your script sends 1.4 for some files and 1.7 for others, you've recreated the version mix you were trying to eliminate. Hard-code a single version value for the run and assert it in the payload.
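
Besides hard-coding the target, you can audit the output folder after the run. This sketch reads only the `%PDF-x.y` header of each file, so it ignores any catalog `/Version` override. It fails if any file carries a version other than the run's target. `header_versions` and `assert_single_target` are hypothetical helpers:

```python
import re
from collections import defaultdict
from pathlib import Path

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def header_versions(folder: Path) -> dict:
    """Group the PDFs in folder by the x.y in their %PDF-x.y header."""
    groups = defaultdict(list)
    for path in sorted(folder.glob("*.pdf")):
        with path.open("rb") as fh:
            match = HEADER_RE.match(fh.read(16))
        groups[match.group(1).decode() if match else "unknown"].append(path.name)
    return dict(groups)


def assert_single_target(folder: Path, target: str) -> None:
    """Raise AssertionError if any output file carries a version other than target."""
    stray = {v: names for v, names in header_versions(folder).items() if v != target}
    if stray:
        raise AssertionError(f"expected only {target}, also found {stray}")
```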

Output slightly larger when normalising to 1.4

Expected

Normalising a 1.7 file down to 1.4 turns object streams off, so those files grow a little. Across a large batch this adds up in storage. If storage is a concern and the consumer can read 1.5+, normalise to 1.7 instead, which keeps the compact structure.

Frequently asked questions

Can I drop a whole folder of PDFs into the browser tool to normalise them?

No — the in-browser version converter processes one file per run; it doesn't accept a multi-file batch for this tool. Use the browser tool to validate your target version on a sample, then automate the real batch through the @jadapps/runner: fetch GET /api/v1/tools/pdf-version-converter for the schema, pair the runner once, and POST each file with { "version": "1.7" } to 127.0.0.1:9789/v1/tools/pdf-version-converter/run from your own script.

What target version should a pipeline standardise on?

Standardise on what your most restrictive consumer needs. If modern libraries (pdf.js, PDFBox 3.x, MuPDF) are downstream, 1.7 is the natural baseline and produces the smallest output. If a legacy parser fails on compressed cross-references, normalise to 1.4 — the only target that turns object streams off and writes the plain xref/trailer. For mixed or unknown consumers, 1.7 is the safest default.

Does normalisation preserve form fields and bookmarks?

No. The page-copy rebuild doesn't carry the AcroForm dictionary, outlines (bookmarks), or document JavaScript from the document catalog, so every normalised file loses them. If your pipeline needs form values, flatten them into the page first with PDF Flatten as a prior stage. Bookmarks can't be preserved by this transform, so route bookmark-dependent files separately.

Will encrypted documents stay encrypted after normalising?

No. Every output is unencrypted — the converter reads past input encryption and rebuilds without a security handler, and there's no password field. If the destination archive requires encryption, add a PDF Password Protect stage at the end of your pipeline, after normalisation.

How do I script the batch?

Pair the @jadapps/runner (paid tier), then in your script loop over the directory and POST each PDF with { "version": "<target>" } to 127.0.0.1:9789/v1/tools/pdf-version-converter/run, writing each result to your intake folder. Fetch GET /api/v1/tools/pdf-version-converter first to confirm the option schema. Everything runs locally — pipeline documents never reach JAD's servers.

What happens to a corrupt file in the middle of a batch?

pdf-lib is tolerant (it even ignores encryption on load), but a genuinely malformed file can throw. Wrap each file's call in error handling so a single bad file routes to a quarantine folder instead of stopping the whole run. Try PDF Repair on quarantined files, then re-normalise the repaired output.

Does normalising change how the documents look?

No — it re-serializes the page objects rather than re-rendering them, so text stays selectable and the visible content is identical. The only structural change is at 1.4, where optional-content layers are dropped (a 1.5 feature). For standard pipeline documents the rendered output is unchanged.

Will signatures survive normalisation?

No. Re-serializing changes the bytes and invalidates any existing digital signature, and the signature object isn't carried through. Don't normalise files whose signatures must stay valid — give signed documents a separate path. If you control signing, normalise first and sign last with PDF Digital Signature.

Can I script this in iText, PyPDF, or Ghostscript instead?

Yes — those libraries can set the PDF version too, and they may preserve more structure (forms, bookmarks) than this tool's page-copy rebuild. The advantage of the @jadapps/runner here is that it's the same engine as the web tool, runs locally with zero upload, and takes a one-field payload. Pick the approach that matches what your pipeline needs to preserve.

How do I check what version each incoming file is?

The version is in the first line of the raw file (%PDF-1.x), or in a desktop reader's document properties. For a pipeline, read the first several bytes of each file to log the original version before normalising — useful for auditing how mixed your inputs actually are.
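
Here is a minimal Python sketch of that audit step, assuming the header is all you need. It searches the first 1024 bytes for `%PDF-x.y`, because readers commonly tolerate a little leading junk, and writes one CSV row per file. A PDF 1.4+ file can declare a later version in its document catalog's `/Version` entry, and this header-only check does not read it. The function names are hypothetical:

```python
import csv
import re
from pathlib import Path
from typing import Optional

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def original_version(path: Path) -> Optional[str]:
    """Return x.y from the %PDF-x.y header within the first 1024 bytes, or None."""
    with path.open("rb") as fh:
        match = HEADER_RE.search(fh.read(1024))
    return match.group(1).decode() if match else None


def write_version_log(src: Path, log_path: Path) -> dict:
    """Write one CSV row per incoming PDF and return a count per header version."""
    counts: dict = {}
    with log_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "header_version"])
        for path in sorted(src.glob("*.pdf")):
            version = original_version(path) or "unknown"
            writer.writerow([path.name, version])
            counts[version] = counts.get(version, 0) + 1
    return counts
```

The returned counts give a quick picture of how mixed the inputs really are, for example how many 1.3 supplier files arrive each week.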

What are the per-file size and page limits?

Each file is bound by the PDF tier limits: free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages. For a production pipeline you'll typically want Pro or higher; pre-filter or compress oversized scans with PDF Compress (Lossless) before they hit the cap.

Is anything uploaded during normalisation?

No. Whether you use the browser tool (pdf-lib in the tab) or the @jadapps/runner (local 127.0.0.1 endpoint), processing is entirely local — pipeline documents never leave your machine. Only an anonymous usage counter is recorded when you're signed in, and you can opt out.

Privacy first

All PDF processing runs locally, either in your browser using pdf-lib and pdf.js or on your own machine through the @jadapps/runner. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
