How to normalise the PDF version across many documents
- Step 1: Pick the one version your pipeline standardises on. Choose based on the most restrictive consumer in your pipeline. If a modern library is downstream, 1.7 is the natural baseline. If a legacy parser can't read compressed cross-references, normalise to 1.4 (the only target with object streams off). Document the choice so the whole pipeline agrees.
- Step 2: Spot-check one representative file in the browser tool. Before automating, run a typical document through the PDF Version Converter, select your target version, and confirm the output ingests cleanly into your pipeline. The browser tool handles one file at a time, so use it to validate the target, not to process the batch.
- Step 3: Fetch the converter's schema from the API. Call GET /api/v1/tools/pdf-version-converter to get the option schema (the single version field). This confirms the accepted values: 1.4, 1.5, 1.6, 1.7.
- Step 4: Pair the @jadapps/runner once. Install and pair the @jadapps/runner (a paid-tier feature). It runs the same converter locally and exposes it at 127.0.0.1:9789, so your documents never leave the machine.
- Step 5: Script the batch over your directory. For each file in the folder, POST it with { "version": "1.7" } (or your target) to 127.0.0.1:9789/v1/tools/pdf-version-converter/run and write the normalised output to your pipeline's intake folder.
- Step 6: Cover anything the pipeline needs that the rebuild drops. Version conversion drops form fields, bookmarks, and encryption, so handle them in separate pipeline stages. Flatten forms with PDF Flatten before normalisation, because afterwards there is no AcroForm left to flatten. Re-encrypt with PDF Password Protect after normalisation. Bookmarks cannot be preserved by this transform, so route bookmark-dependent files down a separate path.
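Steps 3 to 5 can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the runner's documented contract: the source gives the endpoint and the { "version": ... } option but not the request encoding, so the http:// scheme, the multipart part names "file" and "options", and the idea that the response body is the converted PDF are all hypothetical. Check them against the schema from GET /api/v1/tools/pdf-version-converter before relying on this.

```python
import json
import uuid
import urllib.request
from pathlib import Path
from typing import List, Tuple

# ASSUMPTION: plain http on the local runner port named in the guide.
RUNNER_URL = "http://127.0.0.1:9789/v1/tools/pdf-version-converter/run"
TARGET_VERSION = "1.7"  # one target for the whole run


def build_multipart(pdf_bytes: bytes, filename: str, options: dict) -> Tuple[bytes, str]:
    """Encode the PDF and its options as multipart/form-data.

    ASSUMPTION: the part names "options" and "file" are hypothetical.
    Returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="options"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(options)}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def normalise_file(src: Path, out_dir: Path, version: str = TARGET_VERSION) -> Path:
    """POST one PDF to the local runner and save the response body.

    ASSUMPTION: the runner answers with the converted PDF bytes.
    """
    body, content_type = build_multipart(src.read_bytes(), src.name, {"version": version})
    request = urllib.request.Request(
        RUNNER_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        converted = response.read()
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / src.name
    dest.write_bytes(converted)
    return dest


def normalise_folder(in_dir: Path, out_dir: Path) -> List[Path]:
    """Send every *.pdf in in_dir through the runner, one POST per file."""
    return [normalise_file(path, out_dir) for path in sorted(in_dir.glob("*.pdf"))]
```

A call such as normalise_folder(Path("incoming"), Path("normalised")) would then process the directory one file per POST, which matches the runner's one-file-per-request model.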
In-browser vs. runner — how to actually batch this
The browser tool is for validation and one-offs; the runner is for pipeline batch.
| Capability | In-browser tool | @jadapps/runner |
|---|---|---|
| Files per run | One | One per POST — script the loop yourself |
| Where it runs | Your browser tab (pdf-lib) | Locally at 127.0.0.1:9789 |
| Folder / directory batch | No (no multi-file intake for this tool) | Yes — drive it from your own script |
| Tier | Free works for small files | Paid tier (runner pairing required) |
| Upload exposure | None — local | None — local |
Choosing a pipeline target version
Normalise to the version your most restrictive consumer needs.
| Pipeline consumer | Recommended target | Why |
|---|---|---|
| Modern PDF library (pdf.js, PDFBox 3.x, MuPDF) | 1.7 | ISO 32000-1 baseline; the structure these expect |
| Legacy parser that fails on compressed xref | 1.4 | Only target with object streams OFF — plain xref/trailer |
| Mixed/unknown consumers | 1.7 | Broadest modern support; smallest output |
| PDF/A archival downstream | 1.4 then PDF/A pass | 1.4 is the PDF/A-1b structural baseline; run PDF to PDF/A after |
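The table reduces to one rule: any consumer that needs the plain xref form, or a PDF/A pass downstream, pulls the whole pipeline to 1.4, and otherwise 1.7 applies. A small sketch of that rule follows; the Consumer type and its field names are illustrative, not part of the tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Consumer:
    """Hypothetical description of one downstream consumer."""
    name: str
    reads_compressed_xref: bool = True  # False for the legacy-parser row
    requires_pdfa: bool = False         # True for the PDF/A archival row


def choose_target(consumers: Iterable[Consumer]) -> str:
    """Return the single version the whole pipeline should normalise to."""
    for consumer in consumers:
        if not consumer.reads_compressed_xref or consumer.requires_pdfa:
            return "1.4"  # the only target with object streams off
    return "1.7"  # modern, mixed, or unknown consumers


pipeline = [Consumer("pdf.js viewer"), Consumer("old DMS", reads_compressed_xref=False)]
print(choose_target(pipeline))
```

An empty consumer list falls through to 1.7, mirroring the "mixed/unknown" row.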
Per-tier limits for normalisation at scale
Each file still goes through the PDF tier limits. Plan the runner around them.
| Tier | Max file size | Max pages | Notes |
|---|---|---|---|
| Free | 2 MB | 50 | Fine for validating; too small for most pipelines |
| Pro | 50 MB | 500 | Runner available; suits moderate document volumes |
| Pro + Media | 500 MB | 2,000 | Large scans / long documents |
| Developer | 2 GB | 10,000 | Heaviest automation; per-file ceiling rarely hit |
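To plan around these caps, a pre-filter can report the smallest tier that accepts a given file. The sketch below copies the limits from the table. Two things are assumptions: that 1 MB means 1,000,000 bytes (the source does not say whether limits are decimal or binary), and that the caller already knows the page count, since counting pages needs a PDF library.

```python
from typing import Optional

MB = 1_000_000  # ASSUMPTION: decimal megabytes; the guide does not specify
GB = 1_000 * MB

# (max bytes, max pages) per tier, smallest tier first, from the table above.
TIER_LIMITS = {
    "free": (2 * MB, 50),
    "pro": (50 * MB, 500),
    "pro+media": (500 * MB, 2_000),
    "developer": (2 * GB, 10_000),
}


def smallest_tier(size_bytes: int, pages: int) -> Optional[str]:
    """Return the first tier whose size and page caps both fit, else None."""
    for tier, (max_bytes, max_pages) in TIER_LIMITS.items():
        if size_bytes <= max_bytes and pages <= max_pages:
            return tier
    return None


def fits(size_bytes: int, pages: int, tier: str) -> bool:
    """True if the file is within the caps of the named tier."""
    max_bytes, max_pages = TIER_LIMITS[tier]
    return size_bytes <= max_bytes and pages <= max_pages
```

Files for which fits(...) is false on your tier are the ones to compress first or route elsewhere before the batch starts.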
Cookbook
How to normalise versions across a real pipeline, and what to watch for at volume.
Standardise mixed supplier PDFs to 1.7
Suppliers send everything from 1.3 to 1.7. Normalise all to 1.7 so the ingestion parser sees one structure.
Incoming folder: invoices_1.3.pdf, scan_1.4.pdf, export_1.7.pdf …
for f in *.pdf; do
  POST $f with { "version": "1.7" }
    → 127.0.0.1:9789/v1/tools/pdf-version-converter/run
    → write to ./normalised/
done
Result: every file in ./normalised/ is %PDF-1.7
Legacy ingester needs 1.4 (object streams off)
An older DMS parser chokes on compressed cross-references. Normalise to 1.4 so every file uses the plain xref form.
Target: 1.4 for all files
Payload: { "version": "1.4" }
Effect: object streams OFF, plain xref/trailer on every output
→ legacy parser ingests them all consistently
Fetch the schema before scripting
Confirm the option contract first so your automation sends the right payload shape.
GET /api/v1/tools/pdf-version-converter
→ schema: { version: enum["1.4","1.5","1.6","1.7"], default "1.7" }
Now POST per file:
127.0.0.1:9789/v1/tools/pdf-version-converter/run
body: the PDF + { "version": "1.7" }
Add a flatten stage for form-heavy intake
If incoming PDFs are fillable forms and the pipeline needs the values, flatten before normalising — version conversion drops AcroForm fields.
Stage 1: PDF Flatten (bake field values into the page)
Stage 2: Version Converter → { "version": "1.7" }
Wrong order loses the field data: normalising first drops the
AcroForm, so there's nothing left to flatten.
Re-encrypt as a final stage
Normalisation outputs unencrypted files. If the archive requires encryption, add a protect stage at the end.
Stage 1: Version Converter → { "version": "1.7" } (unencrypted out)
Stage 2: PDF Password Protect → re-apply owner/user password
Keep this order: converting strips encryption, so protect LAST.
Edge cases and what actually happens
Expecting to drop a folder of PDFs into the web tool
One file at a time: The in-browser version converter doesn't accept multiple files for a single run; it processes one PDF per run. For directory-level batch, drive the @jadapps/runner from your own script (POST each file to the local endpoint). Use the browser tool only to validate the target version before automating.
Form fields disappear across the batch
Dropped by design: Every normalised file loses its AcroForm fields, because the page-copy rebuild doesn't carry the document form catalog. If your pipeline reads form data downstream, extract or flatten it before the normalisation stage: add PDF Flatten as a prior step so values become page content.
Bookmarks / document JavaScript lost
Dropped by design: Outlines and document-level JavaScript are catalog structures outside the copied pages, so they don't survive normalisation. There's no option to preserve them. If a downstream consumer relies on bookmarks, normalisation isn't the right transform for those files; handle them on a separate path.
Files come out unencrypted
Decrypted: The converter rebuilds each file without a security handler, so every output is unencrypted regardless of the input password. Good if your pipeline needs full read access; problematic if the archive mandates encryption. Add a PDF Password Protect stage at the end of the pipeline.
Some files exceed the tier size or page cap
400 limit: Each file is still bound by the PDF tier limits (e.g. Pro: 50 MB / 500 pages, Pro+Media: 500 MB / 2,000 pages). A few oversized scans in the batch will fail. Pre-filter by size, compress them first with PDF Compress (Lossless), or route them to a higher tier.
A corrupt or truly malformed PDF in the batch
Error / skip: pdf-lib reads tolerantly (it even ignores encryption), but a genuinely broken file can throw on load. In a batch, catch the error per file and route failures to a quarantine folder rather than halting the run. Try PDF Repair on quarantined files before re-normalising.
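The per-file error handling described here can be sketched as a loop that takes the conversion step as a parameter. The convert callable stands in for whatever actually calls the runner; run_batch and the folder layout are illustrative names, not part of the tool.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Tuple


def run_batch(
    in_dir: Path,
    out_dir: Path,
    quarantine_dir: Path,
    convert: Callable[[bytes], bytes],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Convert every *.pdf in in_dir; copy failures to quarantine_dir.

    Returns (names that succeeded, [(name, error message), ...]).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for src in sorted(in_dir.glob("*.pdf")):
        try:
            (out_dir / src.name).write_bytes(convert(src.read_bytes()))
            succeeded.append(src.name)
        except Exception as exc:  # one bad file must not halt the run
            shutil.copy2(src, quarantine_dir / src.name)
            failed.append((src.name, str(exc)))
    return succeeded, failed
```

Returning the failure list, rather than only logging it, makes it easy to feed quarantined names into a repair pass and then re-run just those files.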
Producer / ModDate changed on every file
Expected: The save rewrites document metadata, so every normalised file gets a fresh Producer and modification date. If your pipeline keys off original metadata, capture it before normalising. To standardise or strip metadata too, add a PDF Metadata Scrubber stage.
Digital signatures invalidated across the batch
Invalidated: Re-serializing changes the bytes, so any signed file loses its valid signature during normalisation (and the signature object isn't carried over). Don't normalise files whose signatures must remain valid; route signed documents to a path that preserves them, or re-sign with PDF Digital Signature after.
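Routing signed documents away from normalisation needs some way to spot them. One rough heuristic, offered as an assumption and not something the tool provides, is to look for the /ByteRange key that signature dictionaries carry. It can miss unusually encoded files and it says nothing about whether a signature is valid, so treat it as a first-pass filter only.

```python
from pathlib import Path
from typing import List, Tuple


def looks_signed(data: bytes) -> bool:
    """Heuristic: signature dictionaries contain a /ByteRange entry."""
    return b"/ByteRange" in data


def split_signed(in_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Partition *.pdf files into (probably signed, safe to normalise).

    Reads each file fully into memory, which is fine for a sketch but
    worth replacing with a chunked scan for very large scans.
    """
    signed: List[Path] = []
    unsigned: List[Path] = []
    for path in sorted(in_dir.glob("*.pdf")):
        (signed if looks_signed(path.read_bytes()) else unsigned).append(path)
    return signed, unsigned
```

Only the second list would go on to the normalisation stage; the first goes to whatever path preserves or re-applies signatures.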
Mixing targets within one batch by mistake
Inconsistent: The whole point of normalisation is one target. If your script sends 1.4 for some files and 1.7 for others, you've recreated the version mix you were trying to eliminate. Hard-code a single version value for the run and assert it in the payload.
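One way to enforce a single target is to fix the version once, validate it against the accepted values (1.4, 1.5, 1.6, 1.7), and have every payload come from the same factory. The helper names below are illustrative.

```python
from typing import Callable, Dict, Iterable

ACCEPTED_VERSIONS = ("1.4", "1.5", "1.6", "1.7")  # values listed by the schema


def make_payload_factory(version: str) -> Callable[[], Dict[str, str]]:
    """Validate the run's version once and return a payload builder."""
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported target version: {version!r}")

    def payload() -> Dict[str, str]:
        return {"version": version}

    return payload


def assert_single_target(payloads: Iterable[Dict[str, str]]) -> str:
    """Raise if the payloads disagree on the version; return the one target."""
    versions = {p["version"] for p in payloads}
    if len(versions) != 1:
        raise ValueError(f"expected exactly one target, got: {sorted(versions)}")
    return versions.pop()


payload_for_run = make_payload_factory("1.7")
print(assert_single_target([payload_for_run(), payload_for_run()]))
```

Because the version is captured when the factory is created, no per-file code path can substitute a different value by accident.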
Output slightly larger when normalising to 1.4
Expected: Normalising a 1.7 file down to 1.4 turns object streams off, so those files grow a little. Across a large batch this adds up in storage. If storage is a concern and the consumer can read 1.5+, normalise to 1.7 instead, which keeps the compact structure.
Frequently asked questions
Can I drop a whole folder of PDFs into the browser tool to normalise them?
No — the in-browser version converter processes one file per run; it doesn't accept a multi-file batch for this tool. Use the browser tool to validate your target version on a sample, then automate the real batch through the @jadapps/runner: fetch GET /api/v1/tools/pdf-version-converter for the schema, pair the runner once, and POST each file with { "version": "1.7" } to 127.0.0.1:9789/v1/tools/pdf-version-converter/run from your own script.
What target version should a pipeline standardise on?
Standardise on what your most restrictive consumer needs. If modern libraries (pdf.js, PDFBox 3.x, MuPDF) are downstream, 1.7 is the natural baseline and produces the smallest output. If a legacy parser fails on compressed cross-references, normalise to 1.4 — the only target that turns object streams off and writes the plain xref/trailer. For mixed or unknown consumers, 1.7 is the safest default.
Does normalisation preserve form fields and bookmarks?
No. The page-copy rebuild doesn't carry the AcroForm catalog, outlines (bookmarks), or document JavaScript, so every normalised file loses them. If your pipeline needs form values, flatten them into the page first with PDF Flatten as a prior stage. Bookmarks can't be preserved by this transform — route bookmark-dependent files separately.
Will encrypted documents stay encrypted after normalising?
No. Every output is unencrypted — the converter reads past input encryption and rebuilds without a security handler, and there's no password field. If the destination archive requires encryption, add a PDF Password Protect stage at the end of your pipeline, after normalisation.
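The ordering rules in this guide (flatten before conversion, protect after it) can be checked mechanically before a pipeline runs. A small sketch follows; the stage identifiers are made-up labels for your own configuration, not names the tool defines.

```python
from typing import List

# Hypothetical stage labels for a pipeline definition.
CONVERT = "version-convert"
FLATTEN = "flatten"
PROTECT = "password-protect"


def check_stage_order(stages: List[str]) -> List[str]:
    """Return a list of ordering problems; an empty list means the order is fine."""
    if CONVERT not in stages:
        return ["pipeline has no version-convert stage"]
    convert_at = stages.index(CONVERT)
    problems: List[str] = []
    if FLATTEN in stages and stages.index(FLATTEN) > convert_at:
        problems.append("flatten runs after conversion: the AcroForm is already gone")
    if PROTECT in stages and stages.index(PROTECT) < convert_at:
        problems.append("protect runs before conversion: the output will be unencrypted")
    return problems


print(check_stage_order([FLATTEN, CONVERT, PROTECT]))
```

Running the check at start-up turns a silent data-loss mistake into an explicit configuration error.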
How do I script the batch?
Pair the @jadapps/runner (paid tier), then in your script loop over the directory and POST each PDF with { "version": "<target>" } to 127.0.0.1:9789/v1/tools/pdf-version-converter/run, writing each result to your intake folder. Fetch GET /api/v1/tools/pdf-version-converter first to confirm the option schema. Everything runs locally — pipeline documents never reach JAD's servers.
What happens to a corrupt file in the middle of a batch?
pdf-lib is tolerant (it even ignores encryption on load), but a genuinely malformed file can throw. Wrap each file's call in error handling so a single bad file routes to a quarantine folder instead of stopping the whole run. Try PDF Repair on quarantined files, then re-normalise the repaired output.
Does normalising change how the documents look?
No — it re-serializes the page objects rather than re-rendering them, so text stays selectable and the visible content is identical. The only structural change is at 1.4, where optional-content layers are dropped (a 1.5 feature). For standard pipeline documents the rendered output is unchanged.
Will signatures survive normalisation?
No. Re-serializing changes the bytes and invalidates any existing digital signature, and the signature object isn't carried through. Don't normalise files whose signatures must stay valid — give signed documents a separate path. If you control signing, normalise first and sign last with PDF Digital Signature.
Can I script this in iText, PyPDF, or Ghostscript instead?
Yes — those libraries can set the PDF version too, and they may preserve more structure (forms, bookmarks) than this tool's page-copy rebuild. The advantage of the @jadapps/runner here is that it's the same engine as the web tool, runs locally with zero upload, and takes a one-field payload. Pick the approach that matches what your pipeline needs to preserve.
How do I check what version each incoming file is?
The version is in the first line of the raw file (%PDF-1.x), or in a desktop reader's document properties. For a pipeline, read the first several bytes of each file to log the original version before normalising — useful for auditing how mixed your inputs actually are.
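Reading the header is a few lines of Python. The sketch below scans a short prefix of each file for the %PDF-1.x marker and tallies the results for an audit log. Note that it reports the header only: from PDF 1.4 onward a document's catalog may carry a Version entry that overrides the header, which this check does not inspect.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data: bytes) -> Optional[str]:
    """Return the version from a %PDF-x.y header, or None if absent."""
    match = _HEADER.search(data[:1024])  # tolerate a little leading junk
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path: Path) -> Optional[str]:
    """Read only the start of the file; no PDF library required."""
    with path.open("rb") as handle:
        return header_version(handle.read(1024))


def audit_versions(in_dir: Path) -> Counter:
    """Count input files per header version ('unknown' when no header is found)."""
    return Counter(read_pdf_version(p) or "unknown" for p in sorted(in_dir.glob("*.pdf")))
```

Logging the output of audit_versions before and after the batch gives a quick confirmation that the mix has collapsed to one version.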
What are the per-file size and page limits?
Each file is bound by the PDF tier limits: free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages. For a production pipeline you'll typically want Pro or higher; pre-filter or compress oversized scans with PDF Compress (Lossless) before they hit the cap.
Is anything uploaded during normalisation?
No. Whether you use the browser tool (pdf-lib in the tab) or the @jadapps/runner (local 127.0.0.1 endpoint), processing is entirely local — pipeline documents never leave your machine. Only an anonymous usage counter is recorded when you're signed in, and you can opt out.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.