How to convert a PDF to PDF/A-1b for long-term archiving
- Step 1: Remove any password first. PDF/A forbids encryption, and the converter can't tag an encrypted document cleanly. If your PDF is password-protected, strip the password with PDF Remove Password or PDF Unlock, then come back.
- Step 2: Open the PDF to PDF/A converter and drop the file. Load your document into the converter. Parsing, tagging, and re-saving all happen in your browser; nothing is uploaded. There is no level selector: the tool always targets PDF/A-1b.
- Step 3: Convert. The tool attaches the XMP pdfaid identifier, adds the GTS_PDFA1 output intent, forces the PDF-1.4 header, and re-saves. There are no options to set; one button produces the tagged file.
- Step 4: Download the tagged PDF/A-1b file. Save the output. Open it in Adobe Acrobat or your DMS preview; a PDF/A-aware reader will display the archival banner if it accepts the identifier.
- Step 5: Validate before you trust it for audit. Run the file through veraPDF (the open-source ISO 19005 validator). Because the output intent uses a stub ICC profile and fonts are not re-embedded, expect veraPDF to report failures on font-embedding and colour-profile rules. Treat this tool as a tagging baseline, not a certified converter.
- Step 6: Archive the source alongside the PDF/A copy. Keep the original PDF (and any Word / InDesign source) next to the archival copy. If a strict-validation requirement surfaces later, you'll want the editable source to re-export through a Ghostscript-grade converter.
What the converter writes to your PDF
Every change the toPdfA1b routine makes, and whether it satisfies the matching PDF/A-1b requirement.
| Change applied | PDF/A-1b requirement it targets | Fully satisfied? |
|---|---|---|
| XMP metadata stream with pdfaid:part=1, pdfaid:conformance=B, plus dc:title, dc:creator, create/modify dates | A PDF/A identifier must be present in document XMP | Yes: the identifier is written and well-formed |
| OutputIntent dictionary, subtype S: GTS_PDFA1, OutputConditionIdentifier: sRGB IEC61966-2.1 | A device-independent output intent must define the colour space | Partial: the DestOutputProfile is a stub blob, not a real ICC profile |
| Header rewritten to %PDF-1.4; saved with useObjectStreams: false | PDF/A-1 is built on PDF 1.4 and forbids object/xref streams | Yes: version and structure are 1.4-compatible |
| Re-save through pdf-lib (existing font embeds preserved) | All fonts must be embedded and complete | No: fonts already embedded survive; missing fonts are not added or subset |
| Producer set to JAD PDF · PDF/A-1b tagger, modification date refreshed | Provenance metadata is recommended for archival files | Yes: informational only |
| (not done) flatten transparency, remove external references, embed ICC | PDF/A-1 forbids transparency and external content; needs real ICC | No: these passes are out of scope for this tool |
PDF/A-1b checklist vs. what you still need to do
Use this to decide whether the tagged output is enough or whether you need a heavier converter.
| PDF/A-1b rule | Handled here | If not, do this |
|---|---|---|
| PDF/A identifier in XMP | Yes | — |
| No encryption | Source must already be unencrypted | Strip first with PDF Remove Password |
| All fonts embedded | Only if the source already embeds them | Re-export from the source app with 'embed all fonts', or run PDF Font Subsetter on a file that already embeds fonts |
| Valid ICC output intent | Stub only | Use a Ghostscript / Acrobat-grade converter for a real sRGB ICC profile |
| No transparency | Not flattened | Flatten via your design app before exporting; PDF Flatten handles form fields and annotations |
| PDF version 1.4 | Yes (header forced) | — |
File-size and page limits by tier
PDF-family limits applied before conversion runs.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 pages |
| Pro | 50 MB | 500 pages |
| Pro + Media | 500 MB | 2,000 pages |
| Developer | 2 GB | 10,000 pages |
| Enterprise | Unlimited | Unlimited |
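The limits in the table can be written down as a small pre-flight check. This is only a sketch: the tier keys, the `within_limits` helper, and the reading of 1 MB as 1,000,000 bytes are assumptions made for illustration, not the tool's actual code.

```python
from typing import Optional

MB = 1_000_000   # assumption: decimal megabytes; the tool may count 1024 * 1024
GB = 1_000 * MB

# (max bytes, max pages) per tier, copied from the table; None means unlimited.
TIER_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "free": (2 * MB, 50),
    "pro": (50 * MB, 500),
    "pro_media": (500 * MB, 2_000),
    "developer": (2 * GB, 10_000),
    "enterprise": (None, None),
}


def within_limits(tier: str, size_bytes: int, pages: int) -> bool:
    """Return True if a file of this size and page count fits the tier."""
    max_bytes, max_pages = TIER_LIMITS[tier]
    size_ok = max_bytes is None or size_bytes <= max_bytes
    pages_ok = max_pages is None or pages <= max_pages
    return size_ok and pages_ok
```

Both conditions have to hold, so a 1 MB file with 51 pages is still over the free-tier limit.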
Cookbook
What the tagged output actually contains, shown as the bytes and dictionaries the converter writes. Use these to confirm the identifier landed and to understand why a strict validator may still object.
Confirm the PDF/A identifier was written
After conversion, the XMP metadata stream carries the PDF/A-1b identifier. Open the file in a hex/text viewer or a metadata inspector and you'll see this packet.
XMP packet (excerpt, as written):
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
<pdf:Producer>JAD PDF · PDF/A-1b tagger</pdf:Producer>
→ A PDF/A-aware reader reads part=1 + conformance=B and labels the document 'PDF/A-1b'.
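If you would rather script this check than open a hex viewer, a minimal sketch follows. It assumes the XMP stream is stored unfiltered and uses the element form shown in the excerpt; the function name `read_pdfa_identifier` is made up for this example.

```python
import re
from typing import Optional


def read_pdfa_identifier(pdf_bytes: bytes) -> Optional[tuple[str, str]]:
    """Return (part, conformance) from element-form pdfaid tags, or None."""
    part = re.search(rb"<pdfaid:part>\s*(\d+)\s*</pdfaid:part>", pdf_bytes)
    conf = re.search(
        rb"<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>", pdf_bytes
    )
    if part is None or conf is None:
        return None  # identifier missing, or written in a form this sketch skips
    return part.group(1).decode("ascii"), conf.group(1).decode("ascii").upper()
```

Feed it the raw bytes of the converted file, for example `read_pdfa_identifier(open(path, "rb").read())`. A result of `None` means the identifier was not found in this form, not necessarily that the file has none.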
The header rewrite from 1.7 to 1.4
PDF/A-1 is defined against PDF 1.4. The converter rewrites the first eight bytes after saving so older readers and ingest checks accept the version.
Before: %PDF-1.7 (typical Word / Chrome export)
After: %PDF-1.4
The body is also saved with useObjectStreams: false, so there are no PDF-1.5 object/cross-reference streams that a 1.4 parser couldn't read.
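Both claims can be spot-checked on the output file. The sketch below reads the version from the first eight bytes and scans for the type markers that object and cross-reference streams carry. The helper names are illustrative, and the scan is a byte-level heuristic, not a PDF parser.

```python
import re
from typing import Optional


def header_version(pdf_bytes: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, e.g. '1.4', or None."""
    match = re.match(rb"%PDF-(\d\.\d)", pdf_bytes[:8])
    return match.group(1).decode("ascii") if match else None


def has_stream_objects(pdf_bytes: bytes) -> bool:
    """Heuristic: look for PDF 1.5 object-stream or xref-stream type markers."""
    return re.search(rb"/Type\s*/(ObjStm|XRef)\b", pdf_bytes) is not None
```

On a converted file you would expect `header_version` to give `'1.4'` and `has_stream_objects` to give `False`.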
The output intent dictionary that gets added
An OutputIntent tells a reader which colour space the archive was authored for. The converter adds the dictionary an ingest check looks for — but the referenced profile is a placeholder, not a real sRGB ICC blob.
<<
/Type /OutputIntent
/S /GTS_PDFA1
/OutputConditionIdentifier (sRGB IEC61966-2.1)
/Info (sRGB IEC61966-2.1)
/DestOutputProfile <stub stream, N 3>
>>
→ Structure present; strict validators that parse the ICC bytes will flag DestOutputProfile as invalid.
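To see why a byte-parsing validator objects, it helps to know what a real ICC profile starts with: a 128-byte header whose first four bytes give the profile size (big-endian) and whose bytes 36 to 39 spell `acsp`. The sketch below checks only those two invariants. The function name is made up, and a real validator checks far more than this.

```python
import struct


def looks_like_icc_profile(data: bytes) -> bool:
    """Check two cheap ICC header invariants: declared size and 'acsp'."""
    if len(data) < 128:  # every ICC profile begins with a 128-byte header
        return False
    (declared_size,) = struct.unpack(">I", data[0:4])  # big-endian profile size
    return declared_size == len(data) and data[36:40] == b"acsp"
```

A short placeholder payload fails the very first test, which is the kind of failure a strict validator reports against DestOutputProfile.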
A born-digital report that already embeds its fonts
The best-case input. A PDF exported from LaTeX or Word with fonts embedded converts cleanly because the converter preserves embeds and only needs to add the archival markers.
Input: annual-accounts.pdf (1.4 MB, Latin Modern embedded, no transparency, no encryption)
Output: annual-accounts_pdfa.pdf (~1.4 MB)
+ pdfaid identifier
+ GTS_PDFA1 output intent
+ %PDF-1.4 header
→ Closest to real PDF/A-1b this tool produces.
veraPDF will still report failures: read them
Running the tagged file through veraPDF surfaces the known gaps. This is expected, not a bug — the tool is a tagging baseline.
veraPDF profile: PDF/A-1B
Likely failures:
6.3.x Font program not embedded (if source lacked embeds)
6.2.2 OutputIntent ICC profile invalid (stub blob)
Passes:
6.7.x XMP PDF/A identifier present and correct
6.1.x PDF version / object-stream constraints
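When reading a report, it can help to separate the known gaps from anything unexpected. The sketch below takes failed clause numbers as plain strings and sorts them using the two prefixes listed above. How you pull the clause numbers out of a veraPDF report is left out, and the `triage` helper is hypothetical.

```python
# Clause prefixes listed above as known gaps of the tagged output.
EXPECTED_PREFIXES = ("6.2.2", "6.3.")


def triage(failed_clauses: list[str]) -> dict[str, list[str]]:
    """Split failed clause numbers into expected gaps and everything else."""
    result: dict[str, list[str]] = {"expected": [], "investigate": []}
    for clause in failed_clauses:
        known = clause.startswith(EXPECTED_PREFIXES)
        result["expected" if known else "investigate"].append(clause)
    return result
```

Anything in the `investigate` bucket is a failure this page does not predict, so read those rules first.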
Edge cases and what actually happens
Source PDF has missing / non-embedded fonts
Not fixed: The converter re-saves through pdf-lib, which preserves existing embeds but does not add fonts that were never embedded. A document relying on the reader's system fonts (common in old print-to-PDF exports) stays non-conformant on the font-embedding rule. Re-export from the source application with 'embed all fonts' enabled before converting.
Output intent uses a stub ICC profile
By design: The DestOutputProfile references a placeholder stream (JAD-sRGB-stub, N=3), not a real sRGB IEC61966-2.1 ICC profile. Readers that only check for the presence of an OutputIntent dictionary accept it; strict validators that parse the ICC bytes (veraPDF) will reject it. For a real profile you need a Ghostscript- or Acrobat-grade converter.
Encrypted / password-protected source
Rejected: PDF/A does not permit encryption, and the tagger can't produce a clean archival file from an encrypted one. Remove the password first with PDF Remove Password or PDF Unlock, then convert the decrypted copy.
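A quick way to tell whether a file will be rejected is to look for an /Encrypt entry, which an encrypted PDF carries in its trailer (or in the cross-reference stream dictionary). The sketch below is a byte-level heuristic with an illustrative function name; it can give a false positive if the text "/Encrypt" happens to appear elsewhere in the file.

```python
import re


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an encrypted PDF names an /Encrypt entry in its trailer."""
    return re.search(rb"/Encrypt\b", pdf_bytes) is not None
```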
PDF uses transparency or blend modes
Not flattened: PDF/A-1 forbids transparency entirely (PDF/A-2 relaxes this, but this tool only targets 1b). The converter does not flatten transparency, so design-heavy PDFs with drop shadows or transparent overlays will carry transparency into the output and fail strict 1b validation. Flatten in your design app, or target a profile that allows transparency using a heavier converter.
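To get an early hint that a file carries transparency, you can scan for the dictionary keys that usually signal it: a soft mask reference, an alpha value below 1, a blend mode other than Normal, or a transparency group. The sketch below is a rough heuristic over raw bytes and only sees objects that are stored uncompressed. The pattern table and function name are assumptions for illustration.

```python
import re

# Markers that usually indicate transparency in uncompressed PDF objects.
TRANSPARENCY_PATTERNS = {
    "soft mask": rb"/SMask\s+\d+\s+\d+\s+R",
    "non-opaque alpha": rb"/(?:CA|ca)\s+0?\.\d+",
    "blend mode": rb"/BM\s*/(?!Normal\b|Compatible\b)\w+",
    "transparency group": rb"/S\s*/Transparency\b",
}


def transparency_markers(pdf_bytes: bytes) -> list[str]:
    """Return the names of the transparency markers found in the bytes."""
    return [
        name
        for name, pattern in TRANSPARENCY_PATTERNS.items()
        if re.search(pattern, pdf_bytes)
    ]
```

An empty list does not prove the file is free of transparency; use veraPDF for a definitive answer.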
Scanned PDF with no text layer
Image-only: Tagging an image-only scan produces a valid-looking PDF/A-1b wrapper around page images, but it isn't searchable. For an archival scan you usually want a text layer first: run PDF OCR to add an invisible searchable layer, then convert.
Free-tier file over 2 MB or 50 pages
Blocked: The free tier caps PDF input at 2 MB and 50 pages. Archival masters are often larger. Upgrade to Pro (50 MB / 500 pages) or higher, or split the document with a range tool before tagging each part.
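If you go the splitting route, the page ranges are simple to work out. The sketch below computes inclusive 1-based ranges of at most 50 pages; the `page_ranges` helper is illustrative and does not call any range tool.

```python
def page_ranges(total_pages: int, max_pages: int = 50) -> list[tuple[int, int]]:
    """Return inclusive 1-based (first, last) ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]
```

For a 120-page document this gives three parts: pages 1 to 50, 51 to 100, and 101 to 120. Each part still has to stay under the 2 MB size cap on its own.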
Existing XMP metadata on the source
Replaced: The converter attaches its own XMP stream carrying the pdfaid identifier, title, creator, and dates. If you relied on custom XMP keys (e.g. Dublin Core subject keywords or a records-management schema), re-apply them after conversion, because they are not merged.
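Before converting, you may want to save the source's existing XMP so the custom keys can be re-applied later. The sketch below pulls every XMP packet out of the raw bytes by its `<?xpacket ...?>` wrappers. It assumes the metadata stream is stored unfiltered, and the function name is made up for this example.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>.*?<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packets(pdf_bytes: bytes) -> list[bytes]:
    """Return every unfiltered XMP packet found in the raw file bytes."""
    return XPACKET.findall(pdf_bytes)
```

Writing the returned packets to a sidecar file next to the original gives you a record of the keys that the conversion replaces.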
You need PDF/A-2b or PDF/A-3b
Unsupported: This tool only writes the PDF/A-1b identifier (pdfaid:part=1). It cannot target 2b (transparency, JPEG 2000, embedded files) or 3b (arbitrary attachments). If your archive mandates a later part, use a converter that supports it. Do not edit the identifier by hand, because the file body still has to obey that part's rules.
Digitally signed source PDF
Signature invalidated: Re-saving the document changes its bytes, which breaks any existing digital signature. Convert to PDF/A first, then sign the archival copy with PDF Digital Signature if a signature is required on the archive.
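To find out whether a source file is signed before you convert it, you can look for the /ByteRange array that signature dictionaries carry. The sketch below is a byte-level heuristic with an illustrative function name, not a signature validator.

```python
import re


def looks_signed(pdf_bytes: bytes) -> bool:
    """Heuristic: signature dictionaries carry a /ByteRange array."""
    return re.search(rb"/ByteRange\s*\[", pdf_bytes) is not None
```

If it returns `True`, plan to re-sign the archival copy after conversion.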
Frequently asked questions
Which PDF/A level does this tool produce?
PDF/A-1b only — ISO 19005-1, conformance level B. It writes pdfaid:part=1 and pdfaid:conformance=B into the XMP metadata. There is no option to choose PDF/A-1a, 2b, or 3b; the converter always targets 1b. If you need a different part, use a converter that supports it.
Will the output pass veraPDF validation?
Not strictly. The converter is best-effort tagging: it adds the PDF/A identifier, the output intent dictionary, and the PDF-1.4 header, but it ships a stub ICC profile and does not re-embed missing fonts or flatten transparency. veraPDF will typically pass the XMP-identifier rules and fail on the ICC profile and any font-embedding gaps. Treat it as a tagging baseline, and run a Ghostscript-grade converter when you need a certified pass.
What is the difference between PDF/A-1a and PDF/A-1b?
PDF/A-1b (level B, basic) only requires that the document's visual appearance be reliably reproducible. PDF/A-1a (level A, accessible) additionally requires tagged structure and Unicode mapping for accessibility. This tool targets 1b; it does not add the tag tree that 1a needs.
Does it embed fonts for me?
No. It preserves fonts that are already embedded in the source by re-saving through pdf-lib, but it does not add or subset fonts that were never embedded. If your PDF relies on system fonts, re-export it from the original application with 'embed all fonts' before converting.
Can I convert an encrypted PDF to PDF/A-1b?
No — PDF/A forbids encryption. Remove the password first with PDF Remove Password or PDF Unlock, then run the decrypted copy through the converter.
Is my document uploaded anywhere?
No. Loading, tagging, and re-saving all run in your browser. The file never leaves your device; only an anonymous usage counter is recorded when you're signed in, with no document content.
Why does the converter force the PDF version to 1.4?
PDF/A-1 is defined against PDF 1.4 and forbids features introduced later, such as object and cross-reference streams. The converter saves without object streams and rewrites the header to %PDF-1.4 so a 1.4-era reader can open the archive decades from now.
Will my existing metadata survive conversion?
The converter writes a fresh XMP stream containing the PDF/A identifier, the document title, the author/creator, and create/modify dates. Custom XMP schemas you added separately are not merged — re-apply them after conversion if your records system depends on them.
How large a file can I convert?
Free tier allows up to 2 MB and 50 pages. Pro raises that to 50 MB / 500 pages, Pro + Media to 500 MB / 2,000 pages, Developer to 2 GB / 10,000 pages, and Enterprise is unlimited. Limits are checked before conversion runs.
Should I keep the original PDF after converting?
Yes. Archive the original source (and any editable Word / InDesign master) alongside the PDF/A copy. If a stricter validation requirement appears later, you'll want the source to re-export through a heavier converter rather than trying to repair the tagged file.
Is a scanned PDF acceptable as PDF/A-1b?
A scan can be valid PDF/A-1b, but it won't be searchable without a text layer. For an archival scan, run PDF OCR first to add an invisible searchable text layer, then tag the result. PDF/A-1a (not produced here) would also require the document to be tagged for accessibility.
Can I batch-convert several PDFs to PDF/A-1b?
The tool processes one PDF per run. Free tier is single-file; Pro and above raise the batch ceiling (Pro 5 files, Pro + Media 50). For an automated pipeline, pair the @jadapps/runner and POST each file to the local pdf-to-pdfa endpoint — the document stays on your machine.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.