How to repair a corrupted PDF that refuses to open
- Step 1: Confirm it's the file, not the viewer. Open the PDF in a second reader. Chrome's built-in viewer, Firefox's pdf.js, and Acrobat Reader each tolerate different defects. If every reader fails with a 'damaged' or 'failed to load' message, the file structure itself is broken and is a candidate for rebuild.
- Step 2: Open the PDF Repair tool. Go to the PDF Repair tool. It loads in your browser only; there is no upload step and no server round-trip.
- Step 3: Drop the corrupted file in. Add the .pdf file. The tool accepts only application/pdf (a .pdf extension); a renamed image or non-PDF is rejected with an 'is not supported' message. Repair has no options to set; it starts automatically as soon as the file is added.
- Step 4: Let the rebuild run in your browser. pdf-lib reloads the file with tolerant parsing, creates a fresh empty document, copies every page it can resolve into it, and saves. On a small file this is near-instant; the work is bounded by your device, not a queue.
- Step 5: Download and open the repaired PDF. Save the output and open it. Because it has a newly written xref and trailer, it should open in the strict reader that rejected the original.
- Step 6: Verify page count and add back catalog features. Compare the recovered page count against what you expected. Bookmarks, form fields, and the original title/author metadata are not carried over by a rebuild; restore those in a desktop editor if you need them. If pages are missing, the references to those page objects were unrecoverable in the source.
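The rebuild described in the steps above can be sketched in a few lines of pdf-lib. This is a minimal illustration, not the tool's actual source: the function name rebuildPdf is hypothetical, and passing updateMetadata: false at load and create time is an assumption based on where pdf-lib exposes that option.

```typescript
import { PDFDocument } from "pdf-lib";

// Hypothetical helper: load tolerantly, copy pages into a fresh document, save.
async function rebuildPdf(
  damaged: Uint8Array
): Promise<{ bytes: Uint8Array; pageCount: number }> {
  // Tolerant load: neither an encryption marker nor a malformed object aborts parsing.
  const source = await PDFDocument.load(damaged, {
    ignoreEncryption: true,
    throwOnInvalidObject: false, // already pdf-lib's default, spelled out for clarity
    updateMetadata: false,
  });

  // Fresh, empty target; nothing attached to the source catalog comes along.
  const target = await PDFDocument.create({ updateMetadata: false });

  // Copy every page the parser could resolve, in the original order.
  const copied = await target.copyPages(source, source.getPageIndices());
  for (const page of copied) {
    target.addPage(page);
  }

  // Saving serialises the copied objects with a newly written xref and trailer.
  const bytes = await target.save();
  return { bytes, pageCount: target.getPageCount() };
}
```

If PDFDocument.load cannot parse the input at all, the promise rejects, which corresponds to the "no output" case covered later in this guide.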
What a structural rebuild recovers — and what it leaves behind
Grounded in how the tool actually works: pdf-lib loads the source, copies page objects into a fresh PDFDocument, and saves with updateMetadata: false. Anything attached to the source document's catalog (not its pages) is not carried over.
| Aspect | Behaviour on repair | What to do about it |
|---|---|---|
| Cross-reference table / trailer | Rebuilt. The output is saved as a brand-new document, so its xref, trailer, and object offsets are written fresh and correct | Nothing — this is the whole point of the rebuild |
| Page content & quality | Preserved exactly for every recoverable page. Pages are copied, not rasterised — vector text stays vector and scans stay at original DPI | Nothing — content is carried through untouched |
| Page-level annotations | Carried with the page. Annotations stored on a page object travel when that page is copied | Verify in your reader; use PDF Annotation Remover if you want them gone |
| Bookmarks / outline | Not carried over. The outline lives in the document catalog, not on the pages, so the rebuilt file has none | Rebuild navigation in a desktop editor if essential |
| Form fields (AcroForm) | Not reconstructed as a working form. The AcroForm dictionary is a catalog object and is not rebuilt | If values mattered, recover them from a backup; flatten future forms with PDF Flatten to lock values into page content |
| Document metadata (title/author) | Not copied. Saved with updateMetadata: false, so the output carries a fresh, empty metadata block | Re-add title/author in a PDF editor; a clean record is often desirable for a recovered file |
| Digital signatures | Dropped / invalidated. Signatures are catalog objects and any byte change after signing breaks coverage anyway | Re-sign the repaired file with PDF Sign; verify later with PDF Signature Verifier |
Free vs. Pro limits for repairing a corrupted PDF
Limits are enforced before the rebuild runs. The page check happens by loading the file for a page count first; if a file is too broken even to count pages, that probe is skipped and the rebuild attempt runs directly.
| Limit | Free | Pro | Pro + Media |
|---|---|---|---|
| Max file size | 2 MB | 50 MB | 500 MB |
| Max pages | 50 | 500 | 2,000 |
| Files per job | 1 | 5 | 50 |
| Accepted input | .pdf only | .pdf only | .pdf only |
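As a rough illustration of the limit check, the sketch below encodes the table above and skips the page check when no page count is available. The names (Tier, LIMITS, checkLimits), the returned messages, and the reading of 1 MB as 1024 * 1024 bytes are all assumptions made for this example, not the tool's actual code.

```typescript
type Tier = "free" | "pro" | "proMedia";

interface TierLimits {
  maxBytes: number;
  maxPages: number;
  maxFiles: number;
}

// Assumption: 1 MB means 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Values copied from the limits table.
const LIMITS: Record<Tier, TierLimits> = {
  free: { maxBytes: 2 * MB, maxPages: 50, maxFiles: 1 },
  pro: { maxBytes: 50 * MB, maxPages: 500, maxFiles: 5 },
  proMedia: { maxBytes: 500 * MB, maxPages: 2000, maxFiles: 50 },
};

interface JobFile {
  sizeBytes: number;
  // null when the file was too broken to count pages; the page check is then skipped.
  pageCount: number | null;
}

// Returns a reason string when a limit is exceeded, or null when the job may proceed.
function checkLimits(tier: Tier, files: JobFile[]): string | null {
  const limits = LIMITS[tier];
  if (files.length > limits.maxFiles) return "too many files";
  for (const file of files) {
    if (file.sizeBytes > limits.maxBytes) return "file too large";
    if (file.pageCount !== null && file.pageCount > limits.maxPages) {
      return "too many pages";
    }
  }
  return null;
}
```

Treating an unknown page count as "allowed" mirrors the behaviour described above: a file that cannot be counted still gets a rebuild attempt.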
Cookbook
Real corruption patterns and exactly what the rebuild does for each. 'Recoverable' means pdf-lib could resolve the page references; bytes that were physically lost can't be invented by any tool.
Broken xref, pages intact — the textbook case
An editor wrote a bad incremental update, so the xref points to the wrong byte offsets. The page objects themselves are fine. This is exactly what a structural rebuild is for: discard the broken xref, copy the pages, write a new one.
    Before:
        Acrobat: "The file is damaged and could not be repaired"
        Chrome: "Failed to load PDF document"
    Repair (pdf-lib reload -> copyPages -> save):
        loaded with tolerant parser, 14 page objects resolved
        14 pages copied into a fresh document
    After:
        opens cleanly in Acrobat and Chrome, 14 pages, new xref
A stale encryption flag blocking an otherwise-fine file
The file carries an encryption marker but the content is readable. Many strict readers refuse to open it. The tool loads with ignoreEncryption: true, so the marker no longer aborts the rebuild.
    Before:
        reader refuses: "This document is protected" (no password works)
    Repair loads with ignoreEncryption: true:
        source parsed, all pages resolved and copied
    After:
        plain, unencrypted, openable PDF
    (note: this is not password cracking -- it only ignores a stale/empty marker; truly encrypted content needs the PDF Unlock tool with the password)
Garbage objects that crash a strict parser
A few objects in the file are malformed. pdf-lib's parser defaults to not throwing on invalid objects, so it skips the bad ones and keeps the resolvable pages — where Acrobat's strict parser aborts the entire open.
    Before:
        Acrobat aborts on the first malformed object
    Repair:
        pdf-lib skips 2 unresolvable objects (throwOnInvalidObject off)
        copies the 9 pages whose references still resolve
    After:
        9-page valid PDF (verify against expected count below)
Verifying how much actually came through
Always compare the recovered page count to what you expected. The tool can only carry pages whose objects it could resolve; a page whose content stream was lost upstream won't appear.
    Expected pages (from the original email/title): 20
    Recovered pages in output: 18
    -> 2 pages were unrecoverable (their objects were missing/destroyed in the source).
    Pull those pages from a backup or re-export from the source app.
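The comparison is simple arithmetic, but it is easy to script if you check many files. A minimal sketch follows; the names RecoveryReport and compareRecovery are hypothetical, and the expected count is something you supply yourself.

```typescript
interface RecoveryReport {
  expected: number;
  recovered: number;
  missing: number;
  complete: boolean;
}

// Compare the page count you expected with the page count of the repaired output.
function compareRecovery(expected: number, recovered: number): RecoveryReport {
  const valid = (n: number) => Number.isInteger(n) && n >= 0;
  if (!valid(expected) || !valid(recovered)) {
    throw new RangeError("page counts must be non-negative integers");
  }
  // Clamp at zero so an underestimated expectation does not yield a negative count.
  const missing = Math.max(0, expected - recovered);
  return { expected, recovered, missing, complete: missing === 0 };
}
```

With the numbers from the example above, compareRecovery(20, 18) reports 2 missing pages and complete: false.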
When the rebuild can't help: total parse failure
If the file is so damaged that pdf-lib can't even begin parsing it (no usable header/trailer, truncated mid-structure), the load throws and the tool surfaces the parser error instead of producing output.
    Repair attempt:
        pdf-lib PDFDocument.load() throws -- no parseable structure
        Tool shows the parser error message (no file produced).
    Next step:
        re-download or re-export the original.
        For a truncated download specifically, see the "fix-pdf-error-unexpected-end" guide.
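Before re-downloading, a quick look at the raw bytes can hint at why parsing failed. The sketch below is a diagnostic heuristic only and is not something the tool is stated to do: it looks for the "%PDF-" header near the start of the file and the "%%EOF" marker near the end. The function names and the 1024-byte search window are assumptions chosen for this example.

```typescript
// Map bytes one-to-one to characters so ASCII markers can be searched as text.
function toLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

interface QuickCheck {
  hasHeader: boolean;
  hasEofMarker: boolean;
}

// Heuristic: inspect only the first and last 1024 bytes of the file.
function quickStructureCheck(bytes: Uint8Array): QuickCheck {
  const windowSize = 1024;
  const head = toLatin1(bytes.subarray(0, windowSize));
  const tail = toLatin1(bytes.subarray(Math.max(0, bytes.length - windowSize)));
  return {
    hasHeader: head.indexOf("%PDF-") !== -1,
    hasEofMarker: tail.indexOf("%%EOF") !== -1,
  };
}
```

A present header with a missing end-of-file marker is consistent with a truncated download; a missing header suggests the file is not a PDF at all. Neither result proves the file is repairable or unrepairable.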
Edge cases and what actually happens
The file can't be parsed at all (no usable header/trailer)
Parse failed. If pdf-lib's PDFDocument.load cannot find a parseable structure (for example, the file is truncated mid-object or isn't really a PDF), it throws, and the tool surfaces the parser error rather than producing output. A structural rebuild needs at least enough intact structure to resolve page objects. Re-download or re-export the source.
Some pages are missing from the output
Partial recovery. The tool copies every page whose references resolve. If a page's object or content stream was destroyed in the source, that page can't be carried over; no tool can recover bytes that aren't there. Compare the output's page count against the expected total and pull the missing pages from a backup.
The PDF is genuinely password-encrypted
Limited. Loading with ignoreEncryption: true lets the tool ignore a stale or empty encryption marker, but it does not decrypt real encrypted content; this is not password cracking. If the document genuinely requires a password, decrypt it first with PDF Unlock (you must know the password), then repair if it's still structurally broken.
Bookmarks and the document outline are gone
Not preserved. The outline is stored in the document catalog, not on the pages. A rebuild copies pages into a fresh document, so the outline isn't carried over. This is expected behaviour; rebuild navigation in a desktop editor if you need it.
An interactive form is no longer fillable
Form not preserved. The AcroForm dictionary is a catalog object, not page content, so it isn't reconstructed by a rebuild. If a filled form was important, recover it from a backup. Going forward, flatten completed forms with PDF Flatten so the values become permanent page content that survives a rebuild.
A digital signature shows as invalid afterwards
Signature invalidated. Repair writes a brand-new document, which changes the bytes and breaks any existing signature's coverage. This is unavoidable for any rebuild. Re-sign the repaired file with PDF Sign and confirm with the PDF Signature Verifier.
Original title/author metadata is blank in the output
By design. The rebuild saves with updateMetadata: false, producing a fresh empty metadata block rather than copying the source's. Re-add title/author in a PDF editor if needed; a clean metadata record is often desirable for a recovered file.
The file is larger than your tier allows
Limit reached. Free tier handles files up to 2 MB and 50 pages; the size check runs before the rebuild and blocks oversize files with an upgrade prompt. Pro raises this to 50 MB / 500 pages and Pro + Media to 500 MB / 2,000 pages. A badly bloated corrupted file may sit just over the line; upgrade or trim the source if you can.
You renamed a non-PDF to .pdf and dropped it in
Not supported. The tool only accepts application/pdf with a .pdf extension. A renamed image, Word doc, or archive is rejected with an 'is not supported' message before any processing. If you have the original source file, re-export a real PDF from it.
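An acceptance check of this kind might look like the sketch below. The type and extension tests follow the description above. The extra scan for the "%PDF-" signature is an assumption added for illustration, because browsers typically derive a file's reported type from its extension, so a renamed image can still report application/pdf. The names CandidateFile and rejectionReason are hypothetical.

```typescript
interface CandidateFile {
  name: string;
  type: string; // MIME type as reported by the browser
  head: Uint8Array; // the first bytes of the file, e.g. the first 1024
}

// ASCII bytes of "%PDF-", the signature that opens a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// True when the signature occurs anywhere in the supplied leading bytes.
function containsMagic(head: Uint8Array): boolean {
  for (let i = 0; i + PDF_MAGIC.length <= head.length; i++) {
    let match = true;
    for (let j = 0; j < PDF_MAGIC.length; j++) {
      if (head[i + j] !== PDF_MAGIC[j]) {
        match = false;
        break;
      }
    }
    if (match) return true;
  }
  return false;
}

// Returns a rejection message, or null when the file is accepted.
function rejectionReason(file: CandidateFile): string | null {
  const declaredPdf =
    file.type === "application/pdf" && /\.pdf$/i.test(file.name);
  // Assumption: also require the PDF signature, to catch renamed non-PDF files.
  if (!declaredPdf || !containsMagic(file.head)) {
    return `${file.name} is not supported`;
  }
  return null;
}
```

Scanning a window of leading bytes rather than only offset 0 is deliberate: a damaged PDF can carry stray bytes before its header, and a repair tool would not want to turn those away.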
You need low-level byte salvage of an unparseable file
Out of scope. This tool rebuilds structure via pdf-lib; it does not do raw byte-stream carving of files pdf-lib can't parse. For those, a desktop utility like qpdf (which attempts recovery automatically when it reads a damaged file) or Ghostscript can sometimes reconstruct more aggressively. Use this tool first; it fixes the common 'broken xref, pages intact' case instantly and privately.
Frequently asked questions
How does the repair actually work?
It loads your PDF with pdf-lib's tolerant parser (which by default skips invalid objects instead of aborting), creates a brand-new empty document, copies every page object whose references resolve into it, and saves. The output therefore gets a freshly-written cross-reference table and trailer — which is what makes a structurally-broken file openable again. Everything happens in your browser.
Will the repair recover all my pages?
It recovers every page whose objects pdf-lib can resolve. A broken xref or a few garbage objects are usually fully recoverable because the page data itself is intact. If a page's content was physically destroyed in the source file, it can't be recovered — no tool can invent missing bytes. Always compare the output's page count to what you expected.
Why does my PDF get corrupted in the first place?
The most common causes are a save or export interrupted partway, a sync client (Dropbox/OneDrive/Drive) writing a partial file, an editor producing an invalid incremental update, or storage errors. These typically break the cross-reference table or trailer while leaving the page content intact — exactly the case a structural rebuild fixes.
Is my file uploaded anywhere?
No. The repair runs entirely in your browser using pdf-lib and pdfjs-dist. Your corrupted file — which may contain sensitive content — never leaves your device. Only anonymous usage counters are recorded when you're signed in.
Does it work on encrypted PDFs?
It loads files with ignoreEncryption: true, so a stale or empty encryption marker won't block the rebuild. That is not the same as decryption: if the document genuinely requires a password to read its content, decrypt it first with PDF Unlock (you need the password), then repair if it's still broken.
Are there any options to configure?
No. Repair has no settings — it starts automatically the moment you add a file and does one thing: rebuild the document structure by copying pages into a fresh PDF. There are no quality, mode, or page-selection controls, because the goal is a faithful structural rebuild of whatever is recoverable.
Why are my bookmarks, form fields, and metadata missing afterwards?
Those live in the document's catalog, not on its pages. Because repair copies pages into a brand-new document and saves with updateMetadata: false, catalog-level features (outline, AcroForm, original title/author) are not carried over. Page content and page-level annotations are preserved. Restore catalog features in a desktop editor if you need them.
It says 'is not supported' when I add my file — why?
The tool accepts only real PDFs (application/pdf, .pdf extension). If you renamed an image or another format to .pdf, it's rejected before processing. If you have the original source document, re-export a genuine PDF from it; that's more reliable than repairing a mislabelled file.
What if the repair produces no output at all?
That means pdf-lib couldn't parse the file enough to begin — it's too damaged or truncated to find a usable structure, and the tool surfaces the parser error instead of a download. Re-download or re-export the original. For an interrupted download specifically, see the fix-pdf-error-unexpected-end guide.
How large a file can I repair?
Free tier handles up to 2 MB and 50 pages per file; Pro raises this to 50 MB and 500 pages; Pro + Media to 500 MB and 2,000 pages. The size limit is checked before the rebuild starts. Repair processes one file per job on free (Pro allows 5, Pro + Media 50).
Is this as thorough as a desktop tool like qpdf?
For the common 'broken xref / invalid objects, pages intact' case, the result is equivalent (a freshly saved valid PDF), and it's instant and private. The gap is raw byte-stream salvage of files that won't parse at all; desktop tools like qpdf (which attempts recovery automatically when it reads a damaged file) or Ghostscript can sometimes reconstruct more from a truly mangled file. Try this first; reach for desktop tools only if it can't parse the file.
Could repairing make the file worse?
No. The tool reads your input and writes a separate new output file — your original is never modified. If the rebuild fails, you still have the original to try another tool or re-download. There's no in-place editing and no risk of overwriting the damaged source.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only anonymous usage counters are saved for signed-in dashboard stats.