How to recover the readable pages from a broken PDF
- Step 1: Note how many pages you expect. Before recovering, find the original page count from the file name, the email it came in, a colleague's copy, or your own memory. You'll compare it with the recovered count to know exactly what, if anything, was lost.
- Step 2: Open the PDF Repair tool. Recovery happens in your browser with no upload, which matters for partially broken files that hold private content.
- Step 3: Drop the broken PDF in. Add the .pdf file. There are no options and no page picker: the tool starts automatically and recovers every page it can. Only real PDFs are accepted.
- Step 4: Let the salvage pass run. pdf-lib loads the file, skips objects it can't resolve, copies the resolvable pages into a fresh document, and saves it. The output is a clean PDF of the recovered pages.
- Step 5: Download and count the recovered pages. Save the output and open it. Compare its page count with the number you expected so you know precisely how many pages were unrecoverable.
- Step 6: Track down any missing pages elsewhere. Pages that didn't come through had their data destroyed in the source; pull them from a backup or an earlier version, or re-export from the original application. To combine the recovered set with replacement pages, use PDF Merge.
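Steps 1 and 5 boil down to one subtraction. The helper below is a hypothetical sketch (summarizeRecovery and RecoverySummary are illustrative names, not part of the tool) that turns the two counts into the "Recovered N of M" line used in the cookbook. It also rejects a recovered count larger than the expected one, on the assumption that salvage can only drop pages, never add them.

```typescript
// Hypothetical helper (not part of PDF Repair): compares the page count you
// expected with the count in the recovered output.
interface RecoverySummary {
  expected: number;
  recovered: number;
  lost: number;
  label: string;
}

function summarizeRecovery(expected: number, recovered: number): RecoverySummary {
  for (const n of [expected, recovered]) {
    if (!Number.isInteger(n) || n < 0) {
      throw new RangeError(`page counts must be non-negative integers, got ${n}`);
    }
  }
  if (recovered > expected) {
    // Salvage only drops pages, so this means the expected count is wrong.
    throw new RangeError(`recovered ${recovered} pages but expected ${expected}; recheck the expected count`);
  }
  const lost = expected - recovered;
  const tail = lost === 0 ? "nothing lost" : `${lost} lost`;
  return { expected, recovered, lost, label: `Recovered ${recovered} of ${expected} (${tail})` };
}

console.log(summarizeRecovery(20, 18).label); // Recovered 18 of 20 (2 lost)
console.log(summarizeRecovery(9, 9).label);   // Recovered 9 of 9 (nothing lost)
```

The count only tells you how many pages are missing; you still need to open the output to see which ones.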
What gets recovered vs. what's lost
Recovery is page-by-page and depends on whether each page's objects resolve. The tool can't recover data that was destroyed in the source — only what's still readable.
| Page state in the source | Recovered? | Why |
|---|---|---|
| Intact page, broken only by the file's bad xref | Yes | pdf-lib resolves the page object; it's copied into the fresh document |
| Page sitting next to a few malformed objects | Yes | The parser skips the bad objects (tolerant by default) and keeps the resolvable page |
| Page whose content stream was destroyed | No | There are no usable bytes to copy — the page can't be reconstructed |
| Page after a truncation point | No | Its bytes were never in the file; nothing to recover |
| Page that references a missing/destroyed object | Sometimes | Recovered if the page still resolves; dropped if the missing object is essential to it |
Page salvage vs. page selection — pick the right tool
This tool recovers everything it can from a broken file; it does not let you choose pages. For deliberate page operations on a healthy PDF, use the sibling tools.
| Your goal | Right tool | Why |
|---|---|---|
| Salvage all readable pages from a broken file | PDF Repair (this tool) | Recovers every resolvable page; no page picker by design |
| Pull specific pages out of a healthy PDF | PDF Extract Pages | Lets you enter exact page numbers/ranges to keep |
| Drop unwanted pages from a healthy PDF | PDF Delete Pages | Removes the pages you list, keeps the rest |
| Combine recovered pages with replacements | PDF Merge | Joins the salvaged set with pages from a backup |
Cookbook
Real partial-corruption scenarios and what the salvage pass produces. 'Recovered N of M' is the key number to check: it tells you how many pages survived, and opening the output tells you which.
20-page report, 18 pages recovered
A few page objects deep in a report were corrupted by a bad save. The rest are fine. The salvage pass keeps the 18 resolvable pages and drops the 2 destroyed ones.
Expected pages: 20
Recovered: 18
Output: clean 18-page PDF (new xref, opens in Acrobat)
Lost: pages 11 and 12 -- their objects were destroyed. Pull them from a backup, then PDF Merge them back in.
Broken xref hiding perfectly good pages
The whole file failed to open because its cross-reference table was wrong, even though every page was intact. Recovery resolves all pages and writes a fresh document.
Before: Acrobat aborts -> "file is damaged"
Reality: all 9 pages are intact; only the xref is broken
Salvage: 9 of 9 pages resolved and copied
Output: 9-page valid PDF -- nothing lost
Recovered pages plus replacements from a backup
After salvaging, two pages were missing. You have an older copy with those pages. Recover first, then merge the salvaged set with the replacements to rebuild the complete document.
1. PDF Repair on the broken file -> recovered_18pp.pdf
2. PDF Extract Pages on the backup -> pages_11_12.pdf
3. PDF Merge: recovered_18pp.pdf + pages_11_12.pdf, reordered as needed -> complete_20pp.pdf
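"Reordered as needed" in step 3 means interleaving the two files by original page number. The sketch below is hypothetical (planMerge, MergeSlot and range are illustrative names, not a PDF Merge API). It assumes you already know which original page each recovered page is, and it computes the order in which to place pages from each file so the merged document reads 1 to 20 again.

```typescript
// Hypothetical planning helper for step 3: decides where each page should go
// when merging the salvaged file with pages extracted from a backup.
type PageSource = "recovered" | "replacement";

interface MergeSlot {
  originalPage: number;  // 1-based page number in the original document
  source: PageSource;    // which file supplies this page
  indexInSource: number; // 1-based page number inside that file
}

function planMerge(recoveredOriginals: number[], replacementOriginals: number[]): MergeSlot[] {
  const slots: MergeSlot[] = [
    ...recoveredOriginals.map((originalPage, i): MergeSlot => ({ originalPage, source: "recovered", indexInSource: i + 1 })),
    ...replacementOriginals.map((originalPage, i): MergeSlot => ({ originalPage, source: "replacement", indexInSource: i + 1 })),
  ];
  const seen = new Set<number>();
  for (const slot of slots) {
    if (seen.has(slot.originalPage)) {
      throw new Error(`original page ${slot.originalPage} is supplied by both files`);
    }
    seen.add(slot.originalPage);
  }
  // Sorting by original page number restores the document order.
  return slots.sort((a, b) => a.originalPage - b.originalPage);
}

// Cookbook example: originals 11 and 12 were lost and come from the backup.
const range = (from: number, to: number): number[] =>
  Array.from({ length: to - from + 1 }, (_, i) => from + i);
const plan = planMerge([...range(1, 10), ...range(13, 20)], [11, 12]);
for (const s of plan.slice(9, 13)) {
  console.log(`original ${s.originalPage}: ${s.source} page ${s.indexInSource}`);
}
// original 10: recovered page 10
// original 11: replacement page 1
// original 12: replacement page 2
// original 13: recovered page 11
```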
Confirming the recovered set is what you think
Always open the output and verify page order and content, not just the count. Salvage preserves the resolvable pages in document order, but a destroyed page in the middle shifts everything after it.
Recovered count: 18 (expected 20)
Open output -> pages 1-10 correct; page 11 in the output is the original page 13 (originals 11 and 12 were lost).
-> Note the gap, then reinsert the missing originals via PDF Merge to restore the correct sequence.
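The shift follows a simple rule, sketched below with two hypothetical helpers (missingOriginals and originalAtOutputPage are illustrative, not tool features). If you note which original page each output page really is, the gaps fall out directly. Conversely, once you know the lost originals, you can predict what each output position should hold and spot-check it.

```typescript
// Hypothetical helpers for checking page order after a partial recovery.

// Original page numbers (1-based) absent from the recovered output.
function missingOriginals(expectedTotal: number, survivingOriginals: number[]): number[] {
  const present = new Set(survivingOriginals);
  const missing: number[] = [];
  for (let page = 1; page <= expectedTotal; page++) {
    if (!present.has(page)) missing.push(page);
  }
  return missing;
}

// The original page that should sit at a given output position. Every lost
// original at or before the running position pushes the answer one page on.
function originalAtOutputPage(outputPage: number, lostOriginals: number[]): number {
  let original = outputPage;
  for (const gap of [...lostOriginals].sort((a, b) => a - b)) {
    if (gap <= original) original++;
  }
  return original;
}

// The 20-page report: output pages 1-10 are originals 1-10, output 11-18 are 13-20.
const survivors = [
  ...Array.from({ length: 10 }, (_, i) => i + 1),
  ...Array.from({ length: 8 }, (_, i) => i + 13),
];
const lost = missingOriginals(20, survivors);
console.log(lost);                           // [ 11, 12 ]
console.log(originalAtOutputPage(11, lost)); // 13
```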
Nothing to recover — total corruption
If the file is so damaged that pdf-lib can't parse it at all, there are no resolvable pages and the load throws. The tool reports the error rather than producing an empty PDF.
Salvage attempt: pdf-lib PDFDocument.load() throws -- file unparseable
No pages could be resolved.
Recover from a backup or re-export from the original source application.
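Reporting the error instead of returning an empty PDF is a small control-flow decision. The sketch below shows one way to express it. It assumes a hypothetical `salvage` callback standing in for the load, copy and save pass; recoverOrReport and RecoveryOutcome are illustrative names, not the tool's code.

```typescript
// Hypothetical outcome type: either recovered bytes or a message for the user.
type RecoveryOutcome =
  | { ok: true; pageCount: number; pdfBytes: Uint8Array }
  | { ok: false; message: string };

// `salvage` is a placeholder for the real load-copy-save pass; it is not an
// actual export of the tool or of pdf-lib.
async function recoverOrReport(
  input: Uint8Array,
  salvage: (bytes: Uint8Array) => Promise<{ pageCount: number; pdfBytes: Uint8Array }>,
): Promise<RecoveryOutcome> {
  try {
    const result = await salvage(input);
    if (result.pageCount === 0) {
      // Never hand back an empty PDF as if it were a successful recovery.
      return { ok: false, message: "No pages could be resolved." };
    }
    return { ok: true, pageCount: result.pageCount, pdfBytes: result.pdfBytes };
  } catch (err) {
    // The load threw: the file could not be parsed at all.
    const detail = err instanceof Error ? err.message : String(err);
    return {
      ok: false,
      message: `File unparseable (${detail}). Recover from a backup or re-export from the source application.`,
    };
  }
}

// Simulated total corruption: the salvage pass throws before any page is read.
recoverOrReport(new Uint8Array(), async () => {
  throw new Error("simulated parse failure");
}).then((outcome) => console.log(outcome.ok ? "recovered" : outcome.message));
```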
Edge cases and what actually happens
Some pages don't appear in the output
Partial recovery: pages whose objects or content streams were destroyed can't be copied because they have no usable bytes. The tool recovers every page that still resolves and silently drops the rest. Compare the recovered count with your expected total to see how many pages were lost, open the output to find which ones, then source those pages from a backup.
The whole file is unparseable
Parse failed: if pdf-lib can't parse the file at all, there are zero resolvable pages and the load throws; the tool surfaces the error rather than producing an empty document. This means the damage goes beyond the file's structure: re-export from the source or restore a backup.
You wanted to choose which pages to keep
By design: this tool has no page picker; it recovers everything it can from a broken file. To select specific pages from a healthy PDF, use PDF Extract Pages; to drop pages, use PDF Delete Pages. If the file is broken, recover first, then select pages on the clean output.
Recovered page order has gaps
Expected: salvage keeps resolvable pages in document order. If a page in the middle was destroyed, the pages after it shift up, so the output's page 11 might be the original page 12. Open the output to confirm the sequence and reinsert any missing originals with PDF Merge.
Bookmarks and form fields are gone after recovery
Not preserved: these are document-catalog features, not page content, so a page-salvage rebuild doesn't carry them. Page-level annotations do survive because they're attached to pages. Rebuild navigation or recover form values from a backup if needed.
A digital signature is invalid after recovery
Signature invalidated: writing the recovered pages into a new document changes the bytes, which breaks any signature's coverage. If you need the file signed, re-sign the recovered file with PDF Sign and verify it with the PDF Signature Verifier.
File to recover exceeds your tier's page or size limit
Limit reached: limits apply to the input before recovery. The free tier allows up to 2 MB and 50 pages, Pro up to 50 MB and 500 pages, and Pro + Media up to 500 MB and 2,000 pages. A large broken document may exceed the free cap; upgrade to process it.
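Because limits are checked on the input, a pre-flight check is just two comparisons. The sketch below is hypothetical: the tier keys, LIMITS and limitProblems are illustrative names. It assumes "MB" means 1024 × 1024 bytes and that "up to" is inclusive. How the tool counts pages in a file it can't fully parse isn't stated here, so the page count is simply taken as an input.

```typescript
// Hypothetical pre-flight check using the tier limits stated on this page.
// Assumptions: "MB" = 1024 * 1024 bytes and "up to" is inclusive.
const MB = 1024 * 1024;

type Tier = "free" | "pro" | "proMedia";

interface TierLimit {
  maxBytes: number;
  maxPages: number;
}

const LIMITS: Record<Tier, TierLimit> = {
  free: { maxBytes: 2 * MB, maxPages: 50 },
  pro: { maxBytes: 50 * MB, maxPages: 500 },
  proMedia: { maxBytes: 500 * MB, maxPages: 2000 },
};

// Returns the reasons an input is over the limit (empty array = OK to process).
function limitProblems(tier: Tier, sizeBytes: number, pageCount: number): string[] {
  const limit = LIMITS[tier];
  const problems: string[] = [];
  if (sizeBytes > limit.maxBytes) {
    problems.push(`size ${(sizeBytes / MB).toFixed(1)} MB exceeds ${limit.maxBytes / MB} MB`);
  }
  if (pageCount > limit.maxPages) {
    problems.push(`${pageCount} pages exceeds ${limit.maxPages} pages`);
  }
  return problems;
}

console.log(limitProblems("free", 3 * MB, 120)); // two problems: size and pages
console.log(limitProblems("pro", 3 * MB, 120));  // []
```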
The PDF is genuinely encrypted
Limited: loading with ignoreEncryption: true lets the tool get past a stale encryption marker, but it doesn't decrypt real content. If the file needs a password, decrypt it with PDF Unlock first (you must know the password), then recover.
Output metadata is empty
By design: recovery runs pdf-lib with updateMetadata: false, so the output has a fresh, empty metadata block instead of the source's title and author. Re-add metadata in your reader if your workflow needs it.
You need deeper salvage than pdf-lib can do
Out of scope: this tool recovers what pdf-lib can parse; it does not carve raw bytes out of objects pdf-lib rejects. A desktop utility such as qpdf, Ghostscript, or MuPDF's mutool can sometimes reconstruct more from a heavily mangled file. Try this tool first, since it handles the common 'good pages, bad structure' case right away.
Frequently asked questions
How does page recovery actually work?
The tool loads your broken PDF with pdf-lib's tolerant parser, which by default skips objects it can't resolve instead of aborting. It then copies every page whose references still resolve into a brand-new document and saves it with a fresh cross-reference table. The output is a clean, valid PDF of all the recoverable pages — everything happens in your browser.
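The per-page copy can be sketched as a loop. This is a minimal sketch under stated assumptions, not the tool's source. The structural types are shaped after the pdf-lib calls named in the comments (getPageIndices, copyPages, addPage), and a real pdf-lib PDFDocument is expected to fit them. Treating a throwing copyPages call as an "unresolvable page" is our assumption; salvagePages, SalvageSource, SalvageTarget and SalvageReport are hypothetical names.

```typescript
// Structural types shaped after the pdf-lib calls this sketch relies on
// (PDFDocument.getPageIndices, copyPages, addPage), so the loop can be
// exercised with fakes.
interface SalvageSource {
  getPageIndices(): number[];
}

interface SalvageTarget<Src, Page> {
  copyPages(src: Src, indices: number[]): Promise<Page[]>;
  addPage(page: Page): unknown;
}

interface SalvageReport {
  kept: number[];    // 1-based source page numbers copied into the output
  skipped: number[]; // 1-based source page numbers that failed to copy
}

// Copies pages one at a time so a single unresolvable page cannot abort the
// whole run. Treating a throwing copyPages call as "unresolvable" is an
// assumption of this sketch, not documented tool behavior.
async function salvagePages<Src extends SalvageSource, Page>(
  src: Src,
  out: SalvageTarget<Src, Page>,
): Promise<SalvageReport> {
  const report: SalvageReport = { kept: [], skipped: [] };
  for (const index of src.getPageIndices()) {
    try {
      const [page] = await out.copyPages(src, [index]);
      out.addPage(page);
      report.kept.push(index + 1);
    } catch {
      report.skipped.push(index + 1);
    }
  }
  return report;
}

// With pdf-lib, the wiring would look roughly like:
//   const src = await PDFDocument.load(bytes, { ignoreEncryption: true });
//   const out = await PDFDocument.create({ updateMetadata: false });
//   const report = await salvagePages(src, out);
//   const recoveredBytes = await out.save(); // serializes a brand-new file
```

Copying one page per call is slower than a single copyPages(src, allIndices), but it isolates failures: one bad page costs only itself, and the report lists exactly which source pages were skipped.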
Will it recover every page?
It recovers every page whose objects pdf-lib can resolve. Pages broken only by a bad xref or surrounded by a few garbage objects are usually fully recoverable. Pages whose content was physically destroyed in the source can't be recovered — no tool can rebuild missing bytes. Compare the recovered count to what you expected.
Can I choose which pages to recover?
No — there's no page picker. This tool salvages everything it can from a broken file. If you want to select specific pages, recover first to get a clean PDF, then use PDF Extract Pages to keep the ones you want or PDF Delete Pages to drop the rest.
How do I know how many pages should be there?
Check the file name, the email or source the PDF came from, the document's own table of contents if a recovered page shows it, or a colleague with an intact copy. Knowing the expected total lets you compare against the recovered count and pinpoint exactly which pages were lost.
What happens to the pages that can't be recovered?
They're dropped from the output and their data is gone from the source file. Recover them from a backup, an earlier saved version, or by re-exporting from the original application. You can then stitch the missing pages back into the recovered set with PDF Merge.
Is the recovery done in the cloud?
No. It runs entirely in your browser using pdf-lib and pdfjs-dist. A partially-broken file with sensitive content never leaves your device. Only anonymous usage counters are recorded when you're signed in.
Is there a risk of corrupting the file further?
No. The recovery reads your input and writes a separate new output — the original is never modified. If recovery yields fewer pages than you hoped, your original is intact and you can try a desktop salvage tool on it as well.
Why is the page order off in the recovered file?
Salvage keeps the resolvable pages in document order. If a page in the middle was destroyed, every page after it shifts up by one slot, so a gap appears. Open the output to confirm the sequence and reinsert any missing originals with PDF Merge to restore the correct order.
Why are bookmarks and form fields missing from the recovered file?
They live in the document catalog, not on the pages, so a page-salvage rebuild doesn't carry them. Page content and page-level annotations are preserved. Rebuild bookmarks in a desktop editor and recover form values from a backup if you need them.
How many pages can it handle?
Free tier handles files up to 2 MB and 50 pages; Pro up to 50 MB and 500 pages; Pro + Media up to 500 MB and 2,000 pages. Limits are checked on the input before recovery runs. A large broken document may need Pro to process.
Can it recover an encrypted broken PDF?
It loads past a stale encryption marker via ignoreEncryption: true, but it does not decrypt genuinely encrypted content. If the file requires a password, decrypt it with PDF Unlock first (you need the password), then run recovery on the result.
What if no pages can be recovered?
That means pdf-lib couldn't parse the file at all: the damage goes beyond the file's structure, and the load throws. The tool reports the error rather than returning an empty PDF. Restore from a backup or re-export from the original source application; a desktop tool like qpdf may also reconstruct more from a severely mangled file.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.