How to compare two PDF documents to find differences
- Step 1: Open the PDF Compare tool. Go to the PDF Compare / Diff tool. It is a multi-file tool; the dropzone reads "Drop PDF files here." Everything runs in your browser; nothing is uploaded.
- Step 2: Add the original (version A) PDF. Drop the first/earlier version. It appears in the queued-files list with its name, size, and page count. Order matters: the first file is treated as A, so lines found only in A are reported on the removed side.
- Step 3: Add the revised (version B) PDF. Drop the second/later version. The tool accepts exactly two files (fileCountLimit: 2); lines found only in B are reported on the added side. There is no drag-to-reorder, so if you queued them backwards, remove one with the X and re-add in the right order.
- Step 4: Click Process 2 files. The tool does not auto-run for multi-file compares, and the Process 2 files button is disabled until both files are present. Click it to run. There is no options panel for this tool; comparison settings are fixed.
- Step 5: Read the JSON result. The result panel shows the comparison object: pageCountA, pageCountB, a differences array of human-readable structural notes, and a textDiff block with added, removed, unchanged, addedCount, removedCount, identical, and a unified-diff report string.
- Step 6: Download the diff report. Click Download to save the full comparison as a .json file (named after the input). Attach it to a review thread, or parse it in a script. If textDiff.identical is true, the two documents have identical extractable text.
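For the "parse it in a script" case, here is a minimal Python sketch. It assumes the field names described on this page (including textDiff.extracted); the helper name summarise and the file name report-v2.json are hypothetical, and the sample object is hand-written for illustration rather than real tool output.

```python
import json


def summarise(report: dict) -> str:
    """Return a one-line verdict for a comparison report (hypothetical helper)."""
    text = report.get("textDiff", {})
    # Assumption: a missing "extracted" key is treated as True.
    if text.get("extracted", True) is False:
        return "text diff skipped: no text layer (run OCR first)"
    if text.get("identical") and not report.get("differences"):
        return "no differences found"
    return (
        f"{text.get('addedCount', 0)} added, "
        f"{text.get('removedCount', 0)} removed, "
        f"{len(report.get('differences', []))} note(s) in differences"
    )


# Hand-written sample shaped like the fields described above. A real script
# would load the downloaded file instead, e.g. json.load(open("report-v2.json")).
sample = json.loads("""
{
  "pageCountA": 10,
  "pageCountB": 11,
  "differences": ["Page count differs: 10 vs 11",
                  "Text differs: 1 line(s) added, 0 line(s) removed"],
  "textDiff": {"extracted": true, "identical": false,
               "added": ["Appendix B text"], "removed": [],
               "addedCount": 1, "removedCount": 0, "unchanged": 10}
}
""")
print(summarise(sample))  # 1 added, 0 removed, 2 note(s) in differences
```

Checking extracted before identical matters: for scanned input the text diff is skipped, so the absence of text changes says nothing about the content.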
What the comparison report contains
The single JSON object returned by the tool. Structural fields come from pdf-lib; the textDiff block comes from the framework-free LCS core fed pdfjs-extracted text.
| Field | Meaning | Example value |
|---|---|---|
| pageCountA / pageCountB | Page count of the first and second file | 12 / 13 |
| differences | Human-readable structural notes: page-count mismatch, per-page size mismatch (>0.5 pt), and a one-line text-changed summary | ["Page count differs: 12 vs 13", "Text differs: 4 line(s) added, 2 line(s) removed"] |
| textDiff.extracted | Whether a text layer was readable from both files. False means scanned/image-only input | true |
| textDiff.added / removed | Arrays of the actual line strings present only in B / only in A | ["Page-4 text…"] |
| textDiff.unchanged | Count of lines (pages) common to both files | 9 |
| textDiff.identical | True when no lines were added or removed | false |
| textDiff.report | Unified-diff-style string: - removed, + added, two spaces unchanged | Page 1 text\n- old line\n+ new line |
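The report string can be split back into its three groups by prefix. The sketch below assumes each line starts with a two-character prefix ("- ", "+ ", or two spaces) as the table describes; split_report is a hypothetical helper, not part of the tool.

```python
def split_report(report: str) -> dict[str, list[str]]:
    """Group the lines of a textDiff.report string by their two-character prefix."""
    buckets: dict[str, list[str]] = {"removed": [], "added": [], "unchanged": []}
    for line in report.split("\n"):
        if line.startswith("- "):
            buckets["removed"].append(line[2:])
        elif line.startswith("+ "):
            buckets["added"].append(line[2:])
        elif line.startswith("  "):
            buckets["unchanged"].append(line[2:])
        # Any other line (for example a plain message when no diff was
        # produced) is ignored.
    return buckets


parts = split_report("  Page 1 text\n- old line\n+ new line")
print(parts["removed"], parts["added"])  # ['old line'] ['new line']
```

When the JSON is available, the textDiff.added and textDiff.removed arrays are the more direct source; prefix parsing is mainly useful when only the report text has been pasted somewhere.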
What it compares — and what it does not
Capabilities grounded in the implementation. Where a capability is out of scope, the matching JAD tool is named.
| Aspect | Detected? | Notes |
|---|---|---|
| Page count change | Yes | pageCountA vs pageCountB; surfaced in differences |
| Page size change | Yes | Per matched page, flagged when width or height differs by >0.5 pt (e.g. A4↔Letter) |
| Text additions / deletions | Yes (per page line) | LCS over text where each page is one line; granularity is per-page, not per-word |
| Word-level / inline redline | No | Output is whole-page lines added/removed, not character-level highlights |
| Formatting (bold, font, size, colour) | No | Only extracted text strings are compared; pure styling changes are invisible to the diff |
| Images / graphics / signatures | No | Image content is not diffed. For signature integrity use PDF Signature Verify |
| Scanned / image-only PDF | Structural only | No text layer → extracted: false; run PDF OCR first |
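The structural checks in the table can be illustrated with a short sketch. It is not the tool's code: it assumes pages are paired by index, that sizes arrive as (width, height) tuples in points, and that messages follow the wording shown on this page. structural_notes is a hypothetical name.

```python
def structural_notes(sizes_a, sizes_b, tolerance=0.5):
    """Build page-count and page-size notes from two lists of (width, height) in pt."""
    notes = []
    if len(sizes_a) != len(sizes_b):
        notes.append(f"Page count differs: {len(sizes_a)} vs {len(sizes_b)}")
    # Assumption: page N of A is compared with page N of B; extra pages are skipped.
    for n, ((wa, ha), (wb, hb)) in enumerate(zip(sizes_a, sizes_b), start=1):
        if abs(wa - wb) > tolerance or abs(ha - hb) > tolerance:
            notes.append(f"Page {n}: size differs ({wa:g}x{ha:g} vs {wb:g}x{hb:g})")
    return notes


a4 = [(595, 842)] * 2
letter = [(612, 792)] * 2
for note in structural_notes(a4, letter):
    print(note)
# Page 1: size differs (595x842 vs 612x792)
# Page 2: size differs (595x842 vs 612x792)
```

The 0.5 pt tolerance keeps sub-point rounding differences between PDF producers from being reported as size changes.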
Cookbook
Concrete inputs and the exact shape of what the tool returns. The text diff aligns one line per page, so changes are reported page-by-page.
Two versions, identical text, different page size
Someone re-saved an A4 draft as US Letter without touching the words. The text diff is identical, but the structural diff flags the size change — which is exactly the kind of silent drift a visual skim misses.
Input A: report-v1.pdf (A4, 595x842 pt)
Input B: report-v2.pdf (Letter, 612x792 pt)
Result:
pageCountA: 8
pageCountB: 8
differences: [
"Page 1: size differs (595x842 vs 612x792)",
... (one per page)
]
textDiff.identical: true
textDiff.report: "No text differences — the two documents have identical text content."
A single edited sentence on page 3
Because pdfjs joins each page into one string, an edit anywhere on page 3 shows as page 3's whole text removed and the new page 3 text added. You see exactly which page changed.
Result:
textDiff.addedCount: 1
textDiff.removedCount: 1
textDiff.report:
Page 1 text …
Page 2 text …
- Old page-3 text … fee is $4,000 …
+ New page-3 text … fee is $4,500 …
Page 4 text …
Version B added a page
B has one extra page. Page count differs, and the new page's text appears as a single added line.
Result:
pageCountA: 10
pageCountB: 11
differences: [
"Page count differs: 10 vs 11",
"Text differs: 1 line(s) added, 0 line(s) removed"
]
textDiff.added: ["Appendix B text …"]
textDiff.removed: []
Identical files
Comparing a file with itself (or a byte-for-byte copy) confirms a clean baseline.
Result:
pageCountA: 5
pageCountB: 5
differences: []
textDiff.identical: true
textDiff.unchanged: 5
textDiff.report: "No text differences — the two documents have identical text content."
One file is a scan with no text layer
If either PDF has no extractable text (a scanned page image), text extraction degrades gracefully: the structural diff still runs, but the text diff is skipped with a message.
Result:
pageCountA: 6
pageCountB: 6
differences: []
textDiff.extracted: false
textDiff.report:
"Text layer could not be extracted from one or both PDFs
(e.g. a scanned/image-only document). Run OCR first to
enable the text diff."
Fix: run both through /pdf-tools/pdf-ocr, then compare again.
Edge cases and what actually happens
Files queued in the wrong order
By design: The first file dropped is A (its lines become removed), the second is B (its lines become added). There is no drag-to-reorder. If your additions and deletions look swapped, remove one file with the X button and re-add the versions in chronological order, then click Process 2 files again.
Scanned / image-only PDF (no text layer)
Structural only: pdfjs returns no text for image-only pages, so textDiff.extracted is false and the report tells you to OCR first. The structural diff (page count, page sizes) still runs. Run both files through PDF OCR to add a text layer, then re-compare.
Only formatting changed (bold, font, colour)
Not detected: The tool compares extracted text strings, not styling. If the words are byte-for-byte the same but one version made a heading bold or changed the font, the text diff reports identical: true. Formatting-only changes are out of scope by design.
A whole page reflowed but the words are the same
Expected: Because each page is compared as a single joined line, two pages with the same words in the same reading order match even if line breaks moved. If reflow changed the pdfjs reading order, the joined strings differ and the page is reported as changed; review the page to confirm it is a layout-only change.
Trailing spaces or CRLF vs LF differences
Normalised: Before diffing, the text is normalised: CRLF and CR line endings collapse to LF, the final trailing newline is dropped, and trailing whitespace on every line is trimmed. Leading whitespace is preserved so genuine indentation changes still show. As a result, stray trailing spaces from extraction do not produce false positives.
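The normalisation rules above can be written out as a short sketch. This is an illustration of the described behaviour, not the tool's source; normalise_lines is a hypothetical name.

```python
def normalise_lines(text: str) -> list[str]:
    """Apply the described normalisation and return the lines to be diffed."""
    # CRLF and bare CR both become LF.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop a single final trailing newline so it does not create an empty last line.
    if unified.endswith("\n"):
        unified = unified[:-1]
    # rstrip() trims trailing whitespace only, so leading indentation survives.
    return [line.rstrip() for line in unified.split("\n")]


print(normalise_lines("  indented \r\nline two\t\r\n"))
# ['  indented', 'line two']
```

Because only the right-hand side of each line is trimmed, "  indented" and "indented" still compare as different lines.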
Only one file dropped
Blocked: The Process button stays disabled until two files are present, and the handler throws "Upload two PDF files to compare." if it is run with fewer than two. Add the second file to enable the comparison.
File over the size limit
Limit: Free tier caps each PDF at 2 MB; Pro raises it to 50 MB per file. Large multi-hundred-page documents may also be slow to extract in the browser. Compress first with PDF Compress (lossless) if the file is bloated, or split the section you care about.
Encrypted / password-protected PDF
May fail: Structural loading ignores light encryption, but pdfjs text extraction can fail on a password-protected file, leaving extracted: false. Remove the password first with PDF Unlock or PDF Remove Password, then compare.
Two completely unrelated PDFs
Expected: There is no similarity threshold; the LCS simply finds whatever lines (pages) happen to match. Comparing two unrelated documents typically returns nearly everything as removed (from A) plus everything added (from B). That is correct behaviour, not an error.
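Since the tool applies no threshold of its own, a reader who wants a rough "are these even the same document?" signal could derive one from the counts in the JSON. The sketch below is one possible approach, assuming textDiff.unchanged, addedCount, and removedCount are integer counts as described on this page; overlap_ratio is a hypothetical helper, and any cut-off applied to its result is the reader's own choice.

```python
def overlap_ratio(text_diff: dict) -> float:
    """Share of compared lines (pages) that are common to both files."""
    unchanged = text_diff.get("unchanged", 0)
    total = (
        unchanged
        + text_diff.get("addedCount", 0)
        + text_diff.get("removedCount", 0)
    )
    # Two empty documents have nothing that differs, so treat them as fully overlapping.
    return unchanged / total if total else 1.0


related = {"unchanged": 9, "addedCount": 4, "removedCount": 2}
unrelated = {"unchanged": 0, "addedCount": 7, "removedCount": 5}
print(overlap_ratio(related), overlap_ratio(unrelated))
```

A ratio near zero suggests the two files share almost no pages of text, which matches the "everything removed plus everything added" pattern described above.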
You expected a colour-coded side-by-side view
Not available: This tool outputs a JSON report and a unified-diff report string (-/+/space prefixes). It does not produce a rendered green/red overlay PDF, and there are no accept/reject controls. Use the JSON or the report text; for a visual page render, open both files in a desktop PDF viewer alongside the report.
Frequently asked questions
Does this tool highlight changes in green and red on the PDF?
No. It returns a JSON report plus a unified-diff report string where removed lines start with "- ", added lines with "+ ", and unchanged lines with two spaces. There is no rendered, colour-coded overlay PDF and no accept/reject buttons. The output is designed to be read or parsed, not visually marked up on the page.
How fine-grained is the text comparison — word level or line level?
It is a line-level LCS diff, and pdfjs joins all the text on a page into a single string. So in practice the comparison aligns one line per page: a change anywhere on a page surfaces as that page's whole text removed and the new text added. This reliably tells you which pages changed; it is not a word-by-word inline redline.
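To make the per-page granularity concrete, here is a minimal line-level LCS diff in Python where each "line" stands for one page's joined text. It is a textbook dynamic-programming sketch under that assumption, not the tool's own LCS core; lcs_diff and render_report are hypothetical names.

```python
def lcs_diff(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Return (kind, line) pairs: ' ' unchanged, '-' only in a, '+' only in b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops


def render_report(ops: list[tuple[str, str]]) -> str:
    """Format ops with '- ', '+ ', and two-space prefixes."""
    prefix = {" ": "  ", "-": "- ", "+": "+ "}
    return "\n".join(prefix[kind] + line for kind, line in ops)


pages_a = ["Page 1 text", "Page 2 text", "fee is $4,000", "Page 4 text"]
pages_b = ["Page 1 text", "Page 2 text", "fee is $4,500", "Page 4 text"]
print(render_report(lcs_diff(pages_a, pages_b)))
```

Because the unit of comparison is the whole page string, changing a single figure on page 3 produces one removed line and one added line for that page, while the other pages stay unchanged.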
Can I export the diff as an annotated PDF?
No — the download is a .json file containing the full comparison object (page counts, structural differences, and the textDiff block including the unified-diff report). There is no annotated-PDF export. If you need a marked-up PDF, paste the report text into a desktop tool, or render both pages side by side manually.
Will it work on scanned PDFs?
Only for the structural part. A scanned/image-only PDF has no text layer, so pdfjs extracts nothing and textDiff.extracted comes back false with a message to run OCR first. Run both files through PDF OCR to add a searchable text layer, then compare again to get a real text diff.
Are formatting changes like bold or a font swap detected?
No. The diff compares extracted text strings only. If the words are identical but one version changed a heading to bold, used a different font, or changed text colour, the text diff reports the documents as identical. Page size changes are detected (structural diff), but in-text styling is not.
Does the order I add the files in matter?
Yes. The first file dropped is treated as A and the second as B. A's unique lines are reported as removed, B's unique lines as added. There is no reorder control, so add the older version first and the newer version second. If the result looks inverted, remove a file and re-add them in chronological order.
What exactly is in the differences array?
Human-readable structural notes: a Page count differs: X vs Y entry when counts don't match, a Page N: size differs (WxH vs WxH) entry per page whose width or height differs by more than 0.5 pt, and a one-line Text differs: A line(s) added, B line(s) removed summary when the text isn't identical. The detailed text lives in the textDiff block.
Are my documents uploaded to a server?
No. Both PDFs are read and compared entirely in your browser — pdf-lib for structure, pdfjs for text, and a framework-free LCS routine for the diff. The result panel confirms 0 bytes uploaded. Only an anonymous usage counter is recorded when you are signed in; the document content never leaves your device.
Why do two visually similar pages show as changed?
The diff compares the text pdfjs reads, in the order it reads it. If text reflowed enough to change that reading order, or invisible characters differ, the joined page strings won't match and the page is flagged. Open the page and compare visually; if the words are truly identical, treat it as a layout/extraction-order difference rather than a content edit.
What's the maximum file size I can compare?
Free tier allows up to 2 MB per PDF; Pro raises it to 50 MB per file. Very long documents can also be slow to extract in the browser. If a file is over the limit because of bloat rather than real content, shrink it first with PDF Compress (lossless).
Can I compare a PDF against a Word doc or a plain text file?
Not directly — this tool takes two PDFs. Convert the other document to PDF first, or pull both documents' text and diff that. To get just the text from a PDF for an external diff, use PDF to Text, which extracts the same pdfjs text layer this tool compares.
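For the "pull both documents' text and diff that" route, a small external diff is enough once the text has been extracted. The sketch below uses Python's standard difflib module, which is a different matching algorithm from this tool's LCS routine, so output may not align identically. The function name text_diff and the labels a.txt and b.txt are placeholders.

```python
import difflib


def text_diff(old_text: str, new_text: str) -> str:
    """Return a unified diff of two plain-text strings, or '' if they match."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="a.txt", tofile="b.txt", lineterm=""
        )
    )


old = "intro\nfee is $4,000\nend"
new = "intro\nfee is $4,500\nend"
print(text_diff(old, new))
```

Unlike the per-page comparison here, this diff works on the line breaks present in the extracted text, so the granularity depends on how the text was exported.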
Can I run this comparison from a script or API?
Yes. pdf-diff is exposed as a tool slug, so you can pair the JAD runner and POST both PDF buffers to the local runner endpoint to get the same JSON object back — page counts, structural differences, and the textDiff block — without anything leaving your machine.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.