How to track text changes between two PDF document drafts
- Step 1: Open the PDF Compare tool. Open PDF Compare / Diff. Drafts are processed locally in your browser; nothing leaves your device.
- Step 2: Add the earlier draft (version n) first. Drop the older draft as file A. Its lines become the removed side, so the earlier version goes first. It appears in the queue with name, size, and page count.
- Step 3: Add the later draft (version n+1). Drop the newer draft as file B; its new lines become added. The tool takes exactly two files and has no reorder control, so add them oldest-then-newest.
- Step 4: Click Process 2 files. The Process button enables once both drafts are queued. There is no options panel for this tool (settings are fixed), so click to run.
- Step 5: Review the tracked changes. Read textDiff.report: lines starting with - were in the earlier draft and dropped, lines starting with + are new in the later draft, and lines starting with two spaces are unchanged. The addedCount and removedCount give a quick magnitude of the revision.
- Step 6: Save as a change-log entry. Click Download to save the comparison .json, or copy the report text into your document's revision history. Repeat for each successive pair (n→n+1, n+1→n+2) to build a full evolution log.
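If you want to read the report programmatically rather than by eye, the prefix convention from Step 5 can be sketched as a small parser. This is an illustrative helper, not part of the tool: the function name and return shape are assumptions, and it assumes the report contains only lines prefixed with -, + or two spaces (no file headers or hunk markers).

```typescript
// Hypothetical helper: bucket report lines by the prefix convention
// described in this guide ("-" removed, "+" added, two spaces unchanged).
interface ReportSummary {
  added: string[];
  removed: string[];
  unchanged: number;
}

function summariseReport(report: string): ReportSummary {
  const summary: ReportSummary = { added: [], removed: [], unchanged: 0 };
  const lines = report.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.startsWith("+")) {
      summary.added.push(line.slice(1).trim());
    } else if (line.startsWith("-")) {
      summary.removed.push(line.slice(1).trim());
    } else if (line.startsWith("  ")) {
      summary.unchanged += 1;
    }
    // Anything else (for example an empty line) is ignored.
  }
  return summary;
}
```

Calling `summariseReport(result.textDiff.report)` on a downloaded result would give you the added and removed page texts as arrays, ready to paste into a revision history.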
Turning the diff into a change-log entry
Each report element and the review-history line it produces.
| Report element | What it tells the reviewer | Change-log use |
|---|---|---|
| textDiff.addedCount / removedCount | Magnitude of the revision (how many page-lines changed) | "v3→v4: 5 page-lines added, 2 removed" |
| textDiff.report (+/- lines) | The actual new and dropped text, in document order | Paste the relevant +/- lines as the substantive summary |
| textDiff.identical | Whether the draft text changed at all | "No substantive text change — re-export only" |
| differences page-count note | A section/page was added or removed | "Section inserted: page count 18→19" |
| differences page-size note | The document was re-laid-out | "Template change: A4→Letter" |
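The mapping in this table can be sketched as a one-line log formatter. The interface below only lists field names that appear in this guide; the exact JSON shape, the function name, and the wording of the output are assumptions for illustration.

```typescript
// Assumed subset of the downloaded JSON, based on field names used in this guide.
interface CompareResult {
  pageCountA: number;
  pageCountB: number;
  differences: string[];
  textDiff: {
    identical: boolean;
    addedCount: number;
    removedCount: number;
  };
}

// Hypothetical formatter: one change-log line per compared pair.
function changeLogEntry(from: string, to: string, r: CompareResult): string {
  const notes: string[] = [];
  if (r.textDiff.identical) {
    notes.push("no substantive text change");
  } else {
    notes.push(
      r.textDiff.addedCount + " page-lines added, " +
      r.textDiff.removedCount + " removed"
    );
  }
  if (r.pageCountA !== r.pageCountB) {
    notes.push("page count " + r.pageCountA + "→" + r.pageCountB);
  }
  // Page-size notes in this guide contain the phrase "size differs".
  const resized = r.differences.some(d => d.indexOf("size differs") !== -1);
  if (resized) {
    notes.push("page size changed");
  }
  return from + "→" + to + ": " + notes.join("; ");
}
```

For the v4 → v5 scenario in the cookbook, this would produce a line such as `v4→v5: 1 page-lines added, 0 removed; page count 17→18`.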
Granularity of the draft diff
Set reviewer expectations on what 'tracked changes' means here versus a word processor.
| Change type | How it appears in the report | Reviewer action |
|---|---|---|
| Edit on one page | That page's text as one - line followed by one + line | Read the pair to find the changed sentence |
| New page/section | A + (added) line; page count rises | Note the inserted section in the log |
| Deleted page/section | A - (removed) line; page count falls | Confirm the deletion was intended |
| Reordered pages | Affected pages show as removed + added | Verify it's a move, not new content |
| Formatting-only change | Not shown — text is identical | Compare visually if styling matters |
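To see why an edit shows up as a whole-page removed/added pair, here is a minimal sketch of an LCS-based line diff where each array element stands for one page's joined text. It is a textbook LCS walk written for illustration, not the tool's actual implementation; the function name and output format are assumptions modelled on the report prefixes described in this guide.

```typescript
// Textbook LCS diff over "one line per page" arrays (illustrative only).
function lcsDiff(a: string[], b: string[]): string[] {
  const n = a.length;
  const m = b.length;
  // table[r][c] = length of the LCS of a[r..] and b[c..]
  const table: number[][] = [];
  for (let r = 0; r <= n; r++) {
    const row: number[] = [];
    for (let c = 0; c <= m; c++) row.push(0);
    table.push(row);
  }
  for (let r = n - 1; r >= 0; r--) {
    for (let c = m - 1; c >= 0; c--) {
      table[r][c] = a[r] === b[c]
        ? table[r + 1][c + 1] + 1
        : Math.max(table[r + 1][c], table[r][c + 1]);
    }
  }
  // Walk the table, emitting unchanged, removed, and added lines.
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      out.push("  " + a[i]);
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      out.push("- " + a[i]);
      i++;
    } else {
      out.push("+ " + b[j]);
      j++;
    }
  }
  while (i < n) { out.push("- " + a[i]); i++; }
  while (j < m) { out.push("+ " + b[j]); j++; }
  return out;
}
```

Because the comparison is exact string equality per page, changing a single word makes the entire page string unequal, so the whole page appears once with - and once with +.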
Cookbook
Successive-draft scenarios and the exact report shape. Remember the diff aligns one line per page, so a paragraph edit reads as a page-level removed/added pair.
v3 → v4: one paragraph rewritten
A reviewer rewrote a paragraph on page 5. The page shows as removed and re-added; everything else is unchanged.
textDiff.report:
  Page 4 text …
- Page 5 … The methodology relied on a single survey wave …
+ Page 5 … The methodology used three survey waves over six months …
  Page 6 text …
addedCount: 1
removedCount: 1
unchanged: (rest of pages)
v4 → v5: a new section inserted
The later draft added an 'Assumptions' section, growing the page count by one.
Result:
pageCountA: 17
pageCountB: 18
differences: [
"Page count differs: 17 vs 18",
"Text differs: 1 line(s) added, 0 line(s) removed"
]
textDiff.added: ["Assumptions … all figures are nominal …"]
textDiff.removed: []
v5 → v6: re-exported, no text change
The author only regenerated the PDF — no words changed. Useful to prove a circulated build is the same draft you signed off.
Result:
differences: []
textDiff.identical: true
textDiff.unchanged: 18
textDiff.report: "No text differences — the two documents have identical text content."
Template switch between drafts
Between drafts the team moved from A4 to Letter. Text is unchanged but every page's size differs.
Result:
pageCountA: 12
pageCountB: 12
differences: [
"Page 1: size differs (595x842 vs 612x792)",
... one per page
]
textDiff.identical: true
One draft is a scanned markup
A reviewer printed the draft, annotated it by hand, and scanned it back. The scan has no text layer, so the text diff can't run.
Result:
textDiff.extracted: false
textDiff.report:
"Text layer could not be extracted from one or both PDFs
(e.g. a scanned/image-only document). Run OCR first to
enable the text diff."
Fix: OCR the scan at /pdf-tools/pdf-ocr (the handwriting won't OCR,
but typed text will), then compare the typed content.
Edge cases and what actually happens
Drafts written by different authors
Supported: Authorship is irrelevant. The tool compares the extracted text of the two files, so it works whether the same person or different collaborators produced the two drafts. The diff reflects content changes only, not who made them.
Drafts have different page counts
Expected: When a draft adds or removes pages, the structural diff reports Page count differs: X vs Y, and the text on the added or removed pages appears as added/removed lines. This is the normal way inserted or deleted sections show up between drafts.
A scanned/handwritten markup draft
Structural only: A scanned draft (or hand-annotated print) is image-only, so textDiff.extracted is false and only the structural diff runs. Run PDF OCR on the scan to recover typed text; handwritten annotations generally won't OCR reliably, so review those manually.
Heavy reflow across many pages
Heads up: If a small edit early in the document pushes text across many page boundaries, lots of pages can show as changed because each page's joined string shifted. Use page count plus the structural notes as your first read, then go page by page to separate genuine edits from reflow.
Pages reordered between drafts
Expected: Reordering shows the affected pages as both removed and added, because the LCS doesn't track moves. Matching text appearing once as a removal and once as an addition is the tell-tale of a reorder rather than new content.
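The tell-tale described above can be checked mechanically. The sketch below pairs each removed page text with an identical added one; the function name is hypothetical, and it assumes you pass it the textDiff.removed and textDiff.added arrays from the result.

```typescript
// Hypothetical check: page texts present on both the removed and added
// sides are likely moves rather than new or deleted content.
function likelyMoves(removed: string[], added: string[]): string[] {
  const moved: string[] = [];
  const remaining = added.slice(); // copy so the caller's array is untouched
  for (let i = 0; i < removed.length; i++) {
    const at = remaining.indexOf(removed[i]);
    if (at !== -1) {
      moved.push(removed[i]);
      remaining.splice(at, 1); // consume the match so duplicates pair one-to-one
    }
  }
  return moved;
}
```

Anything returned by `likelyMoves` still deserves a quick visual check, since exact equality will miss a page that was moved and lightly edited at the same time.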
Only whitespace/line-wrap changed
Normalised: Trailing whitespace is trimmed per line and CRLF/CR are normalised to LF before diffing, so cosmetic trailing-space or line-ending changes don't create false 'changes'. Genuine leading-indentation changes are preserved and will show.
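A normalisation step of this kind could look like the following. This is a sketch of the behaviour described above, not the tool's source; the function name is an assumption.

```typescript
// Illustrative normaliser: unify line endings, trim trailing whitespace,
// and leave leading indentation intact.
function normaliseForDiff(text: string): string[] {
  return text
    .replace(/\r\n?/g, "\n") // CRLF and lone CR both become LF
    .split("\n")
    .map(line => line.replace(/\s+$/, "")); // trailing whitespace only
}
```

With this in place, `"a  \r\n  b"` and `"a\n  b"` normalise to the same two lines, while a change from `"  b"` to `"b"` would still register as a difference.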
Files queued newest-first
By design: File A is the removed side and file B is the added side. If your additions and deletions look inverted, you queued the newer draft first. Remove a file with the X button and re-add oldest-then-newest. There's no reorder handle.
Draft over the size limit
Limit: Free tier caps each PDF at 2 MB; Pro raises it to 50 MB per file. An image-heavy report draft may exceed the free cap. Shrink it with PDF Compress (lossless) or upgrade.
You expected accept/reject controls
Not available: This tool detects and reports changes; it does not provide accept/reject or merge controls like a word processor's track-changes view. Use the JSON/report output to inform your edits in the authoring tool, then re-export and compare again.
Comparing non-consecutive drafts (v3 vs v7)
Expected: You can compare any two drafts, not just consecutive ones, but the diff will show the cumulative change across all the intermediate versions, which can be large. For a clean evolution log, compare consecutive pairs (n→n+1) and chain the entries.
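If you keep drafts in an ordered list, the consecutive pairs to compare can be generated rather than tracked by hand. This is a small illustrative helper; the function name and the use of version labels as plain strings are assumptions.

```typescript
// Hypothetical helper: turn an ordered list of draft labels into the
// consecutive (older, newer) pairs to load as file A and file B.
function consecutivePairs(versions: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (let i = 0; i + 1 < versions.length; i++) {
    pairs.push([versions[i], versions[i + 1]]);
  }
  return pairs;
}
```

For `["v3", "v4", "v5"]` this yields the pairs v3/v4 and v4/v5, each with the older draft first so it lands on the removed side.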
Frequently asked questions
Can I compare drafts written by different authors?
Yes. The tool diffs the text content of the two PDFs regardless of who wrote them. Authorship metadata isn't used — it compares what's on the page. Drop the earlier draft as file A and the later one as file B, click Process 2 files, and the report shows what text was added and removed between them.
Is the diff report suitable as an audit trail of document changes?
It's a solid, deterministic record of the text differences between two specific drafts — re-running on the same files reproduces it. But it isn't a versioned audit system on its own: it doesn't log who changed what or when. For a formal audit trail, store the JSON report alongside both source PDFs in a document management system that provides versioning and access logging.
What happens if the two drafts have different page counts?
The structural diff reports Page count differs: X vs Y, and the text on any page that exists in only one draft appears in the added or removed list. That's exactly how an inserted or deleted section shows up — you'll see both the count change and the new/dropped text.
How detailed is the change tracking — sentence by sentence?
It's page by page, not sentence by sentence. pdfjs joins each page's text into one string, so the line-level diff aligns one line per page: an edit shows as that page's whole text removed and re-added. You read the removed/added pair to find the changed sentence. It pinpoints the page and shows the new and old text for it.
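To narrow a removed/added page pair down to the sentence that changed, you can post-process the two page strings yourself. The sketch below uses a deliberately naive sentence splitter (it will mis-split abbreviations and decimals) and hypothetical function names; it is a reading aid, not something the tool does.

```typescript
// Naive sentence splitter: breaks after ., ! or ? (illustrative only).
function splitSentences(pageText: string): string[] {
  const parts = pageText.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map(s => s.trim()).filter(s => s.length > 0);
}

// Hypothetical helper: sentences present on only one side of a page pair.
function changedSentences(
  oldPage: string,
  newPage: string
): { dropped: string[]; introduced: string[] } {
  const oldS = splitSentences(oldPage);
  const newS = splitSentences(newPage);
  return {
    dropped: oldS.filter(s => newS.indexOf(s) === -1),
    introduced: newS.filter(s => oldS.indexOf(s) === -1),
  };
}
```

Feed it the text of a - line as `oldPage` and the matching + line as `newPage`; the sentences common to both drop out, leaving only the rewritten ones.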
Can I build a full change log across many drafts?
Yes — compare consecutive pairs (v1→v2, then v2→v3, and so on) and record each report as a log entry. Comparing consecutive drafts keeps each entry focused on a single revision; comparing far-apart drafts (v1→v7) shows the cumulative change, which is harder to read as a history.
Will it ignore changes that are only re-formatting?
It will report the documents as identical text if only formatting changed, because it compares extracted text, not styling. Cosmetic trailing-whitespace and line-ending differences are normalised away too. Page-size changes (e.g. a template switch) are caught structurally, but bold/font/colour changes within the text are not detected.
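Page-size notes can be turned into a readable template-change summary. The sketch below assumes the note format shown in the cookbook ("Page 1: size differs (595x842 vs 612x792)") and recognises only A4 and US Letter in PDF points; the function name and the lookup table are illustrative.

```typescript
// Common page sizes in PDF points, rounded to whole numbers.
const KNOWN_SIZES: { [key: string]: string } = {
  "595x842": "A4",
  "612x792": "Letter",
};

// Hypothetical helper: returns null for notes that are not page-size notes.
function describeSizeNote(note: string): string | null {
  const m = /^Page (\d+): size differs \(([\d.]+)x([\d.]+) vs ([\d.]+)x([\d.]+)\)$/.exec(note);
  if (m === null) {
    return null;
  }
  const name = (w: string, h: string): string => {
    const key = Math.round(Number(w)) + "x" + Math.round(Number(h));
    return KNOWN_SIZES[key] || key; // fall back to the raw dimensions
  };
  return "Page " + m[1] + ": " + name(m[2], m[3]) + "→" + name(m[4], m[5]);
}
```

Applied to each entry in differences, this lets a change-log line say "A4→Letter" instead of quoting raw dimensions.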
Are my unreleased drafts kept private?
Yes. Both PDFs are read and compared in your browser — pdf-lib for structure, pdfjs for text, an in-process LCS for the diff. The result panel confirms 0 bytes uploaded. Drafts of unpublished reports, policies, or standards never reach a server; only an anonymous usage counter is recorded when you're signed in.
A draft was annotated by hand and scanned — can I diff it?
The text diff needs a text layer. A scanned draft is image-only, so textDiff.extracted returns false. Run it through PDF OCR to recover the typed text and then compare. Handwritten margin notes usually won't OCR cleanly, so review those manually; the structural diff still works on the scan.
Why do so many pages show as changed when I only edited one line?
A single edit early in the document can reflow text across later page boundaries, so each affected page's joined text changes and the page is flagged. Use the page count and structural notes to gauge whether it's a big edit or mostly reflow, then read the pages in order to confirm where the real change is.
Which draft should I add first?
Add the earlier draft as file A and the later draft as file B. A's unique lines are reported as removed and B's as added, which reads as 'this was taken out, this was added.' There's no reorder control, so if the report reads inverted, remove a file and re-add oldest-then-newest.
Does it show me changes in green and red?
No. It outputs a JSON object and a unified-diff report string where added lines start with + and removed lines with -. There's no colour-coded overlay PDF and no accept/reject buttons. The text-based report is meant for reading, copying into a change log, or scripting.
Can I automate draft comparison in a pipeline?
Yes. pdf-diff is a named tool slug, so you can pair the JAD runner and POST both draft PDFs to the local runner endpoint to get the same JSON report — added/removed lines, counts, and the unified-diff report — automatically. Useful for a CI-style check that fails when a controlled document's text changes unexpectedly.
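Once a pipeline has the JSON report in hand, the pass/fail decision itself is simple. The sketch below covers only that decision; it does not show the runner endpoint or request format, which this guide doesn't specify. The interface lists field names used in this guide, and the function name and messages are assumptions.

```typescript
// Assumed subset of the JSON report, based on field names in this guide.
interface GateInput {
  textDiff: {
    extracted: boolean;
    identical: boolean;
    addedCount: number;
    removedCount: number;
  };
}

// Hypothetical CI gate: fail when the controlled document's text changed
// or when the text could not be compared at all.
function gate(result: GateInput): { pass: boolean; reason: string } {
  const t = result.textDiff;
  if (!t.extracted) {
    return { pass: false, reason: "text layer missing; run OCR before comparing" };
  }
  if (!t.identical) {
    return {
      pass: false,
      reason: "text changed: +" + t.addedCount + " / -" + t.removedCount + " page-lines",
    };
  }
  return { pass: true, reason: "text identical" };
}
```

A CI step could call `gate(report)` and exit non-zero when `pass` is false, printing `reason` in the job log.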
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.