How to audit all differences between two PDF document versions
- Step 1: Open the PDF Compare tool. Open PDF Compare / Diff. Controlled documents are processed locally in your browser; nothing is uploaded, which keeps the comparison inside your environment.
- Step 2: Add the approved baseline first. Drop the last-approved version as file A. Its lines become the removed side. The queue shows its name, size, and page count for your record.
- Step 3: Add the revised version. Drop the proposed/new version as file B; its lines become the added side. The tool takes exactly two files, so add the baseline first and the revision second (there's no reorder control).
- Step 4: Click Process 2 files. The Process button enables once both files are queued. There's no options panel: the comparison is fixed and therefore consistent across runs, which is what an audit wants. Click to run.
- Step 5: Capture the change record. The JSON result holds pageCountA/pageCountB, the differences array (structural notes), and the textDiff block (added/removed lines, counts, identical, and the unified-diff report). This is your enumerated list of changes.
- Step 6: File it with the change-control record. Click Download to save the .json. Attach it to the change-management or document-control record alongside both source versions. For tamper-evident versioning and approvals, keep these in your controlled document system.
Mapping the report to a change-control record
Each report element and the audit-record field it supports.
| Report element | Audit question it answers | Record entry |
|---|---|---|
| textDiff.report | What text changed, in context? | Attach the unified-diff string as the enumerated change list |
| textDiff.addedCount / removedCount | How extensive is the revision? | Scope/impact note for the change request |
| textDiff.identical | Was there any substantive text change? | If true, log 'no text change, non-substantive re-issue' |
| differences page-count note | Were pages/sections added or removed? | Record structural change for the impact assessment |
| differences page-size note | Did the document layout/template change? | Record format change separately from content change |
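The mapping in the table can be sketched in a few lines of Python. This is an illustration only: the function name `to_change_record` and the output keys are hypothetical, and the sketch assumes that text-related entries in differences start with "Text differs" (as in the sample results on this page), so that everything else can be treated as a structural note.

```python
from typing import Any


def to_change_record(result: dict[str, Any]) -> dict[str, Any]:
    """Map a downloaded comparison result onto change-control record fields.

    Field names on the input side (textDiff, differences, addedCount, ...)
    follow the report elements described above; the output keys are
    hypothetical and should be renamed to match your own record form.
    """
    text_diff = result.get("textDiff", {})
    notes = result.get("differences", [])
    return {
        # Enumerated change list: the unified-diff string, attached as-is.
        "change_list": text_diff.get("report", ""),
        # Scope/impact note for the change request.
        "scope": {
            "added": text_diff.get("addedCount", 0),
            "removed": text_diff.get("removedCount", 0),
        },
        # True means 'no text change' can be logged.
        "text_identical": bool(text_diff.get("identical", False)),
        # Assumption: anything that is not a "Text differs" note is structural
        # (page count, page size) and is logged separately from content.
        "structural_notes": [n for n in notes if not n.startswith("Text differs")],
    }
```

Load the downloaded file with `json.load` and pass the resulting dictionary in; the returned dictionary keeps content change and format change in separate fields, as the table recommends.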
Compliance scope — what the diff does and doesn't establish
Be precise about evidentiary scope so the record isn't over-claimed.
| Compliance need | This tool | Where it actually belongs |
|---|---|---|
| Enumerate text changes between versions | Yes — added/removed lines + unified-diff report | This tool |
| Reproducible/deterministic output | Yes — identical input yields identical report | This tool |
| Prove who changed it and when | No — the diff has no author/timestamp data | Document-control system audit log |
| Tamper-evident version history | No — it compares two files you provide | Controlled document repository with versioning |
| Confirm a controlled doc wasn't altered after approval | No — content diff isn't signature verification | PDF Signature Verify |
Cookbook
Audit scenarios and the exact report shape. The per-page line alignment means each changed page is captured as a removed/added pair you can enumerate in the change record.
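To enumerate those removed/added pairs programmatically, something like the following could work. It is a sketch under stated assumptions: the report is assumed to use the line prefixes described in the FAQ ("- " removed, "+ " added, two spaces unchanged), and the helper name `pair_changes` is hypothetical.

```python
from itertools import zip_longest


def pair_changes(report: str) -> list[tuple[str, str]]:
    """Pair removed and added lines of a unified-diff style report.

    Within one run of changes, removed lines are matched positionally with
    the added lines that follow them; an unmatched side is left as "".
    """
    pairs: list[tuple[str, str]] = []
    removed: list[str] = []
    added: list[str] = []

    def flush() -> None:
        # Emit the current run as (old, new) pairs, then reset the buffers.
        pairs.extend(zip_longest(removed, added, fillvalue=""))
        removed.clear()
        added.clear()

    for line in report.splitlines():
        if line.startswith("- "):
            if added:  # a removal after additions starts a new run
                flush()
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        else:  # unchanged line: closes the current run
            flush()
    flush()
    return pairs
```

Each tuple is one entry for the change record: the baseline page text first, the revised page text second. A page that exists only in the revision appears with an empty first element.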
Controlled procedure: two steps revised
An SOP revision changed wording on pages 2 and 5. Each changed page is captured as a removed/added pair — a clean enumerated change list for the record.
textDiff.report:
- Page 2 … operator shall verify the seal manually …
+ Page 2 … operator shall verify the seal using gauge G-12 …
Page 3 text … (unchanged)
Page 4 text … (unchanged)
- Page 5 … record results within 24 hours …
+ Page 5 … record results within 4 hours …
addedCount: 2 removedCount: 2
Non-substantive re-issue (cover date only)
Only the cover-page issue date changed; the procedure text is otherwise identical. The diff isolates the single changed page.
Result:
differences: ["Text differs: 1 line(s) added, 1 line(s) removed"]
textDiff.report:
- Page 1 … Issue date: 2025-11-01 …
+ Page 1 … Issue date: 2026-06-01 …
Record: minor/administrative change (date only).
Section added under change control
A new 'Records Retention' section was added, increasing the page count. The new page's text is captured as an addition.
Result:
pageCountA: 22
pageCountB: 23
differences: ["Page count differs: 22 vs 23", "Text differs: 1 line(s) added, 0 line(s) removed"]
textDiff.added: ["Records Retention … retain for 7 years …"]
textDiff.removed: []
Approved version vs current — no change
Comparing the filed approved copy to the live copy confirms they're identical — evidence the controlled document in use matches what was approved.
Result:
differences: []
textDiff.identical: true
textDiff.unchanged: 22
textDiff.report: "No text differences — the two documents have identical text content."
Legacy version is a scan
The archived baseline is an old scan with no text layer, so the text diff can't run until it's OCR'd.
Result:
textDiff.extracted: false
textDiff.report:
"Text layer could not be extracted from one or both PDFs
(e.g. a scanned/image-only document). Run OCR first to
enable the text diff."
Fix: OCR the archived scan at /pdf-tools/pdf-ocr, then re-run the audit diff.
Edge cases and what actually happens
Diff is not a tamper-evident audit system
Out of scope: The tool enumerates differences between two files you supply; it does not record who changed what or when, and it can't prove a file wasn't substituted. Use it to generate the change list, but rely on your document-control system for tamper-evident versioning, approvals, and the audit log.
It does not verify signatures
Out of scope: A content diff is not signature validation. If the audit needs proof that an approved, signed controlled document wasn't altered after approval, run PDF Signature Verify, which checks the ByteRange digest, the PKCS#7/CMS signature, and whether the signature covers the whole file.
An archived baseline is a scan
Structural only: Older controlled documents are often archived as scans with no text layer, so textDiff.extracted is false and only the structural diff runs. OCR the scan with PDF OCR to enable a text diff against the revised version. Note that OCR may introduce recognition errors that show up as spurious differences, so review them.
Page reflow inflates the apparent change
Heads up: Because each page is compared as one joined line, an edit that reflows text across page boundaries can make many pages differ even when the substantive change is small. Use page count and the structural notes to gauge true scope, then read pages in order so the change record reflects substance, not layout shift.
Template/format change between versions
Reported: A page-size change (e.g. A4→Letter) is flagged in differences even if the text is identical. For a compliance record, log a format/template change separately from a content change; the structural notes let you distinguish them.
Whitespace/line-ending differences
Normalised: Trailing whitespace is trimmed per line and CRLF/CR is normalised to LF before diffing, so cosmetic differences don't produce false change entries that would clutter the audit record. Genuine leading-indentation changes are preserved.
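The normalisation described here is easy to reproduce if you want to pre-check text outside the tool. The following is a minimal sketch of the same idea, not the tool's own code; the function name `normalise_lines` is hypothetical.

```python
def normalise_lines(text: str) -> list[str]:
    """Normalise line endings to LF and trim trailing whitespace per line.

    Leading whitespace is left untouched, so indentation changes still
    register as differences.
    """
    # CRLF first, then any remaining bare CR, so "\r\n" is not split twice.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return [line.rstrip() for line in unified.split("\n")]
```

For example, `normalise_lines("a  \r\n  b")` and `normalise_lines("a\n  b")` give the same list, while `"  b"` and `"b"` remain distinct.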
Files added baseline-last
By design: File A supplies the removed side and file B the added side. Add the approved baseline first and the revision second so the record reads 'removed from baseline / added in revision.' If it reads inverted, remove a file with the X and re-queue in order; there's no reorder control.
Controlled document over the size limit
Limit: Free tier caps each PDF at 2 MB; Pro raises it to 50 MB per file. Large validated documents with embedded diagrams may exceed the free cap, in which case compress losslessly with PDF Compress (lossless) or upgrade. Avoid lossy compression on a controlled document you'll diff, as it can alter extracted text.
Encrypted controlled document
May fail: An encrypted controlled PDF may block pdfjs text extraction, leaving extracted: false. Decrypt a working copy with PDF Unlock for the comparison, but keep the original intact in the controlled repository and note that you diffed a decrypted copy.
You need a regulator-ready, signed change report
Pair tools: The JSON/report output is the enumerated change list, not a signed, formatted change-control report. Compile it into your QMS change-control form, and if the deliverable itself must be authenticated, add a digital signature and verify it with PDF Signature Verify.
Frequently asked questions
Can the diff report be included in an ISO 9001 / document-control record?
Yes — the report enumerates the specific text and structural changes between two versions, which is exactly the change evidence document-control processes ask for. It's deterministic, so re-running on the same files reproduces it. File the JSON alongside both source versions; the controlled-document system supplies the versioning, approval, and audit-log layers the diff itself doesn't provide.
What format is the diff report output?
A downloadable .json file containing the full comparison object: pageCountA/pageCountB, a differences array of structural notes, and a textDiff block with added, removed, unchanged, addedCount, removedCount, identical, and a unified-diff report string (- removed, + added, two spaces unchanged). There is no annotated-PDF output — the record is the JSON and the diff text.
How are minor formatting changes handled in the audit?
Text-content changes are captured; formatting-only changes (bold, font, size, colour) are not, because the tool compares extracted text. Page-size/template changes are reported structurally and should be logged separately from content changes. This separation is usually helpful for compliance, where format control and content control are distinct.
Does the report prove who made the change and when?
No. The diff shows what changed between the two files you provide; it carries no author or timestamp information. Attribution and timing belong to your document-control system's audit log. Use the diff for the change list and the controlled-document repository for the who/when/approval trail.
Is it deterministic enough to rely on for an audit?
Yes. The comparison uses a fixed LCS algorithm with no tunable options, so the same two PDFs always produce the same added/removed lists and the same report. That reproducibility is precisely what makes it usable as filed evidence — an auditor can re-run it on the same files and get an identical result.
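To see why an LCS diff with no options is reproducible, here is a textbook line-level LCS diff in Python. It is a generic sketch of the technique, not the tool's implementation; the function name `lcs_diff` and the tie-breaking rule (emit the removal before the addition) are assumptions made for this example.

```python
def lcs_diff(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Return (marker, line) pairs: " " unchanged, "-" removed, "+" added."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Fixed tie-break: prefer the removal, so output order is stable.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure removal or pure addition.
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops
```

There is no randomness and no tunable parameter: the table and the walk through it depend only on the two inputs, so repeated runs on the same line lists return the same sequence of operations.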
How precisely does it locate a change?
To the page. pdfjs joins each page's text into one string, so the line-level diff aligns one line per page — a changed clause shows as that page's whole text removed and re-added. The record pinpoints which pages changed and shows the old and new text for each. For a word-level highlight you'd read the removed/added pair.
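If reading a long removed/added pair by eye is impractical, the changed words can be narrowed down after the fact. The sketch below uses Python's standard difflib on the two page strings; it is a post-processing aid you would run yourself, not a feature of the tool, and the function name `word_changes` is hypothetical.

```python
import difflib


def word_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """List word-level edits between a removed line and its added line.

    Each entry is (tag, old_words, new_words), where tag is one of
    "replace", "delete", or "insert" as reported by difflib.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes
```

Applied to the SOP example, `word_changes("record results within 24 hours", "record results within 4 hours")` isolates the single edit from "24" to "4". Note that difflib uses its own matching heuristic, so treat the output as a reading aid and keep the tool's report as the filed record.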
Can I audit a legacy version that only exists as a scan?
Run PDF OCR on the scan first to add a text layer, then compare. Without OCR, the scan has no extractable text and textDiff.extracted returns false (structural diff still runs). Note OCR can introduce recognition errors that surface as spurious differences, so review the diff and annotate any OCR artefacts in the record.
Are controlled documents kept within our environment?
Yes. Both files are read and diffed entirely in your browser — pdf-lib for structure, pdfjs for text, an in-process LCS for the diff — and the result panel confirms 0 bytes uploaded. Regulated documents never reach a server, which avoids the data-residency and confidentiality issues of a cloud diff service. Only an anonymous usage counter is recorded when signed in.
Does this also confirm an approved document wasn't tampered with?
Not on its own. A content diff confirms two files have the same (or different) text, but it doesn't verify a digital signature. If the audit needs assurance that a signed, approved document is byte-for-byte intact, run PDF Signature Verify, which checks the signature and whether it covers the whole file. Use both for a complete picture.
What's the file-size limit for an audit comparison?
2 MB per file on the free tier, 50 MB per file on Pro. Validated documents with embedded diagrams can be large; compress losslessly with PDF Compress (lossless) if needed. Avoid lossy compression on a document you'll diff for audit, since it can alter the extracted text and create misleading differences.
How do I distinguish a substantive change from a re-issue?
Check textDiff.identical and the counts. If identical is true (or only an issue-date/cover line differs), log it as a non-substantive/administrative re-issue. If body pages show removed/added pairs, those are substantive changes — enumerate them from the report. The structural notes separately flag any page-count or format change for the impact assessment.
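A first-pass triage along these lines could be scripted as below. This is a sketch under assumptions: the function name `triage_revision` and the returned labels are hypothetical, and the check on textDiff.extracted assumes the field is false when no text layer was found (as in the scan example). Deciding whether a differing line is only an issue date still needs a human reviewer.

```python
from typing import Any


def triage_revision(result: dict[str, Any]) -> str:
    """Give a coarse first-pass label for a comparison result."""
    text_diff = result.get("textDiff", {})
    if text_diff.get("extracted") is False:
        # No text layer: the text diff did not run, so nothing can be concluded.
        return "text-not-extracted: run OCR, then re-compare"
    if text_diff.get("identical"):
        return "non-substantive: no text change"
    added = text_diff.get("addedCount", 0)
    removed = text_diff.get("removedCount", 0)
    # Text differs: a reviewer decides between administrative and substantive.
    return f"review-required: {added} added, {removed} removed"
```

The label is meant as a starting point for the reviewer, with the structural notes in differences checked separately for page-count or format changes.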
Can I run the audit comparison automatically?
Yes. pdf-diff is a named tool slug, so you can pair the JAD runner and POST both versions to the local runner endpoint to get the JSON change record back without anything leaving your environment. This supports a controlled-document workflow where every revision is automatically diffed against its approved baseline and the report is filed.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.