How to convert a PDF contract to editable Word text
- Step 1: Decrypt the contract if it's password-protected. Many executed contracts are encrypted. If so, decrypt first with Remove PDF Password using the password you hold, because pdf.js can't read an encrypted file.
- Step 2: Open the converter and drop the PDF. Load the PDF to Word tool and add the contract. Extraction runs locally and automatically; nothing is uploaded.
- Step 3: Verify the clause text in the preview. Skim the 5,000-character preview to confirm the text layer is real (a scanned, signed copy will be blank; OCR it first). Spot-check that defined terms and clause numbers are present.
- Step 4: Download the .txt and open in Word. Download the file (e.g. msa.pdf → msa.txt) and open or paste it into Word. It arrives as a single text body.
- Step 5: Rebuild structure, then enable Track Changes. Apply heading styles, restore the numbered-clause scheme with Word's multilevel list, and fix any wrapped lines. Then turn on Track Changes (Review → Track Changes) before you edit.
- Step 6: Always diff against the original before sending. Compare your reconstructed working copy against the source PDF (Word's Compare, or read side-by-side). Never treat the extraction as a legally exact reproduction: it's a drafting aid, not the executed instrument.
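For a quick machine check alongside Word's Compare, the comparison in the last step can be sketched in Python. This is a minimal sketch and not part of the converter: it assumes you already hold two plain-text strings, one extracted from the source PDF and one exported from your working copy, and the function names are illustrative. It catches drift introduced while editing; it cannot catch errors that were already present in the extraction, so a read against the PDF itself is still needed.

```python
import difflib


def normalise(text: str) -> list[str]:
    """Collapse runs of whitespace and drop blank lines so layout noise is ignored."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def diff_against_original(original: str, working: str) -> list[str]:
    """Return unified-diff lines between the original text and the working copy."""
    return list(
        difflib.unified_diff(
            normalise(original),
            normalise(working),
            fromfile="original",
            tofile="working",
            lineterm="",
        )
    )


# Illustrative clause text, not taken from a real contract.
original = (
    "8.3 Neither party shall be liable for indirect loss.\n"
    "8.4 Liability is capped at the Fees."
)
working = (
    "8.3  Neither party shall be liable for indirect loss.\n"
    "8.4 Liability is capped at twice the Fees."
)

for line in diff_against_original(original, working):
    print(line)
```

Whitespace is normalised first so that the double space in clause 8.3 is not reported, while the substantive change in clause 8.4 is.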
Contract elements: what survives extraction
Text-layer extraction only. Anything that isn't selectable text must be rebuilt in Word.
| Contract element | Extracted? | What to do in Word |
|---|---|---|
| Clause body text & defined terms | Yes (text) | Comes across accurately for digital contracts. Verify defined terms against the original. |
| Clause numbering (8.3, (a), (i)) | Text characters only | The literal numbers are extracted, but auto-numbering/indent logic is not — rebuild with a Word multilevel list. |
| Headings (ARTICLE V, Section 3) | Text only, not styles | Re-apply Heading styles so the document outlines correctly. |
| Tables (fee schedules, SLAs) | Cell text, not grids | Use PDF to Excel for the table, then paste a real table into Word. |
| Signature blocks & initials | Text if typed; not images | Wet/ink signatures are images and won't extract — recreate the signature block as text. |
| Scanned / signed-and-scanned copies | No (empty) | Run PDF OCR first to create a text layer, then convert. |
Limits & the privacy guarantee
Why this is safe for confidential agreements, and how big a contract it handles.
| Property | Value |
|---|---|
| Where parsing happens | Your browser (pdf.js) — local only |
| Bytes uploaded | 0 |
| Free tier | 2 MB / 50 pages |
| Pro tier | 50 MB / 500 pages |
| Pro Media tier | 500 MB / 2,000 pages |
| Output | UTF-8 .txt (not .docx) |
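The tier limits above can be expressed as a small lookup. The following Python sketch is illustrative only: it assumes both the size and the page limit must be satisfied, that the limits are inclusive, and that "MB" means 2^20 bytes, none of which the table states. The names `Tier` and `smallest_tier` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: the table does not say whether MB is 10^6 or 2^20 bytes


@dataclass(frozen=True)
class Tier:
    name: str
    max_bytes: int
    max_pages: int


# Limits copied from the table, smallest tier first.
TIERS = [
    Tier("Free", 2 * MB, 50),
    Tier("Pro", 50 * MB, 500),
    Tier("Pro Media", 500 * MB, 2000),
]


def smallest_tier(size_bytes: int, pages: int) -> Optional[str]:
    """Return the first tier whose size and page limits both cover the file."""
    for tier in TIERS:
        if size_bytes <= tier.max_bytes and pages <= tier.max_pages:
            return tier.name
    return None  # too large for every tier: split the document first


print(smallest_tier(1 * MB, 28))    # a 28-page, 1 MB contract
print(smallest_tier(3 * MB, 120))   # a longer agreement with schedules
```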
Cookbook
Redline-prep recipes for legal users. Output blocks show roughly what the .txt holds; PII and party names are illustrative.
Extract a master services agreement for redlining
A digital-native MSA: full clause text comes across, ready to restructure and mark up in Word.
Input: acme-msa.pdf (28 pages, exported from Word by counsel)
Workflow:
  1. /pdf-tools/pdf-to-word -> auto-extract -> acme-msa.txt
  2. Open in Word, apply multilevel list for clause numbers
  3. Review -> Track Changes ON -> redline clause 8.3
Output (abbreviated):
  8. LIMITATION OF LIABILITY
  8.1 Except as set out in clause 8.2 ...
  8.3 Neither party shall be liable for ...
Signed-and-scanned contract needs OCR first
An executed copy that was printed, signed, and scanned has no text layer. OCR it, then extract — but treat the OCR text as a draft to verify.
Input: signed-nda.pdf (scan of a wet-signed NDA)
Preview: (empty)
Fix:
  1. /pdf-tools/pdf-ocr (English) -> searchable signed-nda PDF
  2. /pdf-tools/pdf-to-word -> signed-nda.txt
  3. Proofread OCR output against the scan before relying on any clause
Restore clause numbering in Word
Extraction gives you the literal numbers as text. Convert them to a real multilevel list so insertions renumber automatically during the redline.
Output (.txt), numbers are plain text:
  1. Term
  1.1 This Agreement commences on the Effective Date.
  1.2 The Initial Term is twelve (12) months.
  2. Fees
In Word:
  Select the clauses -> Home -> Multilevel List -> a legal scheme
  Now adding a new 1.2 auto-renumbers the rest.
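Before rebuilding the list in Word, it can help to see which lines of the .txt carry a clause number and at what depth. The Python sketch below is a heuristic under stated assumptions: it only recognises decimal schemes such as "1." and "1.1", it ignores "(a)" and "(i)" levels, and a line that happens to start with a figure like "12.50" would be misread as a clause. The pattern and function name are illustrative.

```python
import re
from typing import Optional

# A top-level number must be followed by a dot ("1. Term");
# deeper levels are dotted numbers ("1.1", "8.3.2"), with or without a trailing dot.
CLAUSE = re.compile(r"^\s*(\d+(?:\.\d+)+|\d+(?=\.\s))\.?\s+(\S.*)$")


def clause_level(line: str) -> Optional[tuple[str, int]]:
    """Return (clause number, nesting level) or None if the line is body text."""
    match = CLAUSE.match(line)
    if not match:
        return None
    number = match.group(1)
    return number, number.count(".") + 1


sample = [
    "1. Term",
    "1.1 This Agreement commences on the Effective Date.",
    "1.2 The Initial Term is twelve (12) months.",
    "2. Fees",
]

for line in sample:
    hit = clause_level(line)
    if hit:
        number, level = hit
        print("  " * (level - 1) + f"level {level}: {number}")
```

The levels map onto the levels of a Word multilevel list; the numbering itself still has to be applied in Word.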
Pull a fee schedule out as a real table
Don't reconstruct a fee table from spaced text. Send the table page to PDF to Excel, then paste a clean grid into the contract.
Goal: editable fee schedule in the redline
  1. /pdf-tools/pdf-to-excel on the schedule page -> CSV of rows/cols
  2. Open CSV in Excel -> copy the range
  3. Paste into Word as a table (Keep Source Formatting)
  4. Body clauses come from the pdf-to-word .txt
Decrypt before you can extract
An encrypted contract blocks extraction. Remove the password you legitimately hold, then convert.
Input: confidential-spa.pdf (password to open)
Direct convert -> error: encrypted, cannot parse text
Fix:
  1. /pdf-tools/pdf-remove-password (your password) -> decrypted PDF
  2. /pdf-tools/pdf-to-word -> confidential-spa.txt
Edge cases and what actually happens
Treating the extraction as a legally exact copy
Do not rely on it: The .txt is a drafting aid, not a certified reproduction. Reading order, spacing, and numbering can differ from the executed PDF. Always compare your working copy against the original (Word Compare or side-by-side) before exchanging redlines or relying on a clause.
Signed-and-scanned contract
No text layer: An executed copy that was scanned is an image, so extraction returns nothing. Run PDF OCR to create a searchable layer, then convert, and proofread the OCR carefully because a single misread digit in a date or amount matters in legal text.
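Because a single misread digit matters, one way to focus the proofread is to list every token in the OCR text that contains a digit and check each one against the scan. This is a minimal Python sketch, assuming the OCR output is available as a string; the function name and the set of punctuation it trims are illustrative choices, and it does not replace reading the full text.

```python
import re

# Any whitespace-delimited token that contains at least one digit.
NUMERIC_TOKEN = re.compile(r"\S*\d\S*")


def digits_to_check(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs for every token containing a digit."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in NUMERIC_TOKEN.findall(line):
            # Trim surrounding punctuation so "2024." is reported as "2024".
            found.append((line_no, token.strip("().,;:")))
    return found


# Illustrative OCR output, not taken from a real contract.
ocr_text = (
    "3.1 The Fees of $12,500 are due on 31 March 2024.\n"
    "Either party may terminate on written notice.\n"
    "The Initial Term is twelve (12) months."
)

for line_no, token in digits_to_check(ocr_text):
    print(f"line {line_no}: {token}")
```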
Encrypted / password-protected contract
Blocked until decrypted: pdf.js can't read encrypted PDFs. Use Remove PDF Password with your password first. Only decrypt documents you're authorised to handle.
Clause numbering loses its auto-logic
Rebuild in Word: Numbers like 8.3 or (a)(i) extract as literal text, not as a numbered list. Inserting a clause won't renumber the rest until you convert them to a Word multilevel list. Rebuild the scheme before heavy editing.
Defined-term capitalisation or spacing drift
Verify against original: Justified text and kerning can introduce extra spaces around defined terms ("the Company"). Normalise with Find & Replace and re-check each defined term against the source so the redline stays precise.
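The same normalisation can be sketched outside Word. The Python example below is a rough equivalent of a Find & Replace pass, under the assumption that you supply the list of defined terms yourself; it is case-sensitive on purpose, it only repairs spacing between the words of a term, and the function name is illustrative. Each term still needs checking against the source.

```python
import re


def normalise_term(text: str, term: str) -> tuple[str, int]:
    """Rewrite spacing variants of `term` to the canonical form.

    Returns the new text and how many occurrences were changed.
    """
    words = [re.escape(word) for word in term.split()]
    # Allow any run of spaces, tabs, or non-breaking spaces between the words.
    pattern = re.compile(r"[ \t\u00a0]+".join(words))
    changed = 0

    def fix(match: re.Match) -> str:
        nonlocal changed
        if match.group(0) != term:
            changed += 1
        return term

    return pattern.sub(fix, text), changed


# Illustrative extracted text with spacing drift.
text = "On the Effective   Date, the  Company shall pay the Fees."
for term in ["Effective Date", "the Company"]:
    text, count = normalise_term(text, term)
    print(f"{term!r}: {count} occurrence(s) fixed")
print(text)
```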
Tables and schedules come out as runs of text
Use PDF to Excel: Fee schedules and SLA matrices extract as spaced text, not grids. Convert the relevant pages with PDF to Excel and paste a real table into Word.
Signature blocks and initials
Partly extracted: Typed signature-block text extracts; ink signatures and initials are images and don't. Recreate the signature block as text in the working copy if you need it editable.
Contract exceeds the tier limit
Rejected: Long master agreements with schedules can exceed 2 MB / 50 pages on Free. Upgrade to Pro (50 MB / 500 pages) or split with Extract Pages and convert the operative clauses separately.
Frequently asked questions
Will this give me an exact editable copy of the contract?
It gives you accurate clause text, not a legally exact .docx clone. Numbering schemes, indentation, table grids, and signature images are not reproduced — you rebuild those in Word. Always diff your working copy against the executed PDF before relying on it.
Is the contract uploaded to a server?
No. pdf.js parses it in your browser; the UI shows "0 bytes uploaded." This is the key reason it's appropriate for privileged or NDA-bound agreements — the terms never leave your machine.
Do I get a .docx I can open in Word?
You get a UTF-8 .txt. Open it in Word (File → Open) or paste it into your working draft, then save as .docx from Word. There is no auto-generated Word document with reconstructed clauses and tables.
How do I redline the result with Track Changes?
Open the .txt in Word, rebuild headings and clause numbering, then go to Review → Track Changes and switch it on before editing. Your insertions, deletions, and comments are then captured for the counterparty.
Will clause numbering be preserved?
The literal numbers (8.3, (a)) come across as text, but not as an auto-numbered list. Convert them to a Word multilevel list so the scheme renumbers correctly when you add or remove clauses.
My signed PDF returned no text — why?
It's almost certainly a scan of a printed, signed contract — an image with no text layer. Run PDF OCR first, then convert, and proofread the recognised text against the scan because legal text is unforgiving of OCR errors.
Can I convert a password-protected contract?
Not directly — encrypted PDFs can't be parsed. Decrypt first with Remove PDF Password (using a password you're authorised to use), then convert.
How do I keep a fee schedule as a real table?
Don't reconstruct it from spaced text. Run the schedule page through PDF to Excel to get CSV, open it in Excel, and paste a proper table into Word.
Are there options to configure for legal documents?
No. The converter has no settings — it extracts the text layer and produces a .txt. All legal-specific structuring (numbering, headings, definitions) is done by you in Word afterwards.
How long a contract can I convert?
Free: 2 MB / 50 pages. Pro: 50 MB / 500 pages. Pro Media: 500 MB / 2,000 pages. For longer master agreements, split with Extract Pages.
Is the extracted text safe to send back to the counterparty?
Send your reviewed Word redline, not the raw extraction. The .txt may have spacing/numbering artefacts and possibly OCR errors. Reconstruct, verify against the original, and only then exchange the .docx.
What if columns or recitals read out of order?
Multi-column or boxed layouts can interleave because extraction follows pdf.js item order. Check the preview and re-order in Word; for unusual layouts, extract the affected pages individually.
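To see why interleaving happens, consider text fragments that each carry a position on the page. The Python sketch below is purely illustrative and is not what the converter does: it models fragments as hypothetical (x, y, text) tuples in PDF coordinates (y grows upward), assumes a clean two-column page with a known split position, and shows how emission order can differ from column-by-column reading order.

```python
from typing import NamedTuple


class Item(NamedTuple):
    x: float  # left edge of the fragment
    y: float  # baseline; larger values are higher on the page
    text: str


def column_order(items: list[Item], split_x: float) -> list[Item]:
    """Sort fragments left column first, then top to bottom within each column."""
    return sorted(items, key=lambda it: (it.x >= split_x, -it.y, it.x))


# Hypothetical emission order: rows alternate between the two columns.
emitted = [
    Item(50, 700, "WHEREAS the Supplier"),
    Item(320, 700, "1. Definitions"),
    Item(50, 680, "provides services;"),
    Item(320, 680, "1.1 In this Agreement"),
]

print("as emitted:  ", " | ".join(it.text for it in emitted))
print("by column:   ", " | ".join(it.text for it in column_order(emitted, split_x=300)))
```

Real pages rarely split this cleanly, which is why checking the preview and re-ordering by hand remains the safer route.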
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.