How to convert a PDF to editable Word text
- Step 1: Open the PDF to Word converter. Load the PDF to Word tool. It runs entirely in the browser: pdf.js parses the file on your own machine and nothing is uploaded.
- Step 2: Drop in your PDF. Drag the file onto the dropzone (or click to browse). It accepts a single PDF. Extraction starts automatically; there are no settings to choose first.
- Step 3: Read the on-screen preview. The extracted text appears in a scrollable panel, truncated to the first 5,000 characters. Skim it to confirm the text layer came through. A blank or garbled preview signals a scanned PDF (see the cookbook).
- Step 4: Download the .txt file. Click Download. You get a UTF-8 text file named after your PDF (for example, contract.pdf → contract.txt) containing the full text, not just the preview.
- Step 5: Open or paste into Word. Open the .txt in Word, Google Docs, or LibreOffice, or paste its contents into an existing document. Word imports plain text as unstyled body text. You then apply Heading 1/2, lists, and tables using Word's own styles.
- Step 6: Apply structure and clean line breaks. Use Find & Replace in Word to tidy the predictable artefacts: collapse runs of spaces, re-join lines that were hard-wrapped in the PDF, and promote section titles to heading styles. Five minutes of styling beats an hour of untangling a fake .docx.
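Steps 3 and 4 follow two simple rules. The preview is capped at 5,000 characters while the download carries everything, and the file name swaps .pdf for .txt. The TypeScript below is a minimal sketch of those rules, not the tool's source. The function names are assumptions, and so is the choice to append .txt to names that don't end in .pdf.

```typescript
// Step 3: the preview panel shows at most this many characters.
const PREVIEW_LIMIT = 5000;

// Step 4: "contract.pdf" -> "contract.txt".
// Assumption: a trailing ".pdf" in any letter case is replaced;
// any other name simply gains ".txt".
function txtFileName(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".txt";
}

// The preview is a prefix of the full text; the download always gets all of it.
// slice() counts UTF-16 code units, so the cap is approximate for characters
// outside the Basic Multilingual Plane (for example, some emoji).
function previewText(fullText: string): { preview: string; truncated: boolean } {
  return {
    preview: fullText.slice(0, PREVIEW_LIMIT),
    truncated: fullText.length > PREVIEW_LIMIT,
  };
}

console.log(txtFileName("contract.pdf")); // contract.txt
console.log(previewText("Project Proposal").truncated); // false
```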
What actually carries across (and what doesn't)
The converter extracts the PDF text layer only. Everything that isn't selectable text is, by definition, not in that layer.
| PDF element | In the output? | Why / what to do instead |
|---|---|---|
| Body text & headings (text) | Yes — text only | All selectable characters are extracted in reading order. The words come across; the heading style does not — re-apply Heading 1/2 in Word. |
| Paragraph breaks | Partly | Pages are separated by a blank line (\n\n). Within a page, text items are joined with single spaces, so original line/paragraph wrapping is approximate — re-flow in Word. |
| Tables | Text only, not grids | Cell text is extracted but not as a Word table. For real tabular structure use PDF to Excel (CSV) or PDF table to JSON. |
| Images, logos, charts | No | Images are not text and are skipped. Re-insert them in Word, or pull page renders with PDF to JPG. |
| Fonts, colours, sizes | No | Styling is dropped — you get raw text. Apply your template's fonts and styles in Word after pasting. |
| Scanned / image-only pages | No (empty) | There is no text layer to read. Run PDF OCR first to add one, then convert. |
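The paragraph-break behaviour in the table (items joined with single spaces, pages separated by a blank line) maps onto pdf.js's text API. Each page's getTextContent() returns a list of items. Text items carry their characters in a str field, and marked-content entries carry none. The sketch below assumes that shape through small hypothetical interfaces (PdfLike, PageLike) that mirror only the members it uses, so it runs without pdf.js installed. It illustrates the described behaviour and does not reproduce the converter's code.

```typescript
// Minimal structural stand-ins for the pdf.js objects this sketch touches.
// Text items carry their characters in `str`; marked-content entries
// returned by getTextContent() have no `str`, so it is optional here.
interface TextContentItem {
  str?: string;
  [key: string]: unknown;
}

interface PageLike {
  getTextContent(): Promise<{ items: TextContentItem[] }>;
}

interface PdfLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Join one page's text items with single spaces, skipping entries without text.
function joinPageItems(items: TextContentItem[]): string {
  const parts: string[] = [];
  for (const item of items) {
    if (typeof item.str === "string") parts.push(item.str);
  }
  return parts.join(" ");
}

// Walk the pages in order (pdf.js page numbers start at 1) and separate
// pages with a blank line, i.e. "\n\n".
async function extractText(doc: PdfLike): Promise<string> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(joinPageItems(content.items));
  }
  return pages.join("\n\n");
}
```

The document object resolved from pdf.js's getDocument(...).promise exposes numPages and getPage() at runtime. It should be usable in place of PdfLike, although some pdf.js typings may need a cast.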
File-size & page limits by tier
PDF-family limits enforced before the converter runs (lib/tier-limits.ts).
| Tier | Max file size | Max pages | Batch files |
|---|---|---|---|
| Free | 2 MB | 50 pages | 1 |
| Pro | 50 MB | 500 pages | 5 |
| Pro Media | 500 MB | 2,000 pages | 50 |
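A pre-flight check like the one the table describes could look like the code below. This is a hypothetical sketch and does not reproduce lib/tier-limits.ts. It only restates the table's numbers, and it makes two assumptions: that "MB" means 1,024 × 1,024 bytes, and that the batch limit counts files per job.

```typescript
type Tier = "free" | "pro" | "proMedia";

interface TierLimit {
  maxBytes: number;
  maxPages: number;
  maxBatchFiles: number;
}

// Assumption: "MB" in the table means 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Values copied from the tier table.
const TIER_LIMITS: Record<Tier, TierLimit> = {
  free: { maxBytes: 2 * MB, maxPages: 50, maxBatchFiles: 1 },
  pro: { maxBytes: 50 * MB, maxPages: 500, maxBatchFiles: 5 },
  proMedia: { maxBytes: 500 * MB, maxPages: 2000, maxBatchFiles: 50 },
};

// Returns a human-readable rejection reason, or null if the job may run.
function checkTierLimits(
  tier: Tier,
  fileBytes: number,
  pageCount: number,
  fileCount = 1,
): string | null {
  const limit = TIER_LIMITS[tier];
  if (fileCount > limit.maxBatchFiles) {
    return `${fileCount} files selected; ${tier} allows ${limit.maxBatchFiles} per batch.`;
  }
  if (fileBytes > limit.maxBytes) {
    return `File is ${fileBytes} bytes; ${tier} allows ${limit.maxBytes}.`;
  }
  if (pageCount > limit.maxPages) {
    return `File has ${pageCount} pages; ${tier} allows ${limit.maxPages}.`;
  }
  return null;
}
```

For example, a 3 MB, 10-page file gets a size message on Free and passes on Pro.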
Cookbook
Concrete before/after for the everyday "I just need to edit this PDF in Word" job. The Output blocks show roughly what lands in the .txt file.
A two-page Word-exported PDF, round-tripped back to editable text
The cleanest case: the PDF was exported from Word, so it has a perfect text layer. Extraction is near-lossless at the character level — you lose styling, not words.
Input: proposal.pdf (2 pages, exported from Word, 180 KB)
Workflow:
1. Drop proposal.pdf onto /pdf-tools/pdf-to-word
2. Preview shows clean text -> click Download
3. proposal.txt opens in Word
Output (proposal.txt, abbreviated):
Project Proposal Prepared for Acme Ltd
(page 1 body text in reading order...)
(blank line marks the page break)
(page 2 body text...)
Spotting a scanned PDF before you waste time in Word
If the preview is blank or shows only stray characters, the PDF has no text layer. Catch it here, OCR it, then convert — don't paste an empty file into Word and wonder why.
Input: scanned-letter.pdf (1 page, photo of a printed letter)
Preview panel: (empty) <- no extractable text layer
Fix:
1. Run /pdf-tools/pdf-ocr (language: English) -> searchable PDF
2. Drop the OCR'd PDF onto /pdf-tools/pdf-to-word
3. Now the preview shows the recognised text -> Download
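The "blank or stray characters" test can be made mechanical. The sketch below is a hypothetical heuristic and is not how the converter decides anything. It flags pages whose extracted text has fewer than a threshold of non-whitespace characters. The default threshold of 20 is an arbitrary assumption.

```typescript
// Hypothetical heuristic: flag pages whose extracted text has fewer than
// `minChars` non-whitespace characters. A scan with no text layer yields ""
// for every page; stray marks yield only a handful of characters.
function pagesWithoutText(pageTexts: string[], minChars = 20): number[] {
  const flagged: number[] = [];
  pageTexts.forEach((text, index) => {
    const visible = text.replace(/\s/g, "").length;
    if (visible < minChars) {
      flagged.push(index + 1); // report 1-based page numbers, as a viewer does
    }
  });
  return flagged;
}
```

The download separates pages with a blank line, so splitting the .txt on "\n\n" gives per-page strings to feed in. This assumes no page's own text contains a blank line. An empty page still shows up as an empty string between its neighbours.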
Re-joining lines that were hard-wrapped in the PDF
PDFs often hard-wrap each visual line. The extractor preserves the words but the breaks come through as the source had them. One Find & Replace in Word reflows the prose.
Output (.txt), wrapped as in the PDF:
The quarterly results exceeded
expectations across every region
except EMEA.
In Word, Find & Replace with Use wildcards ticked (a paragraph mark is written ^13 in a wildcard search; in a regex editor use \n instead):
Find: ([a-z,])^13([a-z])
Replace: \1 \2
Result:
The quarterly results exceeded expectations across every region except EMEA.
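The same reflow can be done outside Word. This TypeScript sketch applies the cookbook's rule with a JavaScript regular expression, under a function name chosen for illustration. It uses a lookahead so that several wrapped lines in a row are all joined in one pass. Like the Word pattern, it only recognises ASCII lowercase letters.

```typescript
// Re-join hard-wrapped lines: a newline between a lowercase letter (or a
// comma) and a following lowercase letter becomes a single space.
// The lookahead (?=[a-z]) keeps the next line's first letter out of the
// match, so consecutive wrapped lines are all joined in one pass.
// Page breaks ("\n\n") are left alone because the character after the
// first newline is another newline, not a letter.
function rejoinWrappedLines(text: string): string {
  return text.replace(/([a-z,])\r?\n(?=[a-z])/g, "$1 ");
}
```

A known limitation, shared with the Word recipe, is that a title ending in a lowercase letter gets glued to the following line. Check the headings afterwards.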
Convert only the pages you'll actually edit
Editing one section of a 120-page PDF? Extract that range first so the .txt is short and the free-tier page limit isn't a factor.
Goal: edit pages 12-18 of handbook.pdf in Word
1. /pdf-tools/pdf-extract-pages -> pages "12-18" -> handbook.extract-pages.pdf
2. /pdf-tools/pdf-to-word on that 7-page PDF
3. handbook.extract-pages.txt -> paste the section into Word
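Extract Pages takes a 1-based range such as "12-18". If you script the same split yourself, pdf-lib's copyPages() expects 0-based page indices, so the range has to be translated. The parser below is a hypothetical helper for that translation and is not part of the tool. The syntax it accepts (comma-separated pages and ranges) is an assumption.

```typescript
// Hypothetical helper: convert a 1-based spec such as "12-18" or "1,3,5-7"
// into the 0-based page indices that pdf-lib's copyPages() expects.
function parsePageRange(spec: string, pageCount: number): number[] {
  const indices: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (match === null) {
      throw new Error(`Unrecognised range "${token}"`);
    }
    const first = Number(match[1]);
    const last = match[2] !== undefined ? Number(match[2]) : first;
    if (first < 1 || last > pageCount || first > last) {
      throw new Error(`Range "${token}" is outside pages 1-${pageCount}`);
    }
    for (let page = first; page <= last; page++) {
      indices.push(page - 1); // 1-based page number -> 0-based index
    }
  }
  return indices;
}
```

For the handbook example, parsePageRange("12-18", 120) yields the seven indices 11 through 17.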
Encrypted PDF: decrypt first, then convert
A password-protected PDF can't be parsed for text until it's decrypted. Remove the password (with the password you own), then run the converter.
Input: signed-offer.pdf (opens only with a password)
Direct convert -> error: the file can't be parsed while encrypted
Fix:
1. /pdf-tools/pdf-remove-password (enter your password) -> decrypted PDF
2. /pdf-tools/pdf-to-word -> signed-offer.txt -> edit in Word
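If you script conversions with pdf.js directly, an encrypted file shows up as a rejected loading promise. pdf.js reports it as an error named PasswordException. Its code is 1 when no password was supplied and 2 when the supplied one was wrong. The classifier below is a sketch that inspects only those two fields, and its LoadFailure labels are hypothetical.

```typescript
// Classify an error from a rejected pdf.js loading promise.
// pdf.js names password failures "PasswordException"; code 1 means a
// password is needed, code 2 means the supplied password was incorrect.
type LoadFailure = "needs-password" | "wrong-password" | "other";

function classifyLoadError(err: unknown): LoadFailure {
  if (typeof err !== "object" || err === null) return "other";
  const { name, code } = err as { name?: unknown; code?: unknown };
  if (name !== "PasswordException") return "other";
  return code === 2 ? "wrong-password" : "needs-password";
}
```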
Edge cases and what actually happens
You expected a .docx but got a .txt
By design. This tool extracts the PDF text layer and downloads it as a UTF-8 .txt file (named yourfile.txt). It does not synthesise a Microsoft Word .docx with styles, tables, and images. Open or paste the .txt into Word and apply formatting there. If you specifically need structured tables out of the PDF, use PDF to Excel instead.
Scanned or photographed PDF
No text layer. Image-only PDFs (scans, phone photos, faxed pages) contain pixels, not selectable text, so the preview comes back empty. Run PDF OCR first (it recognises the glyphs and emits a searchable PDF), then convert that to text.
Encrypted / password-protected PDF
Blocked until decrypted. pdf.js cannot read the text of an encrypted PDF. Decrypt it first with Remove PDF Password (using the password you legitimately hold) or Unlock PDF for owner-restricted files, then run the converter.
Multi-column layout (newsletter, academic paper)
Reading order may differ. Text is extracted in the order pdf.js reports items, grouped per page. In two- and three-column layouts the columns can interleave instead of reading down one column and then the next. Skim the preview; if the columns are scrambled, paste into Word and re-order the blocks manually.
Spacing looks off — extra or missing spaces
Expected. Within each page, text fragments are joined with a single space, so kerned or justified text can pick up extra spaces and some glyph runs can lose them. This is cosmetic: fix it with Word's Find & Replace (collapse multiple spaces to one).
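The collapse step is a single pattern. In Word itself, ticking Use wildcards and replacing ( ){2,} with \1 does the job. The TypeScript sketch below is the equivalent for scripted clean-up, and its function name is chosen for illustration. Missing spaces cannot be restored this way and still need a manual pass.

```typescript
// Collapse runs of two or more spaces or tabs into one space.
// Newlines are not in the character class, so line and page breaks survive.
function collapseSpaces(text: string): string {
  return text.replace(/[ \t]{2,}/g, " ");
}
```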
PDF over the tier size or page limit
Rejected. The Free tier caps at 2 MB / 50 pages, and a larger document is blocked before extraction. Either upgrade (Pro = 50 MB / 500 pages) or split it first with Extract Pages and convert each part.
Ligatures and special glyphs
Usually preserved. Whether fi, fl, or ff come back as separate letters or as a single ligature character depends on the font's ToUnicode map embedded in the PDF. Output is UTF-8, so any glyph the PDF maps correctly survives. A few decorative fonts map ligatures to private-use code points that display oddly; search-and-replace fixes those.
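Ligatures sometimes arrive as single characters from Unicode's presentation-forms block (U+FB00 to U+FB06: ff, fi, fl, ffi, ffl, ſt, st). NFKC normalisation expands each of them to plain letters. The sketch below normalises only those characters, so the rest of the text is untouched. Private-use code points have no standard meaning. The optional privateUseMap is therefore a hypothetical input you would fill in after inspecting the specific font.

```typescript
// Unicode's Latin ligature presentation forms, U+FB00..U+FB06.
const LIGATURE_CHARS = /[\uFB00-\uFB06]/g;

// Expand standard ligature characters with NFKC, applied only to those
// characters so that the rest of the text stays exactly as extracted.
// privateUseMap (hypothetical) maps font-specific private-use code points,
// such as "\uE001", to the letters they stand for in one particular PDF.
function expandLigatures(text: string, privateUseMap: Record<string, string> = {}): string {
  let out = text.replace(LIGATURE_CHARS, (ch) => ch.normalize("NFKC"));
  for (const from of Object.keys(privateUseMap)) {
    out = out.split(from).join(privateUseMap[from]);
  }
  return out;
}
```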
Hidden / off-page text comes through
Expected. The extractor returns the whole text layer, including text positioned off the visible page or set to a tiny or transparent size (sometimes used for SEO or watermarks). If unexpected strings appear, they were in the PDF's text layer; review and delete them in Word.
Form fields and their values
Partly extracted. Static text on a form is extracted, but the contents of interactive AcroForm fields may not appear in the text layer. To pull field names and values specifically, use a form-aware tool such as PDF Form Extractor.
Frequently asked questions
Do I get a real .docx Word file?
No — and that's deliberate. The tool extracts the PDF's text layer and downloads it as a UTF-8 .txt file. You open or paste that into Microsoft Word, Google Docs, or LibreOffice and apply your own styles. There is no auto-generated .docx with reconstructed tables and images, because those reconstructions are usually more trouble to clean up than starting from clean text.
Why .txt instead of .docx?
Reliability and honesty. A faithful .docx reconstruction from a PDF requires guessing styles, table grids, and image placement, and the results are routinely messy. Clean extracted text drops into Word instantly and you control the formatting. If you need tabular data structured, PDF to Excel is the right tool; for Markdown, use PDF to Markdown.
Is my document uploaded anywhere?
No. The PDF is parsed in your browser with pdf.js. Nothing is sent to a server — the on-screen badge reads "Local browser processing · 0 bytes uploaded." This is the main reason to use it for offer letters, NDAs, and other confidential drafts.
Will formatting and fonts be preserved?
No. You get the text content; fonts, colours, sizes, headings, and layout are not carried into the .txt. Re-apply your styles in Word after pasting. Paragraph breaks are approximate — pages are separated by a blank line and intra-page wrapping comes through as the PDF had it.
Does it convert tables into Word tables?
No. Table cell text is extracted, but as plain text, not as a Word grid. For structured tables, use PDF to Excel (CSV output, columns detected by position) or PDF table to JSON.
What about images and charts?
They are skipped — images are not text. Re-insert graphics in Word, or render pages to images with PDF to JPG and paste those in.
It returned nothing / a blank preview. Why?
Your PDF is almost certainly a scan or photo — an image with no selectable text layer. Run PDF OCR first to create a searchable PDF, then convert that. A blank preview is the tool telling you there was no text to extract.
Are there any settings to configure?
No. The converter has no options panel — it runs automatically as soon as you drop the file, shows a preview, and offers a Download button. Any tuning (re-joining lines, fixing spacing, applying styles) happens afterwards in Word.
How big a PDF can I convert?
Free tier: up to 2 MB and 50 pages. Pro: 50 MB and 500 pages. Pro Media: 500 MB and 2,000 pages. Over the limit, split the file with Extract Pages and convert sections.
Can I open the .txt directly in Word without copy-pasting?
Yes. In Word choose File → Open and pick the .txt; Word imports it as a plain-text body you can then style. Google Docs and LibreOffice open .txt the same way. Copy-paste works too if you want it inside an existing document.
What's the difference between this and the PDF to Text tool?
Functionally none — both extract the text layer and produce a .txt file. The PDF to Text tool is framed for search/NLP/plumbing workflows; this page is framed for the "I want to edit it in Word" workflow. Pick whichever name matches your intent.
Will reading order always be correct?
For single-column documents, yes. Multi-column layouts (papers, newsletters) can interleave columns because extraction follows the order pdf.js reports text items in. Check the preview; re-order blocks in Word if needed.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.