How to extract text from a legal brief PDF for editing
- Step 1: Open the extractor. Go to PDF to Plain Text. Processing is local (pdf.js), so privileged documents never leave your device, which keeps you within confidentiality and no-upload policies.
- Step 2: Drop in the brief or opinion PDF. Drag the filing in; it extracts every page automatically, with no options to set. Born-digital filings from PACER, CourtListener, and publishers extract fully.
- Step 3: Check the preview for completeness. Skim the preview. If a filing you know has text comes out blank, it's a scanned exhibit: run it through OCR first, then re-extract (and treat the OCR text as a draft).
- Step 4: Download the .txt and open it. Save <name>.txt (UTF-8) and open it in your editor or Word. Use the blank lines as page markers when you need to locate a passage by page.
- Step 5: Search, quote, and pull citations. Press Ctrl+F for the term, party name, or reporter string. Copy quotes and citation strings into your work product or cite-checking workflow.
- Step 6: Verify every quote against the filed PDF. Before a quote or pin cite goes into a brief, confirm it character-for-character against the original PDF; extraction can interleave columns or carry over header text. You own the accuracy of what you file.
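For Step 3, a short script can back up the visual skim on a long filing. This is an illustrative sketch, not part of the tool: `page_report` and its `min_chars` threshold are hypothetical, and it assumes pages in the .txt are separated by exactly one blank line. It counts pages and flags any that came out nearly empty, which usually points to a scanned page that needs OCR.

```python
def page_report(text, expected_pages=None, min_chars=20):
    """Count pages in extracted text and flag near-empty ones (hypothetical helper)."""
    # Normalize Windows line endings so the blank-line page marker is consistent.
    text = text.replace("\r\n", "\n")
    # Assumption: one blank line separates consecutive pages.
    pages = text.split("\n\n")
    # Pages with fewer than min_chars characters (after trimming) are likely scans.
    thin = [i for i, page in enumerate(pages, 1) if len(page.strip()) < min_chars]
    # Report a mismatch only when the caller supplies the PDF's page count.
    mismatch = expected_pages is not None and len(pages) != expected_pages
    return {"pages": len(pages), "thin_pages": thin, "count_mismatch": mismatch}
```

Pass the file's contents and the PDF's page count, e.g. `page_report(text, expected_pages=n)`. A count mismatch can also come from blank lines inside a page, so treat it as a prompt to look at the PDF, not a verdict.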
Legal document types and how they extract
Behaviour by source. The dividing line is born-digital (typed) vs. scanned (image-only).
| Source | Typically… | Extracts? |
|---|---|---|
| PACER / ECF e-filed brief | Born-digital (filed from a word processor) | Yes — fully |
| CourtListener / Google Scholar opinion | Born-digital text PDF | Yes |
| Westlaw / Lexis printed-to-PDF | Born-digital with publisher headers/footers | Yes (strip the publisher boilerplate) |
| Scanned exhibit / old filing | Image-only (photocopied or scanned) | No — needs OCR first |
| Signed/locked filing (copy disabled, opens fine) | Born-digital with a copy restriction | Yes (pdf.js reads the text layer) |
| Sealed file requiring a password to open | Encrypted | No — decrypt first (Remove Password) |
Accuracy and confidentiality notes
What to rely on and what to double-check before anything reaches a brief.
| Concern | Reality |
|---|---|
| Privilege / upload | Local processing in your browser; 0 bytes uploaded |
| Verbatim quotes | Faithful to the text layer — still verify against the PDF |
| Footnotes | Extracted as text; verify position/attachment |
| Multi-column / sidebars | May interleave — re-read carefully before quoting |
| Page citations | Blank line marks page breaks; confirm the page number |
| OCR'd exhibits | Treat as draft; OCR can misread characters |
| Free tier | 2 MB / 50 pages (split long records) |
Cookbook
Research workflows for getting accurate, quotable text out of legal PDFs.
Find and quote a passage from an opinion
Extract the opinion, search for the holding language, copy it, then verify against the PDF before it goes in your brief.
1. Drop opinion.pdf → opinion.txt
2. Ctrl+F: "abuse of discretion"
3. Copy the surrounding sentence as a block quote
4. VERIFY the quote char-for-char vs. opinion.pdf
5. Add the pin cite (confirm the page via the blank-line page breaks)
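Before the verification step, you can pre-check that a quote copied into your draft still appears in the extracted text and see which page it sits on. This is a minimal sketch with a hypothetical `find_quote` helper, assuming a blank line marks each page break. It ignores whitespace differences only (line wraps, double spaces), so curly versus straight quotes, a word hyphenated across a line, or a quote that spans a page break will not match. It does not replace checking against opinion.pdf.

```python
import re

def _collapse(s):
    # Collapse any run of whitespace (line wraps, double spaces) to one space.
    return re.sub(r"\s+", " ", s).strip()

def find_quote(text, quote):
    """Return the 1-based pages whose text contains quote, ignoring whitespace differences only."""
    target = _collapse(quote)
    # Assumption: a blank line separates pages in the extracted .txt.
    pages = text.replace("\r\n", "\n").split("\n\n")
    return [i for i, page in enumerate(pages, 1) if target in _collapse(page)]
```

Matching is otherwise exact and case-sensitive, which errs toward reporting a miss rather than a false hit.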
Harvest citation strings for cite-checking
Pull every reporter citation out of a brief into a list your cite-checker or a regex can process.
Extract brief.pdf to brief.txt, then run a reporter-citation regex over it, for example:
\b\d{1,4}\s+[A-Z][A-Za-z0-9.]*\s+\d{1,5}\b
→ 410 U.S. 113 ; 347 U.S. 483 ; 5 F.3d 1255
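The same kind of pattern can run in Python, with duplicates removed and first-seen order kept. This is a sketch: `harvest_citations` is a hypothetical helper, and the pattern (volume, a reporter abbreviation of letters, digits, and periods, then a page) is deliberately loose, so it surfaces candidates rather than validated citations.

```python
import re

# Volume, reporter abbreviation (letters, digits, periods, so "F.3d" matches), page.
# \b stops a match from starting or ending inside a longer number.
CITATION = re.compile(r"\b\d{1,4}\s+[A-Z][A-Za-z0-9.]*\s+\d{1,5}\b")

def harvest_citations(text):
    """Return candidate citations in first-seen order, without duplicates."""
    found = []
    for match in CITATION.finditer(text):
        cite = " ".join(match.group().split())  # normalize internal whitespace and line wraps
        if cite not in found:
            found.append(cite)
    return found
```

For example, a passage citing 410 U.S. 113 twice, plus 347 U.S. 483 and 5 F.3d 1255, yields those three strings once each.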
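Review the matches against the brief before relying on them. A simple pattern like this misses multi-word reporters such as F. Supp. 2d and can also catch strings that aren't citations.
OCR a scanned exhibit, then extract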
An older exhibit is a scan with no text layer. Add one with OCR, then extract — and treat the result as draft.
1. exhibit-A.pdf (scanned) → /pdf-tools/pdf-ocr → ocr.pdf
2. ocr.pdf → /pdf-tools/pdf-to-text → exhibit-A.txt
3. Use it for searching; verify any quote against the original exhibit image (OCR can misread)
Strip Westlaw/Lexis publisher boilerplate
Reporter PDFs repeat the publisher header/footer on every page; remove the recurring lines so your quotes are clean.
Raw page text:
© 2026 Thomson Reuters. No claim to original ...
3
The court held that ...
Cleaned (drop the © line and the bare page number):
The court held that ...
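One way to automate that cleanup is to drop any line that recurs on at least half of the pages, plus lines that are only a page number. This is a sketch under stated assumptions: `strip_boilerplate` and its `min_share` cut-off are hypothetical, pages are assumed to be separated by a blank line, and a genuinely repeated line (a running case caption, say) would be dropped too, so compare the result with the PDF.

```python
import re
from collections import Counter

def strip_boilerplate(text, min_share=0.5):
    """Remove lines repeated across many pages and bare page-number lines."""
    # Assumption: a blank line separates pages.
    pages = text.replace("\r\n", "\n").split("\n\n")
    # Count each distinct (trimmed) line at most once per page.
    counts = Counter()
    for page in pages:
        counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    # Boilerplate must recur on at least min_share of pages, and on 2 or more.
    threshold = max(2, min_share * len(pages))
    cleaned = []
    for page in pages:
        kept = [
            ln for ln in page.splitlines()
            if ln.strip()
            and counts[ln.strip()] < threshold
            and not re.fullmatch(r"\d{1,4}", ln.strip())  # bare page number
        ]
        cleaned.append("\n".join(kept))
    # Rejoin with blank lines so page positions are preserved for pin-cite counting.
    return "\n\n".join(cleaned)
```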
Locate a quote's page via the page breaks
Pages are separated by a blank line. Count separators up to your passage to find the page for the pin cite, then confirm in the PDF.
with open('brief.txt', encoding='utf-8') as f:
    pages = f.read().split('\n\n')  # a blank line separates pages
for i, p in enumerate(pages, 1):
    if 'res judicata' in p:
        print('appears on page', i)  # confirm in brief.pdf
Edge cases and what actually happens
Scanned exhibit or older filing
Empty output. Image-only filings have no text layer and extract blank. Many exhibits and pre-ECF documents are scans. Run them through OCR first, and treat OCR'd text as a draft research aid rather than an authoritative source, because OCR can misread characters in ways that matter legally.
Quote must be verified before filing
Verify required. Extraction is faithful to the text layer, but you remain responsible for the accuracy of anything you quote. Column interleaving, stray header text, or a font-mapping quirk can subtly alter a passage. Always confirm a quote and pin cite character-for-character against the filed PDF.
Two-column or sidebar layout
May interleave. Some opinions and reporter pages use columns or marginal notes; the tool joins text runs in pdf.js order and can interleave them. Re-read extracted passages carefully before quoting, especially across a column boundary.
Sealed/encrypted file needs a password to open
Fails to open. pdf.js can't read a file that is encrypted so it won't open without a password. Decrypt a copy you're authorized to access with PDF Remove Password (you must know the password) before extracting.
Copy-restricted but openable filing
Supported. A filing that opens without a password but disables copying still extracts here: pdf.js reads the text layer rather than honouring the copy flag. Confirm you're authorized to extract it; the tool doesn't change the document's restriction.
Footnote text and placement
Preserved as text. Footnotes are text runs, so footnote citations are extracted. Their reading position depends on where the runs sit on the page, so verify that a footnote attaches to the right sentence before quoting it or its citation.
Long record exceeds free-tier limits
Blocked. Free extraction caps at 2 MB / 50 pages. A long brief or multi-document record is blocked with an upgrade prompt. Split it with PDF Split by Range and extract each part, or use the Pro limits (50 MB / 500 pages).
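Planning the split is simple arithmetic: consecutive chunks of at most 50 pages. A minimal sketch, assuming PDF Split by Range accepts ranges written as `start-end` (an assumption; adjust to the tool's actual input format); `split_ranges` is a hypothetical helper.

```python
def split_ranges(total_pages, max_pages=50):
    """Return 1-based inclusive page ranges of at most max_pages pages each."""
    ranges = []
    # Start a new chunk every max_pages pages: 1, 51, 101, ...
    for start in range(1, total_pages + 1, max_pages):
        end = min(start + max_pages - 1, total_pages)  # last chunk may be shorter
        ranges.append(f"{start}-{end}")
    return ranges
```

A 120-page record gives 1-50, 51-100, and 101-120. A 50-page part can still exceed 2 MB if it holds large scanned images, so check file sizes as well.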
Privileged document must not be uploaded
Local only. By design, extraction runs in your browser and uploads nothing (the result panel shows 0 bytes uploaded), which is what makes it appropriate for privileged filings and attorney work product. Avoid server-side PDF-to-text sites for these documents.
Frequently asked questions
Will footnote citations be included in the extracted text?
Yes — footnotes are text runs, so footnote text and citations are extracted. Their position in the output depends on where the runs sit on the page, so verify that a footnote attaches to the correct sentence (and that its citation is intact) before you rely on it.
Can I extract text from a court opinion PDF?
Yes — born-digital opinions from PACER/ECF, CourtListener, Google Scholar, or publisher PDFs extract fully. Reporter PDFs from Westlaw/Lexis also extract, but you'll want to strip the publisher header/footer boilerplate that repeats on each page.
Does this work for legal documents in other languages?
Yes — any language with an embedded, Unicode-mapped font extracts in its native script. If a document extracts as boxes, the font lacks a Unicode map and you'll need OCR. As always with foreign-language filings, have quotes verified by a qualified reader.
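To spot the problem in a long .txt without scrolling, you can measure how much of the text consists of characters that commonly stand in for unmapped glyphs. This heuristic is an assumption, not the tool's behaviour: it counts U+FFFD replacement characters and Unicode private-use code points, and some fonts without a Unicode map instead yield ordinary-looking but wrong letters, which it will not catch. `unmapped_share` is a hypothetical helper.

```python
import unicodedata

def unmapped_share(text):
    """Fraction of non-whitespace characters that are U+FFFD or private-use code points."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # "Co" is the Unicode general category for private-use characters.
    suspect = sum(1 for c in chars if c == "\ufffd" or unicodedata.category(c) == "Co")
    return suspect / len(chars)
```

A share well above zero on a document that shows boxes in the preview suggests it needs OCR; the cut-off is a judgment call.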
Is it safe to use on privileged or confidential filings?
Yes — extraction runs entirely in your browser via pdf.js and uploads nothing (the result panel confirms 0 bytes uploaded). That's the key reason to use this rather than a server-side conversion site for privileged documents and attorney work product.
Do I still need to verify quotes against the original PDF?
Always. The tool extracts the text layer faithfully, but column interleaving, stray header text, or a font quirk can alter a passage, and you are responsible for the accuracy of what you file. Confirm every quote and pin cite character-for-character against the filed PDF.
What about scanned exhibits?
Scanned (image-only) exhibits have no text layer and extract blank. Add a text layer with the PDF OCR tool first, then extract — but treat OCR'd text as a draft research aid only, because OCR can misread characters in ways that matter in a legal context.
Can I extract citations automatically?
Not as a built-in feature, but once you have the .txt you can run a reporter-citation regex (see the cookbook) or feed it to a cite-checker. Review every matched citation against the brief before relying on it — automated extraction surfaces candidates, it doesn't certify them.
How do I find which page a passage is on for a pin cite?
Pages are separated by a blank line in the output. Count the separators up to your passage (or split on \n\n in a script) to identify the page, then confirm that page number against the filed PDF before using it as a pin cite.
Will a signed or locked filing extract?
If it opens without a password but only disables copying, yes: pdf.js reads the text layer regardless of the copy flag (extract only documents you're authorized to extract). If it requires a password to open (e.g. a sealed file), decrypt an authorized copy first with PDF Remove Password.
Is there a length limit on briefs?
On the free tier, 2 MB and 50 pages per file. Pro raises that to 50 MB and 500 pages. For a long brief or a multi-document record, split it with PDF Split by Range and extract each part.
Will the output keep the brief's formatting?
No — it's plain text, so headings, indentation, and emphasis are dropped. If you need editable formatted text, try PDF to Word; for structured exhibits with tables, PDF Table to JSON or PDF to Excel.
Can I compare two versions of a brief?
Extract each to text and diff them in your editor for a quick line comparison, or use PDF Compare / Diff for a structural and line-level comparison of the two PDFs directly.
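For the quick text comparison, Python's standard difflib module can produce a unified diff of the two extracted files. A minimal sketch; `diff_versions` and its default file names are hypothetical.

```python
import difflib

def diff_versions(old_text, new_text, old_name="brief-v1.txt", new_name="brief-v2.txt"):
    """Unified line diff of two extracted briefs; empty string if identical."""
    # lineterm="" because splitlines() strips newlines; join re-adds them.
    return "\n".join(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm=""))
```

Because extracted lines follow the PDF's line breaks, a one-word edit that reflows a paragraph can show up as several changed lines.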
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.