Convert a PDF to Markdown Text — Free Browser Tool

How to convert a PDF document to Markdown text

Step 1
Confirm the PDF has a real text layer — Open the PDF and try to select a sentence with your cursor. If text highlights, it is born-digital and will convert. If nothing selects, it is a scan or photo — run PDF OCR first to add a text layer, then come back here.
Step 2
Drop the PDF onto the converter — Use the dropzone above. The tool reads the file in your browser with pdf.js. There is no Settings panel and nothing to configure — conversion starts automatically the moment a valid PDF is added.
Step 3
Watch it auto-convert — The tool extracts text page by page, emits a ## Page N heading for each page, and splits each page into one-sentence-per-line Markdown. For a clean, born-digital document this takes a second or two.
Step 4
Review the preview — The result panel shows the first ~5,000 characters of the generated Markdown plus an output-size stat. Skim it to confirm the text came through readable and the page headings line up with the source.
Step 5
Download the .md file — Click Download. The file saves as yourfilename.md with the text/markdown type and UTF-8 encoding. The full output is saved, not just the previewed portion.
Step 6
Polish in your Markdown editor — Because original headings, lists, and bold are not reconstructed, expect to promote the section titles to #/##, re-create bullet lists, and wrap any code in fenced blocks. The conversion gives you clean text to work from, not a finished document.
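
The extraction and formatting described in Step 3 can be sketched roughly as follows. This is a minimal TypeScript sketch, not the tool's actual source: `splitSentences` and `pagesToMarkdown` are hypothetical names, the input is assumed to be one already-extracted string per page, and the blank line after each heading is inferred from the sample output shown in this guide.

```typescript
// Hypothetical helper names; the converter's real source is not shown in this guide.

// Split a page's text into sentences, breaking after ., !, or ?.
function splitSentences(pageText: string): string[] {
  const parts = pageText.match(/[^.!?]+[.!?]+|[^.!?]+$/g);
  if (parts === null) {
    return [];
  }
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

// Build the Markdown: one "## Page N" heading per page, one sentence per line.
function pagesToMarkdown(pageTexts: string[]): string {
  const sections = pageTexts.map((text, index) => {
    const heading = "## Page " + (index + 1);
    const body = splitSentences(text).join("\n");
    // A page with no text (for example a scan) keeps its heading and nothing else.
    return body.length > 0 ? heading + "\n\n" + body : heading;
  });
  return sections.join("\n\n") + "\n";
}

const demoMarkdown = pagesToMarkdown([
  "Project kickoff happened on Monday. The team agreed on a two-week sprint cadence.",
  "Design review is scheduled for Friday.",
]);
console.log(demoMarkdown);
```

Because the split is purely punctuation-based, a string such as "Inc. is big" comes back as two lines, which is the cosmetic abbreviation issue described under edge cases.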

What the converter preserves — and what it doesn't

The output is generated text, not a faithful re-rendering of the PDF's layout. Knowing the difference saves you from chasing formatting that was never extracted.

PDF element	In the Markdown?	What actually happens
Body text (born-digital)	Yes	Read via `page.getTextContent()`, joined in pdf.js order, split into one-sentence-per-line.
Page boundaries	Yes	Each page is preceded by a `## Page N` Markdown heading — the only Markdown syntax the tool emits.
Original headings (titles, H1/H2)	As plain text only	A title in the PDF becomes a normal text line. It is not turned into a `#`/`##` heading — the tool can't tell a heading from body text.
Bold / italic	No	Font weight and style are layout attributes, not text. They are dropped; you get unstyled text.
Bullet / numbered lists	No	List markers may survive as literal characters in the text, but no Markdown `-`/`1.` list structure is created.
Tables	No (flattened)	Cells are read as positioned text and collapse into space-joined lines. For real tables use PDF Table to JSON or PDF to Excel.
Images / figures / logos	No	Pictures are ignored entirely. There is no image extraction in this tool.
Hyperlinks	No	Link annotations are not read; only visible link text comes through, with no `[text](url)` syntax.
Scanned page (no text layer)	Empty	An image-only page yields a `## Page N` heading and little or no text. OCR first.

Output format and tier limits

Everything is fixed — there are no encoding, page-range, or style options.

Property	Value
Input accepted	A single `.pdf` file (one at a time)
Output	One `.md` file, `text/markdown`, UTF-8
Filename	Source name with the extension swapped to `.md`
Markdown syntax emitted	`## Page N` headings only; everything else is plain text
Options	None — auto-converts on drop
Free tier	2 MB and 50 pages per file
Pro tier	50 MB and 500 pages per file
Privacy	Processed locally in your browser; 0 bytes uploaded
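
The filename rule in the table can be illustrated with a small sketch. `toMarkdownFilename` is a hypothetical name, and the handling of upper-case extensions and of names without a `.pdf` suffix is an assumption rather than documented behavior.

```typescript
// MIME type as stated in the table above.
const OUTPUT_MIME_TYPE = "text/markdown";

// Swap a trailing ".pdf" (any letter case, assumed) for ".md".
// A name without a ".pdf" extension simply gets ".md" appended (also assumed).
function toMarkdownFilename(pdfFilename: string): string {
  const withoutExtension = pdfFilename.replace(/\.pdf$/i, "");
  return withoutExtension + ".md";
}

console.log(toMarkdownFilename("notes.pdf"), OUTPUT_MIME_TYPE);
```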

Cookbook

Before/after snippets showing what the generated Markdown looks like. The sample content is illustrative.

A clean single-column document

The ideal case: a born-digital report with one column of body text. The text reads in natural order and each page is clearly marked.

Input:  notes.pdf (born-digital, 3 pages)
Action: drop on the tool → auto-converts

Output (notes.md):
## Page 1

Project kickoff happened on Monday.
The team agreed on a two-week sprint cadence.

## Page 2

Design review is scheduled for Friday.

Original headings come out as plain text

A PDF title and section heading are visually large in the source, but the tool sees them as ordinary glyph runs. They become text lines, not Markdown headings.

Source PDF shows (visually):
  ANNUAL REPORT 2026          ← big title
  1. Overview                 ← section heading
  Revenue grew 14% ...

Markdown output:
## Page 1

ANNUAL REPORT 2026
1.
Overview Revenue grew 14% ...

→ promote 'ANNUAL REPORT 2026' to '# ' and
  'Overview' to '## ' yourself afterward.
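
If you would rather script the promotion than do it by hand, a post-processing helper along these lines is one option. It is not part of the converter: `promoteHeadings` is a hypothetical name, and it only matches whole lines, so a heading that was merged with body text (like 'Overview Revenue grew 14% ...' above) still has to be split manually first.

```typescript
// Hypothetical post-processing helper; not part of the converter.
// Prefixes the listed plain-text lines with Markdown heading markers.
function promoteHeadings(markdown: string, levels: Record<string, number>): string {
  return markdown
    .split("\n")
    .map((line) => {
      const key = line.trim();
      if (!Object.prototype.hasOwnProperty.call(levels, key)) {
        return line; // not listed: leave the line untouched
      }
      // Clamp the requested level to the 1..6 range Markdown supports.
      const level = Math.min(6, Math.max(1, Math.floor(Number(levels[key]))));
      return "######".slice(0, level) + " " + key;
    })
    .join("\n");
}

const reportMarkdown = "## Page 1\n\nANNUAL REPORT 2026\n1.\nOverview Revenue grew 14% ...";
console.log(promoteHeadings(reportMarkdown, { "ANNUAL REPORT 2026": 1 }));
```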

A table flattens into text lines

Tabular content does not become a Markdown table. Cells read as positioned text and merge by reading order, so columns lose their alignment.

Source table:
  Name     Role        Start
  Ada      Engineer    2024
  Bola     Designer    2025

Markdown output (flattened):
## Page 1

Name Role Start Ada Engineer 2024 Bola Designer 2025

→ for structured rows use PDF Table to JSON or PDF to Excel.
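
The flattening is easier to see in code. The sketch below assumes the cells arrive as a flat list of text items in reading order and are joined with single spaces; `TextItemLike` and `flattenItems` are hypothetical names that model only the `str` field of a text item, not the converter's real internals.

```typescript
// Minimal stand-in for a positioned text item; only `str` is used here.
// Real extracted items also carry position data, which this sketch ignores.
interface TextItemLike {
  str: string;
}

// Join item strings with single spaces, in the order they were returned.
function flattenItems(items: TextItemLike[]): string {
  return items
    .map((item) => item.str.trim())
    .filter((str) => str.length > 0)
    .join(" ");
}

const tableCells = ["Name", "Role", "Start", "Ada", "Engineer", "2024", "Bola", "Designer", "2025"];
console.log(flattenItems(tableCells.map((str) => ({ str }))));
// prints "Name Role Start Ada Engineer 2024 Bola Designer 2025"
```

Nothing in the joined string records which cell belonged to which column, which is why the structure cannot be recovered from the Markdown afterward.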

Multi-page document with page anchors

Page headings make it easy to jump around a long file and trace text back to its source page when you edit.

Input:  handbook.pdf (40 pages, born-digital)

Output (handbook.md) structure:
## Page 1
...
## Page 2
...
## Page 40
...

Search '## Page 23' in your editor to land on page 23's text.
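
The same anchors work in a script. As a rough sketch, assuming the file uses `\n` line endings and each page heading sits on its own line, a hypothetical `textForPage` helper could pull out one page's text:

```typescript
// Hypothetical helper: return the body text under "## Page N", or null if absent.
function textForPage(markdown: string, pageNumber: number): string | null {
  const lines = markdown.split("\n");
  // Exact line match, so "## Page 2" never matches "## Page 23".
  const start = lines.indexOf("## Page " + pageNumber);
  if (start === -1) {
    return null;
  }
  const body: string[] = [];
  for (let i = start + 1; i < lines.length; i++) {
    if (/^## Page \d+$/.test(lines[i])) {
      break; // the next page starts here
    }
    body.push(lines[i]);
  }
  return body.join("\n").trim();
}

const handbook = "## Page 1\n\nWelcome.\n\n## Page 2\n\nPolicies follow.\n";
console.log(textForPage(handbook, 2));
```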

A scanned PDF converts to almost nothing

If the pages are images of text (a scan or phone photo), there is no text layer to read, so the Markdown is just empty page headings. OCR first.

Input:  scanned-invoice.pdf (image-only)

Output:
## Page 1

## Page 2

(no body text — pages are pictures)

Fix: run PDF OCR (/pdf-tools/pdf-ocr) to add a text layer,
then convert the OCR'd PDF here.
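
To spot image-only pages in a long document without scrolling, you could scan the downloaded Markdown for headings that have no text under them. The helper below is a hypothetical post-processing sketch (`findEmptyPages` is not a feature of the tool) and assumes `\n` line endings.

```typescript
// Hypothetical check: list page numbers whose "## Page N" heading has no body text.
function findEmptyPages(markdown: string): number[] {
  const empty: number[] = [];
  let current: number | null = null;
  let hasText = false;
  for (const line of markdown.split("\n")) {
    const heading = /^## Page (\d+)$/.exec(line);
    if (heading !== null) {
      // Close out the previous page before starting the next one.
      if (current !== null && !hasText) {
        empty.push(current);
      }
      current = Number(heading[1]);
      hasText = false;
    } else if (line.trim().length > 0) {
      hasText = true;
    }
  }
  // The last page has no following heading, so check it here.
  if (current !== null && !hasText) {
    empty.push(current);
  }
  return empty;
}

console.log(findEmptyPages("## Page 1\n\n## Page 2\n\nSome text.\n"));
```

Any page number it reports is a candidate for OCR before you convert again.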

Edge cases and what actually happens

Scanned / image-only PDF

Empty output

There is no embedded text on the page, so getTextContent() returns nothing and you get a ## Page N heading with no body. Run PDF OCR first to add a real text layer, then convert.

PDF headings are not turned into Markdown headings

By design

Only ## Page N is emitted. A visually large title or numbered section heading in the source comes through as an ordinary text line because the tool has no way to distinguish a heading from body text by font size alone. Promote them to #/## yourself after conversion.

Tables are not converted to Markdown tables

Flattened

Table cells are positioned text; they collapse into space-joined lines and lose column structure. This is expected. For structured output use PDF Table to JSON or PDF to Excel.

File larger than 2 MB on the free tier

Blocked

The free tier caps input at 2 MB. A larger file is blocked before conversion with an upgrade prompt. Pro raises the cap to 50 MB. To keep it free, split the PDF first with PDF Split by Range and convert each part.

More than 50 pages on the free tier

Blocked

Page count is checked on drop. Over 50 pages is blocked on free (Pro allows up to 500). Extract a slice with PDF Extract Pages and convert that, or upgrade.
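
Taken together, the size and page caps amount to a simple pre-flight check. The sketch below uses the limits quoted in this guide; the names `TIER_LIMITS` and `checkLimits` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the guide does not say which definition the tool uses.

```typescript
type Tier = "free" | "pro";

interface TierLimit {
  maxBytes: number;
  maxPages: number;
}

// Caps from this guide. Treating 1 MB as 1024 * 1024 bytes is an assumption.
const TIER_LIMITS: Record<Tier, TierLimit> = {
  free: { maxBytes: 2 * 1024 * 1024, maxPages: 50 },
  pro: { maxBytes: 50 * 1024 * 1024, maxPages: 500 },
};

// Return a reason string if the file would be blocked, or null if it fits.
function checkLimits(tier: Tier, sizeBytes: number, pageCount: number): string | null {
  const limit = TIER_LIMITS[tier];
  if (sizeBytes > limit.maxBytes) {
    return "file exceeds the size cap";
  }
  if (pageCount > limit.maxPages) {
    return "file exceeds the page cap";
  }
  return null;
}

console.log(checkLimits("free", 3 * 1024 * 1024, 10));
```

Running a check like this before dropping a file tells you whether to split or extract pages first.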

Password-protected (open-password) PDF

Fails to open

If the PDF requires a password just to open, pdf.js cannot read its pages and conversion fails. Remove the password first with PDF Remove Password (you must know it), then convert.

Multi-column layout

May interleave

pdf.js returns text in its own order, which for two-column pages can interleave the columns mid-line. The text is all there but the reading order may be jumbled. Single-column documents convert cleanly; expect to re-order paragraphs on complex layouts.

Subset font with no Unicode mapping

Garbled

Some PDFs embed subsetted fonts without a ToUnicode map, so the stored codes don't map to real characters. The text comes out as gibberish. This is a property of the source file, not the converter — re-export the PDF with text-extraction enabled if you control the source.

Images and figures are dropped

Expected

This tool extracts text only — embedded pictures, charts, and logos are ignored and never appear in the Markdown. If you need the figures, export them separately with PDF to PNG.

Sentence splitter mishandles abbreviations

Cosmetic

Lines break on ., !, and ?, so an abbreviation like 'Inc.' or a decimal can occasionally start a new line mid-sentence. It's purely cosmetic — the words are all present and correct; rejoin lines in your editor if you prefer paragraphs.
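
If rejoining by hand gets tedious, a small heuristic can repair the most common cases. This is a hypothetical post-processing sketch, not a feature of the tool: the abbreviation list is illustrative, and the decimal rule (a line ending in a digit and a period, followed by a line starting with a digit) can misjoin two genuine sentences, so review the result.

```typescript
// Illustrative abbreviation list; extend it for your own documents.
const KNOWN_ABBREVIATIONS = ["Inc.", "Ltd.", "Dr.", "e.g.", "i.e."];

// "3." followed by "14 ..." suggests a decimal such as 3.14 was split.
function isDecimalSplit(line: string, next: string): boolean {
  return /\d\.$/.test(line) && /^\d/.test(next);
}

function endsWithAbbreviation(line: string): boolean {
  const lastWord = line.split(/\s+/).pop() || "";
  return KNOWN_ABBREVIATIONS.indexOf(lastWord) !== -1;
}

// Merge wrongly split lines; headings and blank lines are left alone.
function rejoinLines(markdown: string): string {
  const out: string[] = [];
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const prev = out.length > 0 ? out[out.length - 1].trim() : "";
    const mergeable =
      prev !== "" && line !== "" && prev.indexOf("## ") !== 0 && line.indexOf("## ") !== 0;
    if (mergeable && isDecimalSplit(prev, line)) {
      out[out.length - 1] = prev + line; // no space inside a number
    } else if (mergeable && endsWithAbbreviation(prev)) {
      out[out.length - 1] = prev + " " + line;
    } else {
      out.push(rawLine);
    }
  }
  return out.join("\n");
}

console.log(rejoinLines("## Page 1\n\nAcme Inc.\nshipped it.\nPi is 3.\n14 today."));
```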

Frequently asked questions

Are the headings from my PDF preserved as Markdown headings?

No. The only Markdown headings in the output are the ## Page N markers the tool adds. A title or section heading from your PDF comes through as ordinary text because the tool can't reliably tell a heading from body text by appearance. Promote them to #/## yourself after conversion.

Will tables in the PDF become Markdown tables?

No. Table cells are positioned text and flatten into space-joined lines with no | column structure. For tabular data, use PDF Table to JSON for structured records or PDF to Excel for a spreadsheet, then format as Markdown if you still need it.

Does bold and italic text survive?

No. Font weight and style are layout attributes, not part of the text stream, so they are dropped. You get unstyled text and add **bold** or *italic* yourself where needed.

Does this work on scanned PDFs?

Not directly. A scan is an image with no text layer, so you'd get page headings and little else. Run PDF OCR first to add a searchable text layer, then convert the OCR'd PDF here.

Can I import the Markdown into Notion or Obsidian?

Yes. The output is plain, standard Markdown with no front matter or extended syntax, so it imports cleanly. For a Notion-specific walkthrough see the PDF to Markdown for Notion guide; Obsidian just needs the .md file dropped into a vault.

Are there any options — encoding, page range, style?

No. The tool converts the whole document automatically on drop, as UTF-8, with ## Page N headings. To work with a subset of pages, extract them first with PDF Extract Pages and convert the result.

Why is each sentence on its own line?

Each page's text is split on sentence-ending punctuation (., !, ?) and one sentence is written per line. This keeps Git diffs small and readable. If you prefer flowing paragraphs, join the lines in your editor — the content is identical either way.

Is my PDF uploaded anywhere?

No. Conversion runs entirely in your browser via pdf.js. The file's bytes never leave your machine — the result panel even states '0 bytes uploaded'. Signed-in users have a single usage counter recorded, never the document content.

What's the largest PDF I can convert?

Free tier: 2 MB and up to 50 pages. Pro: 50 MB and 500 pages. Larger plans go higher still. Files over the page or size cap are blocked on drop with an upgrade prompt; split or extract pages to stay within free limits.

Will hyperlinks come through as Markdown links?

No. The tool reads visible text, not link annotations, so a clickable link appears as its display text with no [text](url) syntax. Re-add links manually where they matter.

How is this different from PDF to Text?

PDF to Text gives you a plain .txt with no structure. This tool produces a .md file that additionally inserts a ## Page N heading before each page and splits text by sentence — handier when you're heading into a Markdown editor or docs pipeline.

Can I automate this without using the web UI?

Yes, on Pro. pdf-to-markdown is a runner-builtin tool: pair the @jadapps/runner once and POST the PDF to your local runner endpoint to get the Markdown back. Processing still happens locally on your machine — the document never reaches JAD's servers.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
