How to convert a PDF manual or docs into Markdown
- Step 1: Check that the manual has selectable text. Try to select a paragraph in the PDF; if it highlights, conversion will work. A scanned manual won't select, so run PDF OCR first to add a text layer.
- Step 2: Convert page-range chunks if it's a big manual. Free caps at 50 pages and Pro at 500. For a long manual, slice it into chapters with PDF Extract Pages and convert each chunk, so each output maps to a docs section.
- Step 3: Drop the PDF onto the converter. It reads the file in your browser with pdf.js and converts automatically; there's no settings panel. You'll get a ## Page N heading before every page.
- Step 4: Download and split by section. Save the .md, then divide it into one file per topic. The ## Page N markers and your own section titles are natural boundaries; many teams script this split by heading.
- Step 5: Promote headings and fence code blocks. Turn chapter and section titles (which arrived as plain text) into #/##/### headings. Wrap every code sample in a fenced block with a language tag; code came through as plain text, so this is manual but mechanical.
- Step 6: Commit to your docs repo and build. Add the new .md files to your docs framework's content tree, update the sidebar/nav, and run the build. The PDF can then become a 'download the old manual' link if you still need it.
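Step 4's scripted split usually starts by cutting the downloaded file at the page markers. Below is a minimal Python sketch, assuming each marker sits on its own line exactly as ## Page N (check this against your own output). split_pages is a hypothetical helper name, not part of the tool.

```python
import re

# The converter emits "## Page N" on its own line before each page's text.
PAGE_MARKER = re.compile(r"^## Page (\d+)[ \t]*$")


def split_pages(markdown: str) -> dict[int, str]:
    """Return {page_number: page_text} for a converted .md string."""
    pages: dict[int, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = PAGE_MARKER.match(line)
        if match:
            current = int(match.group(1))
            pages[current] = []
        elif current is not None:
            # Lines before the first marker (if any) are ignored.
            pages[current].append(line)
    # Join each page's lines and trim surrounding whitespace.
    return {number: "\n".join(lines).strip() for number, lines in pages.items()}
```

From there you can merge consecutive pages that belong to the same topic before writing one file per section.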
How documentation elements convert
What a manual or technical PDF produces, and the cleanup needed to turn it into real docs.
| Doc element | In the Markdown? | Notes |
|---|---|---|
| Body text / instructions | Yes | Extracted and split into one sentence per line per page. |
| Chapter / section headings | As plain text | Not promoted to #/##. You set the heading levels and split the file into pages yourself. |
| Code samples & commands | As plain text | No fenced blocks, no language tag, no monospace. Wrap in ``` yourself and add the language. |
| Numbered step lists | As plain text | Step numbers may survive as literal characters, but no Markdown ordered list (1., 2., ...) is created. |
| Tables (spec sheets, params) | No (flattened) | Cells collapse into text. Use PDF Table to JSON or PDF to Excel for the data. |
| Screenshots & diagrams | No | Images are ignored. Re-capture or export with PDF to PNG and embed manually. |
| Callouts / admonitions (Note, Warning) | As plain text | The words survive; the box styling does not. Re-wrap as :::note / > blocks for your framework. |
| Cross-references & ToC links | As plain text | Page-number references and ToC entries come through as text, not working links. |
Output format and tier limits
Fixed pipeline — no options for encoding, page range, or splitting.
| Property | Value |
|---|---|
| Input | One .pdf at a time |
| Output | One .md file, UTF-8, text/markdown |
| Headings emitted | ## Page N only |
| Section splitting | Manual / scripted after download |
| Free tier | 2 MB / 50 pages |
| Pro tier | 50 MB / 500 pages |
| Privacy | In-browser; 0 bytes uploaded |
Cookbook
Recipes for migrating a PDF manual into a Markdown docs site. Sample content is illustrative.
Convert a short guide, then split by topic
A small manual converts in one pass; you then break it into one file per topic using the page markers as guides.
Input: setup-guide.pdf (12 pages)
Output (setup-guide.md):

    ## Page 1
    Getting Started
    Install the CLI before you begin.
    ## Page 2
    Configuration
    Edit config.yaml to set your token.

→ split into getting-started.md, configuration.md, ...
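The split in this recipe can be scripted once you list the topic titles yourself. A minimal Python sketch, assuming each title arrives on its own line as in the sample above; split_by_titles and slugify are hypothetical helpers, and the title list is something you supply by hand. It also turns each title into a top-level # heading and drops the page markers.

```python
import re


def slugify(title: str) -> str:
    """Lower-case a title and join its words with hyphens: 'Getting Started' -> 'getting-started'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_by_titles(markdown: str, titles: list[str]) -> dict[str, str]:
    """Group lines under the most recent known title; return {filename: content}."""
    wanted = set(titles)
    files: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        text = line.strip()
        if text.startswith("## Page "):
            continue  # page markers are only boundaries, not content
        if text in wanted:
            current = slugify(text) + ".md"
            # setdefault: a title seen again keeps appending to the same file
            files.setdefault(current, [f"# {text}"])
        elif current is not None and text:
            files[current].append(text)
    return {name: "\n".join(lines) + "\n" for name, lines in files.items()}
```

Lines before the first known title are discarded, so check nothing important sits there. Each entry can then be written to disk, for example with pathlib.Path(name).write_text(content).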
Re-fence a code sample by hand
Commands and code come through as plain text. Wrap them in fenced blocks with a language so they render as code.
As extracted:

    ## Page 2
    Run the installer:
    npm install -g acme-cli
    acme login

After your edit:

    Run the installer:
    ```bash
    npm install -g acme-cli
    acme login
    ```
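If your manual's commands share recognizable prefixes, a script can do a first fencing pass. A minimal Python sketch; COMMAND_PREFIXES and fence_commands are hypothetical, and the prefix list is an assumption you would tailor to your product. It only fences lines that start with a listed prefix, so multi-line code and non-shell samples still need a manual pass.

```python
# Hypothetical prefixes that mark a line as a shell command in your manual.
COMMAND_PREFIXES = ("npm ", "acme ", "pip ", "git ")


def fence_commands(lines: list[str], lang: str = "bash") -> list[str]:
    """Wrap each run of consecutive command lines in one fenced block."""
    out: list[str] = []
    in_block = False
    for line in lines:
        is_command = line.startswith(COMMAND_PREFIXES)
        if is_command and not in_block:
            out.append(f"```{lang}")  # open a fence before the first command
            in_block = True
        elif not is_command and in_block:
            out.append("```")  # close it at the first non-command line
            in_block = False
        out.append(line)
    if in_block:
        out.append("```")  # the input ended inside a command run
    return out
```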
Re-create a numbered step list
Procedure steps lose their list structure. Convert the lines into a real Markdown ordered list.
As extracted:

    Open Settings.
    Click Integrations.
    Paste your API key.

After your edit:

    1. Open Settings.
    2. Click Integrations.
    3. Paste your API key.
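The same conversion as a small Python sketch (to_ordered_list is a hypothetical helper). It also strips literal step numbers such as "1)" or "2." that survived from the PDF, so review any step that legitimately begins with a number, such as a version string.

```python
import re

# A leading "1." or "2)" left over from the PDF's own numbering.
LITERAL_NUMBER = re.compile(r"^\d+[.)]\s*")


def to_ordered_list(steps: list[str]) -> str:
    """Turn plain procedure lines into a Markdown ordered list."""
    # Skip blank lines and remove any surviving literal step numbers.
    cleaned = [LITERAL_NUMBER.sub("", s.strip()) for s in steps if s.strip()]
    return "\n".join(f"{i}. {step}" for i, step in enumerate(cleaned, start=1))
```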
Chunk a 400-page manual on Pro
A large manual exceeds the free 50-page cap and is awkward as one file even on Pro. Slice it into chapters first.
Workflow:
1. Use PDF Extract Pages to slice the manual into chapters (pages 1-40, 41-90, ...).
2. Convert each chunk here → ch1.md, ch2.md, ...
3. Drop each .md into the matching docs section.

(Free tier: keep each chunk to at most 50 pages and 2 MB.)
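If you know each chapter's page range (from the manual's ToC, say), a short Python sketch can sub-split any chapter longer than the cap before you run PDF Extract Pages. plan_chunks is a hypothetical helper; it only counts pages, so the 2 MB free-tier limit still has to be checked separately.

```python
FREE_PAGE_CAP = 50  # free-tier page limit; pass cap=500 for Pro


def plan_chunks(chapters: list[tuple[int, int]], cap: int = FREE_PAGE_CAP) -> list[tuple[int, int]]:
    """Split inclusive (first, last) page ranges so no chunk exceeds `cap` pages."""
    chunks: list[tuple[int, int]] = []
    for first, last in chapters:
        start = first
        while start <= last:
            end = min(start + cap - 1, last)  # inclusive last page of this chunk
            chunks.append((start, end))
            start = end + 1
    return chunks
```

For example, plan_chunks([(1, 40), (41, 90), (91, 210)]) returns [(1, 40), (41, 90), (91, 140), (141, 190), (191, 210)]: the first two chapters already fit, and the 120-page third chapter becomes three chunks.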
Convert callouts into framework admonitions
A 'Warning' box is just text after extraction. Re-wrap it as your docs framework's admonition syntax.
As extracted:

    Warning
    Do not delete the lock file while a job is running.

After your edit (Docusaurus):

    :::warning
    Do not delete the lock file while a job is running.
    :::
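A Python sketch of the same re-wrap, assuming Docusaurus-style :::type blocks. The label-to-type mapping in ADMONITIONS is an assumption to adjust for your framework and your manual's wording, and wrap_callouts is a hypothetical helper. It accepts the label on its own line or sharing a line with the callout text. A sentence that merely starts with a label word ("Note that...") will also match, so review the result.

```python
import re

# Assumed mapping from callout labels in the extracted text to Docusaurus
# admonition types; adjust it to the labels your manual uses.
ADMONITIONS = {"Note": "note", "Tip": "tip", "Important": "info",
               "Warning": "warning", "Caution": "warning", "Danger": "danger"}

# A label at the start of a line, optionally followed by ":" and the callout text.
CALLOUT = re.compile(r"^(" + "|".join(ADMONITIONS) + r")\b:?\s*(.*)$")


def wrap_callouts(lines: list[str]) -> list[str]:
    """Re-wrap callouts as :::type blocks; the label may share a line with its text."""
    out: list[str] = []
    i = 0
    while i < len(lines):
        match = CALLOUT.match(lines[i].strip())
        if not match:
            out.append(lines[i])
            i += 1
            continue
        kind, body = match.groups()
        if not body and i + 1 < len(lines):
            # Label on its own line: the callout text is the next line.
            i += 1
            body = lines[i].strip()
        out += [f":::{ADMONITIONS[kind]}", body, ":::"]
        i += 1
    return out
```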
Edge cases and what actually happens
Code samples lose their formatting
Expected: Code in the PDF arrives as plain text, with no backticks, no language tag, and no monospace. Whitespace and indentation may also collapse. Wrap samples in fenced blocks and fix indentation by hand after conversion.
Chapter/section headings stay as body text
By design: Only ## Page N is emitted as a heading. Your manual's heading hierarchy comes through as plain text lines; promote them to #/##/### and split into files yourself.
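Because you know the manual's outline, you can supply the heading levels and let a script apply them. A minimal Python sketch; the title-to-level map is written by hand (for example from the PDF's table of contents), and promote_headings is a hypothetical helper. By default it also drops the ## Page N markers so they don't collide with your real ## headings.

```python
def promote_headings(markdown: str, levels: dict[str, int], drop_page_markers: bool = True) -> str:
    """Prefix known title lines with '#' * level; optionally drop '## Page N' markers."""
    out: list[str] = []
    for line in markdown.splitlines():
        text = line.strip()
        if drop_page_markers and text.startswith("## Page "):
            continue  # converter page markers would clash with real ## headings
        level = levels.get(text)
        # Known titles become headings; every other line is kept unchanged.
        out.append(f"{'#' * level} {text}" if level else line)
    return "\n".join(out)
```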
Spec/parameter tables flatten
Flattened: Tables collapse into space-joined text and lose their columns, which is bad for parameter references. Extract the table data with PDF Table to JSON and rebuild it as a Markdown table.
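Once you have the table data, rendering it as Markdown is mechanical. A minimal Python sketch, assuming the JSON is an array of rows (each an array of cell strings) with the header row first; that shape is an assumption, so check what PDF Table to JSON actually emits and adapt. rows_to_markdown and table_from_json are hypothetical helpers.

```python
import json


def _cell(value) -> str:
    # A literal "|" inside a cell would start a new column, so escape it.
    return str(value).replace("|", "\\|").strip()


def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows (first row = header) as a Markdown pipe table."""
    if not rows:
        return ""
    header, *body = rows
    width = len(header)
    lines = ["| " + " | ".join(_cell(v) for v in header) + " |", "|" + "---|" * width]
    for row in body:
        padded = (list(row) + [""] * width)[:width]  # pad short rows, trim long ones
        lines.append("| " + " | ".join(_cell(v) for v in padded) + " |")
    return "\n".join(lines)


def table_from_json(text: str) -> str:
    """Parse a JSON array of rows and render it as a Markdown table."""
    return rows_to_markdown(json.loads(text))
```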
Screenshots and diagrams are dropped
Expected: Images aren't extracted, so step screenshots and architecture diagrams disappear. Export them with PDF to PNG (or re-capture them from the live product) and embed them where they belong.
Manual over 50 pages on free tier
Blocked: Free caps at 50 pages, and larger manuals are blocked on drop. Pro allows 500. Slice the manual into chapters with PDF Extract Pages and convert each, or upgrade.
Manual over 2 MB on free tier
Blocked: Image-rich manuals often exceed 2 MB and are blocked on free. Pro raises the limit to 50 MB. Compress with PDF Lossy Compress or convert on Pro.
Scanned / image-only manual
Empty output: A scanned manual has no text layer, so conversion yields empty pages. Run PDF OCR first, then convert the OCR'd file.
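To confirm this is what happened, list the pages that came out blank. A minimal Python sketch, assuming each ## Page N marker sits on its own line; empty_pages is a hypothetical helper. If it returns every page number, the PDF is very likely a scan with no text layer.

```python
import re

# "## Page N" on its own line; MULTILINE lets ^ and $ match at each line.
PAGE_MARKER = re.compile(r"^## Page (\d+)[ \t]*$", re.MULTILINE)


def empty_pages(markdown: str) -> list[int]:
    """Return the page numbers whose section has no text after its marker."""
    # With one capture group, re.split yields [preamble, num1, body1, num2, body2, ...].
    parts = PAGE_MARKER.split(markdown)
    return [int(num) for num, body in zip(parts[1::2], parts[2::2]) if not body.strip()]
```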
Running headers/footers repeat on every page
Noise: Manuals usually repeat a header/footer (product name, page number) on each page, and these extract as text at every page boundary. Strip them with a find-and-replace pass in your editor before publishing.
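If find-and-replace gets tedious across hundreds of pages, a script can find the repeats for you. A minimal Python sketch: it treats any line that appears on at least 60% of pages (with digits ignored, so changing page numbers still match) as a running header or footer. The threshold and the strip_running_lines helper are assumptions; review the diff before publishing, since a sentence that genuinely repeats on most pages would also be removed.

```python
import re
from collections import Counter

PAGE_MARKER = re.compile(r"^## Page \d+[ \t]*$")


def _key(line: str) -> str:
    # Replace digits so "Acme CLI Guide p. 12" and "... p. 13" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(markdown: str, min_share: float = 0.6) -> str:
    """Drop lines whose digit-normalized text appears on at least min_share of pages."""
    pages: list[list[str]] = []
    for line in markdown.splitlines():
        if PAGE_MARKER.match(line):
            pages.append([line])
        elif pages:
            pages[-1].append(line)  # text before the first marker is discarded
    if not pages:
        return markdown
    # Count each normalized, non-blank line at most once per page.
    counts = Counter(key for page in pages
                     for key in {_key(line) for line in page[1:] if line.strip()})
    repeated = {key for key, n in counts.items() if n / len(pages) >= min_share}
    kept = [line for page in pages for line in page
            if PAGE_MARKER.match(line) or _key(line) not in repeated]
    return "\n".join(kept)
```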
Table of contents becomes plain text
Expected: A PDF ToC with dotted leaders and page numbers extracts as text lines, not working links. Delete it and let your docs framework generate the sidebar/nav instead.
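Deleting the ToC by hand is fine for a short manual; for a long one, a regular expression can catch the dotted-leader lines. A minimal Python sketch; the pattern (some text, three or more dots, then a page number) is an assumption about how your ToC extracts, and drop_toc_lines is a hypothetical helper. The "Contents" title line itself is left for you to delete.

```python
import re

# "Installation ........ 12" or "2.1 Configuration . . . . 17": text, a run of
# three or more dots (optionally spaced), then a trailing page number.
TOC_LINE = re.compile(r"^.+?(?:\s*\.){3,}\s*\d+\s*$")


def drop_toc_lines(lines: list[str]) -> list[str]:
    """Remove dotted-leader ToC entries so the docs framework can build the nav."""
    return [line for line in lines if not TOC_LINE.match(line)]
```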
Frequently asked questions
Will my code samples extract as fenced code blocks?
No. Code comes through as plain text with no backticks, language tag, or monospace, and indentation may collapse. Wrap each sample in a fenced block (```), add the language, and fix indentation after conversion.
Are the manual's chapter and section headings preserved?
Only the ## Page N markers are headings. Your chapters and sections arrive as plain text because the tool can't infer heading levels from layout. Promote them to #/##/### and split the file into pages yourself.
What happens to parameter or spec tables?
They flatten into space-joined text and lose their columns. For reference tables, extract the data with PDF Table to JSON or PDF to Excel, then rebuild a clean Markdown table.
Can I keep the PDF and Markdown in sync automatically?
No. After conversion the Markdown becomes the source of truth; the tool is one-directional and has no link back to the PDF. Maintain the docs in Git going forward and keep the old PDF only as a download if needed.
How do I handle a 500-page manual?
Don't convert it as one blob. Slice it into chapters with PDF Extract Pages (at most 50 pages and 2 MB each to stay on free, or up to 500 pages and 50 MB on Pro), convert each chunk, and drop the outputs into matching docs sections.
Do screenshots and diagrams come across?
No — images are ignored entirely. Export figures with PDF to PNG or re-capture them from the live product, then embed them in the Markdown where they belong.
Will it work in Docusaurus, MkDocs, or ReadTheDocs?
Yes. The output is standard Markdown with no extended syntax to strip, so Docusaurus and MkDocs read it directly; on ReadTheDocs it works through an MkDocs build, or through Sphinx with a Markdown parser such as MyST. Promote the headings, fence the code, and add the pages to your nav.
How do I deal with repeated headers and footers?
They extract on every page (product name, page number, etc.). Do a find-and-replace pass in your editor to remove the repeating lines before you publish — the tool doesn't strip running headers.
My scanned manual converted to empty pages — why?
A scan is images, not text, so there's nothing for pdf.js to read. Run PDF OCR first to add a text layer, then convert the OCR'd manual here.
Is my documentation uploaded anywhere?
No. Conversion runs entirely in your browser via pdf.js, so internal and unreleased manuals stay on your machine. The result panel confirms '0 bytes uploaded'.
Can I script the whole docs migration?
Partly. On Pro, pdf-to-markdown is a runner built-in that you can call locally from @jadapps/runner; then run your own script to split by heading and fence code. The PDF never reaches JAD's servers; it's processed on your machine.
How is this different from PDF to Text for docs?
PDF to Text gives a plain .txt with no structure at all. This adds ## Page N markers and sentence-per-line output, which are handier split points and diff units when you're building a Markdown docs site.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.