How to convert a PDF manual to HTML documentation pages
- Step 1: Drop the manual PDF onto the converter. Add the manual to the PDF to HTML converter. It converts automatically on drop, with no configuration.
- Step 2: Mind the page limit for long manuals. The Free tier caps at 50 pages. A 300-page manual needs Pro (500 pages) or higher, or split it into parts first with PDF Split.
- Step 3: pdf.js extracts the text page by page. Every page becomes a <div class="page"> section under an <h2>Page N</h2> label. Prose comes across; code samples and screenshots do not (see below).
- Step 4: Split into chapter pages. Use the <div class="page"> boundaries (or your manual's chapter starts) to divide the body into individual documentation pages for your framework.
- Step 5: Promote headings and wrap code blocks. Turn flattened section titles into <h2>/<h3>, wrap code samples in <pre><code> and apply a syntax highlighter, and rebuild any tables and lists.
- Step 6: Build navigation and deploy. Add a sidebar/table of contents in Docusaurus, MkDocs, or Next.js, re-insert screenshots as <img>, and deploy as a searchable docs site.
Manual elements — conversion outcome
The converter extracts text only; technical-manual specifics need manual restoration.
| Manual element | In the HTML? | Restore with |
|---|---|---|
| Body / instructional text | Yes — <p> per page | — |
| Chapter & section headings | No — flattened to <p> (only <h2>Page N</h2> is emitted) | Promote to <h2>/<h3> manually |
| Code samples | Text only — no <pre>/<code> | Wrap in <pre><code>, add a highlighter |
| Screenshots / diagrams | No — images not extracted | PDF to PNG → <img> |
| Step / parameter tables | No <table> — flattens | PDF Table to JSON → <table> |
| Numbered procedures | Text only — no <ol> | Rebuild as <ol> in the editor |
Splitting a long manual into docs pages
Strategies for turning one converted HTML file into a multi-page docs site.
| Split strategy | How | When to use it |
|---|---|---|
| By page section | One docs page per <div class="page"> block | Quick first pass; manuals where 1 page ≈ 1 topic |
| By chapter heading | Promote headings first, then split on <h2> | Most manuals — gives topic-based pages |
| Pre-split the PDF | PDF Split by chapter, convert each part | 300+ page manuals over the page limit |
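The first strategy in the table, one docs page per <div class="page"> block, could be sketched with Python's standard library as follows. This is a minimal sketch that assumes the converted file has the shape shown in this guide: a flat run of <div class="page"> blocks, each opening with an <h2>Page N</h2> label and containing no nested <div>. The name split_pages and the second sample page are illustrative, not part of the converter.

```python
import re

# Matches one page block in the shape shown in this guide.
# Assumption: page blocks are not nested and contain no inner <div>.
PAGE_RE = re.compile(
    r'<div class="page">\s*<h2>Page (\d+)</h2>(.*?)</div>',
    re.DOTALL,
)


def split_pages(html: str) -> dict[int, str]:
    """Return {page_number: inner HTML} for each page block, in document order."""
    return {int(num): body.strip() for num, body in PAGE_RE.findall(html)}


# Sample input; the second page is made up for illustration.
converted = (
    '<div class="page"><h2>Page 12</h2>\n'
    '<p>Chapter 3: Installation Open the package ...</p></div>\n'
    '<div class="page"><h2>Page 13</h2>\n'
    '<p>Connect the device ...</p></div>'
)

pages = split_pages(converted)
for number, body in pages.items():
    print(number, body)
```

Each value can then be written to its own file. A regular expression is enough only because the output structure is fixed; for anything less regular, an HTML parser would be the safer choice.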
Input limits by tier
Checked on the manual PDF before conversion.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 pages |
| Pro | 50 MB | 500 pages |
| Pro + Media | 500 MB | 2,000 pages |
| Developer | 2 GB | 10,000 pages |
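To check a manual against these limits before dropping it on the converter, something like the sketch below could be used. The numbers are copied from the table above; treating each limit as inclusive and 2 GB as 2,048 MB are assumptions, and smallest_tier is a hypothetical helper, not an API of the tool.

```python
from typing import Optional

# (max file size in MB, max pages) per tier, copied from the table above.
# Assumptions: limits are inclusive and 2 GB is counted as 2048 MB.
TIER_LIMITS = {
    "Free": (2, 50),
    "Pro": (50, 500),
    "Pro + Media": (500, 2_000),
    "Developer": (2_048, 10_000),
}


def smallest_tier(size_mb: float, pages: int) -> Optional[str]:
    """Return the first (smallest) tier that fits the file, or None if none does."""
    for tier, (max_mb, max_pages) in TIER_LIMITS.items():
        if size_mb <= max_mb and pages <= max_pages:
            return tier
    return None


# An 8 MB, 300-page manual exceeds Free on both counts but fits Pro.
print(smallest_tier(8, 300))
```

A result of None means the manual has to be split before conversion whatever the plan.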
Cookbook
Workflows for migrating a manual into a docs site, with the markup you start from and the cleanup each needs.
One docs page per chapter
Convert, promote chapter titles to headings, then split the body on those headings into framework pages.
Converted (titles are flat <p>):
<div class="page"><h2>Page 12</h2>
<p>Chapter 3: Installation Open the package ...</p></div>
After promoting + splitting:
docs/installation.md (or .html)
## Chapter 3: Installation
Open the package ...
Repeat per chapter; wire into the sidebar nav.
Restoring a code block
Code samples extract as plain text. Wrap them in <pre><code> and apply a highlighter in your docs framework.
Converted output (code is inline text):
<p>Run the command npm install acme-cli then acme init</p>
After cleanup:
<pre><code class="language-bash">npm install acme-cli
acme init</code></pre>
The Docusaurus/MkDocs highlighter colours it automatically.
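If there are many samples to restore, the wrapping step can be scripted. The sketch below assumes you have already separated the sample into its individual lines by hand (the converter does not preserve them); wrap_code is an illustrative name. Escaping matters because code often contains characters such as < and & that would otherwise be read as markup.

```python
from html import escape


def wrap_code(lines: list[str], language: str) -> str:
    """Wrap code lines in <pre><code>, escaping &, < and > so they render literally."""
    body = "\n".join(escape(line, quote=False) for line in lines)
    return f'<pre><code class="language-{language}">{body}</code></pre>'


# The two commands from the example above, one per line.
print(wrap_code(["npm install acme-cli", "acme init"], "bash"))
```

The language-… class is the convention most highlighters look for; check which class names your docs framework expects.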
Re-inserting a screenshot
Screenshots are images and are not extracted. Render the page to PNG and place it where the step needs it.
Step 1: PDF to PNG on the screenshot's page -> setup-step3.png
Step 2: drop it in docs/img/
Step 3: reference it in the step:
<p>Click <strong>Add Device</strong>:</p>
<img src="img/setup-step3.png"
alt="Add Device button in the toolbar">
Splitting a 300-page manual that exceeds the page limit
If the manual is over the 50-page Free limit (or the 500-page Pro limit), split the PDF by chapter first, then convert each chapter.
Step 1: PDF Split (fixed, e.g. by chapter page ranges)
manual.pdf -> ch01.pdf, ch02.pdf, ... ch12.pdf
Step 2: convert each chapter PDF to HTML separately
Step 3: each becomes one (or a few) docs pages
Step 4: assemble the sidebar nav across all chapters
Rebuilding a parameter table
Reference tables of options/parameters flatten into text. Recover them with the table tool and render a real table.
Converted (flattened):
<p>Flag Default Description --verbose false Show logs ...</p>
PDF Table to JSON ->
[{"Flag":"--verbose","Default":"false",
"Description":"Show logs"}, ...]
Render as a real <table> in the docs page.
Edge cases and what actually happens
Chapter and section headings flatten to body text
By design: The converter doesn't detect the manual's heading hierarchy and emits only an <h2>Page N</h2> label per page. 'Chapter 3: Installation' arrives as <p> text. Promote headings by hand before splitting into docs pages, or use PDF to Markdown for a per-page ## marker.
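Because the converter gives no hint where a title ends and the body begins, any scripted promotion needs the titles supplied from outside, for example copied from the manual's table of contents. The following is a minimal sketch under that assumption; promote_headings is a hypothetical helper, and it only handles titles that sit at the very start of a <p>.

```python
import re


def promote_headings(html: str, titles: list[str], level: int = 2) -> str:
    """Turn '<p>Title rest...</p>' into '<hN>Title</hN>' followed by '<p>rest...</p>'."""
    for title in titles:
        pattern = re.compile(r"<p>\s*" + re.escape(title) + r"\s*")
        heading = f"<h{level}>{title}</h{level}>\n<p>"
        # A function replacement avoids backslash handling in the title text;
        # count=1 promotes only the first occurrence of each title.
        html = pattern.sub(lambda _match: heading, html, count=1)
    # Drop paragraphs left empty when the title was the whole paragraph.
    return html.replace("<p></p>", "")


page = "<p>Chapter 3: Installation Open the package ...</p>"
print(promote_headings(page, ["Chapter 3: Installation"]))
```

Titles that appear mid-paragraph, or that contain characters the converter escaped (such as &), would need extra handling and are left as they are.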
Code samples lose their formatting
By design: Code blocks extract as plain text inside <p>, with no <pre> or <code>, and indentation/line breaks within the sample are usually collapsed. Wrap each sample in <pre><code> and add a syntax highlighter after conversion.
Screenshots and diagrams are dropped
By design: No images are extracted and no assets folder is written, so every screenshot and diagram vanishes. A how-to manual is half-useless without them, so render those pages with PDF to PNG and re-insert them as <img>.
Numbered procedures lose their list markup
By design: Step-by-step procedures come across as plain text, not <ol>. The step numbers may survive as text but the list structure doesn't. Rebuild procedures as ordered lists in your editor.
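Where the step numbers did survive as text, a first pass at the list can be scripted before tidying it in the editor. This sketch assumes steps are written as "1. ", "2. ", and so on, starting at 1 and strictly consecutive, and that the input is the text already taken from the converted <p> (so it is not escaped again). The function name and sample steps are illustrative.

```python
import re


def rebuild_ol(text: str) -> str:
    """Convert '1. a 2. b 3. c' into an <ol>, accepting only consecutive step numbers."""
    markers = []  # (start of the 'N. ' marker, start of the step text)
    expected, pos = 1, 0
    while True:
        # Look for the next expected number at a word boundary, e.g. ' 2. '.
        match = re.compile(rf"(?:^|\s){expected}\.\s+").search(text, pos)
        if match is None:
            break
        markers.append((match.start(), match.end()))
        expected, pos = expected + 1, match.end()

    items = []
    for i, (_, content_start) in enumerate(markers):
        content_end = markers[i + 1][0] if i + 1 < len(markers) else len(text)
        items.append(text[content_start:content_end].strip())

    rows = "".join(f"  <li>{item}</li>\n" for item in items)
    return f"<ol>\n{rows}</ol>"


print(rebuild_ol("1. Unplug the device 2. Open the cover 3. Replace the filter"))
```

Requiring consecutive numbers keeps stray figures in the prose from being read as steps, but any text before step 1 is dropped and a number inside a step that happens to match the next expected marker will still split it, so review the result.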
300+ page manual over the page limit
Blocked: The Free tier caps at 50 pages and Pro at 500. For a large manual, split it by chapter with PDF Split (or upgrade to Developer for 10,000 pages) and convert each part separately.
Scanned legacy manual has no text
Empty output: An old manual distributed as a scan has no text layer, so the page sections come out empty. Run PDF OCR first to add a searchable text layer, then convert.
Password-protected manual
Fails to open: An encrypted manual can't be opened by pdf.js and conversion fails. Remove the password with PDF Unlock (you must know it), then convert the unlocked file.
Proprietary / internal manual
Preserved: Conversion is browser-local via pdf.js; the manual never reaches a server, so converting confidential or internal documentation is safe. Only an anonymous usage counter is recorded when signed in.
Running headers and footers appear on every page
Review: The manual's repeated header (product name, version) and footer (page numbers) extract into each page's paragraph. Find-replace them out before building the docs pages so they don't clutter every section.
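That find-replace pass can be sketched as below. It assumes you know the exact header string and can describe the footer with a regular expression, and that the footer lands at the end of the page text; pdf.js item order can place it elsewhere, so check a few pages first. The header, footer pattern, and sample text are made up for illustration.

```python
import re


def strip_running_text(page_text: str, header: str, footer_re: str) -> str:
    """Remove one known header string and a trailing regex-matched footer from a page."""
    text = page_text.replace(header, "", 1)
    text = re.sub(footer_re + r"\s*$", "", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


# Hypothetical page text with a running header and a page-number footer.
raw = "Acme Router Manual v2.1 Connect the power cable ... Page 12"
print(strip_running_text(raw, "Acme Router Manual v2.1", r"Page \d+"))
```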
Multi-column reference sections interleave
May interleave: Two-column reference or glossary sections can come out with columns zig-zagged because pdf.js follows internal item order. Reorder the affected sections by hand, or crop to one column with PDF Crop before converting.
Frequently asked questions
Will the manual's chapter and section headings be detected?
No. The converter only emits a <h2>Page N</h2> label per page; your chapter and section titles arrive as plain <p> text. Promote them to <h2>/<h3> by hand before splitting into docs pages. For a per-page heading marker to start from, PDF to Markdown writes a ## Page N heading.
Will code blocks in the manual format correctly?
No — code samples extract as plain text inside <p>, with no <pre>/<code> and often collapsed line breaks. Wrap each sample in <pre><code class="language-…"> and apply your docs framework's syntax highlighter after conversion.
Are screenshots and diagrams included?
No. Images are not extracted and no assets folder is created. For a how-to manual you'll need to render the relevant pages with PDF to PNG (or PDF to JPG), host the images, and add <img> tags where each step needs them.
What if the manual is 300+ pages?
Watch the page limit: free tier converts 50 pages, Pro 500, and Developer up to 10,000. For very large manuals, split by chapter with PDF Split and convert each chapter separately — that also gives you natural docs-page boundaries.
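Before splitting, it can help to work out the page range of each chapter and see which ones would still exceed your tier's limit. The sketch below assumes you have read the chapter start pages from the manual's table of contents; the function names and page numbers are illustrative.

```python
def chapter_ranges(starts: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Turn sorted 1-based chapter start pages into inclusive (first, last) ranges."""
    ends = [start - 1 for start in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


def oversized(ranges: list[tuple[int, int]], max_pages: int) -> list[tuple[int, int]]:
    """Return the ranges that contain more pages than the limit allows."""
    return [(first, last) for first, last in ranges if last - first + 1 > max_pages]


# Hypothetical 300-page manual with chapters starting on these pages.
ranges = chapter_ranges([1, 40, 95, 180], 300)
print(ranges)
print(oversized(ranges, 50))   # chapters too long for the Free tier
print(oversized(ranges, 500))  # none are too long for Pro
```

Any chapter flagged as oversized needs a further split (for example by section) or a higher tier.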
Can I add sidebar navigation?
Yes, but you build it in your docs framework, not the converter. Extract the manual's table of contents, then construct a sidebar in Docusaurus, MkDocs, or your Next.js docs setup. The converter's <div class="page"> boundaries and the headings you promote give you the structure to map into the nav.
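As one illustration for MkDocs, the nav section of mkdocs.yml is a list of title-to-file mappings, which can be generated from the chapter list you extracted. This is a sketch only: mkdocs_nav is a hypothetical helper, the chapter titles and file names are examples, and Docusaurus or Next.js would need their own sidebar format.

```python
def mkdocs_nav(chapters: list[tuple[str, str]]) -> str:
    """Build the 'nav:' block of mkdocs.yml from (title, markdown file) pairs."""
    lines = ["nav:"]
    for title, path in chapters:
        # Double-quote the title so characters such as ':' stay valid YAML.
        safe = title.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  - "{safe}": {path}')
    return "\n".join(lines)


print(mkdocs_nav([
    ("Chapter 3: Installation", "installation.md"),
    ("Chapter 4: Setup", "setup.md"),
]))
```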
How do I split the converted HTML into separate pages?
Three options: one page per <div class="page"> block (quick), one page per chapter after you promote headings and split on <h2> (recommended), or pre-split the PDF by chapter and convert each part (best for very long manuals). See the splitting table above for when to use each.
Is my proprietary manual uploaded anywhere?
No. Conversion runs entirely in your browser via pdf.js — the file never leaves your device. Only an anonymous usage counter is recorded when you're signed in, which you can opt out of. That keeps internal and confidential manuals private.
Are there options to control the conversion?
No. The tool auto-converts on drop and produces a fixed HTML structure — no page-range, heading-detection, or code-formatting toggles. To convert a single chapter, extract its pages first with PDF Extract Pages.
Why is each page one long paragraph?
Text items are joined with spaces during extraction, leaving no blank-line breaks within a page, so the splitter keeps the whole page in one <p>. Split it into paragraphs, lists, and code blocks during cleanup — that single paragraph is where flattened procedures and code samples live.
Should I use Markdown instead for a docs site?
Often, yes. Most docs frameworks (Docusaurus, MkDocs, Hugo) are Markdown-first, and PDF to Markdown outputs .md with a ## Page N heading per page — usually easier to clean and commit than raw HTML. Like the HTML output, it doesn't rebuild code blocks, tables, or images.
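Since neither output rebuilds tables, rows recovered with PDF Table to JSON could be turned into a Markdown table for a Markdown-first site. The sketch below assumes the JSON is a list of flat objects in the shape shown in the cookbook, with the first row's keys used as column headers; rows_to_markdown is an illustrative name.

```python
import json


def rows_to_markdown(rows_json: str) -> str:
    """Render a JSON list of flat objects as a Markdown pipe table."""
    rows = json.loads(rows_json)
    if not rows:
        return ""
    headers = list(rows[0])

    def cell(value: object) -> str:
        # Escape pipes so cell text cannot break the column layout.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "|" + "---|" * len(headers),
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(h, "")) for h in headers) + " |")
    return "\n".join(lines)


print(rows_to_markdown(
    '[{"Flag": "--verbose", "Default": "false", "Description": "Show logs"}]'
))
```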
Can I automate converting a whole library of manuals?
Yes, on a Pro plan. pdf-to-html is available as a runner-builtin tool — pair the @jadapps/runner once, then POST each manual PDF to the local runner endpoint and collect the HTML. Conversion runs locally on your machine, so the manuals never reach JAD's servers.
How searchable is the resulting docs site?
Fully — the body text is real <p> content, so your docs framework's search index (and Google) can read every word. That's the core win over a PDF: instead of one opaque download, you get many crawlable, deep-linkable pages, once you've split and structured them.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.