Convert a PDF Manual to HTML Documentation

How to convert a PDF manual to HTML documentation pages

Step 1: Drop the manual PDF onto the converter
Add the manual to the PDF to HTML converter. It converts automatically on drop, with no configuration.

Step 2: Mind the page limit for long manuals
The free tier caps at 50 pages. A 300-page manual needs Pro (500 pages) or higher, or split it into parts first with PDF Split.

Step 3: pdf.js extracts the text page by page
Every page becomes a <div class="page"> section under an <h2>Page N</h2> label. Prose comes across; code samples and screenshots do not (see below).

Step 4: Split into chapter pages
Use the <div class="page"> boundaries (or your manual's chapter starts) to divide the body into individual documentation pages for your framework.

Step 5: Promote headings and wrap code blocks
Turn flattened section titles into <h2>/<h3>, wrap code samples in <pre><code> and apply a syntax highlighter, and rebuild any tables and lists.

Step 6: Build navigation and deploy
Add a sidebar/table of contents in Docusaurus, MkDocs, or Next.js, re-insert screenshots as <img>, and deploy as a searchable docs site.

Manual elements — conversion outcome

The converter extracts text only; technical-manual specifics need manual restoration.

Manual element	In the HTML?	Restore with
Body / instructional text	Yes — `<p>` per page	—
Chapter & section headings	No — flattened to `<p>` (only `<h2>Page N</h2>` is emitted)	Promote to `<h2>`/`<h3>` manually
Code samples	Text only — no `<pre>`/`<code>`	Wrap in `<pre><code>`, add a highlighter
Screenshots / diagrams	No — images not extracted	PDF to PNG → `<img>`
Step / parameter tables	No `<table>` — flattens	PDF Table to JSON → `<table>`
Numbered procedures	Text only — no `<ol>`	Rebuild as `<ol>` in the editor

Splitting a long manual into docs pages

Strategies for turning one converted HTML file into a multi-page docs site.

Split strategy	How	When to use it
By page section	One docs page per `<div class="page">` block	Quick first pass; manuals where 1 page ≈ 1 topic
By chapter heading	Promote headings first, then split on `<h2>`	Most manuals — gives topic-based pages
Pre-split the PDF	PDF Split by chapter, convert each part	300+ page manuals over the page limit
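
The "by page section" strategy can be sketched in a few lines of Python. This is only an illustration: it assumes the converted file uses exactly the `<div class="page"><h2>Page N</h2> ... </div>` shape described above with no nested `<div>` inside a page, and the `split_pages` helper and sample markup are made up for the example.

```python
import re

# Assumes the fixed structure described above, with no nested <div> in a page.
PAGE_RE = re.compile(
    r'<div class="page">\s*<h2>Page (\d+)</h2>(.*?)</div>',
    re.DOTALL,
)

def split_pages(html: str) -> dict[int, str]:
    """Map each page number to the inner HTML of its page block."""
    return {int(num): body.strip() for num, body in PAGE_RE.findall(html)}

# Hypothetical converter output for two pages.
sample = (
    '<div class="page"><h2>Page 1</h2>\n  <p>Welcome ...</p></div>\n'
    '<div class="page"><h2>Page 2</h2>\n  <p>Chapter 1: Setup ...</p></div>'
)
pages = split_pages(sample)
for num, body in pages.items():
    print(num, body)
```

A regular expression is enough for a fixed, flat structure like this; if the real output nests elements differently, an HTML parser would be the safer choice.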

Input limits by tier

Checked on the manual PDF before conversion.

Tier	Max file size	Max pages
Free	2 MB	50 pages
Pro	50 MB	500 pages
Pro + Media	500 MB	2,000 pages
Developer	2 GB	10,000 pages
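
As a rough illustration of how the table reads, the sketch below picks the smallest tier that fits a given manual. The numbers come from the table; treating the limits as inclusive and 2 GB as 2,048 MB are assumptions, and `smallest_tier` is a hypothetical helper, not part of the tool.

```python
from typing import Optional

# (tier, max file size in MB, max pages), copied from the table above.
TIERS = [
    ("Free", 2, 50),
    ("Pro", 50, 500),
    ("Pro + Media", 500, 2_000),
    ("Developer", 2_048, 10_000),  # assumption: 2 GB counted as 2,048 MB
]

def smallest_tier(size_mb: float, pages: int) -> Optional[str]:
    """Return the first tier whose size and page limits both fit, else None."""
    for name, max_mb, max_pages in TIERS:
        # Assumption: a file exactly at the limit is still accepted.
        if size_mb <= max_mb and pages <= max_pages:
            return name
    return None

print(smallest_tier(1.5, 40))   # small manual
print(smallest_tier(1.5, 300))  # small file, but too many pages for Free
```

Note that either limit can be the binding one: a 300-page manual needs Pro even when the file itself is well under 2 MB.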

Cookbook

Workflows for migrating a manual into a docs site, with the markup you start from and the cleanup each needs.

One docs page per chapter

Convert, promote chapter titles to headings, then split the body on those headings into framework pages.

Converted (titles are flat <p>):
  <div class="page"><h2>Page 12</h2>
    <p>Chapter 3: Installation Open the package ...</p></div>

After promoting + splitting:
  docs/installation.md (or .html)
  ## Chapter 3: Installation
  Open the package ...

Repeat per chapter; wire into the sidebar nav.
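
The promote-and-split step could be scripted roughly as follows. Because the flattened text gives no clue where a title ends, this sketch assumes you supply the chapter titles yourself (for example, copied from the manual's table of contents). `split_on_titles` and `slugify` are hypothetical helpers, and the file names they produce keep the chapter number, unlike the shortened `installation.md` shown above.

```python
import re

def slugify(title: str) -> str:
    """'Chapter 3: Installation' -> 'chapter-3-installation'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def split_on_titles(text: str, titles: list[str]) -> dict[str, str]:
    """Cut flattened text at each known title; return file path -> Markdown."""
    # Locate every supplied title that occurs in the text, in document order.
    hits = sorted((text.find(t), t) for t in titles if t in text)
    docs = {}
    for i, (start, title) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        body = text[start + len(title):end].strip()
        docs[f"docs/{slugify(title)}.md"] = f"## {title}\n\n{body}\n"
    return docs

# Hypothetical flattened text spanning two chapters.
flat = ("Chapter 3: Installation Open the package ... "
        "Chapter 4: Configuration Edit the settings file ...")
docs = split_on_titles(flat, ["Chapter 3: Installation", "Chapter 4: Configuration"])
for path, content in docs.items():
    print(path)
    print(content)
```

Only the first occurrence of each title is used, so a title that also appears in a running header or a cross-reference would need handling before this step.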

Restoring a code block

Code samples extract as plain text. Wrap them in <pre><code> and apply a highlighter in your docs framework.

Converted output (code is inline text):
  <p>Run the command npm install acme-cli then acme init</p>

After cleanup:
  <pre><code class="language-bash">npm install acme-cli
acme init</code></pre>

The Docusaurus or MkDocs highlighter then colours it automatically.
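
If you restore many samples, a small helper keeps the markup consistent. The sketch below assumes you have already separated the commands by hand; `code_block` is a hypothetical name, and the `language-…` class follows the convention used in the example above.

```python
from html import escape

def code_block(lines: list[str], language: str = "bash") -> str:
    """Wrap source lines in <pre><code>, escaping HTML-special characters."""
    body = escape("\n".join(lines), quote=False)
    return f'<pre><code class="language-{language}">{body}</code></pre>'

print(code_block(["npm install acme-cli", "acme init"]))
```

Escaping matters for real samples: a command containing `<`, `>`, or `&` would otherwise be parsed as markup inside the `<pre>` block.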

Re-inserting a screenshot

Screenshots are images and are not extracted. Render the page to PNG and place it where the step needs it.

Step 1: PDF to PNG on the screenshot's page -> setup-step3.png
Step 2: drop it in docs/img/
Step 3: reference it in the step:
  <p>Click <strong>Add Device</strong>:</p>
  <img src="img/setup-step3.png"
       alt="Add Device button in the toolbar">

Splitting a 300-page manual that exceeds the page limit

If the manual exceeds your tier's page limit (50 pages on Free, 500 on Pro), split the PDF by chapter first, then convert each chapter.

Step 1: PDF Split (fixed, e.g. by chapter page ranges)
        manual.pdf  ->  ch01.pdf, ch02.pdf, ... ch12.pdf
Step 2: convert each chapter PDF to HTML separately
Step 3: each becomes one (or a few) docs pages
Step 4: assemble the sidebar nav across all chapters
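
Working out the page ranges for Step 1 is simple arithmetic, sketched here in Python. The chapter start pages and the 300-page total are invented for the example, and both helpers are hypothetical; the ranges are 1-based and inclusive.

```python
def chapter_ranges(starts: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Turn sorted 1-based chapter start pages into inclusive (first, last) ranges."""
    # Each chapter ends one page before the next begins; the last ends at the final page.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))

def oversized(ranges: list[tuple[int, int]], limit: int) -> list[tuple[int, int]]:
    """Ranges that still exceed the page limit and need a further split."""
    return [(a, b) for a, b in ranges if b - a + 1 > limit]

ranges = chapter_ranges([1, 21, 75, 120], total_pages=300)
print(ranges)                 # [(1, 20), (21, 74), (75, 119), (120, 300)]
print(oversized(ranges, 50))  # [(21, 74), (120, 300)]
```

The second call shows why splitting by chapter is not always enough on the free tier: a chapter longer than the limit has to be divided again before it will convert.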

Rebuilding a parameter table

Reference tables of options/parameters flatten into text. Recover them with the table tool and render a real table.

Converted (flattened):
  <p>Flag Default Description --verbose false Show logs ...</p>

PDF Table to JSON ->
  [{"Flag":"--verbose","Default":"false",
    "Description":"Show logs"}, ...]

Render as a real <table> in the docs page.
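
The final rendering step might look like the sketch below. It assumes the JSON has the shape shown above (an array of objects whose keys are the column headers) and takes the column order from the first row; `json_to_table` is a hypothetical helper.

```python
import json
from html import escape

def json_to_table(rows_json: str) -> str:
    """Render a JSON array of row objects as an HTML table."""
    rows = json.loads(rows_json)
    if not rows:
        return "<table></table>"
    headers = list(rows[0])  # column order taken from the first row's keys
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body_rows = []
    for row in rows:
        # Missing keys render as empty cells rather than raising.
        cells = "".join(f"<td>{escape(str(row.get(h, '')))}</td>" for h in headers)
        body_rows.append(f"<tr>{cells}</tr>")
    return (f"<table><thead><tr>{head}</tr></thead>"
            f"<tbody>{''.join(body_rows)}</tbody></table>")

sample = '[{"Flag": "--verbose", "Default": "false", "Description": "Show logs"}]'
table_html = json_to_table(sample)
print(table_html)
```

In a Markdown-first framework you would emit a pipe table instead, but the row and column handling stays the same.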

Edge cases and what actually happens

Chapter and section headings flatten to body text

By design

The converter doesn't detect the manual's heading hierarchy; it emits only an <h2>Page N</h2> label per page. 'Chapter 3: Installation' arrives as <p> text. Promote headings by hand before splitting into docs pages, or use PDF to Markdown for a per-page ## marker.

Code samples lose their formatting

By design

Code blocks extract as plain text inside <p> — no <pre> or <code>, and indentation/line breaks within the sample are usually collapsed. Wrap each sample in <pre><code> and add a syntax highlighter after conversion.

Screenshots and diagrams are dropped

By design

No images are extracted and no assets folder is written, so every screenshot and diagram vanishes. A how-to manual is half-useless without them — render those pages with PDF to PNG and re-insert as <img>.

Numbered procedures lose their list markup

By design

Step-by-step procedures come across as plain text, not <ol>. The step numbers may survive as text but the list structure doesn't. Rebuild procedures as ordered lists in your editor.
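
Where the step numbers did survive as text, a first pass at the list can be automated. This sketch assumes steps are written as "1. ", "2. ", and so on; it will also split on any other number followed by a full stop and a space (a sentence ending in "version 2.", say), so review the result. `to_ordered_list` is a hypothetical helper.

```python
import re
from html import escape

# A step marker: digits, a full stop, then whitespace, at the start or after a space.
STEP_RE = re.compile(r"(?:^|\s)\d+\.\s+")

def to_ordered_list(text: str) -> str:
    """Split flattened 'N. step' text into <li> items inside an <ol>."""
    steps = [s.strip() for s in STEP_RE.split(text) if s.strip()]
    items = "".join(f"<li>{escape(s, quote=False)}</li>" for s in steps)
    return f"<ol>{items}</ol>"

flat = "1. Unplug the device. 2. Open the cover. 3. Replace the filter."
print(to_ordered_list(flat))
```

The original numbers are dropped on purpose, since the `<ol>` element numbers the items itself.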

300+ page manual over the page limit

Blocked

Free tier caps at 50 pages and Pro at 500. For a large manual, split it by chapter with PDF Split (or upgrade to Developer for 10,000 pages) and convert each part separately.

Scanned legacy manual has no text

Empty output

An old manual distributed as a scan has no text layer, so the page sections come out empty. Run PDF OCR first to add a searchable text layer, then convert.

Password-protected manual

Fails to open

An encrypted manual can't be opened by pdf.js and conversion fails. Remove the password with PDF Unlock (you must know it), then convert the unlocked file.

Proprietary / internal manual

Preserved

Conversion is browser-local via pdf.js — the manual never reaches a server, so converting confidential or internal documentation is safe. Only an anonymous usage counter is recorded when signed in.

Running headers and footers appear on every page

Review

The manual's repeated header (product name, version) and footer (page numbers) extract into each page's paragraph. Find-replace them out before building the docs pages so they don't clutter every section.
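
The find-replace can be scripted once you know what the running text looks like. In this sketch the header string and the "Page N of M" footer pattern are invented examples; substitute whatever your manual actually repeats. `strip_running_text` is a hypothetical helper.

```python
import re

def strip_running_text(page_text: str, header: str, footer_re: str) -> str:
    """Remove a fixed header string and a footer pattern, then tidy whitespace."""
    text = page_text.replace(header, " ")
    text = re.sub(footer_re, " ", text)
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical flattened page text with a header and footer mixed in.
page = "Acme Router Manual v2.1 Connect the power cable ... Page 12 of 300"
clean = strip_running_text(page, "Acme Router Manual v2.1", r"Page \d+ of \d+")
print(clean)  # Connect the power cable ...
```

A pattern is used for the footer because the page number changes on every page, while the header is usually a fixed string. Keep the footer pattern specific enough that it cannot match body text.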

Multi-column reference sections interleave

May interleave

Two-column reference or glossary sections can come out with columns zig-zagged because pdf.js follows internal item order. Reorder the affected sections by hand, or crop to one column with PDF Crop before converting.

Frequently asked questions

Will the manual's chapter and section headings be detected?

No. The converter emits only an <h2>Page N</h2> label per page; your chapter and section titles arrive as plain <p> text. Promote them to <h2>/<h3> by hand before splitting into docs pages. For a per-page heading marker to start from, PDF to Markdown writes a ## Page N heading.

Will code blocks in the manual format correctly?

No — code samples extract as plain text inside <p>, with no <pre>/<code> and often collapsed line breaks. Wrap each sample in <pre><code class="language-…"> and apply your docs framework's syntax highlighter after conversion.

Are screenshots and diagrams included?

No. Images are not extracted and no assets folder is created. For a how-to manual you'll need to render the relevant pages with PDF to PNG (or PDF to JPG), host the images, and add <img> tags where each step needs them.

What if the manual is 300+ pages?

Watch the page limit: free tier converts 50 pages, Pro 500, and Developer up to 10,000. For very large manuals, split by chapter with PDF Split and convert each chapter separately — that also gives you natural docs-page boundaries.

Can I add sidebar navigation?

Yes, but you build it in your docs framework, not the converter. Extract the manual's table of contents, then construct a sidebar in Docusaurus, MkDocs, or your Next.js docs setup. The converter's <div class="page"> boundaries and the headings you promote give you the structure to map into the nav.
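
For MkDocs, one way to build the sidebar is to generate the `nav` section of `mkdocs.yml` from your chapter list. The sketch below is illustrative only: the titles and file names are hypothetical, and it quotes each title because a colon in a title such as "Chapter 3: Installation" would otherwise break the YAML.

```python
import json

def mkdocs_nav(pages: list[tuple[str, str]]) -> str:
    """Build the nav section of mkdocs.yml from (title, path) pairs."""
    lines = ["nav:"]
    for title, path in pages:
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.append(f"  - {json.dumps(title)}: {path}")
    return "\n".join(lines)

# Hypothetical chapters; paths are relative to the MkDocs docs directory.
nav = mkdocs_nav([
    ("Chapter 3: Installation", "installation.md"),
    ("Chapter 4: Configuration", "configuration.md"),
])
print(nav)
```

Docusaurus and Next.js setups use their own sidebar formats, so the same chapter list would need a different serialiser there.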

How do I split the converted HTML into separate pages?

Three options: one page per <div class="page"> block (quick), one page per chapter after you promote headings and split on <h2> (recommended), or pre-split the PDF by chapter and convert each part (best for very long manuals). See the splitting table above for when to use each.

Is my proprietary manual uploaded anywhere?

No. Conversion runs entirely in your browser via pdf.js — the file never leaves your device. Only an anonymous usage counter is recorded when you're signed in, which you can opt out of. That keeps internal and confidential manuals private.

Are there options to control the conversion?

No. The tool auto-converts on drop and produces a fixed HTML structure — no page-range, heading-detection, or code-formatting toggles. To convert a single chapter, extract its pages first with PDF Extract Pages.

Why is each page one long paragraph?

Text items are joined with spaces during extraction, leaving no blank-line breaks within a page, so the splitter keeps the whole page in one <p>. Split it into paragraphs, lists, and code blocks during cleanup — that single paragraph is where flattened procedures and code samples live.

Should I use Markdown instead for a docs site?

Often, yes. Most docs frameworks (Docusaurus, MkDocs, Hugo) are Markdown-first, and PDF to Markdown outputs .md with a ## Page N heading per page — usually easier to clean and commit than raw HTML. Like the HTML output, it doesn't rebuild code blocks, tables, or images.

Can I automate converting a whole library of manuals?

Yes, on a Pro plan. pdf-to-html is available as a runner-builtin tool — pair the @jadapps/runner once, then POST each manual PDF to the local runner endpoint and collect the HTML. Conversion runs locally on your machine, so the manuals never reach JAD's servers.

How searchable is the resulting docs site?

Fully — the body text is real <p> content, so your docs framework's search index (and Google) can read every word. That's the core win over a PDF: instead of one opaque download, you get many crawlable, deep-linkable pages, once you've split and structured them.

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.
