How to convert an HTML page to a PDF document
- Step 1: Save the page as a standalone .html file. In your browser, use File → Save Page As → Webpage, HTML Only (or copy the page source into a .html file). The tool reads .html and .htm files (text/html). It does not fetch live URLs: you supply the markup, not a web address.
- Step 2: Open the converter and drop the file. Load the file into the HTML to PDF converter. Conversion happens locally in your browser (the PDF is built with pdf-lib), so nothing is uploaded to a server. Selecting a file from your device is the only input path; there is no paste-HTML box.
- Step 3: Let it strip styles, scripts, and tags. The converter removes every <style> and <script> block, turns each remaining tag into a line break, and decodes the four core entities (&nbsp;, &amp;, &lt;, &gt;). You do not choose a page size: output is always US-Letter (612×792pt).
- Step 4: Understand the single-column text layout. Text is drawn left-aligned at x = 50pt in 10pt Helvetica, one source line per PDF line, 14pt apart. There is no word-wrap: any line longer than 100 characters is clipped at 100. Tables, columns, and floats collapse into a plain vertical sequence.
- Step 5: Download the paginated PDF. The PDF is generated in the page and downloaded straight to your device. Open it to confirm the text you needed survived, especially headings, line items, and any non-Latin characters.
- Step 6: Switch tools if you needed visual fidelity. If the layout, images, or fonts mattered, this is the wrong tool. Use your browser's Print → Save as PDF for a faithful render, or screenshot the page and feed the image to image-to-pdf. For structured authored content, markdown-to-pdf styles headings and bold.
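The transforms in Steps 3 and 4 can be modeled in a few lines of TypeScript. This is a hypothetical sketch built only from the behavior this page describes, not the converter's source: the name `extractLines`, the order of the steps, and decoding `&amp;` last are assumptions.

```typescript
// Hypothetical model of the documented text pipeline; not the converter's actual code.
const MAX_CHARS = 100; // documented clip length (no word-wrap)

function extractLines(html: string): string[] {
  let text = html
    // 1. Delete <style> and <script> blocks together with their contents.
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    // 2. Turn every remaining tag into a line break.
    .replace(/<[^>]*>/g, "\n")
    // 3. Decode only the four core entities. &amp; goes last (an assumed choice)
    //    so that "&amp;lt;" becomes the literal text "&lt;", not "<".
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
  // 4. Collapse runs of three or more newlines to two.
  text = text.replace(/\n{3,}/g, "\n\n");
  // 5. One source line per PDF line, clipped (not wrapped) at 100 characters.
  return text.split("\n").map((line) => line.slice(0, MAX_CHARS));
}
```

Fed the table from the cookbook, it returns Item, Qty, Widget, and 3 on separate lines with blank lines between them, the same flattening described there.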
What survives HTML → PDF conversion
Mapped to the exact transforms the converter applies (strip style/script, tag → newline, decode four entities, draw 10pt Helvetica). Anything the 'In the PDF?' column does not mark as kept, decoded, or passed through is discarded.
| HTML feature | In the PDF? | Why |
|---|---|---|
| Visible text content | Kept | Text between tags is drawn line by line in 10pt Helvetica. |
| Headings (<h1>…<h6>) | Text kept, styling lost | The heading text appears, but at the same 10pt size as body text; there is no heading sizing. |
| CSS (inline, <style>, external) | Discarded | <style> blocks are deleted outright; style= attributes and external stylesheets are never applied. |
| JavaScript (<script>) | Discarded, never run | <script> blocks are deleted. JS is not executed, so client-rendered content never appears. |
| Images (<img>, CSS backgrounds) | Discarded | Tags are stripped; the converter draws text only, with no image embedding. |
| Tables (<table>) | Flattened to lines | Each cell's text becomes its own line in reading order; rows and columns are not preserved. |
| Links (<a href>) | Anchor text kept, URL lost | The link text shows; the href is dropped and no clickable link is created. |
| Core entities (&nbsp; &amp; &lt; &gt;) | Decoded | Converted to a space, &, <, and > respectively. |
| Other entities (e.g. &copy;, &#39;, &eacute;) | Passed through literally | Only the four core entities are decoded; the rest render as the raw escape text. |
Rendering specifics (defaults you can't change)
These are fixed in the engine — there is no options panel for this tool.
| Property | Value | Consequence |
|---|---|---|
| Page size | US-Letter, 612×792pt | No A4/Legal/custom selector exists for this tool. |
| Font | Helvetica, 10pt, black | Brand fonts and CSS font-family are ignored; non-Latin glyphs do not render. |
| Left margin | x = 50pt | All text starts at a single left edge — no centering or indentation. |
| Line spacing | 14pt per line | Fixed leading regardless of source font size. |
| Line length | Truncated to 100 characters | Long lines are clipped, not wrapped — text past column 100 is lost. |
| Pagination | New page when y < 50pt | Long documents auto-split across Letter pages. |
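The fixed values above determine where each line lands. A minimal sketch of that layout, assuming the first baseline sits at y = 742pt (792pt minus a 50pt top margin); the table does not state the starting y, and `layoutLines` is a hypothetical name.

```typescript
// Hypothetical layout model of the fixed rendering values; START_Y is an assumption.
const PAGE_HEIGHT = 792; // US-Letter is 612 x 792pt
const LEFT_X = 50; // every line starts at the same left edge
const LEADING = 14; // fixed spacing between baselines
const BOTTOM_LIMIT = 50; // no line is drawn below y = 50pt
const START_Y = PAGE_HEIGHT - 50; // assumed first baseline: 742pt

interface Placement {
  page: number; // 0-based page index
  x: number;
  y: number;
  text: string;
}

function layoutLines(lines: string[]): Placement[] {
  const placements: Placement[] = [];
  let page = 0;
  let y = START_Y;
  for (const text of lines) {
    if (y < BOTTOM_LIMIT) {
      // Out of room: continue at the top of a fresh Letter page.
      page += 1;
      y = START_Y;
    }
    placements.push({ page, x: LEFT_X, y, text });
    y -= LEADING;
  }
  return placements;
}
```

Under that assumption a page holds 50 lines (baselines 742pt down to 56pt), so a 120-line document spans three Letter pages.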
Pick the right tool for your goal
This converter is text-extraction-grade. Use a sibling when fidelity or structure matters.
| You need… | Best tool | Why |
|---|---|---|
| Just the readable text in a PDF | html-to-pdf (this tool) | Strips markup, paginates the text. |
| The page to look exactly as in the browser | Browser Print → Save as PDF | Uses the browser's own layout engine; this tool has none. |
| A visual capture of a styled page | image-to-pdf | Screenshot the page, then turn the image into a PDF page. |
| Styled headings/bold from authored content | markdown-to-pdf | Renders heading sizes and bold; HTML-to-PDF does not. |
| Go the other way (PDF → HTML) | pdf-to-html | Extracts PDF text into semantic HTML. |
Cookbook
Real before/after examples showing exactly what the converter keeps and drops. The 'before' is HTML source; the 'after' is the text that lands in the PDF.
A simple article — clean result
Plain prose with headings and paragraphs is the sweet spot. All the words come through; only the heading sizing is lost (everything renders at 10pt).
Before (HTML):
<h1>Quarterly Notes</h1>
<p>Revenue rose 12% on stronger retention.</p>
<p>Churn fell to 1.8% for the period.</p>
After (PDF text, 10pt Helvetica):
Quarterly Notes
Revenue rose 12% on stronger retention.
Churn fell to 1.8% for the period.
Inline CSS and a tracking script — both vanish
The <style> and <script> blocks are deleted before any text is drawn, so neither leaks into the PDF as garbage text. Only the visible body text remains.
Before (HTML):
<style>.hero{font-size:48px;color:#09c}</style>
<div class="hero">Welcome</div>
<script>analytics.track('view')</script>
After (PDF text):
Welcome
(the .hero CSS and the analytics call are gone; Welcome renders at the default 10pt, not 48px)
A line longer than 100 characters — silently clipped
There is no word-wrap. A single long line (a URL, or a paragraph written as one source line that only wrapped on screen) is cut at character 100. If the full text matters, pre-wrap it in the source before converting.
Before (one source line of more than 100 characters):
<p>This is an unusually long sentence that keeps going well past one hundred characters before it ever stops here.</p>
After (PDF text, clipped at 100):
This is an unusually long sentence that keeps going well past one hundred characters before it ever
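Pre-wrapping can be scripted. A minimal sketch, assuming the paragraph is plain text with no tags or entities inside it; `wrapForConverter` is a hypothetical helper that breaks at spaces so every line stays within 100 characters, then joins the lines with <br>, which the converter turns into line breaks.

```typescript
// Hypothetical helper: greedy word-wrap so no line reaches the 100-character clip.
function wrapForConverter(paragraph: string, width: number = 100): string {
  const lines: string[] = [];
  let current = "";
  for (const word of paragraph.split(/\s+/)) {
    if (word === "") continue; // leading/trailing whitespace yields empty pieces
    if (current === "") {
      current = word;
    } else if (current.length + 1 + word.length <= width) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);

  // A single word longer than `width` (e.g. a long URL) is hard-split instead of clipped.
  const out: string[] = [];
  for (const line of lines) {
    for (let i = 0; i < line.length; i += width) out.push(line.slice(i, i + width));
  }
  // Each <br> becomes a line break in the converter.
  return out.join("<br>");
}
```

Applied to the sentence above, it produces two lines, ending the first at "ever" and carrying "stops here." to the second, so nothing is clipped.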
Entities: four are decoded, the rest are literal
Only &nbsp;, &amp;, &lt;, and &gt; are translated. A copyright sign or accented-letter entity comes through as the raw escape text, and typing the raw character instead only helps if that character is in Helvetica's WinAnsi (Windows-1252) set.
Before (HTML):
<p>Acme &amp; Co &copy; 2026 &mdash; caf&eacute;</p>
After (PDF text):
Acme & Co &copy; 2026 &mdash; caf&eacute;
(only &amp; is decoded; &copy;, &mdash;, and &eacute; stay literal)
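Before uploading, a quick scan lists every escape that will show up literally. A minimal sketch; `findLiteralEntities` is a hypothetical name, and the pattern covers named (&copy;) and numeric (&#39;, &#x2014;) references.

```typescript
// Hypothetical pre-flight check: entity references the converter will NOT decode.
const DECODED = ["&nbsp;", "&amp;", "&lt;", "&gt;"]; // the four core entities

function findLiteralEntities(html: string): string[] {
  // Named (&copy;), decimal (&#39;), and hex (&#x2014;) references.
  const matches = html.match(/&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);/g) || [];
  const result: string[] = [];
  for (const m of matches) {
    // Skip the core four and keep each leftover entity once, in first-seen order.
    if (DECODED.indexOf(m) === -1 && result.indexOf(m) === -1) result.push(m);
  }
  return result;
}
```

Anything it returns is a candidate for replacing with a plain character in the source.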
A table flattens to a single column
Rows and columns are not reconstructed. Each cell's text becomes its own line in document order, which is usually unreadable for data. Keep tabular data in CSV or a spreadsheet, or capture the rendered table as an image and use image-to-pdf.
Before (HTML):
<table><tr><td>Item</td><td>Qty</td></tr>
<tr><td>Widget</td><td>3</td></tr></table>
After (PDF text):
Item
Qty
Widget
3
Edge cases and what actually happens
You expected the CSS layout to come through
By design. This converter discards all CSS. The PDF is a single-column 10pt text dump in document order. For a faithful layout, use the browser's Print → Save as PDF, or capture a screenshot and run it through image-to-pdf.
The page is a JavaScript single-page app
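Empty result. <script> is stripped and never executed, so a React, Vue, or Angular shell that renders its content at runtime yields a nearly blank PDF; only the static fallback markup survives. Save the fully rendered DOM as HTML first. View Source won't help, because it shows the markup before any script runs; instead, after the page loads, use Save Page As, or run document.documentElement.outerHTML in the browser's DevTools console and save the result to a .html file.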
A line is longer than 100 characters
Truncated. Every line is clipped at 100 characters with no word-wrap. Long URLs and paragraphs written as a single source line lose their tail. Insert line breaks in the source HTML before converting if the full text must survive.
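Non-Latin text and letters outside WinAnsi (Chinese, Arabic, ő, ł)
Render error. The PDF uses Helvetica, which is WinAnsi-encoded (Windows-1252). Characters outside that set, such as CJK, Arabic, Cyrillic, or Latin letters like ő and ł, cannot be drawn and will fail or be dropped; common Western accents such as é and ñ are inside it. There is no embedded-font option in this tool; for CJK or right-to-left content, capture the page as an image and use image-to-pdf.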
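To check text before converting, you can flag characters outside WinAnsi. A minimal sketch, assuming the limit is the Windows-1252 repertoire: printable ASCII, U+00A0 to U+00FF, and the 27 extra characters Windows-1252 places at 0x80 to 0x9F. pdf-lib's own encoding table is the authority; `unencodableChars` is a hypothetical helper, and it expects one line at a time because control characters such as newlines are flagged.

```typescript
// Assumed approximation of the WinAnsi (Windows-1252) repertoire, not pdf-lib's table.
// These are the 27 characters Windows-1252 maps into bytes 0x80-0x9F.
const CP1252_EXTRAS = [
  0x20ac, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, 0x02c6, 0x2030,
  0x0160, 0x2039, 0x0152, 0x017d, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022,
  0x2013, 0x2014, 0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x017e, 0x0178,
];

function isWinAnsi(cp: number): boolean {
  return (cp >= 0x20 && cp <= 0x7e) || (cp >= 0xa0 && cp <= 0xff) || CP1252_EXTRAS.indexOf(cp) !== -1;
}

// Returns each distinct character in `line` that falls outside the assumed set.
function unencodableChars(line: string): string[] {
  const bad: string[] = [];
  // Array.from splits by code point, so characters beyond U+FFFF stay whole.
  for (const ch of Array.from(line)) {
    const cp = ch.codePointAt(0) as number;
    if (!isWinAnsi(cp) && bad.indexOf(ch) === -1) bad.push(ch);
  }
  return bad;
}
```

An empty result means every character is in the assumed set; anything returned points toward the image-to-pdf route.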
Images and logos are missing from the PDF
By design. The converter draws text only: <img> tags and CSS backgrounds are stripped, and no image is ever embedded. If you need the logo, screenshot the rendered page and use image-to-pdf.
The file is over the free 2 MB limit
413 blocked. Free conversions cap the input at 2 MB. A large self-contained HTML file (with many inline base64 assets) can exceed this even though its visible text is small. Pro raises the cap to 50 MB; stripping embedded data URIs from the source first also helps.
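Stripping inline data URIs can be scripted, and it loses nothing the converter would keep, since images are discarded anyway. A minimal sketch: `stripDataUris` and `fitsFreeTier` are hypothetical helpers, the pattern assumes each data URI ends at a quote, a closing parenthesis, or whitespace, and reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption about how the cap is measured.

```typescript
// Hypothetical pre-upload step; the byte interpretation of "2 MB" is assumed.
const FREE_CAP_BYTES = 2 * 1024 * 1024;

// Removes data:<mime>[;base64],<payload> up to the next quote, ")", or whitespace.
function stripDataUris(html: string): string {
  return html.replace(/\bdata:[^,"')\s]*,[^"')\s]*/g, "");
}

// UTF-8 size of the text, i.e. roughly its size on disk once saved as UTF-8.
function utf8Length(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) bytes += 1;
    else if (unit < 0x800) bytes += 2;
    else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair = one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate, encoded as U+FFFD
      }
    } else bytes += 3;
  }
  return bytes;
}

function fitsFreeTier(html: string): boolean {
  return utf8Length(html) <= FREE_CAP_BYTES;
}
```

The cleaned markup keeps every text node, so the converted PDF is unchanged apart from fitting under the cap.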
You pasted HTML expecting a text box
Upload only. There is no paste-HTML field for this tool; input is by file upload (.html / .htm). Save your markup to a file first, then drop it in.
Headings all look the same size
Expected. Every line, including <h1>–<h6>, renders at 10pt Helvetica. The converter does not size headings. If you want visual heading hierarchy, author the content in Markdown and use markdown-to-pdf, which styles headings and bold.
Blank lines are missing where you had spacing
Collapsed. Runs of three or more newlines are collapsed to two, so at most one blank line separates blocks of text. CSS margins and padding that created visual spacing are gone, so the PDF is more tightly packed than the page looked in the browser.
Frequently asked questions
Does this produce a pixel-perfect copy of my web page?
No. It extracts the text content and lays it into a single-column PDF in 10pt Helvetica. CSS, images, fonts, and JavaScript are all discarded. For a faithful visual copy, use your browser's Print → Save as PDF, or screenshot the page and run it through image-to-pdf.
Is my HTML uploaded anywhere?
No. Conversion runs entirely in your browser using pdf-lib. The file never leaves your device — only anonymous usage counters are recorded if you're signed in. That's why it's safe for internal or unpublished pages.
Will external stylesheets and images be included?
Neither. External (and inline) CSS is stripped, and images are never embedded — the converter draws text only. There is no asset-fetching step at all; the tool works purely on the markup you upload.
Can I control the page size (A4, Letter, custom)?
No. Output is always US-Letter (612×792pt). There is no page-size selector for this tool. If you need A4 dimensions specifically, generate the PDF here, then change page size with the pdf-resize tool.
Can I control page breaks with CSS like page-break-before?
No. CSS is discarded, so page-break-* properties have no effect. Pages break automatically when text reaches the bottom margin (y < 50pt). You can't force a break from the HTML.
Is JavaScript executed before rendering?
No. <script> blocks are stripped and never run. Content that a single-page app renders at runtime will not appear. Save the fully-loaded page's HTML and convert that, or use the browser's Print to PDF which does run the page.
Why is some text cut off at the end of a line?
Each line is truncated to 100 characters and there is no word-wrap. Long unbroken lines (URLs, CSS-collapsed paragraphs) lose everything past column 100. Add line breaks in the source HTML before converting to keep the full text.
Will accented or non-Latin characters work?
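Partly. The PDF uses Helvetica with WinAnsi (Windows-1252) encoding. Western European accented letters such as é and ñ fall inside that set, but CJK, Arabic, Cyrillic, and Latin letters such as ő or ł do not, so they can't be drawn and may cause an error or be dropped. For those scripts, capture the page as an image and use image-to-pdf.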
Do my HTML entities get decoded?
Only the four core ones: &nbsp;, &amp;, &lt;, and &gt;. Anything else (&copy;, &mdash;, &#39;, named accents such as &eacute;) comes through as the literal escape text. Replace them in the source with plain characters before converting if needed.
Will tables keep their rows and columns?
No. Tables flatten to one line per cell in document order. For tabular data, this is usually unreadable — keep the data in CSV/Excel, or capture the rendered table as an image and use image-to-pdf.
What's the file-size limit?
Free conversions accept HTML files up to 2 MB (one file at a time). Pro raises the limit to 50 MB and allows batches of up to 5 files. Inline base64 assets inflate file size quickly, so strip them if you're near the cap.
What's the best free alternative for a faithful render?
Your own browser. Open the page and choose Print → Save as PDF (Destination: Save as PDF). It uses the browser's layout engine, so CSS, fonts, and images all come across. This tool is for getting clean text out, not for visual fidelity.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.