Convert an HTML Page to a PDF (Text Content) — Free Browser Tool

How to convert an HTML page to a PDF document

Step 1
Save the page as a standalone .html file — In your browser use File → Save Page As → Webpage, HTML Only (or copy the page source into a .html file). The tool reads .html and .htm files (text/html). It does not fetch live URLs — you supply the markup, not a web address.
Step 2
Open the converter and drop the file — Load the file into the HTML to PDF converter. Processing happens locally in your browser, with pdf-lib building the PDF, and the file is never sent to a server. Selecting a local file is the only input path; there is no paste-HTML box.
Step 3
Let it strip styles, scripts, and tags — The converter removes every <style> and <script> block, turns each remaining tag into a line break, and decodes the four core entities. You do not choose a page size — output is always US-Letter (612×792pt).
Step 4
Understand the single-column text layout — Text is drawn left-aligned at x=50 in 10pt Helvetica, one source line per PDF line, 14pt apart. There is no word-wrap: any line longer than 100 characters is clipped at 100. Tables, columns, and floats collapse into a plain vertical sequence.
Step 5
Download the paginated PDF — The PDF is generated in-page and downloaded straight to your device. Open it to confirm the text you needed survived — especially headings, line items, and any non-Latin characters.
Step 6
Switch tools if you needed visual fidelity — If the layout, images, or fonts mattered, this is the wrong direction. Use your browser's Print → Save as PDF for a faithful render, or screenshot the page and feed the image to image-to-pdf. For structured authored content, markdown-to-pdf styles headings and bold.

What survives HTML → PDF conversion

Each row maps to one of the transforms the converter applies (strip style/script, tag→newline, decode four entities, draw 10pt Helvetica). Anything not marked Kept or Decoded in the 'In the PDF?' column is discarded, flattened, or passed through as raw text.

HTML feature	In the PDF?	Why
Visible text content	Kept	Text between tags is drawn line by line in 10pt Helvetica.
Headings (`<h1>`…`<h6>`)	Text kept, styling lost	The heading text appears, but at the same 10pt size as body text — there is no heading sizing.
CSS (inline, `<style>`, external)	Discarded	`<style>` blocks are deleted outright; `style=` attributes and external stylesheets are never applied.
JavaScript (`<script>`)	Discarded, never run	`<script>` blocks are deleted. JS is not executed, so client-rendered content never appears.
Images (`<img>`, CSS backgrounds)	Discarded	Tags are stripped; the converter draws text only — no image embedding.
Tables (`<table>`)	Flattened to lines	Each cell's text becomes its own line in reading order; rows and columns are not preserved.
Links (`<a href>`)	Anchor text kept, URL lost	The link text shows; the `href` is dropped and no clickable link is created.
Core entities `&nbsp;` `&amp;` `&lt;` `&gt;`	Decoded	Converted to a space, `&`, `<`, and `>` respectively.
Other entities (`&copy;`, `&#39;`, `&eacute;`)	Passed through literally	Only the four core entities are decoded; the rest render as the raw escape text.
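The transforms in the table above can be approximated in a few lines. This TypeScript sketch is not the tool's source: the function name `htmlToText`, the regular expressions, and the final `trim()` are assumptions made for illustration, and the newline collapsing follows the behavior described in the edge cases further down.

```typescript
// Illustrative sketch of the documented pipeline (assumed, not the real source).
function htmlToText(html: string): string {
  return html
    // 1. Delete <style> and <script> blocks, including their contents.
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // 2. Turn every remaining tag into a line break.
    .replace(/<[^>]+>/g, "\n")
    // 3. Decode only the four core entities (&amp; last, so nothing is decoded twice).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    // 4. Collapse runs of three or more newlines to two.
    .replace(/\n{3,}/g, "\n\n")
    // Assumption: leading and trailing whitespace is dropped.
    .trim();
}

// htmlToText("<p>Acme &amp; Co &copy; 2026</p>") returns "Acme & Co &copy; 2026"
```

Anything that is not a tag, a style or script block, or one of the four entities passes through this function unchanged, which is why `&copy;` survives as literal text.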

Rendering specifics (defaults you can't change)

These are fixed in the engine — there is no options panel for this tool.

Property	Value	Consequence
Page size	US-Letter, 612×792pt	No A4/Legal/custom selector exists for this tool.
Font	Helvetica, 10pt, black	Brand fonts and CSS font-family are ignored; non-Latin glyphs do not render.
Left margin	x = 50pt	All text starts at a single left edge — no centering or indentation.
Line spacing	14pt per line	Fixed leading regardless of source font size.
Line length	Truncated to 100 characters	Long lines are clipped, not wrapped — text past column 100 is lost.
Pagination	New page when y < 50pt	Long documents auto-split across Letter pages.
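The fixed values in this table are enough to work out where each line lands. The sketch below is a hedged illustration, not the engine's code: `layoutLines` and `PlacedLine` are hypothetical names, and the starting height `TOP_Y` (a 50pt top margin) is an assumption, since the table only states the bottom threshold.

```typescript
// Values taken from the table above.
const PAGE_HEIGHT = 792;   // US-Letter height in points
const LEFT_X = 50;         // left margin
const LEADING = 14;        // distance between baselines
const BOTTOM_LIMIT = 50;   // a new page starts when y drops below this
const MAX_CHARS = 100;     // lines are clipped, not wrapped
// Assumption: text starts 50pt below the top edge.
const TOP_Y = PAGE_HEIGHT - 50;

interface PlacedLine {
  page: number; // zero-based page index
  x: number;
  y: number;
  text: string;
}

function layoutLines(lines: string[]): PlacedLine[] {
  const placed: PlacedLine[] = [];
  let page = 0;
  let y = TOP_Y;
  lines.forEach((line) => {
    if (y < BOTTOM_LIMIT) {
      // Out of room: continue at the top of a fresh page.
      page += 1;
      y = TOP_Y;
    }
    placed.push({ page: page, x: LEFT_X, y: y, text: line.slice(0, MAX_CHARS) });
    y -= LEADING;
  });
  return placed;
}
```

Under these assumptions a page holds 50 lines (y runs from 742 down to 56), so line 51 opens page two. A different top margin changes that count but not the mechanism.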

Pick the right tool for your goal

This converter is text-extraction-grade. Use a sibling when fidelity or structure matters.

You need…	Best tool	Why
Just the readable text in a PDF	html-to-pdf (this tool)	Strips markup, paginates the text.
The page to look exactly as in the browser	Browser Print → Save as PDF	Uses the browser's own layout engine; this tool has none.
A visual capture of a styled page	image-to-pdf	Screenshot the page, then turn the image into a PDF page.
Styled headings/bold from authored content	markdown-to-pdf	Renders heading sizes and bold; HTML-to-PDF does not.
Go the other way (PDF → HTML)	pdf-to-html	Extracts PDF text into semantic HTML.

Cookbook

Real before/after examples showing exactly what the converter keeps and drops. The 'before' is HTML source; the 'after' is the text that lands in the PDF.

A simple article — clean result

Plain prose with headings and paragraphs is the sweet spot. All the words come through; only the heading sizing is lost (everything renders at 10pt).

Before (HTML):
<h1>Quarterly Notes</h1>
<p>Revenue rose 12% on stronger retention.</p>
<p>Churn fell to 1.8% for the period.</p>

After (PDF text, 10pt Helvetica):
Quarterly Notes
Revenue rose 12% on stronger retention.
Churn fell to 1.8% for the period.

Inline CSS and a tracking script — both vanish

The <style> and <script> blocks are deleted before any text is drawn, so neither leaks into the PDF as garbage text. Only the visible body text remains.

Before (HTML):
<style>.hero{font-size:48px;color:#09c}</style>
<div class="hero">Welcome</div>
<script>analytics.track('view')</script>

After (PDF text):
Welcome

(the .hero CSS and the analytics call are gone; Welcome renders at the default 10pt, not 48px)

A line longer than 100 characters — silently clipped

There is no word-wrap. A single long line (a URL, a CSS-collapsed paragraph) is cut at character 100. If your text matters, pre-wrap it in the source before converting.

Before (one line, 111 characters of text):
<p>This is an unusually long sentence that keeps going well past one hundred characters before it ever stops here.</p>

After (PDF text, clipped at 100):
This is an unusually long sentence that keeps going well past one hundred characters before it ever
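Pre-wrapping the source is the workaround suggested above. A minimal sketch of such a helper follows; `wrapLine` is a hypothetical name and is not part of the converter. It breaks on whitespace and hard-splits any single word longer than the width, such as a long URL.

```typescript
// Greedy word-wrap: no returned line is longer than `width` characters.
function wrapLine(text: string, width: number): string[] {
  const lines: string[] = [];
  let current = "";
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  words.forEach((original) => {
    let word = original;
    // A word longer than the width (a URL, say) is cut into width-sized pieces.
    while (word.length > width) {
      if (current.length > 0) {
        lines.push(current);
        current = "";
      }
      lines.push(word.slice(0, width));
      word = word.slice(width);
    }
    if (current.length === 0) {
      current = word;
    } else if (current.length + 1 + word.length <= width) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  });
  if (current.length > 0) {
    lines.push(current);
  }
  return lines;
}
```

Because every tag becomes a line break, joining the result with a tag, for example `wrapLine(paragraph, 100).join("<br>")`, should keep each piece on its own PDF line. For the sentence in the example above, the second line would be "stops here."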

Entities: four are decoded, the rest are literal

Only &nbsp; &amp; &lt; &gt; are translated. A copyright sign or accented entity comes through as the raw escape text, and characters outside Helvetica's WinAnsi set would not render even if they were decoded.

Before (HTML):
<p>Acme &amp; Co &copy; 2026 &mdash; caf&eacute;</p>

After (PDF text):
Acme & Co &copy; 2026 &mdash; caf&eacute;

(only &amp; decoded; &copy; / &mdash; / &eacute; stay literal)
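If the literal escapes are unwanted, one option is to swap them for plain characters in the source before converting. The sketch below assumes a small hand-written table; `PLAIN_REPLACEMENTS`, `replaceExtraEntities`, and the chosen ASCII stand-ins are all hypothetical and should be adapted to the entities your pages actually use.

```typescript
// Hypothetical lookup table: entity text on the left, plain stand-in on the right.
const PLAIN_REPLACEMENTS: { [entity: string]: string } = {
  "&copy;": "(c)",
  "&mdash;": "-",
  "&ndash;": "-",
  "&#39;": "'",
  "&quot;": '"',
  "&eacute;": "e",
};

function replaceExtraEntities(html: string): string {
  // Match named (&copy;) and numeric (&#39;) entities alike.
  return html.replace(/&#?\w+;/g, (entity) => {
    const plain = PLAIN_REPLACEMENTS[entity];
    // Unknown entities, including the four core ones, are left untouched.
    return plain === undefined ? entity : plain;
  });
}

// replaceExtraEntities("<p>Acme &amp; Co &copy; 2026 &mdash; caf&eacute;</p>")
// returns "<p>Acme &amp; Co (c) 2026 - cafe</p>"
```

The four core entities are deliberately absent from the table, since the converter already decodes them.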

A table flattens to a single column

Rows and columns are not reconstructed. Each cell's text becomes its own line in document order — usually unreadable for data. Export tabular data a different way.

Before (HTML):
<table><tr><td>Item</td><td>Qty</td></tr>
<tr><td>Widget</td><td>3</td></tr></table>

After (PDF text):
Item
Qty
Widget
3
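One way to keep rows readable is to rewrite each table row as a single line of text before converting. This is a rough sketch for simple, non-nested tables only; `tableToRowLines` and the " | " separator are assumptions, and regular expressions are not a general HTML parser.

```typescript
// Turn each <tr> into one line, with cell texts joined by " | ".
function tableToRowLines(tableHtml: string): string[] {
  const rows: string[] = tableHtml.match(/<tr\b[^>]*>[\s\S]*?<\/tr\s*>/gi) || [];
  return rows.map((row) => {
    // <td> and <th> cells; \b keeps <thead> from matching.
    const cells: string[] = row.match(/<t[dh]\b[^>]*>[\s\S]*?<\/t[dh]\s*>/gi) || [];
    return cells
      .map((cell) => cell.replace(/<[^>]+>/g, "").trim())
      .join(" | ");
  });
}

// For the table in the example above this returns ["Item | Qty", "Widget | 3"].
```

Wrapping each returned line in its own `<p>` and substituting that for the original `<table>` should yield one PDF line per row, subject to the same 100-character clip as any other line.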

Edge cases and what actually happens

You expected the CSS layout to come through

By design

This converter discards all CSS. The PDF is a single-column 10pt text dump in document order. For a faithful layout, use the browser's Print → Save as PDF, or capture a screenshot and run it through image-to-pdf.

The page is a JavaScript single-page app

Empty result

<script> is stripped and never executed, so a React/Vue/Angular shell that renders content at runtime yields a nearly blank PDF — only the static fallback markup survives. Save the fully-rendered DOM as HTML first (View Source won't help; use 'Save Page As' after it loads).
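To capture the rendered DOM rather than the original source, the markup can be read from the live document after the app has finished loading. The sketch below relies on the standard `document.documentElement.outerHTML` property; the helper name `snapshotRenderedHtml` and the `DocumentLike` type are hypothetical, introduced so the function can also run outside a browser.

```typescript
// Minimal shape of the browser's `document` that this helper needs.
interface DocumentLike {
  documentElement: { outerHTML: string };
}

// Serialize the current DOM, including content that scripts have rendered.
function snapshotRenderedHtml(doc: DocumentLike): string {
  return "<!DOCTYPE html>\n" + doc.documentElement.outerHTML;
}

// In the browser console, once the page has finished rendering:
//   snapshotRenderedHtml(document)
// then save the returned string as a .html file and upload that file.
```

The snapshot still contains the page's `<script>` tags, but that is harmless because the converter strips them.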

A line is longer than 100 characters

Truncated

Every line is clipped at 100 characters with no word-wrap. Long URLs and CSS-collapsed paragraphs lose their tail. Insert line breaks in the source HTML before converting if the full text must survive.

Non-Latin or accented text (Chinese, Arabic, é, ñ)

Render error

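The PDF uses Helvetica, which is WinAnsi-encoded. Characters outside that encoding (CJK, Arabic, and other non-Latin scripts) cannot be drawn and will fail or be dropped. Western accented letters such as é and ñ sit inside Latin-1, so verify them in the output rather than assuming they are lost. There is no embedded-font option in this tool; for CJK/RTL content, capture the page as an image and use image-to-pdf.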
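A quick pre-flight check can list the characters most likely to cause trouble. This sketch is deliberately conservative: it treats only printable ASCII, common whitespace, and the Latin-1 block (U+00A0 to U+00FF) as safe, which is an approximation of WinAnsi and not its exact table. The name `findRiskyChars` is hypothetical.

```typescript
// Return each distinct character that falls outside ASCII and Latin-1.
function findRiskyChars(text: string): string[] {
  const risky: string[] = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const isWhitespace = code === 0x09 || code === 0x0a || code === 0x0d;
    const isAscii = code >= 0x20 && code <= 0x7e;
    const isLatin1 = code >= 0xa0 && code <= 0xff;
    if (!isWhitespace && !isAscii && !isLatin1) {
      // Characters beyond the BMP show up as two surrogate halves here.
      const ch = text.charAt(i);
      if (risky.indexOf(ch) === -1) {
        risky.push(ch);
      }
    }
  }
  return risky;
}
```

An empty result suggests the text stays within the range the paragraph above describes; a non-empty one is a reason to switch to image-to-pdf.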

Images and logos are missing from the PDF

By design

The converter draws text only — <img> tags and CSS backgrounds are stripped. No image is ever embedded. If you need the logo, screenshot the rendered page and use image-to-pdf.

The file is over the free 2 MB limit

413 blocked

Free conversions cap the input at 2 MB. A large self-contained HTML file (lots of inline base64 assets) can exceed this even though the visible text is small. Pro raises the cap to 50 MB; stripping embedded data URIs from the source first also helps.
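Stripping embedded data URIs, as suggested above, can be sketched as follows. The function names are hypothetical, the regular expression only targets base64 data URIs, and `FREE_LIMIT_BYTES` assumes that "2 MB" means 2 × 1024 × 1024 bytes, which the page does not specify.

```typescript
// Assumption: the free limit is 2 MiB.
const FREE_LIMIT_BYTES = 2 * 1024 * 1024;

// Remove base64 data URIs (inline images, fonts) but keep the surrounding markup.
function stripDataUris(html: string): string {
  return html.replace(/data:[^"'\s)]*;base64,[A-Za-z0-9+\/=]+/g, "");
}

// Size of the text once encoded as UTF-8, computed without platform APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte character
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

function fitsFreeLimit(html: string): boolean {
  return utf8ByteLength(html) <= FREE_LIMIT_BYTES;
}
```

Since the converter never embeds images, removing the data URIs loses nothing that would have reached the PDF.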

You pasted HTML expecting a text box

Upload only

There is no paste-HTML field for this tool — input is by file upload (.html / .htm). Save your markup to a file first, then drop it in.

Headings all look the same size

Expected

Every line, including <h1>–<h6>, renders at 10pt Helvetica. The converter does not size headings. If you want visual heading hierarchy, author the content in Markdown and use markdown-to-pdf, which styles headings and bold.

Blank lines are missing where you had spacing

Collapsed

Runs of three or more newlines are collapsed to two. CSS margins/padding that created visual spacing are gone, so the PDF is more tightly packed than the page looked in the browser.

Frequently asked questions

Does this produce a pixel-perfect copy of my web page?

No. It extracts the text content and lays it into a single-column PDF in 10pt Helvetica. CSS, images, fonts, and JavaScript are all discarded. For a faithful visual copy, use your browser's Print → Save as PDF, or screenshot the page and run it through image-to-pdf.

Is my HTML uploaded anywhere?

No. Conversion runs entirely in your browser using pdf-lib. The file never leaves your device — only anonymous usage counters are recorded if you're signed in. That's why it's safe for internal or unpublished pages.

Will external stylesheets and images be included?

Neither. External (and inline) CSS is stripped, and images are never embedded — the converter draws text only. There is no asset-fetching step at all; the tool works purely on the markup you upload.

Can I control the page size (A4, Letter, custom)?

No. Output is always US-Letter (612×792pt). There is no page-size selector for this tool. If you need A4 dimensions specifically, generate the PDF here, then change page size with the pdf-resize tool.

Can I control page breaks with CSS like page-break-before?

No. CSS is discarded, so page-break-* properties have no effect. Pages break automatically when text reaches the bottom margin (y < 50pt). You can't force a break from the HTML.

Is JavaScript executed before rendering?

No. <script> blocks are stripped and never run. Content that a single-page app renders at runtime will not appear. Save the fully-loaded page's HTML and convert that, or use the browser's Print to PDF which does run the page.

Why is some text cut off at the end of a line?

Each line is truncated to 100 characters and there is no word-wrap. Long unbroken lines (URLs, CSS-collapsed paragraphs) lose everything past column 100. Add line breaks in the source HTML before converting to keep the full text.

Will accented or non-Latin characters work?

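Not reliably. The PDF uses Helvetica with WinAnsi (roughly Latin-1) encoding, so CJK, Arabic, and other characters outside that set can't be drawn and may cause an error or be dropped. Accented Western letters such as é and ñ are inside Latin-1, but check them in the output. For unsupported scripts, capture the page as an image and use image-to-pdf.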

Do my HTML entities get decoded?

Only the four core ones: &nbsp;, &amp;, &lt;, and &gt;. Anything else (&copy;, &mdash;, &#39;, named accents) comes through as the literal escape text. Replace them in the source with plain characters before converting if needed.

Will tables keep their rows and columns?

No. Tables flatten to one line per cell in document order. For tabular data, this is usually unreadable — keep the data in CSV/Excel, or capture the rendered table as an image and use image-to-pdf.

What's the file-size limit?

Free conversions accept HTML files up to 2 MB (one file at a time). Pro raises the limit to 50 MB and allows batches of up to 5 files. Inline base64 assets inflate file size quickly, so strip them if you're near the cap.

What's the best free alternative for a faithful render?

Your own browser. Open the page and choose Print → Save as PDF (Destination: Save as PDF). It uses the browser's layout engine, so CSS, fonts, and images all come across. This tool is for getting clean text out, not for visual fidelity.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
