How to save a webpage as a PDF from its HTML source
- Step 1: Save the page's HTML. In your browser choose File → Save Page As. For this text-extraction tool, 'Webpage, HTML Only' is enough, because the linked CSS and images would be discarded anyway. The result is a .html file the converter can read.
- Step 2: Open the converter and drop the saved file. Load it into the HTML to PDF converter. Everything happens locally; the saved page is never uploaded. Input is by file upload only: there is no URL fetch and no paste box.
- Step 3: Let it extract the readable text. Styles and scripts are removed, tags become line breaks, and the four core entities are decoded. The text is drawn in 10pt Helvetica on US-Letter pages; there is no page-size choice.
- Step 4: Name the file for your archive. Save with a filename that encodes the source and capture date, e.g. 2026-06-06_example-com_article-title.pdf. This keeps a flat archive folder searchable later.
- Step 5: Verify the text you cared about survived. Open the PDF and confirm the article body, thread replies, or reference text are intact. Remember that tables flatten to single lines and any line over 100 characters is clipped.
- Step 6: Capture the visuals separately if you need them. For evidentiary or design archives where appearance matters, also save a screenshot and run it through image-to-pdf, or use the browser's Print → Save as PDF for a layout-faithful copy alongside the text archive.
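The transform described in Step 3 can be approximated in a few lines of Python, which is handy for previewing what a saved page will yield before converting it. This is a sketch, not the converter's source: the function name `extract_text` is hypothetical, the choice of `&lt;`, `&gt;`, `&quot;`, and `&amp;` as the "four core entities" is an assumption, and so is the collapsing of whitespace and dropping of blank lines.

```python
import re

# Assumption: the source does not list the "four core entities"; these are a guess.
# &amp; is decoded last so that "&amp;lt;" ends up as "&lt;" rather than "<".
CORE_ENTITIES = {"&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&"}


def extract_text(html: str) -> list[str]:
    """Approximate the described transform and return the resulting text lines."""
    # Remove <script> and <style> blocks together with their contents.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    # Every remaining tag becomes a line break.
    text = re.sub(r"(?s)<[^>]*>", "\n", html)
    # Decode the assumed entity set.
    for entity, char in CORE_ENTITIES.items():
        text = text.replace(entity, char)
    # Assumption: collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return [line for line in lines if line]
```

Running it on a saved file and scanning the returned lines shows early whether navigation text, clipped lines, or an empty JavaScript shell will be a problem.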
Save As options vs what this tool uses
Because the converter discards CSS and images, the lighter 'HTML Only' save is sufficient — the heavier 'Complete' save buys you nothing here.
| Browser Save mode | Includes | Matters for this tool? |
|---|---|---|
| Webpage, HTML Only | Single .html, markup only | Yes — this is all the converter needs; it only reads text. |
| Webpage, Complete | .html + CSS + images folder | No benefit — CSS and images are stripped during conversion. |
| Web Page, Single File (MHTML) | .mht/.mhtml archive | Not accepted — convert/save to .html first; the tool reads .html/.htm. |
| Reader Mode → Print to PDF | Decluttered visual PDF | Different output — keeps layout/fonts; use this when appearance matters. |
What lands in the archive PDF
The exact transforms applied to your saved HTML.
| Element of the saved page | In the archive? | Detail |
|---|---|---|
| Article / thread body text | Kept | Drawn line by line in 10pt Helvetica. |
| Navigation, sidebars, footers | Kept as text | Their text is included too — there's no main-content detection, so chrome text appears inline. |
| Cookie-banner / ad scripts | Removed | <script> blocks are stripped before any drawing. |
| Page styling and theme | Removed | All CSS discarded; output is plain black-on-white. |
| Images, logos, screenshots | Removed | Text-only output; no image embedding. |
| Hyperlink targets | URL lost | Link text remains; the destination URL is dropped. |
Cookbook
Practical archiving recipes. 'Before' is what you saved; 'after' is what ends up in the PDF archive.
Archiving a documentation page
Reference docs are mostly prose and headings, which makes them a good fit. The text and its reading order survive; only the visual styling is lost.
Before (saved docs HTML):
<h2>Authentication</h2>
<p>Send the token in the Authorization header.</p>
<code>Authorization: Bearer &lt;token&gt;</code>

After (PDF archive text):
Authentication
Send the token in the Authorization header.
Authorization: Bearer <token>
Nav and footer text leak into the archive
There is no 'main content' extraction. The header menu, sidebar, and footer text all appear because their tags become lines too. Trim them from the HTML first if you want a clean capture.
Before (full-page save):
<nav>Home About Pricing Blog</nav>
<article>The actual post you wanted...</article>
<footer>© 2026 Example Inc</footer>

After (PDF archive text):
Home About Pricing Blog
The actual post you wanted...
© 2026 Example Inc
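Trimming the chrome by hand gets tedious for more than a page or two. The sketch below removes it with Python's standard `html.parser`, assuming the page marks its chrome with the semantic tags `nav`, `header`, `footer`, and `aside`; pages that use generic `div` containers would need a different rule. The names `ChromeStripper` and `strip_chrome` are hypothetical. Comments and the doctype are dropped as a side effect, which does not matter for a text-only conversion.

```python
from html.parser import HTMLParser

# Assumption: the page uses these semantic tags for menus, sidebars, and footers.
SKIP_TAGS = {"nav", "header", "footer", "aside"}


class ChromeStripper(HTMLParser):
    """Re-emit the HTML it is fed, minus everything inside SKIP_TAGS elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []
        self.depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a skipped region.
        if self.depth == 0 and tag not in SKIP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def strip_chrome(html: str) -> str:
    parser = ChromeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that a `header` element inside the article itself is removed too, and an unclosed `nav` swallows everything after it, so check the output before converting.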
A long forum thread paginates automatically
Threads run long; the converter starts a new Letter page each time text reaches the bottom margin, so a 40-reply thread spreads across however many pages it needs.
Input: saved-thread.html (≈ 18,000 words)
Process: text drawn at 14pt leading, new page at y < 50
Result: one PDF, ~30 Letter pages, fully text-searchable
(reply order is preserved; avatars and reaction icons are gone)
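The page count can be estimated from the figures above. The sketch takes the 14pt leading and the y < 50 page break from the text and uses the standard US-Letter height of 792pt; the starting position of the first line (50pt below the top edge) is an assumption, as are the function names. Under those assumptions a page holds 50 lines, so a thread that extracts to about 1,500 lines comes out at 30 pages.

```python
import math

PAGE_HEIGHT = 792             # US Letter height in points
LEADING = 14                  # from the text: 14pt leading
BOTTOM_LIMIT = 50             # from the text: new page when y < 50
TOP_START = PAGE_HEIGHT - 50  # assumption: first baseline 50pt below the top edge


def lines_per_page() -> int:
    # Baselines sit at TOP_START, TOP_START - 14, ... for as long as y >= BOTTOM_LIMIT.
    return (TOP_START - BOTTOM_LIMIT) // LEADING + 1


def pages_needed(line_count: int) -> int:
    # Even an empty document is assumed to produce one page.
    return max(1, math.ceil(line_count / lines_per_page()))
```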
MHTML single-file save isn't accepted
Chrome's 'Single File' save produces .mht/.mhtml, which this tool doesn't read. Re-save as HTML Only, or open the .mht and save its source as .html.
page.mhtml → not accepted (tool reads .html / .htm)
Fix: open the page again → Save Page As → Webpage, HTML Only → page.html → convert
Pair a text archive with a visual snapshot
For a complete record, keep both: this tool for the searchable text, plus an image-to-pdf capture for how the page looked. Store them together under one dated name.
2026-06-06_example-com_post/
├─ text.pdf (this tool: searchable words)
└─ visual.pdf (screenshot → image-to-pdf: layout/colors)
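A small helper keeps those dated names consistent across captures. The sketch below builds the date_host_title pattern used in this guide; the function names `slug` and `archive_stem` are hypothetical, and dropping a leading `www.` is an assumption about how you want hosts to appear.

```python
import re
from datetime import date
from urllib.parse import urlparse


def slug(text: str) -> str:
    """Lowercase the text and replace each run of non-alphanumerics with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archive_stem(url: str, title: str, captured: date) -> str:
    """Build a name such as 2026-06-06_example-com_article-title (no extension)."""
    host = urlparse(url).hostname or "unknown-host"
    if host.startswith("www."):
        host = host[4:]  # assumption: the www prefix is noise in an archive name
    return f"{captured.isoformat()}_{slug(host)}_{slug(title)}"
```

Append `.pdf` for a single file, or use the stem as the folder name when storing the text and visual copies together. Non-ASCII characters in the title collapse to hyphens, so titles in other scripts may need a manual name.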
Edge cases and what actually happens
The archive looks nothing like the live page
By design: This tool archives text, not appearance. Layout, theme, images, and fonts are all dropped. If you need a visual record, capture a screenshot and use image-to-pdf, or use the browser's Print → Save as PDF.
You saved a page from a JavaScript-rendered site
Empty result: If the content is injected by JavaScript and your 'Save As' captured the pre-render shell, the archive is nearly blank, because scripts are stripped and never run. Save the page after it has fully loaded (some browsers' 'Save As' captures the live DOM; otherwise copy the rendered HTML, for example from the browser's developer tools).
An MHTML / .mht single-file save
Unsupported format: The converter reads .html and .htm only. Re-save the page as 'Webpage, HTML Only' to get a compatible file.
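If the original page is no longer available to re-save, the HTML can usually be recovered from the archive itself, because an MHTML file is a MIME multipart message. The sketch below uses Python's standard `email` package to pull out the first `text/html` part; the function name `html_from_mhtml` is hypothetical, and falling back to UTF-8 when the part declares no charset is an assumption.

```python
import email


def html_from_mhtml(raw: bytes) -> str:
    """Return the first text/html part of an MHTML (MIME multipart) archive."""
    message = email.message_from_bytes(raw)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes quoted-printable or base64 transfer encoding.
            payload = part.get_payload(decode=True)
            # Assumption: treat the part as UTF-8 when no charset is declared.
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    raise ValueError("no text/html part found")
```

Write the returned string to a `.html` file and convert that. The images and stylesheets stored in the other MIME parts are ignored, which loses nothing here because the converter discards them anyway.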
Navigation and footer clutter the capture
Expected: There is no main-content extraction, so chrome text (menus, sidebars, footers) appears inline with the article. Delete those sections from the saved HTML before converting for a clean archive.
Long lines or wide code blocks get cut
Truncated: Lines are clipped at 100 characters with no wrapping, so wide code samples and long URLs lose their tails. Add line breaks in the source first.
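Adding those line breaks can be scripted. The sketch below splits text into pieces that fit the 100-character limit using Python's standard `textwrap`; the function name `break_long_lines` is hypothetical. It suits prose and long URLs better than indented code, because continuation lines lose their leading whitespace.

```python
import textwrap

MAX_CHARS = 100  # from the text: lines are clipped at 100 characters


def break_long_lines(text: str, width: int = MAX_CHARS) -> list[str]:
    """Split each input line into pieces of at most `width` characters."""
    pieces: list[str] = []
    for line in text.splitlines():
        # break_long_words=True also splits unbroken tokens such as long URLs.
        wrapped = textwrap.wrap(
            line, width=width, break_long_words=True, break_on_hyphens=False
        )
        pieces.extend(wrapped or [""])  # keep blank lines as blank lines
    return pieces
```

Because the converter turns tags into line breaks, joining the pieces with `<br>` when writing them back into the saved HTML should put each one on its own PDF line.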
Accented / non-Latin page content
Render error: Helvetica is Latin-1 only; non-Latin scripts (and accented letters outside that set) can't be drawn. For multilingual pages, archive a screenshot via image-to-pdf instead.
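A quick pre-check shows whether a page will hit this limit. The sketch below lists the characters that cannot be encoded as Latin-1, taking the "Latin-1 only" statement above at face value; the function name `unencodable_chars` is hypothetical. If the tool's font turns out to accept a slightly different set, such as Windows-1252, pass that codec name (`"cp1252"`) instead.

```python
def unencodable_chars(text: str, encoding: str = "latin-1") -> list[str]:
    """Return the distinct characters `encoding` cannot represent, in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order and removes duplicates
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            seen.setdefault(ch, None)
    return list(seen)
```

An empty result means the extracted text should draw; anything else is a sign to archive the page as an image.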
You need a certified, timestamped legal capture
Out of scope: This produces a working text copy, not a tamper-evident, timestamped capture. For legal web evidence use a dedicated archiving service (e.g. one that records a cryptographic hash and capture time).
Saved file exceeds the free 2 MB limit
413 blocked: A 'Complete' save or a page with large inline data URIs can blow past 2 MB. Use 'HTML Only', strip inline assets, or upgrade to Pro (50 MB).
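Stripping inline assets can be done before upload. The sketch below blanks `data:` URIs in `src`, `href`, and `poster` attributes and checks the result against the free limit. The function names are hypothetical, and reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption, since the exact threshold is not stated. Data URIs inside CSS `url(...)` values are not touched.

```python
import re

FREE_LIMIT_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" is read as 2 MiB

# Matches src="data:...", href='data:...' and poster="data:..." attribute values.
DATA_URI_ATTR = re.compile(
    r"""\b(src|href|poster)\s*=\s*(?:"data:[^"]*"|'data:[^']*')""",
    re.IGNORECASE,
)


def strip_data_uris(html: str) -> str:
    """Replace inline data: URIs in src/href/poster attributes with an empty value."""
    return DATA_URI_ATTR.sub(r'\1=""', html)


def fits_free_tier(html: str) -> bool:
    """Check the UTF-8 size of the HTML against the assumed free-tier limit."""
    return len(html.encode("utf-8")) <= FREE_LIMIT_BYTES
```

Since the converter draws text only, blanking these attributes changes nothing in the resulting PDF.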
Frequently asked questions
Will the CSS styling be included from an HTML-only save?
No — and it wouldn't matter if you did a 'Complete' save either, because this converter strips all CSS regardless. The archive is plain black 10pt Helvetica text. For styling, use the browser's Print → Save as PDF, which keeps the layout.
Do I need 'Save As: Complete' or is 'HTML Only' enough?
'HTML Only' is enough. The converter discards the CSS and images that 'Complete' adds, so the heavier save buys you nothing here. Save the lighter file and convert it.
What if the page uses Google Fonts or other web fonts?
Web fonts are ignored entirely. The PDF is always rendered in Helvetica. If a brand font is essential to your archive, capture the rendered page as an image and use image-to-pdf.
Is this suitable for legal web-evidence archiving?
Not on its own. It makes a functional, searchable text copy but offers no timestamp, hash, or certification. For evidence, use a dedicated web-archiving service that certifies the capture; you can keep this PDF as a convenient working copy.
Can I save a page directly from its URL?
No. The tool doesn't fetch URLs — you supply the saved .html file. Save the page with File → Save Page As first, then upload the file.
Will images and the page logo be in the archive?
No. The converter draws text only; all images are stripped. To preserve images, screenshot the page and run the image through image-to-pdf.
Does it keep the navigation, ads, and footer?
It keeps their text (there's no main-content detection), but their styling and scripts are gone. For a clean capture, delete those sections from the saved HTML before converting.
How big a page can I archive for free?
Up to 2 MB per file on the free tier, one file at a time. Pro raises it to 50 MB and allows 5-file batches — handy for archiving several saved pages in one go.
Are MHTML or .webarchive files supported?
No. The tool reads .html and .htm. Re-save the page as 'Webpage, HTML Only' to get a file it can read.
Why are some lines cut short?
Each line is truncated at 100 characters and there's no word-wrap. Long URLs and wide code blocks lose their ends. Insert line breaks in the source HTML before converting.
Is the saved page sent to a server?
No. Conversion is fully client-side via pdf-lib, so even pages you saved while logged in stay on your device. Only anonymous usage counts are recorded when you're signed in.
Can I archive a non-English page?
Only if it's Latin-1 text. Helvetica can't render CJK, Arabic, and many accented characters, so those pages won't archive correctly here — capture them as an image and use image-to-pdf instead.
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.