Save a Web Page as Markdown — Free Browser Tool

How to save a web page as clean Markdown

Step 1
Capture the page HTML the right way — For a server-rendered page, View Source (Ctrl+U) and copy. For a JavaScript-rendered page (most modern sites), View Source is an empty shell — instead open DevTools, find the <article> or main content element, right-click → Copy → Copy outerHTML.
Step 2
Copy only the main content — This is the single most important step. Copying the whole <body> drags in nav, ads, comments, and <script>/<style> blocks that leak as text. Copy the smallest element that contains the article — usually <article>, <main>, or the content <div>.
Step 3
Paste and run — Choose Paste text, drop the HTML in, and run. There are no options — Turndown converts deterministically. (Upload also works if you saved the HTML to a .html file.)
Step 4
Scan for leaked junk — Check the Markdown for stray CSS (.class{...}) or script text — that's the tell that ad/analytics <script>/<style> came along. If so, re-copy a tighter element and convert again.
Step 5
Save to your vault — Copy the Markdown to your notes app or download it as a .md file. Add a source-URL line and capture date at the top so future-you knows where it came from.
Step 6
Tidy for storage — Optionally run the result through md-prettifier to normalize spacing, and md-link-validator to confirm the links you captured still resolve.
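
The scan in Step 4 can be partly automated. The sketch below is a rough heuristic, not part of the tool: it flags Markdown lines that look like a CSS rule or a JavaScript call so you know to re-copy a tighter element. The patterns and the `find_leaks` name are assumptions for illustration; they will miss some leaks and can flag legitimate prose or code samples.

```python
import re

# Heuristic patterns only (assumptions, not the tool's logic).
CSS_RULE = re.compile(r"[.#]?[\w-]+\s*\{[^{}]*:[^{}]*\}")  # e.g. .ad{display:block}
JS_CALL = re.compile(r"\w+\([^)]*\)\s*;")                   # e.g. ga('send','pageview');


def find_leaks(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines that look like leaked CSS or script."""
    hits = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        if CSS_RULE.search(line) or JS_CALL.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "ga('send','pageview');\n\n.ad{display:block}\n\nReal content."
    for number, text in find_leaks(sample):
        print(f"line {number}: {text}")
```

An empty result does not prove the capture is clean; it only means none of these patterns matched.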

Capture method by page type

Which capture method to use, and what to expect in the Markdown.

Page type	How to capture	Result quality
Server-rendered article (blog, news)	View Source → copy `<article>`	Excellent — full content present
JS-rendered SPA (React/Vue/Next)	DevTools → Copy outerHTML of content element	Good — captures the rendered DOM
Whole `<body>` pasted	Not recommended	Noisy — nav/ads/script text leaks in
Paywalled / login-gated page	Capture only the HTML you can legitimately see	As good as the visible HTML
Infinite-scroll feed	Scroll to load, then copy the loaded section	Partial — only loaded items exist in DOM

What converts cleanly vs. what leaks

Turndown does not auto-clean a page. Knowing what survives tells you how tight your copy needs to be.

Element	Result	Action
`<article>` / `<main>` content	Clean Markdown	Copy this element
`<nav>`, `<header>`, `<footer>`	Links/text converted as-is	Exclude from your copy
`<script>` (ads, analytics)	Text leaks	Don't include — copy a tighter element
`<style>` blocks	CSS leaks as text	Don't include
`<aside>` (related links, ads)	Converted as content	Exclude from your copy
`<table>` with header	GFM pipe table	Kept — good
`<iframe>` (videos, embeds)	Empty output	Note the URL manually if needed

Tier limits for HTML input

Full-page HTML is large. Each plan caps both file size and character count, so a capture can be rejected on characters even when its file size is under the limit.

Plan	Max file size	Max characters	Files per run
Free	1 MB	500,000	1
Pro	10 MB	5,000,000	10
Pro-media	50 MB	20,000,000	50
Developer	500 MB	Unlimited	Unlimited
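
To check a capture against the table before pasting, something like the following sketch could work. The limits are copied from the table above; everything else is an assumption: the tool's exact counting method is not documented here, so this uses Python's `len()` (Unicode code points) for characters, UTF-8 for bytes, and treats 1 MB as 1,000,000 bytes. Expect small differences from the count the tool reports.

```python
# Limits from the tier table: (max file size in MB, max characters or None for unlimited).
PLAN_LIMITS = {
    "Free": (1, 500_000),
    "Pro": (10, 5_000_000),
    "Pro-media": (50, 20_000_000),
    "Developer": (500, None),
}

BYTES_PER_MB = 1_000_000  # assumption: the page does not say whether MB means 10^6 or 2^20


def check_limits(html: str, plan: str = "Free") -> dict:
    """Compare a capture's character and byte counts with a plan's limits."""
    max_mb, max_chars = PLAN_LIMITS[plan]
    characters = len(html)               # code points; the tool may count differently
    size = len(html.encode("utf-8"))     # bytes if saved as a UTF-8 .html file
    return {
        "characters": characters,
        "bytes": size,
        "within_characters": max_chars is None or characters <= max_chars,
        "within_file_size": size <= max_mb * BYTES_PER_MB,
    }


if __name__ == "__main__":
    print(check_limits("<p>Real content.</p>", "Free"))
```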

Cookbook

Real capture scenarios and the Markdown they produce. The lesson in most of them: copy the article, not the page.

Article copied cleanly via Copy outerHTML

Copying just the <article> element gives clean, readable Markdown with no nav or ad noise.

HTML in (from DevTools Copy outerHTML on <article>):
<article><h1>How DNS Works</h1><p>DNS resolves <strong>names</strong> to IPs.</p></article>

Markdown out:
# How DNS Works

DNS resolves **names** to IPs.

What happens when you paste the whole page

Include the analytics script and you get its source text in your notes. This is why copying a tighter element matters.

HTML in (whole page fragment):
<script>ga('send','pageview');</script>
<style>.ad{display:block}</style>
<p>Real content.</p>

Markdown out:
ga('send','pageview');

.ad{display:block}

Real content.

→ Re-copy just the <article> to avoid the leaked script/style text.
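
If re-copying is not practical (for example, the scripts sit inside the article element), you could strip them from the HTML yourself before pasting. This is a minimal sketch using Python's standard `html.parser`, run outside the tool; the `ScriptStyleStripper` class and `strip_script_style` function are hypothetical names. It drops `<script>` and `<style>` elements and passes everything else through. HTML comments and doctype declarations are also dropped, which is harmless for conversion.

```python
from html.parser import HTMLParser


class ScriptStyleStripper(HTMLParser):
    """Re-emit HTML as-is, skipping <script> and <style> elements and their text."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=False keeps entities like &amp; intact on output.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skipping = None  # name of the element currently being skipped

    def handle_starttag(self, tag, attrs):
        if self.skipping:
            return
        if tag in self.SKIP:
            self.skipping = tag
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skipping and tag not in self.SKIP:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skipping:
            if tag == self.skipping:
                self.skipping = None
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skipping:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skipping:
            self.out.append(f"&#{name};")


def strip_script_style(html: str) -> str:
    parser = ScriptStyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = (
        "<script>ga('send','pageview');</script>\n"
        "<style>.ad{display:block}</style>\n"
        "<p>Real content.</p>"
    )
    print(strip_script_style(page).strip())
```

This only removes script and style text. Nav, ads, and asides still convert as content, so copying the tightest element remains the better fix.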

A data table from a reference page

Reference pages often have comparison tables. With a header row, they convert to a clean GFM table you can search in your notes.

HTML in:
<table><thead><tr><th>Port</th><th>Service</th></tr></thead>
<tbody><tr><td>443</td><td>HTTPS</td></tr></tbody></table>

Markdown out:
| Port | Service |
| --- | --- |
| 443 | HTTPS |

Add provenance to your saved note

The tool gives you the body Markdown; add a source header yourself so the archive is traceable.

After conversion, prepend in your vault:
> Source: https://example.com/how-dns-works
> Captured: 2026-06-13

# How DNS Works

DNS resolves **names** to IPs.

A video embed leaves a gap

Tutorial pages with embedded videos lose the embed. Capture the URL so you can re-link it in your notes.

HTML in:
<iframe src="https://player.vimeo.com/video/12345"></iframe>
<p>Watch the walkthrough above.</p>

Markdown out:
Watch the walkthrough above.

→ The iframe produced nothing. Add the URL by hand:
[Walkthrough video](https://player.vimeo.com/video/12345)
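
For pages with several embeds, you could collect the `src` URLs from the copied HTML before converting. The sketch below uses Python's standard `html.parser`; `IframeCollector` and `iframe_links` are hypothetical names, and the link label is a placeholder you would edit.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in the input."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_links(html: str, label: str = "Embedded content") -> list[str]:
    """Return one Markdown link per iframe, in document order."""
    collector = IframeCollector()
    collector.feed(html)
    collector.close()
    return [f"[{label}]({src})" for src in collector.sources]


if __name__ == "__main__":
    page = (
        '<iframe src="https://player.vimeo.com/video/12345"></iframe>\n'
        "<p>Watch the walkthrough above.</p>"
    )
    for link in iframe_links(page, "Walkthrough video"):
        print(link)
```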

Edge cases and what actually happens

It can't fetch a URL for you

Not supported

This is a converter, not a crawler. You must provide the HTML (View Source or Copy outerHTML). The tool never requests the page itself, which is also why the content you convert stays in your browser.

JavaScript-rendered pages come back empty from View Source

Expected

Modern SPAs build content client-side, so View Source returns an empty shell and the Markdown is nearly empty. Use DevTools → Copy outerHTML on the rendered element instead — that captures the live DOM.

Ads, nav, and cookie banners are not removed

By design

There is no readability/article-extraction step. Whatever HTML you paste is converted. The clean-up happens at capture time: copy only the <article>/main element, not the whole page.

Analytics/ad scripts leak as visible text

Leaked

If your copied HTML includes <script> (Google Analytics, ad tags), the script source appears as text in the Markdown, because Turndown has no rule to drop it. The same goes for CSS inside <style>. Copy a tighter element to avoid both.

Embedded videos and maps disappear

Dropped

<iframe> embeds (YouTube, Vimeo, Google Maps) produce empty output. If the embed matters for your archive, copy its src URL from the source HTML and add a Markdown link by hand.

Lazy-loaded images may have placeholder src

Preserved

Sites that lazy-load images often keep the real URL in data-src and put a placeholder in src. Turndown reads src, so you may capture ![](placeholder.gif). Capture after images load, or copy the real URL from data-src manually.
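
When many images are affected, copying each URL by hand gets tedious. One option is to rewrite the copied HTML before pasting so that `src` takes the value of `data-src`. This is a sketch under assumptions: it only handles the `data-src` attribute named above (sites use other attribute names too), it runs outside the tool, and `LazyImageFixer` / `fix_lazy_images` are hypothetical names. HTML comments are dropped on output.

```python
from html import escape
from html.parser import HTMLParser


class LazyImageFixer(HTMLParser):
    """Re-emit HTML unchanged, except <img> tags get src replaced by data-src."""

    def __init__(self):
        # convert_charrefs=False keeps text entities like &amp; intact on output.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "img" and values.get("data-src"):
            # Overwrite the placeholder src with the real URL and drop data-src.
            values["src"] = values.pop("data-src")
            rendered = "".join(
                f" {name}" if value is None else f' {name}="{escape(value)}"'
                for name, value in values.items()
            )
            self.out.append(f"<img{rendered}>")
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def fix_lazy_images(html: str) -> str:
    fixer = LazyImageFixer()
    fixer.feed(html)
    fixer.close()
    return "".join(fixer.out)


if __name__ == "__main__":
    snippet = '<p><img src="placeholder.gif" data-src="https://example.com/real.png" alt="Diagram"></p>'
    print(fix_lazy_images(snippet))
```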

Infinite-scroll feeds only capture what's loaded

Partial

Only DOM that exists at copy time is converted. For an infinite-scroll page, scroll to load the section you want before Copy outerHTML, or you'll archive a fraction of the content.

A reference table without a header stays as HTML

By design

Layout-only tables (no <thead>/<th>) are left as raw HTML in the Markdown, since the GFM rule needs a header. Repair them with md-table-repair if you need clean output.
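
You can predict which outcome a table will get before converting. The rough check below follows the criterion stated above (presence of `<thead>` or `<th>`); the converter's real header detection may be stricter, so treat a `True` result as "probably converts" rather than a guarantee. `HeaderProbe` and `table_has_header` are hypothetical names.

```python
from html.parser import HTMLParser


class HeaderProbe(HTMLParser):
    """Record whether the input contains a <thead> or <th> element."""

    def __init__(self):
        super().__init__()
        self.has_header = False

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "th"):
            self.has_header = True


def table_has_header(table_html: str) -> bool:
    probe = HeaderProbe()
    probe.feed(table_html)
    probe.close()
    return probe.has_header


if __name__ == "__main__":
    with_header = (
        "<table><thead><tr><th>Port</th><th>Service</th></tr></thead>"
        "<tbody><tr><td>443</td><td>HTTPS</td></tr></tbody></table>"
    )
    layout_only = "<table><tr><td>443</td><td>HTTPS</td></tr></table>"
    print(table_has_header(with_header), table_has_header(layout_only))
```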

Whole-page capture exceeds the character limit

Rejected

Full HTML pages with inline SVG and scripts are huge and can blow past the Free 500,000-character cap. The tool reports the count and limit. Copy just the article (which also gives cleaner output) or upgrade for the 5,000,000-character Pro limit.

Frequently asked questions

Can I paste a URL and have it scrape the page?

No. It converts HTML you provide, not a URL it fetches. Use View Source (server-rendered) or DevTools Copy outerHTML (JS-rendered) to get the HTML, then paste it.

Will it handle JavaScript-rendered content?

Only if you give it the rendered HTML. View Source on a SPA returns an empty shell, so use DevTools → Copy outerHTML on the content element to capture the live DOM.

Does it strip ads, navigation, and scripts automatically?

No. There's no readability extraction. Worse, <script> and <style> text leaks into the output. The fix is to copy only the <article>/main element, not the whole page.

Is this for archiving to archive.org?

No — this produces a local Markdown copy for your own notes/archive. It's complementary to web-archive services, not a submission tool.

Is anything I capture sent to a server?

No. The conversion runs entirely in your browser, so the pages you capture stay on your device.

Why did a script's code end up in my notes?

You included a <script> block in the copied HTML. Turndown emits its text. Re-copy a tighter element (just the article) and convert again.

Do tables survive?

Tables with a header row become GFM pipe tables. Layout tables without a header stay as raw HTML; repair them with md-table-repair if needed.

What about images?

Images become ![alt](src) references to the original URLs; files aren't downloaded. Lazy-loaded images may capture a placeholder src — grab the real URL from data-src if so.

Can I add the source URL automatically?

No — the tool outputs body Markdown only. Prepend a source/date line yourself in your notes app for provenance.

How big a page can I convert?

Free allows 500,000 characters. Full pages are large; copying just the article keeps you under the limit and produces cleaner Markdown. Pro raises it to 5,000,000.

Is this good for feeding an LLM?

Yes — Markdown is more token-efficient than HTML. For an LLM-focused workflow, see the dedicated guide at html-to-markdown-for-llm-input.

What format do I get out?

Markdown (.md). Copy it to your clipboard or download it. To convert Markdown back to HTML, use md-to-html.

Privacy first

All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.
