How to convert HTML to clean Markdown
- Step 1: Get your HTML. Use View Source or Copy outerHTML from your browser's DevTools (right-click an element → Copy → Copy outerHTML), or save the page as .html. For a single article, copy just the article container's HTML rather than the whole page so navigation and footers don't end up in the Markdown.
- Step 2: Choose paste or upload. Pick Paste text and drop your HTML into the textarea (the placeholder reads "Paste HTML here..."), or pick Upload file and select a .html or .htm file. The character counter under the box tracks against your plan's limit (500,000 characters on Free).
- Step 3: Run the conversion. Click run. Turndown parses the HTML with the browser's DOMParser, then applies its rules plus the GFM plugin (tables, strikethrough, task lists, highlighted code blocks). There are no settings; the output is deterministic for a given input.
- Step 4: Review the Markdown. Check three things: code fences kept their language tags, tables converted (header-less tables stay as raw HTML; see edge cases), and no <style>/<script> text leaked in. Scan the output once before you commit it.
- Step 5: Copy or download. Use the copy button to put the Markdown on your clipboard, or download it as a .md file. An uploaded page.html downloads as page.md; pasted input downloads as input.md.
- Step 6: Tidy the result. Run the output through md-prettifier to normalize spacing, or rewrite relative image paths with md-image-path-rewriter before committing to a repo or static site.
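The download naming in Step 5 can be sketched as a small function. This is a guess at the rule based only on the two examples given (page.html → page.md, pasted input → input.md); the function name `markdownFileName` is hypothetical and not part of the tool.

```javascript
// Hypothetical helper illustrating the Step 5 naming rule.
// Pass the uploaded file's name, or null/undefined for pasted input.
function markdownFileName(uploadedName) {
  if (!uploadedName) return "input.md"; // pasted input has no source name

  // Swap a trailing .html or .htm extension (any letter case) for .md.
  const base = uploadedName.replace(/\.html?$/i, "");
  return `${base}.md`;
}
```

For example, `markdownFileName("page.html")` gives `page.md`, and calling it with `null` gives `input.md`.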
HTML element → Markdown output
What Turndown 7.2.4 (this tool's exact config: headingStyle atx, codeBlockStyle fenced, GFM plugin) emits for common elements. Verified against the running converter.
| HTML element | Markdown output | Notes |
|---|---|---|
| `<h1>` … `<h6>` | `#` … `######` | ATX style; the heading's id attribute is dropped |
| `<strong>` / `<b>` | `**text**` | Double-asterisk strong delimiter |
| `<em>` / `<i>` | `_text_` | Underscore emphasis delimiter (Turndown default) |
| `<a href title>` | `[text](url "title")` | Inlined links; relative and absolute URLs passed through verbatim |
| `<img src alt>` | `![alt](src)` | Reference only; the image file is not downloaded |
| `<ul>` / `<li>` | `*   item` | Bullet marker is * with 3 trailing spaces; nesting indents 4 spaces |
| `<ol>` / `<li>` | `1. item` | Numbered list |
| `<blockquote>` | `> text` | Standard Markdown quote |
| `<pre><code>` | fenced block | Language class (language-js) becomes the fence info string |
| `<code>` (inline) | `` `code` `` | Backtick-wrapped inline code |
| `<table>` with header | GFM pipe table | Requires a `<thead>`/`<th>` header row |
| `<del>` / `<s>` / `<strike>` | `~text~` | Single-tilde strikethrough (this plugin version) |
| `<input type=checkbox>` in `<li>` | `* [ ]` / `* [x]` | GFM task-list items |
| `<hr>` | `* * *` | Thematic break |
| `<br>` | two trailing spaces | Hard line break |
What is dropped, kept, or leaked
Non-content and presentational nodes. The script/style behavior is a real gotcha, not a feature.
| Input | Result | Why |
|---|---|---|
| `class`, `style`, `id`, `data-*` attributes | Dropped | Markdown has no place for presentational attributes |
| `<div>`, `<span>` wrappers | Unwrapped | Block divs add paragraph breaks; spans are inlined away |
| `<!-- comments -->` | Dropped | Including WordPress `<!-- wp:... -->` block comments |
| `<iframe>`, `<video>`, `<audio>`, `<canvas>` | Empty output | No Markdown equivalent; embeds are lost |
| `<script>` body | Leaks as text | Turndown core has no remove rule for scripts; `alert(1)` becomes visible text |
| `<style>` body | Leaks as text | CSS like `.x{color:red}` becomes visible text in the output |
| HTML entities (`&amp;`, `&lt;`, `&copy;`) | Decoded (&, <, ©) | DOMParser resolves entities before conversion |
| `&nbsp;` | Regular space | Non-breaking space collapses to an ordinary space |
Tier limits for HTML input
Limits apply to the character count of your HTML, not just file size. The character limit is separate from the byte limit.
| Plan | Max file size | Max characters | Files per run |
|---|---|---|---|
| Free | 1 MB | 500,000 | 1 |
| Pro | 10 MB | 5,000,000 | 10 |
| Pro-media | 50 MB | 20,000,000 | 50 |
| Developer | 500 MB | Unlimited | Unlimited |
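Because the two limits are independent, an input can pass one and fail the other. A minimal sketch of a pre-flight check follows, with the numbers copied from the table above. Several things here are assumptions: treating 1 MB as 1,000,000 bytes, counting characters as JavaScript string length, and the names `PLAN_LIMITS` and `checkHtmlAgainstPlan`, none of which come from the tool itself.

```javascript
// Limits from the tier table. 1 MB = 1,000,000 bytes is an assumption;
// the tool may use 1,048,576 instead.
const PLAN_LIMITS = {
  free: { maxBytes: 1_000_000, maxChars: 500_000 },
  pro: { maxBytes: 10_000_000, maxChars: 5_000_000 },
  "pro-media": { maxBytes: 50_000_000, maxChars: 20_000_000 },
  developer: { maxBytes: 500_000_000, maxChars: Infinity },
};

// Hypothetical check: reports both counts and whether the input fits the plan.
function checkHtmlAgainstPlan(html, plan) {
  const limits = PLAN_LIMITS[plan];
  if (!limits) throw new Error(`Unknown plan: ${plan}`);

  const chars = html.length; // UTF-16 code units (assumed counting method)
  const bytes = new TextEncoder().encode(html).length; // size as UTF-8

  return {
    chars,
    bytes,
    ok: chars <= limits.maxChars && bytes <= limits.maxBytes,
  };
}
```

Text made of multi-byte characters shows why both numbers matter: 400,000 copies of "€" stay under the Free character limit but encode to 1,200,000 bytes in UTF-8.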
Cookbook
Real before/after conversions from the running tool. Each shows the HTML you paste and the exact Markdown it produces.
A formatted paragraph with bold, italic, and a link
The bread-and-butter case. Note that emphasis uses underscores and strong uses asterisks — that's Turndown's default delimiter choice, and this tool does not change it.
HTML in:
<p class="lead">Hello <strong>world</strong> and <em>italics</em> with <a href="/x">a link</a>.</p>
Markdown out:
Hello **world** and _italics_ with [a link](/x).
An HTML table becomes a GFM pipe table
The GFM plugin converts tables that have a header row. The <thead>/<th> row is what makes the conversion fire.
HTML in:
<table>
<thead><tr><th>Name</th><th>Role</th></tr></thead>
<tbody><tr><td>Ada</td><td>Engineer</td></tr></tbody>
</table>
Markdown out:
| Name | Role |
| --- | --- |
| Ada | Engineer |
A code block keeps its language
A <pre><code class="language-js"> produces a fenced block with the js info string, ready for syntax highlighting on a static site.
HTML in:
<pre><code class="language-js">const x = 1;</code></pre>
Markdown out:
```js
const x = 1;
```
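How a class attribute turns into a fence info string can be illustrated with a short sketch. It mirrors the `language-` prefix convention described above but is not Turndown's source; `fenceLanguage` and `fencedBlock` are hypothetical names.

```javascript
// Extract the fence info string from a class attribute,
// e.g. "language-js" -> "js". Returns "" when no language class is present.
function fenceLanguage(classAttr) {
  const match = (classAttr || "").match(/language-(\S+)/);
  return match ? match[1] : "";
}

// Wrap code in a fenced block, tagging the opening fence with the language.
function fencedBlock(code, classAttr) {
  const fence = "`".repeat(3);
  const body = code.replace(/\n$/, ""); // drop one trailing newline, if any
  return `${fence}${fenceLanguage(classAttr)}\n${body}\n${fence}`;
}
```

A `<code>` element without a `language-*` class yields an untagged fence, which is worth checking during the review step if you rely on syntax highlighting.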
A nested list
Bullets use the * marker with three trailing spaces; sub-items indent by four spaces. The four-space indent matches the width of the parent's marker plus its padding, so the output parses as a nested list under CommonMark and GFM.
HTML in:
<ul><li>Setup<ul><li>Install deps</li></ul></li><li>Build</li></ul>
Markdown out:
*   Setup
    *   Install deps
*   Build
Strikethrough and task lists (GFM)
The GFM plugin adds strikethrough and checkbox lists. Note strikethrough uses a single tilde in this plugin version, and <del>, <s>, and <strike> all map to it.
HTML in:
<p><del>old plan</del></p>
<ul>
<li><input type="checkbox" checked> shipped</li>
<li><input type="checkbox"> pending</li>
</ul>
Markdown out:
~old plan~
* [x] shipped
* [ ] pending
Edge cases and what actually happens
A `<style>` block becomes visible text
Leaked: Turndown's core has no rule to remove <style> elements, so the CSS inside one (.x{color:red}) is emitted as plain text in the Markdown. If your source HTML has inline <style> blocks, strip them before pasting, or delete the resulting text after conversion.
A `<script>` body leaks into the output
Leaked: Same root cause as <style>: there's no default remove rule, so <script>alert(1)</script> produces the text alert(1) in the Markdown. This is the opposite of "scripts are stripped": scripts are not executed (DOMParser doesn't run them), but their source text survives. Remove <script> blocks from your HTML first.
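Since the tool has no settings, the practical fix for both leaks is to strip these elements before pasting. Below is a minimal sketch of such a pre-pass; `stripScriptAndStyle` is a hypothetical helper, and the regex is a rough filter for ordinary markup, not a full HTML parser.

```javascript
// Remove every <script>...</script> and <style>...</style> element,
// including the tags and everything between them.
function stripScriptAndStyle(html) {
  // \1 makes the closing tag match the opening one; [\s\S]*? spans newlines lazily.
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "");
}
```

If you run Turndown yourself rather than through this tool, its `remove()` method (for example `turndownService.remove(['script', 'style'])`) is the more robust way to get the same result.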
A table with no header row stays as raw HTML
By design: The GFM table rule only fires when the table has a header (<thead> or <th>). A table built only from <tr><td> rows is left as a raw <table>...</table> block in the Markdown. Add a header row to the source, or repair the result with md-table-repair.
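One way to add a header row to the source is to promote the first row's cells from <td> to <th>. The sketch below assumes the first row really does hold column labels, and that the input is a single, non-nested table; `promoteFirstRowToHeader` is a hypothetical helper, not part of the tool or of md-table-repair.

```javascript
// Turn the first row's <td> cells into <th> cells so the table has a header.
// Assumes `tableHtml` is one non-nested <table> whose first row is a label row.
function promoteFirstRowToHeader(tableHtml) {
  // Tables that already contain a <th> cell are returned unchanged.
  if (/<th[\s>]/i.test(tableHtml)) return tableHtml;

  // No "g" flag on the outer pattern: only the first <tr>...</tr> is rewritten.
  return tableHtml.replace(/<tr\b[^>]*>[\s\S]*?<\/tr>/i, (row) =>
    row.replace(/<(\/?)td\b/gi, "<$1th")
  );
}
```

After this change the first row consists only of <th> cells, which should satisfy the header requirement described above; rerun the conversion to confirm.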
JavaScript-rendered content is missing
Not supported: The tool converts the static HTML you give it. If a page builds its content client-side (React/Vue/Angular SPA), View Source returns an empty shell and the Markdown will be nearly empty. Use DevTools → Copy outerHTML on the rendered DOM instead of View Source.
Relative URLs pass through unchanged
Preserved: An <img src="../uploads/x.png"> becomes ![](../uploads/x.png) verbatim. The conversion does not resolve or rewrite paths, so relative links may break in their new location. Rewrite them with md-image-path-rewriter or validate them with md-link-validator.
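If you know the URL the HTML came from, relative paths in the converted Markdown can be resolved against it with the standard `URL` constructor. This is a hand-rolled sketch, not how md-image-path-rewriter works; `absolutizeMarkdownUrls` and the example base URL are hypothetical, and the regex only handles simple inline links and images.

```javascript
// Resolve relative URLs in [text](url) and ![alt](url) against a base URL.
function absolutizeMarkdownUrls(markdown, baseUrl) {
  // Group 1 is everything up to the opening "(", group 2 is the URL itself.
  // A quoted title after the URL is not matched and stays as it is.
  return markdown.replace(/(!?\[[^\]]*\]\()([^)\s]+)/g, (whole, prefix, url) => {
    // Leave absolute URLs, protocol-relative URLs, and in-page fragments alone.
    if (/^([a-z][a-z0-9+.-]*:|\/\/|#)/i.test(url)) return whole;
    return prefix + new URL(url, baseUrl).href;
  });
}
```

With a base of `https://example.com/blog/post/`, the image `![](../uploads/x.png)` resolves to `https://example.com/blog/uploads/x.png`.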
Embeds and media disappear
Dropped: <iframe> (YouTube, maps), <video>, <audio>, and <canvas> have no Markdown equivalent and produce empty output. The embed URL is lost. If you need it, copy the src from the source HTML manually before converting.
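Collecting embed URLs by hand gets tedious on long pages, so a small helper can list them before conversion. This is a sketch under simple assumptions: it only reads a quoted `src` attribute on the <iframe>, <video>, or <audio> tag itself, and misses <source> children. `listEmbedSources` is a hypothetical name.

```javascript
// List the src attribute of each <iframe>, <video>, and <audio> tag.
function listEmbedSources(html) {
  // Group 1: tag name, group 2: the quote character, group 3: the URL.
  const pattern = /<(iframe|video|audio)\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\2/gi;
  return Array.from(html.matchAll(pattern), (m) => ({
    tag: m[1].toLowerCase(),
    src: m[3],
  }));
}
```

Run it on the HTML you are about to paste and keep the returned list next to the Markdown, so each embed can be re-added as a plain link.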
Input exceeds your plan's character limit
Rejected: Free is capped at 500,000 characters; HTML is verbose, so a long article with inline SVG can hit this fast. The tool reports the exact character count and the limit. Trim the HTML to the article container, or upgrade for the 5,000,000-character Pro limit.
HTML comments are silently removed
Dropped: All <!-- ... --> comments, including WordPress Gutenberg block markers like <!-- wp:paragraph -->, are dropped. This is usually what you want, but if you relied on comments as content markers, they will not survive.
Empty anchors produce empty link syntax
Preserved: An <a href="#"></a> with no text becomes [](#), which is noise. These come from icon links and skip-nav anchors. Search-and-remove them after conversion, or strip such anchors from the source first.
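The search-and-remove step can be done with a single regular expression over the converted Markdown. The sketch below assumes the empty links look exactly like `[](target)`; `removeEmptyLinks` is a hypothetical helper.

```javascript
// Delete empty inline links such as [](#) from Markdown text.
function removeEmptyLinks(markdown) {
  // (?<!!) skips image syntax like ![](pic.png), which also has empty brackets.
  return markdown.replace(/(?<!!)\[\]\([^)]*\)/g, "");
}
```

Whitespace around a removed link is left in place, so a pass through md-prettifier afterwards may still be useful.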
Frequently asked questions
What library does this use?
Turndown 7.2.4 with the turndown-plugin-gfm extension, configured with headingStyle: "atx" and codeBlockStyle: "fenced". The same engine runs in both the browser and the server-safe path.
Are there any options to configure?
No. This tool has no settings — the conversion is deterministic. Paste or upload HTML and run it. The only choice is paste vs. upload as the input method.
Does it support GFM tables?
Yes, for tables that have a header row (<thead> or <th>). The GFM plugin converts them to pipe tables. A header-less table is left as raw HTML.
Will it strip scripts and styles?
Scripts are not executed, but their text leaks into the output, and <style> CSS leaks too. Turndown has no default remove rule for either. Remove <script> and <style> blocks from your HTML before converting.
Will inline styles and CSS classes be removed?
Yes. style, class, id, and data-* attributes are all dropped. Only semantic markup (headings, paragraphs, lists, links, images, code) is converted.
Are images downloaded?
No. An <img> becomes ![alt](src), a reference to the original URL. The image binary is never fetched or downloaded. You'll need to move image files separately.
Why are my links relative and possibly broken?
Turndown passes URLs through verbatim, including relative paths like ../uploads/x.png. It does not resolve them against a base URL. Rewrite paths with md-image-path-rewriter after converting.
Can it convert a whole live web page by URL?
No — it converts HTML you provide, not a URL it fetches. Copy the page's HTML (View Source, or DevTools Copy outerHTML) and paste it in. For JS-rendered pages, use the rendered DOM, not View Source.
What heading style does it produce?
ATX (# H1, ## H2), not the Setext underline style. This is configured explicitly and is more consistent and diff-friendly for repos.
Why is emphasis using underscores and bold using asterisks?
Those are Turndown's default delimiters: _italic_ and **bold**. This tool does not change them. Both render identically in standard Markdown.
Is my HTML uploaded anywhere?
No. The conversion runs entirely in your browser using the page's DOMParser. Nothing is sent to a server, which is why you can safely paste unpublished or proprietary HTML.
What file do I get back?
Markdown. You can copy it to the clipboard or download a .md file. An uploaded report.html downloads as report.md; pasted input downloads as input.md. To go the other way, use md-to-html.
Privacy first
All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.