Archive Blog HTML as Markdown — Future-Proof Backup (Free)

How to archive blog content as markdown

Step 1
Capture each post's body HTML: in DevTools, right-click the post's <article>/content element → Copy → Copy outerHTML, or open View Source (Ctrl+U) and copy only the post's markup. Capture the post body, not the whole page template, so nav and sidebars stay out of your archive.
Step 2
Paste or upload and convert: choose Paste text and drop the HTML in (or Upload file with a saved .html), then run. There are no options to set; Turndown produces the same Markdown for the same input.
Step 3
Save the images separately: the Markdown references images by their original URL, which dies with the host. Download each image (right-click → Save, or save the whole page's assets) into a folder alongside your .md, then rewrite the paths with md-image-path-rewriter.
Step 4
Add a metadata header: the tool outputs body Markdown only. Prepend a front-matter block with the post title, original publish date, source URL, and tags so the archive is self-describing. md-frontmatter-builder can scaffold it.
Step 5
Re-link any embeds: <iframe> embeds (YouTube, CodePen, tweets) are dropped. Note their URLs from the source HTML and add them back as Markdown links so the archived post still points to the original media.
Step 6
Commit to Git: store the .md and its image folder in a Git repo. That gives you version history, off-machine backup (push to a remote), and a format that outlives any blogging platform.

What survives the archive vs. what you must capture separately

A durable archive needs more than body text. This is the gap list for blog backups.

| Blog element | In the Markdown? | Action for a complete archive |
| --- | --- | --- |
| Post body (headings, text, lists) | Yes, converted | Done |
| Code blocks | Yes, fenced, with language if tagged | Done |
| Tables (with header row) | Yes, as a GFM pipe table | Header-less ones stay as HTML |
| Images | Reference only (`![alt](url)`) | Download files + rewrite paths |
| Post metadata (date, author, tags) | No | Add front matter manually |
| Comments | No (unless in the copied HTML) | Export comments separately if wanted |
| Embeds (YouTube, CodePen, tweets) | No, empty output | Note URLs and re-link |

Blog markup → Markdown

How typical blog HTML converts with this tool's Turndown config. Verified against the running converter.

| Blog markup | Markdown output | Notes |
| --- | --- | --- |
| `<h2>Section</h2>` | `## Section` | ATX heading; `id` attribute dropped |
| `<blockquote>` | `> quote` | Pull quotes preserved |
| `<figure><img><figcaption>` | `![alt](url)` + caption paragraph | Caption kept as following text |
| `<pre><code class="language-py">` | Fenced block tagged `py` | Language preserved for tech posts |
| `<a href="/2019/old-post/">` | `[text](/2019/old-post/)` | Internal links keep old slugs |
| `<iframe>` (embed) | (empty) | Re-link manually |
| `<del>` strikethrough | `~text~` | Single-tilde in this plugin version |

Tier limits for HTML input

Older posts with inline styling can be large; the character count is what's enforced.

| Plan | Max file size | Max characters | Files per run |
| --- | --- | --- | --- |
| Free | 1 MB | 500,000 | 1 |
| Pro | 10 MB | 5,000,000 | 10 |
| Pro-media | 50 MB | 20,000,000 | 50 |
| Developer | 500 MB | Unlimited | Unlimited |

Cookbook

Real blog-archiving scenarios and the Markdown they produce — plus the manual steps that make the backup complete.

A blog post body converts cleanly

Headings, a pull quote, and a paragraph archive as clean, durable Markdown.

HTML in:
<article><h2>Why I Left Medium</h2>
<blockquote><p>Own your content.</p></blockquote>
<p>Here's what I learned.</p></article>

Markdown out:
## Why I Left Medium

> Own your content.

Here's what I learned.

Image reference that needs the file saved

The image converts to a reference pointing at the original host. When the blog dies, that URL 404s — so download the file and rewrite the path.

HTML in:
<img src="https://oldblog.com/wp-content/uploads/cover.jpg" alt="Cover">

Markdown out:
![Cover](https://oldblog.com/wp-content/uploads/cover.jpg)

Archive fix:
  1. Save cover.jpg into ./images/
  2. md-image-path-rewriter → ![Cover](./images/cover.jpg)
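
If you prefer to script the path rewrite, the two steps above can be sketched in Python. This is a rough illustration, not how md-image-path-rewriter works internally: the function name `localize_images` and the `./images` default are assumptions made for the example, and it only handles `http(s)` image URLs without spaces or parentheses.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches Markdown images of the form ![alt](http(s)://...).
# Assumption: the URL contains no spaces or closing parentheses.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown: str, folder: str = "./images") -> tuple[str, list[str]]:
    """Point remote image references at `folder` and collect the URLs to download."""
    remote_urls: list[str] = []

    def swap(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        remote_urls.append(url)
        # Keep only the file name from the URL path, e.g. cover.jpg.
        filename = PurePosixPath(urlparse(url).path).name
        return f"![{alt}]({folder}/{filename})"

    return IMAGE_RE.sub(swap, markdown), remote_urls


md = "![Cover](https://oldblog.com/wp-content/uploads/cover.jpg)"
rewritten, to_download = localize_images(md)
print(rewritten)    # ![Cover](./images/cover.jpg)
print(to_download)  # ['https://oldblog.com/wp-content/uploads/cover.jpg']
```

The returned URL list is what you still have to download by hand or with a tool of your choice; the sketch deliberately does no network access. Two images with the same file name in different folders would collide, so check the list before saving.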

Add a metadata header for a self-describing archive

Body Markdown alone loses the publish date and source. Prepend front matter so the archived post stands on its own.

After conversion, prepend (md-frontmatter-builder):
---
title: "Why I Left Medium"
date: 2019-08-14
source: https://oldblog.com/why-i-left-medium
tags: [blogging, ownership]
---

## Why I Left Medium

> Own your content.
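
When archiving many posts, the same header can be generated rather than typed. The sketch below is a hand-rolled illustration that reproduces the block above; `build_front_matter` is a hypothetical helper, not part of md-frontmatter-builder, and it assumes tags are simple words that need no YAML quoting.

```python
def build_front_matter(title: str, date: str, source: str, tags: list[str]) -> str:
    """Return a YAML front-matter block followed by one blank line."""
    # Escape backslashes and double quotes so the quoted title stays valid YAML.
    escaped_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "---",
        f'title: "{escaped_title}"',
        f"date: {date}",
        f"source: {source}",
        f"tags: [{', '.join(tags)}]",  # assumption: tags contain no commas or brackets
        "---",
        "",
    ]
    return "\n".join(lines) + "\n"


body = "## Why I Left Medium\n\n> Own your content.\n"
front = build_front_matter(
    "Why I Left Medium",
    "2019-08-14",
    "https://oldblog.com/why-i-left-medium",
    ["blogging", "ownership"],
)
archived_post = front + body
print(archived_post)
```

For titles or tags with unusual characters, a real YAML library would be a safer choice than string formatting.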

A CodePen embed is lost — re-link it

Embedded demos vanish. Capture the URL from the source HTML and add a link so the archive still references the original.

HTML in:
<iframe src="https://codepen.io/user/embed/abcdef"></iframe>
<p>Live demo above.</p>

Markdown out:
Live demo above.

Archive fix — add the link by hand:
[Live demo on CodePen](https://codepen.io/user/pen/abcdef)
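
To avoid missing an embed, you can list every `<iframe>` source before converting. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are made up for the example, and the generic "Embedded media" link text is a placeholder you would replace by hand.

```python
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def embed_links(html: str) -> list[str]:
    """Return one Markdown link per iframe, in document order."""
    collector = IframeCollector()
    collector.feed(html)
    return [f"[Embedded media]({src})" for src in collector.sources]


html = '<iframe src="https://codepen.io/user/embed/abcdef"></iframe>\n<p>Live demo above.</p>'
links = embed_links(html)
print(links)  # ['[Embedded media](https://codepen.io/user/embed/abcdef)']
```

The sketch keeps the `src` exactly as found. As in the example above, you may want to swap an embed URL for the media's regular page URL before pasting the link into the archive.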

A technical post keeps its code highlighting

For dev blogs, fenced blocks with language tags mean your archived tutorials still render with syntax highlighting if you republish.

HTML in:
<pre><code class="language-py">def hello():
    print('hi')</code></pre>

Markdown out:
```py
def hello():
    print('hi')
```

Edge cases and what actually happens

Images are not downloaded

Not handled

<img> becomes ![alt](url) pointing at the original host. When the blog goes offline, those URLs break and your archive shows broken images. Download the files separately and rewrite paths with md-image-path-rewriter.

Post metadata is not captured

By design

Publish date, author, and tags live in the page template or the CMS, not the body HTML — so they're not in the Markdown. Add a front-matter block manually with md-frontmatter-builder so the archive is dated and attributed.

Comments are not archived

Not handled

Comment threads (Disqus, native, Webmentions) are usually loaded separately and won't be in the post body you copy. If comments matter, export them from the platform separately — this tool won't capture them.

Embeds (YouTube, CodePen, tweets) disappear

Dropped

<iframe> embeds produce empty output, so demos and videos vanish from the archive. Record each embed URL from the source HTML and re-add it as a Markdown link to keep the reference.

Internal links keep old slugs and will break

Preserved

Links to other posts keep the original blog's URLs (/2019/old-post/). After the blog dies these 404. Decide whether to point them at archived copies, the Wayback Machine, or your new site, and fix them after conversion.
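
Once you have picked a target, the rewrite itself is mechanical. Here is a small sketch that prefixes root-relative links with a new base URL; `repoint_internal_links` and `https://newsite.example` are hypothetical, and the pattern only covers inline links whose URL starts with a single `/`.

```python
import re

# Inline links whose target starts with a single "/" (root-relative).
# The lookbehind skips images (![alt](...)); "(?!/)" skips protocol-relative URLs.
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\((/(?!/)[^)\s]*)\)")


def repoint_internal_links(markdown: str, base: str) -> str:
    """Prefix root-relative link targets with `base`."""
    prefix = base.rstrip("/")
    return LINK_RE.sub(lambda m: f"[{m.group(1)}]({prefix}{m.group(2)})", markdown)


md = "See [text](/2019/old-post/) and ![Cover](/img/cover.jpg)."
fixed = repoint_internal_links(md, "https://newsite.example/")
print(fixed)
# See [text](https://newsite.example/2019/old-post/) and ![Cover](/img/cover.jpg).
```

Image paths are left alone on purpose, since those are handled by the separate image step.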

Theme `<style>`/`<script>` leaks if you copy the whole page

Leaked

Copying the full page instead of the <article> brings inline <style>/<script> along, and their text leaks into the Markdown. Capture only the post body element for a clean archive.
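
If you only have a full-page copy, stripping those elements before conversion is one workaround. The sketch below uses a regular expression, which is a rough approach that assumes well-formed, non-nested `<style>` and `<script>` tags; the function name is an illustration, not part of the tool.

```python
import re

# Matches a whole <script>...</script> or <style>...</style> element.
# Assumption: tags are well-formed and not nested inside each other.
STRIP_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def strip_style_and_script(html: str) -> str:
    """Remove <style> and <script> elements, including their contents."""
    return STRIP_RE.sub("", html)


page = "<style>p{color:red}</style><article><p>Hi</p></article><script>track()</script>"
cleaned = strip_style_and_script(page)
print(cleaned)  # <article><p>Hi</p></article>
```

This removes the leaked text but not navigation or sidebars, so copying just the post body element remains the cleaner route.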

Old posts with classic-editor tables stay as HTML

By design

Tables without a header row (common in old posts) aren't converted and remain raw <table> HTML. They still archive fine as text but won't render as clean tables — repair with md-table-repair if you care.
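
To find out in advance which posts are affected, you can count tables that lack header cells. This sketch assumes a table counts as having a header row when it contains at least one `<th>`, which is a simplification and may not match the converter's exact rule; the names are illustrative.

```python
from html.parser import HTMLParser


class HeaderlessTableFinder(HTMLParser):
    """Count tables and how many of them contain no <th> cell."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[bool] = []  # one "saw a <th>" flag per open table
        self.total = 0
        self.headerless = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append(False)
        elif tag == "th" and self._stack:
            self._stack[-1] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self.total += 1
            if not self._stack.pop():
                self.headerless += 1


def count_headerless_tables(html: str) -> tuple[int, int]:
    """Return (headerless tables, total tables)."""
    finder = HeaderlessTableFinder()
    finder.feed(html)
    return finder.headerless, finder.total


html = (
    "<table><tr><td>a</td></tr></table>"
    "<table><tr><th>h</th></tr><tr><td>b</td></tr></table>"
)
print(count_headerless_tables(html))  # (1, 2)
```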

Only the loaded part of a paginated post is captured

Partial

If a post is split across pages or lazy-loads sections, only the HTML present in the DOM at copy time converts. Load every part (or capture each page) before converting so the archive is complete.

A media-heavy post exceeds the character limit

Rejected

Old posts with inline base64 images or heavy styling can blow past the Free 500,000-character cap. The tool reports the count and limit. Strip inline data URIs (save those images as files) or upgrade to Pro's 5,000,000.
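
Stripping data URIs can be scripted before you paste the HTML in. The sketch below pulls base64 images out of `src` attributes and checks the remaining length against the Free cap from the tier table. It is an illustration under assumptions: the helper name and `inline-N` file names are made up, only double-quoted `src` attributes are handled, and Python's `len()` may not count characters exactly the way the tool does.

```python
import base64
import re

FREE_CHAR_LIMIT = 500_000  # Free-plan character cap from the tier table

# Double-quoted src attributes holding a base64 image data URI.
DATA_URI_RE = re.compile(r'src="data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=\s]+)"')


def extract_data_uris(html: str, folder: str = "./images") -> tuple[str, dict[str, bytes]]:
    """Replace inline base64 images with file paths; return the HTML and the image bytes."""
    images: dict[str, bytes] = {}

    def swap(match: re.Match) -> str:
        ext = "jpg" if match.group(1) == "jpeg" else match.group(1)
        name = f"inline-{len(images) + 1}.{ext}"
        images[name] = base64.b64decode(match.group(2))
        return f'src="{folder}/{name}"'

    return DATA_URI_RE.sub(swap, html), images


payload = base64.b64encode(b"fake-png-bytes").decode()
html = f'<img src="data:image/png;base64,{payload}" alt="x">'
cleaned, images = extract_data_uris(html)
fits_free_plan = len(cleaned) <= FREE_CHAR_LIMIT
print(cleaned)         # <img src="./images/inline-1.png" alt="x">
print(fits_free_plan)  # True
```

Writing the returned bytes to disk is left to the caller, for example with `pathlib.Path(name).write_bytes(data)` for each entry.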

Frequently asked questions

Will my images be archived too?

No. Images become ![alt](url) references to the original host, and those references break when the blog goes offline. Download the image files separately and rewrite the paths with md-image-path-rewriter.

Should I include the comments?

Comments are usually loaded separately and won't be in the post body you copy, so they aren't archived. If you want them, export them from the platform (e.g. Disqus) as a separate file.

What about blog metadata like date and author?

Not captured — the tool converts body HTML, and metadata lives in the template. Add a front-matter block manually with md-frontmatter-builder so the archive is self-describing.

Do embedded videos and demos survive?

No. <iframe> embeds produce empty output. Note their URLs from the source HTML and re-add them as Markdown links to preserve the reference.

Why Markdown for a long-term archive?

It's plain text — readable in any editor, diffable in Git, and importable into any future blog. No database, no proprietary format, no platform dependency.

Can I archive a whole blog at once?

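Not on the Free plan, which accepts 1 file per run; the tier table lists 10 files per run for Pro and 50 for Pro-media. Either way, each post body is converted as its own file, so capture and convert posts individually, or use a platform export to feed each body through programmatically.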

Do internal links still work after archiving?

They keep the original blog's URLs, which break once the blog is gone. Re-point them at archived copies or your new site after conversion.

Is anything I archive uploaded?

No. Conversion runs entirely in your browser, so you can archive private drafts and unpublished posts safely.

Will code samples keep their formatting?

Yes. <pre><code> becomes a fenced block, and a language-* class is kept as the fence language — important for dev-blog archives.

What's the best way to store the archive?

Put the .md files and an images folder in a Git repo and push to a remote. That gives you history, off-machine backup, and a durable format.

What if a post has inline base64 images?

Those inflate the character count and can exceed the Free 500,000 limit. Extract them to image files first, or upgrade to Pro's 5,000,000-character limit.

Can I re-publish the archive later?

Yes — that's the point. Markdown imports into Hugo, Astro, Ghost, and most platforms. Add front matter and rewrite image paths, and the post is ready to go live again.

Privacy first

All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.

