How to migrate WordPress / Drupal HTML to Markdown
- Step 1: Export each post's body HTML. In WordPress, the `post_content` column (or the REST API `content.rendered` field) holds the body. In Drupal, it's the body field's value. Export per post: this tool converts one post body at a time, not a whole WXR/XML dump.
- Step 2: Isolate the article body. Paste only the post body, not the theme template. If you copy from a live page, use DevTools → Copy outerHTML on the `<article>` or `.entry-content` element so the header, sidebar, and footer don't end up in your Markdown.
- Step 3: Paste or upload and run. Choose Paste text and drop the HTML in, or choose Upload file with a saved `.html` file. There are no options: click run and Turndown converts the body, dropping CMS classes and block comments.
- Step 4: Add front matter. This tool outputs body Markdown only. Prepend a YAML block with the post's title, date, slug, author, and tags from your CMS export. The md-frontmatter-builder tool can scaffold that block for you.
- Step 5: Fix image and link paths. WordPress images point at `/wp-content/uploads/...`. Rewrite those to your SSG's static path with md-image-path-rewriter, then catch any broken links with md-link-validator.
- Step 6: Spot-check before bulk migrating. Convert a few representative posts first: one with a code block, one with a table, one with embeds. Confirm that tables converted (header-less ones won't) and that no `<style>`/`<script>` text leaked. Then process the rest one by one.
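Step 1 can be scripted against the WordPress REST API. Below is a minimal Python sketch. It assumes the site exposes the standard `/wp-json/wp/v2/posts` route publicly. `fetch_posts_page` and `split_posts` are hypothetical helper names. Each record's `body_html` is the `content.rendered` value you would then convert one post at a time.

```python
import json
from urllib.request import urlopen


def fetch_posts_page(site: str, page: int = 1, per_page: int = 100) -> str:
    """Download one page of published posts from the WordPress REST API."""
    url = f"{site.rstrip('/')}/wp-json/wp/v2/posts?per_page={per_page}&page={page}"
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def split_posts(api_json: str) -> list[dict]:
    """Turn one /wp/v2/posts response into one record per post.

    body_html is the rendered body, i.e. what you would paste into the tool.
    """
    records = []
    for post in json.loads(api_json):
        records.append({
            "slug": post["slug"],
            "title": post["title"]["rendered"],
            "date": post["date"],
            "body_html": post["content"]["rendered"],
        })
    return records
```

Request page 1, 2, 3 and so on. The `X-WP-TotalPages` response header reports how many pages exist. Keeping `slug`, `title`, and `date` alongside the body means the Step 4 front matter can be filled from the same records.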
WordPress / Drupal markup → Markdown
How common CMS HTML patterns convert with this tool's exact Turndown config. Verified against the running converter.
| CMS markup | Markdown output | Notes |
|---|---|---|
| `<!-- wp:paragraph -->` | (removed) | All Gutenberg block comments are dropped |
| `<p class="has-text-align-center">` | plain paragraph | Alignment classes dropped; only text survives |
| `<figure class="wp-block-image"><img><figcaption>` | `![alt](src)` + caption paragraph | Caption text kept as a following paragraph, not a real caption |
| `<pre class="wp-block-code"><code>` | fenced block | Language tag only if a `language-*` class is present |
| Drupal `<div class="field__item">` | unwrapped content | Field wrappers removed; inner content converts |
| `<table>` (WP block table, with header) | GFM pipe table | Header row required for conversion |
| `<iframe>` (oEmbed: YouTube, Twitter) | (empty) | Embeds are lost; copy the URL manually |
| `<a href="/wp-content/uploads/x.pdf">` | `[text](/wp-content/uploads/x.pdf)` | Path passed through verbatim; rewrite it for the SSG |
What the converter does NOT do (and where to finish)
A CMS migration needs more than body conversion. These are the gaps and the sibling tools that fill them.
| Migration task | Handled here? | Where to do it |
|---|---|---|
| Convert post body HTML → Markdown | Yes | This tool |
| Write YAML front matter (title/date/tags) | No | md-frontmatter-builder |
| Rewrite /wp-content/uploads/ paths | No | md-image-path-rewriter |
| Validate internal links after migration | No | md-link-validator |
| Download media files | No | Your CMS export / a separate sync |
| Convert oEmbeds (YouTube, tweets) | No | Re-add shortcodes/embeds manually |
Tier limits for HTML input
Limits apply to character count as well as file size, and verbose CMS HTML can hit the Free character limit on a long post.
| Plan | Max file size | Max characters | Files per run |
|---|---|---|---|
| Free | 1 MB | 500,000 | 1 |
| Pro | 10 MB | 5,000,000 | 10 |
| Pro-media | 50 MB | 20,000,000 | 50 |
| Developer | 500 MB | Unlimited | Unlimited |
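Before a bulk run, you can predict which posts would be rejected. The Python sketch below checks only the character column and ignores file size. The plan names and caps come from the table. The inclusive comparison and the `len()`-based count are assumptions, and `smallest_plan_for` is a hypothetical helper.

```python
# Character caps per plan, taken from the table above (None = unlimited).
CHAR_LIMITS = {
    "Free": 500_000,
    "Pro": 5_000_000,
    "Pro-media": 20_000_000,
    "Developer": None,
}


def smallest_plan_for(html: str) -> str:
    """Return the first plan whose character cap fits this HTML body.

    Assumes the cap is inclusive and counted like Python's len(); a browser
    tool may count UTF-16 code units instead, so leave headroom for emoji.
    """
    n = len(html)
    for plan, limit in CHAR_LIMITS.items():
        if limit is None or n <= limit:
            return plan
    raise ValueError("no plan fits")  # not reached: Developer is unlimited
```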
Cookbook
Real WordPress/Drupal HTML patterns and the Markdown this tool produces. Use these to predict what your export will look like before bulk-converting.
Gutenberg paragraph block
WordPress wraps every block in HTML comments. They are dropped, leaving clean Markdown — no manual cleanup needed.
HTML in:
<!-- wp:paragraph -->
<p>Welcome to the <strong>new</strong> site.</p>
<!-- /wp:paragraph -->
Markdown out:
Welcome to the **new** site.
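Because the block comments disappear, the output no longer shows which block types a post used. To pick representative posts for a spot-check, you can take a rough inventory of the raw `post_content`, where the comments still exist. The rendered HTML typically no longer contains them. This is a minimal sketch: `block_types` is a hypothetical helper, and the regex only recognizes the `<!-- wp:name ... -->` delimiter form shown above.

```python
import re
from collections import Counter

# Opening delimiters look like <!-- wp:paragraph --> or <!-- wp:image {"id":5} -->;
# closing ones start with "/wp:" and are deliberately not matched.
BLOCK_OPEN = re.compile(r"<!--\s*wp:([a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)")


def block_types(post_content: str) -> dict[str, int]:
    """Count how many times each Gutenberg block type opens in a post."""
    return dict(Counter(BLOCK_OPEN.findall(post_content)))
```

Posts whose counts include `table`, `code`, or `embed` are good candidates for the Step 6 spot-check.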
WordPress image figure with caption
A wp-block-image figure becomes a Markdown image followed by the caption as a plain paragraph. The /wp-content/uploads/ path stays as-is and must be rewritten for your SSG.
HTML in:
<figure class="wp-block-image">
<img src="/wp-content/uploads/2023/05/hero.jpg" alt="Hero"/>
<figcaption>Our launch event</figcaption>
</figure>
Markdown out:
![Hero](/wp-content/uploads/2023/05/hero.jpg)
Our launch event
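md-image-path-rewriter is the intended tool for this step. If you would rather script it, the idea is a regex over Markdown image targets. Below is a minimal sketch. It assumes media is synced under a hypothetical `/images/` prefix with the year/month folders kept. `rewrite_upload_paths` is a hypothetical helper. Images with a title (`![a](src "title")`) are deliberately left untouched.

```python
import re

# Matches Markdown image targets that still point at WordPress uploads, such as
# ![Hero](/wp-content/uploads/2023/05/hero.jpg), optionally on an absolute URL.
UPLOAD_IMG = re.compile(
    r"(!\[[^\]]*\]\()(?:https?://[^/)\s]+)?/wp-content/uploads/([^)\s]+)\)"
)


def rewrite_upload_paths(markdown: str, new_prefix: str = "/images/") -> str:
    """Point image links at a new static prefix, keeping the year/month folders."""
    return UPLOAD_IMG.sub(lambda m: f"{m.group(1)}{new_prefix}{m.group(2)})", markdown)
```

Plain links such as PDFs under `/wp-content/uploads/` are not images, so this sketch leaves them for a separate pass.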
Code block from a docs post
If the source uses a language-* class, the fence keeps the language for highlighting on your new theme.
HTML in:
<pre class="wp-block-code"><code class="language-php">echo 'hi';</code></pre>
Markdown out:
```php
echo 'hi';
```
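The language lookup amounts to a regex over the `<code>` element's class attribute. The Python sketch below shows that mapping. It is an illustration, not the tool's source, and is useful if you post-process code blocks yourself. `to_fenced_block` is a hypothetical helper. The sketch ignores the longer fence a real converter needs when the code itself contains triple backticks.

```python
import re

FENCE = "`" * 3  # three backticks, built up to avoid a literal fence in this snippet


def to_fenced_block(code_class: str, code: str) -> str:
    """Build a fenced block, using the first language-* class token as the info string."""
    match = re.search(r"language-(\S+)", code_class or "")
    lang = match.group(1) if match else ""
    return f"{FENCE}{lang}\n{code}\n{FENCE}"
```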
Body conversion plus front matter (two-step)
The tool gives you the body; you prepend front matter from your CMS export. This is the standard Hugo/Astro post shape.
Step 1, convert body HTML to Markdown with this tool:
## Introduction
Text of the post...
Step 2, prepend front matter with md-frontmatter-builder:
---
title: "My Post"
date: 2023-05-01
slug: my-post
tags: [migration, hugo]
---
## Introduction
Text of the post...
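If you script Step 2 instead of using md-frontmatter-builder, a few lines of Python cover the shape shown above. This is a minimal sketch. The field set mirrors the example, and `front_matter` and `with_front_matter` are hypothetical helpers. It assumes tags are plain words.

```python
import json


def front_matter(title: str, date: str, slug: str, tags: list[str]) -> str:
    """Build a YAML front matter block with the fields used in the example above."""
    lines = [
        "---",
        # json.dumps yields a double-quoted string that YAML also accepts.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {date}",
        f"slug: {slug}",
        # Assumes tags are plain words; quote them if they contain commas or colons.
        f"tags: [{', '.join(tags)}]",
        "---",
    ]
    return "\n".join(lines) + "\n"


def with_front_matter(body_md: str, **meta) -> str:
    """Prepend front matter to converted body Markdown."""
    return front_matter(**meta) + body_md
```

Quoting the title through `json.dumps` keeps titles containing colons or quotes valid YAML, which is the most common way hand-written front matter breaks.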
A YouTube embed disappears
oEmbeds become empty output. Note the URL from the source HTML and re-add it as an SSG shortcode after conversion.
HTML in:
<figure class="wp-block-embed">
<iframe src="https://www.youtube.com/embed/abc123"></iframe>
</figure>
Markdown out:
(empty — the iframe produces nothing)
Fix: re-add it as a shortcode for your SSG, e.g. Hugo's built-in youtube shortcode:
{{< youtube abc123 >}}
Edge cases and what actually happens
Front matter is not generated
By design: This tool converts body HTML only, so the output contains no YAML/TOML front matter. Your SSG needs title, date, slug, and tags; pull those from the CMS export and prepend them with md-frontmatter-builder.
Image files are not migrated
Not handled: `<img>` becomes `![alt](src)` referencing the original /wp-content/uploads/... URL. The binary is never downloaded. Sync media separately and rewrite the paths with md-image-path-rewriter.
oEmbeds (YouTube, tweets, maps) are lost
Dropped: WordPress auto-embeds render as `<iframe>`, which has no Markdown equivalent and produces empty output. Record the embed URLs from the source HTML and re-add them as your SSG's shortcodes.
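Because the iframe vanishes, harvest the URLs before converting. Below is a minimal sketch. It assumes embeds appear as `<iframe src="...">` with double-quoted attributes. Embeds that are not iframes are not caught. It emits Hugo's `youtube` shortcode for YouTube and a TODO line for anything else. `embed_todo` is a hypothetical helper name.

```python
import re

IFRAME_SRC = re.compile(r'<iframe\b[^>]*\bsrc="([^"]+)"', re.I)
YOUTUBE_ID = re.compile(r"youtube(?:-nocookie)?\.com/embed/([\w-]+)")


def embed_todo(body_html: str) -> list[str]:
    """List a replacement line for every iframe, in document order."""
    todo = []
    for src in IFRAME_SRC.findall(body_html):
        match = YOUTUBE_ID.search(src)
        if match:
            # Hugo's built-in YouTube shortcode; adapt for other SSGs.
            todo.append("{{< youtube " + match.group(1) + " >}}")
        else:
            todo.append("TODO embed: " + src)
    return todo
```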
A classic-editor table without a header stays as HTML
By design: Older WordPress posts often have tables built from `<tr><td>` with no header row. The GFM table rule needs `<thead>`/`<th>`, so a header-less table is left as raw `<table>` HTML. Add a header row or repair the table with md-table-repair.
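If the first row of a header-less table really is a header, you can promote it before converting. This is a regex sketch. It assumes simple markup with no nested tables, and it relies on the FAQ's statement that a `<th>` row counts as a header. `promote_first_row` is a hypothetical helper.

```python
import re

FIRST_ROW = re.compile(r"<tr\b[^>]*>.*?</tr>", re.S | re.I)


def promote_first_row(table_html: str) -> str:
    """Turn the first row's <td> cells into <th> so the table has a header row."""
    if re.search(r"<th\b", table_html, re.I):
        return table_html  # already has header cells; leave it alone
    match = FIRST_ROW.search(table_html)
    if not match:
        return table_html
    row = re.sub(r"<td\b", "<th", match.group(0), flags=re.I)
    row = re.sub(r"</td\s*>", "</th>", row, flags=re.I)
    return table_html[: match.start()] + row + table_html[match.end():]
```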
Theme `<style>` blocks leak as text
Leaked: If you copy a full page (with inline `<style>` from the theme) instead of just the post body, the CSS becomes visible text in the Markdown. Always isolate the `.entry-content` or `<article>` element before converting.
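Isolating the body is the real fix. When you can only get the full page, a pre-filter that deletes whole `<style>` and `<script>` elements keeps their text out of the output. This is a minimal regex sketch, and `strip_style_and_script` is a hypothetical helper. It does not remove the header, sidebar, or footer markup.

```python
import re

# Removes whole <style>...</style> and <script>...</script> elements, including
# the CSS/JS text inside them that would otherwise leak into the Markdown.
STYLE_SCRIPT = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>", re.S | re.I)


def strip_style_and_script(html: str) -> str:
    return STYLE_SCRIPT.sub("", html)
```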
WordPress shortcodes survive as literal text
Preserved: Unprocessed shortcodes such as `[gallery ids="1,2"]` or `[caption]` that weren't rendered to HTML pass through as plain text. Convert the rendered HTML (`content.rendered`) rather than the raw `post_content`, or clean up shortcodes manually.
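To find out which posts need manual cleanup, you can scan the input for shortcode-like tokens before converting. This is a rough regex, not WordPress's own `get_shortcode_regex()`, so it also flags bracketed words in ordinary prose. `find_shortcodes` is a hypothetical helper.

```python
import re

# A shortcode-like token: [name], [name attrs...], or the closing [/name].
SHORTCODE = re.compile(r"\[(/?)([A-Za-z][\w-]*)(?:\s[^\]]*)?\]")


def find_shortcodes(post_content: str) -> list[str]:
    """Return shortcode names in first-seen order, skipping closing tags."""
    names = []
    for closing, name in SHORTCODE.findall(post_content):
        if not closing and name not in names:
            names.append(name)
    return names
```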
Drupal inline styles disappear cleanly
Dropped: Drupal's WYSIWYG editor often adds inline style attributes (font sizes, colors). These are dropped, which is the desired outcome for a clean Markdown migration; the text content is preserved.
A long post exceeds the character limit
Rejected: CMS HTML is verbose, so a long article with many figures can exceed the Free plan's 500,000-character cap. The tool reports the count and the limit. Convert the body field (not the full page) or upgrade to Pro for its 5,000,000-character limit.
Anchor links to other posts may break
Preserved: Internal links keep their original CMS URLs (e.g. /2023/05/01/old-slug/). If your SSG uses a different URL scheme, these will 404. Validate and remap them with md-link-validator after migrating.
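When the old site used day-and-name permalinks, the remapping is often mechanical. Below is a minimal sketch. It assumes a hypothetical new scheme of `/posts/<slug>/` and relative links like the example above. Links carrying a `#fragment` or query string are left alone. `remap_dated_links` is a hypothetical helper.

```python
import re

# Matches Markdown link targets in the /YYYY/MM/DD/slug/ permalink format,
# with or without the trailing slash.
DATED_LINK = re.compile(r"\]\(/\d{4}/\d{2}/\d{2}/([\w-]+)/?\)")


def remap_dated_links(markdown: str, new_base: str = "/posts/") -> str:
    """Rewrite old date-based post links to new_base + slug + '/'."""
    return DATED_LINK.sub(lambda m: f"]({new_base}{m.group(1)}/)", markdown)
```

Running md-link-validator afterwards still matters, since a slug that changed during migration will pass this rewrite and still 404.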
Frequently asked questions
Can I upload a WordPress WXR/XML export?
No. This tool converts one post's body HTML at a time, not a full WXR dump. Extract each post's content.rendered HTML (REST API) or post_content and convert them individually.
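If a WXR file is all you have, you can split it yourself and feed the bodies in one at a time. Below is a minimal sketch using the standard library's ElementTree. `posts_from_wxr` is a hypothetical helper. The `wp:` namespace URI is versioned, so the sketch matches on local names. `content:encoded` holds raw `post_content`, so shortcodes are unrendered. Classic-editor posts may also lack `<p>` tags, which WordPress normally adds at render time.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def _child_text(item: ET.Element, local_name: str) -> str:
    # Match "{any-wp-namespace}local_name" so export versions 1.0-1.2 all work.
    for child in item:
        if child.tag.endswith("}" + local_name):
            return child.text or ""
    return ""


def posts_from_wxr(wxr_xml: str) -> list[dict]:
    """Return slug and raw body HTML for every item whose post_type is 'post'."""
    root = ET.fromstring(wxr_xml)
    posts = []
    for item in root.iter("item"):
        if _child_text(item, "post_type") != "post":
            continue  # skip pages, attachments, menu items, etc.
        posts.append({
            "slug": _child_text(item, "post_name"),
            "body_html": item.findtext(CONTENT_NS + "encoded") or "",
        })
    return posts
```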
Are Gutenberg block comments removed?
Yes. All <!-- wp:... --> and <!-- /wp:... --> comments are dropped automatically, leaving clean body Markdown with no block markers.
Will it create front matter for Hugo/Astro?
No — it outputs body Markdown only. Add front matter from your CMS export using md-frontmatter-builder.
What happens to my images?
An `<img>` becomes `![alt](src)` referencing the original upload URL. Files aren't downloaded. Sync media separately and rewrite paths for your SSG.
Do YouTube and tweet embeds survive?
No. <iframe> embeds produce empty output. Note the embed URLs from the source HTML and re-add them as SSG shortcodes.
Are CMS classes and inline styles removed?
Yes. wp-block-* classes, Drupal field wrappers, and inline style/id attributes are all dropped. Only semantic content converts.
Why didn't my old table convert?
Tables need a header row (<thead> or <th>) for the GFM plugin to convert them. Header-less classic-editor tables are left as raw HTML; repair them with the table-repair tool.
Do code blocks keep their language?
Yes, if the source `<code>` has a `language-*` class (e.g. language-php). The opening fence then becomes `` ```php ``. Without the class, you get a plain fence.
Should I convert raw post_content or rendered HTML?
Convert the rendered HTML (content.rendered). Raw post_content may contain unprocessed shortcodes that survive as literal text in the Markdown.
Is my content sent to a server?
No. Conversion runs entirely in your browser, so you can safely paste unpublished or private post bodies.
How do I handle hundreds of posts?
Free converts 1 file per run (Pro: 10, Pro-media: 50, Developer: unlimited). Convert a few representative posts first to validate the pattern, then script your CMS export to feed each post body through and prepend front matter programmatically.
How do I check links after migrating?
Internal links keep their old CMS URLs. Run the converted Markdown through md-link-validator to catch links that won't resolve under your new SSG's URL scheme.
Privacy first
All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.