How to convert a WordPress WXR export to JSON
- Step 1: Export WXR from WordPress. Go to WordPress Admin → Tools → Export. Choose 'All content' or a single type (Posts, Pages, or a registered custom post type), then 'Download Export File'. You get a .xml WXR file.
- Step 2: Open the XML to JSON tool. This is a Pro tool. Drop the .xml file onto the dropzone. The free tier handles files up to 2 MB for evaluation; signed-in Pro raises the per-file limit to 100 MB, which is useful because WXR exports with embedded HTML bodies grow fast.
- Step 3: Decide whether to strip namespaces. Leave Strip namespaces OFF to keep the precise WXR keys (wp:status, content:encoded), which is safest for a faithful map. Turn it ON only if your importer needs bare keys, and read the encoded collision caveat below before doing so.
- Step 4: Keep type coercion sensible for content. WordPress IDs (wp:post_id) are numeric and safe to coerce, so leave Coerce types on for those. There is little prose-numeric ambiguity in WXR, so the default is fine; turn it off if you want every field as a string for a uniform importer.
- Step 5: Convert and locate the items. Click Convert to JSON. The post records live under rss.channel.item, which is an array when there are multiple posts and a single object when there is one. Each item carries title, body, dates, taxonomy, and meta.
- Step 6: Filter and reshape, then import. The tool does not drop drafts/revisions/attachments. Use json-path-extractor with a filter like $.rss.channel.item[?(@.wp:status=='publish')], or json-key-filter to keep only the fields your CMS needs, then run your import loop (Contentful management API, Sanity client.create(), etc.).
WXR fields after conversion
Standard WXR item fields and their JSON shape (keys shown with Strip namespaces OFF). Values illustrative.
| WXR element | JSON key & value | Notes |
|---|---|---|
| <title> | title: "My Post" | Plain string |
| <content:encoded><![CDATA[..]]></content:encoded> | "content:encoded": "<p>HTML body</p>" | CDATA unwrapped to one HTML string |
| <wp:status>publish</wp:status> | "wp:status": "publish" | Common values: publish / draft / pending / private / inherit |
| <wp:post_type>post</wp:post_type> | "wp:post_type": "post" | Common types: post / page / attachment / nav_menu_item / revision |
| <wp:post_id>42</wp:post_id> | "wp:post_id": 42 | Number with Coerce types on |
| <category domain="category">News</category> (repeated) | category: [ {"#text":"News","@domain":"category"}, .. ] | Repeated → array of objects (attributes as @-keys) |
| <wp:postmeta><wp:meta_key>k</wp:meta_key>.. (repeated) | "wp:postmeta": [ {"wp:meta_key": .., "wp:meta_value": ..}, .. ] | Array when multiple meta; object when one |
Option matrix for WXR
The real controls and their effect on a WordPress export. attributePrefix (@) and textNodeName (#text) are fixed defaults, not in the UI.
| Option | WXR effect | Recommendation |
|---|---|---|
| Strip namespaces OFF | Keeps wp:status, content:encoded, dc:creator — faithful WXR names | Default — safest for a complete map |
| Strip namespaces ON | wp:status→status, content:encoded→encoded, excerpt:encoded→encoded (COLLISION) | Use with caution — body/excerpt collide on encoded |
| Parse attributes ON | <category domain="post_tag" nicename="x">→@domain,@nicename keys | Keep on — taxonomy domain/nicename matter for import |
| Coerce types | wp:post_id,wp:post_parent→numbers; wp:is_sticky→0/1 number | Keep on; turn off only if you want all-string keys |
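If Coerce types is turned off, every value arrives as a string, so the importer has to convert IDs itself. The following is a minimal sketch of that step, assuming the prefixed keys shown in the matrix above. The helper name `normaliseIds` and the `NUMERIC_KEYS` list are illustrative and not part of the tool.

```javascript
// Hypothetical helper: makes the numeric WXR fields numbers whether or not
// Coerce types was on during conversion. The key list is an assumption.
const NUMERIC_KEYS = ["wp:post_id", "wp:post_parent", "wp:is_sticky"];

function normaliseIds(item) {
  const out = { ...item }; // shallow copy, the input item is not mutated
  for (const key of NUMERIC_KEYS) {
    const value = out[key];
    // Only convert strings that are clean integers; leave anything else alone.
    if (typeof value === "string" && /^-?\d+$/.test(value.trim())) {
      out[key] = Number(value);
    }
  }
  return out;
}

// Same item as it would look with Coerce types off and on.
const fromStrings = normaliseIds({ "wp:post_id": "42", "wp:is_sticky": "0", title: "7 tips" });
const fromNumbers = normaliseIds({ "wp:post_id": 42, "wp:is_sticky": 0, title: "7 tips" });
console.log(fromStrings, fromNumbers);
```

Restricting the conversion to a known key list keeps prose fields such as titles untouched, even when they happen to start with a digit.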
Cookbook
Real WXR fragments and the JSON the converter returns, plus the follow-up tools that filter and reshape for a headless import.
A WXR item with HTML body and meta
Example: The post body lives in content:encoded wrapped in CDATA; the converter unwraps it to a clean HTML string. Custom fields come through as wp:postmeta pairs.
Input (WXR item):
<item>
<title>Launch Day</title>
<wp:post_id>42</wp:post_id>
<wp:status>publish</wp:status>
<content:encoded><![CDATA[<p>We shipped!</p>]]></content:encoded>
<wp:postmeta>
<wp:meta_key>_thumbnail_id</wp:meta_key>
<wp:meta_value><![CDATA[88]]></wp:meta_value>
</wp:postmeta>
</item>
Output (Strip namespaces OFF, Coerce types ON):
{
"title": "Launch Day",
"wp:post_id": 42,
"wp:status": "publish",
"content:encoded": "<p>We shipped!</p>",
"wp:postmeta": { "wp:meta_key": "_thumbnail_id",
"wp:meta_value": 88 }
}
The Strip-namespaces collision you must avoid
Example: Stripping namespaces turns BOTH content:encoded and excerpt:encoded into the same encoded key. fast-xml-parser merges same-named siblings into an array, so you can no longer tell which is the body and which is the excerpt. Leave Strip namespaces OFF for WXR.
Input:
<item>
<excerpt:encoded><![CDATA[Short teaser]]></excerpt:encoded>
<content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
</item>
Strip namespaces ON (BAD):
{ "encoded": [ "Short teaser", "<p>Full body</p>" ] }
-> which is which? order-dependent and fragile
Strip namespaces OFF (GOOD):
{ "excerpt:encoded": "Short teaser",
"content:encoded": "<p>Full body</p>" }
Filtering out drafts and revisions after conversion
Example: The tool converts everything, including drafts, revisions, and attachments. Use json-path-extractor to keep only published posts before you write the import loop, so you do not push revisions into the new CMS.
After conversion you have:
rss.channel.item = [ ..mixed.. ]
/tool/json-path-extractor expression:
$.rss.channel.item[?(@.wp:post_type=='post')]
then chain:
$.[?(@.wp:status=='publish')]
=> only published posts of type 'post' remain; revisions (wp:post_type=='revision') and attachments (wp:post_type=='attachment') are excluded.
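The same filter can also be applied in plain JavaScript inside the importer, which avoids depending on any particular JSONPath dialect. This is a sketch only, assuming the converted document is already parsed into an object. `publishedPosts` and the sample data are illustrative names, not part of the tool.

```javascript
// Hypothetical helper: keeps only items that are posts AND published.
function publishedPosts(data) {
  // [].concat normalises a single-item export (object) into an array.
  const items = [].concat(data?.rss?.channel?.item ?? []);
  return items.filter(
    (item) => item["wp:post_type"] === "post" && item["wp:status"] === "publish"
  );
}

// Illustrative sample of a mixed export after conversion.
const sample = { rss: { channel: { item: [
  { title: "Live post", "wp:post_type": "post", "wp:status": "publish" },
  { title: "Draft post", "wp:post_type": "post", "wp:status": "draft" },
  { title: "Old revision", "wp:post_type": "revision", "wp:status": "inherit" },
  { title: "Menu link", "wp:post_type": "nav_menu_item", "wp:status": "publish" },
] } } };

console.log(publishedPosts(sample).map((p) => p.title)); // [ 'Live post' ]
```

Bracket notation (`item["wp:status"]`) is needed because the prefixed keys contain a colon and cannot be read with dot access.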
Single post vs. many posts
Example: Export one post and item is an object; export several and it is an array. Your import loop must handle both, or always test against a multi-post export.
One post:
{ "rss": { "channel": { "item": { "title": "Only post" } } } }
Many posts:
{ "rss": { "channel": { "item": [ {..}, {..} ] } } }
Node import loop (safe):
const items = [].concat(data.rss.channel.item ?? []);
for (const post of items) { await cms.create(map(post)); }
Mapping WXR meta into target custom fields
Example: Each post can carry several wp:postmeta pairs. Flatten them into a lookup so your importer can address fields by name, then map to the destination schema.
Converted item.wp:postmeta:
[ { "wp:meta_key": "seo_title", "wp:meta_value": "Launch" },
{ "wp:meta_key": "reading_time", "wp:meta_value": 3 } ]
Reshape in your importer:
const meta = Object.fromEntries(
[].concat(item['wp:postmeta'] ?? [])
.map(m => [m['wp:meta_key'], m['wp:meta_value']]));
// meta.seo_title === 'Launch', meta.reading_time === 3
Use /tool/json-key-filter first to drop meta you do not need.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Strip namespaces collides content:encoded and excerpt:encoded
Key collision: Both fields strip to encoded. Because they are siblings, fast-xml-parser merges them into an encoded array, and you can no longer tell body from excerpt reliably. For WXR, leave Strip namespaces OFF and keep the prefixed keys, then rename in your importer if needed.
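Renaming in the importer keeps body and excerpt distinct because each prefixed key is mapped explicitly. Below is a sketch under the assumption that the target schema wants `body`, `excerpt`, and `status`. The `RENAMES` table and the `renameKeys` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical rename table: prefixed WXR key -> field name in the target CMS.
const RENAMES = new Map([
  ["content:encoded", "body"],
  ["excerpt:encoded", "excerpt"],
  ["wp:status", "status"],
]);

function renameKeys(item, renames = RENAMES) {
  // Keys without an entry in the table pass through unchanged.
  return Object.fromEntries(
    Object.entries(item).map(([key, value]) => [renames.get(key) ?? key, value])
  );
}

const renamed = renameKeys({
  title: "Launch Day",
  "excerpt:encoded": "Short teaser",
  "content:encoded": "<p>Full body</p>",
  "wp:status": "publish",
});
console.log(renamed);
```

Because the mapping is one-to-one, two source keys can only collide if the table itself assigns them the same target name.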
Drafts, revisions, and attachments are NOT filtered
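Not filtered: WXR includes every wp:post_type (post, page, revision, attachment, nav_menu_item), and the tool converts them all. Filter after conversion with json-path-extractor on both type and status ([?(@.wp:post_type=='post')] chained with [?(@.wp:status=='publish')]) so you do not import revisions or menu items into the new CMS. A status filter alone is not enough, because menu items are normally stored with status publish.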
A single <item> is an object, not an array
Singleton trap: Export with one post and rss.channel.item is an object; with many it is an array. An import loop that assumes an array will misbehave on a one-post export. Normalise with [].concat(item ?? []) or test against a multi-post export.
Media is referenced, not embedded
Expected: Attachment items carry the file URL in <guid> / <wp:attachment_url>, but WXR never embeds binary image data. After conversion you must fetch each URL and re-upload to your new media store. The tool surfaces the URLs; it cannot download the images.
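A first step towards re-uploading media is collecting the attachment URLs from the converted items. The sketch below assumes Strip namespaces is off, so the URL sits on the `wp:attachment_url` key. `attachmentUrls` is an illustrative name, the URLs are placeholders, and the download and upload calls are left to your own script.

```javascript
// Hypothetical helper: lists the unique file URLs of all attachment items.
function attachmentUrls(data) {
  const items = [].concat(data?.rss?.channel?.item ?? []);
  const urls = items
    .filter((item) => item["wp:post_type"] === "attachment")
    .map((item) => item["wp:attachment_url"])
    .filter((url) => typeof url === "string" && url.length > 0);
  return [...new Set(urls)]; // de-duplicate, keeping first-seen order
}

// Illustrative sample with one duplicated attachment.
const example = { rss: { channel: { item: [
  { "wp:post_type": "post", title: "Launch Day" },
  { "wp:post_type": "attachment", "wp:attachment_url": "https://example.com/a.jpg" },
  { "wp:post_type": "attachment", "wp:attachment_url": "https://example.com/a.jpg" },
  { "wp:post_type": "attachment", "wp:attachment_url": "https://example.com/b.png" },
] } } };

console.log(attachmentUrls(example));
```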
Post body is HTML, not structured content
By design: content:encoded is a raw HTML string (Gutenberg block comments included, e.g. <!-- wp:paragraph -->). The tool delivers it verbatim. Targets needing Portable Text (Sanity) or Rich Text (Contentful) must run the HTML through an HTML-to-rich-text converter at import; that is outside this tool's scope.
Categories and tags share the <category> element
Review: WXR uses <category domain="category"> for categories and <category domain="post_tag"> for tags: the same element name, distinguished by the domain attribute. Keep Parse attributes on so you get @domain, then split categories from tags in your importer by inspecting @domain.
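Splitting by @domain might look like the sketch below, assuming Parse attributes is on so each term is an object with `#text` and `@domain` keys. `splitTaxonomy` is an illustrative helper name, not something the tool provides.

```javascript
// Hypothetical helper: separates categories from tags using the @domain key.
function splitTaxonomy(item) {
  // A post with one term yields an object, so normalise to an array first.
  const terms = [].concat(item.category ?? []);
  const names = (domain) =>
    terms
      .filter((t) => typeof t === "object" && t !== null && t["@domain"] === domain)
      .map((t) => t["#text"]);
  return { categories: names("category"), tags: names("post_tag") };
}

const split = splitTaxonomy({
  title: "Launch Day",
  category: [
    { "#text": "News", "@domain": "category", "@nicename": "news" },
    { "#text": "release", "@domain": "post_tag", "@nicename": "release" },
    { "#text": "Product", "@domain": "category", "@nicename": "product" },
  ],
});
console.log(split);
```

Terms from other domains (for example a custom taxonomy) are ignored here; extend the returned object if your target needs them.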
Repeated postmeta becomes an array; single becomes an object
Singleton trap: A post with several wp:postmeta entries gives an array; a post with exactly one gives a single object. Normalise with [].concat(item['wp:postmeta'] ?? []) before mapping, or some posts will silently lose their meta in the import.
Large WXR with embedded HTML can exceed the free limit
Plan limit: Post bodies in content:encoded make WXR files grow quickly. The free tier caps files at 2 MB, and a blog with a few hundred long posts will exceed that. Use Pro (100 MB per file), or export by post type / date range to produce smaller WXR files before converting.
Frequently asked questions
How is the post body (content:encoded) handled?
The CDATA wrapper is unwrapped and the full HTML body is delivered as a string on the content:encoded key (with Strip namespaces off). For headless targets that need Portable Text or Rich Text, run that HTML through an HTML-to-rich-text converter at import — the tool gives you clean HTML, not structured blocks.
Should I enable Strip namespaces for a WXR file?
Generally no. Stripping turns both content:encoded and excerpt:encoded into the same encoded key, which collides into an array and loses which is body vs excerpt. Leave it OFF to keep faithful wp:/content: keys, then rename selectively with json-key-renamer if your importer dislikes the prefixes.
Does the tool skip drafts, revisions, and attachments?
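No. It converts every item regardless of wp:status or wp:post_type. Filter after conversion with json-path-extractor, e.g. $.rss.channel.item[?(@.wp:post_type=='post')] chained with [?(@.wp:status=='publish')], so revisions and menu items never reach your new CMS. Filtering on status alone would still let menu items through, since they are normally stored with status publish.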
What about media attachments and images in posts?
Attachment items expose the file URL in <guid> / <wp:attachment_url>, but WXR never embeds binary image data. After conversion, fetch each URL and re-upload to your new media storage. The tool surfaces the URLs; downloading the files is a separate step in your import script.
Why is my single-post export not an array of items?
fast-xml-parser only creates an array when an element repeats. One <item> yields an object; many yield an array. Normalise in your loop with [].concat(data.rss.channel.item ?? []) so both shapes work.
How do I tell categories from tags?
Both use the <category> element, distinguished by the domain attribute (category vs post_tag). Keep Parse attributes on so each becomes {"#text": name, "@domain": ..., "@nicename": ...}, then split by @domain in your importer.
Where do custom fields (postmeta) end up?
Under wp:postmeta as wp:meta_key/wp:meta_value pairs (with Strip namespaces off): an array when a post has several, a single object when it has one. Flatten with Object.fromEntries([].concat(item['wp:postmeta'] ?? []).map(m => [m['wp:meta_key'], m['wp:meta_value']])) to address fields by name. Drop unwanted meta first with json-key-filter.
Can I convert just the published posts of one type?
Not in the converter itself — it has no filtering. Convert the whole WXR, then use json-path-extractor with a filter expression like [?(@.wp:post_type=='post')] chained with [?(@.wp:status=='publish')] to isolate exactly what you want.
Does Gutenberg block markup survive?
Yes. content:encoded contains the raw HTML including Gutenberg block comments (<!-- wp:paragraph -->). The tool preserves it verbatim. If your target ignores block comments, strip them in your import transform; if it understands them, you can parse blocks from the HTML downstream.
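If the target ignores block comments, one way to strip them in the import transform is a regular expression over the HTML string. This is a rough sketch, assuming block delimiters always start with `<!-- wp:` or `<!-- /wp:`. `stripBlockComments` is an illustrative name, and a real HTML parser may be the safer choice for complex content.

```javascript
// Hypothetical helper: removes Gutenberg block delimiter comments only.
function stripBlockComments(html) {
  // Matches opening, closing, and self-closing block comments, with or
  // without JSON attributes. Ordinary HTML comments are left in place.
  return html.replace(/<!--\s*\/?wp:[\s\S]*?-->/g, "").trim();
}

const raw = "<!-- wp:paragraph -->\n<p>We shipped!</p>\n<!-- /wp:paragraph -->";
console.log(stripBlockComments(raw)); // <p>We shipped!</p>
```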
Is the post content uploaded to JAD Apps?
No. WXR parsing runs entirely in your browser via fast-xml-parser. Post bodies, draft text, and author data never reach JAD Apps servers — only an anonymous run counter (no content) is recorded for signed-in dashboard stats.
How large a WXR file can I convert?
This is a Pro tool: 2 MB per file on the free tier, 100 MB per file on signed-in Pro. WXR grows quickly because post bodies are embedded as HTML — export by post type or date range to keep files small, or upgrade for a whole-site export in one pass.
How do I get a flat array of posts for my import loop?
After conversion, the posts live at rss.channel.item. Use json-path-extractor with $.rss.channel.item to pull just that array, then iterate. Coalesce single-post exports with [].concat(...) since one item is an object, not an array.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.