Convert WordPress WXR XML Export to JSON — Free Online

How to convert a WordPress WXR export to JSON

Step 1
Export WXR from WordPress — WordPress Admin → Tools → Export. Choose 'All content' or a single type (Posts, Pages, or a registered custom post type), then 'Download Export File'. You get a .xml WXR file.
Step 2
Open the XML to JSON tool — This is a Pro tool. Drop the .xml file onto the dropzone. The free tier handles files up to 2 MB for evaluation; signed-in Pro raises the per-file limit to 100 MB — useful because WXR exports with embedded HTML bodies grow fast.
Step 3
Decide whether to strip namespaces — Leave Strip namespaces OFF to keep the precise WXR keys (wp:status, content:encoded) — safest for a faithful map. Turn it ON only if your importer needs bare keys, and read the encoded collision caveat below before doing so.
Step 4
Keep type coercion sensible for content — WordPress IDs (wp:post_id) are numeric and safe to coerce. Leave Coerce types on for those. There is little prose-numeric ambiguity in WXR, so the default is fine; turn it off if you want every field as a string for a uniform importer.
Step 5
Convert and locate the items — Click Convert to JSON. The post records live under rss.channel.item — an array when there are multiple posts, a single object when there is one. Each item carries title, body, dates, taxonomy, and meta.
Step 6
Filter and reshape, then import — The tool does not drop drafts/revisions/attachments. Use json-path-extractor with a filter like $.rss.channel.item[?(@.wp:status=='publish')] or json-key-filter to keep only the fields your CMS needs, then run your import loop (Contentful management API, Sanity client.create(), etc.).

WXR fields after conversion

Standard WXR item fields and their JSON shape (keys shown with Strip namespaces OFF). Values illustrative.

WXR element	JSON key & value	Notes
`<title>`	`title: "My Post"`	Plain string
`<content:encoded><![CDATA[..]]></content:encoded>`	`"content:encoded": "<p>HTML body</p>"`	CDATA unwrapped to one HTML string
`<wp:status>publish</wp:status>`	`"wp:status": "publish"`	Values: publish / draft / pending / private / inherit
`<wp:post_type>post</wp:post_type>`	`"wp:post_type": "post"`	post / page / attachment / nav_menu_item / revision
`<wp:post_id>42</wp:post_id>`	`"wp:post_id": 42`	Number with Coerce types on
`<category domain="category">News</category>` (repeated)	`category: [ {"#text":"News","@domain":"category"}, .. ]`	Repeated → array of objects (attributes as `@`-keys)
`<wp:postmeta><wp:meta_key>k</wp:meta_key>..` (repeated)	`"wp:postmeta": [ {"wp:meta_key": "k", "wp:meta_value": ..}, .. ]`	Array when multiple meta; object when one

Option matrix for WXR

The real controls and their effect on a WordPress export. attributePrefix (@) and textNodeName (#text) are fixed defaults, not in the UI.

Option	WXR effect	Recommendation
Strip namespaces OFF	Keeps `wp:status`, `content:encoded`, `dc:creator` — faithful WXR names	Default — safest for a complete map
Strip namespaces ON	`wp:status`→`status`, `content:encoded`→`encoded`, `excerpt:encoded`→`encoded` (COLLISION)	Use with caution — body/excerpt collide on `encoded`
Parse attributes ON	`<category domain="post_tag" nicename="x">`→`@domain`,`@nicename` keys	Keep on — taxonomy `domain`/`nicename` matter for import
Coerce types	`wp:post_id`,`wp:post_parent`→numbers; `wp:is_sticky`→0/1 number	Keep on; turn off only if you want all-string values

Cookbook

Real WXR fragments and the JSON the converter returns, plus the follow-up tools that filter and reshape for a headless import.

A WXR item with HTML body and meta

Example

The post body lives in content:encoded wrapped in CDATA; the converter unwraps it to a clean HTML string. Custom fields come through as wp:postmeta pairs.

Input (WXR item):
<item>
  <title>Launch Day</title>
  <wp:post_id>42</wp:post_id>
  <wp:status>publish</wp:status>
  <content:encoded><![CDATA[<p>We shipped!</p>]]></content:encoded>
  <wp:postmeta>
    <wp:meta_key>_thumbnail_id</wp:meta_key>
    <wp:meta_value><![CDATA[88]]></wp:meta_value>
  </wp:postmeta>
</item>

Output (Strip namespaces OFF, Coerce types ON):
{
  "title": "Launch Day",
  "wp:post_id": 42,
  "wp:status": "publish",
  "content:encoded": "<p>We shipped!</p>",
  "wp:postmeta": { "wp:meta_key": "_thumbnail_id",
                   "wp:meta_value": 88 }
}

The Strip-namespaces collision you must avoid

Example

Stripping namespaces turns BOTH content:encoded and excerpt:encoded into the same encoded key. fast-xml-parser merges same-named siblings into an array — so you lose which is the body and which is the excerpt. Leave Strip namespaces OFF for WXR.

Input:
<item>
  <excerpt:encoded><![CDATA[Short teaser]]></excerpt:encoded>
  <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
</item>

Strip namespaces ON (BAD):
{ "encoded": [ "Short teaser", "<p>Full body</p>" ] }
  -> which is which? order-dependent and fragile

Strip namespaces OFF (GOOD):
{ "excerpt:encoded": "Short teaser",
  "content:encoded": "<p>Full body</p>" }
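
If your importer does need bare keys, one alternative to Strip namespaces is to rename the prefixed keys explicitly after conversion, so body and excerpt each get their own name. The following is a minimal sketch, assuming the item shape shown above; the helper renameKeys and the target names body, excerpt, and status are hypothetical and chosen only for illustration.

```javascript
// Hypothetical mapping from WXR keys to names a target CMS might expect.
const KEY_MAP = new Map([
  ['content:encoded', 'body'],
  ['excerpt:encoded', 'excerpt'],
  ['wp:status', 'status'],
]);

// Return a shallow copy of `item` with mapped keys renamed.
// Keys that are not in the map are kept unchanged.
function renameKeys(item, keyMap = KEY_MAP) {
  return Object.fromEntries(
    Object.entries(item).map(([key, value]) => [keyMap.get(key) ?? key, value])
  );
}

const convertedItem = {
  'excerpt:encoded': 'Short teaser',
  'content:encoded': '<p>Full body</p>',
};
const renamedItem = renameKeys(convertedItem);
console.log(renamedItem);
```

Because each source key maps to a distinct target name, the two fields cannot collapse into one array the way they do under Strip namespaces ON.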

Filtering out drafts and revisions after conversion

Example

The tool converts everything: drafts, revisions, attachments. Use json-path-extractor to keep only published posts before you write the import loop, so you do not push revisions into the new CMS.

After conversion you have rss.channel.item = [ ..mixed.. ]

/tool/json-path-extractor expression:
  $.rss.channel.item[?(@.wp:post_type=='post')]
then chain:
  $.[?(@.wp:status=='publish')]

=> only published posts of type 'post' remain;
revisions (wp:post_type=='revision') and attachments
(wp:post_type=='attachment') are excluded.
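
If you would rather apply the same filter inside your import script, the two conditions can be combined in plain JavaScript. This is a sketch under the assumption that the converted JSON has the rss.channel.item shape described above; the helper name selectItems and the sample titles are made up for illustration.

```javascript
// Keep only items that match the given post type and status.
function selectItems(data, { postType = 'post', status = 'publish' } = {}) {
  // One <item> converts to an object, several to an array; normalise first.
  const items = [].concat(data?.rss?.channel?.item ?? []);
  return items.filter(
    (item) => item['wp:post_type'] === postType && item['wp:status'] === status
  );
}

const mixedExport = {
  rss: {
    channel: {
      item: [
        { title: 'Live post', 'wp:post_type': 'post', 'wp:status': 'publish' },
        { title: 'Draft post', 'wp:post_type': 'post', 'wp:status': 'draft' },
        { title: 'Old revision', 'wp:post_type': 'revision', 'wp:status': 'inherit' },
        { title: 'Image', 'wp:post_type': 'attachment', 'wp:status': 'inherit' },
      ],
    },
  },
};

const publishedPosts = selectItems(mixedExport);
console.log(publishedPosts.map((item) => item.title)); // [ 'Live post' ]
```

Bracket access such as item['wp:status'] is needed because the colon in the key rules out dot notation in JavaScript.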

Single post vs. many posts

Example

Export one post and item is an object; export several and it is an array. Your import loop must handle both, or always test against a multi-post export.

One post:
{ "rss": { "channel": { "item": { "title": "Only post" } } } }

Many posts:
{ "rss": { "channel": { "item": [ {..}, {..} ] } } }

Node import loop (safe):
const items = [].concat(data.rss.channel.item ?? []);
for (const post of items) { await cms.create(map(post)); }
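
The loop above leaves map and cms undefined. As a rough illustration of what they might look like, the sketch below uses a hypothetical mapPost function and an in-memory fakeCms stand-in; the output property names (legacyId, body, and so on) are assumptions, not the schema of any particular CMS.

```javascript
// Hypothetical field mapper; the output property names are assumptions.
function mapPost(item) {
  return {
    legacyId: item['wp:post_id'],
    // With Coerce types on, a purely numeric title may arrive as a number.
    title: String(item.title ?? ''),
    status: item['wp:status'],
    body: item['content:encoded'] ?? '',
    excerpt: item['excerpt:encoded'] ?? '',
  };
}

// In-memory stand-in for a real CMS client so the loop can run offline.
const createdDocs = [];
const fakeCms = {
  create: async (doc) => {
    createdDocs.push(doc);
    return doc;
  },
};

// Import every item sequentially and report how many were sent.
async function importAll(data, cms) {
  const items = [].concat(data.rss.channel.item ?? []);
  for (const post of items) {
    await cms.create(mapPost(post));
  }
  return items.length;
}

const onePostExport = {
  rss: { channel: { item: { title: 'Only post', 'wp:post_id': 42, 'wp:status': 'publish' } } },
};
importAll(onePostExport, fakeCms).then((count) => {
  console.log(`imported ${count} item(s)`, createdDocs);
});
```

Swapping fakeCms for a real client keeps the loop unchanged, as long as that client exposes an async create method or you wrap it in one.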

Mapping WXR meta into target custom fields

Example

Each post can carry several wp:postmeta pairs. Flatten them into a lookup so your importer can address fields by name, then map to the destination schema.

Converted item.wp:postmeta:
[ { "wp:meta_key": "seo_title",  "wp:meta_value": "Launch" },
  { "wp:meta_key": "reading_time", "wp:meta_value": 3 } ]

Reshape in your importer:
const meta = Object.fromEntries(
  [].concat(item['wp:postmeta'] ?? [])
    .map(m => [m['wp:meta_key'], m['wp:meta_value']]));
// meta.seo_title === 'Launch', meta.reading_time === 3
Use /tool/json-key-filter first to drop meta you do not need.

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Find the entry that matches your symptom and apply the fix it describes.

Strip namespaces collides content:encoded and excerpt:encoded

Key collision

Both fields strip to encoded. Because they are siblings, fast-xml-parser merges them into an encoded array, and you can no longer tell body from excerpt reliably. For WXR, leave Strip namespaces OFF and keep the prefixed keys, then rename in your importer if needed.

Drafts, revisions, and attachments are NOT filtered

Not filtered

WXR includes every wp:post_type — post, page, revision, attachment, nav_menu_item. The tool converts them all. Filter after conversion with json-path-extractor ([?(@.wp:status=='publish')]) so you do not import revisions or menu items into the new CMS.

A single <item> is an object, not an array

Singleton trap

Export with one post and rss.channel.item is an object; with many it is an array. An import loop that assumes an array will misbehave on a one-post export. Normalise with [].concat(item ?? []) or test against a multi-post export.

Media is referenced, not embedded

Expected

Attachment items carry the file URL in <guid> / <wp:attachment_url>, but WXR never embeds binary image data. After conversion you must fetch each URL and re-upload to your new media store. The tool surfaces the URLs; it cannot download the images.
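
To build the download list, you can collect the URLs from the attachment items first. The sketch below assumes the converted shape described on this page; the helper names are hypothetical, and the handling of guid as an object with a #text key is an assumption based on <guid> usually carrying an isPermaLink attribute when Parse attributes is on.

```javascript
// Read a value that may be a plain string or an object with a "#text" key.
function textOf(value) {
  if (value !== null && typeof value === 'object') return value['#text'];
  return value;
}

// Return { id, url } for every attachment item that exposes a URL.
function collectAttachmentUrls(data) {
  const items = [].concat(data?.rss?.channel?.item ?? []);
  return items
    .filter((item) => item['wp:post_type'] === 'attachment')
    .map((item) => ({
      id: item['wp:post_id'],
      // Prefer wp:attachment_url; fall back to the guid text.
      url: item['wp:attachment_url'] ?? textOf(item.guid),
    }))
    .filter((entry) => typeof entry.url === 'string' && entry.url !== '');
}

const exportWithMedia = {
  rss: {
    channel: {
      item: [
        { 'wp:post_id': 42, 'wp:post_type': 'post' },
        {
          'wp:post_id': 88,
          'wp:post_type': 'attachment',
          'wp:attachment_url': 'https://example.com/uploads/launch.png',
        },
        {
          'wp:post_id': 89,
          'wp:post_type': 'attachment',
          guid: { '#text': 'https://example.com/uploads/team.jpg', '@isPermaLink': false },
        },
      ],
    },
  },
};

const attachmentUrls = collectAttachmentUrls(exportWithMedia);
console.log(attachmentUrls);
```

The resulting list is what a separate download and re-upload step in your import script would iterate over.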

Post body is HTML, not structured content

By design

content:encoded is a raw HTML string (Gutenberg block comments included, e.g. <!-- wp:paragraph -->). The tool delivers it verbatim. Targets needing Portable Text (Sanity) or Rich Text (Contentful) must run the HTML through an HTML-to-rich-text converter at import; that is outside this tool's scope.
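
If your target ignores Gutenberg block comments, you can remove them before handing the HTML to a rich-text converter. This is a minimal regex-based sketch, assuming block delimiters are HTML comments that start with wp: (or /wp: for closers); the function name stripBlockComments is hypothetical, and a real HTML parser may be the better choice for unusual markup.

```javascript
// Matches Gutenberg block delimiters such as:
//   <!-- wp:paragraph -->   <!-- /wp:paragraph -->   <!-- wp:image {"id":88} /-->
const BLOCK_COMMENT = /<!--\s*\/?wp:[\s\S]*?-->/g;

// Remove block delimiter comments and trim leftover surrounding whitespace.
function stripBlockComments(html) {
  return String(html ?? '').replace(BLOCK_COMMENT, '').trim();
}

const rawBody = '<!-- wp:paragraph -->\n<p>We shipped!</p>\n<!-- /wp:paragraph -->';
const cleanBody = stripBlockComments(rawBody);
console.log(cleanBody); // <p>We shipped!</p>
```

Ordinary HTML comments that do not start with wp: are left in place.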

Categories and tags share the <category> element

Review

WXR uses <category domain="category"> for categories and <category domain="post_tag"> for tags — same element name, distinguished by the domain attribute. Keep Parse attributes on so you get @domain, then split categories from tags in your importer by inspecting @domain.
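
Splitting by @domain can be done in a few lines once the category value is normalised to an array. The sketch below assumes Parse attributes is on, so each term is an object with #text and @domain; the helper name splitTaxonomy and the other bucket for custom taxonomies are illustrative choices, not part of the tool.

```javascript
// Split an item's <category> entries into categories, tags, and everything else.
function splitTaxonomy(item) {
  // A single <category> converts to one object; several convert to an array.
  const terms = [].concat(item.category ?? []);
  const result = { categories: [], tags: [], other: [] };
  for (const term of terms) {
    const isObject = term !== null && typeof term === 'object';
    // Assumption: a term without attributes may arrive as a bare string.
    const name = isObject ? term['#text'] : term;
    const domain = isObject ? term['@domain'] : undefined;
    if (domain === 'category') result.categories.push(name);
    else if (domain === 'post_tag') result.tags.push(name);
    else result.other.push(name);
  }
  return result;
}

const taggedItem = {
  title: 'Launch Day',
  category: [
    { '#text': 'News', '@domain': 'category', '@nicename': 'news' },
    { '#text': 'release', '@domain': 'post_tag', '@nicename': 'release' },
    { '#text': 'Product', '@domain': 'category', '@nicename': 'product' },
  ],
};

const taxonomy = splitTaxonomy(taggedItem);
console.log(taxonomy);
```

Terms from custom taxonomies land in other, so they are not silently mixed in with categories or tags.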

Repeated postmeta becomes an array; single becomes an object

Singleton trap

A post with several wp:postmeta entries gives an array; a post with exactly one gives a single object. Normalise with [].concat(item['wp:postmeta'] ?? []) before mapping, or some posts will silently lose their meta in the import.

Large WXR with embedded HTML can exceed the free limit

Plan limit

Post bodies in content:encoded make WXR files large fast. The free tier caps files at 2 MB. A blog with a few hundred long posts will exceed that — use Pro (100 MB per file), or export by post type / date range to produce smaller WXR files before converting.

Frequently asked questions

How is the post body (content:encoded) handled?

The CDATA wrapper is unwrapped and the full HTML body is delivered as a string on the content:encoded key (with Strip namespaces off). For headless targets that need Portable Text or Rich Text, run that HTML through an HTML-to-rich-text converter at import — the tool gives you clean HTML, not structured blocks.

Should I enable Strip namespaces for a WXR file?

Generally no. Stripping turns both content:encoded and excerpt:encoded into the same encoded key, which collides into an array and loses which is body vs excerpt. Leave it OFF to keep faithful wp:/content: keys, then rename selectively with json-key-renamer if your importer dislikes the prefixes.

Does the tool skip drafts, revisions, and attachments?

No — it converts every item regardless of wp:status or wp:post_type. Filter after conversion with json-path-extractor, e.g. $.rss.channel.item[?(@.wp:status=='publish')], so revisions and menu items never reach your new CMS.

What about media attachments and images in posts?

Attachment items expose the file URL in <guid> / <wp:attachment_url>, but WXR never embeds binary image data. After conversion, fetch each URL and re-upload to your new media storage. The tool surfaces the URLs; downloading the files is a separate step in your import script.

Why is my single-post export not an array of items?

fast-xml-parser only creates an array when an element repeats. One <item> yields an object; many yield an array. Normalise in your loop with [].concat(data.rss.channel.item ?? []) so both shapes work.

How do I tell categories from tags?

Both use the <category> element, distinguished by the domain attribute (category vs post_tag). Keep Parse attributes on so each becomes {"#text": name, "@domain": ..., "@nicename": ...}, then split by @domain in your importer.

Where do custom fields (postmeta) end up?

Under wp:postmeta as wp:meta_key/wp:meta_value pairs (with Strip namespaces off): an array when a post has several, a single object when it has one. Flatten with Object.fromEntries([].concat(item['wp:postmeta'] ?? []).map(m => [m['wp:meta_key'], m['wp:meta_value']])) to address fields by name. Drop unwanted meta first with json-key-filter.

Can I convert just the published posts of one type?

Not in the converter itself — it has no filtering. Convert the whole WXR, then use json-path-extractor with a filter expression like [?(@.wp:post_type=='post')] chained with [?(@.wp:status=='publish')] to isolate exactly what you want.

Does Gutenberg block markup survive?

Yes. content:encoded contains the raw HTML including Gutenberg block comments (such as <!-- wp:paragraph -->). The tool preserves it verbatim. If your target ignores block comments, strip them in your import transform; if it understands them, you can parse blocks from the HTML downstream.

Is the post content uploaded to JAD Apps?

No. WXR parsing runs entirely in your browser via fast-xml-parser. Post bodies, draft text, and author data never reach JAD Apps servers — only an anonymous run counter (no content) is recorded for signed-in dashboard stats.

How large a WXR file can I convert?

This is a Pro tool: 2 MB per file on the free tier, 100 MB per file on signed-in Pro. WXR grows quickly because post bodies are embedded as HTML — export by post type or date range to keep files small, or upgrade for a whole-site export in one pass.

How do I get a flat array of posts for my import loop?

After conversion, the posts live at rss.channel.item. Use json-path-extractor with $.rss.channel.item to pull just that array, then iterate. Coalesce single-post exports with [].concat(...) since one item is an object, not an array.

Privacy first

Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
