How to convert salesforce xml data export to json
- Step 1Run a Salesforce Data Export — In Salesforce Setup → Data → Data Export, request a full or incremental export (XML format). When the email arrives, download the ZIP and extract the per-object XML files —
Account.xml,Contact.xml,Opportunity.xml, etc. - Step 2Open the XML to JSON tool — This is a Pro tool. Drop one object's
.xmlfile onto the dropzone. The free tier converts files up to 2 MB for evaluation; signed-in Pro raises the per-file limit to 100 MB, which covers most single-object exports. - Step 3Enable Strip namespaces — Tick Strip namespaces to drop the
sf:prefix so keys match Salesforce API names. This also removes thexmlns:sfdeclaration attribute, leaving clean prefix-free JSON ready for your warehouse schema. - Step 4Choose your type-coercion stance — Leave Coerce types on for numeric Amount/Quantity fields and boolean flags. Turn it OFF if any object has zero-padded external IDs or reference codes you must preserve exactly — coercion would strip the leading zeros.
- Step 5Convert and inspect the record array — Click Convert to JSON. Salesforce wraps records in a root element with repeated record children, so multi-record exports produce an array of record objects. Verify the array shape, then download the JSON.
- Step 6Load into your pipeline, then flatten if needed — Read the JSON with
pd.read_json(), stage it forCOPY INTOin Snowflake, or feed your migration script. Nested lookup objects are NOT flattened by this tool — use json-flattener to collapseAccount.NameintoAccount_Namecolumns for a tabular load.
Options mapped to Salesforce ETL outcomes
The real controls in the tool and what they do to a Salesforce export. attributePrefix (@) and textNodeName (#text) are fixed and not shown in the UI.
| Option | Salesforce effect | Default |
|---|---|---|
| Strip namespaces | <sf:Name>→Name; matches API field names and warehouse columns. Removes the xmlns:sf declaration too | Off (enable for SF exports) |
| Coerce types | <Amount>5000</Amount>→5000 (number), <IsWon>true</IsWon>→true (boolean). Off keeps all values as strings | On |
| Parse attributes | Salesforce data XML rarely uses attributes; leaving on or off has little effect on field values. On maps any attribute to @key | On |
| Indent | Output formatting only — choose 2/4 spaces for readable diffs, Minified for compact loads | 2 spaces |
Salesforce field types after conversion
How common Salesforce field shapes land in the JSON, and the follow-up step they often need.
| Salesforce field | JSON result | Follow-up |
|---|---|---|
18-char Record ID 001D000000abcXY... | String (kept as-is — has letters) | None — safe; never coerced to a number |
Currency <Amount>129.50</Amount> | 129.5 (number) with Coerce types on | Fine for analytics; cast to DECIMAL on load if exactness matters |
Custom field <Region__c>EMEA</Region__c> | "Region__c": "EMEA" (suffix kept) | json-key-renamer to drop __c if your target dislikes it |
Lookup <Account><Name>Acme</Name><Id>001..</Id></Account> | Nested object {Account:{Name,Id}} — NOT flattened | json-flattener → Account.Name / Account_Name |
Multi-select picklist <Tags>A;B;C</Tags> | Single string "A;B;C" (Salesforce uses ; inside one field) | Split in dbt/pandas — the tool does not split delimited strings |
Zero-padded external ID <Ref__c>00042</Ref__c> | 42 with Coerce types ON (zeros lost) / "00042" with it OFF | Turn Coerce types OFF to preserve |
Cookbook
Real Salesforce export fragments and the exact JSON the converter returns. Org IDs and PII anonymised.
Stripping the sf: namespace to match API names
ExampleSalesforce wraps record fields in the sf: namespace. Strip namespaces gives you keys that match the API names your warehouse schema already uses.
Input (Account.xml fragment):
<records xmlns:sf="urn:sobject.enterprise.soap.sforce.com">
<sf:Account>
<sf:Id>001D000000abcXY</sf:Id>
<sf:Name>Acme Corp</sf:Name>
<sf:AnnualRevenue>5000000</sf:AnnualRevenue>
</sf:Account>
</records>
Options: Strip namespaces ON, Coerce types ON
Output:
{
"records": {
"Account": {
"Id": "001D000000abcXY",
"Name": "Acme Corp",
"AnnualRevenue": 5000000
}
}
}Single record is an object, many records are an array
ExampleA test export with one record produces an object on the record key; production exports with many records produce an array. Plan your load loop to handle both, or always test with a multi-record file.
One <Account> in the file:
{ "records": { "Account": { "Id": "001..", "Name": "Acme" } } }
Many <Account> in the file:
{ "records": { "Account": [ {..}, {..}, {..} ] } }
Safe pandas load:
import pandas as pd, json
d = json.load(open('Account.json'))
recs = d['records']['Account']
df = pd.json_normalize(recs if isinstance(recs, list) else [recs])Lookup relationships stay nested — flatten for a table
ExampleThe tool does NOT flatten lookups to Account_Name. A nested <Account> becomes a nested object. Run json-flattener afterward to get the flat columns dbt and Snowflake expect.
Converted JSON (lookup kept nested):
{ "Contact": { "LastName": "Lee",
"Account": { "Name": "Acme", "Id": "001.." } } }
Then /tool/json-flattener (dot mode):
{ "Contact.LastName": "Lee",
"Contact.Account.Name": "Acme",
"Contact.Account.Id": "001.." }
=> now one row, one column per leaf, ready for COPY INTO.Protecting zero-padded reference fields
ExampleCoerce types turns numeric-looking strings into numbers, stripping leading zeros from padded external IDs. Turn coercion off to keep them exact, then cast only the truly numeric columns in dbt.
Input: <Account><ExternalRef__c>000417</ExternalRef__c>
<NumEmployees>250</NumEmployees></Account>
Coerce types ON:
{ "ExternalRef__c": 417, "NumEmployees": 250 } <-- ref broken
Coerce types OFF:
{ "ExternalRef__c": "000417", "NumEmployees": "250" } <-- safe
=> cast NumEmployees to INT in your model, keep the ref as TEXT.Working through a multi-object export ZIP
ExampleThe export ZIP holds one XML per object. Convert each separately, name the JSON after the object, then chain warehouse-prep tools so the mapping stays auditable.
ZIP contents: Account.xml Contact.xml Opportunity.xml ... Per object: Account.xml -> xml-to-json (Strip NS) -> Account.json Contact.xml -> xml-to-json (Strip NS) -> Contact.json ... Then: /tool/json-flattener -> tabular columns /tool/json-to-csv -> CSV for a Snowflake stage load Import order: Account before Contact before Opportunity (so foreign-key lookups resolve).
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Lookup relationships are NOT flattened
By designA nested <Account> inside a <Contact> stays a nested JSON object — the tool does not produce Account_Name or Account.Name flat keys on its own. Run json-flattener after conversion to collapse the hierarchy into dot- or underscore-keyed columns for a tabular load.
Single-record export yields an object, not an array
Singleton trapIf an object's XML contains exactly one record, the record key holds an object; with two-or-more it holds an array. ETL code that always does for rec in data['records']['Account'] will iterate the object's keys when there is one record. Normalise to a list (recs if isinstance(recs, list) else [recs]) or test on a multi-record export.
Coerce types strips leading zeros from reference fields
Data lossZero-padded external IDs or reference codes (000417) become plain integers (417) when Coerce types is on. Salesforce auto-number and external-ID fields frequently use padding. Turn Coerce types OFF to keep all values as strings, then cast only the genuinely numeric columns downstream.
Multi-select picklists arrive as one delimited string
Not splitSalesforce stores multi-select picklist values inside a single field separated by ; (e.g. Hot;Warm). The XML has one element, so you get one string — the tool does not split it. Split it in pandas/dbt, or after exporting to CSV use a CSV column splitter, then re-import.
18-character IDs are safe; large numeric fields stay strings
PreservedSalesforce 15/18-character IDs contain letters, so they are always strings — no precision risk. Any genuinely large integer field (rare in Salesforce, but possible in custom number fields) that would exceed JavaScript's safe-integer range is also kept as a string by fast-xml-parser, so no silent rounding occurs.
No filtering of records or fields during conversion
Not supportedThere is no option to keep only certain objects, statuses, or fields. The tool converts the whole file. To extract a subset (e.g. only Won Opportunities or only specific columns), convert first, then use json-key-filter or json-path-extractor on the JSON.
Empty Salesforce fields become empty strings
ExpectedA null/blank Salesforce field exported as <Description/> or <Description></Description> becomes "" (empty string), not null. If your warehouse needs true nulls, post-process with json-null-stripper (to drop empties) or convert ""→null in your load logic.
Malformed export XML parses without an error
Silent parsefast-xml-parser tolerates minor malformation without throwing. If a Data Export was truncated or corrupted in transit, you may get a structurally wrong object rather than a clear failure. Verify the file size against the export report and validate in a strict XML validator if a load looks short.
Free tier caps evaluation files at 2 MB
Plan limitXML to JSON is a Pro tool; the free tier converts files up to 2 MB. Many single-object Salesforce exports exceed that. Use Pro (100 MB per file) for full objects, or split a large object by date range with Salesforce's incremental export before converting.
Frequently asked questions
How do Salesforce lookup (relationship) fields appear in the JSON?
As nested objects, exactly as the XML nests them — {Contact:{Account:{Name,Id}}}. The tool does NOT flatten them to Account_Name or Account.Name. Use json-flattener after conversion to collapse the hierarchy into flat columns for a warehouse load.
Will my 18-character Salesforce IDs stay intact?
Yes. They contain letters, so they are always parsed as strings — there is no risk of numeric coercion or precision loss. The same is true for 15-character case-sensitive IDs.
How do I keep zero-padded external IDs from being mangled?
Turn Coerce types OFF before converting. With it on, 000417 becomes 417. With it off, every value stays a string and your padded references are preserved exactly; you then cast only the genuinely numeric columns in dbt or pandas.
Does the tool split multi-select picklist values?
No. Salesforce stores multi-select picklists as a single ;-delimited string inside one field, and the tool returns that as one string. Split it downstream (pandas .str.split(';'), dbt, or a CSV column splitter after exporting to CSV).
Can I drop the __c suffix from custom field names?
Not during conversion — __c is kept verbatim so the mapping stays explicit. Rename after conversion with json-key-renamer, or alias the columns in your warehouse model. Many teams keep __c precisely because it makes custom-vs-standard fields obvious in the schema.
My export ZIP has dozens of object files — how should I work through them?
Convert each object's XML separately and name the JSON after the object (Account.json, Contact.json). This keeps the mapping auditable and lets you load in dependency order (Account → Contact → Opportunity) so lookup references resolve. Chain json-flattener and json-to-csv for warehouse staging.
Why is a one-record export not an array?
fast-xml-parser only creates an array when an element repeats. A single record stays an object on the record key. Normalise in your loader (recs if isinstance(recs, list) else [recs]) or test against a multi-record export so your code handles both shapes.
Can I filter to only certain objects or statuses while converting?
No — the tool converts the whole file with no filtering controls. Convert first, then use json-key-filter to keep specific fields or json-path-extractor to pull, say, only Won opportunities with a filter expression.
Are empty Salesforce fields converted to null?
No — <Field/> becomes an empty string "", not null. If your warehouse needs real nulls, run json-null-stripper to drop empties, or convert ""→null in your load step.
Is my CRM data uploaded during conversion?
No. Conversion is entirely client-side via fast-xml-parser. Account names, pipeline revenue, and contact PII never reach JAD Apps servers — only an anonymous run counter (no content) is recorded for signed-in dashboard stats.
How large a Salesforce object can I convert at once?
This is a Pro tool: free-tier evaluation caps files at 2 MB; signed-in Pro raises the per-file limit to 100 MB. For very large objects, use Salesforce's incremental (date-range) export to produce smaller files, then convert and load in batches.
Can I export Salesforce data as JSON directly instead of XML?
Salesforce's weekly Data Export only offers CSV or XML. If you choose CSV instead, use csv-to-json — it is simpler for flat objects. Pick XML (and this tool) when you need the nested relationship structure that CSV flattens away, then use json-flattener only where you want tabular columns.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.