How to flatten nested api responses for pipeline normalization
- Step 1Capture a representative response and save it to a file — Use your API client, curl, or dev tools to capture a representative JSON response from each source. Save each as a
.jsonfile — the flattener reads dropped files, not pasted text. For GraphQL, strip thedatawrapper at the source if you don't want adata.prefix on every key (the tool flattens from the root and has no key selector). - Step 2Drop the file and pick a delimiter that matches the target — Flattening runs in your browser. Set the Delimiter box to match your warehouse:
_for BigQuery/Snowflake column names,.for path-style references,__if source keys already contain underscores. Invalid JSON throws aSyntaxError. - Step 3Flatten one response per source — Flatten a representative response from each API you are integrating. A top-level array flattens to numeric-prefixed keys rather than rows, so flatten a single record to get the per-record flat shape that represents that source.
- Step 4Diff the flat key sets to find equivalent fields — Lay the flat key sets side by side. Equivalent fields under different names (
user_emailvscontact_emailAddress) jump out once both are flat. For a structured diff of two flattened responses, use json-diff to see exactly which keys differ. - Step 5Build the field-mapping table — Map each source flat key to your canonical target column. Apply the renames to the flat JSON with json-key-renamer so every source emits the same canonical key set before load.
- Step 6Bake the flat shape into the pipeline — Use the verified flat key set as your canonical intermediate representation. In code, run an equivalent recursive flatten at each ingestor; in dbt, reference the flat keys in
json_extract_scalar()calls. This tool defines and validates the target shape your pipeline code reproduces.
Typical API nesting depth and the flat shape
Illustrative nesting across common API sources, and what flattening normalises them to. Depths are typical, not guaranteed by each vendor.
| Source | Typical depth | Example nested path | Flat key (underscore delimiter) |
|---|---|---|---|
| Stripe charge | ~3 levels | charge.payment_method_details.card.last4 | charge_payment_method_details_card_last4 |
| Salesforce record | ~2 levels | Account.Owner.Name | Account_Owner_Name |
| GraphQL response | 1 + query depth | data.user.profile.email | data_user_profile_email (data_ prefix unless trimmed) |
| Custom internal API | ~5 levels | result.payload.user.contact.email.primary | result_payload_user_contact_email_primary |
Normalization options and behaviours
What you control and the behaviours that affect normalization. The tool flattens from the root — there is no key selector to start below a wrapper.
| Concern | Configurable? | Behaviour |
|---|---|---|
| Delimiter | Yes — text box | Default .; up to 3 chars; match your warehouse's identifier rules |
GraphQL data wrapper | No key selector | Flattens from root → keys are data_...; trim the wrapper at source to drop the prefix |
| Array handling | Yes — three modes | dot.0.key / arr[0].key / keep arrays for repeated fields |
| null vs missing | Preserved | null and "" kept as leaves; empty {}/[] produce no key (genuinely absent) |
| Max depth | No — fixed at 10 | Deeper subtrees left nested under the depth-10 key |
Cookbook
Real before/after responses from heterogeneous APIs. The goal is one flat shape per source so the mapping layer has a predictable input.
Stripe charge flattened to a canonical shape
ExampleA 3-level Stripe charge collapses to flat underscore keys that a downstream mapping table can reference directly.
Input (charge.json):
{ "id": "ch_1",
"payment_method_details": { "card": { "last4": "4242", "brand": "visa" } } }
Delimiter: _ Array handling: keep arrays
Output:
{
"id": "ch_1",
"payment_method_details_card_last4": "4242",
"payment_method_details_card_brand": "visa"
}GraphQL data wrapper stays prefixed
ExampleThere is no key selector, so the data wrapper is flattened along with everything else. Either accept the data_ prefix or strip the wrapper before flattening.
Input (graphql.json):
{ "data": { "user": { "email": "a@x.com" } } }
Flatten (underscore) as-is →
{ "data_user_email": "a@x.com" } ← data_ prefix kept
To drop it, save just the inner object first:
{ "user": { "email": "a@x.com" } }
→ { "user_email": "a@x.com" }Aligning two sources by flat key
ExampleFlatten one response from each source, then compare. The same concept under different names becomes obvious side by side and feeds the mapping table.
Source A (flattened): { "user_email": "a@x.com" }
Source B (flattened): { "contact_emailAddress": "a@x.com" }
Mapping table:
user_email -> canonical_email
contact_emailAddress -> canonical_email
Apply with json-key-renamer so both sources emit
canonical_email before load.null vs empty vs missing, preserved for DQ checks
ExampleData-quality rules often distinguish a present-but-null field from an absent one. The flattener preserves null and empty string but drops empty objects/arrays entirely.
Input:
{ "a": null, "b": "", "c": {}, "d": [] }
Output:
{ "a": null, "b": "" }
a is explicitly null, b is explicitly empty —
both kept. c and d are GONE (no key), which your
DQ layer reads as genuinely absent.Top-level array is one object, not rows
ExampleA paginated list response flattens to numeric-prefixed keys, not per-record rows. Normalize a single record to define the per-source shape.
Input (list endpoint):
[ { "id": 1 }, { "id": 2 } ]
Flattener output:
{ "0.id": 1, "1.id": 2 } ← not two records
Normalize ONE record to define the source's flat shape,
then apply that flatten per-record in your ingestor.Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
GraphQL data wrapper is flattened, not skipped
By designThere is no key selector to start flattening below the top-level data key. Every key comes out prefixed data_ (or data.). To drop the prefix, trim the wrapper at the source so the inner object is the document root, then flatten.
Top-level array flattens to numeric keys, not records
By designA list/paginated response flattens to one object with 0.field, 1.field keys — not per-record rows. Normalize one record to define the source's flat shape, then run the equivalent flatten per-record in your pipeline code.
Empty object/array drops a field the mapping expects
Expected{} and [] produce no key, so a field empty in the sample response is absent from the flat key set. If your mapping table expects it, flatten a response where that field is populated, or union flat key sets from several responses to capture the full source schema.
Field name collision across delimiter and source keys
Data lossIf a source key literally contains your delimiter (e.g. payment_method with underscore delimiter) it can collide with a nested path that flattens to the same key; the later write wins silently. Pick a delimiter that does not appear in the source keys (e.g. __ or ~) for that source.
Mixed types for the same field across sources
Mapping riskSource A returns count: 5, source B returns count: "5". The flattener preserves both types faithfully — it does not coerce. Your normalization layer must reconcile the type, not the flattener. This is correct behaviour: surfacing the type mismatch is part of normalization.
Response deeper than 10 levels stops flattening
ExpectedMax depth is fixed at 10. A 5-level API rarely hits it, but envelope-wrapped responses (result.data.payload...) can. Beyond 10, the subtree stays nested under the depth-10 key. Trim wrappers at source or re-flatten the inner fragment.
Invalid JSON throws before normalization
errorA truncated response, an HTML error page captured instead of JSON, or NDJSON throws a SyntaxError. Verify you captured a complete JSON body. Repair with json-format-fixer or locate the issue with json-validator before flattening.
Arrays of objects exploded into positional keys
Mapping riskIn dot/bracket mode, a repeated field (line_items) becomes line_items_0_*, line_items_1_* — positional and unstable across responses with different array lengths. Use keep arrays to keep repeated fields as a single key your mapping handles as a list.
Free-tier 2 MB / one-file limit
Tier limitFree tier caps JSON input at 2 MB, one file. Large paginated dumps may be rejected. Normalize a representative single response per source (which defines the shape), or upgrade to Pro (100 MB, 10-file batches) for bigger captures.
Frequently asked questions
How do I handle GraphQL responses wrapped in a data key?
This flattener has no key selector, so it flattens from the document root and every key comes out prefixed data_ (or data.). If you don't want that prefix, strip the data wrapper at the source so the inner object is the root, then flatten. The prefix is harmless if your mapping table accounts for it.
How do I align fields that mean the same thing across two APIs?
Flatten one response from each source, then compare the flat key sets — equivalent fields under different names (user_email vs contact_emailAddress) become obvious. Use json-diff for a structured comparison, then apply renames with json-key-renamer so every source emits your canonical key set.
My API has conditional nesting — sometimes a field exists, sometimes not. How is that handled?
Flat keys that only appear in some responses are simply absent from the others. Downstream, those become NULL/NaN columns — define them as nullable in your schema. Note an important nuance: a field that is an empty object {} or empty array [] produces no key at all, which reads as genuinely absent rather than null.
Does a top-level array become one normalized record per element?
No. A top-level array flattens to one object with numeric-prefixed keys (0.id, 1.id) — not per-record rows. Normalize a single record to define the source's flat shape, then run the equivalent flatten per-record in your ingestion code.
What delimiter should I use for normalization?
Match your target. Use _ for warehouse column names (BigQuery, Snowflake), . for path-style references, and __ if source keys already contain single underscores (to avoid collisions). The Delimiter box accepts up to 3 characters; an empty value falls back to ..
Does the flattener change my values during normalization?
No. It only restructures keys — leaf values keep their JSON type exactly. This matters for normalization: type mismatches across sources (5 vs "5") stay visible so your reconciliation logic can catch them rather than being masked by the flattener.
How does it distinguish missing from empty?
null and empty string "" are preserved as explicit leaf values; empty objects {} and empty arrays [] produce no key at all. So your data-quality layer can read three states: present-with-value, present-but-null/empty, and absent. This distinction is preserved by design.
Can I paste a response instead of dropping a file?
No — the tool reads a dropped .json file and has no paste box. Save the captured response to a file first. If you only have it on the clipboard, paste it into a new file in your editor, save, and drop that file.
What is the maximum nesting depth it flattens?
Fixed at 10 levels, not adjustable. A typical 2–5 level API never hits it, but heavily enveloped responses can. Beyond 10, the inner subtree stays nested under the depth-10 key. Trim envelope wrappers at the source or re-flatten the inner fragment.
Are my production API responses uploaded?
No. Flattening runs entirely in your browser. Production responses with customer data and business metrics never reach JAD Apps servers. Only an anonymous file-processed counter (no content) is stored for signed-in dashboard stats, which you can opt out of.
How large a response can I normalize for free?
Free tier caps JSON input at 2 MB, one file. A single representative response per source is plenty to define the flat shape. Pro raises it to 100 MB with 10-file batches; higher tiers go further. For large dumps, normalize a sample and process the bulk via the API runner.
Can I run flattening as a step in my ETL pipeline?
Yes. GET /api/v1/tools/json-flattener returns the option schema (delimiter, arrayHandling); pair the @jadapps/runner and POST each response plus options to /api/v1/tools/json-flattener/run locally — data stays on your machine. Or reproduce the same recursive flatten in code; use this tool to define and verify the canonical flat shape each ingestor must produce.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.