How to flatten nested json for pandas dataframe import
- Step 1Save the JSON response to a file — The flattener reads a dropped file, not pasted text. Save your API response or query result as a
.jsonfile. For a multi-row frame, save the JSON array of records; for column design, a single representative record is enough. - Step 2Drop the file onto the dropzone — Flattening runs in your browser — nothing is uploaded. The header shows the file name and size. Malformed JSON throws a
SyntaxError, so fix the source first if you see that. - Step 3Set the delimiter to underscore — Type
_into the Delimiter box (default.). Underscore givesuser_address_city, usable with pandas' dot accessor. Dot-delimited columns (user.address.city) work too but only with bracket access —df['user.address.city'], neverdf.user.address.city. - Step 4Pick the array mode for your analysis —
dot.0.keyspreadstagsintotags_0,tags_1columns (good when positions are meaningful and fixed-length). keep arrays keepstagsas a list column you candf.explode('tags')later.arr[0].keyuses bracket keys that are awkward in pandas — avoid for DataFrame work. - Step 5Flatten and check the input → output key count — The result panel shows
inputKeys → outputKeys. A big jump usually means an array of objects got expanded to per-element columns — switch to keep arrays and handle the array withexplode/json_normalizeon that one column instead. - Step 6Load the flat JSON into pandas — Download the
.flat.json(or copy), then:import json, pandas as pd; data = json.load(open('out.flat.json')); df = pd.DataFrame(data)for a single object, or pass the original array topd.DataFrameafter flattening per-record. Columns are now scalar — ready forgroupby,merge, or scikit-learn. To go the other way (flat keys back to nested JSON) use json-unflattener; to export the frame as CSV, json-to-csv.
Delimiter vs pandas column access
How the delimiter you set affects which pandas accessors work on the resulting column.
| Delimiter | Column name | Bracket access | Attribute (dot) access | Notes |
|---|---|---|---|---|
_ (underscore) | user_address_city | df['user_address_city'] | df.user_address_city | Recommended — both accessors work |
. (dot, default) | user.address.city | df['user.address.city'] | Not available — df.user.address.city fails | Dots break the attribute accessor and df.query() strings |
__ (double underscore) | user__address__city | df['user__address__city'] | df.user__address__city | Use when source keys already contain single underscores |
Array mode → pandas column shape
Same primitive array, three modes, and the pandas dtype/shape you end up with.
| Mode | Input | Resulting columns | pandas dtype / next step |
|---|---|---|---|
| dot.0.key (default) | tags: ["a","b"] | tags_0, tags_1 | Two object columns; sparse if rows differ in length |
| keep arrays | tags: ["a","b"] | tags (list column) | object column of lists — df.explode('tags') to expand to rows |
| keep arrays | items: [{...}] | items (list-of-dict column) | df['items'] holds dicts — pd.json_normalize(df['items']) per row |
| arr[0].key | tags: ["a","b"] | tags[0], tags[1] | Bracket names need df['tags[0]']; awkward — avoid for DataFrames |
Cookbook
Before/after JSON aimed at a pandas DataFrame. Flatten the object shape here; let pandas handle row stacking and explosion.
Nested record → scalar DataFrame columns
ExampleThe core win. Without flattening, df['user'] is a dict cell; after flattening, each field is its own scalar column with a clean dtype.
Input (record.json):
{ "id": 1, "user": { "name": "Sue", "age": 30 } }
Delimiter: _ Array handling: keep arrays
Output:
{ "id": 1, "user_name": "Sue", "user_age": 30 }
Python:
df = pd.DataFrame([json.load(open('out.flat.json'))])
# columns: id (int64), user_name (object), user_age (int64)Top-level array: flatten per-record, not all at once
ExampleA whole array flattens to numeric-prefixed keys in one object — not a list of rows. For a real multi-row frame, pass the original array to pandas; flatten one record only to design columns.
Input:
[ {"user":{"name":"Sue"}}, {"user":{"name":"Jon"}} ]
Flattener output (one object, numeric prefixes):
{ "0.user.name": "Sue", "1.user.name": "Jon" } ← not rows!
For a 2-row DataFrame, flatten each record, e.g. in Python:
rows = [flatten(r) for r in data]; df = pd.DataFrame(rows)
# use this tool to confirm the per-record flat key shape.keep arrays then df.explode
ExampleWhen a record has a list field you want as rows, keep the array intact here and use pandas' explode downstream.
Input:
{ "id": 7, "tags": ["x", "y", "z"] }
keep arrays → { "id": 7, "tags": ["x","y","z"] }
Python:
df = pd.DataFrame([record])
df = df.explode('tags') # 3 rows: (7,x) (7,y) (7,z)
(dot mode would have made tags_0/tags_1/tags_2 columns instead.)Irregular records align to NaN
ExampleWhen records have different nested keys, the flat key sets differ. pandas aligns them across rows and fills missing cells with NaN — correct sparse-data behaviour.
Records (flattened individually):
{ "id": 1, "user_email": "a@x.com" }
{ "id": 2, "user_phone": "555-0100" }
df = pd.DataFrame([r1, r2]):
id user_email user_phone
0 1 a@x.com NaN
1 2 NaN 555-0100Dot delimiter breaks the attribute accessor
ExampleA subtle pandas trap: dot-delimited column names cannot be accessed with df.col and break df.query() strings. Underscore avoids it.
Delimiter: . → column 'user.name'
df.user.name # AttributeError-style failure
df['user.name'] # works (bracket only)
df.query('user.name == "Sue"') # parser error
Delimiter: _ → column 'user_name'
df.user_name # works
df.query('user_name == "Sue"') # worksErrors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Top-level array flattens to numeric keys, not rows
By design[r1, r2, r3] becomes one object with 0.field, 1.field keys — not a list of row-objects. For a multi-row DataFrame, flatten each record individually (use this tool to confirm the per-record shape) and pass the list of flat objects to pd.DataFrame.
Dot delimiter breaks df.col and df.query()
pandas pitfallA column named user.name can only be read with df['user.name']; df.user.name and df.query('user.name == ...') both fail because pandas parses the dot as attribute access. Set the Delimiter box to _ so the attribute accessor and query strings work.
Primitive arrays spread into positional columns
By designIn dot mode, tags: ["a","b"] becomes tags_0, tags_1. With variable-length arrays this creates sparse trailing columns. Use keep arrays to keep a single list column you can df.explode() or pd.json_normalize() on demand.
Empty object/array yields no column
Expected{} and [] produce no key, so the column never appears. If a feature is empty in the record you flatten, it is missing from the DataFrame entirely (not NaN). Flatten a record where the column is populated, or rely on pandas alignment across records to introduce it.
Underscore delimiter collides with underscore keys
Data lossA literal user_id key and a nested user → id both flatten to user_id; the later one overwrites the earlier silently. If your source keys contain underscores, use __ in the Delimiter box so the two paths stay distinct columns.
Very deep records stop at level 10
ExpectedMax depth is fixed at 10. Below that, the inner subtree is left nested as a dict value under the depth-10 column — so pd.DataFrame gets a dict cell there. Re-flatten that fragment, or pd.json_normalize on the offending column.
Invalid JSON throws before output
errorSingle quotes (common when someone pastes a Python repr), trailing commas, or NDJSON throw a SyntaxError. JSON requires double quotes and no trailing commas. Fix with json-format-fixer or pinpoint the position with json-validator.
Mixed types in a flattened column
pandas behaviourIf one record has count: 5 and another has count: "5", the flattener preserves both types; pandas then infers the column as object, not int64. The flattener does not coerce types — clean types at the source or cast after import with pd.to_numeric.
Free-tier 2 MB / one-file limit
Tier limitFree tier caps JSON input at 2 MB, one file. A large response export may be rejected. Flatten a representative record to design columns (sufficient for most DataFrame prep), or upgrade to Pro (100 MB, 10-file batches) for full files.
Frequently asked questions
When should I use this vs pandas json_normalize?
Use pd.json_normalize() when the nesting is regular and you can specify record_path/meta up front. Use this tool when nesting is irregular or mixed-depth across records, which makes record_path impractical, or when you just want a quick flat shape to inspect before writing normalize config. Many workflows do both: flatten here to see the key shape, then write tighter json_normalize code.
Why does pd.DataFrame give me a list/dict in some cells?
Either you flattened with keep arrays (list columns are intentional — use df.explode), or a subtree below depth 10 stayed nested (the flattener stops at 10 levels). For list columns, df.explode('col') or pd.json_normalize(df['col']); for deep subtrees, re-flatten that fragment separately.
What delimiter works best with pandas?
Underscore (_). It gives user_address_city, which works with both df['user_address_city'] and the df.user_address_city attribute accessor, and inside df.query() strings. The default dot delimiter only supports bracket access and breaks query().
How are records with different fields handled?
Each record flattens to its own key set. When you build pd.DataFrame([r1, r2, ...]), pandas takes the union of columns and fills cells absent in a given record with NaN — standard sparse-data behaviour. Note this only kicks in across records; a field empty in every record (e.g. {}) never produces a column at all.
Does flattening change my numeric or boolean dtypes?
No. Leaf values keep their JSON type exactly, so pandas infers int64/float64/bool correctly. The flattener does not coerce types — if a column has mixed types across records (5 and "5"), pandas will infer object; cast it afterwards with pd.to_numeric.
Can I paste JSON instead of dropping a file?
No — the tool reads a dropped .json file and has no paste box. In a notebook you can also flatten programmatically, but to use this UI, write your JSON to a file first (open('x.json','w').write(...)) and drop that file.
Why did a feature column disappear?
It was probably an empty object {} or empty array [] in the record you flattened — those emit no key. Flatten a record where the field is populated, or stack multiple records so pandas' column union picks it up across rows.
How do I turn a list column into rows?
Flatten with keep arrays so the array stays a single column, then df.explode('tags') in pandas to get one row per element. If you flatten with dot mode instead, you get tags_0/tags_1 columns, which you would have to melt back — explode on a kept array is cleaner.
Is my API or ML data uploaded?
No. Flattening runs entirely in your browser. API payloads, feature values, and any PII never reach JAD Apps servers. Only an anonymous file-processed counter (no content) is stored for signed-in dashboard stats, which you can opt out of.
What is the maximum nesting depth?
Fixed at 10 levels and not adjustable in the UI. Anything deeper is left nested as a dict value under the depth-10 column, which pandas reads as a dict cell. For deeper structures, re-flatten the inner fragment or pd.json_normalize that column.
How large a response can I flatten for free?
Free tier caps JSON input at 2 MB, one file. That covers a representative record for column design. Pro raises it to 100 MB with 10-file batches; higher tiers go further. For large exports, flatten a sample for the schema and process the bulk via the API runner or in-notebook code.
Can I flatten inside an automated Python pipeline?
Yes. GET /api/v1/tools/json-flattener returns the option schema (delimiter, arrayHandling); pair the @jadapps/runner and POST records plus options to /api/v1/tools/json-flattener/run locally. Data stays on your machine. Or implement the same recursive flatten in Python at ingest — use this tool to confirm the key shape your code should produce.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.