How to flatten nested json for bigquery schema design
- Step 1Save a representative sample as a .json file — The flattener reads a dropped file, not pasted text. Save 1 document (for schema design) or a small representative set as
.json. Cover the full variety of nested fields — rare fields absent from the sample will not appear in the flat key set, so they will be missing from the schema you derive. - Step 2Drop the file onto the dropzone — Flattening runs in your browser; nothing is uploaded. The header shows the file name and size. Invalid JSON throws a
SyntaxError— note that a Firestore export is often an array or NDJSON, so flatten one document at a time (see edge cases). - Step 3Set the delimiter to underscore — Type
_into the Delimiter box. BigQuery column names must match[A-Za-z_][A-Za-z0-9_]*and cannot contain dots — soevent_properties_utm_campaignis legal whereevent.properties.utm.campaignis not. The default.is wrong for BigQuery flat columns. - Step 4Choose array handling deliberately —
dot.0.keymakes positional columns (tags_0,tags_1) — sparse and brittle for BigQuery. keep arrays keeps the array so you can declare itREPEATED/ARRAY. Arrays of objects should stay intact (keep arrays) and be modelled asARRAY<STRUCT>— do not flatten them to positional keys. - Step 5Review flat keys for length and legality — Long keys like
event_properties_page_properties_utm_campaignare legal but unwieldy. Trim or rename overly verbose keys with json-key-renamer. Also confirm no key starts with a digit or contains a character outside[A-Za-z0-9_]— rename any that do. - Step 6Load the flat JSON as NDJSON — Convert the flattened object(s) to newline-delimited JSON (one object per line), upload to GCS, and run
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect myproject:mydataset.mytable gs://bucket/flat.ndjson. With a consistent flat shape, autodetect produces a clean flat schema.
Delimiter and BigQuery column-name legality
BigQuery column names must match [A-Za-z_][A-Za-z0-9_]* — dots are illegal. The delimiter you choose determines whether the flat key is a legal column.
| Delimiter | Example key | Legal BigQuery column? | Notes |
|---|---|---|---|
_ (underscore) | event_props_utm_campaign | Yes | Recommended — clean bare identifier |
. (dot, default) | event.props.utm.campaign | No | Dots are illegal in a flat column name; must rename before load |
[i] (bracket array mode) | items[0] | No | Brackets are illegal; rename or use keep arrays + REPEATED |
__ (double underscore) | event__props__utm | Yes | Use when source keys already contain single underscores |
How nested structures map to BigQuery types
Recommended BigQuery modelling per structure. Flattening is right for objects; arrays of objects belong in ARRAY<STRUCT>, not positional columns.
| Source structure | Flattener mode | BigQuery model | Why |
|---|---|---|---|
| Nested object | Flatten (underscore) | Flat scalar columns | Simplest to query; no STRUCT access needed |
| Array of primitives | keep arrays | ARRAY<STRING> (REPEATED) | Native repeated field; dot mode would make sparse _0/_1 columns |
| Array of objects | keep arrays | ARRAY<STRUCT<...>> (REPEATED RECORD) | Preserves per-element relationships flattening would destroy |
| Highly variable payload | keep arrays / leave nested | JSON column type | Avoids schema churn for unpredictable event properties |
Cookbook
Before/after documents headed for a BigQuery table. The pattern: flatten objects to legal underscore columns, keep arrays for REPEATED fields.
Nested object → legal flat columns
ExampleUnderscore delimiter turns a nested object into BigQuery-legal column names you can SELECT without backticks.
Input (doc.json):
{ "event": { "name": "click", "props": { "page": "/home" } } }
Delimiter: _ Array handling: keep arrays
Output:
{ "event_name": "click", "event_props_page": "/home" }
Query: SELECT event_name, event_props_page FROM t
(no UNNEST, no backticks)Dot delimiter produces illegal column names
ExampleThe default dot delimiter yields keys BigQuery rejects as flat columns. Always switch to underscore for BigQuery loads.
Delimiter: . → { "event.name": "click" }
bq load with this NDJSON:
Error: Invalid field name "event.name".
Fields must contain only letters, numbers, and
underscores, start with a letter or underscore...
Delimiter: _ → { "event_name": "click" } ✓ loadsArray of objects: keep it for ARRAY<STRUCT>
ExampleFlattening an order's line_items to positional keys loses the per-line relationship. Keep the array and model it as a REPEATED RECORD.
Input:
{ "order_id": 9,
"line_items": [ {"sku":"A","qty":2}, {"sku":"B","qty":1} ] }
dot.0.key → line_items_0_sku, line_items_0_qty, ...
(positional, relationship lost)
keep arrays → line_items stays an array → model as
line_items ARRAY<STRUCT<sku STRING, qty INT64>>Verbose flat key, then rename
ExampleDeeply nested event properties produce long but legal column names. Shorten them with the key renamer before defining the schema.
Input:
{ "event": { "properties": { "page": { "utm": { "campaign": "spring" } } } } }
Flatten (underscore) →
{ "event_properties_page_utm_campaign": "spring" }
Legal, but long. Use json-key-renamer to map it to
utm_campaign before creating the table.Firestore export must be flattened per document
ExampleFirestore exports are typically arrays or NDJSON. A whole array flattens to numeric-prefixed keys, not per-document rows — flatten one document to design the schema.
Firestore export (array):
[ {"user":{"name":"Sue"}}, {"user":{"name":"Jon"}} ]
Flattener output (one object, numeric prefixes):
{ "0.user.name": "Sue", "1.user.name": "Jon" } ← not rows
Instead: flatten ONE document to design columns, then
flatten each doc per-record and write NDJSON for bq load.Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Dot delimiter yields columns BigQuery rejects
Load errorThe default . delimiter makes keys like event.name. bq load rejects them with Invalid field name because flat column names cannot contain dots. Always set the Delimiter box to _ for BigQuery; if you already flattened with dots, rename with json-key-renamer before loading.
Array of objects loses relationships when flattened
Modelling errorFlattening line_items to line_items_0_sku, line_items_1_sku discards which qty belongs to which sku. Use keep arrays and model the field as ARRAY<STRUCT<...>> (REPEATED RECORD) so BigQuery keeps per-element structure.
Primitive arrays become sparse positional columns
By designIn dot mode tags: ["a","b"] becomes tags_0, tags_1. With variable-length arrays this produces sparse columns (tags_7 populated only occasionally). Prefer keep arrays and declare the field REPEATED.
Top-level array flattens to numeric keys, not rows
By designA Firestore/array export flattens to one object with 0.field, 1.field keys — not one row per document. Flatten a single document to design the schema, then flatten each document individually and emit NDJSON (one object per line) for bq load.
Column name starts with a digit or has illegal chars
Load errorIf a source key is numeric or contains characters outside [A-Za-z0-9_] (a dot, dash, space), the flat key is an illegal BigQuery column even with underscore delimiter. Rename offenders with json-key-renamer; a numeric leading char needs a prefix like f_.
Empty object/array drops the column
Expected{} and [] produce no key. A field empty in your sample document is absent from the flat schema, so the column will not be created. Pick a sample where every column is populated, or merge representative documents to cover the full field set.
Very deep document stops at level 10
ExpectedMax depth is fixed at 10. Below that the subtree stays nested as a value — which BigQuery autodetect would then type as a RECORD or JSON column, not flat scalars. Re-flatten that fragment, or model the deep portion as a JSON column on purpose.
Underscore collision in source keys
Data lossA literal user_id key and a nested user → id both flatten to user_id; the later one silently overwrites the earlier. With underscore-heavy source keys, use __ in the Delimiter box so the paths stay distinct columns.
Invalid JSON / NDJSON throws
errorThe flattener parses the file as a single JSON value. A multi-line NDJSON export or any syntax error throws a SyntaxError. Flatten one well-formed document at a time. Repair with json-format-fixer or locate the error with json-validator.
Free-tier 2 MB / one-file limit
Tier limitFree tier caps JSON input at 2 MB, one file. A large Firestore export will be rejected. Flatten one representative document for schema design (all you need), or upgrade to Pro (100 MB, 10-file batches) for bigger files.
Frequently asked questions
Which delimiter should I use for BigQuery columns?
Underscore (_). BigQuery column names must match [A-Za-z_][A-Za-z0-9_]* and cannot contain dots, so the default . delimiter produces illegal column names that bq load rejects with Invalid field name. Set the Delimiter box to _ to get legal bare identifiers like event_props_utm_campaign.
Should I always prefer a flat schema over nested RECORD types?
For analytical workloads with aggregations and GROUP BY, flat is simpler — no UNNEST everywhere. For highly variable payloads (event properties), BigQuery's JSON column type or STRUCT preserves structure more efficiently. Use flat for stable, known field sets; keep JSON/STRUCT for unpredictable ones. The flattener supports the flat path and keep arrays supports the repeated path.
How do I handle arrays of objects for BigQuery?
Don't flatten them. Use keep arrays so the array stays intact, then model it as ARRAY<STRUCT<...>> (a REPEATED RECORD). Flattening line_items to line_items_0_sku destroys the per-element relationship and produces sparse positional columns. Keep arrays + REPEATED RECORD is the correct BigQuery pattern.
Why does bq load reject my flattened column names?
Almost certainly you flattened with the dot delimiter — event.name is illegal in BigQuery. Re-flatten with the _ delimiter, or rename existing dotted keys with json-key-renamer. Also check for keys that start with a digit or contain spaces/dashes; those need renaming too.
My Firestore export is an array — can I flatten the whole thing?
Flattening a whole array gives one object with numeric-prefixed keys (0.user.name), not per-document rows. For a load, flatten one document to design the schema, then flatten each document individually and write newline-delimited JSON (one object per line) for bq load --source_format=NEWLINE_DELIMITED_JSON.
Can I change the maximum nesting depth?
No — it is fixed at 10 levels and not exposed in the UI. Subtrees deeper than 10 are left nested as a value, which BigQuery autodetect types as a RECORD/JSON column rather than flat scalars. Re-flatten the deep fragment, or deliberately model that portion as a JSON column.
Why is a column missing from my derived schema?
The field was probably an empty object {} or empty array [] in the sample document — those emit no key. Use a sample where every field is populated, or flatten several representative documents and union their flat key sets to cover the full schema.
Can I paste JSON instead of dropping a file?
No — the tool reads a dropped .json file and has no paste box. Save your document to a file first. A Firestore/NDJSON export should be split so you flatten one well-formed JSON document per pass.
Does flattening change my data types?
No. Leaf values keep their JSON type, so BigQuery autodetect infers STRING/INT64/FLOAT64/BOOL correctly. The flattener only restructures keys; it does not coerce values. Dates remain strings — define them as DATE/TIMESTAMP explicitly or parse after load.
Is my Firestore or API data uploaded?
No. Flattening runs entirely in your browser. Firestore document data, event payloads, and any PII never reach JAD Apps servers. Only an anonymous file-processed counter (no content) is kept for signed-in dashboard stats, which you can opt out of.
How large a file can I flatten for free?
Free tier caps JSON input at 2 MB, one file. That is enough for a representative schema-design document. Pro raises it to 100 MB with 10-file batches; higher tiers go further. For full exports, design the schema from a sample and load the bulk file via NDJSON + bq load.
Can I flatten in an automated load pipeline?
Yes. GET /api/v1/tools/json-flattener returns the option schema; pair the @jadapps/runner and POST documents plus options (delimiter _, arrayHandling keep arrays) to /api/v1/tools/json-flattener/run locally. Data stays on your machine. A common flow: extract Firestore docs → flatten per-record → write NDJSON to GCS → bq load.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.