How to reshape json into columnar format for parquet and arrow
- Step 1
- Step 2Drop the row array — Drag the
.jsonholding your record array onto the dropzone. Input must be a top-level array of objects — the row form Arrow can't ingest directly. - Step 3Select Array → columnar — Pick the Array → columnar radio (
pivot-to-columnar). It rotates rows into one array per field — the column-major form Arrow and Parquet want. - Step 4Click Transpose — Press Transpose. Every key becomes a column array holding that field's values across all records, in input order.
- Step 5Verify equal column lengths — Arrow requires all columns to be the same length. Confirm every output array matches the record count; a short column means a record lacked that key — fix the source and re-run.
- Step 6Feed Arrow / DuckDB — Copy or download
<name>.transposed.json, then build vectors per column (vectorFromArray(out.price)) for an Arrow Table, or load the columns in DuckDB. Convert the Arrow table to Parquet with your writer.
Columnar output → Arrow / DuckDB ingest
How the transposed columns map onto a columnar pipeline. The JSON Transposer produces the column-major JSON; the Arrow/Parquet writing happens in your library of choice.
| Target | Consumes | How the columns are used | Notes |
|---|---|---|---|
| Apache Arrow (JS) | Column arrays | vectorFromArray(out.col) per field → Table | All columns must be equal length |
| Apache Arrow (Python) | Column arrays | pa.table({ "col": out.col, ... }) | Type inferred per column |
| Parquet | Arrow Table | Write the Arrow table via a Parquet writer | Column-major maps 1:1 to Parquet column chunks |
| DuckDB | JSON / Arrow | read_json or scan the Arrow table | Length-uniform columns required for Arrow path |
Row-major vs. column-major, same data
The contiguous column form is what makes columnar scans fast. The transposer rotates between the two losslessly when keys are uniform.
| Field | Row-major (interleaved) | Column-major (contiguous) | Type Arrow infers |
|---|---|---|---|
| price | in each record | [19.99, 5.0, 12.5] | Float64 |
| region | in each record | ["NA", "EU", "NA"] | Utf8 |
| in_stock | in each record | [true, false, true] | Bool |
| constraint | n/a | all arrays length 3 | length mismatch → Arrow error |
Cookbook
Before/after JSON for columnar ingest. Each shows the dropped row array and the column-major output an Arrow builder consumes, in the tool's 2-space indentation.
Row array → Arrow-ready columns
ExampleA product feed of row objects becomes contiguous column arrays you build into an Arrow Table, one vector per field.
Input (products.json):
[
{ "sku": "A1", "price": 19.99, "in_stock": true },
{ "sku": "B2", "price": 5.0, "in_stock": false },
{ "sku": "C3", "price": 12.5, "in_stock": true }
]
Mode: Array → columnar
Output:
{
"sku": ["A1", "B2", "C3"],
"price": [19.99, 5, 12.5],
"in_stock": [true, false, true]
}
// arrow: Table.new({ price: vectorFromArray(out.price), ... })Types preserved for Arrow inference
ExampleBecause values aren't stringified, Arrow infers Float64/Bool/Int columns directly instead of Utf8.
Input:
[
{ "id": 1, "rate": 0.05, "ok": true },
{ "id": 2, "rate": 0.07, "ok": false }
]
Mode: Array → columnar
Output:
{
"id": [1, 2],
"rate": [0.05, 0.07],
"ok": [true, false]
}Short column breaks Arrow length check
ExampleA record missing a field yields a short column. Arrow requires equal-length columns, so this will throw at table construction — normalize first.
Input (second record has no price):
[
{ "sku": "A1", "price": 19.99 },
{ "sku": "B2" },
{ "sku": "C3", "price": 12.5 }
]
Mode: Array → columnar
Output (price length 2, sku length 3 — Arrow will reject):
{
"sku": ["A1", "B2", "C3"],
"price": [19.99, 12.5]
}Null preserved as a column value
ExampleExplicit nulls are kept in the column, so the column length stays correct and Arrow gets a nullable field.
Input:
[
{ "id": 1, "score": 9.1 },
{ "id": 2, "score": null }
]
Mode: Array → columnar
Output:
{
"id": [1, 2],
"score": [9.1, null]
}
// Note: explicit null keeps length; a MISSING key would not.Round-trip columns back to records
ExampleColumnar → array reconstructs the row form from columns — handy when a join or export step needs records.
Input (columns.json):
{
"sku": ["A1", "B2"],
"price": [19.99, 5.0]
}
Mode: Columnar → array
Output:
[
{ "sku": "A1", "price": 19.99 },
{ "sku": "B2", "price": 5 }
]Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Record missing a key (not null)
Length mismatchThe tool appends a value only when the key is present, so a missing key produces a column shorter than the others. Arrow rejects unequal-length columns at table construction. Use an explicit null for absent values, or normalize records so every one has every key.
Input is an object, not an array
RejectedArray → columnar throws pivot-to-columnar requires an array of objects. The ingest path starts from a row array. If your data is already columnar, you may not need the transpose at all — feed Arrow directly.
Mixed types in one field
PreservedIf price is sometimes a number and sometimes a string across records, both land in one column array as-is. Arrow may infer a union type or error depending on the builder — clean types upstream for a stable schema.
Nested objects/arrays as values
PreservedNested values are copied whole into the column. Arrow can model these as Struct/List types, but Parquet schemas get complex; flatten with json-flattener first if you want flat scalar columns.
Explicit null values
PreservedAn explicit null is pushed into the column, keeping the column length correct and giving Arrow a nullable column. This is the safe way to represent 'absent' so lengths stay equal.
Array of primitives
Rejected[1, 2, 3] has no keys, so the result is an empty object {}. Columnar ingest needs an array of objects with named fields.
Empty array `[]`
ExpectedYields {} — no records, no columns. No error; just nothing to ingest.
Invalid JSON
Parse errorMalformed JSON fails at JSON.parse and nothing is produced. Fix with json-format-fixer or check json-validator, then re-run.
Batch larger than the tier limit
BlockedAbove 2 MB on Free / 100 MB on Pro the file is blocked. The whole file parses in memory; for very large ingests, chunk the JSON or stream it server-side before the Arrow step.
Expecting a .parquet file out
Wrong toolThis tool emits column-major JSON, not Parquet. Build an Arrow Table from the columns and write Parquet with your library's writer (pyarrow, parquet-wasm, DuckDB).
Frequently asked questions
Which mode produces Arrow-ready columns?
Array → columnar (pivot-to-columnar). It rotates a row array into one contiguous array per field — the column-major object Arrow table builders and DuckDB's Arrow path consume.
Does it write Parquet directly?
No. It outputs column-major JSON. Build an Arrow Table from those columns, then write Parquet with a writer like pyarrow, parquet-wasm, or DuckDB's COPY.
Why does Arrow reject my columns with a length error?
A record was missing a field, so that column came out shorter than the others, and Arrow requires equal-length columns. Use explicit null for absent values or normalize records to a uniform key set, then re-transpose.
Are numeric and boolean types preserved?
Yes. Values are moved into column arrays without coercion, so Arrow infers Float64/Int/Bool correctly instead of Utf8. Mixed types within a field, however, can break inference.
How are nulls handled?
An explicit null is kept in the column, preserving length and giving Arrow a nullable field. A missing key is different — it shortens the column and causes a length mismatch.
Should I flatten before transposing for Parquet?
If you want flat scalar columns, yes — run json-flattener first. Otherwise nested objects become Struct/List columns, which complicate the Parquet schema.
Can I go from columns back to rows?
Yes — Columnar → array reconstructs records from a columnar object, using the longest column and null-padding shorter ones.
Is there a minify/indent control?
No — the UI exposes only the three mode radios; output is pretty-printed with 2-space indentation. Use json-minifier to compress before storage.
What file size can I prep?
Free up to 2 MB, Pro up to 100 MB. The JSON Transposer is a Pro tool and requires a Pro plan.
Is my data uploaded for processing?
No. The transpose runs in your browser; the file never leaves the tab.
Can DuckDB read the output?
DuckDB can read JSON with read_json, and the Arrow path works once you build a Table from the equal-length columns. Length-uniform columns are required for the Arrow route.
What if a field's values are mixed numbers and strings?
They land in one column as-is. Arrow may infer a union type or throw, depending on the builder. Normalize the field's type upstream for a clean, stable Parquet schema.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.