Transpose JSON for Columnar Analytics (Parquet / Arrow)

How to reshape JSON into columnar format for Parquet and Arrow

Step 1
Open the JSON Transposer — Go to json-transposer. The transpose runs entirely in your browser.
Step 2
Drop the row array — Drag the .json holding your record array onto the dropzone. Input must be a top-level array of objects — the row form Arrow can't ingest directly.
Step 3
Select Array → columnar — Pick the Array → columnar radio (pivot-to-columnar). It rotates rows into one array per field — the column-major form Arrow and Parquet want.
Step 4
Click Transpose — Press Transpose. Every key becomes a column array holding that field's values across all records, in input order.
Step 5
Verify equal column lengths — Arrow requires all columns to be the same length. Confirm every output array matches the record count; a short column means a record lacked that key — fix the source and re-run.
Step 6
Feed Arrow / DuckDB — Copy or download <name>.transposed.json, then build vectors per column (vectorFromArray(out.price)) for an Arrow Table, or load the columns in DuckDB. Convert the Arrow table to Parquet with your writer.
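
The length check in Step 5 can be scripted rather than eyeballed. The sketch below is a minimal illustration, assuming the transposed output has already been loaded as a dict of lists; the helper names `column_lengths` and `short_columns` are hypothetical and not part of the tool.

```python
import json


def column_lengths(columns: dict) -> dict:
    """Map each column name to the number of values it holds."""
    return {name: len(values) for name, values in columns.items()}


def short_columns(columns: dict, record_count: int) -> list:
    """Return the names of columns whose length differs from record_count."""
    return [
        name
        for name, length in column_lengths(columns).items()
        if length != record_count
    ]


# Hypothetical transposed output where one of three records lacked "price".
out = json.loads('{"sku": ["A1", "B2", "C3"], "price": [19.99, 12.5]}')
print(short_columns(out, record_count=3))  # ['price']
```

An empty list means every column matches the record count and the columns are safe to hand to an Arrow builder.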

Columnar output → Arrow / DuckDB ingest

How the transposed columns map onto a columnar pipeline. The JSON Transposer produces the column-major JSON; the Arrow/Parquet writing happens in your library of choice.

Target	Consumes	How the columns are used	Notes
Apache Arrow (JS)	Column arrays	`vectorFromArray(out.col)` per field → `Table`	All columns must be equal length
Apache Arrow (Python)	Column arrays	`pa.table({ "col": out["col"], ... })`	Type inferred per column
Parquet	Arrow Table	Write the Arrow table via a Parquet writer	Column-major maps 1:1 to Parquet column chunks
DuckDB	JSON / Arrow	`read_json` or scan the Arrow table	Length-uniform columns required for Arrow path

Row-major vs. column-major, same data

The contiguous column form is what makes columnar scans fast. The transposer rotates between the two losslessly when keys are uniform.

Field	Row-major (interleaved)	Column-major (contiguous)	Type Arrow infers
price	in each record	`[19.99, 5.0, 12.5]`	Float64
region	in each record	`["NA", "EU", "NA"]`	Utf8
in_stock	in each record	`[true, false, true]`	Bool
constraint	n/a	all arrays length 3	length mismatch → Arrow error
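
To make the rotation concrete, here is a rough sketch of a row-to-column pivot that follows the behavior described on this page (one array per field, values in input order, a value appended only when the key is present). It is an illustration under those assumptions, not the tool's actual source; `pivot_to_columnar` is a hypothetical name.

```python
def pivot_to_columnar(rows: list) -> dict:
    """Rotate a list of row dicts into one list per key, in input order."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            # A value is appended only when the key is present in this row,
            # so a missing key leaves that column one element short.
            columns.setdefault(key, []).append(value)
    return columns


rows = [
    {"price": 19.99, "region": "NA", "in_stock": True},
    {"price": 5.0, "region": "EU", "in_stock": False},
    {"price": 12.5, "region": "NA", "in_stock": True},
]
columns = pivot_to_columnar(rows)
print(columns["region"])  # ['NA', 'EU', 'NA']
```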

Cookbook

Before/after JSON for columnar ingest. Each shows the dropped row array and the column-major output an Arrow builder consumes, in the tool's 2-space indentation.

Row array → Arrow-ready columns

Example

A product feed of row objects becomes contiguous column arrays you build into an Arrow Table, one vector per field.

Input (products.json):
[
  { "sku": "A1", "price": 19.99, "in_stock": true },
  { "sku": "B2", "price": 5.0,   "in_stock": false },
  { "sku": "C3", "price": 12.5,  "in_stock": true }
]

Mode: Array → columnar

Output:
{
  "sku":      ["A1", "B2", "C3"],
  "price":    [19.99, 5, 12.5],
  "in_stock": [true, false, true]
}
// arrow: new Table({ price: vectorFromArray(out.price), ... })  (Table.new was removed in apache-arrow 7)

Types preserved for Arrow inference

Example

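Because values aren't stringified, Arrow infers numeric and boolean columns directly instead of Utf8. Integer inference depends on the builder: pyarrow infers int64 for whole-number values, while Arrow JS's vectorFromArray treats plain JavaScript numbers as Float64.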

Input:
[
  { "id": 1, "rate": 0.05, "ok": true },
  { "id": 2, "rate": 0.07, "ok": false }
]

Mode: Array → columnar

Output:
{
  "id": [1, 2],
  "rate": [0.05, 0.07],
  "ok": [true, false]
}

Short column breaks Arrow length check

Example

A record missing a field yields a short column. Arrow requires equal-length columns, so this will throw at table construction — normalize first.

Input (second record has no price):
[
  { "sku": "A1", "price": 19.99 },
  { "sku": "B2" },
  { "sku": "C3", "price": 12.5 }
]

Mode: Array → columnar

Output (price length 2, sku length 3 — Arrow will reject):
{
  "sku": ["A1", "B2", "C3"],
  "price": [19.99, 12.5]
}
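
Before re-running the transpose, it helps to know which records caused the short column. The following sketch scans the source row array and reports, per record index, the keys that record lacks. It assumes the rows are already parsed into Python dicts; `missing_keys_by_record` is a hypothetical helper name.

```python
def missing_keys_by_record(rows: list) -> dict:
    """Map record index -> sorted list of keys that record lacks."""
    all_keys = set()
    for row in rows:
        all_keys.update(row)

    report = {}
    for index, row in enumerate(rows):
        missing = sorted(all_keys - set(row))
        if missing:
            report[index] = missing
    return report


rows = [
    {"sku": "A1", "price": 19.99},
    {"sku": "B2"},
    {"sku": "C3", "price": 12.5},
]
print(missing_keys_by_record(rows))  # {1: ['price']}
```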

Null preserved as a column value

Example

Explicit nulls are kept in the column, so the column length stays correct and Arrow gets a nullable field.

Input:
[
  { "id": 1, "score": 9.1 },
  { "id": 2, "score": null }
]

Mode: Array → columnar

Output:
{
  "id": [1, 2],
  "score": [9.1, null]
}
// Note: explicit null keeps length; a MISSING key would not.

Round-trip columns back to records

Example

Columnar → array reconstructs the row form from columns — handy when a join or export step needs records.

Input (columns.json):
{
  "sku": ["A1", "B2"],
  "price": [19.99, 5.0]
}

Mode: Columnar → array

Output:
[
  { "sku": "A1", "price": 19.99 },
  { "sku": "B2", "price": 5 }
]
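
The reverse rotation can be sketched with `itertools.zip_longest`, which pads shorter columns with `None` (JSON null) up to the longest column's length. This mirrors the null-padding behavior described in the FAQ, but it is only an approximation of the tool, and `columnar_to_rows` is a hypothetical name.

```python
from itertools import zip_longest


def columnar_to_rows(columns: dict) -> list:
    """Rebuild row dicts from a dict of column lists, null-padding short columns."""
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip_longest(*columns.values(), fillvalue=None)
    ]


rows = columnar_to_rows({"sku": ["A1", "B2"], "price": [19.99]})
print(rows[1])  # {'sku': 'B2', 'price': None}
```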

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Find the entry below that matches your symptom, then apply the fix that entry describes.

Record missing a key (not null)

Length mismatch

The tool appends a value only when the key is present, so a missing key produces a column shorter than the others. Arrow rejects unequal-length columns at table construction. Use an explicit null for absent values, or normalize records so every one has every key.
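
One way to normalize the source before transposing is to give every record the full key set, filling absent keys with an explicit null. The sketch below assumes the rows are parsed Python dicts; `normalize_records` is a hypothetical helper, not a feature of the tool.

```python
import json


def normalize_records(rows: list) -> list:
    """Return copies of the rows in which every record carries every key."""
    # dict.fromkeys keeps first-seen order while removing duplicate keys.
    keys = list(dict.fromkeys(key for row in rows for key in row))
    # row.get(key) yields None for an absent key, which serializes as null.
    return [{key: row.get(key) for key in keys} for row in rows]


rows = [
    {"sku": "A1", "price": 19.99},
    {"sku": "B2"},
    {"sku": "C3", "price": 12.5},
]
normalized = normalize_records(rows)
print(json.dumps(normalized[1]))  # {"sku": "B2", "price": null}
```

Transposing the normalized array then yields a price column of length 3 with a null in the second slot.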

Input is an object, not an array

Rejected

Array → columnar throws "pivot-to-columnar requires an array of objects". The ingest path starts from a row array. If your data is already columnar, you may not need the transpose at all; feed Arrow directly.
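
A pre-flight shape check can catch this before the file reaches the dropzone. The sketch below reuses the error wording quoted above purely for illustration. Note one assumption: it is stricter than the tool as described on this page, because it also rejects arrays that contain non-object items. `load_row_array` is a hypothetical name.

```python
import json


def load_row_array(text: str) -> list:
    """Parse JSON text and confirm it is a top-level array of objects."""
    data = json.loads(text)
    is_row_array = isinstance(data, list) and all(
        isinstance(item, dict) for item in data
    )
    if not is_row_array:
        raise ValueError("pivot-to-columnar requires an array of objects")
    return data


rows = load_row_array('[{"sku": "A1"}, {"sku": "B2"}]')
print(len(rows))  # 2
```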

Mixed types in one field

Preserved

If price is sometimes a number and sometimes a string across records, both land in one column array as-is. Arrow may infer a union type or error depending on the builder — clean types upstream for a stable schema.
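
Mixed-type columns are easy to detect on the transposed output before it reaches Arrow. The sketch below classifies each value by its JSON kind and reports columns that hold more than one kind, ignoring nulls. Treating integers and floats as a single "number" kind is an assumption made here because JSON itself does not distinguish them; the function names are hypothetical.

```python
def json_kind(value) -> str:
    """Classify a parsed JSON value by its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def mixed_columns(columns: dict) -> dict:
    """Map column name -> sorted kinds, for columns holding more than one kind."""
    report = {}
    for name, values in columns.items():
        kinds = sorted({json_kind(v) for v in values} - {"null"})
        if len(kinds) > 1:
            report[name] = kinds
    return report


out = {"sku": ["A1", "B2"], "price": [19.99, "5.00"], "qty": [3, None]}
print(mixed_columns(out))  # {'price': ['number', 'string']}
```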

Nested objects/arrays as values

Preserved

Nested values are copied whole into the column. Arrow can model these as Struct/List types, but Parquet schemas get complex; flatten with json-flattener first if you want flat scalar columns.
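
For readers who prefer to flatten in a script, here is a small sketch of recursive flattening that joins nested keys with a dot. The dot-joined naming is an assumption for illustration only; this page does not state how json-flattener names its keys, and arrays are deliberately left untouched here.

```python
def flatten_record(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into a single level with sep-joined key paths."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects; lists and scalars are kept as-is.
            flat.update(flatten_record(value, path, sep))
        else:
            flat[path] = value
    return flat


record = {"sku": "A1", "dims": {"w": 2, "h": 3}}
print(flatten_record(record))  # {'sku': 'A1', 'dims.w': 2, 'dims.h': 3}
```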

Explicit null values

Preserved

An explicit null is pushed into the column, keeping the column length correct and giving Arrow a nullable column. This is the safe way to represent 'absent' so lengths stay equal.

Array of primitives

Rejected

[1, 2, 3] has no keys, so the result is an empty object {}. Columnar ingest needs an array of objects with named fields.

Empty array `[]`

Expected

Yields {} — no records, no columns. No error; just nothing to ingest.

Invalid JSON

Parse error

Malformed JSON fails at JSON.parse and nothing is produced. Fix with json-format-fixer or check json-validator, then re-run.

Batch larger than the tier limit

Blocked

Above 2 MB on Free / 100 MB on Pro the file is blocked. The whole file parses in memory; for very large ingests, chunk the JSON or stream it server-side before the Arrow step.
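
Chunking the row array can be sketched as below: records are grouped so that each chunk's compact JSON serialization stays under a byte budget. The budget is measured against Python's default `json.dumps` output, which is an assumption; a pretty-printed file will be larger, so leave headroom. `chunk_by_bytes` is a hypothetical helper.

```python
import json


def chunk_by_bytes(rows: list, max_bytes: int) -> list:
    """Split rows into lists whose json.dumps size stays within max_bytes.

    A single row larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for row in rows:
        # Row bytes plus 2 for the ", " separator (a slight overestimate).
        row_bytes = len(json.dumps(row).encode("utf-8")) + 2
        if current and size + row_bytes > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(row)
        size += row_bytes
    if current:
        chunks.append(current)
    return chunks


rows = [{"id": i} for i in range(10)]
chunks = chunk_by_bytes(rows, max_bytes=40)
print([len(chunk) for chunk in chunks])  # [3, 3, 3, 1]
```

Each chunk is transposed separately, and the resulting column arrays are concatenated per field in chunk order before the Arrow step.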

Expecting a .parquet file out

Wrong tool

This tool emits column-major JSON, not Parquet. Build an Arrow Table from the columns and write Parquet with your library's writer (pyarrow, parquet-wasm, DuckDB).
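
As a rough sketch of that last step with pyarrow, the code below validates the downloaded column-major JSON and then writes it with `pyarrow.parquet.write_table`. It assumes pyarrow is installed; the file names in the usage comment are placeholders, and `load_columns` and `write_parquet` are hypothetical helpers rather than part of any library.

```python
import json


def load_columns(text: str) -> dict:
    """Parse column-major JSON and confirm all columns share one length."""
    columns = json.loads(text)
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"unequal column lengths: {lengths}")
    return columns


def write_parquet(columns: dict, path: str) -> None:
    """Build an Arrow table from the columns and write it as Parquet."""
    # Imported lazily so load_columns stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(columns)  # column types are inferred per column
    pq.write_table(table, path)


columns = load_columns('{"sku": ["A1", "B2"], "price": [19.99, 5]}')
# Example usage (placeholder path):
# write_parquet(columns, "products.parquet")
```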

Frequently asked questions

Which mode produces Arrow-ready columns?

Array → columnar (pivot-to-columnar). It rotates a row array into one contiguous array per field — the column-major object Arrow table builders and DuckDB's Arrow path consume.

Does it write Parquet directly?

No. It outputs column-major JSON. Build an Arrow Table from those columns, then write Parquet with a writer like pyarrow, parquet-wasm, or DuckDB's COPY.

Why does Arrow reject my columns with a length error?

A record was missing a field, so that column came out shorter than the others, and Arrow requires equal-length columns. Use explicit null for absent values or normalize records to a uniform key set, then re-transpose.

Are numeric and boolean types preserved?

Yes. Values are moved into column arrays without coercion, so Arrow infers Float64/Int/Bool correctly instead of Utf8. Mixed types within a field, however, can break inference.

How are nulls handled?

An explicit null is kept in the column, preserving length and giving Arrow a nullable field. A missing key is different — it shortens the column and causes a length mismatch.

Should I flatten before transposing for Parquet?

If you want flat scalar columns, yes — run json-flattener first. Otherwise nested objects become Struct/List columns, which complicate the Parquet schema.

Can I go from columns back to rows?

Yes — Columnar → array reconstructs records from a columnar object, using the longest column and null-padding shorter ones.

Is there a minify/indent control?

No — the UI exposes only the three mode radios; output is pretty-printed with 2-space indentation. Use json-minifier to compress before storage.

What file size can I prep?

Free up to 2 MB, Pro up to 100 MB. The JSON Transposer is a Pro tool and requires a Pro plan.

Is my data uploaded for processing?

No. The transpose runs in your browser; the file never leaves the tab.

Can DuckDB read the output?

DuckDB can read JSON with read_json, and the Arrow path works once you build a Table from the equal-length columns. Length-uniform columns are required for the Arrow route.

What if a field's values are mixed numbers and strings?

They land in one column as-is. Arrow may infer a union type or throw, depending on the builder. Normalize the field's type upstream for a clean, stable Parquet schema.

Privacy first

Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
