How to convert csv to ndjson for bigquery / snowflake
- Step 1Export the source table as CSV — From a database, spreadsheet, or app, export a CSV with a header row. Headers become NDJSON field names and, with BigQuery autodetect, the column names — so use warehouse-legal names (letters, digits, underscores; no leading digit).
- Step 2Drop the CSV onto the converter above — Accepts
.csv,.tsv,.txt. PapaParse auto-detects the delimiter and is quote-aware, so embedded commas and newlines inside quoted cells do not break records — a common cause of CSV load failures. - Step 3Select NDJSON / JSON Lines as the output mode — This produces one compact JSON object per line with no enclosing array brackets — the line-delimited format every major warehouse load command reads. Indentation does not apply to NDJSON.
- Step 4Keep type inference on for clean autodetect — Leave Infer numbers, booleans, null on so numeric and boolean columns load as their real types. Turn it off only if you want every column loaded as STRING (e.g. you will CAST in SQL afterwards).
- Step 5Consider Skip empty cells for NULL handling — By default a blank cell is
""(an empty string in the column). Turn on Skip empty cells to omit the key, which BigQuery loads as NULL — useful when blank should mean missing, not empty. - Step 6Download the .ndjson and run the load — Click Download NDJSON, then
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect dataset.table data.ndjson, or stage the file andCOPY INTOit in Snowflake. Confirm the Records stat matches your expected row count.
NDJSON load command by warehouse
The NDJSON this tool produces feeds each of these. The tool generates the file; it does not run the load.
| Warehouse | Source format / file format | Load command (abbreviated) |
|---|---|---|
| BigQuery | NEWLINE_DELIMITED_JSON | bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect ds.t data.ndjson |
| Snowflake | FILE_FORMAT = (TYPE = JSON) | stage file → COPY INTO t FROM @stage FILE_FORMAT=(TYPE=JSON) |
| Redshift | FORMAT AS JSON 'auto' | COPY t FROM 's3://…/data.ndjson' FORMAT AS JSON 'auto' |
| Athena | JSON SerDe | CREATE EXTERNAL TABLE … ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' |
Inference and how the warehouse types it
With BigQuery --autodetect on. Inference is a single global toggle; off means every value is STRING.
| CSV cell | NDJSON value (inference on) | Autodetected type | Note |
|---|---|---|---|
42 | 42 | INTEGER | Safe integers only |
3.14 | 3.14 | FLOAT | Decimal / scientific |
true | true | BOOLEAN | Case-insensitive |
null | null | (NULL) | Loads as SQL NULL |
| `` (empty), Skip empty off | "" | STRING (empty) | Empty string, not NULL |
| `` (empty), Skip empty on | key omitted | NULL | Field absent → NULL |
9007199254740993 | "9007…" | STRING | Beyond safe int — load to NUMERIC/STRING and CAST |
2026-06-12 | "2026-06-12" | STRING | Loads as string; CAST to DATE in SQL or schema |
Cookbook
NDJSON conversion recipes for warehouse loads. Values illustrative; the focus is the shape and the load command.
Typed NDJSON for bq load --autodetect
ExampleNDJSON mode, inference on. BigQuery autodetect reads events as INTEGER and active as BOOLEAN.
Input (users.csv):
user_id,events,active
1,12,true
2,0,false
Output (NDJSON):
{"user_id":1,"events":12,"active":true}
{"user_id":2,"events":0,"active":false}
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
analytics.users users.ndjsonNULL vs empty string for a nullable column
ExampleSkip empty cells on: a blank country omits the key, so BigQuery loads NULL. With it off you would get an empty string instead.
Input:
user_id,country
1,US
2,
Output (Skip empty cells ON):
{"user_id":1,"country":"US"}
{"user_id":2}
→ row 2's country loads as NULL, not ''Snowflake COPY INTO from a JSON stage
ExampleStage the NDJSON, then COPY INTO with a JSON file format. Each line becomes one VARIANT or typed-column row.
Output (NDJSON, staged as @my_stage/users.ndjson):
{"user_id":1,"plan":"pro"}
{"user_id":2,"plan":"free"}
COPY INTO users
FROM @my_stage/users.ndjson
FILE_FORMAT = (TYPE = JSON);Keep a big surrogate key as a string
ExampleA 19-digit key exceeds safe-integer range; inference keeps it a string so it loads to a STRING/NUMERIC column without rounding.
Input:
event_key,name
9223372036854775807,launch
Output (NDJSON):
{"event_key":"9223372036854775807","name":"launch"}
→ load into a STRING or NUMERIC column, then CAST as neededLoad everything as STRING then CAST in SQL
ExampleTurn inference off when you would rather do all typing in the warehouse. Every value is a quoted string; CAST in the load SELECT or a downstream view.
Input:
id,amount,paid
1,19.99,true
Output (NDJSON, inference OFF):
{"id":"1","amount":"19.99","paid":"true"}
→ SELECT CAST(amount AS NUMERIC), CAST(paid AS BOOL) …Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Empty cell loads as empty string instead of NULL
By designBy default a blank cell becomes "", which warehouses load as an empty string, not NULL — these are distinct and can break WHERE col IS NULL logic. Turn on Skip empty cells to omit the key so the column loads as NULL. The literal text null (any case) is always emitted as JSON null regardless of this setting.
Leading-zero key autodetected as INTEGER
By designWith inference on, 00042 becomes 42 and autodetect picks INTEGER. For a key that must stay exact, turn off Infer numbers, booleans, null so the column is a string, then load it to a STRING column. Inference is global — if you need other columns typed, load everything as STRING and CAST in SQL.
Big integer can't fit BigQuery INTEGER (INT64)
PreservedAn integer beyond JavaScript's safe-integer range is kept as a string by inference, avoiding silent corruption. BigQuery's INT64 actually goes higher than JS safe-int, so if you need the numeric value, load this column into NUMERIC/BIGNUMERIC (or STRING) and CAST — the string is exact, so no precision is lost in the file.
Date / timestamp loads as STRING
Expected2026-06-12 and ISO timestamps don't match the numeric/boolean/null patterns, so they stay strings. Autodetect types them as STRING. Either define an explicit schema with a DATE/TIMESTAMP column (BigQuery parses ISO strings into it), or CAST in a downstream view.
Autodetect picks the wrong type from a sparse column
Schema riskBigQuery autodetect samples rows; a column that is mostly numeric but has one stray non-numeric cell may type as STRING or fail. This is a warehouse behaviour, not a converter behaviour — the NDJSON faithfully reflects the cells. Supply an explicit schema to bq load (--schema) for predictable typing on messy columns.
Dotted header becomes a flat field, not a RECORD
Not supported hereaddress.city is a literal field name, not a nested BigQuery RECORD/STRUCT. To load nested structures, convert here, then expand dotted keys with json-unflattener before loading, or define a STRUCT schema and provide nested NDJSON.
Field name illegal for the warehouse
Load rejectA header with spaces or a leading digit (1st_visit, User Name) becomes a JSON key the same way, but BigQuery/Snowflake reject it as a column name on autodetect. Rename the columns in the source CSV (or with csv-header-rename) to legal identifiers before converting.
Free tier file/row cap on a warehouse-scale CSV
LimitFree tier caps at 2 MB / 500 rows — fine for a sample, not a load. Pro is 100 MB / 100,000 rows, Developer 5 GB with no row cap. For very large loads, split the CSV into chunks under your cap, convert each to NDJSON, and load them as a wildcard set (bq load … data_*.ndjson).
Frequently asked questions
Why NDJSON instead of just loading the CSV into BigQuery?
NDJSON carries the field name on every record, so a schema change (new column) doesn't break older files the way a CSV column-order change does, and JSON null is distinct from empty string for nullable columns. With inference on, autodetect also gets real INTEGER/BOOLEAN types instead of guessing from CSV strings.
Which output mode do I pick for a warehouse load?
NDJSON / JSON Lines. That is the exact format BigQuery's NEWLINE_DELIMITED_JSON, Snowflake's TYPE = JSON, Redshift's FORMAT AS JSON, and Athena's JSON SerDe read. Do not use Array mode for these loads — they expect one record per line, not an enclosing array.
How do I make blank cells load as NULL?
Turn on Skip empty cells. That omits the key for a blank cell, and a missing key loads as SQL NULL. With it off, a blank becomes an empty string "", which is a different value. The literal word null always becomes JSON null regardless of this setting.
Will my date columns load as DATE/TIMESTAMP?
Not automatically — dates stay JSON strings, so autodetect types them STRING. Define an explicit schema with a DATE or TIMESTAMP column (BigQuery parses ISO-8601 strings into those types on load), or CAST in a downstream view/query.
What happens to very large integer IDs?
Inference keeps them as strings when they exceed JavaScript's safe-integer range, so the value in the file is exact (no rounding). Load that column into NUMERIC/BIGNUMERIC or STRING and CAST as needed. BigQuery INT64 can hold larger values than JS safe-int, so the string-then-CAST path preserves full precision.
Is the data uploaded anywhere during conversion?
No. PapaParse parses and the NDJSON is generated entirely in your browser. Warehouse-bound data never reaches a JAD Apps server. Only an anonymous run counter is recorded when signed in, and you can opt out.
Can I load nested STRUCT / RECORD data this way?
Not from a flat CSV directly — each row becomes a flat object, and dotted headers stay literal field names. To get nested NDJSON, convert here, then run the records through json-unflattener to expand a.b.c keys into nested objects, and load against a STRUCT schema.
How large a CSV can I convert?
Free: 2 MB / 500 rows. Pro: 100 MB / 100,000 rows. Pro+Media: 500 MB / 500,000 rows. Developer: 5 GB, no row cap. For warehouse-scale loads, split the source CSV into chunks under your cap and convert each — warehouses happily load a wildcard set of NDJSON files.
Can I generate a BigQuery schema from the data too?
Not in this tool, but the sibling json-schema-generator produces a JSON Schema from the converted records, which you can adapt into a bq load --schema definition. Or rely on --autodetect, which works well once inference has typed the values.
My CSV is semicolon-delimited (EU locale) — will it still work?
Yes. PapaParse auto-detects the delimiter from the file content, so comma, semicolon, and tab all work without configuration. The NDJSON output is identical regardless of the source delimiter.
Can I automate CSV→NDJSON in an ELT pipeline?
Yes. Pair the @jadapps/runner once and POST the CSV to 127.0.0.1:9789/v1/tools/csv-to-json/run with outputMode: ndjson. A common pattern: source export → runner converts to NDJSON → land in GCS/S3 → bq load / COPY INTO. The data stays on your machine and never reaches JAD's servers.
How do I go from warehouse JSON back to CSV for a report?
Use the sibling json-to-csv tool. Export query results as a JSON array, then convert to a flat CSV for a spreadsheet or BI handoff.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.