How to convert json event data to analytics schema sql
- Step 1Export events as a JSON array — You need a top-level array of event objects:
[{"event":"page_view",...},{...}]. Segment/Mixpanel exports and GA4 BigQuery extracts often nest events under a key — pull the array out with json-path-extractor ($.events[*]) first. - Step 2Flatten nested properties if you want flat columns — Event
properties/event_paramsare stored as one TEXT column unless you flatten them. For a true fact-table shape with one column per property, run the array through json-flattener soproperties.planbecomes a top-levelproperties_plankey first. - Step 3Drop the file on the converter — Drag the
.jsonfile onto the dropzone..xlsx/.xls/.odswork too (first sheet → JSON array). A raw.csvexport needs csv-to-json first. - Step 4Name the fact table and pick a dialect — Set Table name (e.g.
fct_events) and choose PostgreSQL for a Postgres/Redshift/Snowflake stand-in, MySQL, or SQLite for a quick local file. Postgres double-quoting is the safest default for warehouse-style identifiers. - Step 5Exclude PII and high-cardinality fields — List fields to drop in Exclude columns, e.g.
ip, anonymous_id, user_agent. They're removed from the schema and every row — useful for keeping a shareable test seed free of identifiers. - Step 6Generate, review types, and load — Click Generate SQL. Review inferred types — a
revenuefield shows asDOUBLE PRECISION/REAL; tighten toNUMERICfor your real warehouse. Copy or Download SQL and run against your local warehouse or dev Postgres. Stats report event (row) count and statement count.
What this tool does (and doesn't) for analytics data
Grounded in the actual converter behaviour, not warehouse-specific features it doesn't have.
| Capability | Behaviour here | If you need more |
|---|---|---|
| Dialects | PostgreSQL, MySQL, SQLite only | No BigQuery/Redshift/Snowflake dialect; use Postgres output as the closest base |
Nested properties / event_params | Stored as one JSON-stringified TEXT column | json-flattener to get one column per property |
| Sparse fields across events | Union of all keys; missing → NULL | Built in — no action needed |
| Partitioning / clustering DDL | Not generated (columns + types only) | Add PARTITION BY/CLUSTER BY to the DDL manually |
| PII / high-cardinality fields | Removed via Exclude columns | List them comma-separated before generating |
| Upsert / MERGE | Only ON CONFLICT DO NOTHING / INSERT IGNORE (skip) | Hand-edit for MERGE/upsert |
Event-field type inference by dialect
Inferred per field from all non-null values across events. One off-type value loosens the column type.
| Event field example | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
count (integers) | INTEGER | INTEGER | INTEGER |
revenue (decimals) | DOUBLE PRECISION | REAL | REAL |
is_new_user (bool) | BOOLEAN (TRUE/FALSE) | TINYINT(1) (1/0) | BOOLEAN (TRUE/FALSE) |
event / timestamp (string) | TEXT | TEXT | TEXT |
properties (object) | TEXT (stringified) | TEXT (stringified) | TEXT (stringified) |
Free vs Pro limits
JSON to SQL is a Pro tool; limits come from the json family in the tier table.
| Limit | Free | Pro |
|---|---|---|
| Max input file size | 2 MB | 100 MB |
| Files per run | 1 | 10 |
| Rows per INSERT batch | 100 (fixed) | 100 (fixed) |
Cookbook
Real event payloads in, warehouse-seed SQL out — copied from the converter so you can see how event fields, nested properties, and dialect differences resolve.
Flat event seed for a fact table
ExampleA page_view event array with flat fields. Counts infer as INTEGER, the boolean as BOOLEAN, strings as TEXT — a clean fact-table seed.
Input:
[
{"event": "page_view", "user_id": "u1", "is_new": true, "duration": 12},
{"event": "page_view", "user_id": "u2", "is_new": false, "duration": 47}
]
Output (PostgreSQL):
CREATE TABLE IF NOT EXISTS "fct_events" (
"event" TEXT,
"user_id" TEXT,
"is_new" BOOLEAN,
"duration" INTEGER
);
INSERT INTO "fct_events" ("event", "user_id", "is_new", "duration")
VALUES
('page_view', 'u1', TRUE, 12),
('page_view', 'u2', FALSE, 47);Nested properties become one TEXT column
ExampleA Segment-style event with a properties object. The object is NOT exploded into columns — it lands as a single stringified TEXT value.
Input:
[{"event": "order_completed", "properties": {"total": 89.9, "items": 3}}]
Output (PostgreSQL):
CREATE TABLE IF NOT EXISTS "fct_events" (
"event" TEXT,
"properties" TEXT
);
INSERT INTO "fct_events" ("event", "properties")
VALUES
('order_completed', '{"total":89.9,"items":3}');
-- For separate total/items columns, run json-flattener first.Sparse events: key union with NULLs
ExampleDifferent event types carry different fields. The column set is the union of all keys; events missing a field get NULL — so no field is silently dropped.
Input:
[
{"event": "signup", "plan": "pro"},
{"event": "page_view", "path": "/home"}
]
Output (PostgreSQL):
CREATE TABLE IF NOT EXISTS "fct_events" (
"event" TEXT,
"plan" TEXT,
"path" TEXT
);
INSERT INTO "fct_events" ("event", "plan", "path")
VALUES
('signup', 'pro', NULL),
('page_view', NULL, '/home');Excluding PII before seeding a shared warehouse
ExampleDrop identifiers and raw user-agent strings so a test seed is safe to commit or share.
Input:
[{"event": "click", "ip": "203.0.113.4", "anonymous_id": "a-99", "button": "buy"}]
Config: Exclude columns = ip, anonymous_id
Output (PostgreSQL):
CREATE TABLE IF NOT EXISTS "fct_events" (
"event" TEXT,
"button" TEXT
);
INSERT INTO "fct_events" ("event", "button")
VALUES
('click', 'buy');Redshift/Snowflake target via Postgres output
ExampleThere's no dedicated cloud-warehouse dialect. Postgres double-quoted output runs almost unchanged on Redshift and Snowflake; just plan the JSON-stringified columns as SUPER (Redshift) or VARIANT (Snowflake).
Config: dialect = PostgreSQL
Output:
CREATE TABLE IF NOT EXISTS "fct_events" (
"event" TEXT,
"revenue" DOUBLE PRECISION,
"properties" TEXT
);
INSERT INTO "fct_events" ("event", "revenue", "properties")
VALUES
('purchase', 19.99, '{"sku":"A1"}');
-- Redshift: change properties TEXT -> SUPER, DOUBLE PRECISION
-- is native. Snowflake: TEXT -> VARIANT, FLOAT for revenue.Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
No BigQuery / Redshift / Snowflake dialect
Not supportedOnly Postgres, MySQL, and SQLite are offered. There is no cloud-warehouse dialect, so you won't get STRUCT, SUPER, VARIANT, or PARTITION BY syntax. Use the Postgres output as a base and adapt the JSON-stringified columns to your warehouse's semi-structured type by hand.
Nested event properties not flattened
By designproperties, event_params, context, and similar nested blocks become a single JSON-stringified TEXT column, not separate columns. For a flat fact-table shape, run the events through json-flattener first so each nested key is promoted to a top-level column.
GA4 event_params array stored as a string
By designGA4's event_params is an array of key/value objects. As an array, it is JSON.stringify-ed into one TEXT column — you won't get a column per param. Flatten or pre-process the GA4 export into a flat shape before converting if you need queryable param columns.
Revenue/amount columns inferred as float
ExpectedDecimal revenue fields type as DOUBLE PRECISION (Postgres) or REAL (MySQL/SQLite) — binary floats that round. For accurate financial aggregation in a warehouse, change the generated type to NUMERIC/DECIMAL, or store amounts as integer cents in the source events.
High-cardinality timestamp strings as TEXT
ExpectedISO timestamps are strings, so they become TEXT. That's fine for a seed, but for a real fact table you'll want TIMESTAMP/TIMESTAMPTZ. Edit the column type in the generated DDL, or convert timestamps to epoch integers upstream.
Mixed numeric event field widens to float
ExpectedIf a field is 1 in one event and 1.5 in another, the column types as float, not INTEGER. With heterogeneous event data this happens often — review the inferred types before relying on them for aggregation.
Events wrapped under a key
Invalid inputThe top level must be the events array. A Segment/GA4 export shaped as {"events":[...]} won't convert directly — extract the inner array with json-path-extractor ($.events[*]) first.
Large event export over the size limit
Upgrade requiredFree tier caps input at 2 MB, Pro at 100 MB. Event exports are big — sample a representative subset for a seed, or split the array into files and load them in passes. A seed rarely needs the full event volume.
No deduplication of duplicate events
ExpectedEvery event object becomes a row; duplicate events (common in at-least-once delivery from Segment/Kafka) are all inserted. Dedupe upstream, or enable ON CONFLICT DO NOTHING with a unique constraint on a message-id column in your target table.
Frequently asked questions
Can this output BigQuery / Redshift / Snowflake SQL?
Not as a dedicated dialect — the only options are PostgreSQL, MySQL, and SQLite. The Postgres output is the closest fit: it runs nearly unchanged on Redshift and Snowflake. You'll typically swap the JSON-stringified TEXT columns to SUPER (Redshift) or VARIANT (Snowflake), and tighten floats to NUMERIC for revenue.
Why isn't my event's properties object split into columns?
Nested objects and arrays are JSON-stringified into a single TEXT column — the tool doesn't flatten them. For one column per property, run your event array through json-flattener first; it promotes properties.plan to a top-level properties_plan key, which then becomes its own column here.
How does it handle events with different fields?
It takes the union of all keys across every event, so a field present in only some events still becomes a column — events lacking it get NULL. Nothing is silently dropped, which is important when event types in one stream carry different property sets.
Will it dedupe duplicate events?
No. Every object in the array becomes a row, including duplicates from at-least-once delivery. Dedupe upstream, or add a unique constraint on a message-id column in your target table and enable ON CONFLICT DO NOTHING (Postgres/SQLite) / INSERT IGNORE (MySQL) so re-inserts are skipped.
Can I strip PII before generating the seed?
Yes. List fields like ip, anonymous_id, email, or user_agent in Exclude columns and they're removed from both the schema and every row — handy for producing a test seed that's safe to commit or share. For deeper anonymisation of values you keep, see json-anonymizer.
Why is my revenue column a float?
Decimal values infer as DOUBLE PRECISION (Postgres) or REAL (MySQL/SQLite). Floats round, which is bad for financial aggregation. Change the type to NUMERIC/DECIMAL in the generated DDL, or store amounts as integer cents in your events so the column infers as INTEGER.
Does it generate partitioning or clustering DDL?
No. The CREATE TABLE has columns and inferred types only — no PARTITION BY, CLUSTER BY, DISTKEY, or SORTKEY. Add those for your warehouse manually, or use the converter for INSERT-only output (turn off Include CREATE TABLE) against a table you defined with the right physical layout.
How are timestamps handled?
ISO timestamp strings become TEXT columns, single-quoted. For a real fact table you'll usually want TIMESTAMP/TIMESTAMPTZ — edit the generated column type, or convert timestamps to epoch integers in the source events so they infer as INTEGER.
How many events go into each INSERT?
100 rows per multi-row INSERT ... VALUES, fixed. A 1,000-event array becomes ten INSERT statements plus the CREATE TABLE. The stats line shows the exact statement count after generating.
Is my event data uploaded?
No. Conversion runs entirely in your browser, so event exports — including user identifiers — never reach a server. Only an anonymous run counter (no content) is logged for dashboard stats. This makes it safe to test with real exports.
How big an event export can I convert?
Free tier caps input at 2 MB, Pro at 100 MB. Full event exports often exceed that, but a seed rarely needs the whole volume — sample a representative subset, or split the array and load it in passes. The downstream warehouse, not this tool, is the real ceiling for production loads.
Can I seed multiple fact/dimension tables at once?
No — one run, one table. Split your JSON by entity (events vs users vs sessions), run the converter once per table changing the Table name each time, and concatenate the resulting .sql files in dependency order before loading.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.