How to compare A/B test JSON data across experiment variants
- Step 1: Export the control variant payload. Capture the JSON the control bucket produces: the feature-flag state object, the rendered page-config, or the API response served with the experiment flag off (or with the control header). Copy it.
- Step 2: Export the treatment variant payload the same way. Capture the equivalent JSON for the treatment bucket: same endpoint, same export path, only the variant assignment different. Consistency between the two captures is what makes the diff meaningful.
- Step 3: Paste control on the left. Paste the control (variant A) payload into JSON A (base). It must be a single valid JSON value.
- Step 4: Paste treatment on the right. Paste the treatment (variant B) payload into JSON B (modified). Both panels are required before the tool will run.
- Step 5: Click Compare and verify only the intended delta exists. Read the diff. The entries should match exactly the field(s) your experiment is testing. Anything else (an extra changed, added, or removed entry) is a confound to investigate before launch.
- Step 6: Record the delta in the experiment brief. Use Copy JSON to capture the diff and paste it into the experiment doc as the 'configuration change' section, so reviewers can confirm the setup varies only what was intended and approve the launch.
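As a rough illustration of Step 6, the sketch below renders a list of diff entries into a text block suitable for an experiment brief. The entry shape (dicts with `kind`, `path`, `from`, `to`), the function name `render_delta`, and the `-N removed` summary token are assumptions made for this example; the tool's actual Copy JSON output may be structured differently.

```python
import json


def render_delta(entries):
    """Render diff entries as a short text block for an experiment brief.

    Each entry is assumed (hypothetically) to be a dict with "kind" and
    "path", plus "from" and/or "to" holding the old and new values.
    """
    counts = {"changed": 0, "added": 0, "removed": 0}
    for entry in entries:
        counts[entry["kind"]] += 1

    # Summary tokens mirror the "~1 changed +1 added" style; "-" is a guess.
    symbols = {"changed": "~", "added": "+", "removed": "-"}
    summary = " ".join(
        f"{symbols[kind]}{n} {kind}" for kind, n in counts.items() if n
    )

    lines = [summary or "No differences"]
    for entry in entries:
        lines.append(f"{entry['kind']} {entry['path']}")
        if "from" in entry:
            lines.append(f"  - {json.dumps(entry['from'])}")
        if "to" in entry:
            lines.append(f"  + {json.dumps(entry['to'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_delta([
        {"kind": "changed", "path": "flags.newCheckout", "from": False, "to": True},
    ]))
```

Serializing values with `json.dumps` keeps a boolean `true` visibly distinct from the string `"true"` in the brief.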
Reading the variant diff for confounds
How each diff entry maps to experiment validity. The verdict assumes you know which single field the experiment is meant to vary.
| Diff entry | Means | If it's the tested field | If it's anything else |
|---|---|---|---|
| changed | Same key, different value between variants | Good: this is the treatment | Confound: an unintended difference that can bias results |
| added (in B / treatment only) | Treatment has a key control lacks | OK if the experiment adds a module/field | Confound: extra surface only treatment users see |
| removed (in A / control only) | Control has a key treatment lacks | OK if the experiment hides something | Confound: control users get something treatment doesn't |
| changed (type flip) | "true" vs true on a flag | Almost never intended | Bug: a mistyped flag can mis-bucket or break the variant |
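The verdict logic in the table can be sketched as a small filter: given the diff entries and the path(s) the experiment is meant to vary, everything outside those paths is a confound. The function name `split_confounds` and the entry shape (dicts with `kind` and `path`) are hypothetical, chosen only for this illustration.

```python
def split_confounds(entries, tested_paths):
    """Split diff entries into (intended, confounds).

    An entry counts as intended when its path is one of the tested paths
    or is nested underneath one (for example "modules[0]" under "modules").
    """
    intended, confounds = [], []
    for entry in entries:
        path = entry["path"]
        is_tested = any(
            path == tested
            or path.startswith(tested + ".")
            or path.startswith(tested + "[")
            for tested in tested_paths
        )
        (intended if is_tested else confounds).append(entry)
    return intended, confounds


if __name__ == "__main__":
    entries = [
        {"kind": "changed", "path": "price", "from": 10, "to": 12},
        {"kind": "changed", "path": "region", "from": "eu", "to": "us"},
    ]
    _, confounds = split_confounds(entries, ["price"])
    print([e["path"] for e in confounds])
```

A type flip on the tested field would still land in `intended` here, so it needs a separate check.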
Common variant comparisons and what the diff shows
Verified against the diff engine. Paths use dot notation for keys, bracket notation for ordered lists.
| Comparison | Control → Treatment | Diff output |
|---|---|---|
| Single flag flipped (clean) | {"newCheckout":false} → {"newCheckout":true} | changed newCheckout - false + true — only the tested flag |
| Confound: extra flag rode along | {"f1":false} → {"f1":true,"f2":true} | changed f1 + added f2 — f2 is the confound |
| Price test with leaked region drift | {"price":10,"region":"eu"} → {"price":12,"region":"us"} | changed price + changed region — region is the confound |
| Reordered UI module list | ["hero","grid"] → ["grid","hero"] | changed [0] + changed [1] — array compared by index |
| Type-mistyped flag | {"on":true} → {"on":"true"} | changed on - true + "true" — a bug, not a treatment |
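For readers who want to reproduce these rows outside the browser, here is a minimal sketch of a diff with the same described behaviour: objects compared by key, arrays compared by index, dot and bracket paths. It is not the tool's actual engine; `diff_json` and the entry shape are assumptions made for illustration.

```python
import json


def diff_json(a, b, path=""):
    """Return diff entries between two parsed JSON values.

    Objects are compared by key, arrays by index. Each entry is a dict
    with "kind" (changed/added/removed), "path", and "from" and/or "to".
    """
    entries = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            if key not in b:
                entries.append({"kind": "removed", "path": child, "from": a[key]})
            elif key not in a:
                entries.append({"kind": "added", "path": child, "to": b[key]})
            else:
                entries.extend(diff_json(a[key], b[key], child))
    elif isinstance(a, list) and isinstance(b, list):
        for i in range(max(len(a), len(b))):
            child = f"{path}[{i}]"
            if i >= len(b):
                entries.append({"kind": "removed", "path": child, "from": a[i]})
            elif i >= len(a):
                entries.append({"kind": "added", "path": child, "to": b[i]})
            else:
                entries.extend(diff_json(a[i], b[i], child))
    # Python treats True == 1, so booleans are checked separately.
    elif a != b or isinstance(a, bool) != isinstance(b, bool):
        entries.append({"kind": "changed", "path": path, "from": a, "to": b})
    return entries


if __name__ == "__main__":
    control = json.loads('{"f1": false}')
    treatment = json.loads('{"f1": true, "f2": true}')
    for entry in diff_json(control, treatment):
        print(entry)
```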
Cookbook
Anonymised control-vs-treatment payloads. Left is control (variant A), right is treatment (variant B). The diff is what you'd attach to the experiment brief.
A clean single-flag experiment
Example: The experiment tests one flag. The diff shows exactly one changed entry, proof that the variants differ only in the tested dimension.
JSON A (control):
    {
      "flags": {
        "newCheckout": false,
        "darkMode": false
      }
    }
JSON B (treatment):
    {
      "flags": {
        "newCheckout": true,
        "darkMode": false
      }
    }
Diff:
~1 changed
changed flags.newCheckout
- false
+ true
→ Only the tested flag differs. Clean experiment.
Confound caught: a second flag leaked into treatment
Example: Treatment accidentally enabled an unrelated flag. The added entry is the confound that would have muddied the result.
JSON A (control):
    {
      "flags": {
        "newCheckout": false
      }
    }
JSON B (treatment):
    {
      "flags": {
        "newCheckout": true,
        "betaSearch": true
      }
    }
Diff:
~1 changed +1 added
changed flags.newCheckout - false + true
added flags.betaSearch + true
→ betaSearch is a confound. Remove it from treatment.
Price test with a leaked region difference
Example: The experiment tests price, but treatment also drifted region, which changes tax and shipping and would confound the conversion result.
JSON A (control):
    {
      "price": 10,
      "region": "eu-west-1"
    }
JSON B (treatment):
    {
      "price": 12,
      "region": "us-east-1"
    }
Diff:
~2 changed
changed price - 10 + 12
changed region - "eu-west-1" + "us-east-1"
→ region is a confound; align both variants to one region.
A mistyped flag that would mis-bucket users
Example: Treatment stores the flag as the string "true" instead of boolean true. Some flag SDKs coerce the string inconsistently, so the variant may not behave as configured. The diff exposes the type flip.
JSON A (control):
    { "experimentOn": true }
JSON B (treatment):
    { "experimentOn": "true" }
Diff:
~1 changed
changed experimentOn
- true
+ "true"
→ Type flipped to string. Fix before launch.
Confirming variants are otherwise identical
Example: After fixing a confound, re-diff to confirm the only remaining difference is the intended one or, for two payloads that should match, that they're identical.
JSON A (control):
    { "layout": "grid",
      "items": 12 }
JSON B (treatment):
    { "layout": "grid",
      "items": 12 }
Diff:
No differences
→ The two JSON values are identical.
→ No unintended delta between these two payloads.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
A mistyped flag is a `changed` entry, not a type error
By design: true (boolean) vs "true" (string) on a flag is caught as a changed entry whose from/to reveal the type flip. There's no special type label; the value pair is the evidence. For experiments this matters because some flag SDKs coerce the string inconsistently, so a mistyped flag can silently mis-bucket users.
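Since the diff carries no type label, a post-processing pass can add one. The sketch below scans changed entries for a from/to pair whose JSON types differ. The entry shape (dicts with `kind`, `path`, `from`, `to`) and both function names are assumptions for illustration, not part of the tool.

```python
def json_type(value):
    """Name the JSON type of a parsed Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the int check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def type_flips(entries):
    """Return the changed entries whose old and new values differ in type."""
    return [
        entry
        for entry in entries
        if entry["kind"] == "changed"
        and json_type(entry["from"]) != json_type(entry["to"])
    ]


if __name__ == "__main__":
    entries = [{"kind": "changed", "path": "experimentOn", "from": True, "to": "true"}]
    for entry in type_flips(entries):
        print(entry["path"], json_type(entry["from"]), "->", json_type(entry["to"]))
```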
Reordered module/rule list shows index-wise changes
Positional by design: An ordered list of UI modules or targeting rules is compared by index, so the same items in a different order produce changed [0], changed [1], etc. If order is part of the variant (it often is for UI), that's correct; if order is incidental, sort both lists the same way before pasting so only real differences surface.
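When order really is incidental, one way to "sort both lists the same way" is to normalize each payload before pasting. The sketch below sorts every array by its serialized form; `sort_lists` is a hypothetical helper, and because it sorts all arrays it should only be applied to payloads (or sub-trees) where no array order is meaningful.

```python
import json


def sort_lists(value):
    """Recursively sort every list in a parsed JSON value.

    Items are ordered by their canonical JSON text so that mixed types
    and nested objects still sort deterministically.
    """
    if isinstance(value, dict):
        return {key: sort_lists(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [sort_lists(item) for item in value]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


if __name__ == "__main__":
    control = {"modules": ["hero", "grid"]}
    treatment = {"modules": ["grid", "hero"]}
    print(json.dumps(sort_lists(control), indent=2))
    print(json.dumps(sort_lists(treatment), indent=2))
```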
An added/removed key is a confound, not just noise
Investigate: Any added key (treatment-only) or removed key (control-only) beyond the field under test means one bucket has surface the other doesn't, a classic confound. Read every added/removed entry as a question: 'is this part of the experiment, or did it leak in?' Don't launch until each is explained.
Variant payload wrapped in an envelope won't parse
Parse error: Both inputs go through strict JSON.parse. A capture that includes an SDK envelope, log prefix, or trailing commas throws a parser error. Paste only the JSON payload; repair near-JSON first with /tool/json-format-fixer.
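A capture can be checked before pasting with any strict parser. The sketch below uses Python's `json.loads` as a stand-in for the browser's `JSON.parse`; the two agree on trailing commas and log prefixes, though Python additionally accepts `NaN` and `Infinity`, which `JSON.parse` rejects. The helper name `parse_payload` is hypothetical.

```python
import json


def parse_payload(text, label):
    """Parse one variant capture, raising a labelled error if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{label}: not valid JSON at line {err.lineno}, "
            f"column {err.colno}: {err.msg}"
        ) from err


if __name__ == "__main__":
    captures = {
        "clean": '{"newCheckout": true}',
        "trailing comma": '{"newCheckout": true,}',
        "log prefix": 'INFO payload={"newCheckout": true}',
    }
    for name, text in captures.items():
        try:
            parse_payload(text, name)
            print(name, "-> ok")
        except ValueError as err:
            print(err)
```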
Only one variant pasted
Invalid: Both JSON A and JSON B are required; an empty or whitespace-only side returns "Please provide both JSON A and JSON B." Capture both the control and treatment payloads before comparing.
Payload over 2 MB per side on the free plan
Upgrade required: Each pasted payload is capped at 2 MB on free; over that returns "Free plan supports JSON inputs up to 2 MB. Upgrade to Pro for unlimited input size." Extract just the experiment-relevant sub-tree with /tool/json-path-extractor and diff that, or upgrade to Pro.
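Trimming a capture to the relevant sub-tree can also be done locally before pasting. The sketch below measures a payload and pulls out one sub-tree by key path. It assumes the cap is counted in UTF-8 bytes with 2 MB meaning 2 × 1024 × 1024, which the source does not specify; `FREE_LIMIT_BYTES`, `fits_free_plan`, and `extract_subtree` are hypothetical names.

```python
import json

# Assumption: the 2 MB cap is measured in UTF-8 bytes, binary megabytes.
FREE_LIMIT_BYTES = 2 * 1024 * 1024


def fits_free_plan(text):
    """Return True if the pasted text is within the assumed free-plan cap."""
    return len(text.encode("utf-8")) <= FREE_LIMIT_BYTES


def extract_subtree(payload, keys):
    """Walk a parsed JSON value along keys (str for objects, int for arrays)."""
    node = payload
    for key in keys:
        node = node[key]
    return node


if __name__ == "__main__":
    page_config = {"page": {"experiment": {"flags": {"newCheckout": True}}, "assets": []}}
    subtree = extract_subtree(page_config, ["page", "experiment"])
    text = json.dumps(subtree)
    print(text, fits_free_plan(text))
```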
Key order differs between variant generators
Supported: Objects are compared by key, not position, so if control and treatment are emitted by different code paths that order keys differently, the diff stays clean. Only genuine field differences appear, with no phantom confounds from serialization order.
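The same property is easy to confirm locally. In this small sketch, two captures with different key order compare equal once parsed, and a canonical re-serialization (the hypothetical helper `canonical`) makes them byte-identical for text-based comparison.

```python
import json


def canonical(text):
    """Parse JSON text and re-serialize it with keys sorted at every level."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    control = '{"layout": "grid", "items": 12}'
    treatment = '{"items": 12, "layout": "grid"}'
    # Parsed objects compare by key, so key order is irrelevant.
    print(json.loads(control) == json.loads(treatment))
    print(canonical(control))
    print(canonical(treatment))
```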
`null` flag value vs absent flag
Removed / added: A flag set to null in one variant and absent in the other reports as removed/added of a null value, because null is a present value. If your flag system treats null as 'unset / default', read these entries as semantically equal rather than as a confound.
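If the flag system does treat null as unset, both payloads can be normalized before diffing so these entries never appear. The sketch below drops null-valued object keys; `drop_nulls` is a hypothetical helper. Nulls inside arrays are kept, since removing them would shift the indices used for comparison.

```python
def drop_nulls(value):
    """Recursively remove object keys whose value is null (None).

    Array elements are left in place, even when null, so positions
    stay aligned between the two variants.
    """
    if isinstance(value, dict):
        return {
            key: drop_nulls(item)
            for key, item in value.items()
            if item is not None
        }
    if isinstance(value, list):
        return [drop_nulls(item) for item in value]
    return value


if __name__ == "__main__":
    control = {"flags": {"newCheckout": False, "betaSearch": None}}
    treatment = {"flags": {"newCheckout": False}}
    print(drop_nulls(control) == drop_nulls(treatment))
```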
Frequently asked questions
What is a confound in an A/B test configuration?
A confound is any unintended difference between variants that could influence behaviour independently of the treatment. If treatment differs from control in both button colour and, say, response region, you can't attribute a conversion lift to colour alone. This tool lists every difference between the two payloads, so confounds show up as extra added/removed/changed entries beyond the field you meant to test — before the experiment runs.
How do I compare the API responses that each variant produces?
Make the call once with the experiment flag in the control state and once in the treatment state (or with the relevant variant header), copy each response body, and paste control on the left and treatment on the right. The diff shows which data fields differ between variants, confirming the data-layer change is limited to what the experiment intends.
Why does a reordered list of modules show as changes?
Ordered arrays are compared by index, so the same modules in a different order produce changed [0], changed [1], etc. If the order is itself part of the variant (a reordered layout test), that's the signal you want. If order is incidental, sort both arrays the same way before pasting so only true differences appear.
How do I catch a flag that's set to the wrong type?
The diff fires on serialized inequality, so a flag stored as "true" (string) in one variant and true (boolean) in the other shows as a changed entry exposing both values. This matters because some flag SDKs coerce the string inconsistently, which can mis-bucket users or break the variant in subtle ways.
Does object key order between variants cause false confounds?
No. Objects are compared by key, not position, so two payloads emitted by different code paths that order keys differently still diff clean. You won't see a phantom confound just because the serializer differs between buckets.
Can I diff more than two variants at once?
Not in one pass — the tool compares exactly two payloads. For a multi-arm experiment, diff each treatment against the control separately (B vs A, C vs A, D vs A). That isolates each arm's delta and keeps the confound check clean per arm.
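The per-arm routine can be scripted when there are many arms. The sketch below flattens each payload to path/value pairs and lists the paths that differ from control, one arm at a time. It is a simplified stand-in for the tool's diff (it reports differing paths only, without changed/added/removed labels), and `flatten`, `differing_paths`, and `diff_arms` are hypothetical names.

```python
import json


def flatten(value, path=""):
    """Flatten a parsed JSON value into {path: leaf} using dot/bracket paths."""
    out = {}
    if isinstance(value, dict) and value:
        for key, item in value.items():
            out.update(flatten(item, f"{path}.{key}" if path else key))
    elif isinstance(value, list) and value:
        for index, item in enumerate(value):
            out.update(flatten(item, f"{path}[{index}]"))
    else:
        out[path] = value  # scalar, or an empty object/array
    return out


def differing_paths(control, treatment):
    """Return sorted paths whose serialized values differ or exist on one side."""
    a, b = flatten(control), flatten(treatment)
    return sorted(
        path
        for path in set(a) | set(b)
        if path not in a
        or path not in b
        or json.dumps(a[path]) != json.dumps(b[path])
    )


def diff_arms(control, arms):
    """Compare every treatment arm against the same control payload."""
    return {name: differing_paths(control, payload) for name, payload in arms.items()}


if __name__ == "__main__":
    control = {"price": 10, "region": "eu"}
    arms = {
        "B": {"price": 12, "region": "eu"},
        "C": {"price": 14, "region": "us"},
    }
    for name, paths in diff_arms(control, arms).items():
        print(name, paths)
```

Comparing each arm to control rather than to each other keeps every arm's confound check independent.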
How do I document the variant delta for reviewers?
Run the diff once the variants are final, hit Copy JSON, and paste the entry list into the experiment brief's 'configuration change' section. Reviewers can then confirm the delta is exactly the intended field(s) and nothing else before approving the launch.
Can I upload the variant config files?
No — the tool is paste-only, with a control panel and a treatment panel. Copy each payload and paste it. There's no file picker, and since everything runs locally, your experiment configs and targeting rules never leave the browser.
What size payload can I compare for free?
Up to 2 MB per side on the free plan. Full rendered page-configs can exceed that, so extract just the experiment-relevant sub-tree with /tool/json-path-extractor and diff that. Pro removes the cap entirely.
Should an added field in treatment always block the experiment?
Only if it's not part of the experiment. If the treatment is meant to add a module or field, the added entry is expected. If you didn't intend it, it's a confound — treatment users get surface control users don't, which can bias the metric. Explain every added/removed entry before launch.
Is the experiment configuration data transmitted to JAD Apps?
No. Both payloads are parsed and diffed in your browser. A/B test configs, feature-flag states, and targeting rules are never sent to a server. Clicking Compare triggers no network request — verifiable in your DevTools network tab.
How do I make order-insensitive arrays diff cleanly between variants?
The diff tool doesn't sort for you. Pre-process each variant so set-like arrays are in the same stable order — flatten and sort with /tool/json-flattener, or reshape with /tool/json-transposer into a keyed structure — then diff. Once both sides share an order, index-based comparison won't flag a pure reorder as a difference.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.