How to anonymize json records for staging environment data
- Step 1: Export a representative production set. Pull a representative slice as a single JSON file. For meaningful staging coverage you want volume, so plan for Pro (100 MB) or Developer (5 GB); the free tier is limited to 2 MB per file.
- Step 2: Load the export. Drop the JSON file onto the dropzone. The tool processes one file per run; combine tables into a single JSON file if you want them anonymized together.
- Step 3: Define your PII terms. Keep the defaults and add staging-specific keys. Be specific: substring matching means name also catches username, filename, and displayName.
- Step 4: Choose hash for join-heavy staging. Pick hash so identical source values map to identical tokens across tables, preserving foreign-key joins. Use remove for fields staging should never carry.
- Step 5: Keep Deep enabled. Leave Deep on so denormalized, nested records are fully scrubbed.
- Step 6: Download and import to staging. Download the .anon.json file and load it with your usual importer (psql COPY, mongoimport, ORM bulk insert). Re-run on a cadence to keep staging fresh.
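Because matching is by substring, it can help to preview which keys a set of PII terms would catch before running the tool. The sketch below is a minimal Python approximation, not the tool's code: it assumes case-insensitive substring matching on key names (the displayName example above implies case is ignored) and it only reports matches, changing nothing.

```python
import json

def collect_keys(node, found=None):
    """Recursively gather every object key in a parsed JSON document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_keys(item, found)
    return found

def preview_matches(doc, terms):
    """Map each PII term to the keys it would catch (assumed case-insensitive substring match)."""
    keys = collect_keys(doc)
    return {term: sorted(k for k in keys if term.lower() in k.lower()) for term in terms}

sample = json.loads(
    '{"username": "mae", "displayName": "Mae", "plan": "team",'
    ' "files": [{"filename": "a.csv"}]}'
)
print(preview_matches(sample, ["name", "email"]))
# {'name': ['displayName', 'filename', 'username'], 'email': []}
```

An empty list means a term matches nothing in this export; a long list is a sign the term is too broad for your schema.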
Strategy choice for staging
Optimize for realistic, join-safe data.
| Staging need | Strategy | Join-safe? | Notes |
|---|---|---|---|
| Cross-table joins in E2E tests | hash | Yes | Deterministic 8-char token |
| Readable rows in admin UIs | fake | No | Sequential placeholders |
| Keep format/length for parsers | mask | Value-stable | Reveals shape; not for external sharing |
| Drop sensitive columns | remove | n/a | May break NOT NULL constraints |
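The join-safety column comes down to determinism. A minimal Python sketch, assuming a truncated SHA-256 as a stand-in for the tool's own 8-character hash (its function is not documented here, so these tokens will not match the tool's output) and a hypothetical fake_N placeholder format for fake:

```python
import hashlib
from itertools import count

def hash_token(value: str) -> str:
    # Stand-in for the tool's hash: the same input always gives the same 8 hex chars.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]

class FakeCounter:
    """Per-run counter: every value gets the next placeholder, even a repeated one."""

    def __init__(self) -> None:
        self._next = count(1)

    def __call__(self, value: str) -> str:
        # value is ignored: only the counter position decides the placeholder
        return f"fake_{next(self._next)}"  # hypothetical placeholder format

users = [{"email": "mae@x.com"}]
orders = [{"customerEmail": "mae@x.com"}]

# hash: both tables get the same token, so users.email = orders.customerEmail still joins
print(hash_token(users[0]["email"]) == hash_token(orders[0]["customerEmail"]))  # True

# fake: the counter advances on each call, so the two sides no longer match
fake = FakeCounter()
print(fake(users[0]["email"]), fake(orders[0]["customerEmail"]))  # fake_1 fake_2
```

Creating a fresh FakeCounter per run also mirrors why fake values change between refreshes, while hash_token returns the same token every time.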
Planning staging volume against tier limits
The Batch column below is the plan-level allowance; this tool still processes one file per run on every tier.
| Tier | Max file | Batch | Realistic for |
|---|---|---|---|
| Free | 2 MB | 1 | smoke samples only |
| Pro | 100 MB | 10 | mid-size staging loads |
| Developer | 5 GB | unlimited | full production-scale exports |
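To pick a tier before exporting, compare the file size against each cap. A small Python sketch; it assumes MB and GB mean 1024-based units and that a file exactly at the cap is accepted, neither of which the table states.

```python
import os

# (tier, max file size in bytes); 1024-based units are an assumption
TIERS = [
    ("Free", 2 * 1024**2),
    ("Pro", 100 * 1024**2),
    ("Developer", 5 * 1024**3),
]

def smallest_tier(size_bytes: int):
    """Return the lowest tier whose cap fits the file, or None if none does."""
    for name, limit in TIERS:
        if size_bytes <= limit:
            return name
    return None  # larger than every cap: split the export into slices

def tier_for_file(path: str):
    """Convenience wrapper that reads the size of an export on disk."""
    return smallest_tier(os.path.getsize(path))

print(smallest_tier(30 * 1024**2))  # Pro
```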
Deep behavior on denormalized records
Strategy: hash. Matched keys only.
| Structure | Deep on | Deep off |
|---|---|---|
| top-level email | anonymized | anonymized |
| orders[].customer.email | anonymized | untouched |
| non-PII (sku, amount) | unchanged | unchanged |
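A minimal Python sketch of what the Deep switch changes, under stated assumptions: matching is a case-insensitive substring test on key names, "top level" means the keys of the root object, and a truncated SHA-256 stands in for the tool's hash. It approximates the behavior in the table; it is not the tool's implementation.

```python
import hashlib

def hash_token(value) -> str:
    # Stand-in hash; str() means numbers come back as strings, as the tool's output does.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:8]

def is_pii(key: str, terms) -> bool:
    return any(t.lower() in key.lower() for t in terms)

def anonymize(node, terms, deep=True):
    """Hash scalar values under matched keys; recurse into children only when deep is on."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if is_pii(key, terms) and not isinstance(value, (dict, list)):
                out[key] = hash_token(value)
            elif deep:
                out[key] = anonymize(value, terms, deep)
            else:
                out[key] = value  # Deep off: nested objects and arrays pass through untouched
        return out
    if isinstance(node, list) and deep:
        return [anonymize(item, terms, deep) for item in node]
    return node

record = {"email": "a@b.co", "orders": [{"sku": "X1", "customer": {"email": "q@x.com"}}]}
print(anonymize(record, ["email"], deep=False)["orders"][0]["customer"]["email"])  # q@x.com
print(anonymize(record, ["email"], deep=True)["orders"][0]["customer"]["email"] == "q@x.com")  # False
```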
Cookbook
Loading production-shaped data into staging without the personal data.
Hash across tables to keep joins
Example: Combine users and orders into one JSON file; hash makes the same email resolve everywhere.
Strategy: hash · Deep: on
Input:
{ "users": [ { "email": "mae@x.com" } ],
"orders": [ { "customerEmail": "mae@x.com" } ] }
Output:
{ "users": [ { "email": "a17f0c92" } ],
"orders": [ { "customerEmail": "a17f0c92" } ] }
Remove direct identifiers staging shouldn't hold
Example: Add ssn and password to the terms; remove deletes them before they reach the staging DB.
PII terms: email, ssn, password
Strategy: remove
Input:
{ "email": "x@y.com", "ssn": "123-45-6789", "plan": "team" }
Output:
{ "plan": "team" }
Deep reaches nested customer in orders
Example: Denormalized order rows embed the customer object; Deep scrubs it.
Strategy: hash · Deep: on
Input:
{ "orders": [ { "id": 5, "customer": { "email": "q@x.com" } } ] }
Output:
{ "orders": [ { "id": 5, "customer": { "email": "5b8e21fa" } } ] }
Mask keeps phone/email shape for parser tests
Example: When staging code validates formats, mask preserves them.
Strategy: mask
Input:
{ "email": "longname@corp.com", "phone": "+44 20 7946 0958" }
Output:
{ "email": "lo***@corp.com", "phone": "+44 20 7946 *958" }
Compact for a large staging fixture
Example: The anonymizer emits 2-space indented JSON; minify it before storing a large staging asset.
Pretty output → run json-minifier →
{"users":[{"email":"a17f0c92"}],"orders":[…]}
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Find the row that matches what you see, then apply the fix it describes.
Production-scale export over 2 MB on free tier
Blocked: Free tier caps files at 2 MB, too small for real staging volume. Upgrade to Pro (100 MB) or Developer (5 GB), or anonymize in smaller slices.
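Anonymizing in slices keeps joins intact with hash, since hash is described as stable across runs for the same input (assuming that stability holds per value rather than per file). A minimal Python sketch that packs a JSON array of records into slices under a byte budget; it assumes each slice is written as a plain JSON array with Python's default separators, and a single record larger than the budget still gets its own oversized slice.

```python
import json

def slice_records(records, max_bytes):
    """Greedily pack records into lists whose json.dumps() size stays within max_bytes."""
    slices, current, size = [], [], 2  # 2 bytes for the surrounding [ ]
    for record in records:
        item = len(json.dumps(record).encode("utf-8"))
        extra = item + 2 if current else item  # ", " separator before every item but the first
        if current and size + extra > max_bytes:
            slices.append(current)
            current, size, extra = [], 2, item
        current.append(record)
        size += extra
    if current:
        slices.append(current)
    return slices

rows = [{"id": i} for i in range(10)]
print([len(s) for s in slice_records(rows, 30)])  # [2, 2, 2, 2, 2]
```

In practice you would pass a budget such as 2 * 1024**2, write each slice to its own file with json.dump, and anonymize the files one run at a time. If the export is an object of several tables, slice each table's array separately.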
Multiple table dumps need anonymizing together
Single file only: There is no batch mode. To keep cross-table hashes consistent in one pass, combine the tables into a single JSON file before anonymizing.
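Combining dumps takes only a small script. A Python sketch, assuming each dump is a standalone JSON file and that nesting each one under its table name (as in the cookbook's users/orders example) suits your importer; the file names in the usage comment are hypothetical.

```python
import json
from pathlib import Path

def load_dumps(paths):
    """Read each dump and key it by file stem, e.g. users.json -> "users"."""
    return {Path(p).stem: Path(p).read_text(encoding="utf-8") for p in paths}

def combine_tables(dumps):
    """Parse each table's JSON text and nest it under its name in one document."""
    return json.dumps({name: json.loads(text) for name, text in dumps.items()}, indent=2)

# Usage with hypothetical file names:
# combined = combine_tables(load_dumps(["users.json", "orders.json"]))
# Path("staging_export.json").write_text(combined, encoding="utf-8")
```

Two files with the same stem in different folders would overwrite each other in the mapping, so rename them first.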
Export contains trailing commas / comments
Parse error: JSON.parse rejects non-standard JSON such as trailing commas and comments. Repair it with json-format-fixer before anonymizing.
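A quick preflight can catch this before you load the file. The Python sketch below is strict validation with the standard json module; because Python accepts NaN and Infinity by default while JSON.parse does not, it rejects those explicitly. It only reports the first problem; repairing the file is still json-format-fixer's job.

```python
import json

def _reject_constant(name: str):
    # JSON.parse rejects NaN/Infinity, but Python's json accepts them unless told otherwise.
    raise ValueError(f"non-standard constant {name}")

def check_json(text: str):
    """Return None for strict JSON, else a short description of the first error."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    except ValueError as err:
        return str(err)
    return None

print(check_json('{"plan": "team",}'))  # reports line 1 and a column near the trailing comma
```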
remove violates a NOT NULL constraint
Import fails: Removing a column the staging schema requires causes the import to fail. Use hash or fake to keep the column present and populated.
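Before choosing remove, you can cross-check your PII terms against the staging schema's required columns. A Python sketch; it assumes JSON keys match column names and reuses the substring rule, and you supply the required-column list yourself (in PostgreSQL, for example, from information_schema.columns where is_nullable = 'NO').

```python
def remove_conflicts(required_columns, terms):
    """Return required columns that a PII term would match, i.e. that remove would drop."""
    return sorted(
        col for col in required_columns
        if any(term.lower() in col.lower() for term in terms)
    )

# Hypothetical schema: columns declared NOT NULL in the staging users table
required = ["id", "email", "plan", "created_at"]
print(remove_conflicts(required, ["email", "ssn", "password"]))  # ['email']
```

Any column this returns should use hash or fake instead of remove.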
Deep off on denormalized data
Survives: With Deep off, nested customer/contact objects keep their PII. Keep Deep on for denormalized staging exports.
fake breaks joins at staging scale
Caution: fake increments a counter, so identical source values diverge and joins fail across tables. Use hash for staging where E2E tests join records.
Free-text PII columns
Not detected: Detection is key-name only. A comments column with embedded emails is not scanned. Add the column to the PII terms and use remove or fake, or filter it out with json-key-filter.
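To find free-text fields that still carry emails, scan every string value rather than key names. A Python sketch using a deliberately simple email regex (a heuristic, not a full RFC 5322 matcher); the $.key[i] paths it reports are just a readable convention, not a format the tool uses.

```python
import re

# Heuristic pattern, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_embedded_emails(node, path="$"):
    """Yield (path, email) for each email-like substring in any string value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_embedded_emails(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_embedded_emails(item, f"{path}[{i}]")
    elif isinstance(node, str):
        for match in EMAIL_RE.findall(node):
            yield path, match

doc = {"email": "a17f0c92", "comments": "ping mae@x.com tomorrow"}
print(list(find_embedded_emails(doc)))  # [('$.comments', 'mae@x.com')]
```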
Numeric values stringified
Type change: Hashed and masked values become strings. If a staging column is typed integer, exclude its key from the PII terms or cast on import.
Refresh job re-runs produce new fake values
Expected: Each anonymization run restarts the fake counter, so a weekly refresh yields different fake values than last week. hash is stable across runs for the same input.
Frequently asked questions
Do staging environments need the same protection as production?
Yes — personal data carries the same obligations wherever it is stored. Loading anonymized data into staging removes that obligation from the staging environment, simplifying your data protection posture.
How do I keep joins working across staging tables?
Combine the tables into one JSON file and use the hash strategy. Identical source values become identical tokens everywhere, so foreign-key joins resolve.
Can I process my whole staging dataset at once?
Only as a single JSON file within your tier's size limit (2 MB Free, 100 MB Pro, 5 GB Developer). There is no multi-file batch.
Is the export uploaded anywhere?
No. Anonymization runs in your browser; the production export is never sent to JAD Apps.
How do I verify no PII remains before importing?
Search the output for original values you know (domains, area codes, names), and manually check free-text fields, since they are never scanned. A tree view via json-tree-viewer helps on large files.
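The search step can be scripted. A Python sketch that parses the anonymized output and reports which known original values still appear in any key or value; the list of known values is yours to supply, and a clean result does not prove the absence of PII you did not think to search for.

```python
import json

def iter_text(node):
    """Yield every key and scalar value in a parsed JSON document as a string."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text(item)
    elif node is not None and not isinstance(node, bool):
        yield str(node)  # strings as-is; numbers too, so stored area codes are searched

def find_leaks(output_text: str, known_values):
    """Return the known originals that still occur (case-insensitively) in the output."""
    texts = [t.lower() for t in iter_text(json.loads(output_text))]
    return [v for v in known_values if any(v.lower() in t for t in texts)]

anonymized = '{"users": [{"email": "a17f0c92", "phone": "+44 20 7946 *958"}]}'
print(find_leaks(anonymized, ["mae@x.com", "7946", "Mae"]))  # ['7946']
```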
Why did fake produce different values on the staging refresh?
fake uses a counter that resets each run, so values differ between runs. Use hash if you need stable values across refreshes.
What if removing a field breaks staging imports?
remove deletes the key. If a column is required, switch to hash or fake so the field stays present with a scrubbed value.
Why was a non-PII key changed?
Substring matching: name matches username/filename, ip matches recipient/zip. Narrow your terms to your real PII keys.
Can I set indentation for the staging fixture?
No indent control exists; output is 2-space JSON. Use json-minifier to compact a large fixture.
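If you would rather compact the fixture in a script than in json-minifier, Python's json module can do it. A minimal sketch; it re-serializes parsed values, so some number spellings can change (for example, 1E5 is written back as 100000.0).

```python
import json

def minify(text: str) -> str:
    """Re-serialize JSON with no insignificant whitespace, keeping non-ASCII as-is."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)

pretty = '{\n  "users": [\n    {\n      "email": "a17f0c92"\n    }\n  ]\n}'
print(minify(pretty))  # {"users":[{"email":"a17f0c92"}]}
```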
Is the hash secure enough to publish?
It is a fast 8-character hex token, not a cryptographic digest. It de-identifies for staging use; do not treat it as collision-resistant security.
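One practical consequence for staging: an 8-character hex token has only 16^8 = 2^32 possible values, so with enough distinct inputs two different emails can receive the same token and join by accident. A Python sketch of the standard birthday-bound estimate, assuming the tokens behave like uniformly random 32-bit values (the tool's hash function is not documented here):

```python
import math

def collision_probability(n_values: int, bits: int = 32) -> float:
    """Approximate P(at least one shared token) among n distinct inputs (birthday bound)."""
    space = 2 ** bits
    # 1 - exp(-n(n-1) / 2N), computed with expm1 so tiny probabilities stay accurate
    return -math.expm1(-n_values * (n_values - 1) / (2 * space))

for n in (1_000, 77_163, 1_000_000):
    print(n, round(collision_probability(n), 4))
```

Under that assumption the estimate crosses 50% at roughly 77,000 distinct values, so a large users table hashed this way may well contain at least one spurious join.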
Should I anonymize production data or generate synthetic staging data?
Anonymize real data when production-scale realism matters. Use json-mock-generator for fully synthetic, fixed-shape records when you don't need real structure.
Does removing keys lower the 'fields anonymized' count?
Yes. Removed keys are deleted and not counted; the stat reflects retained anonymized values only. A low count after a remove run is normal.
Privacy first
Anonymization runs locally in your browser. No file is uploaded; only metadata counters are saved for signed-in dashboard stats.