How to anonymize JSON to create GDPR-compliant test data
- Step 1. Export a small production sample: Export 100-500 records as JSON. A small representative sample is enough to seed test data; you do not need the full production set, and free-tier processing is capped at a 2 MB file.
- Step 2. Drop the JSON in: Drag the file onto the dropzone (one file per run; there is no multi-file batch). The tool reads the file text in the browser.
- Step 3. Set the PII key names: Edit the comma-separated PII key list. Defaults are email, name, phone, address, ssn, dob, birthdate, ip, password. Add model-specific keys like nationalId, taxNumber, passportNumber. Matching is substring-based and case-insensitive.
- Step 4. Choose a strategy: Pick mask, hash, fake, or remove. For irreversible but join-preserving test data, choose hash. For data that should look like names/emails, choose fake. To eliminate a field entirely, choose remove.
- Step 5. Keep Deep on for nested data: Leave the Deep checkbox enabled so matched keys are anonymized inside nested objects and arrays. Turn it off only if you want top-level keys treated and nested objects left verbatim.
- Step 6. Run, verify, and download: Click Anonymize. The result panel shows the field count and the JSON; Copy the result or Download it as a .anon.json file. Scan the output for any key your term list missed before distributing.
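The final check in Step 6 can be partly automated. The following is a minimal Python sketch, not part of the tool: it walks an anonymized file and lists every key name that matched none of your terms, so a person can review them. The function name `unmatched_keys` and the sample record are illustrative assumptions.

```python
import json

DEFAULT_TERMS = ["email", "name", "phone", "address", "ssn", "dob",
                 "birthdate", "ip", "password"]

def unmatched_keys(node, terms):
    """Collect key names (at any depth) that contain none of the PII terms."""
    lowered = [t.strip().lower() for t in terms if t.strip()]
    found = set()

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                # Same rule the page describes: lowercased key, substring match.
                if not any(term in key.lower() for term in lowered):
                    found.add(key)
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(node)
    return found

# Hypothetical anonymized output where a national ID slipped through.
anonymized = json.loads(
    '{"id": 91, "email": "1a2b3c4d",'
    ' "profile": {"nationalId": "DE-99887766", "city": "Berlin"}}'
)
print(sorted(unmatched_keys(anonymized, DEFAULT_TERMS)))
```

Any key in the printed list that holds personal data (here `nationalId`) is a candidate for the term list on the next run.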
Strategy behavior (what each does to a matched value)
Each strategy is applied to every key whose lowercased name contains a PII term. Values are coerced to strings before masking/hashing.
| Strategy | Example input | Example output | Deterministic? | Reversible? |
|---|---|---|---|---|
| mask | "john.doe@acme.com" (email key) | "jo***@acme.com" | Yes (value-based) | No, but leaks length/format |
| mask | "+1 415 555 0199" (phone key) | all but last 4 digits → "*" | Yes (value-based) | No |
| mask | "Jonathan" (name key) | "Jo****an" (first 2 + last 2 kept) | Yes (value-based) | No |
| hash | "john.doe@acme.com" | 8-char hex e.g. "1a2b3c4d" | Yes — same input, same token | No |
| fake | any email key | "user1@example.com", "user2@example.com" … | No — global counter increments | No |
| remove | any matched key | key deleted from object | n/a | n/a |
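The three mask shapes in the table can be approximated as below. This is a Python sketch under stated assumptions, not the tool's source: it assumes phone masking replaces digits only and leaves punctuation in place, that an email without an "@" falls back to the generic rule, and that values of 4 characters or fewer are fully starred. None of those three details is stated on this page.

```python
def mask_generic(value):
    """Keep the first 2 and last 2 characters, star the rest."""
    if len(value) <= 4:
        # Assumption: short values are fully masked.
        return "*" * len(value)
    return value[:2] + "*" * (len(value) - 4) + value[-2:]

def mask_email(value):
    """Keep the first 2 characters of the local part and the whole domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Assumption: no "@" means the generic rule applies.
        return mask_generic(value)
    return local[:2] + "***@" + domain

def mask_phone(value):
    """Star every digit except the last 4; non-digits are kept (assumption)."""
    to_mask = max(sum(ch.isdigit() for ch in value) - 4, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)

def mask(key, value):
    """Pick a mask shape from the key name, coercing the value to a string."""
    lowered = key.lower()
    text = str(value)
    if "email" in lowered:
        return mask_email(text)
    if "phone" in lowered:
        return mask_phone(text)
    return mask_generic(text)

print(mask("email", "john.doe@acme.com"))  # jo***@acme.com
print(mask("name", "Jonathan"))            # Jo****an
print(mask("phone", "+1 415 555 0199"))    # +* *** *** 0199
```

Because each function depends only on the input value, identical inputs always mask identically, which is what the "Deterministic?" column records.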
Default PII key terms and what they catch
Substring match on lowercased key names. One term catches every key that contains it.
| Term | Catches keys like | Mask shape | Fake shape |
|---|---|---|---|
| email | email, userEmail, billing_email, emailAddress | first 2 + *** before @ | userN@example.com |
| name | name, fullName, lastName, username, filename | first 2 + last 2 kept | User N |
| phone | phone, mobilePhone, phoneNumber | all but last 4 digits masked | +1-555-NNNN |
| ip | ip, ipAddress, clientIp (also: recipient, zip) | first 2 + last 2 kept | 192.168.x.y |
| ssn / dob / address / password | ssn, dob, homeAddress, password | first 2 + last 2 kept | [REDACTED-N] |
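To preview which of your keys a term list will catch, the matching rule described above (lowercase the key, then substring match) can be reproduced in a few lines. This is a Python sketch with hypothetical helper names (`parse_terms`, `matching_term`); the tool itself runs in the browser.

```python
def parse_terms(raw):
    """Split the comma-separated term list, trimming blanks and lowercasing."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]

def matching_term(key, terms):
    """Return the first term contained in the lowercased key, or None."""
    lowered = key.lower()
    for term in terms:
        if term in lowered:
            return term
    return None

terms = parse_terms("email, name, phone, address, ssn, dob, birthdate, ip, password")

for key in ["billing_email", "fullName", "filename", "recipient", "zip", "city"]:
    print(f"{key!r:>16} -> {matching_term(key, terms)}")
```

Running it against your real key names before anonymizing shows the unintended hits (`filename`, `recipient`, `zip`) as well as the intended ones.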
Tier and size limits
Free tier is gated on file size only — there is no per-row gate for this tool.
| Tier | Max file size | Batch files | Row gate |
|---|---|---|---|
| Free | 2 MB | 1 (single file) | none |
| Pro | 100 MB | 10 | none |
| Developer | 5 GB | unlimited | none |
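If an export is too large for the free tier, a sample can be trimmed before it reaches the tool. The sketch below is a Python illustration under two assumptions that this page does not confirm: that "2 MB" means 2 * 1024 * 1024 bytes, and that the export is a JSON array of records. The function name `sample_under_limit` is hypothetical.

```python
import json

# Assumption: the 2 MB gate is measured as 2 * 1024 * 1024 bytes of file text.
FREE_TIER_LIMIT = 2 * 1024 * 1024

def sample_under_limit(records, limit_bytes=FREE_TIER_LIMIT, max_records=500):
    """Take up to max_records, halving the sample until its JSON fits the limit."""
    sample = records[:max_records]
    while sample:
        size = len(json.dumps(sample).encode("utf-8"))
        if size <= limit_bytes:
            return sample
        sample = sample[: len(sample) // 2]
    return []

# Hypothetical export of 1000 small records.
records = [{"id": i, "email": f"user{i}@acme.com"} for i in range(1000)]
sample = sample_under_limit(records)
print(len(sample), "records,", len(json.dumps(sample).encode("utf-8")), "bytes")
```

Taking the first records keeps the sketch short; a random or stratified sample is usually more representative for test data.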
Cookbook
Real export shapes turned into safe test data. Each shows the input, the strategy chosen, and the exact output the tool produces.
Mask a user record for a UI test fixture
Example: mask keeps shape and format so the UI still renders a plausible row. Note that email and phone keys use special mask rules; everything else, including name, keeps the first 2 and last 2 characters.
Input:
{ "id": 91, "name": "Jonathan", "email": "jonathan@acme.com", "city": "Berlin" }
Strategy: mask · Deep: on
Output:
{ "id": 91, "name": "Jo****an", "email": "jo***@acme.com", "city": "Berlin" }
("city" untouched: "city" is not in the default PII terms.)
Hash to keep cross-record joins working
Example: Two records reference the same email. With hash, both map to the same token, so a test that joins users to orders by email still joins.
Input:
[ { "role": "user", "email": "sam@acme.com" },
{ "role": "order", "email": "sam@acme.com" } ]
Strategy: hash
Output:
[ { "role": "user", "email": "b3f1c9aa" },
{ "role": "order", "email": "b3f1c9aa" } ]
Why fake does NOT preserve joins
Example: fake increments a global counter, so the same source value gets different placeholders. Use hash, not fake, when referential integrity matters.
Input:
[ { "email": "sam@acme.com" },
{ "email": "sam@acme.com" } ]
Strategy: fake
Output:
[ { "email": "user1@example.com" },
{ "email": "user2@example.com" } ] ← different tokens
Remove a national ID entirely
Example: remove drops the key. Add custom terms for fields the defaults miss. Removed keys are deleted, not blanked.
PII terms: email, name, nationalId
Strategy: remove
Input:
{ "name": "Lena", "nationalId": "DE-99887766", "plan": "pro" }
Output:
{ "plan": "pro" }
Deep mode reaches nested PII
Example: With Deep on, matched keys are anonymized at any depth, including inside arrays of objects.
Strategy: hash · Deep: on
Input:
{ "order": { "customer": { "email": "a@x.com" }, "items": [ { "sku": "X1" } ] } }
Output:
{ "order": { "customer": { "email": "7c2e10b4" }, "items": [ { "sku": "X1" } ] } }
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Free-tier file over 2 MB
Blocked: Free tier allows files up to 2 MB. A larger export is blocked with an upgrade prompt. For GDPR test data you rarely need the whole set; export a 100-500 record sample under 2 MB, or upgrade to Pro (100 MB) for full datasets.
Invalid JSON pasted or dropped
Parse error: The input is parsed with JSON.parse after trimming. Trailing commas, single quotes, or a JS object literal throw a parse error and nothing is anonymized. Fix the syntax with json-format-fixer or json-prettifier first.
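A file can be pre-checked for strict JSON validity before it is dropped in. The following Python sketch is an approximation, not the tool's parser: Python's `json.loads` rejects the same trailing commas, single quotes, and unquoted keys, but by default it accepts `NaN` and `Infinity`, which `JSON.parse` does not, so the sketch rejects those explicitly. The helper name `check_json` is hypothetical.

```python
import json

def _reject_constant(name):
    # json.loads calls this for NaN, Infinity and -Infinity, which strict JSON forbids.
    raise ValueError(f"{name} is not valid JSON")

def check_json(text):
    """Return None if the text parses as strict JSON, else the error message."""
    try:
        json.loads(text.strip(), parse_constant=_reject_constant)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return str(err)
    return None

samples = [
    '{"email": "sam@acme.com"}',   # valid
    '{"email": "sam@acme.com",}',  # trailing comma
    "{'email': 'sam@acme.com'}",   # single quotes
    '{email: "sam@acme.com"}',     # JS object literal
]
for text in samples:
    print("ok" if check_json(text) is None else "invalid", "|", text)
```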
PII lives in a free-text value, not a key
Not detected: Detection is by key name only. An email buried in a notes or comment value is never matched, because notes is not a PII term and the tool never scans values. Replace or blank such fields manually, or strip them with json-key-filter.
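Since the tool never inspects values, a separate pass can flag free-text fields that still contain something email-shaped. This Python sketch is an external check, not a feature of the tool; the regular expression is a deliberately loose assumption and will not catch every address format. Note that placeholders produced by the fake strategy (userN@example.com) will also be reported.

```python
import re

# Loose, illustrative pattern: something@something.tld
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_email_like_values(node, path="$"):
    """Return the paths of string values that contain an email-like substring."""
    hits = []
    if isinstance(node, dict):
        for key, child in node.items():
            hits.extend(find_email_like_values(child, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            hits.extend(find_email_like_values(child, f"{path}[{index}]"))
    elif isinstance(node, str) and EMAIL_RE.search(node):
        hits.append(path)
    return hits

# Hypothetical anonymized record: the email key was hashed, the note was not touched.
record = {
    "email": "b3f1c9aa",
    "notes": "Customer asked us to cc sam@acme.com on invoices",
    "items": [{"comment": "none"}],
}
print(find_email_like_values(record))  # ['$.notes']
```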
Substring match catches an unintended key
By design: matching uses includes(), so the term name also matches username, filename, and displayName, and ip matches recipient and zip. This over-matching is intentional breadth but can mangle non-PII keys; use the narrowest terms that cover your real fields.
remove drops keys and they are not counted
Expected: With strategy remove, the matched key is deleted and is NOT included in the 'fields anonymized' count (only retained, non-undefined results increment it). A low count after a remove run does not mean PII was missed; the keys are simply gone.
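The counting rule can be illustrated with a flat record. This is a Python sketch of the described behavior, not the tool's code: a sentinel object `REMOVE` stands in for the undefined result, and `anonymize_flat` is a hypothetical helper that handles only top-level keys.

```python
REMOVE = object()  # stand-in for the "undefined" result of the remove strategy

def anonymize_flat(record, terms, transform):
    """Apply transform to matched keys; count only results that are kept."""
    lowered_terms = [t.lower() for t in terms]
    out, count = {}, 0
    for key, value in record.items():
        if any(term in key.lower() for term in lowered_terms):
            result = transform(str(value))
            if result is REMOVE:
                continue  # key dropped and not counted
            out[key] = result
            count += 1
        else:
            out[key] = value
    return out, count

record = {"name": "Lena", "nationalId": "DE-99887766", "plan": "pro"}
terms = ["email", "name", "nationalId"]

print(anonymize_flat(record, terms, lambda v: REMOVE))      # ({'plan': 'pro'}, 0)
print(anonymize_flat(record, terms, lambda v: "[masked]"))  # count is 2
```

The same two keys are matched in both runs; only the retained results change the count.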
Numbers and booleans in a PII key
Stringified: Values are coerced with String() before masking/hashing. A numeric phone: 4155550199 is masked as a string, and a boolean would be hashed as "true"/"false". The output value type changes to string for mask/hash/fake.
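Anyone reproducing this coercion outside the browser should note that Python's `str()` differs from JavaScript's `String()` for booleans, null, and whole-number floats. The helper below is a rough Python approximation for those common cases only (very large or very small numbers format differently in JavaScript); `js_string` is a hypothetical name.

```python
def js_string(value):
    """Approximate JavaScript String(value) for common JSON scalars."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, float) and value.is_integer():
        return str(int(value))   # 1.0 becomes "1", as in JavaScript
    return str(value)

print(js_string(4155550199))  # 4155550199
print(js_string(True))        # true
print(js_string(1.0))         # 1
```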
Deep turned off
Top-level only: With Deep off, only keys on the top-level object (and array elements at the top) are evaluated; nested objects are copied verbatim, so PII inside them survives. Keep Deep on for typical GDPR exports.
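The difference between Deep on and Deep off can be sketched as a recursive walk. This Python version is an illustration of the described behavior, not the tool's implementation. One detail is an assumption: when a matched key holds an object or array, the sketch recurses into it instead of stringifying it, because this page does not say what the tool does in that case.

```python
import copy

def anonymize(node, terms, transform, deep=True):
    """Apply transform to scalar values under matched keys, optionally at any depth."""
    lowered_terms = [t.lower() for t in terms]

    def is_pii(key):
        return any(term in key.lower() for term in lowered_terms)

    def walk(value, top_level):
        if isinstance(value, list):
            return [walk(item, top_level) for item in value]
        if isinstance(value, dict):
            if not (deep or top_level):
                return copy.deepcopy(value)  # Deep off: nested objects kept verbatim
            out = {}
            for key, child in value.items():
                if isinstance(child, (dict, list)):
                    out[key] = walk(child, False)  # assumption: containers are walked
                elif is_pii(key):
                    out[key] = transform(str(child))
                else:
                    out[key] = child
            return out
        return value

    return walk(node, True)

data = {"email": "a@x.com",
        "order": {"customer": {"email": "a@x.com"}, "items": [{"sku": "X1"}]}}

print(anonymize(data, ["email"], lambda v: "7c2e10b4", deep=True))
print(anonymize(data, ["email"], lambda v: "7c2e10b4", deep=False))
```

With `deep=False` the nested `order.customer.email` keeps its original value, which is the failure mode described above.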
Empty PII key list
Disabled: If the comma-separated list is empty, the Anonymize button is disabled because there is nothing to match. Add at least one term.
mask reveals length and format
Caution: mask preserves string length for most values, the domain of emails, and the last 4 phone digits. For data shared externally, prefer hash or remove so attackers cannot infer the original from the shape.
Frequently asked questions
Does this make my data truly GDPR-anonymous?
It depends on the strategy and your dataset. remove and fake discard the original value; hash is one-way but the same input always yields the same token, so a known plaintext can be confirmed by hashing it (a linkage risk). True anonymisation requires that re-identification is impossible even by combining quasi-identifiers (rare condition + city, etc.). Review the whole record, not just the obvious PII keys.
How does the tool decide which fields are PII?
By key name. For each key it lowercases the name and checks whether it contains any of your PII terms. The defaults are email, name, phone, address, ssn, dob, birthdate, ip, password. It never inspects the value to guess PII.
Is the same email always replaced with the same value?
With hash and mask, yes; with fake, no. hash is deterministic, so identical inputs produce identical 8-character tokens across the whole file, preserving joins. mask depends only on the value, so identical inputs mask identically, although different inputs can also mask to the same output. fake uses a global counter and gives different placeholders to identical inputs.
What does the fake strategy generate?
Sequential placeholders, not realistic names: email keys become userN@example.com, name keys become 'User N', phone keys become +1-555-NNNN, ip keys become 192.168.x.y, and anything else becomes [REDACTED-N], where N increments per replaced field.
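The counter behavior can be sketched as follows. This is a Python illustration of the placeholder shapes listed above, not the tool's generator. Two formatting details are assumptions: that the phone suffix is the counter zero-padded to 4 digits, and that the two IP octets are derived from the counter as shown. The class name `PlaceholderFaker` is hypothetical.

```python
import itertools

class PlaceholderFaker:
    """Sequential placeholders driven by one counter shared by all field types."""

    def __init__(self):
        self._counter = itertools.count(1)

    def fake(self, key):
        n = next(self._counter)  # increments once per replaced field
        lowered = key.lower()
        if "email" in lowered:
            return f"user{n}@example.com"
        if "name" in lowered:
            return f"User {n}"
        if "phone" in lowered:
            return f"+1-555-{n:04d}"  # assumption: zero-padded counter
        if "ip" in lowered:
            return f"192.168.{(n // 256) % 256}.{n % 256}"  # assumption: octets from counter
        return f"[REDACTED-{n}]"

faker = PlaceholderFaker()
print(faker.fake("email"))     # user1@example.com
print(faker.fake("email"))     # user2@example.com
print(faker.fake("fullName"))  # User 3
print(faker.fake("ssn"))       # [REDACTED-4]
```

Because the counter is shared, the same source email gets a different placeholder each time it appears, which is why fake does not preserve joins.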
Is the hash SHA-256?
No. Despite the UI hint, the hash is a fast non-cryptographic 32-bit function rendered as an 8-character hex token. It is fine for de-identifying test data and preserving joins, but do not rely on it as a cryptographic, collision-resistant digest.
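This page does not name the hash function, so the sketch below uses 32-bit FNV-1a as a stand-in to show the general shape: a fast, non-cryptographic 32-bit hash rendered as 8 hex characters. Tokens from this Python sketch should not be expected to match the tool's output. It also shows the linkage risk mentioned earlier: anyone who can guess a plaintext can confirm it by hashing the guess.

```python
def token(value):
    """32-bit FNV-1a rendered as 8 hex characters (a stand-in, not the tool's hash)."""
    h = 0x811C9DC5                          # FNV-1a 32-bit offset basis
    for byte in value.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # multiply by the FNV prime, keep 32 bits
    return f"{h:08x}"

# Same input, same token: joins on the hashed column still work.
users = {"email": token("sam@acme.com")}
orders = {"email": token("sam@acme.com")}
print(users["email"] == orders["email"])     # True

# Known-plaintext confirmation: hash a guess and compare it with the stored token.
print(token("sam@acme.com") == users["email"])   # True
print(token("alex@acme.com") == users["email"])
```

A 32-bit space holds only about 4.3 billion tokens, so collisions between distinct values become likely in large datasets, which is another reason not to treat the token as a cryptographic digest.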
Can I anonymize multiple files at once?
No. This tool processes one JSON file per run via the dropzone; there is no multi-file batch UI. Run files individually.
Does anonymization happen on your servers?
No. Parsing and anonymization run entirely in your browser using the file's text. Production data is never transmitted to JAD Apps.
What is the file size limit?
Free tier allows up to 2 MB per file. Pro raises this to 100 MB and Developer to 5 GB. There is no separate row-count gate for this tool.
Can I control the output indentation?
Output is pretty-printed with 2-space indentation. There is no indent control in this tool's UI. To compact or reformat afterwards, use json-minifier or json-prettifier.
How do I scrub PII from free-text fields?
This tool can't — it matches keys, not values. Either add the whole free-text key (e.g. notes) to the term list and use remove/fake to discard it, or drop it with json-key-filter before sharing.
Will it break my schema?
mask, hash, and fake keep all keys and overall shape (values become strings). remove deletes matched keys, which can break consumers that require them. If you need a schema-true sample, prefer hash and keep Deep on.
I need fully synthetic records, not scrubbed real ones — what should I use?
Use json-mock-generator, which emits fresh seeded records with a fixed shape (id, name, email, phone, etc.). This anonymizer transforms YOUR data; the mock generator invents new data.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.