Anonymize JSON for GDPR-Compliant Test Data

How to anonymize JSON to create GDPR-compliant test data

Step 1
Export a small production sample — Export 100-500 records as JSON. A small representative sample is enough to seed test data; you do not need the full production set, and free-tier processing is capped at a 2 MB file.
Step 2
Drop the JSON in — Drag the file onto the dropzone (one file per run — there is no multi-file batch). The tool reads the file text in the browser.
Step 3
Set the PII key names — Edit the comma-separated PII key list. Defaults are email, name, phone, address, ssn, dob, birthdate, ip, password. Add model-specific keys like nationalId, taxNumber, passportNumber. Matching is substring-based and case-insensitive.
Step 4
Choose a strategy — Pick mask, hash, fake, or remove. For one-way tokens that still preserve joins across records, choose hash. For data that should look like names/emails, choose fake. To eliminate a field entirely, choose remove.
Step 5
Keep Deep on for nested data — Leave the Deep checkbox enabled so matched keys are anonymized inside nested objects and arrays. Turn it off only if you want top-level keys treated and nested objects left verbatim.
Step 6
Run, verify, and download — Click Anonymize. The result panel shows the field count and the JSON; Copy or Download writes a .anon.json file. Scan the output for any key your term list missed before distributing.
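The final check in step 6 is easier if you list every key name that appears in the output and read the list against your term list. The Python sketch below is not part of the tool; `key_inventory` and the sample output are hypothetical and exist only for illustration.

```python
import json
from collections import Counter


def key_inventory(node, counts=None):
    """Count every object key name found at any depth of a parsed JSON value."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            counts[key] += 1
            key_inventory(value, counts)
    elif isinstance(node, list):
        for item in node:
            key_inventory(item, counts)
    return counts


# Hypothetical anonymized output; "nationalId" was missed by the term list.
output_text = """
[
  { "id": 1, "email": "1a2b3c4d", "profile": { "nationalId": "DE-99887766" } },
  { "id": 2, "email": "5e6f7a8b", "profile": { "nationalId": "DE-11223344" } }
]
"""

for key, seen in sorted(key_inventory(json.loads(output_text)).items()):
    print(f"{key}: {seen}")
```

Any key in the list that should have been treated (here, nationalId) is a term to add before the next run.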

Strategy behavior (what each does to a matched value)

Applied to every key whose lowercased name contains a PII term. Values are coerced to strings before masking/hashing.

Strategy	Example input	Example output	Deterministic?	Reversible?
mask	"john.doe@acme.com" (email key)	"jo***@acme.com"	Yes (value-based)	No, but leaks length/format
mask	"+1 415 555 0199" (phone key)	all but last 4 digits → "*"	Yes (value-based)	No
mask	"Jonathan" (name key)	"Jo****an" (first 2 + last 2 kept)	Yes (value-based)	No
hash	"john.doe@acme.com"	8-char hex e.g. "1a2b3c4d"	Yes — same input, same token	No
fake	any email key	"user1@example.com", "user2@example.com" …	No — global counter increments	No
remove	any matched key	key deleted from object	n/a	n/a
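The three mask shapes in the table can be sketched in Python as follows. This is an approximation reconstructed from the examples above, not the tool's source: the function names are hypothetical, and the handling of short values, of emails without an "@", and of non-digit phone characters is an assumption.

```python
def mask_generic(value: str) -> str:
    # Keep the first 2 and last 2 characters and star out the middle.
    # Assumption: values of 4 characters or fewer are starred completely.
    if len(value) <= 4:
        return "*" * len(value)
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


def mask_email(value: str) -> str:
    # Keep the first 2 characters of the local part and the whole domain.
    local, sep, domain = value.partition("@")
    if not sep:
        return mask_generic(value)  # assumption: fall back when there is no "@"
    return local[:2] + "***@" + domain


def mask_phone(value: str) -> str:
    # Replace every digit except the last 4 with "*".
    # Assumption: "+", spaces, and dashes are left in place.
    to_mask = max(sum(ch.isdigit() for ch in value) - 4, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_email("john.doe@acme.com"))  # jo***@acme.com
print(mask_generic("Jonathan"))         # Jo****an
print(mask_phone("+1 415 555 0199"))    # +* *** *** 0199
```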

Default PII key terms and what they catch

Substring match on lowercased key names. One term catches every key that contains it.

Term	Catches keys like	Mask shape	Fake shape
email	email, userEmail, billing_email, emailAddress	first 2 + *** before @	userN@example.com
name	name, fullName, lastName, username, filename	first 2 + last 2 kept	User N
phone	phone, mobilePhone, phoneNumber	all but last 4 digits masked	+1-555-NNNN
ip	ip, ipAddress, clientIp (also: recipient, zip)	first 2 + last 2 kept	192.168.x.y
ssn / dob / birthdate / address / password	ssn, dob, birthdate, homeAddress, password	first 2 + last 2 kept	[REDACTED-N]
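To preview which keys a term list will catch, the substring rule can be reproduced in a few lines of Python. This is a sketch under assumptions: `matched_term` is a hypothetical helper, and which term wins when several match (for example ipAddress contains both ip and address) is not stated in this guide, so the sketch simply returns the first match in list order.

```python
DEFAULT_TERMS = ("email", "name", "phone", "address", "ssn", "dob",
                 "birthdate", "ip", "password")


def matched_term(key, terms=DEFAULT_TERMS):
    """Return the first term contained in the lowercased key, or None."""
    lowered = key.lower()
    for term in terms:
        term = term.strip().lower()
        if term and term in lowered:
            return term
    return None


# Illustrative keys, including some that match by accident.
for key in ["billing_email", "fileName", "recipient", "zip", "description", "city"]:
    print(f"{key}: {matched_term(key)}")
```

Note that description also contains "ip", so it would be treated as a PII key under the default list.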

Tier and size limits

Free tier is gated on file size only — there is no per-row gate for this tool.

Tier	Max file size	Batch files	Row gate
Free	2 MB	1 (single file)	none
Pro	100 MB	10	none
Developer	5 GB	unlimited	none

Cookbook

Real export shapes turned into safe test data. Each shows the input, the strategy chosen, and the exact output the tool produces.

Mask a user record for a UI test fixture

Example

Mask keeps shape and format so the UI still renders a plausible row. Note that email and phone keys use special mask rules; every other matched key, including name, keeps the first 2 and last 2 characters.

Input:
{ "id": 91, "name": "Jonathan", "email": "jonathan@acme.com", "city": "Berlin" }

Strategy: mask · Deep: on

Output:
{ "id": 91, "name": "Jo****an", "email": "jo***@acme.com", "city": "Berlin" }

("city" untouched — "city" is not in the default PII terms.)

Hash to keep cross-record joins working

Example

Two records reference the same email. With hash, both map to the same token — so a test that joins users to orders by email still joins.

Input:
[ { "role": "user",  "email": "sam@acme.com" },
  { "role": "order", "email": "sam@acme.com" } ]

Strategy: hash

Output:
[ { "role": "user",  "email": "b3f1c9aa" },
  { "role": "order", "email": "b3f1c9aa" } ]
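The join-preserving property comes from determinism alone, which the Python sketch below demonstrates. The tool's actual hash function is not documented here beyond "32-bit, 8-character hex", so FNV-1a is used as a stand-in; the tokens it produces will not match the tool's output, and `token` is a hypothetical name.

```python
def token(value) -> str:
    """32-bit FNV-1a rendered as 8 hex characters (stand-in for the tool's hash)."""
    h = 0x811C9DC5
    for byte in str(value).encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return f"{h:08x}"


users = [{"role": "user", "email": "sam@acme.com"}]
orders = [{"role": "order", "email": "sam@acme.com"}]

hashed_users = [{**u, "email": token(u["email"])} for u in users]
hashed_orders = [{**o, "email": token(o["email"])} for o in orders]

# Join on the hashed email: the pair still matches because both tokens are equal.
joined = [(u, o) for u in hashed_users for o in hashed_orders
          if u["email"] == o["email"]]
print(len(joined))  # 1
```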

Why fake does NOT preserve joins

Example

fake increments a global counter, so the same source value gets different placeholders. Use hash, not fake, when referential integrity matters.

Input:
[ { "email": "sam@acme.com" },
  { "email": "sam@acme.com" } ]

Strategy: fake

Output:
[ { "email": "user1@example.com" },
  { "email": "user2@example.com" } ]   ← different tokens
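A counter-based generator behaves this way by construction, as the Python sketch below shows. `FakeGenerator` is a hypothetical class; only the email, name, and fallback shapes are modelled, because this guide does not say how N maps into the phone and ip placeholders.

```python
import itertools


class FakeGenerator:
    """Sequential placeholders driven by one shared counter."""

    def __init__(self):
        self._counter = itertools.count(1)

    def fake(self, key: str) -> str:
        n = next(self._counter)  # advances on every replaced field
        lowered = key.lower()
        if "email" in lowered:
            return f"user{n}@example.com"
        if "name" in lowered:
            return f"User {n}"
        return f"[REDACTED-{n}]"


gen = FakeGenerator()
records = [{"email": "sam@acme.com"}, {"email": "sam@acme.com"}]
faked = [{"email": gen.fake("email")} for _ in records]
print(faked)  # [{'email': 'user1@example.com'}, {'email': 'user2@example.com'}]
```

Because the placeholder depends on the counter and never on the source value, two identical inputs cannot map to the same output.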

Remove a national ID entirely

Example

remove drops the key. Add custom terms for fields the defaults miss. Removed keys are deleted, not blanked.

PII terms: email, name, nationalId
Strategy: remove

Input:
{ "name": "Lena", "nationalId": "DE-99887766", "plan": "pro" }

Output:
{ "plan": "pro" }

Deep mode reaches nested PII

Example

With Deep on, matched keys are anonymized at any depth, including inside arrays of objects.

Strategy: hash · Deep: on

Input:
{ "order": { "customer": { "email": "a@x.com" }, "items": [ { "sku": "X1" } ] } }

Output:
{ "order": { "customer": { "email": "7c2e10b4" }, "items": [ { "sku": "X1" } ] } }
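The difference between Deep on and Deep off can be sketched as a recursive walk in Python. `walk` is a hypothetical helper, and one detail is an assumption: a matched key whose value is itself an object or array is descended into rather than replaced, since this guide does not describe that case.

```python
import copy


def walk(node, terms, transform, deep=True):
    """Apply transform to scalar values under matching keys."""
    if isinstance(node, list):
        return [walk(item, terms, transform, deep) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if isinstance(value, (dict, list)):
            # Deep on: recurse into the container. Deep off: copy it verbatim.
            out[key] = walk(value, terms, transform, deep) if deep else copy.deepcopy(value)
        elif any(t.lower() in key.lower() for t in terms):
            out[key] = transform(value)
        else:
            out[key] = value
    return out


doc = {"order": {"customer": {"email": "a@x.com"}, "items": [{"sku": "X1"}]}}
redact = lambda value: "<token>"  # placeholder for any strategy

print(walk(doc, ["email"], redact, deep=True)["order"]["customer"]["email"])   # <token>
print(walk(doc, ["email"], redact, deep=False)["order"]["customer"]["email"])  # a@x.com
```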

Errors and edge cases

Errors and silent failures you can hit with this tool. Find the row that matches what you see and apply the fix it describes.

Free-tier file over 2 MB

Blocked

Free tier allows files up to 2 MB. A larger export is blocked with an upgrade prompt. For GDPR test data you rarely need the whole set — export a 100-500 record sample under 2 MB, or upgrade to Pro (100 MB) for full datasets.
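One way to produce a sample that fits is to cut the export before uploading it. The Python sketch below is an illustration only: `sample_under_limit` is a hypothetical helper, "2 MB" is assumed to mean 2 × 1024 × 1024 bytes, and taking the first records is the simplest choice rather than a representative one.

```python
import json

FREE_TIER_BYTES = 2 * 1024 * 1024  # assumption: the 2 MB cap read as 2 MiB


def sample_under_limit(records, max_records=500, limit=FREE_TIER_BYTES):
    """Return a prefix of records whose pretty-printed JSON fits within limit."""
    sample = records[:max_records]
    while sample:
        size = len(json.dumps(sample, indent=2).encode("utf-8"))
        if size <= limit:
            return sample
        sample = sample[: len(sample) // 2]  # halve and try again
    return sample  # empty when even a single record is too large


# Hypothetical export of 1,000 small records.
export = [{"id": i, "note": "x" * 100} for i in range(1000)]
sample = sample_under_limit(export)
print(len(sample))  # 500
```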

Invalid JSON pasted or dropped

Parse error

The input is parsed with JSON.parse after trimming. Trailing commas, single quotes, or a JS object literal throw a parse error and nothing is anonymized. Fix the syntax with json-format-fixer or json-prettifier first.
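You can catch these cases before dropping the file in. The Python sketch below uses the standard `json` module as a rough stand-in for JSON.parse; `try_parse` is a hypothetical helper. Python accepts NaN and Infinity by default where JSON.parse does not, so the sketch rejects them explicitly.

```python
import json


def _reject_constant(name):
    # json.loads calls this for NaN, Infinity, and -Infinity.
    raise ValueError(f"{name} is not valid JSON")


def try_parse(text):
    """Return (value, None) on success or (None, message) on a syntax error."""
    try:
        return json.loads(text.strip(), parse_constant=_reject_constant), None
    except json.JSONDecodeError as err:
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"
    except ValueError as err:
        return None, str(err)


for text in ['{"a": 1}', '{"a": 1,}', "{'a': 1}", "{a: 1}", '{"a": NaN}']:
    value, error = try_parse(text)
    print(text, "->", "ok" if error is None else error)
```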

PII lives in a free-text value, not a key

Not detected

Detection is by KEY NAME only. An email buried in a notes or comment value is never matched, because notes is not a PII term and the tool never scans values. Replace or blank such fields manually, or strip them with json-key-filter.
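Since the tool never looks at values, a separate pass can flag the fields that need manual attention. The Python sketch below reports the path of every string value that contains something shaped like an email address. The helper name and the regular expression are illustrative; the pattern is deliberately loose and will not catch other kinds of PII.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails_in_values(node, path="$"):
    """Return the paths of string values that contain an email-like substring."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_emails_in_values(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits += find_emails_in_values(item, f"{path}[{index}]")
    elif isinstance(node, str) and EMAIL_RE.search(node):
        hits.append(path)
    return hits


record = {
    "id": 7,
    "notes": "Customer wrote from lena@example.org about a refund",
    "plan": "pro",
}
print(find_emails_in_values(record))  # ['$.notes']
```

Run it on the anonymized output: with the fake strategy, the generated userN@example.com placeholders will also be reported and can be ignored.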

Substring match catches an unintended key

By design

Matching uses includes(): the term name also matches username, filename, and displayName; ip matches recipient and zip. This over-matching is intentional breadth but can mangle non-PII keys — use the narrowest terms that cover your real fields.

remove drops keys and they are not counted

Expected

With strategy remove, the matched key is deleted and is NOT included in the 'fields anonymized' count (only retained, non-undefined results increment it). A low count after a remove run does not mean PII was missed — the keys are simply gone.

Numbers and booleans in a PII key

Stringified

Values are coerced with String() before masking/hashing. A numeric value such as "phone": 4155550199 is masked as a string, and a boolean would be hashed as "true"/"false". The output value type changes to string for mask/hash/fake.
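If you reproduce this coercion outside the browser, note that Python's `str()` does not match JavaScript's String() for booleans and null. The Python sketch below approximates String() for JSON scalars; `js_string` is a hypothetical helper, and very large or very small floats may still format differently.

```python
def js_string(value):
    """Approximate JavaScript String(value) for JSON scalar values."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before numbers: bool is an int in Python
        return "true" if value else "false"
    if isinstance(value, float) and value.is_integer() and abs(value) < 1e21:
        return str(int(value))  # JavaScript prints 5.0 as "5"
    return str(value)


print(js_string(4155550199))  # 4155550199
print(js_string(True))        # true
print(js_string(5.0))         # 5
print(js_string(None))        # null
```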

Deep turned off

Top-level only

With Deep off, only keys on the top-level object (and array elements at the top) are evaluated; nested objects are copied verbatim, so PII inside them survives. Keep Deep on for typical GDPR exports.

Empty PII key list

Disabled

If the comma-separated list is empty, the Anonymize button is disabled — there is nothing to match. Add at least one term.

mask reveals length and format

Caution

mask preserves the length of names and other generic values, and it keeps the first 2 characters and full domain of emails and the last 4 digits of phone numbers. For data shared externally, prefer hash or remove so attackers cannot infer the original from the shape.

Frequently asked questions

Does this make my data truly GDPR-anonymous?

It depends on the strategy and your dataset. remove and fake discard the original value; hash is one-way but the same input always yields the same token, so a known plaintext can be confirmed by hashing it (a linkage risk). True anonymization requires that re-identification is impossible even by combining quasi-identifiers (rare condition + city, etc.). Review the whole record, not just the obvious PII keys.
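The linkage risk is easy to demonstrate. In the Python sketch below, someone who holds a list of candidate emails hashes each one and checks it against the tokens in a shared file. FNV-1a is used as a stand-in because the tool's real hash is not documented here; all names and data are hypothetical.

```python
def token(value) -> str:
    """32-bit FNV-1a as 8 hex characters (stand-in for the tool's hash)."""
    h = 0x811C9DC5
    for byte in str(value).encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return f"{h:08x}"


def confirm_candidates(tokens, candidates):
    """Return the candidates whose hash appears among the shared tokens."""
    seen = set(tokens)
    return [c for c in candidates if token(c) in seen]


# Tokens as they would appear in a shared, hashed test file.
shared_tokens = [token("sam@acme.com"), token("lena@acme.com")]

# The attacker only needs guesses, not the original file.
guesses = ["sam@acme.com", "bob@acme.com"]
print(confirm_candidates(shared_tokens, guesses))
```

Every confirmed guess re-identifies a record, which is why deterministic hashing on its own does not amount to anonymization.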

How does the tool decide which fields are PII?

By key name. For each key it lowercases the name and checks whether it contains any of your PII terms. The defaults are email, name, phone, address, ssn, dob, birthdate, ip, password. It never inspects the value to guess PII.

Is the same email always replaced with the same value?

With hash and mask, yes; with fake, no. hash is deterministic, so identical inputs produce identical 8-character tokens across the whole file, which preserves joins. mask depends only on the value, so identical inputs also mask identically, although different inputs can share a masked form (john.doe@acme.com and jonathan@acme.com both become jo***@acme.com). fake uses a global counter and gives different placeholders to identical inputs.

What does the fake strategy generate?

Sequential placeholders, not realistic names: email keys become userN@example.com, name keys become 'User N', phone keys become +1-555-NNNN, ip keys become 192.168.x.y, and anything else becomes [REDACTED-N], where N increments per replaced field.

Is the hash SHA-256?

No. Despite the UI hint, the hash is a fast non-cryptographic 32-bit function rendered as an 8-character hex token. It is fine for de-identifying test data and preserving joins, but do not rely on it as a cryptographic, collision-resistant digest.
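To judge whether a 32-bit token is enough for your sample, you can estimate the chance that two different values share a token. The Python sketch below applies the standard birthday approximation and assumes the hash spreads values uniformly, which this guide does not verify; `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n distinct values share a token."""
    space = 2 ** bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2 * space)).
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (500, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} distinct values: {collision_probability(n):.4%}")
```

For a 100-500 record sample the risk is negligible, but it grows to roughly a 50% chance at about 77,000 distinct values, at which point a collision could silently merge two unrelated records in a join.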

Can I anonymize multiple files at once?

No. This tool processes one JSON file per run via the dropzone; there is no multi-file batch UI. Run files individually.

Does anonymization happen on your servers?

No. Parsing and anonymization run entirely in your browser using the file's text. Production data is never transmitted to JAD Apps.

What is the file size limit?

Free tier allows up to 2 MB per file. Pro raises this to 100 MB and Developer to 5 GB. There is no separate row-count gate for this tool.

Can I control the output indentation?

Output is pretty-printed with 2-space indentation. There is no indent control in this tool's UI. To compact or reformat afterwards, use json-minifier or json-prettifier.

How do I scrub PII from free-text fields?

This tool can't — it matches keys, not values. Either add the whole free-text key (e.g. notes) to the term list and use remove/fake to discard it, or drop it with json-key-filter before sharing.

Will it break my schema?

mask, hash, and fake keep all keys and overall shape (values become strings). remove deletes matched keys, which can break consumers that require them. If you need a schema-true sample, prefer hash and keep Deep on.

I need fully synthetic records, not scrubbed real ones — what should I use?

Use json-mock-generator, which emits fresh seeded records with a fixed shape (id, name, email, phone, etc.). This anonymizer transforms YOUR data; the mock generator invents new data.

Privacy first

Anonymization runs locally in your browser. No file is uploaded; only metadata counters are saved for signed-in dashboard stats.
