How to remove PII from JSON audit logs for compliance reporting
- Step 1: Export the log window as JSON. Export the audit records for the period under review as a JSON array. Each event should keep its type, timestamp, and metadata. Keep the file under 2 MB on the free tier.
- Step 2: Load the log. Drop the JSON file onto the dropzone. One file per run; concatenate windows into one array if you need them pseudonymised consistently.
- Step 3: Name the identifier keys. Add the log's identifier keys to the term list (e.g. userId, email, ip, sessionId). Keep event/timestamp/resource keys OUT of the list so they survive.
- Step 4: Choose hash for traceable pseudonyms. Pick hash so each actor maps to a stable token across events. Use remove only if the audience must not see actor patterns at all.
- Step 5: Keep Deep on. Leave Deep enabled so identifiers nested inside event detail objects are pseudonymised too.
- Step 6: Share the sanitized log. Download the .anon.json file and hand it to the auditor. The sequence of actions and timings is intact; only identities are replaced.
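The steps above can be pictured with a minimal Python sketch. It is not the tool's implementation: the `token` function uses CRC32 as a stand-in for the tool's undocumented 8-character hash, the case-insensitive substring match is inferred from the eventName example later on this page, and the names `TERMS`, `token`, `key_matches` and `pseudonymise` are hypothetical.

```python
import zlib

# Hypothetical term list, taken from the example keys in Step 3.
TERMS = ["userId", "email", "ip", "sessionId"]

def token(value) -> str:
    # Stand-in for the tool's hash: CRC32 rendered as 8 hex characters.
    return format(zlib.crc32(str(value).encode("utf-8")), "08x")

def key_matches(key, terms) -> bool:
    # Assumed behaviour: case-insensitive substring match on the key name.
    return any(term.lower() in key.lower() for term in terms)

def pseudonymise(node, terms=TERMS, deep=True):
    # Arrays of events are always walked; nested objects only when deep is on.
    if isinstance(node, list):
        return [pseudonymise(item, terms, deep) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if isinstance(value, (dict, list)):
            out[key] = pseudonymise(value, terms, deep) if deep else value
        elif key_matches(key, terms):
            out[key] = token(value)  # same source value, same token
        else:
            out[key] = value         # analysis fields pass through untouched
    return out

events = [
    {"event": "login", "userId": "u_42", "ip": "10.0.0.7"},
    {"event": "invite", "userId": "u_42",
     "detail": {"invitedBy": {"email": "adm@x.com"}}},
]
sanitized = pseudonymise(events)
```

Both events end up with the same userId token, the event field is unchanged, and the nested email is only replaced because deep is left on.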
Strategy choice for audit logs
Audit work usually wants traceable pseudonyms.
| Goal | Strategy | Actor traceable across events? | Notes |
|---|---|---|---|
| Trace one actor across events | hash | Yes | Same id → same 8-char token |
| Fully anonymous, no pattern | remove | No (actor gone) | Deletes the identifier key |
| Readable pseudonyms | fake | No | Counter changes per occurrence |
| Keep id length/shape | mask | By value only | Reveals format of the id |
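To see why the table favours hash for traceability, here is a small Python sketch that applies three of the strategies to the same events and counts how many distinct actors an auditor could still tell apart. It rests on assumptions: CRC32 stands in for the tool's hash, the `[REDACTED-n]` format for fake is copied from the cookbook example on this page, and `apply_strategy` and `distinct_actors` are hypothetical helper names. mask is left out because its exact output format is not described here.

```python
import itertools
import zlib

def apply_strategy(events, key, strategy):
    # Returns sanitized copies of the events; the input list is not modified.
    counter = itertools.count(1)
    out = []
    for event in events:
        event = dict(event)  # shallow copy so the original stays intact
        if key in event:
            if strategy == "hash":
                # Stand-in hash: CRC32 as 8 hex characters (assumption).
                event[key] = format(zlib.crc32(str(event[key]).encode("utf-8")), "08x")
            elif strategy == "fake":
                # Counter advances on every occurrence, not per distinct value.
                event[key] = f"[REDACTED-{next(counter)}]"
            elif strategy == "remove":
                del event[key]
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        out.append(event)
    return out

def distinct_actors(events, key):
    # Number of different values an auditor would see under `key`.
    return len({e[key] for e in events if key in e})

events = [
    {"event": "login", "userId": "u_42"},
    {"event": "data.read", "userId": "u_42"},
    {"event": "login", "userId": "u_07"},
]

hashed = apply_strategy(events, "userId", "hash")
faked = apply_strategy(events, "userId", "fake")
removed = apply_strategy(events, "userId", "remove")
```

With two real actors in three events, the hashed copy still shows two distinct actors, the faked copy shows three, and the removed copy shows none.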
Which log fields to scrub vs keep
Put identifiers in the term list; leave analysis fields out.
| Field | In term list? | Result with hash |
|---|---|---|
| userId / actorId | Yes (add it) | stable token per actor |
| email | Yes (default) | stable token per email |
| ip | Yes (default) | stable token per ip |
| eventType / action | No | preserved |
| timestamp / resourceId / outcome | No | preserved |
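Before handing a file over, it can help to compare the original and sanitized logs against the table above. The following Python sketch is one possible check, not a feature of the tool: `check_sanitized` is a hypothetical helper, it only inspects top-level keys, and it assumes both files hold the same events in the same order.

```python
def check_sanitized(original, sanitized, kept_keys, scrubbed_keys):
    # Returns a list of human-readable problems; an empty list means the check passed.
    problems = []
    if len(original) != len(sanitized):
        problems.append("event count differs between original and sanitized log")
    for i, (before, after) in enumerate(zip(original, sanitized)):
        for key in kept_keys:
            # Analysis fields must come through unchanged.
            if before.get(key) != after.get(key):
                problems.append(f"event {i}: kept field '{key}' changed")
        for key in scrubbed_keys:
            # Identifier fields must not carry the original value.
            if key in before and after.get(key) == before[key]:
                problems.append(f"event {i}: identifier '{key}' leaked")
    return problems

original = [{"event": "data.read", "ts": "2026-06-10T08:00Z", "userId": "u_9",
             "email": "k@x.com", "resource": "file/12", "outcome": "allow"}]
clean = [{"event": "data.read", "ts": "2026-06-10T08:00Z", "userId": "71b2e0aa",
          "email": "c39f10d4", "resource": "file/12", "outcome": "allow"}]

kept = ["event", "ts", "resource", "outcome"]
scrubbed = ["userId", "email"]
report = check_sanitized(original, clean, kept, scrubbed)
```

For the sample pair the report is empty; a sanitized file that still contained `u_9` under userId would produce a "leaked" entry instead.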
Tier limits for log windows
Single file per run.
| Tier | Max file | Batch |
|---|---|---|
| Free | 2 MB | 1 |
| Pro | 100 MB | 10 |
| Developer | 5 GB | unlimited |
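A quick size check before loading a log window can save a blocked run. This Python sketch maps a file size to the smallest tier in the table above. It assumes 1 MB means 1024 × 1024 bytes and that the limit is inclusive, neither of which is stated on this page; `TIER_LIMITS` and `smallest_tier_for` are hypothetical names.

```python
# Limits from the tier table, converted to bytes (binary units are an assumption).
TIER_LIMITS = [
    ("Free", 2 * 1024**2),
    ("Pro", 100 * 1024**2),
    ("Developer", 5 * 1024**3),
]

def smallest_tier_for(size_bytes):
    # Returns the first tier whose cap fits the file, or None if no tier does.
    for name, limit in TIER_LIMITS:
        if size_bytes <= limit:
            return name
    return None
```

In practice the size could come from `os.path.getsize("audit-window.json")`, where the file name is only a placeholder; a result of None means the window has to be narrowed.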
Cookbook
Sanitising audit logs so auditors keep the activity trail but not the identities.
Hash userId to keep the actor traceable
Example: The same actor logs in twice; hash keeps both events tied to one pseudonym.
PII terms: userId, ip
Strategy: hash · Deep: on
Input:
[ { "event": "login", "userId": "u_42", "ip": "10.0.0.7" },
{ "event": "login", "userId": "u_42", "ip": "10.0.0.7" } ]
Output:
[ { "event": "login", "userId": "9c1ad3e2", "ip": "b7f0c441" },
{ "event": "login", "userId": "9c1ad3e2", "ip": "b7f0c441" } ]
Keep analysis fields, scrub only identities
Example: Event type, timestamp, resource, and outcome stay; only userId and email change.
PII terms: userId, email
Strategy: hash
Input:
{ "event": "data.read", "ts": "2026-06-10T08:00Z", "userId": "u_9", "email": "k@x.com", "resource": "file/12", "outcome": "allow" }
Output:
{ "event": "data.read", "ts": "2026-06-10T08:00Z", "userId": "71b2e0aa", "email": "c39f10d4", "resource": "file/12", "outcome": "allow" }
Remove the actor entirely for a stricter audience
Example: When even a pseudonym is too much, remove deletes the identifier (no pattern survives).
PII terms: userId
Strategy: remove
Input:
{ "event": "login", "userId": "u_42", "outcome": "allow" }
Output:
{ "event": "login", "outcome": "allow" }
Why fake ruins the activity trail
Example: fake increments per occurrence, so one actor's two events get different pseudonyms and the trail is lost. Use hash.
PII terms: userId
Strategy: fake
Input:
[ { "userId": "u_42" }, { "userId": "u_42" } ]
Output:
[ { "userId": "[REDACTED-1]" }, { "userId": "[REDACTED-2]" } ] ← can't tell it's one actor
Deep reaches identifiers in nested event detail
Example: Many log schemas nest the actor under a detail object; Deep scrubs it.
PII terms: email
Strategy: hash · Deep: on
Input:
{ "event": "invite", "detail": { "invitedBy": { "email": "adm@x.com" } } }
Output:
{ "event": "invite", "detail": { "invitedBy": { "email": "e2a91c70" } } }
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, then fix what the row says to fix.
Log window over 2 MB on free tier
Blocked: Free tier caps files at 2 MB. Logs grow fast, so narrow the time window or upgrade to Pro (100 MB) / Developer (5 GB).
userId not in the term list
Leaks: userId is not a default term. If you don't add it, every userId is copied verbatim into the shared log. Always add your actor key explicitly.
fake used for audit pseudonyms
Caution: fake increments per occurrence, so a single actor's events get different pseudonyms and the activity pattern is destroyed. Use hash for traceable audit logs.
remove erases the activity pattern
By design: remove deletes the identifier, so the auditor can no longer group events by actor. Choose remove only when the audience must see no actor linkage at all.
Identifier nested in event detail with Deep off
Survives: With Deep off, an actor under detail.invitedBy.email is left intact. Keep Deep on for nested log schemas.
PII inside a free-text log message
Not detected: Detection is key-name only. An email inside a message string is not scrubbed. Add the message key and remove/fake it, or strip it with json-key-filter.
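Because key-name detection cannot see inside string values, a separate scan of the sanitized file is one way to catch leftovers. The Python sketch below walks a parsed JSON document and reports string values that look like they contain an email address. The regular expression is a deliberately simple approximation, and `find_emails_in_strings` is a hypothetical helper, not part of the tool.

```python
import re

# Rough email pattern; good enough for a review pass, not a validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_emails_in_strings(node, path="$"):
    # Returns (path, match) pairs for every email-like substring in string values.
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_emails_in_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_emails_in_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        hits += [(path, match) for match in EMAIL_RE.findall(node)]
    return hits

sanitized = [
    {"event": "invite", "userId": "71b2e0aa",
     "message": "sent invite to adm@x.com"},
]
leftovers = find_emails_in_strings(sanitized)
```

Any path reported here points at a key worth adding to the term list or dropping before the file is shared.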
Substring match scrubs eventName containing 'name'
Caution: If a kept field's name contains a term (e.g. an eventName key while 'name' is in the list), it is altered too. Keep analysis-field names out of overlap with your terms or use narrower terms.
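Overlaps like this are easy to check before a run. The Python sketch below lists which fields you intend to keep would be caught by a term. It assumes matching is a case-insensitive substring test on the key name, as the eventName example implies; `overlapping_kept_keys` is a hypothetical helper.

```python
def overlapping_kept_keys(kept_keys, terms):
    # Maps each kept key to the terms it would accidentally match.
    overlaps = {}
    for key in kept_keys:
        hits = [term for term in terms if term.lower() in key.lower()]
        if hits:
            overlaps[key] = hits
    return overlaps

kept = ["eventName", "timestamp", "resourceId", "outcome"]
terms = ["name", "userId", "email", "ip"]
clashes = overlapping_kept_keys(kept, terms)
```

Here the only clash is eventName against the term name; replacing that term with something narrower, such as fullName, would clear it.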
Newline-delimited logs (JSONL)
Parse error: The tool expects a single valid JSON document. NDJSON/JSONL (one object per line) fails JSON.parse. Wrap the lines into a JSON array first, and if the raw export is malformed run it through json-format-fixer.
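Wrapping NDJSON into one array takes only a few lines. This Python sketch uses just the standard library; `ndjson_to_array` is a hypothetical helper name, and blank lines are skipped on the assumption that they carry no records.

```python
import json

def ndjson_to_array(text):
    # Parses one JSON value per non-empty line and returns them as a list.
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc.msg}") from exc
    return records

raw = '{"event": "login", "userId": "u_42"}\n\n{"event": "logout", "userId": "u_42"}\n'
array_text = json.dumps(ndjson_to_array(raw), indent=2)
```

The resulting `array_text` is a single JSON document that can be saved to a file and loaded as usual.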
Numeric userId hashed to a string
Type change: Values are stringified before hashing, so a numeric userId becomes a string token. Downstream tooling that expects a number should cast it.
hash treated as cryptographic
Caution: The token is a fast 8-character hex value, not a SHA digest. A determined party with a known userId can hash it to confirm presence, which is fine for audit pseudonymisation but not for adversarial secrecy.
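The known-value check described above is simple to carry out, which is why the token should not be treated as a secret. The Python sketch below illustrates the idea with CRC32 as a stand-in for the tool's hash, since the real algorithm is not documented here; `token` and `confirm_presence` are hypothetical names.

```python
import zlib

def token(value) -> str:
    # Stand-in for the tool's hash: CRC32 rendered as 8 hex characters.
    return format(zlib.crc32(str(value).encode("utf-8")), "08x")

def confirm_presence(candidate_ids, shared_tokens):
    # Anyone who knows the hashing scheme can test guesses against the shared log.
    return [cid for cid in candidate_ids if token(cid) in shared_tokens]

# Tokens as they would appear in a shared, pseudonymised log.
shared_tokens = {token("u_42")}
guesses = ["u_41", "u_42", "u_43"]
confirmed = confirm_presence(guesses, shared_tokens)
```

Only the guess that really appears in the log is confirmed, which shows that a deterministic, unkeyed token hides identities from a casual reader but not from someone who already holds a list of candidate ids.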
Frequently asked questions
Should I anonymize or pseudonymize audit logs?
For audits that trace actions across events, pseudonymisation is usually better. The hash strategy provides it: consistent tokens let an auditor see 'actor X did Y five times' without knowing who X is. Use remove for true anonymisation when no actor linkage is allowed.
How do I keep one actor traceable across events?
Add the actor key (userId, email, ip) to the term list and use hash. The deterministic token is identical for every occurrence of the same source value.
Which fields survive anonymization?
Any key whose name does not contain one of your terms — typically eventType, timestamp, resourceId, and outcome. Keep these out of the term list so the analysis trail is preserved.
Why is fake a bad choice for audit logs?
fake increments a counter, so the same actor receives different pseudonyms across events, destroying the activity pattern. Use hash instead.
Are the logs uploaded?
No. Anonymization runs entirely in your browser; log records are never transmitted to JAD Apps.
Can I process JSONL / newline-delimited logs?
Not directly — the tool parses one JSON document. Wrap your NDJSON lines into a single JSON array before pasting.
Does remove count toward the 'fields anonymized' stat?
No. Removed keys are deleted and not counted; the stat reflects retained anonymized values only.
Can I scrub PII inside a free-text message field?
Not by key matching. Add the message key and use remove/fake to drop the value, or strip the field with json-key-filter.
How long should I keep audit logs?
GDPR doesn't fix a period; document a justified retention in your records of processing. This tool helps you share a sanitized copy without exposing identities, independent of your retention policy.
Is the hash cryptographically secure?
No — it is a fast 8-character hex token, not a SHA-style digest. It pseudonymises well but is not collision-resistant or secret against known-plaintext checks.
Can I anonymize multiple log files together?
Only by concatenating them into one JSON array first; the tool processes a single file per run. Combining ensures hash tokens are consistent across the whole set.
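Concatenating several exports can be scripted. This Python sketch merges any number of JSON array files into one array file; the helper names `merge_arrays` and `merge_log_files` and any file paths you pass in are hypothetical, and each input is assumed to hold a top-level JSON array.

```python
import json
from pathlib import Path

def merge_arrays(documents):
    # Concatenates parsed JSON arrays, preserving file order and event order.
    merged = []
    for index, doc in enumerate(documents):
        if not isinstance(doc, list):
            raise ValueError(f"document {index} is not a JSON array")
        merged.extend(doc)
    return merged

def merge_log_files(paths, out_path):
    # Reads each file, merges the arrays, writes one file, returns the event count.
    documents = [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
    merged = merge_arrays(documents)
    Path(out_path).write_text(json.dumps(merged), encoding="utf-8")
    return len(merged)
```

The merged file still has to fit under your tier's size limit, so check its size before loading it.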
How do I review a large sanitized log?
Open it in json-tree-viewer to confirm identifiers are tokenised and analysis fields survived. To compact the file before sending, run json-minifier; to pretty-print a copy for human review, use json-prettifier.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.