Anonymize a CSV / JSON Sample for a Bug Report or Public Dataset

How to make a safe data sample to attach in public

Step 1
Cut down to a minimal repro first — Trim your file to the smallest sample that still triggers the bug — a handful of rows, only the columns involved. Smaller is better for both the maintainer and your privacy surface. The tool reads CSV (comma-delimited, first row = header) and JSON; it does not read .xlsx / .ods, so convert spreadsheets to CSV first.
Step 2
Drop the sample onto the tool — PapaParse (CSV) or JSON.parse (JSON) runs in your browser tab; the file is never uploaded. The JSON path is taken when the filename ends in `.json`, otherwise it's parsed as CSV. If your JSON sample lost its extension, rename it or set format: json so it isn't mis-read as a single-column CSV.
Step 3
Keep the bug-triggering columns named as they are — Only columns whose names match a PII token are changed. The values that usually cause bugs live in columns like id, amount, raw, payload, timestamp — none of which are PII tokens — so they pass through untouched and the repro still works. Don't rename those; only PII columns should match.
Step 4
Set a seed if you'll need to resend the same sample — Leave seed blank for fresh fakes. Enter a number and the tool calls faker.seed(n) first, so the same original sample + same seed regenerates an identical anonymised file — useful when a thread runs long and the maintainer asks for the same attachment again.
Step 5
Scramble and eyeball the output before posting — Every PII-named column is replaced with a faker value and the count is reported. Open the result and scan it: confirm names/emails are fake and that no PII slipped through in a column the regex didn't match (especially compound headers) or inside a free-text cell.
Step 6
Attach the scrambled file, not the original — The result downloads as <name>-scrambled.<ext>. Attach or paste that into the issue, question, ticket, or dataset. The scramble is one-way; keep your original locally. Once a real sample is posted in public it's effectively permanent, so always post the scrambled copy.
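
The download name in Step 6 follows a simple pattern. As a minimal sketch (the helper name `scrambledName` and the behaviour for files without an extension are assumptions, not taken from the tool's code), it could be derived like this:

```typescript
// Hypothetical helper: derive "<name>-scrambled.<ext>" from an input filename.
function scrambledName(filename: string): string {
  const dot = filename.lastIndexOf(".");
  // Assumption: with no extension (or a leading-dot file), just append the suffix.
  if (dot <= 0) return `${filename}-scrambled`;
  return `${filename.slice(0, dot)}-scrambled${filename.slice(dot)}`;
}

console.log(scrambledName("repro.csv"));  // repro-scrambled.csv
console.log(scrambledName("issue.json")); // issue-scrambled.json
```

Checking the attachment name against this pattern before posting is a quick way to confirm you picked the scrambled copy and not the original.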

What changes vs. what keeps your bug reproducible

Which columns are faked and which survive to trigger the issue. Detection is name-based against a fixed regex (lib/security/security-processor.ts).

Column	Example header	After scrambling	Why it matters for the repro
Identifier	`name`, `email`, `phone`	Faked	Removed before the sample goes public
Location	`address`, `city`, `zip`	Faked	Removed; rarely the cause of a parsing bug anyway
Record key	`id`, `order_id`, `uuid`	Preserved	Not a PII token -> the row the maintainer needs to find survives
Edge-case value	`amount`, `raw`, `payload`	Preserved	The weird value that breaks the parser is untouched
Encoding / delimiter	(the bytes themselves)	Preserved	PapaParse round-trips structure; quoting and commas survive
Timestamp / status	`created_at`, `status_code`	Preserved	Temporal and state context the maintainer needs is intact
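
The name-based detection in the table above can be sketched as an anchored regex test on each header. The token list below is an assumption built from the headers this guide mentions; the real `PII_FIELDS_REGEX` is not reproduced here and may differ.

```typescript
// Illustrative only: this token list is an assumption, not the tool's actual PII_FIELDS_REGEX.
const PII_TOKENS_SKETCH = /^(name|first_name|last_name|email|phone|address|city|zip|ssn|tax_id)$/;

// Split a header row into columns that would be faked and columns that pass through.
function classifyHeaders(headers: string[]): { faked: string[]; preserved: string[] } {
  const faked: string[] = [];
  const preserved: string[] = [];
  for (const header of headers) {
    (PII_TOKENS_SKETCH.test(header) ? faked : preserved).push(header);
  }
  return { faked, preserved };
}

const classified = classifyHeaders(["id", "name", "email", "amount", "email_address"]);
console.log(classified.faked.join(", "));     // name, email
console.log(classified.preserved.join(", ")); // id, amount, email_address
```

Because the pattern is anchored at both ends, `email_address` lands in the preserved list, which is the compound-header behaviour described later in this guide.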

The complete control surface

Every control this tool exposes, from lib/security/security-tool-schemas.ts. There is no redaction-style, field-list, or fake-format option.

Control	Type / values	Default	What it actually does
`seed`	number (optional)	(blank)	Blank = fresh randomness each run. A number calls `faker.seed(n)` for identical output from the same input. This gives determinism only: it is not encryption and the output is not reversible
`format`	enum: `auto` / `csv` / `json`	`auto`	Server-safe `auto` treats a leading `[` or `{` as JSON, otherwise CSV. In the browser, the JSON path is chosen by a `.json` filename extension. Set `csv` or `json` to force a parser
Field / column list	(not a control)	—	Fixed in code (`PII_FIELDS_REGEX`). Cannot be edited in the UI
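
The two `format` detection paths in the table can be sketched as below. The function names are hypothetical, and skipping leading whitespace before sniffing is an assumption; the guide only says a leading `[` or `{` is treated as JSON.

```typescript
type Format = "auto" | "csv" | "json";

// In-browser path: an explicit format wins, otherwise the ".json" extension decides.
function resolveByExtension(filename: string, format: Format = "auto"): "csv" | "json" {
  if (format !== "auto") return format;
  return /\.json$/.test(filename) ? "json" : "csv";
}

// Server-safe path: an explicit format wins, otherwise sniff the first character.
function resolveBySniff(text: string, format: Format = "auto"): "csv" | "json" {
  if (format !== "auto") return format;
  const first = text.replace(/^\s+/, "").charAt(0); // assumption: leading whitespace is ignored
  return first === "[" || first === "{" ? "json" : "csv";
}

console.log(resolveByExtension("sample.txt"));         // csv
console.log(resolveByExtension("sample.txt", "json")); // json
console.log(resolveBySniff('{"id": 1}'));              // json
```

The first call shows why a JSON snippet saved as `sample.txt` ends up on the CSV path unless `format` is set explicitly.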

Tier, formats, and size limits

Metadata from lib/security/security-tools-registry.ts and limits from lib/tier-limits.ts. Samples for bug reports are usually tiny, so limits rarely bite.

Property	Value	Note
Minimum tier	Pro	`minTier: "pro"` — not on Free
Input formats	CSV, JSON	JSON via `JSON.parse`; no `.xlsx`/`.ods`
Output	Text (CSV or pretty-printed JSON)	`<name>-scrambled.<ext>`
Pro limits	100 MB / 5 files	Security family — far larger than any repro sample
Pro-media / Developer	500 MB / 50 · 2 GB / unlimited	Higher tiers
Multiple files	Accepted	`acceptsMultiple: true` — anonymise a few samples at once

Cookbook

Before/after samples for public sharing. The PII columns change; the columns that reproduce the bug stay exactly the same. Faker values are illustrative — set a seed if you'll resend.

Repro sample for a CSV parser bug

The bug is triggered by a double quote embedded in the amount value on row 2, written as "" inside the quoted field. Names and emails get faked, but the amount value (the actual cause) and the id are preserved, so the sample still breaks the parser.

Input (repro.csv):
id,name,email,amount
1,Sarah Chen,sarah.chen@acme.io,"1,200"
2,Tomás Reyes,treyes@globex.com,"3""500"

Output (repro-scrambled.csv):
id,name,email,amount
1,Dr. Elena Rosales,Reanna.Lockman@yahoo.com,"1,200"
2,Marcus Hettinger,Jaylin.Bode@gmail.com,"3""500"

Names/emails faked; the quote-bearing amount that triggers the bug
and the row ids are preserved exactly.

JSON sample for a GitHub issue

A nested record reproducing a serialization bug. name/email are faked; the id, the numeric edge value 0, and the empty-array tags (the actual repro) survive.

Input (issue.json):
{
  "id": "ord_8821",
  "customer": { "name": "Priya Nair", "email": "priya@shop.co" },
  "qty": 0,
  "tags": []
}

Output (issue-scrambled.json):
{
  "id": "ord_8821",
  "customer": { "name": "Dr. Elena Rosales", "email": "Reanna.Lockman@yahoo.com" },
  "qty": 0,
  "tags": []
}
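
The nested behaviour in this sample (keys inside `customer` are faked, everything else survives) can be sketched as a recursive walk. The key pattern, the `fake` callback, and the choice to replace only string values are assumptions for illustration, not the tool's actual code.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Assumption: a tiny stand-in for the tool's fixed key regex.
const PII_KEY_SKETCH = /^(name|email|phone)$/;

// Recursively copy a JSON value, replacing string values under PII-named keys.
// `fake` is a hypothetical callback standing in for the faker call.
function scrambleJson(value: Json, fake: (key: string) => string): Json {
  if (Array.isArray(value)) return value.map((item) => scrambleJson(item, fake));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      out[key] =
        PII_KEY_SKETCH.test(key) && typeof child === "string" ? fake(key) : scrambleJson(child, fake);
    }
    return out;
  }
  return value; // numbers, booleans, null, and strings under non-PII keys pass through
}

const scrambledIssue = scrambleJson(
  { id: "ord_8821", customer: { name: "Priya Nair", email: "priya@shop.co" }, qty: 0, tags: [] },
  (key) => `<fake ${key}>`
);
console.log(JSON.stringify(scrambledIssue));
// {"id":"ord_8821","customer":{"name":"<fake name>","email":"<fake email>"},"qty":0,"tags":[]}
```

The `0` and the empty array come out unchanged because the walk only ever substitutes values under matching keys.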

Open dataset row sample

Publishing rows of an analytics dataset. Identifier columns are faked; the measured columns the dataset is actually about are preserved, so the published sample is both useful and safe.

Input (sample.csv):
name,email,page,duration_ms,bounced
Li Wei,li.wei@x.com,/pricing,4200,false

Output (sample-scrambled.csv):
name,email,page,duration_ms,bounced
Mavis Goldner,Lonnie_Cremin@hotmail.com,/pricing,4200,false

The analytic columns (page, duration_ms, bounced) survive;
only the identifiers were faked.

Reproducible sample for a long thread

Set a seed so you can hand the maintainer the exact same anonymised file again later without re-leaking anything.

Input (case.csv):
case_id,first_name,last_name,email,status
9,Aisha,Khan,aisha.khan@corp.net,open

seed = 99

Output (every run with seed 99 is identical):
case_id,first_name,last_name,email,status
9,<fake>,<fake>,<fake>,open

case_id + status preserved -> the maintainer can match the row.
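
To see why a seed makes the output repeatable, here is a minimal sketch that uses a small seeded generator in place of faker. The generator (a Park-Miller style LCG), the name pool, and the function names are all stand-ins chosen for illustration; the tool itself is described as calling `faker.seed(n)`.

```typescript
// Park-Miller style generator: a simple seeded PRNG standing in for faker's seeded generator.
function seededRandom(seed: number): () => number {
  let state = (Math.abs(Math.floor(seed)) % 2147483646) + 1; // keep state in 1..2147483646
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647; // always strictly between 0 and 1
  };
}

const FIRST_NAMES_SKETCH = ["Elena", "Marcus", "Mavis", "Hilbert"]; // illustrative pool

// With a seed the picks are repeatable; without one they vary from run to run.
function fakeFirstNames(count: number, seed?: number): string[] {
  const rand = seed === undefined ? Math.random : seededRandom(seed);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    out.push(FIRST_NAMES_SKETCH[Math.floor(rand() * FIRST_NAMES_SKETCH.length)]);
  }
  return out;
}

const firstRun = fakeFirstNames(3, 99);
const secondRun = fakeFirstNames(3, 99);
console.log(firstRun.join(",") === secondRun.join(",")); // true
```

The seed only fixes the sequence of fake values. It carries no information about the originals, so it cannot be used to recover them.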

PII inside a message body still needs scrubbing

The tool fakes whole cells in matched columns; it does not scan free text. An email or phone inside a message column would be published as-is because message isn't a PII token. Scrub that column before posting.

Input (log.csv):
id,email,message
5,dana@x.com,"User wrote: call me at 415-555-0199"

Output (log-scrambled.csv):
id,email,message
5,Hilbert.Klein@gmail.com,"User wrote: call me at 415-555-0199"

The email column is faked; the phone inside message is NOT.
Run message through email-phone-scrubber first so it becomes
[REDACTED_PHONE] before you attach the file.
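
As a rough illustration of what scrubbing a free-text cell involves, the sketch below replaces email-like and US-style phone-like substrings with fixed tags. The patterns and the `[REDACTED_EMAIL]` tag name are assumptions; the actual rules of email-phone-scrubber are not shown in this guide and are likely broader.

```typescript
// Illustrative patterns only; not the real email-phone-scrubber rules.
const EMAIL_SKETCH = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const US_PHONE_SKETCH = /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g;

// Replace matches inside a single cell with fixed tags, leaving other text alone.
function scrubFreeText(cell: string): string {
  return cell
    .replace(EMAIL_SKETCH, "[REDACTED_EMAIL]")
    .replace(US_PHONE_SKETCH, "[REDACTED_PHONE]");
}

console.log(scrubFreeText("User wrote: call me at 415-555-0199"));
// User wrote: call me at [REDACTED_PHONE]
```

Simple patterns like these miss many formats (international numbers, obfuscated emails), so a manual read of free-text columns is still worthwhile before posting.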

Edge cases and what actually happens

PII inside a message / log / notes column

By design

The tool fakes whole cells in name-matched columns and never scans cell contents. An email, phone, or SSN written inside a free-text column (message, log, notes, description) would be published verbatim because the column name isn't a PII token. Scrub those columns first with email-phone-scrubber, which emits fixed [REDACTED_*] tags, before you attach anything in public.

Compound PII header survives into the public sample

Not matched

Headers like email_address, customer_name, or home_phone are not exact PII tokens, so the anchored regex leaves them — and real PII — in the file you post. Rename them to the bare token (email, name, phone) before scrambling, and eyeball the output before publishing.
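
If you have several compound headers, the rename can be scripted before scrambling. This is a minimal sketch: the mapping and function name are hypothetical, and it assumes a plain header line with no quoted commas.

```typescript
// Hypothetical mapping from compound headers to the bare tokens named in this guide.
const HEADER_RENAMES: { [header: string]: string } = {
  email_address: "email",
  customer_name: "name",
  home_phone: "phone",
};

// Rewrite only the header line of a CSV string; data rows are left untouched.
// Assumes a simple header with no quoted commas or embedded newlines.
function renameCompoundHeaders(csv: string): string {
  const lineBreak = /\r?\n/.exec(csv);
  const end = lineBreak ? lineBreak.index : csv.length;
  const renamed = csv
    .slice(0, end)
    .split(",")
    .map((header) => {
      const key = header.trim();
      return Object.prototype.hasOwnProperty.call(HEADER_RENAMES, key) ? HEADER_RENAMES[key] : header;
    })
    .join(",");
  return renamed + csv.slice(end);
}

console.log(renameCompoundHeaders("id,email_address,home_phone\n1,a@b.co,555\n"));
// id,email,phone
// 1,a@b.co,555
```

If the maintainer's bug depends on the original header names, mention the rename in your report so they can map the columns back.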

Bug-relevant value accidentally lives in a PII-named column

Heads-up

If the value that triggers your bug sits in a column the regex matches (say a malformed name), scrambling will replace it and your sample may stop reproducing. Rename that column to a non-PII header before scrambling so the bug-triggering value is preserved, or reproduce the bug with the value moved to a clearly non-PII column.

JSON sample without a .json extension parsed as CSV

Mis-parse

In-browser the JSON path runs only when the filename ends in .json. A JSON snippet saved as sample.txt is parsed by PapaParse as a one-column CSV and barely changes. Rename to .json, rely on the server-safe auto sniff of a leading [/{, or set format: json.

Malformed JSON snippet

Error

If the very bug you're reporting is invalid JSON, JSON.parse will throw and produce nothing — you can't scramble a document the parser rejects. In that case anonymise a valid version, or share the malformed bytes as a CSV/text attachment after manually replacing the PII. CSV is more forgiving and PapaParse parses ragged rows.
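
You can check whether a snippet will be rejected before dropping it onto the tool. The sketch below wraps `JSON.parse` in a try/catch; the function name and result shape are illustrative, not part of the tool.

```typescript
// Returns the parsed value, or the parser's error message when the text is not valid JSON.
function tryParseJson(text: string): { ok: true; value: unknown } | { ok: false; error: string } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

console.log(tryParseJson('{"id": 1}').ok);  // true
console.log(tryParseJson('{"id": 1,}').ok); // false (trailing comma is invalid JSON)
```

When the check fails, the error message is often worth including in the bug report alongside the manually cleaned sample.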

SSN sample is 9 plain digits

Expected

ssn / tax_id columns become faker.string.numeric(9) — nine random digits, no dashes, no checksum. Safe to publish, but if your repro depends on a specifically formatted SSN, note that the fake won't match NNN-NN-NNNN; the tool has no SSN-format option.
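
If your repro does depend on the dashed layout, one option is to reformat the fake digits yourself after scrambling. This is an optional post-processing sketch with a hypothetical helper name; it is not something the tool does.

```typescript
// Reformat a nine-digit string as NNN-NN-NNNN; anything else is returned unchanged.
function formatSsnLike(digits: string): string {
  const match = /^(\d{3})(\d{2})(\d{4})$/.exec(digits);
  return match ? `${match[1]}-${match[2]}-${match[3]}` : digits;
}

console.log(formatSsnLike("123456789")); // 123-45-6789
console.log(formatSsnLike("12345"));     // 12345
```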

Empty or header-only sample

Supported

A header-only CSV returns just the header; a sample with no PII-named columns returns with itemsRedacted = 0 and every cell preserved. No error — the zero count just tells you nothing matched, which is fine for a sample that legitimately has no PII.

Seed gives reproducibility, not a way back

By design

A seed lets you regenerate the identical anonymised sample later, but it is not a key and there's no mapping back to the real values. The operation is one-way — keep your original sample locally if you ever need the real data for your own debugging.

Sample larger than your tier cap

Rejected

Repro samples are usually tiny, but if you attach a big export it must fit the security-family limits: Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited (this tool needs at least Pro). Trim to a minimal repro — smaller samples are better for the maintainer and for privacy anyway.

Frequently asked questions

Is it safe to post the output in a public GitHub issue?

For PII in matched columns, yes — those cells become faker fakes before anything leaves your browser. But the tool does NOT scrub PII inside free-text columns or in columns with non-matching (compound) names, so eyeball the output and scrub free-text columns separately before you post. Once a real sample is public it's effectively permanent.

Will the anonymised sample still reproduce my bug?

Usually yes, because the columns that trigger bugs — IDs, malformed values, edge-case numbers, encodings, delimiters, timestamps — are not PII tokens, so they're preserved byte-for-byte. The only risk is if your bug-triggering value happens to sit in a PII-named column; in that case rename the column before scrambling.

What if the bug-causing value is in a column like `name`?

Then scrambling will replace it and the sample may stop reproducing. Rename that column to a non-PII header before scrambling so the value is preserved, or move the value into a clearly non-PII column for the repro.

Can it remove PII from a message or log column?

No — it fakes whole cells in name-matched columns and never scans cell contents. Run free-text columns through email-phone-scrubber first; it matches email, phone, SSN, credit-card (Luhn), IBAN (mod-97) and UK-NI patterns and emits fixed [REDACTED_*] tags, which is what you want before posting publicly.

My header is `email_address` — is it scrambled?

No. The regex expects bare tokens, so email_address, customer_name, home_phone are not matched and real PII would survive into your public sample. Rename to email, name, phone before scrambling.

Does the file get uploaded anywhere?

No. The live tool runs in your browser — PapaParse and faker are loaded client-side and the file is parsed and rewritten in the tab. Your original never leaves your machine; only the scrambled copy is downloaded for you to attach.

Can I regenerate the exact same sample later?

Yes — set a numeric seed. The tool calls faker.seed(n), so the same original sample plus the same seed produces a byte-identical anonymised file. Useful when a maintainer asks for the same attachment again in a long thread.

Does it accept Excel files?

No. Input is CSV (comma-delimited, first row = header) or JSON. Convert .xlsx/.ods to CSV first. Output mirrors input — CSV stays CSV, JSON stays pretty-printed JSON — downloaded as <name>-scrambled.<ext>.

What if my repro is literally invalid JSON?

Then JSON.parse throws and nothing is produced, because the parser rejects the document. Anonymise a valid version, or share the malformed bytes as a text/CSV attachment after manually replacing the PII. CSV input is more forgiving — PapaParse parses ragged rows rather than throwing.

Why is the fake SSN just nine digits?

Because ssn/tax_id columns are filled with faker.string.numeric(9) — nine random digits, no formatting. That's safe to publish; just note it won't match a NNN-NN-NNNN format if your repro depends on that. There's no SSN-format option.

What plan do I need?

This is a Pro-tier security tool, not on Free. Limits are Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited — far larger than any minimal repro sample, so size is rarely the issue.

What pairs well for safe public sharing?

Scrub free-text columns first with email-phone-scrubber. If you ever need to share the REAL sample privately instead, encrypt it with aes-256-encryptor (Web Crypto AES-GCM 256, PBKDF2) and pass the passphrase separately. Hash what you posted with multi-hash-fingerprinter so you can prove exactly which file you shared.

Privacy first

Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.

