How to make a safe data sample to attach in public
- Step 1: Cut down to a minimal repro first. Trim your file to the smallest sample that still triggers the bug: a handful of rows, only the columns involved. Smaller is better for both the maintainer and your privacy surface. The tool reads CSV (comma-delimited, first row = header) and JSON; it does not read .xlsx/.ods, so convert spreadsheets to CSV first.
- Step 2: Drop the sample onto the tool. PapaParse (CSV) or JSON.parse (JSON) runs in your browser tab; the file is never uploaded. The JSON path is taken when the filename ends in `.json`, otherwise it's parsed as CSV. If your JSON sample lost its extension, rename it or set `format: json` so it isn't mis-read as a single-column CSV.
- Step 3: Keep the bug-triggering columns named as they are. Only columns whose names match a PII token are changed. The values that usually cause bugs live in columns like id, amount, raw, payload, timestamp, none of which are PII tokens, so they pass through untouched and the repro still works. Don't rename those; only PII columns should match.
- Step 4: Set a seed if you'll need to resend the same sample. Leave seed blank for fresh fakes. Enter a number and the tool calls faker.seed(n) first, so the same original sample + same seed regenerates an identical anonymised file. This is useful when a thread runs long and the maintainer asks for the same attachment again.
- Step 5: Scramble and eyeball the output before posting. Every PII-named column is replaced with a faker value and the count is reported. Open the result and scan it: confirm names/emails are fake and that no PII slipped through in a column the regex didn't match (especially compound headers) or inside a free-text cell.
- Step 6: Attach the scrambled file, not the original. The result downloads as <name>-scrambled.<ext>. Attach or paste that into the issue, question, ticket, or dataset. The scramble is one-way; keep your original locally. Once a real sample is posted in public it's effectively permanent, so always post the scrambled copy.
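As a quick check on Step 6, the download name can be predicted from the input name. The sketch below is only an illustration of the <name>-scrambled.<ext> pattern; `scrambledName` is a hypothetical helper, and the handling of files without an extension is an assumption, not documented behaviour.

```typescript
// Hypothetical helper: derive "<name>-scrambled.<ext>" from an input filename.
function scrambledName(filename: string): string {
  const dot = filename.lastIndexOf(".");
  // Assumption: with no extension (or a leading-dot name) the suffix is simply appended.
  if (dot <= 0) return `${filename}-scrambled`;
  return `${filename.slice(0, dot)}-scrambled${filename.slice(dot)}`;
}

const csvOut = scrambledName("repro.csv");   // "repro-scrambled.csv"
const jsonOut = scrambledName("issue.json"); // "issue-scrambled.json"
```

If the file you are about to attach does not carry the -scrambled suffix, it is probably the original.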
What changes vs. what keeps your bug reproducible
Which columns are faked and which survive to trigger the issue. Detection is name-based against a fixed regex (lib/security/security-processor.ts).
| Column | Example header | After scrambling | Why it matters for the repro |
|---|---|---|---|
| Identifier | name, email, phone | Faked | Removed before the sample goes public |
| Location | address, city, zip | Faked | Removed; rarely the cause of a parsing bug anyway |
| Record key | id, order_id, uuid | Preserved | Not a PII token -> the row the maintainer needs to find survives |
| Edge-case value | amount, raw, payload | Preserved | The weird value that breaks the parser is untouched |
| Encoding / delimiter | (the bytes themselves) | Preserved | PapaParse round-trips structure; quoting and commas survive |
| Timestamp / status | created_at, status_code | Preserved | Temporal and state context the maintainer needs is intact |
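To make the name-based rule concrete, here is a minimal sketch of anchored header matching. The pattern is a hypothetical stand-in built from the tokens mentioned on this page; the real PII_FIELDS_REGEX in lib/security/security-processor.ts may list different tokens, and the case-insensitive flag and trimming are assumptions.

```typescript
// Hypothetical stand-in for PII_FIELDS_REGEX. The ^...$ anchors mean the whole
// header must equal a token, so "email_address" does not match "email".
const PII_HEADER_SKETCH =
  /^(name|first_name|last_name|email|phone|address|city|zip|ssn|tax_id)$/i;

function isPiiHeader(header: string): boolean {
  return PII_HEADER_SKETCH.test(header.trim());
}

// Split a header row into columns that would be faked and columns that survive.
function splitHeaders(headers: string[]): { faked: string[]; preserved: string[] } {
  const faked: string[] = [];
  const preserved: string[] = [];
  for (const h of headers) {
    (isPiiHeader(h) ? faked : preserved).push(h);
  }
  return { faked, preserved };
}

const plan = splitHeaders(["id", "name", "email", "amount", "email_address"]);
// plan.faked     -> ["name", "email"]
// plan.preserved -> ["id", "amount", "email_address"]
```

Running a check like this on your header row before scrambling shows at a glance which columns would pass through unchanged.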
The complete control surface
Every control this tool exposes, from lib/security/security-tool-schemas.ts. There is no redaction-style, field-list, or fake-format option.
| Control | Type / values | Default | What it actually does |
|---|---|---|---|
| seed | number (optional) | (blank) | Blank = fresh randomness each run. A number calls faker.seed(n) for identical output from the same input. This is determinism, not encryption, and it is not reversible |
| format | enum: auto / csv / json | auto | Server-safe auto treats a leading [ or { as JSON, else CSV. In-browser, the JSON path is selected by the .json extension. Force with csv / json |
| Field / column list | (not a control) | — | Fixed in code (PII_FIELDS_REGEX). Cannot be edited in the UI |
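The two format-selection behaviours in the table can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function names are hypothetical, and skipping leading whitespace before the sniff and matching the extension case-insensitively are guesses.

```typescript
type Format = "auto" | "csv" | "json";

// Server-safe "auto": look at the first character of the content.
// Assumption: leading whitespace is skipped before the check.
function sniffFormat(text: string): "csv" | "json" {
  const first = text.replace(/^\s+/, "").charAt(0);
  return first === "[" || first === "{" ? "json" : "csv";
}

// In-browser path: the .json extension decides unless the format is forced.
// Assumption: the extension check ignores case.
function browserFormat(filename: string, format: Format = "auto"): "csv" | "json" {
  if (format !== "auto") return format;
  return filename.toLowerCase().endsWith(".json") ? "json" : "csv";
}

const a = browserFormat("sample.txt");         // "csv": the mis-parse case
const b = browserFormat("sample.txt", "json"); // "json": forced
const c = sniffFormat('{"id": "ord_8821"}');   // "json"
```

The first call shows why a JSON snippet saved as sample.txt ends up on the CSV path unless the format is forced.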
Tier, formats, and size limits
Metadata from lib/security/security-tools-registry.ts and limits from lib/tier-limits.ts. Samples for bug reports are usually tiny, so limits rarely bite.
| Property | Value | Note |
|---|---|---|
| Minimum tier | Pro | minTier: "pro" — not on Free |
| Input formats | CSV, JSON | JSON via JSON.parse; no .xlsx/.ods |
| Output | Text (CSV or pretty-printed JSON) | <name>-scrambled.<ext> |
| Pro limits | 100 MB / 5 files | Security family — far larger than any repro sample |
| Pro-media / Developer | 500 MB / 50 · 2 GB / unlimited | Higher tiers |
| Multiple files | Accepted | acceptsMultiple: true — anonymise a few samples at once |
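If you want to sanity-check a batch against these caps before uploading, a rough pre-check might look like the sketch below. Only the numbers come from the table; the tier keys other than "pro", the binary megabyte, and treating the size cap as per-file are all assumptions.

```typescript
type Tier = "pro" | "pro-media" | "developer"; // keys other than "pro" are hypothetical

interface Limit {
  maxBytes: number;
  maxFiles: number;
}

const MB = 1024 * 1024; // assumption: binary megabytes

const LIMITS: Record<Tier, Limit> = {
  pro: { maxBytes: 100 * MB, maxFiles: 5 },
  "pro-media": { maxBytes: 500 * MB, maxFiles: 50 },
  developer: { maxBytes: 2048 * MB, maxFiles: Infinity },
};

// Assumption: the size cap applies to each file, not to the batch total.
function fitsTier(tier: Tier, fileSizes: number[]): boolean {
  const limit = LIMITS[tier];
  return (
    fileSizes.length <= limit.maxFiles &&
    fileSizes.every((size) => size <= limit.maxBytes)
  );
}

const tinyRepro = fitsTier("pro", [2048]);      // true: a 2 KB sample
const bigExport = fitsTier("pro", [150 * MB]);  // false: over 100 MB
```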
Cookbook
Before/after samples for public sharing. The PII columns change; the columns that reproduce the bug stay exactly the same. Faker values are illustrative — set a seed if you'll resend.
Repro sample for a CSV parser bug
The bug is triggered by the quote character embedded in the amount value on row 2 (written as a doubled "" inside the quoted field). Names/emails get faked, but the amount value (the actual cause) and the id are preserved, so the sample still breaks the parser.
Input (repro.csv):
id,name,email,amount
1,Sarah Chen,sarah.chen@acme.io,"1,200"
2,Tomás Reyes,treyes@globex.com,"3""500"
Output (repro-scrambled.csv):
id,name,email,amount
1,Dr. Elena Rosales,Reanna.Lockman@yahoo.com,"1,200"
2,Marcus Hettinger,Jaylin.Bode@gmail.com,"3""500"
Names/emails faked; the malformed amount that triggers the bug and the row ids are preserved exactly.
JSON sample for a GitHub issue
A nested record reproducing a serialization bug. name/email are faked; the id, the numeric edge value 0, and the empty-array tags (the actual repro) survive.
Input (issue.json):
{
"id": "ord_8821",
"customer": { "name": "Priya Nair", "email": "priya@shop.co" },
"qty": 0,
"tags": []
}
Output (issue-scrambled.json):
{
"id": "ord_8821",
"customer": { "name": "Dr. Elena Rosales", "email": "Reanna.Lockman@yahoo.com" },
"qty": 0,
"tags": []
}
Open dataset row sample
Publishing rows of an analytics dataset. Identifier columns are faked; the measured columns the dataset is actually about are preserved, so the published sample is both useful and safe.
Input (sample.csv):
name,email,page,duration_ms,bounced
Li Wei,li.wei@x.com,/pricing,4200,false
Output (sample-scrambled.csv):
name,email,page,duration_ms,bounced
Mavis Goldner,Lonnie_Cremin@hotmail.com,/pricing,4200,false
The analytic columns (page, duration_ms, bounced) survive; only the identifiers were faked.
Reproducible sample for a long thread
Set a seed so you can hand the maintainer the exact same anonymised file again later without re-leaking anything.
Input (case.csv):
case_id,first_name,last_name,email,status
9,Aisha,Khan,aisha.khan@corp.net,open
seed = 99
Output (every run with seed 99 is identical):
case_id,first_name,last_name,email,status
9,<fake>,<fake>,<fake>,open
case_id + status preserved -> the maintainer can match the row.
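Why a seed makes the output repeatable can be shown with any seeded generator. The sketch below does not use faker; it swaps in a small public-domain PRNG (mulberry32) as a stand-in, and `fakeValues` is a hypothetical helper that only illustrates the principle behind faker.seed(n).

```typescript
// mulberry32: a tiny seeded PRNG, used here as a stand-in for faker's generator.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: with a seed the sequence is fixed; without one it is fresh each run.
function fakeValues(count: number, seed?: number): string[] {
  const rand = seed === undefined ? Math.random : mulberry32(seed);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    out.push(`fake_${Math.floor(rand() * 1000000)}`);
  }
  return out;
}

const first = fakeValues(3, 99);
const again = fakeValues(3, 99); // same seed, same values, in the same order
```

Note that the seed fixes the sequence of fakes, so the input rows must also be the same for the two files to match.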
PII inside a message body still needs scrubbing
The tool fakes whole cells in matched columns; it does not scan free text. An email or phone inside a message column would be published as-is because message isn't a PII token. Scrub that column before posting.
Input (log.csv):
id,email,message
5,dana@x.com,"User wrote: call me at 415-555-0199"
Output (log-scrambled.csv):
id,email,message
5,Hilbert.Klein@gmail.com,"User wrote: call me at 415-555-0199"
The email column is faked; the phone inside message is NOT. Run message through email-phone-scrubber first so it becomes [REDACTED_PHONE] before you attach the file.
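For a sense of what pattern-based scrubbing of a free-text cell involves, here is a minimal sketch. It is not email-phone-scrubber's implementation: both patterns are simplified assumptions, and the [REDACTED_EMAIL] tag name is inferred from the [REDACTED_*] convention rather than confirmed.

```typescript
// Simplified, hypothetical patterns; real scrubbers handle many more formats.
const EMAIL_SKETCH = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_SKETCH = /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g;

// Replace matches inside the cell text with fixed tags, leaving the rest intact.
function scrubFreeText(cell: string): string {
  return cell
    .replace(EMAIL_SKETCH, "[REDACTED_EMAIL]")
    .replace(PHONE_SKETCH, "[REDACTED_PHONE]");
}

const scrubbed = scrubFreeText("User wrote: call me at 415-555-0199");
// scrubbed -> "User wrote: call me at [REDACTED_PHONE]"
```

Unlike the scrambler, this works on cell contents, which is why the two steps complement each other.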
Edge cases and what actually happens
PII inside a message / log / notes column
By design: The tool fakes whole cells in name-matched columns and never scans cell contents. An email, phone, or SSN written inside a free-text column (message, log, notes, description) would be published verbatim because the column name isn't a PII token. Scrub those columns first with email-phone-scrubber, which emits fixed [REDACTED_*] tags, before you attach anything in public.
Compound PII header survives into the public sample
Not matched: Headers like email_address, customer_name, or home_phone are not exact PII tokens, so the anchored regex leaves them, and the real PII they hold, in the file you post. Rename them to the bare token (email, name, phone) before scrambling, and eyeball the output before publishing.
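Renaming can be done by hand in any editor, or with a small pre-processing step like the sketch below. The mapping and the `renameHeaderRow` helper are hypothetical, and the code assumes a simple header row with no quoted commas and "\n" line endings.

```typescript
// Hypothetical mapping from compound headers to bare tokens; extend for your own data.
const BARE_TOKEN = new Map<string, string>([
  ["email_address", "email"],
  ["customer_name", "name"],
  ["home_phone", "phone"],
]);

// Rewrite only the first line of the CSV; data rows are left untouched.
function renameHeaderRow(csv: string): string {
  const newline = csv.indexOf("\n");
  const headerLine = newline === -1 ? csv : csv.slice(0, newline);
  const rest = newline === -1 ? "" : csv.slice(newline);
  const renamed = headerLine
    .split(",")
    .map((h) => BARE_TOKEN.get(h.trim()) ?? h)
    .join(",");
  return renamed + rest;
}

const fixed = renameHeaderRow("id,email_address,home_phone\n1,a@b.co,555");
// fixed -> "id,email,phone\n1,a@b.co,555"
```

If two compound headers would map to the same bare token, rename one of them manually so the header row stays unique.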
Bug-relevant value accidentally lives in a PII-named column
Heads-up: If the value that triggers your bug sits in a column the regex matches (say a malformed name), scrambling will replace it and your sample may stop reproducing. Rename that column to a non-PII header before scrambling so the bug-triggering value is preserved, or reproduce the bug with the value moved to a clearly non-PII column.
JSON sample without a .json extension parsed as CSV
Mis-parse: In-browser the JSON path runs only when the filename ends in .json. A JSON snippet saved as sample.txt is parsed by PapaParse as a one-column CSV and barely changes. Rename to .json, rely on the server-safe auto sniff of a leading [ or {, or set format: json.
Malformed JSON snippet
Error: If the very bug you're reporting is invalid JSON, JSON.parse will throw and produce nothing: you can't scramble a document the parser rejects. In that case anonymise a valid version, or share the malformed bytes as a CSV/text attachment after manually replacing the PII. CSV is more forgiving, and PapaParse parses ragged rows.
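You can test whether a snippet will be rejected before dropping it onto the tool. The sketch below wraps the standard JSON.parse in a try/catch; `tryParseJson` is a hypothetical helper name, not part of the tool.

```typescript
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// JSON.parse throws a SyntaxError on invalid input; capture it instead of crashing.
function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const valid = tryParseJson('{"qty": 0, "tags": []}'); // ok: true
const broken = tryParseJson('{"qty": 0, "tags": [');  // ok: false, with the parser's message
```

If the result is not ok, the scrambler has nothing to work with, so fall back to a valid variant or a manually edited text attachment.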
SSN sample is 9 plain digits
Expected: ssn / tax_id columns become faker.string.numeric(9), which is nine random digits with no dashes and no checksum. Safe to publish, but if your repro depends on a specifically formatted SSN, note that the fake won't match NNN-NN-NNNN; the tool has no SSN-format option.
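The practical consequence is easy to demonstrate. The sketch below uses Math.random as a stand-in for faker.string.numeric(9) (it is not the faker call itself) and checks the result against a dashed pattern that a repro might depend on.

```typescript
// Stand-in for faker.string.numeric(9): nine random digits, no separators.
function nineDigits(): string {
  let out = "";
  for (let i = 0; i < 9; i++) {
    out += Math.floor(Math.random() * 10).toString();
  }
  return out;
}

// The kind of format check a repro might rely on (NNN-NN-NNNN).
function looksLikeDashedSsn(value: string): boolean {
  return /^\d{3}-\d{2}-\d{4}$/.test(value);
}

const fake = nineDigits();
const stillMatches = looksLikeDashedSsn(fake); // false: the dashes are gone
```

If your bug only appears with the dashed form, say so in the report, since the scrambled value will not exercise that path.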
Empty or header-only sample
Supported: A header-only CSV returns just the header; a sample with no PII-named columns returns with itemsRedacted = 0 and every cell preserved. No error is raised; the zero count just tells you nothing matched, which is fine for a sample that legitimately has no PII.
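The zero-count behaviour falls out naturally from a column-by-column pass. The sketch below is an illustration, not the tool's processor: `scrambleRows` and its callbacks are hypothetical, and counting itemsRedacted once per replaced cell is an assumption.

```typescript
type Row = Record<string, string>;

// Replace cells in PII-named columns and count how many were replaced.
function scrambleRows(
  rows: Row[],
  isPii: (header: string) => boolean,
  fake: (header: string) => string
): { rows: Row[]; itemsRedacted: number } {
  let itemsRedacted = 0; // assumption: one count per replaced cell
  const out = rows.map((row) => {
    const copy: Row = {};
    for (const key of Object.keys(row)) {
      if (isPii(key)) {
        copy[key] = fake(key);
        itemsRedacted++;
      } else {
        copy[key] = row[key]; // non-PII cells pass through unchanged
      }
    }
    return copy;
  });
  return { rows: out, itemsRedacted };
}

const isPii = (h: string) => h === "name" || h === "email"; // hypothetical rule
const fake = (h: string) => `<fake ${h}>`;                  // hypothetical faker stand-in

const noPii = scrambleRows([{ id: "1", amount: "1,200" }], isPii, fake);
// noPii.itemsRedacted -> 0, rows unchanged
const headerOnly = scrambleRows([], isPii, fake);
// headerOnly.itemsRedacted -> 0, no rows
```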
Seed gives reproducibility, not a way back
By design: A seed lets you regenerate the identical anonymised sample later, but it is not a key and there's no mapping back to the real values. The operation is one-way, so keep your original sample locally if you ever need the real data for your own debugging.
Sample larger than your tier cap
Rejected: Repro samples are usually tiny, but if you attach a big export it must fit the security-family limits: Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited (this tool needs at least Pro). Trim to a minimal repro; smaller samples are better for the maintainer and for privacy anyway.
Frequently asked questions
Is it safe to post the output in a public GitHub issue?
For PII in matched columns, yes — those cells become faker fakes before anything leaves your browser. But the tool does NOT scrub PII inside free-text columns or in columns with non-matching (compound) names, so eyeball the output and scrub free-text columns separately before you post. Once a real sample is public it's effectively permanent.
Will the anonymised sample still reproduce my bug?
Usually yes, because the columns that trigger bugs — IDs, malformed values, edge-case numbers, encodings, delimiters, timestamps — are not PII tokens, so they're preserved byte-for-byte. The only risk is if your bug-triggering value happens to sit in a PII-named column; in that case rename the column before scrambling.
What if the bug-causing value is in a column like `name`?
Then scrambling will replace it and the sample may stop reproducing. Rename that column to a non-PII header before scrambling so the value is preserved, or move the value into a clearly non-PII column for the repro.
Can it remove PII from a message or log column?
No — it fakes whole cells in name-matched columns and never scans cell contents. Run free-text columns through email-phone-scrubber first; it matches email, phone, SSN, credit-card (Luhn), IBAN (mod-97) and UK-NI patterns and emits fixed [REDACTED_*] tags, which is what you want before posting publicly.
My header is `email_address` — is it scrambled?
No. The regex expects bare tokens, so email_address, customer_name, home_phone are not matched and real PII would survive into your public sample. Rename to email, name, phone before scrambling.
Does the file get uploaded anywhere?
No. The live tool runs in your browser — PapaParse and faker are loaded client-side and the file is parsed and rewritten in the tab. Your original never leaves your machine; only the scrambled copy is downloaded for you to attach.
Can I regenerate the exact same sample later?
Yes — set a numeric seed. The tool calls faker.seed(n), so the same original sample plus the same seed produces a byte-identical anonymised file. Useful when a maintainer asks for the same attachment again in a long thread.
Does it accept Excel files?
No. Input is CSV (comma-delimited, first row = header) or JSON. Convert .xlsx/.ods to CSV first. Output mirrors input — CSV stays CSV, JSON stays pretty-printed JSON — downloaded as <name>-scrambled.<ext>.
What if my repro is literally invalid JSON?
Then JSON.parse throws and nothing is produced, because the parser rejects the document. Anonymise a valid version, or share the malformed bytes as a text/CSV attachment after manually replacing the PII. CSV input is more forgiving — PapaParse parses ragged rows rather than throwing.
Why is the fake SSN just nine digits?
Because ssn/tax_id columns are filled with faker.string.numeric(9) — nine random digits, no formatting. That's safe to publish; just note it won't match a NNN-NN-NNNN format if your repro depends on that. There's no SSN-format option.
What plan do I need?
This is a Pro-tier security tool, not on Free. Limits are Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited — far larger than any minimal repro sample, so size is rarely the issue.
What pairs well for safe public sharing?
Scrub free-text columns first with email-phone-scrubber. If you ever need to share the REAL sample privately instead, encrypt it with aes-256-encryptor (Web Crypto AES-GCM 256, PBKDF2) and pass the passphrase separately. Hash what you posted with multi-hash-fingerprinter so you can prove exactly which file you shared.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.