How to scramble PII fields with realistic fake data
- Step 1: Export a real CSV or JSON from production. Pull the file you would otherwise copy into staging, such as a database `COPY ... TO CSV`, an admin-panel export, or a JSON API dump. The scrambler reads CSV (comma-delimited, first row = header) and JSON (object, array, or nested). It does not read .xlsx/.ods; convert those to CSV first.
- Step 2: Drop the file onto the tool. PapaParse (CSV) or `JSON.parse` (JSON) runs in your browser tab; the file is never sent to a server. In the browser the JSON path is taken when the filename ends in `.json`; otherwise the file is parsed as CSV. If your JSON file has a non-.json extension, rename it or set the `format` option so it isn't mis-parsed as a single-column CSV.
- Step 3: Set a seed if you need reproducible fixtures. Leave `seed` blank for fresh random fakes every run. Enter any number (e.g. 42) to call `faker.seed(42)` first; the same input + same seed yields identical fakes, so committed test fixtures and snapshot assertions stay stable across runs and machines.
- Step 4: Leave format on auto (or force it). `format` defaults to `auto`. On the server-safe path `auto` sniffs the first non-whitespace byte: a leading [ or { means JSON. Set it to `csv` or `json` only to override that detection. There is no field-list option: the PII column set is fixed in code and cannot be edited in the UI.
- Step 5: Scramble. Every column / key whose name matches the PII regex is overwritten with a faker value of the matching kind; the running count of replaced fields is reported. Columns whose names don't match (and, for JSON, values that aren't strings or numbers) are passed through unchanged.
- Step 6: Download and use in staging. The result downloads as <original-name>-scrambled.<ext> (e.g. customers.csv -> customers-scrambled.csv). Deploy it to staging, CI fixtures, or a demo seed script. Keep your original file out of the repo; the scramble is one-way and there is no reverse mapping.
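The two detection rules described in the steps can be sketched as follows. This is a minimal illustration, not the tool's source: the function names `resolveFormat` and `resolveFormatFromFilename` are hypothetical, and treating the `.json` extension case-insensitively is an assumption.

```typescript
type Format = "auto" | "csv" | "json";
type Resolved = "csv" | "json";

// Server-safe style: an explicit format wins; otherwise sniff the first
// non-whitespace character of the text.
function resolveFormat(text: string, format: Format = "auto"): Resolved {
  if (format !== "auto") return format;
  const first = text.replace(/^\s+/, "").charAt(0);
  return first === "[" || first === "{" ? "json" : "csv";
}

// Browser style: only the filename extension decides.
// Assumption: the extension check ignores case.
function resolveFormatFromFilename(filename: string): Resolved {
  return /\.json$/i.test(filename) ? "json" : "csv";
}
```

Under these rules a JSON payload saved as `data.txt` resolves to CSV by filename but to JSON by content sniffing, which is why the explicit `format` override exists.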
PII field names and the fake each produces
Detection is name-based against a fixed regex anchored to the whole column / key name (case-insensitive). - and _ separators are accepted where shown (e.g. first_name, first-name, firstname). A header named email_address, customer, or notes does NOT match — only the exact tokens below.
| Matched column / key name | faker call used | Example output | Notes |
|---|---|---|---|
| email, e-mail, e_mail | faker.internet.email() | Reanna.Lockman@yahoo.com | Matches the email-containing branch first; a plain email header is the common case |
| first_name / first-name / firstname | faker.person.firstName() | Marcus | The first branch wins before the generic name branch |
| last_name / last-name / lastname | faker.person.lastName() | Hettinger | The last branch wins before generic name |
| name, full_name, fullname | faker.person.fullName() | Dr. Elena Rosales | Generic full-name fake; can include a title/suffix |
| phone, mobile, telephone | faker.phone.number() | (555) 123-4567 | Format follows faker's locale-default phone format |
| street, address | faker.location.streetAddress() | 742 Evergreen Terrace | Street line only; no city/state appended |
| city | faker.location.city() | East Garfield | City name only |
| zip, postal | faker.location.zipCode() | 90210 | Postal-code shape per faker locale default |
| ssn, tax_id, tax-id | faker.string.numeric(9) | 830174265 | Plain 9 random digits, NOT formatted as NNN-NN-NNNN |
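To make the whole-name matching concrete, here is a rough reconstruction of the table as ordered rules. The real `PII_FIELDS_REGEX` is not shown in this document, so the patterns, the `PiiKind` labels, and the `classify` helper are all assumptions; in particular this sketch accepts an optional `-`/`_` separator everywhere, which may be slightly more permissive than the actual regex.

```typescript
type PiiKind =
  | "email" | "firstName" | "lastName" | "fullName"
  | "phone" | "street" | "city" | "zip" | "ssn";

// Ordered rules: specific name branches come before the generic one.
// Every pattern is anchored with ^...$ so compound headers do not match.
const RULES: Array<[RegExp, PiiKind]> = [
  [/^e[-_]?mail$/i, "email"],
  [/^first[-_]?name$/i, "firstName"],
  [/^last[-_]?name$/i, "lastName"],
  [/^(full[-_]?)?name$/i, "fullName"],
  [/^(phone|mobile|telephone)$/i, "phone"],
  [/^(street|address)$/i, "street"],
  [/^city$/i, "city"],
  [/^(zip|postal)$/i, "zip"],
  [/^(ssn|tax[-_]?id)$/i, "ssn"],
];

// Returns the kind of fake to generate, or null when the name is not PII.
function classify(name: string): PiiKind | null {
  for (const [pattern, kind] of RULES) {
    if (pattern.test(name)) return kind;
  }
  return null;
}
```

With these rules `classify("first-name")` yields `"firstName"`, while `classify("email_address")` and `classify("notes")` yield `null`, mirroring the behaviour described above.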
Options, real behaviour, and what is NOT configurable
The complete control surface for this tool, from lib/security/security-tool-schemas.ts. Anything not listed here does not exist as a UI control.
| Control | Type / values | Default | What it actually does |
|---|---|---|---|
| seed | number (optional) | (blank) | Blank = fresh randomness each run. A number calls faker.seed(n) so the same input + seed gives identical fakes. It is a determinism control, NOT encryption and NOT a reversible mapping |
| format | enum: auto / csv / json | auto | On the server-safe path, auto treats text starting with [ or { as JSON; otherwise CSV. In-browser, the JSON path is chosen by the .json filename extension. Set csv/json to force it |
| Field / column list | (not a control) | — | Fixed in code (PII_FIELDS_REGEX). You cannot add, remove, or rename which columns are scrambled from the UI |
| Mask / replacement style | (not a control) | — | Replacements are faker fakes, not [REDACTED] masks. To get fixed [REDACTED_*] tags on free-text instead, use email-phone-scrubber |
Where it runs, what it accepts, and the size limits
Tool metadata from lib/security/security-tools-registry.ts and family limits from lib/tier-limits.ts. This is a server-safe security tool that also runs in the browser; the live tool is browser-side, so the original file is never transmitted.
| Property | Value | Source / note |
|---|---|---|
| Minimum tier | Pro | minTier: "pro" — this tool is not on the Free plan |
| Input formats | CSV, JSON | inputType: "csv"; JSON handled via JSON.parse. No .xlsx / .ods (convert to CSV first) |
| Output | Text (CSV stays CSV, JSON stays pretty-printed JSON) | outputType: "text"; filename becomes <name>-scrambled.<ext> |
| Multiple files | Accepted | acceptsMultiple: true — within the per-batch file count for your tier |
| Pro limits | 100 MB / 5 files | Security family, Pro tier |
| Pro-media limits | 500 MB / 50 files | Security family, Pro-media tier |
| Developer limits | 2 GB / unlimited files | Security family, Developer tier |
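The limits and the output naming rule in the table could be checked ahead of time with something like the sketch below. The names `SECURITY_LIMITS`, `checkBatch`, and `scrambledName` are hypothetical, and two points are assumptions rather than documented facts: that the size cap applies per file, and that 1 MB means 1024 * 1024 bytes.

```typescript
type Tier = "pro" | "pro-media" | "developer";

interface TierLimit {
  maxBytes: number;
  maxFiles: number;
}

const MB = 1024 * 1024; // assumption: binary megabytes

const SECURITY_LIMITS: Record<Tier, TierLimit> = {
  pro: { maxBytes: 100 * MB, maxFiles: 5 },
  "pro-media": { maxBytes: 500 * MB, maxFiles: 50 },
  developer: { maxBytes: 2048 * MB, maxFiles: Infinity },
};

// Returns null when the batch fits, otherwise a short reason.
// Assumption: the size cap is evaluated per file, not per batch.
function checkBatch(tier: Tier, fileSizes: number[]): string | null {
  const limit = SECURITY_LIMITS[tier];
  if (fileSizes.length > limit.maxFiles) {
    return `too many files: ${fileSizes.length} > ${limit.maxFiles}`;
  }
  for (const size of fileSizes) {
    if (size > limit.maxBytes) return `file too large: ${size} bytes`;
  }
  return null;
}

// customers.csv -> customers-scrambled.csv
// Assumption: a name without an extension just gets the suffix appended.
function scrambledName(filename: string): string {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return `${filename}-scrambled`;
  return `${filename.slice(0, dot)}-scrambled${filename.slice(dot)}`;
}
```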
Cookbook
Real before/after files. Watch which columns change (PII) and which are preserved exactly (everything else). Values shown are illustrative faker output — yours will differ unless you set a seed.
Customer CSV scrambled for staging
A typical customer export. name, email, phone, and city match the PII regex and are replaced; id, signup_date, plan, and mrr are not PII column names and pass through untouched — so your billing logic still sees real-shaped numbers and dates.
Input (customers.csv):
id,name,email,phone,city,signup_date,plan,mrr
1001,Sarah Chen,sarah.chen@acme.io,+1-415-555-0182,Oakland,2025-11-03,pro,49
1002,Tomás Reyes,treyes@globex.com,+1-312-555-0144,Chicago,2026-01-17,team,149

Output (customers-scrambled.csv):
id,name,email,phone,city,signup_date,plan,mrr
1001,Dr. Elena Rosales,Reanna.Lockman@yahoo.com,(555) 123-4567,East Garfield,2025-11-03,pro,49
1002,Marcus Hettinger,Jaylin.Bode@gmail.com,(555) 987-6543,Lake Verda,2026-01-17,team,149

PII columns rewritten; id / signup_date / plan / mrr preserved exactly.
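The column-level behaviour in this example can be approximated in a few lines. This is only a sketch under loud assumptions: it uses a naive comma split (no quoted fields) where the real tool uses PapaParse, a small lookup table where the real tool uses a regex, and fixed placeholder strings (`FAKES`, hypothetical) where the real tool calls faker.

```typescript
// Hypothetical stand-ins for faker calls; fixed strings keep this dependency-free.
const FAKES: Record<string, () => string> = {
  name: () => "Dr. Elena Rosales",
  email: () => "Reanna.Lockman@yahoo.com",
  phone: () => "(555) 123-4567",
  city: () => "East Garfield",
};

// Replaces whole cells in columns whose header is a known PII name and
// counts the replacements. Assumption: no quoted commas in the input.
function scrambleCsv(csv: string): { output: string; replaced: number } {
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return { output: "", replaced: 0 };

  const headerLine = lines[0];
  const makers = headerLine.split(",").map((h) => FAKES[h.trim().toLowerCase()]);

  let replaced = 0;
  const rows = lines.slice(1).map((line) =>
    line
      .split(",")
      .map((cell, i) => {
        const make = makers[i];
        if (!make) return cell; // non-PII column: pass through unchanged
        replaced++;
        return make();
      })
      .join(",")
  );
  return { output: [headerLine, ...rows].join("\n"), replaced };
}
```

The returned `replaced` count plays the role of the tool's replaced-field count: a value of 0 means no header matched.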
Reproducible fixture with a seed
For committed test fixtures you want the same fakes every run so snapshot diffs stay clean. Set seed to any number; identical input + identical seed = identical output. Re-running without the seed (or with a different one) produces different fakes.
Input (users.csv):
user_id,first_name,last_name,email
7,Aisha,Khan,aisha.khan@corp.net
8,Lo,Park,lo.park@corp.net

seed = 42

Output (run #1, seed 42):
user_id,first_name,last_name,email
7,Brent,Schiller,Garnet_Wuckert@hotmail.com
8,Maybell,Kihn,Bridie.Hahn@yahoo.com

Output (run #2, seed 42): IDENTICAL to run #1.
Output (no seed): different first_name/last_name/email each run.
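Why a seed gives repeatable output can be illustrated without faker at all. The sketch below swaps in a small seeded generator (a mulberry32-style PRNG) as a stand-in for `faker.seed(n)`; the `seededRandom` and `fakeFirstNames` helpers and the name list are hypothetical, and faker's own generator and output values will differ.

```typescript
// Small deterministic PRNG: the same seed always yields the same sequence.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ["Brent", "Maybell", "Marcus", "Elena", "Jaylin"];

// With a seed the picks are reproducible; without one they use Math.random.
function fakeFirstNames(count: number, seed?: number): string[] {
  const rand = seed === undefined ? Math.random : seededRandom(seed);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    out.push(FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)]);
  }
  return out;
}
```

Calling `fakeFirstNames(3, 42)` twice returns the same three names in the same order, which is the property committed fixtures rely on. The seed only fixes the sequence of fakes; it carries no information about the original values.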
Nested JSON dump — only matching keys change
JSON is walked recursively. A key whose name matches the PII regex AND whose value is a string or number gets replaced. Non-matching keys are walked into; arrays and nested objects are traversed. Note email_address does NOT match (only email), so it survives — rename it to email upstream if you want it scrambled.
Input (order.json):
{
"order_id": "ord_8821",
"total": 129.99,
"customer": {
"name": "Priya Nair",
"email": "priya@shop.co",
"email_address": "priya.alt@shop.co",
"phone": "+44 20 7946 0958"
},
"items": [{ "sku": "A-12", "qty": 2 }]
}
Output (order-scrambled.json):
{
"order_id": "ord_8821",
"total": 129.99,
"customer": {
"name": "Dr. Elena Rosales",
"email": "Reanna.Lockman@yahoo.com",
"email_address": "priya.alt@shop.co",
"phone": "(555) 123-4567"
},
"items": [{ "sku": "A-12", "qty": 2 }]
}
Header that looks like PII but isn't matched
Detection is anchored to the whole column name. Common real-world headers like email_address, customer_name, home_phone, or mailing_address do NOT match the regex (which expects the bare tokens email, name, phone, address, etc.). If those slip through unscrambled, rename the header to the matched token first, or run the file through a different tool.
Input (contacts.csv):
customer_name,email_address,home_phone,city
Li Wei,li.wei@x.com,+1-206-555-0117,Seattle

Output (contacts-scrambled.csv):
customer_name,email_address,home_phone,city
Li Wei,li.wei@x.com,+1-206-555-0117,Lake Verda

Only 'city' matched -> only 'city' changed. The other three headers are not exact PII tokens, so the real PII survived. Fix: rename headers to name / email / phone before scrambling.
Free-text column with PII embedded in the value
This tool replaces whole cells in matched columns; it does NOT scan inside cell values. A notes column containing an email or SSN in prose is not touched, because notes isn't a PII column name and the tool doesn't pattern-match cell contents. For value-level redaction of free text use the scrubber instead.
Input (tickets.csv):
ticket_id,email,notes
T-9,dana@x.com,"Call back at 415-555-0199 re: card 4111 1111 1111 1111"

Output (tickets-scrambled.csv):
ticket_id,email,notes
T-9,Hilbert.Klein@gmail.com,"Call back at 415-555-0199 re: card 4111 1111 1111 1111"

The 'email' column was scrambled; the phone + card number inside 'notes' were NOT (this tool is column-name based). For that, run notes through email-phone-scrubber, which emits fixed [REDACTED_PHONE] / [REDACTED_CARD] tags.
Edge cases and what actually happens
JSON file without a .json extension parsed as CSV (browser)
Mis-parse: On the in-browser path the JSON branch is taken only when the filename ends in .json. A JSON payload saved as data.txt or export (no extension) is handed to PapaParse, which treats it as a one-column CSV and scrambles nothing useful. Rename the file to .json, or rely on the server-safe auto detection (which sniffs a leading [/{). Setting format: json forces the JSON path.
Header is `email_address` / `customer_name` / `home_phone`
Not matched: The PII regex is anchored to the whole column name and expects the bare tokens (email, name, phone, ...). Compound headers like email_address, customer_name, home_phone, or mailing_address do not match, so the real PII passes through unchanged. Rename the column to the matched token before scrambling, or you will ship live data to staging.
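One way to catch this before shipping a file is a pre-flight check that flags headers which look like PII but will not match the anchored rule. The tool itself offers no such check; `suspiciousHeaders` and both patterns below are hypothetical, and the exact-match pattern is only a guess at the real regex.

```typescript
// Assumed approximation of the anchored, whole-name rule.
const EXACT_PII =
  /^(e[-_]?mail|(first|last|full)[-_]?name|name|phone|mobile|telephone|street|address|city|zip|postal|ssn|tax[-_]?id)$/i;

// Loose substring check: "this header probably holds PII".
const LOOKS_LIKE_PII = /(mail|name|phone|mobile|address|street|city|zip|postal|ssn|tax)/i;

// Headers that contain a PII-ish word but would be skipped by the scrambler.
function suspiciousHeaders(headers: string[]): string[] {
  return headers.filter((h) => !EXACT_PII.test(h) && LOOKS_LIKE_PII.test(h));
}
```

For the headers `customer_name, email_address, home_phone, city, id` this returns the first three, which are exactly the columns that need renaming before the scramble.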
PII sitting inside a free-text / notes value
By design: The tool replaces entire cells in matched columns; it never scans cell contents. An SSN, card number, or email written inside a notes, comments, or description column is left intact because the column name isn't a PII token. For value-level redaction of free text, use email-phone-scrubber, which matches email / phone / SSN / card (Luhn) / IBAN (mod-97) / UK-NI patterns and emits fixed [REDACTED_*] tags.
Malformed JSON
Error: JSON input goes through JSON.parse. A trailing comma, unquoted key, single quotes, or a truncated dump throws a parse error and nothing is produced. Validate the JSON (or repair the export) before scrambling. CSV is more forgiving: PapaParse will parse ragged rows rather than throw.
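A quick validation pass before scrambling might look like the sketch below. The `tryParseJson` helper and its result shape are hypothetical; the only real API involved is `JSON.parse`, which throws a `SyntaxError` on malformed input.

```typescript
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Wraps JSON.parse so a malformed export is reported instead of thrown.
function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

A trailing comma (`{"a": 1,}`), single quotes (`{'a': 1}`), or a truncated dump (`{"a": [1, 2`) each come back with `ok: false`, so the export can be repaired before it reaches the scrambler.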
JSON value is an object/array under a PII key
Preserved: In the JSON walk, a matched key is replaced only when its value is a string or number. If a key named address holds a nested object (e.g. {"line1":..., "city":...}), the parent isn't overwritten; instead the walker recurses into it, and a nested city key would then be scrambled on its own. Structured address objects therefore need PII tokens at the leaf level to be caught.
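The recursion rule can be sketched as a small walker. This is an illustration rather than the tool's code: `scrambleJson`, the trimmed-down `PII_KEY` pattern, and the injected `fake` callback (standing in for the faker calls) are all assumptions.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Trimmed-down, hypothetical stand-in for the real PII regex.
const PII_KEY = /^(email|name|phone|city|address)$/i;

// Replaces a matched key only when its value is a string or number;
// objects and arrays under any key are walked into instead.
function scrambleJson(
  value: Json,
  fake: (key: string) => string,
  counter: { replaced: number }
): Json {
  if (Array.isArray(value)) {
    return value.map((item) => scrambleJson(item, fake, counter));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      const isLeaf = typeof child === "string" || typeof child === "number";
      if (isLeaf && PII_KEY.test(key)) {
        out[key] = fake(key);
        counter.replaced++;
      } else {
        out[key] = scrambleJson(child, fake, counter);
      }
    }
    return out;
  }
  return value; // primitives under non-PII keys pass through
}
```

Given `{ "address": { "line1": "1 Main St", "city": "Seattle" } }`, the `address` object is kept, `line1` survives because it is not a PII token, and only the nested `city` leaf is replaced.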
SSN output is 9 plain digits, not NNN-NN-NNNN
Expected: ssn / tax_id columns are filled with faker.string.numeric(9), that is, nine random digits with no dashes. If your staging code validates the dashed NNN-NN-NNNN format, the fake will fail that validation. This is the actual behaviour; the tool has no SSN-format option.
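If staging does require the dashed shape, the nine digits can be reformatted after scrambling. The `formatSsn` helper below is a hypothetical post-processing step, not part of the tool.

```typescript
// Turns nine plain digits into NNN-NN-NNNN; rejects anything else.
function formatSsn(digits: string): string {
  if (!/^\d{9}$/.test(digits)) {
    throw new Error(`expected exactly 9 digits, got "${digits}"`);
  }
  return `${digits.slice(0, 3)}-${digits.slice(3, 5)}-${digits.slice(5)}`;
}
```

For example, `formatSsn("830174265")` returns `"830-17-4265"`.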
Same email appears in multiple rows / referential integrity
Not preserved: Each matched cell gets an independent faker value, so a real email that recurs across many rows (or that is a join key between two files) becomes a different fake in each row. The scramble is per-cell, not per-value, and there is no consistent mapping. If you need the same real value to map to the same fake everywhere, this tool will not give you referential integrity across rows or files.
Seed gives determinism, not reversibility
By design: A seed makes faker produce the same sequence of fakes for the same input, which is useful for stable fixtures. It is not a key and creates no lookup table back to the originals. The operation is one-way: keep your original file separately if you ever need the real values.
Empty file, header-only CSV, or no PII columns
Supported: A CSV with headers but no PII-named columns parses fine and comes back with zero replacements, with every cell preserved. A header-only file returns just the header. There is no error; the replaced-field count simply reads 0, which is your signal that no column names matched.
File exceeds your tier's size or count limit
Rejected: Security-family limits apply: Pro 100 MB / 5 files, Pro-media 500 MB / 50 files, Developer 2 GB / unlimited. This tool requires at least the Pro plan to begin with. A file over your cap, or a batch with too many files, is rejected before processing; split large exports or upgrade the tier.
Frequently asked questions
How does it decide which columns to scramble?
By the column / key NAME, matched case-insensitively against a fixed regex anchored to the whole name. The accepted tokens are email (and e-mail/e_mail), name, first_name, last_name, full_name, phone, telephone, mobile, address, street, city, zip, postal, ssn, and tax_id (separators -/_ allowed). It does not look at cell contents — a header named notes that happens to contain an email is not scrambled.
Can I customise or add to the field list?
No. Despite older copy that suggested otherwise, the PII field set is fixed in code and there is no UI control to add, remove, or rename which columns are scrambled. If your column is named something like email_address, rename it to a matched token (email) before scrambling, or use a value-level tool for the parts this one won't touch.
What does each PII type get replaced with?
Type-appropriate @faker-js/faker values: emails from faker.internet.email(), full names from faker.person.fullName() (first/last names from the dedicated person calls), phones from faker.phone.number(), streets/addresses from faker.location.streetAddress(), cities from faker.location.city(), zip/postal from faker.location.zipCode(), and SSN/tax-id from faker.string.numeric(9) (nine plain digits).
Is the scramble reversible?
No. There is no key, no mapping table, and no way to recover the originals from the output — even with the seed. The seed only makes the random fakes reproducible. Always keep your original file stored separately; treat the scrambled file as a one-way derivative.
What is the seed for?
Reproducibility. Leave it blank for fresh randomness on every run. Enter a number and the tool calls faker.seed(n) first, so the same input file plus the same seed yields byte-identical fakes — exactly what committed test fixtures and snapshot assertions need to stay stable across runs, machines, and CI.
Does the same real email become the same fake everywhere?
No. Replacement is per-cell, so a real value that repeats across rows (or is a join key between files) becomes a different fake each time. There is no referential-integrity mode. If you need consistent mapping across rows or files, this tool is not the right fit.
Does it handle JSON, and how is JSON detected?
Yes. JSON is walked recursively and any matched key with a string/number value is replaced; structure, nesting, and arrays are preserved, and output is pretty-printed JSON. In the browser the JSON path is chosen when the filename ends in .json; on the server-safe path format: auto also sniffs a leading [ or {. Set format to json to force it.
Can it scrub PII that's embedded inside free-text cells?
No — it operates on whole cells in name-matched columns and never scans cell contents. For email, phone, SSN, credit-card (Luhn-checked), IBAN (mod-97), and UK-NI patterns sitting inside free text, use email-phone-scrubber, which emits fixed [REDACTED_*] tags. The two tools are complementary: scrambler for structured PII columns, scrubber for free-text values.
Does my real data get uploaded?
No. The live tool runs in your browser — PapaParse and faker are loaded client-side and the file is parsed and rewritten in the tab. Your original PII never leaves the machine, which is the point for GDPR / CCPA: the scrambled copy that does travel to staging contains only fakes.
Does it accept Excel files (.xlsx / .ods)?
No. Input is CSV (comma-delimited, first row = header) or JSON. Convert spreadsheets to CSV first. The output mirrors the input: CSV in -> CSV out, JSON in -> pretty-printed JSON out, downloaded as <name>-scrambled.<ext>.
What plan and file sizes do I need?
This is a Pro-tier security tool, so it is not on the Free plan. Security-family size limits are Pro 100 MB / 5 files, Pro-media 500 MB / 50 files, and Developer 2 GB / unlimited files. Split very large exports or step up a tier if you hit the cap.
What else pairs well with this in a data-handling pipeline?
Run email-phone-scrubber on free-text columns the scrambler leaves alone. If you must move the REAL file securely instead of faking it, encrypt it with aes-256-encryptor (Web Crypto AES-GCM 256, PBKDF2 key derivation). To prove a fixture hasn't been tampered with between runs, fingerprint it with multi-hash-fingerprinter, and use entropy-analyzer to spot-check whether a file is already encrypted or compressed.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.