How to strip real PII from customer files you hand to vendors
- Step 1: Export only the columns the vendor actually needs. Pull the customer CSV or JSON from your database or admin panel. Trim columns the vendor doesn't need before scrambling; fewer columns means less surface area. The tool reads CSV (comma-delimited, first row = header) and JSON (object, array, or nested); it does not read .xlsx/.ods, so convert spreadsheets to CSV first.
- Step 2: Drop the file onto the tool. PapaParse (CSV) or JSON.parse (JSON) runs in your browser tab, so the file is never sent to a server. In the browser the JSON path is taken when the filename ends in `.json`; otherwise the file is parsed as CSV. Rename a JSON export that lost its extension, or set format: json, so it isn't mis-read as a one-column CSV.
- Step 3: Check your PII headers match the recognised tokens. Detection is anchored to the whole column name. A header literally named email, phone, name, address, city, zip, ssn, etc. is replaced. Compound headers like email_address, customer_name, home_phone or mailing_address do NOT match; rename them to the bare token before scrambling, or you will ship real PII to the vendor.
- Step 4: Set a seed only if you need reproducibility. Leave seed blank for fresh random fakes. Enter a number (e.g. 42) and the tool calls faker.seed(42) first, so the same original file + same seed produces an identical scrambled file. This is handy when a vendor references a specific row and you need to regenerate the exact same output to investigate.
- Step 5: Scramble and confirm the replaced-field count. Every column / key whose name matches the PII regex is overwritten with a faker value; the tool reports how many fields were replaced (itemsRedacted). If that count looks too low for the file, a PII header probably didn't match the token list; fix the header and re-run before sending.
- Step 6: Send the scrambled file, keep the original behind your perimeter. The result downloads as <original-name>-scrambled.<ext> (e.g. customers.csv -> customers-scrambled.csv). Attach that to the vendor. The scramble is one-way with no reverse mapping, so keep your real file stored securely inside your own environment and never send both.
What the vendor receives vs. what you keep
Side-by-side of which column types are faked before sharing and which pass through untouched. Detection is name-based against a fixed regex (lib/security/security-processor.ts PII_FIELDS_REGEX).
| Column type | Example headers | In the shared file | Why |
|---|---|---|---|
| Direct identifiers | name, first_name, last_name, email, phone | Replaced with faker fakes | Header matches a PII token -> overwritten with a type-appropriate fake |
| Location PII | address, street, city, zip, postal | Replaced with faker fakes | Matches the location branch -> street/city/zip fakes |
| Government IDs | ssn, tax_id, tax-id | Replaced with 9 random digits | faker.string.numeric(9) — NOT a real or validating SSN |
| Business metrics | mrr, plan, churn, seats | Preserved exactly | Not PII tokens -> passed through, so the vendor's analysis still works |
| Keys & timestamps | customer_id, created_at, region | Preserved exactly | Not PII tokens -> joins, cohorts and dates survive |
| Compound PII headers | email_address, customer_name, home_phone | Preserved (NOT matched) | Anchored regex expects bare tokens -> rename before scrambling |
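The whole-name anchoring in the table can be sketched as below. The pattern is an assumption rebuilt from the tokens this page lists; the real PII_FIELDS_REGEX in lib/security/security-processor.ts may differ, for example in case handling or in extra tokens. The function name isPiiHeader is hypothetical.

```typescript
// Hypothetical reconstruction of an anchored header check.
// NOT the tool's actual PII_FIELDS_REGEX; token list taken from this page,
// case-insensitive matching is an assumption.
const ASSUMED_PII_HEADER =
  /^(email|name|first_name|last_name|full_name|phone|telephone|mobile|address|street|city|zip|postal|ssn|tax[_-]id)$/i;

// True only when the WHOLE header is one of the bare tokens.
function isPiiHeader(header: string): boolean {
  return ASSUMED_PII_HEADER.test(header);
}

const headers = ["customer_id", "name", "email_address", "city", "mrr", "tax-id"];
for (const h of headers) {
  console.log(h, isPiiHeader(h) ? "-> replaced" : "-> preserved");
}
```

Because of the ^ and $ anchors, email_address fails the test even though it contains the token email, which is the behaviour the last table row warns about.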
The complete control surface
Every control this tool exposes, from lib/security/security-tool-schemas.ts. Anything not listed does not exist as a UI option.
| Control | Type / values | Default | What it actually does |
|---|---|---|---|
| seed | number (optional) | (blank) | Blank = fresh randomness each run. A number calls faker.seed(n) so the same input + seed gives identical fakes. It is determinism, NOT encryption and NOT a reversible mapping |
| format | enum: auto / csv / json | auto | On the server-safe path auto treats text starting with [ or { as JSON, else CSV. In-browser the JSON path is chosen by the .json extension. Set csv / json to force it |
| Field / column list | (not a control) | — | Fixed in code (PII_FIELDS_REGEX). You cannot add, remove, or rename which columns are scrambled from the UI |
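The two detection paths described for the format control can be sketched as follows. Both function names are hypothetical, and trimming leading whitespace and matching the extension case-insensitively are assumptions rather than documented behaviour.

```typescript
type Format = "auto" | "csv" | "json";

// Server-safe path as described in the table: auto sniffs the first character.
// Assumption: leading whitespace is ignored before sniffing.
function resolveFormatFromText(text: string, format: Format = "auto"): "csv" | "json" {
  if (format !== "auto") return format; // an explicit csv / json always wins
  const first = text.trim().charAt(0);
  return first === "[" || first === "{" ? "json" : "csv";
}

// In-browser path as described in the table: auto goes by the .json extension.
// Assumption: the extension check is case-insensitive.
function resolveFormatFromName(filename: string, format: Format = "auto"): "csv" | "json" {
  if (format !== "auto") return format;
  return /\.json$/i.test(filename) ? "json" : "csv";
}

console.log(resolveFormatFromName("export.txt"));         // csv
console.log(resolveFormatFromName("export.txt", "json")); // json
```

The second call shows why forcing format: json rescues a JSON export that lost its extension.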
Where it runs, what it accepts, and the size limits
Tool metadata from lib/security/security-tools-registry.ts and family limits from lib/tier-limits.ts. This tool is server-safe but the live tool runs in the browser, so the original file is never transmitted.
| Property | Value | Source / note |
|---|---|---|
| Minimum tier | Pro | minTier: "pro" — not on the Free plan |
| Input formats | CSV, JSON | inputType CSV; JSON via JSON.parse. No .xlsx / .ods |
| Output | Text — CSV stays CSV, JSON stays pretty-printed JSON | Downloads as <name>-scrambled.<ext> |
| Multiple files | Accepted | acceptsMultiple: true, within your tier's file count |
| Pro limits | 100 MB / 5 files | Security family, Pro tier |
| Pro-media / Developer | 500 MB / 50 files · 2 GB / unlimited | Security family, higher tiers |
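A pre-flight check against the limits in the table might look like the sketch below. The tier identifiers other than "pro", the reading of the size cap as per file, and 1 MB = 1024 * 1024 bytes are all assumptions; lib/tier-limits.ts is the authority.

```typescript
// Hypothetical tier identifiers; only "pro" is named on this page (minTier: "pro").
type Tier = "pro" | "pro-media" | "developer";

interface FamilyLimit {
  maxBytesPerFile: number;
  maxFiles: number; // Infinity stands for "unlimited"
}

// Assumption: binary megabytes, and the size cap applies to each file.
const MB = 1024 * 1024;

const SECURITY_LIMITS: Record<Tier, FamilyLimit> = {
  "pro": { maxBytesPerFile: 100 * MB, maxFiles: 5 },
  "pro-media": { maxBytesPerFile: 500 * MB, maxFiles: 50 },
  "developer": { maxBytesPerFile: 2048 * MB, maxFiles: Infinity },
};

// Returns a list of human-readable problems; an empty list means the batch fits.
function checkBatch(tier: Tier, fileSizesInBytes: number[]): string[] {
  const limit = SECURITY_LIMITS[tier];
  const problems: string[] = [];
  if (fileSizesInBytes.length > limit.maxFiles) {
    problems.push(`too many files: ${fileSizesInBytes.length} > ${limit.maxFiles}`);
  }
  fileSizesInBytes.forEach((size, index) => {
    if (size > limit.maxBytesPerFile) {
      problems.push(`file #${index + 1} is over the size cap`);
    }
  });
  return problems;
}

console.log(checkBatch("pro", [150 * MB])); // one problem: file #1 is over the size cap
```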
Cookbook
Real before/after files for vendor-sharing scenarios. Watch which columns change (PII the vendor doesn't need) and which survive exactly (the data they do need). Faker values are illustrative — yours differ unless you set a seed.
Customer list for an analytics agency
The agency needs to build cohort dashboards. They get fake names/emails/cities but real plan, mrr, signup_date and customer_id, so their segmentation and revenue charts are accurate while no real person is exposed.
Input (customers.csv):
customer_id,name,email,city,plan,mrr,signup_date
1001,Sarah Chen,sarah.chen@acme.io,Oakland,pro,49,2025-11-03
1002,Tomás Reyes,treyes@globex.com,Chicago,team,149,2026-01-17
Output (customers-scrambled.csv):
customer_id,name,email,city,plan,mrr,signup_date
1001,Dr. Elena Rosales,Reanna.Lockman@yahoo.com,East Garfield,pro,49,2025-11-03
1002,Marcus Hettinger,Jaylin.Bode@gmail.com,Lake Verda,team,149,2026-01-17
PII columns faked; customer_id / plan / mrr / signup_date untouched.
Compound headers leak unless renamed
A real export from a CRM often uses email_address, customer_name, home_phone. None of these are exact PII tokens, so they are NOT matched and the real PII would ship to the vendor. The fix is to rename to the bare token before scrambling.
Input (crm_export.csv):
id,customer_name,email_address,home_phone,region
7,Li Wei,li.wei@x.com,+1-206-555-0117,NW
Output (crm_export-scrambled.csv): <-- DANGER
id,customer_name,email_address,home_phone,region
7,Li Wei,li.wei@x.com,+1-206-555-0117,NW
Nothing matched -> real PII survived (itemsRedacted = 0).
Fix: rename headers to name / email / phone, then re-run:
id,name,email,phone,region
7,Mavis Goldner,Lonnie_Cremin@hotmail.com,(555) 123-4567,NW
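The rename step in this example can be done in a text editor, but for repeated exports a small helper may be less error-prone. The sketch below is not part of the tool; renameHeaders is a hypothetical name, and it assumes a simple header row with no quoted commas.

```typescript
// Rewrites only the header line of a CSV using an explicit mapping you supply.
// Assumption: header cells contain no quoted commas. Data rows are left untouched.
function renameHeaders(csv: string, mapping: Record<string, string>): string {
  // Group 1 = first line, group 2 = everything after it (line endings included).
  const parts = /^([^\r\n]*)([\s\S]*)$/.exec(csv);
  if (!parts) return csv;
  const renamed = parts[1]
    .split(",")
    .map((h) => (Object.prototype.hasOwnProperty.call(mapping, h) ? mapping[h] : h))
    .join(",");
  return renamed + parts[2];
}

const crmExport =
  "id,customer_name,email_address,home_phone,region\n" +
  "7,Li Wei,li.wei@x.com,+1-206-555-0117,NW\n";

const fixedExport = renameHeaders(crmExport, {
  customer_name: "name",
  email_address: "email",
  home_phone: "phone",
});

console.log(fixedExport.split("\n")[0]); // id,name,email,phone,region
```

An explicit mapping is used instead of automatic guessing so that you decide which bare token each compound header becomes.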
Nested vendor JSON dump
JSON is walked recursively; any key matching a PII token with a string/number value is replaced. The account_id, tier, and usage keys survive so the vendor's billing reconciliation still works. Note that a key named email_address would NOT match a token and would survive with its real value.
Input (account.json):
{
"account_id": "acc_551",
"tier": "enterprise",
"owner": {
"name": "Priya Nair",
"email": "priya@shop.co",
"phone": "+44 20 7946 0958"
},
"usage": { "seats": 40, "region": "eu" }
}
Output (account-scrambled.json):
{
"account_id": "acc_551",
"tier": "enterprise",
"owner": {
"name": "Dr. Elena Rosales",
"email": "Reanna.Lockman@yahoo.com",
"phone": "(555) 123-4567"
},
"usage": { "seats": 40, "region": "eu" }
}
Reproducible share for bug investigation
When a vendor flags an issue in a specific row, reuse the seed from the original run to regenerate the identical scrambled file from your original, so you can both look at exactly the same data without you ever sending the real file.
Input (users.csv):
user_id,first_name,last_name,email
7,Aisha,Khan,aisha.khan@corp.net
8,Leo,Park,leo.park@corp.net
seed = 42
Output (run #1, seed 42):
user_id,first_name,last_name,email
7,Brent,Schiller,Garnet_Wuckert@hotmail.com
8,Maybell,Kihn,Bridie.Hahn@yahoo.com
Output (run #2, seed 42): IDENTICAL byte-for-byte.
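Why the same seed gives the same fakes can be illustrated without faker itself. The sketch below swaps in mulberry32, a small public-domain seeded generator, as a stand-in for faker.seed(n); the name list and the function fakeFirstNames are hypothetical and the values will not match what faker produces.

```typescript
// mulberry32: a tiny seeded PRNG, used here only as a stand-in for faker's seeded generator.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical pool of replacement values.
const FIRST_NAMES = ["Brent", "Maybell", "Elena", "Marcus", "Mavis", "Hilbert"];

// With a seed the sequence is fixed; without one, Math.random gives fresh values each run.
function fakeFirstNames(count: number, seed?: number): string[] {
  const rand = seed === undefined ? Math.random : mulberry32(seed);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    out.push(FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)]);
  }
  return out;
}

const firstRun = fakeFirstNames(2, 42);
const secondRun = fakeFirstNames(2, 42);
console.log(firstRun.join(",") === secondRun.join(",")); // true
```

The seed fixes the order of generated values, so reproducibility also depends on feeding the same original file: a changed row order or an added column would shift which fake lands in which cell.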
PII inside a free-text column survives
This tool replaces whole cells in name-matched columns; it never scans cell contents. A phone number or email written inside a notes column is NOT touched because notes is not a PII token. Run those columns through email-phone-scrubber before sharing.
Input (tickets.csv):
ticket_id,email,notes
T-9,dana@x.com,"Call back at 415-555-0199 re: card 4111 1111 1111 1111"
Output (tickets-scrambled.csv):
ticket_id,email,notes
T-9,Hilbert.Klein@gmail.com,"Call back at 415-555-0199 re: card 4111 1111 1111 1111"
The email column was faked; the phone + card inside notes were NOT. For that, run notes through email-phone-scrubber, which emits fixed [REDACTED_PHONE] / [REDACTED_CARD] tags.
Edge cases and what actually happens
Compound PII header like `email_address` ships real data
Not matched: The biggest risk in vendor sharing is that the PII regex is anchored to the whole column name and expects bare tokens (email, name, phone, ...). CRM exports with email_address, customer_name, home_phone, or mailing_address do NOT match, so live PII passes straight through to the vendor. Always check the replaced-field count and rename compound headers to the bare token before sending.
PII written inside a free-text / notes value
By design: The tool swaps entire cells in matched columns; it never inspects cell contents. An email, phone, or card number inside a notes, comments, or description column is left intact because the column name isn't a PII token. Before sharing, run those free-text columns through email-phone-scrubber, which matches email / phone / SSN / card (Luhn) / IBAN (mod-97) / UK-NI and emits fixed [REDACTED_*] tags.
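The "card (Luhn)" check mentioned above refers to the standard Luhn checksum. A minimal sketch of that checksum follows; how email-phone-scrubber applies it internally is not documented here, so the function name and the step that strips non-digits are assumptions.

```typescript
// Standard Luhn checksum: from the right, double every second digit,
// subtract 9 from any result above 9, and require the total to be divisible by 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, ""); // assumption: spaces and dashes are ignored
  if (digits.length === 0) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111 1111 1111 1111")); // true
console.log(passesLuhn("4111 1111 1111 1112")); // false
```

A checksum like this lets a scanner tell a plausible card number from an arbitrary 16-digit run such as an order number, which reduces false positives in free text.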
JSON file without a .json extension parsed as CSV (browser)
Mis-parse: In-browser, the JSON branch runs only when the filename ends in .json. A JSON payload saved as export.txt is handed to PapaParse and treated as a one-column CSV, so almost nothing is faked. Rename it to .json (or rely on the server-safe auto sniff of a leading [/{). Setting format: json forces the JSON path.
Replaced-field count is lower than expected
Check headers: The tool reports itemsRedacted. If a customer file comes back with a count well below the number of PII columns you expected, one or more headers didn't match the token list (usually compound names). Treat a surprisingly low count as a red flag, fix the headers, and re-run before the file leaves your machine.
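When the count looks low, a quick way to find the culprit is to list headers that contain a PII token but are not exactly one. The sketch below is a heuristic, not part of the tool; the token list is copied from this page and suspectHeaders is a hypothetical name.

```typescript
// Tokens as listed on this page; the tool's real list lives in PII_FIELDS_REGEX.
const EXACT_TOKENS = [
  "email", "name", "first_name", "last_name", "full_name", "phone", "telephone",
  "mobile", "address", "street", "city", "zip", "postal", "ssn", "tax_id", "tax-id",
];

// Returns headers that are NOT an exact token but contain one as a substring,
// e.g. customer_name. Case-insensitive comparison is an assumption.
function suspectHeaders(headers: string[]): string[] {
  return headers.filter((header) => {
    const lower = header.toLowerCase();
    if (EXACT_TOKENS.indexOf(lower) !== -1) return false; // would be scrambled
    return EXACT_TOKENS.some((token) => lower.indexOf(token) !== -1);
  });
}

console.log(suspectHeaders(["id", "customer_name", "email_address", "home_phone", "region"]));
// [ 'customer_name', 'email_address', 'home_phone' ]
```

Substring matching will also flag harmless headers such as velocity (it contains city), so treat the result as a list to review by eye, not as a verdict.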
Same customer appears in many rows / across two shared files
Not preserved: Each matched cell gets an independent faker value, so one real customer who recurs across rows, or who is a join key between two files you send the same vendor, becomes a different fake each time. There is no consistent mapping, so cross-file or cross-row referential integrity is not preserved.
SSN / tax_id output is 9 plain digits
Expected: ssn / tax_id columns are filled with faker.string.numeric(9): nine random digits, no dashes, no checksum. They will not pass an SSN-format validator and are not real numbers. That is intended; the tool has no SSN-format option, and the point is that they're fake.
Structured address object under an `address` key (JSON)
Preserved: In the JSON walk a matched key is replaced only when its value is a string or number. If address holds a nested object ({"line1":..., "city":...}), the parent isn't overwritten; instead the walker recurses, so a nested city key gets faked on its own but line1 (not a token) survives. Structured addresses need PII tokens at the leaf level to be fully caught.
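The leaf-only rule can be sketched as a small recursive walk. This is an illustration of the behaviour described above, not the tool's code: the regex is an assumed reconstruction from the tokens on this page, and scrambleJson, the fake callback and the tally object are hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Assumed reconstruction of the key pattern; the real one is PII_FIELDS_REGEX.
const ASSUMED_PII_KEY =
  /^(email|name|first_name|last_name|full_name|phone|telephone|mobile|address|street|city|zip|postal|ssn|tax[_-]id)$/i;

// Replaces a matched key only when its value is a string or number; otherwise recurses.
function scrambleJson(
  value: Json,
  fake: (key: string) => string,
  tally: { itemsRedacted: number }
): Json {
  if (Array.isArray(value)) {
    return value.map((item) => scrambleJson(item, fake, tally));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      const isLeaf = typeof child === "string" || typeof child === "number";
      if (isLeaf && ASSUMED_PII_KEY.test(key)) {
        out[key] = fake(key);
        tally.itemsRedacted++;
      } else {
        out[key] = scrambleJson(child, fake, tally); // objects and arrays are walked, not replaced
      }
    }
    return out;
  }
  return value; // strings, numbers, booleans and null outside a matched key pass through
}

const sample: Json = { address: { line1: "1 Main St", city: "Leeds" } };
const sampleTally = { itemsRedacted: 0 };
console.log(JSON.stringify(scrambleJson(sample, () => "FAKE", sampleTally)));
// {"address":{"line1":"1 Main St","city":"FAKE"}}
```

In the sample, address matches a token but holds an object, so it is walked; city is faked at the leaf and line1 passes through with the real street.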
Malformed JSON export
Error: JSON goes through JSON.parse. A trailing comma, single quotes, an unquoted key, or a truncated dump throws and produces nothing. Validate or repair the export first. CSV is more forgiving: PapaParse parses ragged rows rather than throwing.
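Validating an export before dropping it onto the tool can be as simple as wrapping JSON.parse. The helper below is a minimal sketch; checkJson is a hypothetical name and the error text comes from the JavaScript engine, so its wording varies between browsers.

```typescript
type JsonCheck = { ok: true } | { ok: false; error: string };

// Reports whether text is strict JSON, returning the parser's message on failure.
function checkJson(text: string): JsonCheck {
  try {
    JSON.parse(text);
    return { ok: true };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

console.log(checkJson('{"plan": "pro"}').ok);  // true
console.log(checkJson('{"plan": "pro",}').ok); // false (trailing comma)
console.log(checkJson("{'plan': 'pro'}").ok);  // false (single quotes)
```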
Empty file or no PII columns
Supported: A CSV with headers but no PII-named columns parses fine and returns with itemsRedacted = 0 (every cell preserved). A header-only file returns just the header. No error; the zero count is your signal that nothing matched, which for vendor sharing means you should double-check the file actually contained the PII you meant to scramble.
File exceeds your tier's size or count limit
Rejected: Security-family limits apply: Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited; this tool needs at least Pro. A file over your cap or a batch with too many files is rejected before processing; split the export or step up a tier.
Frequently asked questions
Is the file I send the vendor actually free of real PII?
It is for the columns whose names match the PII tokens (email, name, first_name, last_name, full_name, phone, telephone, mobile, address, street, city, zip, postal, ssn, tax_id). Those cells are overwritten with faker fakes. PII in columns with non-matching names, or PII embedded inside free-text cells, is NOT removed — so verify your headers and check the replaced-field count before sending.
Will the vendor's analysis still work on the fake file?
Yes, for everything that isn't a PII identifier. Non-PII columns — IDs, plan tiers, revenue, dates, region codes, flags — are preserved byte-for-byte, and row count, column order and data types are unchanged. The vendor's joins, cohorts, and aggregates run exactly as they would on the real file, just with fake people.
My headers are `email_address` and `customer_name` — are those scrambled?
No. The regex is anchored to the whole column name and expects bare tokens, so email_address, customer_name, home_phone and mailing_address are not matched and pass through with real data. Rename them to email, name, phone before scrambling, or you will leak live PII to the vendor.
Does my real customer data get uploaded anywhere?
No. The live tool runs in your browser — PapaParse and faker are loaded client-side and the file is parsed and rewritten in the tab. Your original PII never leaves the machine; only the fake-filled copy is downloaded for you to share.
Can the vendor reverse the scramble back to real customers?
No. There is no key, no mapping table, and no way to recover originals from the output — the seed only makes the fakes reproducible. Keep your real file stored securely inside your own environment and never send it alongside the scrambled copy.
Can I customise which columns are scrambled for a specific vendor?
No. The PII field set is fixed in code and there is no UI control to add, remove, or rename which columns are scrambled. To make a non-matching column get scrambled, rename its header to a recognised token before running the tool.
What if the vendor only needs a few columns?
Trim the file before scrambling — export only the columns the vendor needs, then scramble. Fewer columns means less PII surface and a smaller file. The tool preserves the column order from the parsed header, so the trimmed layout you send is exactly what they receive.
Does it handle PII hidden inside a notes or comments field?
No — it replaces whole cells in name-matched columns and never scans cell contents. For email, phone, SSN, credit-card (Luhn), IBAN (mod-97) and UK-NI patterns sitting inside free text, run those columns through email-phone-scrubber first; it emits fixed [REDACTED_*] tags. The two tools are complementary.
Does it accept Excel files?
No. Input is CSV (comma-delimited, first row = header) or JSON. Convert .xlsx / .ods to CSV first. Output mirrors input — CSV in -> CSV out, JSON in -> pretty-printed JSON out — downloaded as <name>-scrambled.<ext>.
Why is the SSN in my shared file just nine digits with no dashes?
Because ssn / tax_id columns are filled with faker.string.numeric(9) — nine random digits, no formatting, no checksum. That's intentional: it's a placeholder, not a valid SSN, so it's safe to share. There is no SSN-format option.
What plan and file sizes do I need?
This is a Pro-tier security tool, not on Free. Security-family limits are Pro 100 MB / 5 files, Pro-media 500 MB / 50, Developer 2 GB / unlimited. Split very large exports or step up a tier if you hit the cap.
What else helps when handing data to a third party?
Run email-phone-scrubber on free-text columns this tool leaves alone. If the vendor genuinely needs the REAL file, don't fake it — encrypt it with aes-256-encryptor (Web Crypto AES-GCM 256, PBKDF2) and share the passphrase out of band. Fingerprint the file you sent with multi-hash-fingerprinter so you can prove later exactly which bytes left your machine.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.