How to anonymize a customer CSV before sharing with a vendor
- Step 1: Export the customer list from your CRM or database. Pull the customer table you want to share from Salesforce, HubSpot, your warehouse, or a raw database COPY ... TO CSV. Keep the real export local; you'll anonymize a copy. The first row must be a header row, because every rule targets a column by its header name.
- Step 2: Drop the CSV onto the anonymizer above. PapaParse reads it in your browser, so nothing is uploaded. The tool reads the header row and, if auto-detect is on (it is by default), pre-fills a hash rule for every column whose name looks like PII (email, phone, name, address, dob, etc.).
- Step 3: Review the auto-detected columns, then add or remove rules. The panel lists auto-detected columns under the rule list. Use Add rule to cover a column auto-detect missed (e.g. a custom loyalty_contact field), and the trash icon to remove one you don't want touched. Each rule is a column plus a strategy. Once you add any explicit rule, auto-detect stops applying, so make sure every column you want anonymized has its own rule.
- Step 4: Pick a strategy per column. Hash for keys the vendor must still join or count on (customer id, email). Mask when they need a recognisable shape (e.g. keep the first 1 and last 4 characters of a phone). Redact to replace every value with [REDACTED]. Sequential to renumber rows as id-1, id-2. Drop to delete a column outright (home address, notes).
- Step 5: Set a salt if the vendor must not be able to reverse the hashes. Type a secret into the Hash salt field. It's prepended to every value before hashing, so a vendor can't precompute tokens for common emails. Keep the salt private. Re-use the exact same salt on a future export if the vendor needs the tokens to match across files; change it to deliberately break cross-file linkage.
- Step 6: Anonymize, verify the stats, and download. Click Anonymize CSV. The result panel shows rows in, rows out, fields anonymized, columns dropped, and the applied rules as chips. Eyeball the first-10-row preview to confirm the right columns were transformed, then click Download CSV. The file is saved as <name>.anon.csv. Hand that file to the vendor; keep the original.
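The steps above amount to a single pass over the parsed rows. The minimal sketch below shows that pass. It assumes the file has already been parsed into a header array and string rows (the tool does the parsing with PapaParse). The names `applyRules`, `Rule` and `anonFileName` are hypothetical. Only redact, sequential and drop are implemented, to keep the sketch short. Counting "fields anonymized" as every transformed cell is an assumption about how the result panel derives that number.

```typescript
// Hypothetical sketch of the one-pass rule application and result-panel stats.
// Only redact, sequential and drop are implemented; hash and mask are omitted.
type Strategy = "redact" | "sequential" | "drop";

interface Rule {
  column: string; // must equal the header text exactly
  strategy: Strategy;
}

interface Stats {
  rowsIn: number;
  rowsOut: number;
  fieldsAnonymized: number; // ASSUMPTION: counts every transformed cell
  columnsDropped: number;
}

function applyRules(header: string[], rows: string[][], rules: Rule[]) {
  // Exact-text lookup: no case folding or trimming.
  const byColumn = new Map<string, Strategy>();
  for (const r of rules) byColumn.set(r.column, r.strategy);

  // Indexes of the columns that survive (everything not dropped).
  const kept: number[] = [];
  header.forEach((name, i) => {
    if (byColumn.get(name) !== "drop") kept.push(i);
  });

  let fieldsAnonymized = 0;
  const outRows = rows.map((row, rowIndex) =>
    kept.map((i) => {
      const strategy = byColumn.get(header[i]);
      if (strategy === "redact") {
        fieldsAnonymized++;
        return "[REDACTED]";
      }
      if (strategy === "sequential") {
        fieldsAnonymized++;
        return `id-${rowIndex + 1}`; // 1-based, by row position
      }
      return row[i]; // no rule: the cell passes through untouched
    })
  );

  const stats: Stats = {
    rowsIn: rows.length,
    rowsOut: outRows.length,
    fieldsAnonymized,
    columnsDropped: header.length - kept.length,
  };
  return { header: kept.map((i) => header[i]), rows: outRows, stats };
}

// Output name used in Step 6: customers.csv -> customers.anon.csv
function anonFileName(inputName: string): string {
  return inputName.replace(/\.csv$/i, "") + ".anon.csv";
}

const result = applyRules(
  ["customer_id", "email", "notes"],
  [
    ["1001", "jane@acme.com", "VIP"],
    ["1002", "bob@globex.com", "late payer"],
  ],
  [
    { column: "customer_id", strategy: "sequential" },
    { column: "email", strategy: "redact" },
    { column: "notes", strategy: "drop" },
  ]
);
console.log(result.header.join(",")); // customer_id,email
console.log(result.stats); // { rowsIn: 2, rowsOut: 2, fieldsAnonymized: 4, columnsDropped: 1 }
console.log(anonFileName("customers.csv")); // customers.anon.csv
```

Rules are looked up by exact header text, so in this sketch a rule for `Email` would not touch a column headed `email`.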
The five anonymization strategies
Every column rule uses exactly one of these strategies. In the table, "value" means the original cell content. The behaviour described is taken directly from the tool's logic.
| Strategy | What it outputs | When to use it for a vendor share | Reversible by you? |
|---|---|---|---|
| Hash | A deterministic 16-char hex token (two FNV-1a digests concatenated); same input + same salt → same token | Keys the vendor must still join or count on — customer id, email — without seeing the real value | No — it's a one-way digest. You can re-derive the token from the source, but you can't recover the source from the token |
| Mask | Keeps keepStart chars at the front and keepEnd chars at the end, and stars the middle (j****n). If keepStart+keepEnd is greater than or equal to the value's length, the value becomes all stars | When the vendor needs a recognisable shape for spot-checks (last 4 of a phone, first letter of a name) | No, the starred characters are discarded |
| Redact | The literal string [REDACTED] in every cell of that column | A free-text column you want to keep as a column (so row width matches) but blank entirely | No |
| Sequential | id-1, id-2, id-3… following the row order (1-based) | When the vendor just needs a stable per-row label, not the real id — note tokens are NOT stable across files | No — and the same real value in two rows gets two different ids |
| Drop | Removes the column from the header and every row — it's gone from the output | Columns the vendor should never receive at all (home address, internal notes) | N/A — the column isn't in the file |
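The hash row describes a 16-hex-character token built from two FNV-1a digests, with the salt prepended. The sketch below reproduces that shape under stated assumptions. It uses standard 32-bit FNV-1a over UTF-16 code units, which matches hashing bytes for ASCII text. It also assumes the second digest is seeded with the byte-reversed FNV offset basis, which is only a guess at the tool's "reversed seed". Tokens from this code will therefore not match the tool's. `fnv1a32` and `hashToken` are illustrative names.

```typescript
// Standard 32-bit FNV-1a over UTF-16 code units (identical to bytes for ASCII).
const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;

function fnv1a32(input: string, seed: number = FNV_OFFSET_BASIS): number {
  let h = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i); // xor in the next code unit
    h = Math.imul(h, FNV_PRIME) >>> 0; // multiply modulo 2^32, keep unsigned
  }
  return h;
}

// Zero-padded 8-character hex for one 32-bit digest.
function hex8(n: number): string {
  return ("00000000" + n.toString(16)).slice(-8);
}

// ASSUMPTION: "reversed seed" read as the byte-reversed offset basis.
const REVERSED_SEED = 0xc59d1c81;

// Hypothetical hash strategy: salt is prepended, two digests are concatenated.
function hashToken(value: string, salt: string = ""): string {
  const input = salt + value;
  return hex8(fnv1a32(input)) + hex8(fnv1a32(input, REVERSED_SEED));
}

const token = hashToken("jane@acme.com", "vendor-q2-2026");
console.log(token.length); // 16
console.log(token === hashToken("jane@acme.com", "vendor-q2-2026")); // true
```

Nothing random is involved, so the same salt and value always yield the same token. That determinism is what makes joins and DISTINCT counts possible.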
Auto-detected PII column names
When no explicit rules are set and auto-detect is on, columns whose header matches one of these patterns get a hash rule. Matching is case-insensitive against the header text.
| Pattern (header matches) | Example headers caught | Default action |
|---|---|---|
| email / e-mail / e_mail | Email, email, work_email, e-mail | hash |
| phone / mobile | phone, Phone Number, mobile | hash |
| name / full name / first name / last name | name, Full Name, first_name, LastName | hash |
| address / postcode / zip | address, Billing Address, postcode, zip | hash |
| ssn / social security | ssn, Social Security, social_security | hash |
| dob / birth date | dob, birth_date, DateOfBirth (via birth-date) | hash |
| credit card / iban / passport | credit_card, iban, passport | hash |
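Here is a rough sketch of the header check. It assumes each pattern is a case-insensitive regular expression tested against the header text. The tool's exact expressions aren't listed here, so `PII_PATTERNS`, `looksLikePii` and `autoDetectRules` are hypothetical approximations of the table above.

```typescript
// Hypothetical approximations of the auto-detect patterns (all case-insensitive).
const PII_PATTERNS: RegExp[] = [
  /e[-_]?mail/i, // email, e-mail, e_mail
  /phone|mobile/i,
  /name/i, // name, Full Name, first_name, LastName
  /address|postcode|zip/i,
  /ssn|social[\s_-]?security/i,
  /dob|birth[\s_-]?date|date[\s_-]?of[\s_-]?birth/i,
  /credit[\s_-]?card|iban|passport/i,
];

function looksLikePii(header: string): boolean {
  return PII_PATTERNS.some((re) => re.test(header));
}

// Default action for every matching header is a hash rule.
function autoDetectRules(headers: string[]): { column: string; strategy: "hash" }[] {
  return headers
    .filter((h) => looksLikePii(h))
    .map((column) => ({ column, strategy: "hash" as const }));
}

console.log(autoDetectRules(["email", "full_name", "signup_date", "plan", "internal_notes"]).map((r) => r.column));
// [ 'email', 'full_name' ]
```

In this sketch, the bare `name` pattern also catches headers such as `company_name`. Whether the tool's own pattern does the same is not documented here.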
Tier limits for this tool
The CSV Anonymizer is a Pro feature. The Free column shows the general free-tier CSV limits, which apply browser-side.
| Limit | Free | Pro |
|---|---|---|
| Max file size | 2 MB | 100 MB |
| Max rows | 500 | 100,000 |
| Batch files | 2 | 10 |
| Where it runs | Your browser (no upload) | Your browser (no upload) |
Cookbook
Before/after rows from real customer-export shapes. Salts shown are placeholders; tokens are illustrative of the deterministic 16-char hex format.
Hash the email so the vendor can count distinct customers
Example: The agency needs to know how many unique customers are in the file and to join it to a second anonymized file you send next month, but must never see real email addresses. Hash with a private salt. The same email always maps to the same token, so DISTINCT counts and joins work, and the original can't be read back from the token.
Input:
customer_id,email,plan
1001,jane@acme.com,Pro
1002,bob@globex.com,Free
1003,jane@acme.com,Pro
Rule: email → hash (salt: "vendor-q2-2026")
Output:
customer_id,email,plan
1001,a3f10b9c4e7d2118,Pro
1002,7c2e9f04b1a83dd6,Free
1003,a3f10b9c4e7d2118,Pro
→ Jane's two rows share one token, so the vendor counts 2 distinct customers.
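This sketch shows why DISTINCT survives hashing. It tokenizes the email column from the example and counts distinct tokens. `hashToken` here is a hypothetical stand-in: two 32-bit FNV-1a digests of salt + value, with an assumed second seed. Its tokens therefore differ from the illustrative ones above; only the counting logic is the point.

```typescript
// Hypothetical stand-in for the tool's hash: two 32-bit FNV-1a digests of
// salt + value. The second seed is an ASSUMPTION, so tokens won't match the tool's.
function fnv1a32(input: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashToken(value: string, salt: string): string {
  const hex8 = (n: number) => ("00000000" + n.toString(16)).slice(-8);
  const input = salt + value;
  return hex8(fnv1a32(input, 0x811c9dc5)) + hex8(fnv1a32(input, 0xc59d1c81));
}

// Tokenize one column, then count distinct tokens the way the vendor would.
function distinctCount(values: string[], salt: string): number {
  return new Set(values.map((v) => hashToken(v, salt))).size;
}

const emails = ["jane@acme.com", "bob@globex.com", "jane@acme.com"];
const tokens = emails.map((e) => hashToken(e, "vendor-q2-2026"));
console.log(tokens[0] === tokens[2]); // true: same email + same salt
console.log(distinctCount(emails, "vendor-q2-2026")); // 2
```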
Drop the home address, mask the phone
Example: The vendor needs to validate phone-number shape (the last 4 digits, for support callbacks) but has no business seeing home addresses at all. Mask the phone, keeping the last 4, and drop the address column entirely so it never appears in the file.
Input:
name,phone,home_address
Jane Doe,+15551234567,12 Oak St
Bob Lee,+15559876543,98 Pine Ave
Rules: phone → mask (keepStart 0, keepEnd 4); home_address → drop (name is auto-detected → hash)
Output:
name,phone
5f9c...d2,********4567
b4a1...e8,********6543
→ The address column is gone; the phone keeps its last 4 digits.
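The two rules in this example can be sketched as below, assuming the behaviour described in the strategy table. `mask` and `dropColumn` are hypothetical helpers. Hashing of the name column is left out to keep the focus on masking and dropping.

```typescript
// Hypothetical mask, written from the strategy table: keep keepStart chars at
// the front and keepEnd at the end, star the middle; if nothing would be
// starred, star the whole value.
function mask(value: string, keepStart: number, keepEnd: number): string {
  if (keepStart + keepEnd >= value.length) return "*".repeat(value.length);
  const middle = value.length - keepStart - keepEnd;
  return value.slice(0, keepStart) + "*".repeat(middle) + value.slice(value.length - keepEnd);
}

// Hypothetical drop: remove a column (exact header match) from the header and every row.
function dropColumn(header: string[], rows: string[][], column: string) {
  const idx = header.indexOf(column);
  if (idx === -1) return { header, rows }; // no such header: nothing changes
  const without = (r: string[]) => r.filter((_, i) => i !== idx);
  return { header: without(header), rows: rows.map(without) };
}

const header = ["name", "phone", "home_address"];
const rows = [
  ["Jane Doe", "+15551234567", "12 Oak St"],
  ["Bob Lee", "+15559876543", "98 Pine Ave"],
];
const dropped = dropColumn(header, rows, "home_address");
const phoneIdx = dropped.header.indexOf("phone");
const out = dropped.rows.map((r) => r.map((cell, i) => (i === phoneIdx ? mask(cell, 0, 4) : cell)));
console.log(dropped.header.join(",")); // name,phone
console.log(out.map((r) => r[1])); // [ '********4567', '********6543' ]
```

The phone +15551234567 is 12 characters long, so keepEnd 4 leaves 8 stars in front of 4567.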
Sequential ids for a row-level sample with no real keys
Example: You're handing a data scientist a sample to prototype a model. They don't need real ids, just a per-row label. Sequential renumbers each row as id-1, id-2, and so on, in row order. Note that it does not deduplicate: identical inputs get different ids.
Input:
account_ref,signup_source
ACME-7781,paid
ACME-7781,organic
GLBX-2210,paid
Rule: account_ref → sequential
Output:
account_ref,signup_source
id-1,paid
id-2,organic
id-3,paid
→ The two ACME-7781 rows became id-1 and id-2 (not merged).
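Sequential ignores the cell's value entirely and numbers rows by position, which is why the duplicate reference is not merged. Here is a minimal sketch. `sequential` is a hypothetical function name, and row indexes are assumed to follow file order.

```typescript
// Hypothetical sequential strategy: the cell in column `col` of row i (0-based)
// becomes id-(i+1). The original value is ignored, so duplicates are not merged.
function sequential(rows: string[][], col: number): string[][] {
  return rows.map((row, i) => row.map((cell, j) => (j === col ? `id-${i + 1}` : cell)));
}

const rows = [
  ["ACME-7781", "paid"],
  ["ACME-7781", "organic"],
  ["GLBX-2210", "paid"],
];
console.log(sequential(rows, 0).map((r) => r.join(",")));
// [ 'id-1,paid', 'id-2,organic', 'id-3,paid' ]
```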
Keep tokens matching across two monthly files
Example: The vendor receives a file each month and needs last month's customer tokens to line up with this month's so they can track retention. Re-use the exact same salt both months. Deterministic hashing guarantees that the same email maps to the same token across files.
May export, rule: email → hash (salt: "retention-2026")
jane@acme.com → T (an illustrative 16-char hex token)
June export, rule: email → hash (salt: "retention-2026")
jane@acme.com → T (identical)
→ The vendor joins May.token = June.token to track Jane across months. Change the salt and the tokens diverge; use that to break linkage.
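Cross-file linkage only requires the vendor to intersect two sets of tokens. This sketch uses a hypothetical `hashToken` stand-in: two 32-bit FNV-1a digests of salt + value, with the second seed assumed. It also uses an illustrative `matchedAcrossFiles` helper that represents the vendor's join.

```typescript
// Hypothetical stand-in for the tool's hash: two 32-bit FNV-1a digests of
// salt + value. The second seed is an ASSUMPTION, so tokens won't match the tool's.
function fnv1a32(input: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashToken(value: string, salt: string): string {
  const hex8 = (n: number) => ("00000000" + n.toString(16)).slice(-8);
  const input = salt + value;
  return hex8(fnv1a32(input, 0x811c9dc5)) + hex8(fnv1a32(input, 0xc59d1c81));
}

// Vendor-side join: how many of this month's tokens also appear in last month's file.
function matchedAcrossFiles(mayTokens: string[], juneTokens: string[]): number {
  const seen = new Set(mayTokens);
  return juneTokens.filter((t) => seen.has(t)).length;
}

const may = ["jane@acme.com", "bob@globex.com"];
const june = ["jane@acme.com", "carol@globex.com"];
const tok = (list: string[], salt: string) => list.map((e) => hashToken(e, salt));

console.log(matchedAcrossFiles(tok(may, "retention-2026"), tok(june, "retention-2026"))); // 1 (Jane)
console.log(matchedAcrossFiles(tok(may, "retention-2026"), tok(june, "retention-2026-b"))); // 0 (linkage broken)
```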
Let auto-detect do the first pass, then refine
Example: Auto-detect pre-fills hash rules for the obvious PII headers. You then drop the columns the vendor doesn't need and leave the analytic columns untouched.
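Input headers: email,full_name,signup_date,plan,internal_notes
Auto-detect pre-fills: email → hash, full_name → hash
You add: internal_notes → drop
Untouched (no rule): signup_date, plan
Output headers: email,full_name,signup_date,plan
→ email and full_name are hashed, notes are dropped, and signup_date and plan pass through verbatim. Because adding an explicit rule can switch auto-detect off, check the Applied rules chips to confirm both hash rules are still listed.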
Errors and edge cases
Behaviours, limitations, and silent failures to check before you send a file. Each entry says what happens and what to do about it.
Auto-detect is ignored the moment you add any explicit rule
Behaviour to know: Auto-detect only runs when you have zero explicit rules. As soon as you add a single rule, auto-detect is disabled entirely. If you add one rule and assume email is still being hashed automatically, it won't be. After adding any rule, add explicit rules for every column you want anonymized. Then check the result panel's Applied rules chips, which show exactly what was acted on.
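The rule-resolution logic described here can be sketched as follows. `effectiveRules` and its parameters are hypothetical. The PII check is passed in as a function rather than modelled on the tool's real patterns.

```typescript
// Hypothetical model of how explicit rules and auto-detect interact.
type Strategy = "hash" | "mask" | "redact" | "sequential" | "drop";

interface Rule {
  column: string;
  strategy: Strategy;
}

function effectiveRules(
  headers: string[],
  explicitRules: Rule[],
  autoDetect: boolean,
  isPii: (header: string) => boolean
): Rule[] {
  // A single explicit rule switches auto-detect off entirely.
  if (explicitRules.length > 0) return explicitRules;
  if (!autoDetect) return [];
  // Zero explicit rules: every PII-looking header gets a hash rule.
  return headers.filter((h) => isPii(h)).map((column): Rule => ({ column, strategy: "hash" }));
}

const headers = ["email", "plan", "internal_notes"];
const looksLikeEmail = (h: string) => /mail/i.test(h);

console.log(effectiveRules(headers, [], true, looksLikeEmail));
// [ { column: 'email', strategy: 'hash' } ]
console.log(effectiveRules(headers, [{ column: "internal_notes", strategy: "drop" }], true, looksLikeEmail));
// [ { column: 'internal_notes', strategy: 'drop' } ]  <- email is no longer hashed
```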
Hashing is FNV-1a, not a cryptographic hash
Security limitation: The hash is two FNV-1a digests (forward and reversed seed) concatenated into 16 hex chars. FNV-1a is fast and deterministic but not cryptographically secure. Without a salt, a motivated party could brute-force common values (e.g. emails from a known list). Always set a private salt for vendor shares, and treat the tokens as pseudonyms, not as cryptographically protected data. For regulated data that needs certified de-identification, this tool is not a substitute for a compliance-reviewed process.
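A dictionary attack on unsalted tokens only requires hashing a list of candidate emails and looking up the results. This sketch uses a hypothetical `hashToken` stand-in (two 32-bit FNV-1a digests of salt + value, second seed assumed) and an illustrative `reverseByDictionary` function. It shows why the salt matters; it does not reproduce the tool's exact tokens.

```typescript
// Hypothetical stand-in for the tool's hash: two 32-bit FNV-1a digests of
// salt + value. The second seed is an ASSUMPTION, so tokens won't match the tool's.
function fnv1a32(input: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashToken(value: string, salt: string): string {
  const hex8 = (n: number) => ("00000000" + n.toString(16)).slice(-8);
  const input = salt + value;
  return hex8(fnv1a32(input, 0x811c9dc5)) + hex8(fnv1a32(input, 0xc59d1c81));
}

// Attacker's view: hash a list of likely emails with a guessed salt, then look tokens up.
function reverseByDictionary(tokens: string[], candidates: string[], guessedSalt: string): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const c of candidates) lookup.set(hashToken(c, guessedSalt), c);
  const recovered = new Map<string, string>();
  for (const t of tokens) {
    const hit = lookup.get(t);
    if (hit !== undefined) recovered.set(t, hit);
  }
  return recovered;
}

const shared = ["jane@acme.com", "bob@globex.com"];
const candidates = ["alice@example.com", "jane@acme.com", "bob@globex.com"];
const unsalted = shared.map((e) => hashToken(e, ""));
const salted = shared.map((e) => hashToken(e, "k7#private-salt"));
console.log(reverseByDictionary(unsalted, candidates, "").size); // 2: both emails recovered
console.log(reverseByDictionary(salted, candidates, "").size); // 0: salt unknown to the attacker
```

The salt only helps while it stays secret and hard to guess. An attacker who knows or can guess it is back to the unsalted case, so use a long random value rather than a word or date.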
Same value maps to the same token (a feature, sometimes a leak)
Behaviour to know: Determinism is what lets the vendor count distinct values and join. It also means a low-cardinality column (e.g. gender, city) leaks its distribution. Identical inputs produce visibly identical tokens, so the vendor can see how many people share each value without knowing the value. If that distribution is itself sensitive, use redact or drop instead of hash for that column.
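This sketch shows what a vendor can learn from a hashed low-cardinality column without reversing a single token: how often each token repeats. `hashToken` is a hypothetical stand-in (two 32-bit FNV-1a digests of salt + value, second seed assumed), and `tokenFrequencies` is illustrative.

```typescript
// Hypothetical stand-in for the tool's hash: two 32-bit FNV-1a digests of
// salt + value. The second seed is an ASSUMPTION, so tokens won't match the tool's.
function fnv1a32(input: string, seed: number): number {
  let h = seed >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function hashToken(value: string, salt: string): string {
  const hex8 = (n: number) => ("00000000" + n.toString(16)).slice(-8);
  const input = salt + value;
  return hex8(fnv1a32(input, 0x811c9dc5)) + hex8(fnv1a32(input, 0xc59d1c81));
}

// What the vendor can see without knowing any value: how often each token repeats.
function tokenFrequencies(values: string[], salt: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) {
    const t = hashToken(v, salt);
    counts.set(t, (counts.get(t) || 0) + 1);
  }
  return counts;
}

const city = ["London", "London", "London", "Leeds"];
const freq = tokenFrequencies(city, "private-salt");
const sizes: number[] = [];
freq.forEach((n) => sizes.push(n));
console.log(sizes.sort((a, b) => b - a)); // [ 3, 1 ]: the distribution leaks
```

Redact would turn every cell into [REDACTED], collapsing the column to a single value and hiding the distribution.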
Mask of a short value becomes all stars
Behaviour to know: If keepStart + keepEnd is greater than or equal to the value's length, the whole value is replaced with stars matching its length. So masking a 3-character value with keepStart 2 / keepEnd 2 gives ***, with no original characters kept. If you need any characters to survive, set keepStart + keepEnd smaller than the length of the shortest value in the column.
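Below is a sketch of the mask rule, including the all-stars case. It also includes a hypothetical helper that finds the largest keepStart + keepEnd total that still leaves at least one star in every value of a column. Both `mask` and `maxSafeKeepTotal` are written from the behaviour described above, not taken from the tool's source.

```typescript
// Hypothetical mask: all stars when keepStart + keepEnd >= length.
function mask(value: string, keepStart: number, keepEnd: number): string {
  if (keepStart + keepEnd >= value.length) return "*".repeat(value.length);
  const middle = value.length - keepStart - keepEnd;
  return value.slice(0, keepStart) + "*".repeat(middle) + value.slice(value.length - keepEnd);
}

// Largest keepStart + keepEnd that still stars at least one character in every value.
function maxSafeKeepTotal(values: string[]): number {
  if (values.length === 0) return 0;
  let shortest = values[0].length;
  for (const v of values) shortest = Math.min(shortest, v.length);
  return Math.max(0, shortest - 1);
}

console.log(mask("abc", 2, 2)); // *** (2 + 2 >= 3)
console.log(mask("abcde", 2, 2)); // ab*de
const names = ["Jo", "Jane", "Bob"];
console.log(maxSafeKeepTotal(names)); // 1, e.g. keepStart 1 and keepEnd 0
console.log(names.map((n) => mask(n, 1, 0))); // [ 'J*', 'J***', 'B**' ]
```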
A rule targets a column name that isn't in the file
No-op: Rules match by exact header text. If the header is Email but your rule says email, or the column was renamed upstream, the rule never matches and that column passes through untouched. Confirm that the rule's column dropdown shows the real header (the UI populates it from the file), and verify in the preview that the intended column changed.
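If you script your exports, a pre-flight check can catch rules that would silently miss. This hypothetical sketch mirrors the exact, case-sensitive matching described above. It also suggests the real header when only case or surrounding whitespace differs. `unmatchedRules` and `suggestHeader` are illustrative names.

```typescript
// Exact, case-sensitive lookup: rule columns not present in the header are silent no-ops.
function unmatchedRules(headers: string[], ruleColumns: string[]): string[] {
  return ruleColumns.filter((c) => headers.indexOf(c) === -1);
}

// Suggest the real header when only case or surrounding whitespace differs.
function suggestHeader(headers: string[], ruleColumn: string): string | undefined {
  const want = ruleColumn.trim().toLowerCase();
  for (const h of headers) {
    if (h.trim().toLowerCase() === want) return h;
  }
  return undefined;
}

const fileHeaders = ["Email", "plan"];
console.log(unmatchedRules(fileHeaders, ["email", "plan"])); // [ 'email' ]
console.log(suggestHeader(fileHeaders, "email")); // Email
```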
Sequential does not deduplicate
Behaviour to know: Sequential assigns id-{rowIndex+1} purely by position. Two rows with the identical real value get two different ids, and the ids change if the row order changes and don't carry across files. If you need a stable, value-based pseudonym that is identical for identical inputs, use hash instead. To collapse duplicate rows first, run csv-deduplicator before anonymizing.
File over 2 MB or 500 rows on free tier
Blocked: This tool is a Pro feature, and the free CSV limits are 2 MB / 500 rows. A larger customer export is blocked until you upgrade or trim it. To shrink it first, take a representative sample with csv-row-limiter, or split it into chunks with csv-row-splitter, anonymize each chunk, and recombine.
Quotes and special characters in cells
Handled: Parsing is RFC 4180-aware via PapaParse, so commas, quotes, and newlines inside properly quoted cells are read correctly. The output is re-serialised, and cells without a rule pass through with their original content. If a value needs reformatting (e.g. stripping wrapper brackets) before sharing, do that with csv-find-replace first, then anonymize.
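Whatever library does the writing (PapaParse also provides `Papa.unparse`), RFC 4180 is what keeps such cells intact on output. A field containing a comma, double quote, CR or LF is wrapped in double quotes, and embedded quotes are doubled. This is a minimal sketch of that rule, not the tool's serializer. `quoteField` and `toCsvLine` are illustrative names.

```typescript
// RFC 4180: wrap a field in double quotes if it contains a comma, double quote,
// CR or LF, and escape each embedded double quote by doubling it.
function quoteField(field: string): string {
  return /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

function toCsvLine(fields: string[]): string {
  return fields.map(quoteField).join(",");
}

console.log(toCsvLine(["1001", "Acme, Inc.", 'said "hi"'])); // 1001,"Acme, Inc.","said ""hi"""
```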
Empty input file
Handled: An empty CSV (no rows) returns an empty result with zero rows in and out, and no error. If you expected data, the export probably failed or the file contains only a header. Confirm that the source export actually contains rows before anonymizing.
Frequently asked questions
Will the original file with real names and emails ever be uploaded?
No. The CSV Anonymizer runs entirely in your browser via PapaParse — the source file is parsed and transformed locally, and only the anonymized output is downloadable. The unredacted file never reaches a server. The only thing saved server-side is a single usage counter (no content) for signed-in dashboard stats.
Can the vendor reverse the hashes back to real emails?
Not directly — the hash is a one-way digest. But because it's a fast, non-cryptographic FNV-1a hash, a vendor with a list of likely emails could brute-force matches if you used no salt. Always set a private salt for vendor shares: it's prepended to every value before hashing, so the vendor can't precompute tokens for common values. Treat the tokens as pseudonyms, not as strongly encrypted data.
How does the vendor count distinct customers if the email is hashed?
Hashing is deterministic: the same email plus the same salt always produces the same 16-character token. So COUNT(DISTINCT email_token) gives the correct number of unique customers, and the vendor can join two of your anonymized files on the token — all without ever seeing a real address. That's the main reason to choose hash over redact for a key column.
What's the difference between redact and drop?
Redact keeps the column but replaces every value with the literal [REDACTED] — useful when you want the row width and column to survive. Drop removes the column from the header and every row entirely, so it's not in the output at all. Use drop for data the vendor should never receive (home address); use redact when the column's presence matters but its content doesn't.
How do I make tokens match across two files I send a month apart?
Use the exact same salt both times. Because hashing is deterministic, jane@acme.com with salt retention-2026 produces the identical token in May and June, so the vendor can join the two files to track the same customer over time. Conversely, change the salt to deliberately break cross-file linkage.
Auto-detect missed my custom email column — why?
Auto-detect matches header names against a fixed set of PII patterns (email, phone, name, address, ssn, dob, credit card, iban, passport, and a few more). A header like loyalty_contact won't match because it doesn't contain a recognised PII word. Just add an explicit rule for it with the strategy you want. Remember: adding any explicit rule disables auto-detect for the rest of the columns, so add rules for all the columns you care about.
Does masking keep the @ and domain of an email?
Mask is character-position based, not email-aware. It keeps keepStart characters from the front and keepEnd from the end and stars everything in between; it has no concept of @ or domain. So masking an email while keeping the last 4 characters leaves only the end of the domain: jane@acme.com becomes *********.com. If you need email-shaped masking specifically, hash is usually the better choice for a vendor share. Mask suits values where a fixed-position prefix or suffix is meaningful (phone, card).
Can I anonymize multiple columns with different strategies in one go?
Yes — that's the design. Add one rule per column, each with its own strategy: hash the email, mask the phone, redact the notes, drop the address. They all apply in a single pass, and the result panel lists every applied rule as a chip so you can confirm nothing was missed.
What does the output file get named?
The download is named after your input with .anon.csv appended — e.g. customers.csv becomes customers.anon.csv. That makes it easy to keep the anonymized copy distinct from the original. Keep the original local and share only the .anon.csv.
Is there a row or size cap?
On the free tier, CSV tools cap at 2 MB and 500 rows, and the anonymizer itself is a Pro feature. Pro raises the cap to 100 MB / 100,000 rows. For a larger customer export, sample it down with csv-row-limiter or split it with csv-row-splitter first, anonymize each part, then recombine with csv-merger.
Should I deduplicate before anonymizing?
If you plan to use sequential ids, yes. Sequential numbers rows by position and won't merge duplicates, so dedupe first with csv-deduplicator. If you're hashing, you don't have to: duplicate values map to the same token, so the vendor can still count distinct values correctly even with duplicate rows present.
Can I automate this in a pipeline so vendor files are anonymized on a schedule?
Yes — GET /api/v1/tools/csv-anonymizer returns the option schema; pair the @jadapps/runner once and POST the payload to 127.0.0.1:9789/v1/tools/csv-anonymizer/run. The data is processed by the local runner on your machine, so real PII never reaches JAD's servers. A common pipeline: nightly CRM export → runner anonymizes with a fixed salt → drop the .anon.csv into the vendor's shared folder.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.