Anonymize a Customer CSV Before Sharing It With a Vendor

How to anonymize a customer CSV before sharing it with a vendor

Step 1
Export the customer list from your CRM or database: pull the customer table you want to share from Salesforce, HubSpot, your warehouse, or a raw database COPY ... TO CSV. Keep the real export local; you'll anonymize a copy. The first row must be a header row, because every rule targets a column by its header name.
Step 2
Drop the CSV onto the anonymizer above: PapaParse reads it in your browser, so nothing is uploaded. The tool reads the header row and, if auto-detect is on (it is by default), pre-fills a hash rule for every column whose name looks like PII (email, phone, name, address, dob, etc.).
Step 3
Review the auto-detected columns, then add or remove rules: the panel lists auto-detected columns under the rule list. Use Add rule to cover a column auto-detect missed (e.g. a custom loyalty_contact field), and the trash icon to remove a rule for a column you don't want touched. Each rule is a column + a strategy.
Step 4
Pick a strategy per column: Hash for keys the vendor must still join/count on (customer id, email). Mask when they need a recognisable shape (keep first 1 + last 4 of a phone). Redact to replace every value with [REDACTED]. Sequential to renumber rows as id-1, id-2. Drop to delete a column outright (home address, notes).
Step 5
Set a salt if the vendor must not be able to reverse the hashes: type a secret into the Hash salt field. It's prepended to every value before hashing, so a vendor can't precompute common-email tokens. Keep the salt private. Re-use the exact same salt on a future export if the vendor needs the tokens to match across files; change it to deliberately break cross-file linkage.
Step 6
Anonymize, verify the stats, and download: click Anonymize CSV. The result panel shows rows in, rows out, fields anonymized, columns dropped, and the applied rules as chips. Eyeball the first-10-row preview to confirm the right columns were transformed, then click Download CSV; the file saves as <name>.anon.csv. Hand that file to the vendor; keep the original.

The five anonymization strategies

Every column rule uses exactly one of these. value is the original cell content; behaviour is taken directly from the tool's logic.

Strategy	What it outputs	When to use it for a vendor share	Reversible by you?
Hash	A deterministic 16-char hex token (two FNV-1a digests concatenated); same input + same salt → same token	Keys the vendor must still join or count on — customer id, email — without seeing the real value	No — it's a one-way digest. You can re-derive the token from the source, but you can't recover the source from the token
Mask	Keeps `keepStart` chars at the front + `keepEnd` chars at the end, stars the middle (`j****n`); if keepStart + keepEnd is greater than or equal to the value's length, the whole value becomes stars	When the vendor needs a recognisable shape for spot-checks (last 4 of a phone, first letter of a name)	No: the starred characters are discarded
Redact	The literal string `[REDACTED]` in every cell of that column	A free-text column you want to keep as a column (so row width matches) with its content removed entirely	No
Sequential	`id-1`, `id-2`, `id-3`… following the row order (1-based)	When the vendor just needs a per-row label within a single file, not the real id; the labels are NOT stable across re-runs or files	No, and the same real value in two rows gets two different ids
Drop	Removes the column from the header and every row — it's gone from the output	Columns the vendor should never receive at all (home address, internal notes)	N/A — the column isn't in the file

Auto-detected PII column names

When no explicit rules are set and auto-detect is on, columns whose header matches one of these patterns get a hash rule. Matching is case-insensitive against the header text.

Pattern (header matches)	Example headers caught	Default action
email / e-mail / e_mail	`Email`, `email`, `work_email`, `e-mail`	hash
phone / mobile	`phone`, `Phone Number`, `mobile`	hash
name / full name / first name / last name	`name`, `Full Name`, `first_name`, `LastName`	hash
address / postcode / zip	`address`, `Billing Address`, `postcode`, `zip`	hash
ssn / social security	`ssn`, `Social Security`, `social_security`	hash
dob / birth date	`dob`, `birth_date`, `DateOfBirth` (via birth-date)	hash
credit card / iban / passport	`credit_card`, `iban`, `passport`	hash
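
The matching described above can be sketched in a few lines of Python. This is an illustration only, not the tool's code: the keyword list is copied from the pattern table, while the normalisation step (lower-casing and stripping spaces, hyphens, and underscores) and the names `normalise`, `looks_like_pii`, and `effective_rules` are assumptions made for the example. It also follows the rule stated under "Errors and edge cases" that explicit rules switch auto-detect off.

```python
import re

# Keywords taken from the pattern table; the real matcher may differ.
PII_KEYWORDS = (
    "email", "phone", "mobile", "name", "address", "postcode", "zip",
    "ssn", "socialsecurity", "dob", "birthdate", "creditcard", "iban", "passport",
)


def normalise(header: str) -> str:
    """Lower-case and strip spaces, hyphens, and underscores (assumed step)."""
    return re.sub(r"[\s_\-]+", "", header.lower())


def looks_like_pii(header: str) -> bool:
    """True if the normalised header contains any PII keyword."""
    key = normalise(header)
    return any(word in key for word in PII_KEYWORDS)


def effective_rules(headers, explicit_rules=None, auto_detect=True):
    """Return {column: strategy}; auto-detect applies only with zero explicit rules."""
    if explicit_rules:
        return dict(explicit_rules)
    if not auto_detect:
        return {}
    return {h: "hash" for h in headers if looks_like_pii(h)}


headers = ["Email", "Full Name", "signup_date", "plan", "internal_notes"]
print(effective_rules(headers))
# {'Email': 'hash', 'Full Name': 'hash'}
print(effective_rules(headers, {"internal_notes": "drop"}))
# {'internal_notes': 'drop'}
```

Substring matching of this kind is deliberately broad, so it can also flag harmless headers that happen to contain a keyword; reviewing the pre-filled rules is still necessary.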

Tier limits for this tool

Free CSV limits apply browser-side. The CSV Anonymizer is a Pro feature.

Limit	Free	Pro
Max file size	2 MB	100 MB
Max rows	500	100,000
Batch files	2	10
Where it runs	Your browser (no upload)	Your browser (no upload)

Cookbook

Before/after rows from real customer-export shapes. Salts shown are placeholders; tokens are illustrative of the deterministic 16-char hex format.

Hash the email so the vendor can count distinct customers

Example

The agency needs to know how many unique customers are in the file and join it to a second anonymized file you send next month — but must never see real addresses. Hash with a private salt: the same email always maps to the same token, so DISTINCT and joins work; the original is unrecoverable.

Input:
customer_id,email,plan
1001,jane@acme.com,Pro
1002,bob@globex.com,Free
1003,jane@acme.com,Pro

Rule: email → hash (salt: "vendor-q2-2026")

Output:
customer_id,email,plan
1001,a3f10b9c4e7d2118,Pro
1002,7c2e9f04b1a83dd6,Free
1003,a3f10b9c4e7d2118,Pro

→ jane's two rows share one token; vendor counts 2 distinct.

Drop the home address, mask the phone

Example

The vendor needs to validate phone-number shape (last 4 for support callbacks) but has no business seeing home addresses at all. Mask the phone keeping the last 4; drop the address column entirely so it never appears in the file.

Input:
name,phone,home_address
Jane Doe,+15551234567,12 Oak St
Bob Lee,+15559876543,98 Pine Ave

Rules: phone → mask (keepStart 0, keepEnd 4); home_address → drop
(name → hash: confirm it is still listed as a rule, since auto-detect stops applying once explicit rules exist)

Output:
name,phone
5f9c...d2,********4567
b4a1...e8,********6543

→ address column gone; phone keeps last 4.

Sequential ids for a row-level sample with no real keys

Example

You're handing a data scientist a sample to prototype a model; they don't need real ids, just a stable per-row label. Sequential renumbers each row as id-1, id-2 in row order. Note: it does not deduplicate — identical inputs get different ids.

Input:
account_ref,signup_source
ACME-7781,paid
ACME-7781,organic
GLBX-2210,paid

Rule: account_ref → sequential

Output:
account_ref,signup_source
id-1,paid
id-2,organic
id-3,paid

→ the two ACME-7781 rows became id-1 and id-2 (not merged).

Keep tokens matching across two monthly files

Example

The vendor receives a file each month and needs last month's customer tokens to line up with this month's so they can track retention. Re-use the exact same salt both months: deterministic hashing guarantees the same email → same token across files.

May export rule: email → hash (salt: "retention-2026")
  jane@acme.com → a3f10b9c4e7d2118

June export rule: email → hash (salt: "retention-2026")
  jane@acme.com → a3f10b9c4e7d2118  (identical)

→ vendor joins May.token = June.token to track Jane across months.
Change the salt and the tokens diverge — use that to break linkage.

Let auto-detect do the first pass, then refine

Example

Auto-detect pre-fills hash rules for the obvious PII headers. You then drop the columns the vendor doesn't need and leave the analytic columns untouched.

Input headers:
email,full_name,signup_date,plan,internal_notes

Auto-detect pre-fills:
  email → hash
  full_name → hash

You add:
  internal_notes → drop

Untouched (no rule): signup_date, plan

Output headers:
email,full_name,signup_date,plan
→ email/full_name hashed, notes dropped, dates+plan verbatim.

Errors and edge cases

The behaviours, limits, and silent failures below are the ones most likely to affect a vendor share. Find the entry that matches what you are seeing and apply the fix it describes.

Auto-detect is ignored the moment you add any explicit rule

Behaviour to know

Auto-detect only runs when you have zero explicit rules. As soon as you add a single rule, auto-detect is disabled entirely — so if you add one rule and assume email is still being hashed automatically, it won't be. After adding any rule, add explicit rules for every column you want anonymized. The result panel's 'Applied rules' chips show exactly what was acted on — check them.

Hashing is FNV-1a, not a cryptographic hash

Security limitation

The hash is two FNV-1a digests (forward + reversed seed) concatenated to 16 hex chars. FNV-1a is fast and deterministic but not cryptographically secure — without a salt, a motivated party could brute-force common values (emails from a known list). Always set a private salt for vendor shares, and treat the tokens as pseudonyms, not as cryptographically protected data. For regulated data needing certified de-identification, this tool is not a substitute for a compliance-reviewed process.
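
To make the limitation concrete, here is a minimal Python sketch of a salted FNV-1a pseudonym and of the dictionary attack it invites when no salt is set. The 32-bit offset basis and prime are the standard FNV-1a constants; how the tool derives its second digest is not specified here, so hashing the reversed bytes is an assumption, and the tokens this sketch prints will not match the tool's. The name `pseudonym` is hypothetical.

```python
FNV_OFFSET_32 = 0x811C9DC5  # standard FNV-1a 32-bit offset basis
FNV_PRIME_32 = 0x01000193   # standard FNV-1a 32-bit prime


def fnv1a_32(data: bytes) -> int:
    """Plain 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def pseudonym(value: str, salt: str = "") -> str:
    """16 hex chars: two 32-bit digests of salt + value, concatenated."""
    salted = (salt + value).encode("utf-8")
    forward = fnv1a_32(salted)
    backward = fnv1a_32(salted[::-1])  # assumption: second digest over reversed bytes
    return f"{forward:08x}{backward:08x}"


# Without a salt, anyone holding a candidate list can build a lookup table.
candidates = ["jane@acme.com", "bob@globex.com"]
lookup = {pseudonym(email): email for email in candidates}

unsalted_token = pseudonym("jane@acme.com")
salted_token = pseudonym("jane@acme.com", salt="private-salt")

print(lookup.get(unsalted_token))  # jane@acme.com
print(lookup.get(salted_token))    # None
```

The salt only helps while it stays private: a vendor who learns it can rebuild the same lookup table, which is why the tokens should be treated as pseudonyms.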

Same value maps to the same token (a feature, sometimes a leak)

Behaviour to know

Determinism is what lets the vendor count distinct and join — but it also means a column with low cardinality (e.g. gender, city) leaks its distribution: identical inputs are visibly identical tokens, so the vendor can see how many people share each value even without knowing the value. If that distribution itself is sensitive, use redact or drop instead of hash for that column.
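
A short Python sketch shows why a low-cardinality column still leaks its distribution after hashing. The `token` function is a stand-in (a single 32-bit FNV-1a digest), not the tool's 16-character hash, and the gender codes are made-up sample data.

```python
from collections import Counter


def token(value: str, salt: str) -> str:
    """Stand-in deterministic pseudonym: one 32-bit FNV-1a digest as hex."""
    h = 0x811C9DC5
    for byte in (salt + value).encode("utf-8"):
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return f"{h:08x}"


genders = ["F", "M", "F", "X", "F", "M", "F"]
tokens = [token(g, "private-salt") for g in genders]

# The vendor cannot read the values, but the group sizes are intact.
print(sorted(Counter(tokens).values(), reverse=True))  # [4, 2, 1]
```

With only three distinct tokens, a vendor who knows roughly how the population splits can often guess which token is which, salt or no salt.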

Mask of a short value becomes all stars

Behaviour to know

If keepStart + keepEnd is greater than or equal to the value's length, the whole value is replaced with stars matching its length. So masking a 3-character value with keepStart 2 / keepEnd 2 gives ***, with none of the original characters kept. If you need any characters to survive, set the keep counts so that their sum is smaller than the length of the shortest value in the column.
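
The rule is easy to check with a small Python sketch. It assumes the mask preserves the value's length, as the `j****n` example suggests; the function name `mask` and its snake_case parameters are illustrative, not the tool's API.

```python
def mask(value: str, keep_start: int = 0, keep_end: int = 0) -> str:
    """Keep the first keep_start and last keep_end characters, star the rest."""
    n = len(value)
    if keep_start + keep_end >= n:
        # Nothing would be hidden, so the whole value is starred instead.
        return "*" * n
    hidden = n - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[n - keep_end:]


print(mask("jordan", 1, 1))   # j****n
print(mask("5551234", 0, 4))  # ***1234
print(mask("abc", 2, 2))      # ***
```

Slicing with `value[n - keep_end:]` rather than `value[-keep_end:]` matters here: with `keep_end` set to 0, the negative-index form would return the whole string.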

A rule targets a column name that isn't in the file

No-op

Rules match by exact header text. If the header is Email but your rule says email, or the column was renamed upstream, the rule simply never matches and that column passes through untouched. Always confirm the rule's column dropdown shows the real header (the UI populates it from the file), and verify in the preview that the intended column changed.
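
If you keep rule sets outside the UI (for example, for the pipeline use described in the FAQ), a pre-flight check along these lines can catch rules that would silently match nothing. It is a sketch: representing rules as a `{column: strategy}` dict and the helper names are assumptions for the example.

```python
def unmatched_rules(header, rules):
    """Rule columns that do not appear in the header exactly as written."""
    present = set(header)
    return [column for column in rules if column not in present]


def near_misses(header, rules):
    """Map each unmatched rule column to a header that differs only by case."""
    by_lower = {h.lower(): h for h in header}
    return {
        column: by_lower[column.lower()]
        for column in unmatched_rules(header, rules)
        if column.lower() in by_lower
    }


header = ["Email", "plan"]
rules = {"email": "hash", "plan": "redact"}

print(unmatched_rules(header, rules))  # ['email']
print(near_misses(header, rules))      # {'email': 'Email'}
```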

Sequential does not deduplicate

Behaviour to know

sequential assigns id-{rowIndex+1} purely by position, so two rows with the identical real value get two different ids, and the ids are not stable across re-runs or across files. If you need a stable, value-based pseudonym that's identical for identical inputs, use hash instead. To collapse duplicate rows first, run csv-deduplicator before anonymizing.
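
The position-only behaviour can be reproduced with a short Python sketch. The function name and the list-of-lists row format are assumptions for illustration.

```python
def sequential(rows, column_index):
    """Replace one column with id-1, id-2, ... based purely on row position."""
    return [
        row[:column_index] + [f"id-{i + 1}"] + row[column_index + 1:]
        for i, row in enumerate(rows)
    ]


rows = [
    ["ACME-7781", "paid"],
    ["ACME-7781", "organic"],
    ["GLBX-2210", "paid"],
]

print(sequential(rows, 0))
# [['id-1', 'paid'], ['id-2', 'organic'], ['id-3', 'paid']]

# Re-ordering the same rows changes which record gets which id.
print(sequential(rows[::-1], 0)[0])
# ['id-1', 'paid']  (this was the GLBX-2210 row)
```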

File over 2 MB or 500 rows on free tier

Blocked

This tool is a Pro feature, and the free CSV limits are 2 MB / 500 rows. A larger customer export is blocked until you upgrade or trim it. To shrink first, take a representative sample with csv-row-limiter or split into chunks with csv-row-splitter, anonymize each, and recombine.

Quotes and special characters in cells

Handled

Parsing is RFC-4180-aware via PapaParse, so commas, quotes, and newlines inside properly-quoted cells are read correctly. The output is re-serialised; cells you didn't put a rule on pass through with their original content. If a value needs reformatting (e.g. stripping wrapper brackets) before sharing, do that with csv-find-replace first, then anonymize.
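
For readers who want to see what RFC-4180-style quoting looks like in practice, the sketch below round-trips one awkward cell. It uses Python's standard `csv` module as a stand-in for PapaParse, so it illustrates the format rather than the tool's own parser.

```python
import csv
import io

# One cell containing a comma, doubled quotes, and an embedded newline.
text = 'id,notes\r\n1,"Hello, ""world""\nline2"\r\n'

rows = list(csv.reader(io.StringIO(text, newline="")))
print(rows[1][1] == 'Hello, "world"\nline2')  # True

# Re-serialising quotes the cell again, so the content survives intact.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
out = buffer.getvalue()
print(out == text)  # True
```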

Empty input file

Handled

An empty CSV (no rows) returns an empty result with zero rows in and out — no error. If you expected data, the file likely failed to export or has only a header. Confirm the source export actually contains rows before anonymizing.

Frequently asked questions

Will the original file with real names and emails ever be uploaded?

No. The CSV Anonymizer runs entirely in your browser via PapaParse — the source file is parsed and transformed locally, and only the anonymized output is downloadable. The unredacted file never reaches a server. The only thing saved server-side is a single usage counter (no content) for signed-in dashboard stats.

Can the vendor reverse the hashes back to real emails?

Not directly — the hash is a one-way digest. But because it's a fast, non-cryptographic FNV-1a hash, a vendor with a list of likely emails could brute-force matches if you used no salt. Always set a private salt for vendor shares: it's prepended to every value before hashing, so the vendor can't precompute tokens for common values. Treat the tokens as pseudonyms, not as strongly encrypted data.

How does the vendor count distinct customers if the email is hashed?

Hashing is deterministic: the same email plus the same salt always produces the same 16-character token. So COUNT(DISTINCT email_token) gives the correct number of unique customers, and the vendor can join two of your anonymized files on the token — all without ever seeing a real address. That's the main reason to choose hash over redact for a key column.

What's the difference between redact and drop?

Redact keeps the column but replaces every value with the literal [REDACTED] — useful when you want the row width and column to survive. Drop removes the column from the header and every row entirely, so it's not in the output at all. Use drop for data the vendor should never receive (home address); use redact when the column's presence matters but its content doesn't.
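
The difference in output shape is easiest to see side by side. This Python sketch uses hypothetical helper names and a list-of-lists row format; only the `[REDACTED]` literal comes from the tool's documented behaviour.

```python
REDACTED = "[REDACTED]"


def redact_column(header, rows, column):
    """Keep the column, replace every value with the [REDACTED] literal."""
    i = header.index(column)
    return list(header), [row[:i] + [REDACTED] + row[i + 1:] for row in rows]


def drop_column(header, rows, column):
    """Remove the column from the header and from every row."""
    i = header.index(column)
    return header[:i] + header[i + 1:], [row[:i] + row[i + 1:] for row in rows]


header = ["plan", "notes", "home_address"]
rows = [["Pro", "called twice", "12 Oak St"], ["Free", "", "98 Pine Ave"]]

print(redact_column(header, rows, "notes"))
# (['plan', 'notes', 'home_address'], [['Pro', '[REDACTED]', '12 Oak St'], ['Free', '[REDACTED]', '98 Pine Ave']])
print(drop_column(header, rows, "home_address"))
# (['plan', 'notes'], [['Pro', 'called twice'], ['Free', '']])
```

Redact preserves the row width, which matters if the vendor's import expects a fixed set of columns; drop is the safer choice when the column should not exist in the file at all.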

How do I make tokens match across two files I send a month apart?

Use the exact same salt both times. Because hashing is deterministic, jane@acme.com with salt retention-2026 produces the identical token in May and June, so the vendor can join the two files to track the same customer over time. Conversely, change the salt to deliberately break cross-file linkage.

Auto-detect missed my custom email column — why?

Auto-detect matches header names against a fixed set of PII patterns (email, phone, name, address, ssn, dob, credit card, iban, passport, and a few more). A header like loyalty_contact won't match because it doesn't contain a recognised PII word. Just add an explicit rule for it with the strategy you want. Remember: adding any explicit rule disables auto-detect for the rest of the columns, so add rules for all the columns you care about.

Does masking keep the @ and domain of an email?

Mask is character-position based, not email-aware. It keeps keepStart characters from the front and keepEnd from the end and stars everything in between; it has no concept of @ or domain. So masking jane@acme.com with keepStart 0 and keepEnd 4 leaves only .com visible. If you need email-shaped masking specifically, hash is usually the better choice for a vendor share; mask suits values where a fixed-position prefix/suffix is meaningful (phone, card).

Can I anonymize multiple columns with different strategies in one go?

Yes — that's the design. Add one rule per column, each with its own strategy: hash the email, mask the phone, redact the notes, drop the address. They all apply in a single pass, and the result panel lists every applied rule as a chip so you can confirm nothing was missed.

What does the output file get named?

The download is named after your input with .anon.csv appended — e.g. customers.csv becomes customers.anon.csv. That makes it easy to keep the anonymized copy distinct from the original. Keep the original local and share only the .anon.csv.

Is there a row or size cap?

On the free tier, CSV tools cap at 2 MB and 500 rows, and the anonymizer itself is a Pro feature. Pro raises the cap to 100 MB / 100,000 rows. For a larger customer export, sample it down with csv-row-limiter or split it with csv-row-splitter first, anonymize each part, then recombine with csv-merger.

Should I deduplicate before anonymizing?

If you plan to use sequential ids, yes — sequential numbers rows by position and won't merge duplicates, so dedupe first with csv-deduplicator. If you're hashing, you don't have to: duplicates naturally collapse to the same token, so the vendor can still count distinct correctly even with duplicate rows present.

Can I automate this in a pipeline so vendor files are anonymized on a schedule?

Yes — GET /api/v1/tools/csv-anonymizer returns the option schema; pair the @jadapps/runner once and POST the payload to 127.0.0.1:9789/v1/tools/csv-anonymizer/run. The data is processed by the local runner on your machine, so real PII never reaches JAD's servers. A common pipeline: nightly CRM export → runner anonymizes with a fixed salt → drop the .anon.csv into the vendor's shared folder.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
