Mask PII in a CSV for a GDPR / CCPA Data Request — Free Browser Tool

How to mask PII in a CSV for a privacy request

Step 1
Identify which columns are personal data under your obligation — List the direct identifiers (name, email, national ID, passport, IBAN) and quasi-identifiers (DOB, postcode) in your export. GDPR/CCPA scope is broader than 'name and email' — a postcode plus DOB plus gender can re-identify someone. Decide column-by-column whether each needs pseudonymising (hash/mask) or removing (redact/drop).
Step 2
Drop the CSV onto the anonymizer above — It parses in your browser — the personal record is never uploaded. Auto-detect (on by default) pre-fills hash rules for headers it recognises as PII. Treat that as a starting point, not a complete legal assessment — it only matches header names, not content.
Step 3
Decide hash vs redact vs drop for each identifier — Use hash when analysts must still link/count records (remember: with the salt it's still personal data). Use redact to blank a value to [REDACTED] while keeping the column. Use drop to remove the column entirely. For a data-subject access response you usually hash internal ids and drop or redact third-party PII that isn't the requester's.
Step 4
Set and safeguard the salt — For pseudonymisation, set a salt and store it separately from the output — the salt is effectively the key that makes the tokens reversible-by-re-derivation, so under GDPR it must be protected like the additional information that re-links pseudonymous data. If you want the tokens to be one-time and non-linkable, use a fresh random salt and don't keep it.
Step 5
Mask where a verifiable tail is required — For columns where a human still needs to confirm identity (last 4 of a card or national ID), use mask with keepEnd 4 and keepStart 0. Remember that masking a value shorter than your keep counts produces all stars, so set keep counts below the shortest value length.
Step 6
Anonymize, audit the stats, and download — Click Anonymize CSV. Record the stats (fields anonymized, columns dropped, applied rules) as part of your processing log — they document what was done. Verify the preview, then Download CSV (saved as <name>.anon.csv). Store the salt and the output according to your retention policy.

Strategy vs privacy operation

The table maps each strategy to the GDPR/CCPA concept it most closely serves. It describes the tool's behaviour from an engineering standpoint and is not legal advice.

Strategy	Output	Privacy operation	Still personal data?
Hash	Deterministic 16-char hex token; same value+salt → same token	Pseudonymisation — linkable if you hold the salt/source	Yes, if you can re-derive it (you keep the salt or source)
Mask	Keep front/back chars, star the middle (`****4567`)	Partial redaction — leaves a verifiable tail	Depends on how much is left; a tail like last-4 can still aid re-identification
Redact	Literal `[REDACTED]` in every cell	Field-level removal, column retained	No value remains in that column
Drop	Column removed from header and all rows	Field-level removal, column gone	Column absent from the file
Sequential	`id-1`, `id-2`… by row position	Pseudonymisation without a stable mapping (positional only)	Token isn't tied to value; linkage requires the original file's row order

Regulated headers auto-detect recognises

With no explicit rules and auto-detect on, these header patterns get a hash rule. Match is case-insensitive on the header text only.

Category	Header patterns matched	Suggested action for compliance
Contact identifiers	email, e-mail, phone, mobile	hash (linkable) or redact (remove)
Names	name, full name, first name, last name	hash or drop depending on request type
National / financial IDs	ssn, social security, passport, iban, credit card	drop or redact — rarely needed by analysts
Location	address, postcode, zip	drop quasi-identifiers; mask postcode if coarse geo is needed
Dates of birth	dob, birth date	drop or generalise externally — exact DOB is a strong quasi-identifier
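
To make "header text only" concrete, here is a minimal sketch of header matching against the patterns in the table. The matching rule (case-insensitive substring match, with underscores and hyphens treated as spaces) and the names `PII_PATTERNS` and `detect_pii_headers` are assumptions for illustration; the tool's real matcher may behave differently.

```python
# Hypothetical header matcher: it looks at header names only, never at cell contents.
PII_PATTERNS = {
    "contact": ["email", "e-mail", "phone", "mobile"],
    "name": ["name", "full name", "first name", "last name"],
    "national_or_financial_id": ["ssn", "social security", "passport", "iban", "credit card"],
    "location": ["address", "postcode", "zip"],
    "date_of_birth": ["dob", "birth date"],
}


def _normalise(text: str) -> str:
    # Assumed normalisation: lower-case, underscores and hyphens become spaces.
    return text.lower().replace("_", " ").replace("-", " ").strip()


def detect_pii_headers(headers: list[str]) -> dict[str, str]:
    """Return {header: category} for headers that contain a known pattern."""
    found: dict[str, str] = {}
    for header in headers:
        norm = _normalise(header)
        for category, patterns in PII_PATTERNS.items():
            if any(_normalise(p) in norm for p in patterns):
                found[header] = category
                break
    return found


headers = ["Email", "first_name", "SSN", "plan", "National Insurance No."]
detected = detect_pii_headers(headers)
# "National Insurance No." is regulated data but matches no pattern, so it is missed.
```

A column such as National Insurance No. slips through in this sketch because nothing in its header matches a pattern, which is why the auto-detected rules are a starting point rather than an assessment.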

Tier limits

Browser-side CSV limits. The CSV Anonymizer is a Pro feature.

Limit	Free	Pro
Max file size	2 MB	100 MB
Max rows	500	100,000
Processing location	Your browser	Your browser

Cookbook

Compliance-flavoured before/after rows. This is how the tool behaves, not a legal sign-off — confirm the operation matches your obligation.

Drop the national ID, hash the email for analysts

Example

Analysts need to keep counting distinct customers but the SSN must leave the dataset entirely. Hash the email (pseudonymisation — keep the salt secured) and drop the SSN column so it's gone from the file.

Input:
email,ssn,plan
jane@acme.com,123-45-6789,Pro
bob@globex.com,987-65-4321,Free

Rules: email → hash (salt: "priv-2026"); ssn → drop

Output:
email,plan
a3f10b9c4e7d2118,Pro
7c2e9f04b1a83dd6,Free

→ ssn column removed; email is a linkable pseudonym.
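
The same operation can be sketched in a few lines of Python. This assumes the 16-character token is a 64-bit FNV-1a digest and that the salt is simply prefixed to the value; both are guesses about the tool's internals, so the tokens produced here are not expected to match the illustrative ones above. `hash_and_drop` is a hypothetical helper name.

```python
import csv
import io

FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64_hex(text: str) -> str:
    """64-bit FNV-1a over the UTF-8 bytes, rendered as 16 hex characters."""
    h = FNV_OFFSET_64
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return f"{h:016x}"


def hash_and_drop(csv_text: str, hash_cols: list[str], drop_cols: list[str], salt: str) -> str:
    """Hash the listed columns (salt prefixed: an assumption) and drop others entirely."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [c for c in reader.fieldnames if c not in drop_cols]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in hash_cols:
            row[col] = fnv1a_64_hex(salt + row[col])
        writer.writerow({c: row[c] for c in kept})
    return out.getvalue()


source = (
    "email,ssn,plan\n"
    "jane@acme.com,123-45-6789,Pro\n"
    "bob@globex.com,987-65-4321,Free\n"
)
result = hash_and_drop(source, hash_cols=["email"], drop_cols=["ssn"], salt="priv-2026")
```

Running it twice with the same salt yields identical tokens, which is what keeps distinct-customer counts working and also what keeps the column personal data.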

Mask a card number to last 4 for support verification

Example

A support workflow needs the last 4 digits to verify a caller, but the full PAN must not be in the file. Mask with keepStart 0 / keepEnd 4.

Input:
name,card_number
Jane Doe,4111111111111234

Rules: card_number → mask (keepStart 0, keepEnd 4)
       name → hash

Output:
name,card_number
5f9c...d2,************1234

→ only the last 4 survive; name is pseudonymised.
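
A minimal sketch of the mask operation, written from the behaviour described on this page. The function name `mask_value` is hypothetical. The text only says that a value shorter than the keep counts becomes all stars; this sketch also stars a value whose length exactly equals the keep counts, which is an assumption chosen so that nothing passes through unmasked.

```python
def mask_value(value: str, keep_start: int = 0, keep_end: int = 4, star: str = "*") -> str:
    """Keep the first keep_start and last keep_end characters, star the rest."""
    if len(value) <= keep_start + keep_end:
        # Too short to leave a masked middle: star the whole value.
        return star * len(value)
    head = value[:keep_start]
    tail = value[len(value) - keep_end:]
    middle = star * (len(value) - keep_start - keep_end)
    return head + middle + tail


print(mask_value("4111111111111234"))      # ************1234
print(mask_value("123"))                   # ***
print(mask_value("AB123456", 2, 2))        # AB****56
```

The short-value case is the one to watch in practice: a column mixing 16-digit card numbers with 3-character placeholders will show all stars for the placeholders.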

Redact a free-text notes column that may contain third-party PII

Example

A notes field might mention other people (not the data subject). For a data-subject access response you keep the column structure but blank its content with redact.

Input:
ticket_id,notes
T-1,"Spoke to Jane; her sister Mary called too"
T-2,"Refund issued"

Rule: notes → redact

Output:
ticket_id,notes
T-1,[REDACTED]
T-2,[REDACTED]

→ column kept, all third-party-risk text removed.

One-time non-linkable tokens with a throwaway salt

Example

For a release where the data must NOT be re-linkable to your records, use a fresh random salt and discard it. The tokens are stable within this one file but you can't reproduce them later.

Rule: customer_id → hash (salt: "4Kq9...random...discard")

  ACME-7781 → 1b8e44c9f0a2d7e3 (this file only)

Discard the salt → you can no longer derive the token from the
source, so the tokens are not re-linkable by you on a future run.
(Within this file, identical ids still share a token.)
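
The throwaway-salt idea can be sketched as below. It assumes a 64-bit FNV-1a digest with the salt prefixed to the value (not confirmed by this page), and `tokenise_once` is a hypothetical name. The salt is generated inside the function and never returned or stored, so a later run cannot reproduce the tokens.

```python
import secrets

FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64_hex(text: str) -> str:
    h = FNV_OFFSET_64
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return f"{h:016x}"


def tokenise_once(values: list[str]) -> list[str]:
    """Hash every value with a fresh random salt that is discarded on return."""
    salt = secrets.token_hex(16)  # 128 random bits, used for this call only
    return [fnv1a_64_hex(salt + v) for v in values]


ids = ["ACME-7781", "ACME-9001", "ACME-7781"]
run_a = tokenise_once(ids)
run_b = tokenise_once(ids)
# Within a run, equal ids share a token; across runs the tokens do not line up.
```

If you paste a salt into the tool by hand instead, the equivalent discipline is to generate it randomly and not record it anywhere, including the processing log.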

Generalise DOB outside the tool, then drop the exact date

Example

Exact DOB is a strong quasi-identifier. The anonymizer can't bucket dates into year-only — do that with another tool first, then drop the precise column.

Step 1 (csv-find-replace, regex):
  dob '1987-03-14' → year '1987'  in a new birth_year column

Step 2 (this tool):
  Rule: dob → drop

Output keeps birth_year, removes the exact dob column.
→ coarser data, lower re-identification risk.
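
If you would rather script the generalisation than use csv-find-replace, the sketch below does both steps in one pass for ISO-formatted dates. It assumes dates look like `YYYY-MM-DD`; anything else is written as an empty `birth_year` rather than guessed. `generalise_dob` is a hypothetical helper, not part of the tool.

```python
import csv
import io
import re

ISO_DATE = re.compile(r"^(\d{4})-\d{2}-\d{2}$")


def generalise_dob(csv_text: str, dob_col: str = "dob", year_col: str = "birth_year") -> str:
    """Add a year-only column derived from dob_col, then omit dob_col from the output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [c for c in reader.fieldnames if c != dob_col] + [year_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        match = ISO_DATE.match(row[dob_col].strip())
        # Unparseable dates become empty instead of leaking the raw value.
        row[year_col] = match.group(1) if match else ""
        writer.writerow({c: row[c] for c in fields})
    return out.getvalue()


sample = "id,dob\n1,1987-03-14\n2,14/03/1987\n"
generalised = generalise_dob(sample)
```

Leaving non-ISO values empty is a deliberate choice here: a silent pass-through of an unrecognised date format would put the exact DOB back in the file.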

Errors and edge cases

Caveats, limitations, and silent failures to check before you rely on the output. Find the entry that matches your situation and apply the fix it describes.

Hashing is pseudonymisation, not anonymisation

Legal caveat

Under GDPR, a hashed value you can re-derive (because you keep the salt and source) is still personal data — it's pseudonymous, not anonymous. Don't treat a hashed export as out-of-scope. If you need data genuinely outside GDPR's reach, you must break the link entirely (drop/redact the identifier, discard the salt) and ensure remaining quasi-identifiers can't re-identify someone.

FNV-1a is not a compliance-grade hash

Security limitation

The tool uses FNV-1a (fast, non-cryptographic). Without a salt it's brute-forceable for low-entropy values; even with a salt it isn't a certified de-identification method. For regulated, audited de-identification (HIPAA Safe Harbor, formal DPIA-backed pseudonymisation), use a process your compliance team has reviewed. This tool is for practical pseudonymisation of internal/analyst-facing exports.
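
To show why an unsalted hash of a low-entropy value offers little protection, the sketch below precomputes tokens for every 5-digit ZIP code and looks a token up. It assumes 64-bit FNV-1a with the salt prefixed to the value, which is a guess at the tool's construction; the salt string is the one from the cookbook example, and the function names are hypothetical.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3


def fnv1a_64_hex(text: str) -> str:
    h = FNV_OFFSET_64
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return f"{h:016x}"


def build_lookup(candidates, salt: str = "") -> dict[str, str]:
    """Map token -> original value for every candidate (a dictionary attack)."""
    return {fnv1a_64_hex(salt + c): c for c in candidates}


def all_zip_codes():
    return (f"{n:05d}" for n in range(100_000))


# Unsalted: 100,000 hashes are enough to reverse any ZIP token.
unsalted_lookup = build_lookup(all_zip_codes())
recovered = unsalted_lookup.get(fnv1a_64_hex("90210"))

# Salted: the same table misses, unless the attacker also holds the salt.
salted_token = fnv1a_64_hex("priv-2026" + "90210")
without_salt = unsalted_lookup.get(salted_token)
with_leaked_salt = build_lookup(all_zip_codes(), salt="priv-2026").get(salted_token)
```

The last line is the reason the salt needs the same protection as a key: once it leaks, the table can simply be rebuilt.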

Quasi-identifiers can re-identify after you remove direct ones

Legal caveat

Dropping name and email isn't enough if DOB + postcode + gender remain — that combination can single out individuals. The tool operates per column and has no view of combined re-identification risk. Assess the whole row: drop or generalise quasi-identifiers (use csv-find-replace to coarsen postcodes/dates) before considering the file de-identified.
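
Because the tool has no view across columns, a quick external check can help. The sketch below counts how many rows share each combination of quasi-identifier values and returns the combinations that occur exactly once, that is, rows that can be singled out. It is a rough screening aid under stated assumptions (exact string comparison, hypothetical column names), not a full re-identification risk assessment.

```python
import csv
import io
from collections import Counter


def unique_combinations(csv_text: str, quasi_cols: list[str]) -> list[tuple[str, ...]]:
    """Return quasi-identifier combinations that appear in exactly one row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[c] for c in quasi_cols) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


export = (
    "token,dob,postcode,gender\n"
    "a1,1987-03-14,SW1A,F\n"
    "b2,1987-03-14,SW1A,F\n"
    "c3,1990-07-01,M1,M\n"
    "d4,1975-11-30,EH1,F\n"
)
singled_out = unique_combinations(export, ["dob", "postcode", "gender"])
coarser = unique_combinations(export, ["gender"])
```

In this sample two rows are unique on the full combination, while no row is unique on gender alone; coarsening or dropping columns moves a file from the first situation towards the second.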

Adding one rule disables auto-detect for the rest

Behaviour to know

Auto-detect only runs when there are zero explicit rules. The moment you add a rule, auto-detect stops — so a regulated column you assumed was being auto-hashed will pass through untouched. After adding any rule, add explicit rules for every PII column, and confirm via the 'Applied rules' chips that each regulated field was acted on.

Mask leaves a re-identifying tail

Legal caveat

Keeping the last 4 of a national ID or card helps verification but is still partial PII — combined with other fields it can aid re-identification, and some regulators treat masked-but-partial identifiers as personal data. Use mask only where the verifiable tail is genuinely needed; prefer redact/drop where it isn't.

Sequential ids depend on row order for any linkage

Behaviour to know

Sequential assigns ids by position (id-1, id-2), so the only way to map a token back is to have the original file in the same row order. That makes it weakly linkable and not stable across files. For an access request where you must reliably correlate rows, hash a key column instead.
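
A small sketch of positional ids makes the limitation visible. The `id-N` format comes from the strategy table; the function name `sequential_ids` is hypothetical.

```python
def sequential_ids(values: list[str], prefix: str = "id-") -> list[str]:
    """Assign ids purely by row position; the cell value is ignored."""
    return [f"{prefix}{position}" for position, _ in enumerate(values, start=1)]


monday = ["jane@acme.com", "bob@globex.com"]
tuesday = ["bob@globex.com", "jane@acme.com"]  # same people, different row order

monday_map = dict(zip(monday, sequential_ids(monday)))
tuesday_map = dict(zip(tuesday, sequential_ids(tuesday)))
# jane is id-1 on Monday and id-2 on Tuesday, so the ids cannot join the two files.
```

A repeated value also gets a different id on each row, so sequential ids do not support distinct counts either.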

Free tier blocks the file

Blocked

This tool is Pro, and free CSV limits are 2 MB / 500 rows. A full personal-data export usually exceeds that. Upgrade, or for a one-off redaction split the file with csv-row-splitter, process each part, and recombine — keeping every intermediate file local.

The salt is the key — losing or leaking it matters

Operational risk

If you keep tokens linkable, the salt is the additional information that re-links them. Leak it and the pseudonymisation is undone for anyone who also has a candidate value list; lose it and you can no longer reproduce the tokens for a future linked export. Store the salt with the same care as any encryption key, separate from the output file.

Header name mismatch leaves a regulated column untouched

Silent miss

Rules match the exact header text. If your export labels the column National Insurance No. but you wrote a rule for ssn, nothing happens to it. Always pick the column from the dropdown (populated from the file) and verify in the preview that every regulated column changed — a silent miss here is a compliance gap.
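
If you prepare rules outside the UI, a pre-flight check along these lines can catch the mismatch before it becomes a gap. The rule format (a dict of column name to strategy) and the name `unmatched_rule_columns` are assumptions for illustration, not the tool's own schema.

```python
import csv
import io


def unmatched_rule_columns(csv_text: str, rules: dict[str, str]) -> list[str]:
    """Return rule columns that do not exactly match any header in the file."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [column for column in rules if column not in header]


export = "name,National Insurance No.\nJane Doe,QQ123456C\n"
rules = {"name": "hash", "ssn": "drop"}  # hypothetical rule set
missing = unmatched_rule_columns(export, rules)
# missing == ["ssn"]: the rule would never fire, and the real column stays untouched.
```

An empty result only proves that every rule points at a real column; it does not prove that every regulated column has a rule, so the preview check still applies.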

Frequently asked questions

Is a hashed column anonymous under GDPR?

No — if you can re-derive the token (you keep the salt and have the source), it's pseudonymous, which GDPR still treats as personal data. Hashing reduces exposure and is a recognised safeguard, but it doesn't take the data out of scope. To approach genuine anonymisation you must remove the direct identifiers (drop/redact), discard the salt, and ensure remaining fields can't re-identify someone.

Which strategy should I use for a national ID or passport number?

Usually drop (remove the column) or redact (blank to [REDACTED]) — analysts rarely need the raw value. If a verification workflow needs a tail, mask keeping the last 4. Avoid hashing a national ID unless you specifically need to link records on it, and even then guard the salt, because a hashed-but-linkable national ID is still personal data.

Does processing the file create a data transfer I have to log?

The anonymisation itself happens in your browser — the file isn't uploaded, so the tool doesn't introduce a third-party transfer. You should still log the processing activity (what columns, what operation) per your records-of-processing obligations; the result panel's applied-rules and dropped-columns stats make a useful audit note.

How do I make tokens non-linkable for a public-ish release?

Use a fresh random salt for that single run and discard it afterwards. Identical values within the one file still share a token (so distinct-counts work), but because you no longer hold the salt you can't reproduce or reverse the mapping later. Combine with dropping quasi-identifiers to lower re-identification risk.

Can the tool bucket dates of birth into age ranges?

No — it works per cell and doesn't transform date values into ranges. To generalise DOB (e.g. keep year only or bucket into ranges) use csv-find-replace with a regex first, then drop the exact-date column with this tool. Exact DOB is a strong quasi-identifier, so generalising or dropping it matters.

Is FNV-1a hashing strong enough for compliance?

It's adequate for practical internal pseudonymisation when combined with a private salt, but it's not a cryptographic or certified de-identification method. For audited regimes (HIPAA Safe Harbor, formal DPIA pseudonymisation) follow a process your compliance team has signed off. Treat this tool as a fast way to reduce exposure in analyst-facing exports, not as legal de-identification.

Will dropping a column shift my other data?

No — drop removes the column cleanly from the header and every row, and the remaining columns keep their values and alignment. Parsing is RFC-4180-aware, so quoted commas/newlines in surviving columns are preserved. Check the preview to confirm the right column went and the rest line up.

Can I keep a verification tail without exposing the full value?

Yes — mask with keepStart 0 and keepEnd 4 leaves the last four characters and stars the rest. Be aware that a value shorter than your keep counts becomes all stars, so set keepEnd below the shortest value length in the column. And remember a tail is still partial PII — only keep it where a workflow genuinely needs it.

Does anything about the personal data leave my machine?

No content does. PapaParse parses and transforms the file locally in your browser, and only the anonymised output is downloadable. The single server-side counter records that a tool ran (no content) for signed-in stats and can be opted out of. This is by design: the file you're de-identifying is itself the sensitive asset.

What if my export is bigger than the row limit?

Free tier caps at 2 MB / 500 rows and the tool is Pro; Pro allows 100 MB / 100,000 rows. For larger personal-data exports on a one-off basis, split with csv-row-splitter, anonymise each chunk with the same salt, then recombine with csv-merger, keeping every intermediate file local.

How do I document what I redacted for the request file?

After running, the result panel shows fields anonymised, columns dropped, and the exact applied rules as chips. Capture that as your processing note. For a complete audit trail you'd also record the salt handling (kept vs discarded) and which columns were dropped versus pseudonymised.

Can I run this on a schedule for recurring redaction jobs?

Yes — GET /api/v1/tools/csv-anonymizer returns the option schema; pair the @jadapps/runner and POST to 127.0.0.1:9789/v1/tools/csv-anonymizer/run. Everything runs on your local runner, so personal data never reaches JAD's servers — appropriate for a compliance pipeline that redacts nightly exports before they reach a reporting store.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
