How to mask PII in a CSV for a privacy request
- Step 1: Identify which columns are personal data under your obligation. List the direct identifiers (name, email, national ID, passport, IBAN) and quasi-identifiers (DOB, postcode) in your export. GDPR/CCPA scope is broader than 'name and email': a postcode plus DOB plus gender can re-identify someone. Decide column by column whether each needs pseudonymising (hash/mask) or removing (redact/drop).
- Step 2: Drop the CSV onto the anonymizer above. It parses in your browser, so the personal record is never uploaded. Auto-detect (on by default) pre-fills hash rules for headers it recognises as PII. Treat that as a starting point, not a complete legal assessment: it only matches header names, not content.
- Step 3: Decide hash vs redact vs drop for each identifier. Use hash when analysts must still link/count records (remember: with the salt it's still personal data). Use redact to blank a value to [REDACTED] while keeping the column. Use drop to remove the column entirely. For a data-subject access response you usually hash internal ids and drop or redact third-party PII that isn't the requester's.
- Step 4: Set and safeguard the salt. For pseudonymisation, set a salt and store it separately from the output. The salt is effectively the key: anyone holding it can re-derive tokens from candidate values, so under GDPR it must be protected like the additional information that re-links pseudonymous data. If you want the tokens to be one-time and non-linkable, use a fresh random salt and don't keep it.
- Step 5: Mask where a verifiable tail is required. For columns where a human still needs to confirm identity (last 4 of a card or national ID), use mask with keepStart 0 and keepEnd 4. Remember that masking a value shorter than your keep counts produces all stars, so set keep counts below the shortest value length.
- Step 6: Anonymize, audit the stats, and download. Click Anonymize CSV. Record the stats (fields anonymized, columns dropped, applied rules) as part of your processing log; they document what was done. Verify the preview, then Download CSV (saved as <name>.anon.csv). Store the salt and the output according to your retention policy.
Strategy vs privacy operation
Map each strategy to the GDPR/CCPA concept it most closely serves. This is an engineering description of the tool's behaviour, not legal advice.
| Strategy | Output | Privacy operation | Still personal data? |
|---|---|---|---|
| Hash | Deterministic 16-char hex token; same value+salt → same token | Pseudonymisation — linkable if you hold the salt/source | Yes, if you can re-derive it (you keep the salt or source) |
| Mask | Keep front/back chars, star the middle (****4567) | Partial redaction — leaves a verifiable tail | Depends on how much is left; a tail like last-4 can still aid re-identification |
| Redact | Literal [REDACTED] in every cell | Field-level removal, column retained | No value remains in that column |
| Drop | Column removed from header and all rows | Field-level removal, column gone | Column absent from the file |
| Sequential | id-1, id-2… by row position | Pseudonymisation without a stable mapping (positional only) | Token isn't tied to value; linkage requires the original file's row order |
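The per-value behaviour in the table can be sketched as below. This is an illustration, not the tool's source: the function names are hypothetical, the 64-bit FNV-1a variant is assumed because it yields 16 hex characters, and prepending the salt to the value is an assumption about how the two are combined.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def hash_value(value: str, salt: str = "") -> str:
    # Assumption: the salt is prepended; the tool may combine salt and value differently.
    return format(fnv1a_64((salt + value).encode("utf-8")), "016x")


def mask_value(value: str, keep_start: int = 0, keep_end: int = 4) -> str:
    # A value shorter than the kept characters is starred entirely.
    if len(value) < keep_start + keep_end:
        return "*" * len(value)
    middle = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * middle + value[len(value) - keep_end:]


def redact_value(_value: str) -> str:
    # The original content is ignored; every cell gets the same literal.
    return "[REDACTED]"


def sequential_value(row_index: int) -> str:
    # Depends only on the 1-based row position, never on the cell content.
    return f"id-{row_index}"
```

Because `hash_value` is deterministic for a given value and salt, anyone holding the salt and a list of candidate values can rebuild the mapping, which is why the table marks hashed output as still personal data.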
Regulated headers auto-detect recognises
With no explicit rules and auto-detect on, these header patterns get a hash rule. Match is case-insensitive on the header text only.
| Category | Header patterns matched | Suggested action for compliance |
|---|---|---|
| Contact identifiers | email, e-mail, phone, mobile | hash (linkable) or redact (remove) |
| Names | name, full name, first name, last name | hash or drop depending on request type |
| National / financial IDs | ssn, social security, passport, iban, credit card | drop or redact — rarely needed by analysts |
| Location | address, postcode, zip | drop quasi-identifiers; mask postcode if coarse geo is needed |
| Dates of birth | dob, birth date | drop or generalise externally — exact DOB is a strong quasi-identifier |
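A rough sketch of header-only detection, using the patterns from the table. The matching details are assumptions: this version treats underscores as spaces and accepts a pattern anywhere in the header, and the tool's real matcher may be stricter or looser. The function names are hypothetical.

```python
PII_PATTERNS = (
    "email", "e-mail", "phone", "mobile",
    "name", "full name", "first name", "last name",
    "ssn", "social security", "passport", "iban", "credit card",
    "address", "postcode", "zip",
    "dob", "birth date",
)


def looks_like_pii(header: str) -> bool:
    # Only the header text is inspected; cell contents are never read.
    normalised = header.strip().lower().replace("_", " ")
    return any(pattern in normalised for pattern in PII_PATTERNS)


def auto_detect(headers: list[str], explicit_rules: dict[str, str]) -> dict[str, str]:
    # Auto-detect applies only when there are no explicit rules at all.
    if explicit_rules:
        return dict(explicit_rules)
    return {h: "hash" for h in headers if looks_like_pii(h)}
```

A header such as `National Insurance No.` matches none of these patterns, so a header-only check would pass it through untouched.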
Tier limits
Browser-side CSV limits. The CSV Anonymizer is a Pro feature.
| Limit | Free | Pro |
|---|---|---|
| Max file size | 2 MB | 100 MB |
| Max rows | 500 | 100,000 |
| Processing location | Your browser | Your browser |
Cookbook
Compliance-flavoured before/after rows. This is how the tool behaves, not a legal sign-off — confirm the operation matches your obligation.
Drop the national ID, hash the email for analysts
Example: Analysts need to keep counting distinct customers but the SSN must leave the dataset entirely. Hash the email (pseudonymisation, so keep the salt secured) and drop the SSN column so it's gone from the file.
Input:
email,ssn,plan
jane@acme.com,123-45-6789,Pro
bob@globex.com,987-65-4321,Free
Rules: email → hash (salt: "priv-2026"); ssn → drop
Output:
email,plan
a3f10b9c4e7d2118,Pro
7c2e9f04b1a83dd6,Free
→ ssn column removed; email is a linkable pseudonym.
Mask a card number to last 4 for support verification
Example: A support workflow needs the last 4 digits to verify a caller, but the full PAN must not be in the file. Mask with keepStart 0 / keepEnd 4.
Input:
name,card_number
Jane Doe,4111111111111234
Rules: card_number → mask (keepStart 0, keepEnd 4)
name → hash
Output:
name,card_number
5f9c...d2,************1234
→ only the last 4 survive; name is pseudonymised.
Redact a free-text notes column that may contain third-party PII
Example: A notes field might mention other people (not the data subject). For a data-subject access response you keep the column structure but blank its content with redact.
Input:
ticket_id,notes
T-1,"Spoke to Jane; her sister Mary called too"
T-2,"Refund issued"
Rule: notes → redact
Output:
ticket_id,notes
T-1,[REDACTED]
T-2,[REDACTED]
→ column kept, all third-party-risk text removed.
One-time non-linkable tokens with a throwaway salt
Example: For a release where the data must NOT be re-linkable to your records, use a fresh random salt and discard it. The tokens are stable within this one file but you can't reproduce them later.
Rule: customer_id → hash (salt: "4Kq9...random...discard")
ACME-7781 → 1b8e44c9f0a2d7e3 (this file only)
Discard the salt → you can no longer derive the token from the source, so the tokens are not re-linkable by you on a future run.
(Within this file, identical ids still share a token.)
Generalise DOB outside the tool, then drop the exact date
Example: Exact DOB is a strong quasi-identifier. The anonymizer can't bucket dates into year-only, so do that with another tool first, then drop the precise column.
Step 1 (csv-find-replace, regex): dob '1987-03-14' → year '1987' in a new birth_year column
Step 2 (this tool): Rule: dob → drop
Output keeps birth_year, removes the exact dob column.
→ coarser data, lower re-identification risk.
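For readers who prefer to script the generalisation, here is a standalone sketch of the same two steps using Python's standard `csv` module. It is a stand-in for csv-find-replace and the anonymizer, not a description of either; the `generalise_dob` name and the assumption that dates are ISO-formatted (YYYY-MM-DD) are mine.

```python
import csv
import io


def generalise_dob(csv_text: str, dob_column: str = "dob") -> str:
    """Replace an exact date-of-birth column with a coarser birth_year column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != dob_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept + ["birth_year"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumes ISO dates; anything else is left blank rather than guessed.
        dob = row.pop(dob_column, "") or ""
        row["birth_year"] = dob[:4] if len(dob) >= 4 and dob[:4].isdigit() else ""
        writer.writerow(row)
    return out.getvalue()
```

Leaving unparseable dates blank is a deliberate choice here: a silently wrong year is harder to spot in review than an empty cell.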
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Hashing is pseudonymisation, not anonymisation
Legal caveat: Under GDPR, a hashed value you can re-derive (because you keep the salt and source) is still personal data: it's pseudonymous, not anonymous. Don't treat a hashed export as out-of-scope. If you need data genuinely outside GDPR's reach, you must break the link entirely (drop/redact the identifier, discard the salt) and ensure remaining quasi-identifiers can't re-identify someone.
FNV-1a is not a compliance-grade hash
Security limitation: The tool uses FNV-1a (fast, non-cryptographic). Without a salt it's brute-forceable for low-entropy values; even with a salt it isn't a certified de-identification method. For regulated, audited de-identification (HIPAA Safe Harbor, formal DPIA-backed pseudonymisation), use a process your compliance team has reviewed. This tool is for practical pseudonymisation of internal/analyst-facing exports.
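To make the brute-force point concrete, the sketch below enumerates every 4-digit value and recovers the input behind an unsalted token. It assumes the 64-bit FNV-1a variant and a salt-prepended input, neither of which is confirmed as the tool's exact construction; `token` and `brute_force` are hypothetical names.

```python
def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def token(value: str, salt: str = "") -> str:
    # Assumption: salt + value, hex-encoded to 16 characters.
    return format(fnv1a_64((salt + value).encode("utf-8")), "016x")


def brute_force(target: str, candidates, salt: str = ""):
    """Return the first candidate whose token equals the target, else None."""
    for candidate in candidates:
        if token(candidate, salt) == target:
            return candidate
    return None


# A low-entropy field: only 10,000 possible values.
pins = [f"{n:04d}" for n in range(10_000)]

unsalted = token("4821")
recovered = brute_force(unsalted, pins)          # finds "4821"

salted = token("4821", salt="kept-secret")
missed = brute_force(salted, pins)               # None: attacker lacks the salt
```

The second lookup fails only because the salt is unknown to the attacker. If the salt leaks, the same enumeration succeeds again.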
Quasi-identifiers can re-identify after you remove direct ones
Legal caveat: Dropping name and email isn't enough if DOB + postcode + gender remain, because that combination can single out individuals. The tool operates per column and has no view of combined re-identification risk. Assess the whole row: drop or generalise quasi-identifiers (use csv-find-replace to coarsen postcodes/dates) before considering the file de-identified.
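One simple way to assess the whole row is to count how many records share each quasi-identifier combination and flag the combinations that occur exactly once. The sketch below is an illustrative check run outside the tool; the `singled_out` name and the sample rows are made up for the example.

```python
from collections import Counter


def singled_out(rows: list[dict], quasi_identifiers: list[str]) -> list[tuple]:
    """Return quasi-identifier combinations that appear in exactly one row."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


sample = [
    {"dob": "1987-03-14", "postcode": "AB1", "gender": "F"},
    {"dob": "1990-01-01", "postcode": "AB1", "gender": "M"},
    {"dob": "1990-01-01", "postcode": "AB1", "gender": "M"},
]

unique_rows = singled_out(sample, ["dob", "postcode", "gender"])
```

An empty result does not prove the file is de-identified. It only shows that no row is unique on the columns you chose to check.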
Adding one rule disables auto-detect for the rest
Behaviour to know: Auto-detect only runs when there are zero explicit rules. The moment you add a rule, auto-detect stops, so a regulated column you assumed was being auto-hashed will pass through untouched. After adding any rule, add explicit rules for every PII column, and confirm via the 'Applied rules' chips that each regulated field was acted on.
Mask leaves a re-identifying tail
Legal caveat: Keeping the last 4 of a national ID or card helps verification but is still partial PII. Combined with other fields it can aid re-identification, and some regulators treat masked-but-partial identifiers as personal data. Use mask only where the verifiable tail is genuinely needed; prefer redact/drop where it isn't.
Sequential ids depend on row order for any linkage
Behaviour to know: Sequential assigns ids by position (id-1, id-2), so the only way to map a token back is to have the original file in the same row order. That makes it weakly linkable and not stable across files. For an access request where you must reliably correlate rows, hash a key column instead.
Free tier blocks the file
Blocked: This tool is Pro, and free CSV limits are 2 MB / 500 rows. A full personal-data export usually exceeds that. Upgrade, or for a one-off redaction split the file with csv-row-splitter, process each part, and recombine, keeping every intermediate file local.
The salt is the key — losing or leaking it matters
Operational risk: If you keep tokens linkable, the salt is the additional information that re-links them. Leak it and the pseudonymisation is undone for anyone who also has a candidate value list; lose it and you can no longer reproduce the tokens for a future linked export. Store the salt with the same care as any encryption key, separate from the output file.
Header name mismatch leaves a regulated column untouched
Silent miss: Rules match the exact header text. If your export labels the column National Insurance No. but you wrote a rule for ssn, nothing happens to it. Always pick the column from the dropdown (populated from the file) and verify in the preview that every regulated column changed. A silent miss here is a compliance gap.
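If you keep your rules in a script or config, a pre-flight check like the following can surface this kind of miss before you process the file. It is a hypothetical helper, not part of the tool, and it assumes the comparison is an exact, case-sensitive string match.

```python
def unmatched_rules(headers: list[str], rules: dict[str, str]) -> list[str]:
    """Return rule targets that do not correspond to any header in the file."""
    present = set(headers)
    return [column for column in rules if column not in present]


headers = ["National Insurance No.", "email"]
rules = {"ssn": "drop", "email": "hash"}
missing = unmatched_rules(headers, rules)
```

Here `missing` contains `ssn`, which tells you the drop rule would never fire and the `National Insurance No.` column would pass through unchanged.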
Frequently asked questions
Is a hashed column anonymous under GDPR?
No — if you can re-derive the token (you keep the salt and have the source), it's pseudonymous, which GDPR still treats as personal data. Hashing reduces exposure and is a recognised safeguard, but it doesn't take the data out of scope. To approach genuine anonymisation you must remove the direct identifiers (drop/redact), discard the salt, and ensure remaining fields can't re-identify someone.
Which strategy should I use for a national ID or passport number?
Usually drop (remove the column) or redact (blank to [REDACTED]) — analysts rarely need the raw value. If a verification workflow needs a tail, mask keeping the last 4. Avoid hashing a national ID unless you specifically need to link records on it, and even then guard the salt, because a hashed-but-linkable national ID is still personal data.
Does processing the file create a data transfer I have to log?
The anonymisation itself happens in your browser — the file isn't uploaded, so the tool doesn't introduce a third-party transfer. You should still log the processing activity (what columns, what operation) per your records-of-processing obligations; the result panel's applied-rules and dropped-columns stats make a useful audit note.
How do I make tokens non-linkable for a public-ish release?
Use a fresh random salt for that single run and discard it afterwards. Identical values within the one file still share a token (so distinct-counts work), but because you no longer hold the salt you can't reproduce or reverse the mapping later. Combine with dropping quasi-identifiers to lower re-identification risk.
Can the tool bucket dates of birth into age ranges?
No — it works per cell and doesn't transform date values into ranges. To generalise DOB (e.g. keep year only or bucket into ranges) use csv-find-replace with a regex first, then drop the exact-date column with this tool. Exact DOB is a strong quasi-identifier, so generalising or dropping it matters.
Is FNV-1a hashing strong enough for compliance?
It's adequate for practical internal pseudonymisation when combined with a private salt, but it's not a cryptographic or certified de-identification method. For audited regimes (HIPAA Safe Harbor, formal DPIA pseudonymisation) follow a process your compliance team has signed off. Treat this tool as a fast way to reduce exposure in analyst-facing exports, not as legal de-identification.
Will dropping a column shift my other data?
No — drop removes the column cleanly from the header and every row, and the remaining columns keep their values and alignment. Parsing is RFC-4180-aware, so quoted commas/newlines in surviving columns are preserved. Check the preview to confirm the right column went and the rest line up.
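The alignment guarantee comes from parsing fields properly instead of splitting on commas. A minimal sketch with Python's standard `csv` module shows the idea; it illustrates the principle only and is not how the browser tool (which uses PapaParse) is implemented. The `drop_columns` name is hypothetical.

```python
import csv
import io


def drop_columns(csv_text: str, to_drop: set[str]) -> str:
    """Remove named columns while keeping quoted commas in other fields intact."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in to_drop]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        # Guard against short rows so a ragged line does not raise IndexError.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()
```

A field such as `"Refund, issued"` survives as one cell because the parser honours the quotes, so the columns after it do not shift.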
Can I keep a verification tail without exposing the full value?
Yes — mask with keepStart 0 and keepEnd 4 leaves the last four characters and stars the rest. Be aware that a value shorter than your keep counts becomes all stars, so set keepEnd below the shortest value length in the column. And remember a tail is still partial PII — only keep it where a workflow genuinely needs it.
Does anything about the personal data leave my machine?
No content does. PapaParse parses and transforms the file locally in your browser, and only the anonymised output is downloadable. The single server-side counter records that a tool ran (no content) for signed-in stats and can be opted out of. This is by design: the file you're de-identifying is itself the sensitive asset.
What if my export is bigger than the row limit?
Free tier caps at 2 MB / 500 rows and the tool is Pro; Pro allows 100 MB / 100,000 rows. For larger personal-data exports on a one-off basis, split with csv-row-splitter, anonymise each chunk with the same salt, then recombine with csv-merger, keeping every intermediate file local.
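The split, process, recombine flow can be sketched as below. It is a stand-in for csv-row-splitter and csv-merger, with hypothetical function names; `transform` represents whatever anonymisation you apply to each chunk, and the point of passing one callable is that every chunk gets the same rules and the same salt.

```python
from typing import Callable

Row = list[str]


def split_rows(rows: list[Row], chunk_size: int) -> list[list[Row]]:
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def process_in_chunks(
    header: Row,
    rows: list[Row],
    chunk_size: int,
    transform: Callable[[list[Row]], list[Row]],
) -> list[Row]:
    merged = [header]
    for chunk in split_rows(rows, chunk_size):
        # One transform for all chunks keeps tokens consistent across the merged file.
        merged.extend(transform(chunk))
    return merged
```

If different chunks were hashed with different salts, the same customer would receive different tokens in each part and distinct counts on the merged file would be wrong.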
How do I document what I redacted for the request file?
After running, the result panel shows fields anonymised, columns dropped, and the exact applied rules as chips. Capture that as your processing note. For a complete audit trail you'd also record the salt handling (kept vs discarded) and which columns were dropped versus pseudonymised.
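One possible shape for that processing note, as a sketch. The field names are illustrative and not an export format the tool provides. The note records how the salt was handled but never the salt itself.

```python
import json
from datetime import datetime, timezone


def processing_note(source_file: str, rules: dict[str, str],
                    fields_anonymised: int, salt_kept: bool) -> dict:
    return {
        "source_file": source_file,
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "fields_anonymised": fields_anonymised,
        "columns_dropped": sorted(c for c, s in rules.items() if s == "drop"),
        "columns_pseudonymised": sorted(c for c, s in rules.items() if s == "hash"),
        "applied_rules": dict(rules),
        # Record the handling decision only; the salt value stays out of the log.
        "salt_handling": "kept" if salt_kept else "discarded",
    }


note = processing_note("export.csv", {"email": "hash", "ssn": "drop"}, 2, True)
note_json = json.dumps(note, indent=2)
```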
Can I run this on a schedule for recurring redaction jobs?
Yes — GET /api/v1/tools/csv-anonymizer returns the option schema; pair the @jadapps/runner and POST to 127.0.0.1:9789/v1/tools/csv-anonymizer/run. Everything runs on your local runner, so personal data never reaches JAD's servers — appropriate for a compliance pipeline that redacts nightly exports before they reach a reporting store.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.