How to remove PII columns from a CSV before sharing
- Step 1: Export the dataset with PII included. Download the full CSV from your CRM, database, or warehouse. Keep this original secured: the tool never alters it, and you only share the stripped copy.
- Step 2: Drop the file onto the remover above. PapaParse reads the header row locally and lists every column as a checkbox. The data never leaves your browser. Free accepts up to 2 MB and 500 data rows; bigger datasets need Pro.
- Step 3: Tick every direct identifier. Select `FullName` (and split `FirstName`/`LastName`), `Email`, `Phone`, `AddressLine1`/`2`, `City`, `PostCode`, `DateOfBirth`, `NationalID`, and any free-text `Notes`. There is no auto-detection of PII; you decide what counts.
- Step 4: Look for quasi-identifiers too. Postcode + date of birth + gender can re-identify someone even with names gone. Also tick `IPAddress`, `DeviceID`, `CustomerRef`, or any stable key the recipient could join back to a named source.
- Step 5: Run and verify the counts. Click `Remove N columns`. Confirm the columns-removed and columns-remaining stats match your intent, then scan the preview to be sure no identifier slipped through.
- Step 6: Download and share the stripped file. Download saves `<yourfile>.columns-removed.csv` (UTF-8, comma-delimited, no BOM). Share that file; the originals stay with you.
Direct identifiers vs. quasi-identifiers vs. safe-to-keep
A working checklist for a typical customer dataset. Direct identifiers should almost always go; quasi-identifiers need a judgement call; analytical columns usually stay. Not legal advice — confirm with your DPO.
| Column type | Examples | Action | Risk if kept |
|---|---|---|---|
| Direct identifier | FullName, Email, Phone, NationalID, AddressLine1 | Remove | Immediately identifies the individual |
| Direct identifier (free text) | Notes, SupportTranscript, Comments | Remove | Often contains names, emails, or candid detail |
| Quasi-identifier | PostCode, DateOfBirth, Gender, JobTitle, Employer | Judgement call | Combined, they can re-identify even without a name |
| Stable join key | CustomerRef, IPAddress, DeviceID, CookieID | Remove | Lets the recipient join back to a named source |
| Analytical / behavioural | SignupDate, Plan, MRR, Sessions, Churned | Keep | The reason you're sharing the file |
| Aggregate / coarse | Region, AgeBand, PlanTier | Keep | Low re-identification risk; useful for analysis |
What this tool does (and where it stops)
Column removal is data minimisation, not formal anonymisation. Verified against removeColumns() in lib/csv-utils.ts.
| Capability | Does the tool do it? | Notes |
|---|---|---|
| Delete whole columns | Yes | Positional removal from header + every data row in one pass |
| Mask / hash specific cells | No | Use csv-find-replace to overwrite values while keeping the column |
| Pseudonymise (replace names with tokens) | No | Use the dedicated anonymiser flow if you need consistent tokens |
| Auto-detect which columns are PII | No | You choose; the tool has no notion of what an email or name looks like |
| Drop empty rows / trim whitespace | No | Only columns are removed; rows and cell whitespace are untouched |
| Suppress rare quasi-identifier combinations (k-anonymity) | No | Out of scope — handle in your analysis pipeline |
| Guarantee GDPR compliance | No | It is one minimisation step; consult your DPO for the full process |
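To make "positional removal" concrete, here is a minimal sketch of the idea. It is not the tool's actual `removeColumns()` from `lib/csv-utils.ts`; the function name `removeColumnsSketch` and the already-parsed `string[][]` input are assumptions for illustration. It borrows the case-insensitive name matching that the FAQ describes for the runner.

```typescript
// Hypothetical sketch, not the tool's real implementation.
// rows[0] is the header row; every row is an array of cell strings.
function removeColumnsSketch(rows: string[][], namesToRemove: string[]): string[][] {
  if (rows.length === 0) return [];
  const wanted = namesToRemove.map((n) => n.trim().toLowerCase());
  // Resolve the ticked names to column positions once, from the header.
  const dropIndexes: number[] = [];
  rows[0].forEach((header, i) => {
    if (wanted.indexOf(header.trim().toLowerCase()) !== -1) dropIndexes.push(i);
  });
  // Apply the same positional filter to the header and to every data row.
  return rows.map((row) => row.filter((_cell, i) => dropIndexes.indexOf(i) === -1));
}

const crmExport: string[][] = [
  ["FullName", "Email", "Phone", "Plan", "MRR", "Churned"],
  ["Jane Doe", "jane@x.com", "07700900123", "Pro", "49", "false"],
  ["John Roe", "john@y.com", "07700900999", "Free", "0", "true"],
];
const stripped = removeColumnsSketch(crmExport, ["fullname", "Email", "Phone"]);
// stripped is [["Plan","MRR","Churned"],["Pro","49","false"],["Free","0","true"]]
```

Because the filter works on positions, the cells that remain are copied unchanged: nothing is trimmed, masked, or reordered, and the input array is left as it was.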
Cookbook
Before/after slices from typical customer exports. All values fabricated; the point is which columns survive.
Strip direct identifiers from a CRM export for an external analyst
Example: The analyst wants churn behaviour, not names. Ticking the three identifier columns leaves a clean analytical table.

    Input (CRM export):
    FullName,Email,Phone,Plan,MRR,Churned
    Jane Doe,jane@x.com,07700900123,Pro,49,false
    John Roe,john@y.com,07700900999,Free,0,true

    Ticked to remove: FullName, Email, Phone

    Output (<file>.columns-removed.csv):
    Plan,MRR,Churned
    Pro,49,false
    Free,0,true
Remove the free-text Notes column that hides PII
Example: Structured columns can look clean while a Notes field quietly contains names and emails. Remove it wholesale rather than trying to scrub it cell by cell.

    Input:
    CustomerRef,Plan,Notes,MRR
    C-001,Pro,"Call Jane on 07700 900123 re upgrade",49
    C-002,Free,"Spoke to john@y.com about churn",0

    Ticked to remove: CustomerRef, Notes

    Output:
    Plan,MRR
    Pro,49
    Free,0
Quasi-identifier combination left behind by accident
Example: Removing only the name is not enough: postcode + DOB + gender can re-identify. This example shows the residual risk; the fix is to also tick those columns.

    After removing only FullName/Email:
    PostCode,DOB,Gender,Plan
    SW1A 1AA,1991-04-02,F,Pro

    Risk: in a small postcode, one woman born on that exact date may be uniquely identifiable.
    Safer: also tick PostCode and DOB (or coarsen to Region/AgeBand before export). This tool removes whole columns only.
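One rough way to see this risk in your own file is to count how many rows share each combination of quasi-identifier values. The sketch below is an illustration, not a feature of this tool; `smallestGroupSize` is a hypothetical helper, and the two extra rows are fabricated to give the first row something to be compared against. A result of 1 means at least one row is unique on those columns.

```typescript
// Returns the size of the smallest group of rows that share the same values
// in the given quasi-identifier columns (1 means some row is unique).
function smallestGroupSize(rows: string[][], quasiColumns: string[]): number {
  const header = rows[0];
  const indexes = quasiColumns.map((column) => {
    const i = header.indexOf(column);
    if (i === -1) throw new Error("Column not found: " + column);
    return i;
  });
  // Null-prototype object so cell values like "constructor" cannot collide with built-ins.
  const counts: Record<string, number> = Object.create(null);
  rows.slice(1).forEach((row) => {
    const key = indexes.map((i) => row[i]).join("|");
    counts[key] = (counts[key] || 0) + 1;
  });
  const sizes = Object.keys(counts).map((key) => counts[key]);
  return sizes.length === 0 ? 0 : Math.min(...sizes);
}

const afterNameRemoval: string[][] = [
  ["PostCode", "DOB", "Gender", "Plan"],
  ["SW1A 1AA", "1991-04-02", "F", "Pro"],
  ["SW1A 1AA", "1988-11-30", "M", "Free"],
  ["SW1A 1AA", "1988-11-30", "M", "Pro"],
];
const kAllThree = smallestGroupSize(afterNameRemoval, ["PostCode", "DOB", "Gender"]); // 1
const kPostcodeOnly = smallestGroupSize(afterNameRemoval, ["PostCode"]); // 3
```

The first row is alone in its group when all three columns are kept, but shares a group of three once only PostCode remains. This is the intuition behind k-anonymity; a real assessment needs more than a single count.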
Mask instead of delete (use a different tool)
Example: If the recipient's schema requires the column to exist (e.g. a fixed import template) but the values must be hidden, redact with find-replace rather than removing the column.

    Goal: keep the Email column header but blank the values.
    This tool: removes the whole Email column (header + data).

    Alternative (csv-find-replace):
    pattern: .+ (regex)
    replace: [redacted]
    on the Email column

    → keeps the column, masks the cells:
    Email,Plan
    [redacted],Pro
    [redacted],Free
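For comparison with column removal, the masking approach can be sketched as follows. This is an assumption-laden illustration of what a `.+` to `[redacted]` replacement on one column amounts to, not the csv-find-replace implementation; `maskColumnSketch` is a hypothetical name.

```typescript
// Hypothetical sketch: overwrite every non-empty cell in one column, keep the header.
function maskColumnSketch(rows: string[][], columnName: string, replacement: string): string[][] {
  const target = rows[0].indexOf(columnName);
  if (target === -1) return rows.map((row) => row.slice());
  return rows.map((row, rowIndex) =>
    row.map((cell, i) =>
      // Row 0 is the header and is left alone. `.+` needs at least one
      // character, so empty cells stay empty.
      rowIndex > 0 && i === target ? cell.replace(/.+/, () => replacement) : cell
    )
  );
}

const withEmails: string[][] = [
  ["Email", "Plan"],
  ["jane@x.com", "Pro"],
  ["john@y.com", "Free"],
];
const masked = maskColumnSketch(withEmails, "Email", "[redacted]");
// masked is [["Email","Plan"],["[redacted]","Pro"],["[redacted]","Free"]]
```

Note that `.` does not match newlines, so a multi-line cell would only have its first line replaced by this pattern; check such cells in the preview.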
Large export: chunk it on the free tier
Example: A 40,000-row customer export blows past the 500-row and 2 MB free caps. Split, strip, and recombine, or use Pro.

    Export: 40,000 rows, 6 MB → over free limits

    Free-tier workflow:
    1. csv-row-splitter → 80 chunks of 500 rows
    2. csv-column-remover on each → drop Name/Email/Phone/Address
    3. csv-merger → one stripped dataset

    Pro: remove columns from the full 6 MB file directly.
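The split, strip, recombine arithmetic can be checked with a short sketch. The helpers `splitIntoChunks` and `mergeChunks` are hypothetical stand-ins for what csv-row-splitter and csv-merger do conceptually, and the 40,000 rows are generated placeholders. Only the 500-row cap is modelled here, not the 2 MB size cap.

```typescript
// Split data rows into chunks of at most maxDataRows, repeating the header in each.
function splitIntoChunks(rows: string[][], maxDataRows: number): string[][][] {
  const header = rows[0];
  const data = rows.slice(1);
  const chunks: string[][][] = [];
  for (let start = 0; start < data.length; start += maxDataRows) {
    chunks.push([header].concat(data.slice(start, start + maxDataRows)));
  }
  return chunks;
}

// Recombine chunks, keeping the header from the first chunk only.
function mergeChunks(chunks: string[][][]): string[][] {
  if (chunks.length === 0) return [];
  let merged: string[][] = [chunks[0][0]];
  chunks.forEach((chunk) => {
    merged = merged.concat(chunk.slice(1));
  });
  return merged;
}

// Placeholder export: 40,000 fabricated data rows plus a header.
const bigExport: string[][] = [["Name", "Plan"]];
for (let i = 0; i < 40000; i++) {
  bigExport.push(["Person " + i, i % 2 === 0 ? "Pro" : "Free"]);
}

const chunks = splitIntoChunks(bigExport, 500); // 40,000 / 500 = 80 chunks
// Strip step: drop column 0 (Name) from every row of every chunk.
const strippedChunks = chunks.map((chunk) => chunk.map((row) => row.slice(1)));
const recombined = mergeChunks(strippedChunks); // header + 40,000 data rows
```

The same columns must be removed from every chunk, otherwise the recombined rows will not line up under one header.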
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Column removal is not formal anonymisation
Important caveat: Deleting direct-identifier columns is data minimisation, not anonymisation under GDPR. Residual quasi-identifiers (postcode, date of birth, rare job titles) can still re-identify individuals when combined or joined to an external dataset. Treat this as one step; consult your DPO and consider coarsening or k-anonymity for genuinely anonymous sharing.
Quasi-identifier left behind
Re-identification risk: Removing Name and Email while keeping PostCode + DateOfBirth + Gender often leaves rows uniquely identifiable. The tool removes only whole columns, so either tick the quasi-identifiers too or coarsen them (region instead of postcode, age band instead of DOB) before export.
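Coarsening has to happen before the file reaches this tool, since it only removes whole columns. A minimal sketch of the two coarsenings mentioned above, under stated assumptions: postcodes are UK-style with a space before the inward part, dates of birth are ISO `YYYY-MM-DD`, and the helper names `toOutwardCode` and `toAgeBand` are hypothetical.

```typescript
// Keeps only the outward part of a UK-style postcode ("SW1A 1AA" -> "SW1A").
// Assumes the postcode contains a space; unspaced values are returned whole.
function toOutwardCode(postcode: string): string {
  return postcode.trim().split(/\s+/)[0].toUpperCase();
}

// Maps an ISO date of birth to a ten-year age band as of a reference date.
function toAgeBand(dobIso: string, asOfIso: string): string {
  const dob = dobIso.split("-").map(Number);
  const asOf = asOfIso.split("-").map(Number);
  let age = asOf[0] - dob[0];
  // Subtract one if the birthday has not happened yet in the reference year.
  if (asOf[1] < dob[1] || (asOf[1] === dob[1] && asOf[2] < dob[2])) age -= 1;
  const lower = Math.floor(age / 10) * 10;
  return lower + "-" + (lower + 9);
}

const coarsePostcode = toOutwardCode("SW1A 1AA"); // "SW1A"
const coarseAge = toAgeBand("1991-04-02", "2024-01-15"); // "30-39"
```

An outward code is still finer than a region, so whether it is coarse enough depends on how many people share it in your dataset.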
PII hidden in a free-text column
Remove the column: A Notes, Comments, or SupportTranscript column can contain names, emails, and phone numbers in prose. Cell-by-cell scrubbing is error-prone, so remove the whole column. If the recipient needs the structured analysis but not the text, this is the safest choice.
You need to keep the column but hide the values
Use a different tool: If a downstream import template requires the Email column to exist, removing it breaks the schema. Use csv-find-replace to overwrite the values (e.g. regex .+ → [redacted]) while keeping the column header.
Output has no BOM
Encoding: The download is plain UTF-8 with no byte-order mark. If accented names in the columns you kept garble when opened directly in Excel-on-Windows, import via Data → From Text/CSV and select UTF-8. (Best practice: you usually remove name columns anyway, so this is rarely an issue.)
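If the recipient will double-click the file in Excel rather than import it, one alternative is to add the byte-order mark yourself after downloading. This is a sketch of that idea, not something the tool does; `withUtf8Bom` is a hypothetical helper, and the sample row is fabricated.

```typescript
// Prepends U+FEFF unless the text already starts with it. When the string is
// written out as UTF-8, that character becomes the bytes EF BB BF.
function withUtf8Bom(csvText: string): string {
  const BOM = "\uFEFF";
  return csvText.charAt(0) === BOM ? csvText : BOM + csvText;
}

const keptColumns = "City,Plan\nZürich,Pro\n";
const excelFriendly = withUtf8Bom(keptColumns);
```

The trade-off is that some CSV parsers treat the mark as part of the first header name, so only add it when you know the file is headed for a spreadsheet.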
Dataset over 2 MB or 500 rows on free tier
Blocked (free limit): Free caps at 2 MB and 500 data rows per job, and exceeding either one blocks the run. Customer exports routinely exceed both. Upgrade to Pro, or split with csv-row-splitter, strip each chunk, and recombine with csv-merger.
No auto-detection of PII columns
By design: The tool does not recognise emails, names, or phone numbers; it removes the column positions you tick. That keeps it predictable, but it means accuracy is on you: review the full header list and the preview so no identifier is missed.
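The manual review can be backed up with a rough pattern check on the stripped output. The sketch below is not part of the tool; `findEmailLikeColumns` is a hypothetical helper that flags columns where some cell still contains an email-shaped string. It will not catch names, phone numbers, or anything else, so it supplements the review rather than replacing it.

```typescript
// Hypothetical helper: returns the names of columns with at least one
// data cell containing something shaped like an email address.
function findEmailLikeColumns(rows: string[][]): string[] {
  const emailLike = /[^\s@,]+@[^\s@,]+\.[^\s@,]+/;
  const header = rows[0] || [];
  const flagged: string[] = [];
  header.forEach((columnName, i) => {
    const hit = rows.slice(1).some((row) => emailLike.test(row[i] || ""));
    if (hit) flagged.push(columnName);
  });
  return flagged;
}

const strippedPreview: string[][] = [
  ["Plan", "Notes", "MRR"],
  ["Pro", "Call Jane re upgrade", "49"],
  ["Free", "Spoke to john@y.com about churn", "0"],
];
const suspectColumns = findEmailLikeColumns(strippedPreview);
// suspectColumns is ["Notes"]
```

In this fabricated preview the first Notes cell still names a person, which the pattern cannot see; that is why the page recommends removing free-text columns outright.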
A stable join key undermines the strip
Re-identification risk: Leaving CustomerRef, IPAddress, or DeviceID lets the recipient (or anyone with the source) join the 'anonymised' file straight back to named records. Tick those columns too unless the recipient genuinely needs a key and you've agreed it is not re-linkable on their side.
Frequently asked questions
Will the personal data be uploaded to JAD Apps?
No. PapaParse parses the file and removes columns entirely in your browser. The PII never reaches a server. When signed in, only a content-free usage counter is recorded for your dashboard. This is the key reason to strip PII here rather than in a cloud tool.
Is removing the PII columns enough to make the file GDPR-compliant?
It is one important step (data minimisation), not a complete answer. Removing direct identifiers reduces risk, but quasi-identifiers (postcode + DOB + gender) can still re-identify individuals, and your obligations depend on the data-sharing agreement and lawful basis. Treat this as a tool in your process and confirm the full picture with your DPO.
Does it hash or mask the values, or delete the whole column?
It deletes the entire column — header and all cell values — from every row. There is no hashing or masking. If you need to keep the column but hide the values, use csv-find-replace to overwrite the cells (e.g. regex .+ → [redacted]).
Can it automatically detect which columns are PII?
No. The tool has no notion of what an email or a name looks like; it removes the column positions you tick. You review the header list and decide. The upside is predictability — it will never silently keep or drop a column you didn't choose.
Does removing a column shift the analytical data into the wrong cells?
No. Removal is positional and applied uniformly to every row, so the columns you keep retain their exact values and stay aligned. The first-10-row preview lets you confirm before downloading.
What about a free-text Notes column full of names and emails?
Remove the whole column. Scrubbing PII out of prose cell-by-cell is unreliable. If the recipient needs the structured columns but not the text, removing Notes entirely is the safe choice.
Can I undo a removal if I tick the wrong column?
Your source file is never modified — the tool only produces a new <name>.columns-removed.csv. If you remove the wrong column, re-drop the original and re-run with the correct selection.
How big a dataset can I process for free?
Free accepts files up to 2 MB and up to 500 data rows; exceeding either cap blocks the run. Customer exports often exceed both. Pro removes the limits, or you can split with csv-row-splitter, strip each chunk, and recombine with csv-merger.
Does it also drop empty rows or trim whitespace?
No — it only removes the columns you select. Empty rows and cell whitespace are left as-is. Use csv-empty-row-remover or csv-whitespace-trimmer separately if you need those.
Are quasi-identifiers a real problem if I remove names and emails?
Yes. Latanya Sweeney's widely cited analysis of 1990 US census data estimated that about 87% of the US population could be uniquely identified by 5-digit ZIP code, date of birth, and gender together, even with names removed. If you need genuinely anonymous data, also remove or coarsen those columns (region instead of postcode, age band instead of exact DOB).
Can I keep a stable customer key for the analyst to join on?
Only if it can't be re-linked to identities on their side. A raw CustomerRef or Email-derived key lets them join straight back to named records, defeating the strip. If a join key is genuinely needed, agree a one-way pseudonymous token and confirm the recipient has no mapping table.
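One possible way to produce such a token is to replace each distinct key with an arbitrary label and keep the mapping table on your side only. The sketch below illustrates that approach under stated assumptions; it is not a feature of this tool, and `tokeniseColumn` is a hypothetical helper. A keyed hash is another common option and is not shown here.

```typescript
// Replaces each distinct value in one column with a token ("T-1", "T-2", ...).
// The returned mapping is the re-identification table: keep it, never share it.
function tokeniseColumn(
  rows: string[][],
  columnName: string
): { rows: string[][]; mapping: Record<string, string> } {
  const target = rows[0].indexOf(columnName);
  if (target === -1) throw new Error("Column not found: " + columnName);
  // Null-prototype object so values like "constructor" are treated as plain keys.
  const mapping: Record<string, string> = Object.create(null);
  let next = 1;
  const out = rows.map((row, rowIndex) => {
    if (rowIndex === 0) return row.slice(); // header stays as-is
    const original = row[target];
    if (!(original in mapping)) {
      mapping[original] = "T-" + next;
      next += 1;
    }
    const copy = row.slice();
    copy[target] = mapping[original];
    return copy;
  });
  return { rows: out, mapping };
}

const keyed: string[][] = [
  ["CustomerRef", "Plan"],
  ["C-001", "Pro"],
  ["C-002", "Free"],
  ["C-001", "Pro"],
];
const tokenised = tokeniseColumn(keyed, "CustomerRef");
// tokenised.rows is [["CustomerRef","Plan"],["T-1","Pro"],["T-2","Free"],["T-1","Pro"]]
```

Tokens here are handed out in first-seen order, so they reveal the order of the source rows; shuffle the rows first if that order is itself sensitive.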
Can I automate PII removal before a recurring export goes to a partner?
Yes, and it keeps the data on-device. Pair the @jadapps/runner once, then POST the export to 127.0.0.1:9789/v1/tools/csv-column-remover/run with options.columns listing the PII columns by name (matched case-insensitively) or index. It returns the stripped CSV without the data leaving your machine.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.