Remove PII Columns From a CSV Before Sharing

How to remove PII columns from a CSV before sharing

Step 1
Export the dataset with PII included — Download the full CSV from your CRM, database, or warehouse. Keep this original secured — the tool never alters it; you only share the stripped copy.
Step 2
Drop the file onto the remover above — PapaParse reads the header row locally and lists every column as a checkbox. The data never leaves your browser. Free accepts up to 2 MB and 500 data rows; bigger datasets need Pro.
Step 3
Tick every direct identifier — Select FullName (and split FirstName/LastName), Email, Phone, AddressLine1/2, City, PostCode, DateOfBirth, NationalID, and any free-text Notes. There is no auto-detection of PII — you decide what counts.
Step 4
Look for quasi-identifiers too — Postcode + date of birth + gender can re-identify someone even with names gone. Also tick IPAddress, DeviceID, CustomerRef, or any stable key the recipient could join back to a named source.
Step 5
Run and verify the counts — Click Remove N columns. Confirm the columns-removed and columns-remaining stats match your intent, then scan the preview to be sure no identifier slipped through.
Step 6
Download and share the stripped file — Download saves <yourfile>.columns-removed.csv (UTF-8, comma-delimited, no BOM). Share that file; the originals stay with you.
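
Because the tool has no PII auto-detection, a separate sanity check on the downloaded file can help catch identifiers hiding in the columns you kept. The sketch below is not part of the tool. It is a minimal Python example with deliberately loose, hypothetical patterns (EMAIL_RE, PHONE_RE) that flags columns for human review, and it will miss plenty of real PII, names included.

```python
import csv
import io
import re

# Loose, illustrative patterns (assumptions, not a standard): they flag
# candidates for a human to review and do not prove a column is PII-free.
EMAIL_RE = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{8,}\d")


def flag_suspect_columns(csv_text):
    """Map each column name to the kinds of PII-looking values found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    suspects = {}
    for row in reader:
        for column, value in row.items():
            # Skip empty cells and the overflow list DictReader creates
            # for rows that have more cells than the header.
            if not isinstance(value, str) or not value:
                continue
            if EMAIL_RE.search(value):
                suspects.setdefault(column, set()).add("email")
            if PHONE_RE.search(value):
                suspects.setdefault(column, set()).add("phone")
    return suspects


# Fabricated file in which a Notes column was kept by mistake.
stripped = (
    "Plan,MRR,Notes\n"
    'Pro,49,"Call Jane on 07700 900123 re upgrade"\n'
    'Free,0,"Spoke to john@y.com about churn"\n'
)
suspects = flag_suspect_columns(stripped)
```

Any column that appears in the result is worth a second look before the file goes out. An empty result is not proof that the file is clean.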

Direct identifiers vs. quasi-identifiers vs. safe-to-keep

A working checklist for a typical customer dataset. Direct identifiers should almost always go; quasi-identifiers need a judgement call; analytical columns usually stay. Not legal advice — confirm with your DPO.

Column type	Examples	Action	Risk if kept
Direct identifier	`FullName`, `Email`, `Phone`, `NationalID`, `AddressLine1`	Remove	Immediately identifies the individual
Direct identifier (free text)	`Notes`, `SupportTranscript`, `Comments`	Remove	Often contains names, emails, or candid detail
Quasi-identifier	`PostCode`, `DateOfBirth`, `Gender`, `JobTitle`, `Employer`	Judgement call	Combined, they can re-identify even without a name
Stable join key	`CustomerRef`, `IPAddress`, `DeviceID`, `CookieID`	Remove	Lets the recipient join back to a named source
Analytical / behavioural	`SignupDate`, `Plan`, `MRR`, `Sessions`, `Churned`	Keep	The reason you're sharing the file
Aggregate / coarse	`Region`, `AgeBand`, `PlanTier`	Keep	Low re-identification risk; useful for analysis

What this tool does (and where it stops)

Column removal is data minimisation, not formal anonymisation. Verified against removeColumns() in lib/csv-utils.ts.

Capability	Does the tool do it?	Notes
Delete whole columns	Yes	Positional removal from header + every data row in one pass
Mask / hash specific cells	No	Use csv-find-replace to overwrite values while keeping the column
Pseudonymise (replace names with tokens)	No	Use the dedicated anonymiser flow if you need consistent tokens
Auto-detect which columns are PII	No	You choose; the tool has no notion of what an email or name looks like
Drop empty rows / trim whitespace	No	Only columns are removed; rows and cell whitespace are untouched
Suppress rare quasi-identifier combinations (k-anonymity)	No	Out of scope — handle in your analysis pipeline
Guarantee GDPR compliance	No	It is one minimisation step; consult your DPO for the full process
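
To make "positional removal" concrete, here is a minimal Python sketch of the idea. It is not the tool's removeColumns() implementation, which lives in TypeScript and is not shown on this page. The function name remove_columns and the case-insensitive name matching are assumptions made for illustration.

```python
import csv
import io


def remove_columns(csv_text, columns):
    """Return csv_text without the named columns (header and every row)."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return csv_text
    header = rows[0]
    wanted = {name.lower() for name in columns}
    missing = wanted - {name.lower() for name in header}
    if missing:
        raise ValueError(f"columns not in header: {sorted(missing)}")
    # Resolve names to positions once, then drop those positions everywhere.
    drop = {i for i, name in enumerate(header) if name.lower() in wanted}
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([cell for i, cell in enumerate(row) if i not in drop])
    return out.getvalue()


# Fabricated sample data.
source = (
    "FullName,Email,Phone,Plan,MRR,Churned\n"
    "Jane Doe,jane@x.com,07700900123,Pro,49,false\n"
    "John Roe,john@y.com,07700900999,Free,0,true\n"
)
result = remove_columns(source, ["FullName", "Email", "Phone"])
```

Because the same positions are dropped from the header and from every data row, the columns that remain stay aligned with their headers.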

Cookbook

Before/after slices from typical customer exports. All values fabricated; the point is which columns survive.

Strip direct identifiers from a CRM export for an external analyst

Example

The analyst wants churn behaviour, not names. Ticking the three identifier columns leaves a clean analytical table.

Input (CRM export):
FullName,Email,Phone,Plan,MRR,Churned
Jane Doe,jane@x.com,07700900123,Pro,49,false
John Roe,john@y.com,07700900999,Free,0,true

Ticked to remove: FullName, Email, Phone

Output (<file>.columns-removed.csv):
Plan,MRR,Churned
Pro,49,false
Free,0,true

Remove the free-text Notes column that hides PII

Example

Structured columns can look clean while a Notes field quietly contains names and emails. Remove it wholesale rather than trying to scrub it cell by cell.

Input:
CustomerRef,Plan,Notes,MRR
C-001,Pro,"Call Jane on 07700 900123 re upgrade",49
C-002,Free,"Spoke to john@y.com about churn",0

Ticked to remove: CustomerRef, Notes

Output:
Plan,MRR
Pro,49
Free,0

Quasi-identifier combination left behind by accident

Example

Removing only the name is not enough — postcode + DOB + gender can re-identify. This example shows the residual risk; the fix is to also tick those columns.

After removing only FullName/Email:
PostCode,DOB,Gender,Plan
SW1A 1AA,1991-04-02,F,Pro

Risk: in a small postcode, one woman born on that exact date
may be uniquely identifiable.

Safer: also tick PostCode and DOB (or coarsen to Region/AgeBand
before export) — this tool removes whole columns only.
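
One way to see the residual risk is to count how many rows share each quasi-identifier combination. The Python sketch below is an illustration rather than a feature of the tool. The function name unique_combinations is hypothetical and the data is fabricated. A combination that occurs exactly once points at a single person.

```python
import csv
import io
from collections import Counter


def unique_combinations(csv_text, quasi_identifiers):
    """Return the quasi-identifier combinations that occur in only one row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Fabricated sample: two rows share a combination, one row stands alone.
sample = (
    "PostCode,DOB,Gender,Plan\n"
    "SW1A 1AA,1991-04-02,F,Pro\n"
    "SW1A 1AA,1991-04-02,F,Free\n"
    "M1 1AE,1985-11-30,M,Pro\n"
)
singletons = unique_combinations(sample, ["PostCode", "DOB", "Gender"])
```

In this sample only the M1 1AE row is unique. On real customer data, exact postcode plus exact date of birth tends to make most rows unique, which is why removing or coarsening those columns matters.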

Mask instead of delete (use a different tool)

Example

If the recipient's schema requires the column to exist (e.g. a fixed import template) but the values must be hidden, redact with find-replace rather than removing the column.

Goal: keep the Email column header but hide the values.

This tool: removes the whole Email column (header + data).

Alternative — csv-find-replace:
  pattern: .+   (regex)   replace: [redacted]
  on the Email column → keeps the column, masks the cells:
  Email,Plan
  [redacted],Pro
  [redacted],Free
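
For readers who want to see what that masking amounts to, here is a minimal Python sketch of a regex replace restricted to one column. It does not show how csv-find-replace is implemented. The function name mask_column and its parameters are assumptions for illustration.

```python
import csv
import io
import re


def mask_column(csv_text, column, pattern=r".+", replacement="[redacted]"):
    """Overwrite matching values in one column while keeping the column."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    if not reader.fieldnames or column not in reader.fieldnames:
        raise ValueError(f"column not in header: {column}")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # `.+` needs at least one character, so empty cells stay empty.
        row[column] = re.sub(pattern, replacement, row[column] or "")
        writer.writerow(row)
    return out.getvalue()


masked = mask_column("Email,Plan\njane@x.com,Pro\njohn@y.com,Free\n", "Email")
```

Note that an empty cell remains empty under the .+ pattern, so the masked output still reveals which rows had no email on file.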

Large export: chunk it on the free tier

Example

A 40,000-row customer export blows past the 500-row and 2 MB free caps. Split, strip, recombine — or use Pro.

Export: 40,000 rows, 6 MB  →  over free limits

Free-tier workflow:
  1. csv-row-splitter → 80 chunks of 500 rows
  2. csv-column-remover on each → drop Name/Email/Phone/Address
  3. csv-merger → one stripped dataset

Pro: remove columns from the full 6 MB file directly.
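
The chunk count above is simply 40,000 / 500 = 80. The following Python sketch walks through the same split, strip, and recombine flow in memory to show that the header is written once and no rows are lost. It does not call csv-row-splitter, csv-column-remover, or csv-merger. The function name split_strip_merge and the synthetic rows are assumptions for illustration.

```python
def split_strip_merge(header, rows, drop, size=500):
    """Process rows in chunks of `size`, dropping the named columns."""
    keep = [i for i, name in enumerate(header) if name not in drop]
    merged = [[header[i] for i in keep]]  # header appears once in the result
    chunk_count = 0
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        merged.extend([row[i] for i in keep] for row in chunk)
        chunk_count += 1
    return merged, chunk_count


# Synthetic, fabricated export of 40,000 data rows.
header = ["Name", "Email", "Plan", "MRR"]
rows = [[f"Person {i}", f"p{i}@example.com", "Pro", "49"] for i in range(40_000)]
merged, chunk_count = split_strip_merge(header, rows, drop={"Name", "Email"})
```

If the final chunk is smaller than 500 rows, the loop still handles it. That matters for exports whose row count is not an exact multiple of the cap.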

Errors and edge cases

Caveats and failure modes to check before you share the stripped file. Find the heading that matches your situation and apply the fix it describes.

Column removal is not formal anonymisation

Important caveat

Deleting direct-identifier columns is data minimisation, not anonymisation under GDPR. Residual quasi-identifiers (postcode, date of birth, rare job titles) can still re-identify individuals when combined or joined to an external dataset. Treat this as one step; consult your DPO and consider coarsening or k-anonymity for genuinely anonymous sharing.

Quasi-identifier left behind

Re-identification risk

Removing Name and Email while keeping PostCode + DateOfBirth + Gender often leaves rows uniquely identifiable. The tool removes only whole columns, so either tick the quasi-identifiers too or coarsen them (region instead of postcode, age band instead of DOB) before export.
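
Coarsening has to happen before the file reaches this tool. As a rough illustration, the Python sketch below turns an exact date of birth into a ten-year age band and a UK postcode into its outward code. Both function names are hypothetical. The ten-year width and the UK postcode layout (inward code is always the last three characters) are assumptions you would adapt to your own data.

```python
from datetime import date


def age_band(dob_iso, today, width=10):
    """Convert an ISO date of birth into a band such as '30-39'."""
    dob = date.fromisoformat(dob_iso)
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def outward_code(postcode):
    """Keep only the outward part of a UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


band = age_band("1991-04-02", today=date(2025, 1, 1))
area = outward_code("SW1A 1AA")
```

Passing today explicitly keeps the banding reproducible between runs. Whether an outward code is coarse enough depends on how many people in your dataset share it.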

PII hidden in a free-text column

Remove the column

A Notes, Comments, or SupportTranscript column can contain names, emails, and phone numbers in prose. Cell-by-cell scrubbing is error-prone — remove the whole column. If the recipient needs the structured analysis but not the text, this is the safest choice.

You need to keep the column but hide the values

Use a different tool

If a downstream import template requires the Email column to exist, removing it breaks the schema. Use csv-find-replace to overwrite the values (e.g. regex .+ → [redacted]) while keeping the column header.

Output has no BOM

Encoding

The download is plain UTF-8 with no byte-order mark. If accented characters in the columns you kept appear garbled when the file is opened directly in Excel on Windows, import it via Data → From Text/CSV and select UTF-8. Name columns are usually removed anyway, so this is rarely an issue in practice.
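
If a recipient insists on double-clicking the file in Excel, another option is to prepend a UTF-8 byte-order mark to a copy of the download. The Python sketch below is not something the tool does. The function name with_utf8_bom is hypothetical and the sample bytes are fabricated.

```python
import codecs


def with_utf8_bom(data):
    """Return the same CSV bytes with a UTF-8 BOM in front, added only once."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Fabricated stripped file containing an accented region name.
stripped = "Region,Plan\nÎle-de-France,Pro\n".encode("utf-8")
excel_ready = with_utf8_bom(stripped)
```

To apply it to a real file, read the download with Path.read_bytes() and write the result with Path.write_bytes(). Some CSV consumers treat the BOM as part of the first header name, so only add it for recipients who need it.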

Dataset over 2 MB or 500 rows on free tier

Blocked (free limit)

Free caps at 2 MB and 500 data rows per job — either over the limit blocks the run. Customer exports routinely exceed both. Upgrade to Pro, or split with csv-row-splitter, strip each chunk, and recombine with csv-merger.

No auto-detection of PII columns

By design

The tool does not recognise emails, names, or phone numbers — it removes the column positions you tick. That keeps it predictable, but it means accuracy is on you: review the full header list and the preview so no identifier is missed.

A stable join key undermines the strip

Re-identification risk

Leaving CustomerRef, IPAddress, or DeviceID lets the recipient (or anyone with the source) join the 'anonymised' file straight back to named records. Tick those columns too unless the recipient genuinely needs a key and you've agreed it is not re-linkable on their side.

Frequently asked questions

Will the personal data be uploaded to JAD Apps?

No. PapaParse parses the file and removes columns entirely in your browser. The PII never reaches a server. When signed in, only a content-free usage counter is recorded for your dashboard. This is the key reason to strip PII here rather than in a cloud tool.

Is removing the PII columns enough to make the file GDPR-compliant?

It is one important step (data minimisation), not a complete answer. Removing direct identifiers reduces risk, but quasi-identifiers (postcode + DOB + gender) can still re-identify individuals, and your obligations depend on the data-sharing agreement and lawful basis. Treat this as a tool in your process and confirm the full picture with your DPO.

Does it hash or mask the values, or delete the whole column?

It deletes the entire column — header and all cell values — from every row. There is no hashing or masking. If you need to keep the column but hide the values, use csv-find-replace to overwrite the cells (e.g. regex .+ → [redacted]).

Can it automatically detect which columns are PII?

No. The tool has no notion of what an email or a name looks like; it removes the column positions you tick. You review the header list and decide. The upside is predictability — it will never silently keep or drop a column you didn't choose.

Does removing a column shift the analytical data into the wrong cells?

No. Removal is positional and applied uniformly to every row, so the columns you keep retain their exact values and stay aligned. The first-10-row preview lets you confirm before downloading.

What about a free-text Notes column full of names and emails?

Remove the whole column. Scrubbing PII out of prose cell-by-cell is unreliable. If the recipient needs the structured columns but not the text, removing Notes entirely is the safe choice.

Can I undo a removal if I tick the wrong column?

Your source file is never modified — the tool only produces a new <name>.columns-removed.csv. If you remove the wrong column, re-drop the original and re-run with the correct selection.

How big a dataset can I process for free?

Free accepts files up to 2 MB and up to 500 data rows; either over the cap blocks the run. Customer exports often exceed both. Pro removes the limits, or you can split with csv-row-splitter, strip each chunk, and recombine with csv-merger.

Does it also drop empty rows or trim whitespace?

No — it only removes the columns you select. Empty rows and cell whitespace are left as-is. Use csv-empty-row-remover or csv-whitespace-trimmer separately if you need those.

Are quasi-identifiers a real problem if I remove names and emails?

Yes. Latanya Sweeney's widely cited analysis of US census data estimated that about 87% of the US population could be uniquely identified by 5-digit ZIP code, date of birth, and gender together, even with names removed. If you need genuinely anonymous data, also remove or coarsen those columns (region instead of postcode, age band instead of exact DOB).

Can I keep a stable customer key for the analyst to join on?

Only if it can't be re-linked to identities on their side. A raw CustomerRef or Email-derived key lets them join straight back to named records, defeating the strip. If a join key is genuinely needed, agree a one-way pseudonymous token and confirm the recipient has no mapping table.
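
One common way to build such a token is a keyed hash (HMAC) of the customer reference, with the key kept on your side only. The Python sketch below is an illustration, not a feature of this tool. The function name pseudonymous_token, the 16-character truncation, and the sample key are assumptions. An unkeyed hash is not a substitute, since short references like C-001 can be guessed and re-hashed.

```python
import hashlib
import hmac


def pseudonymous_token(customer_ref, secret_key):
    """Derive a stable token from a customer reference using HMAC-SHA256."""
    digest = hmac.new(secret_key, customer_ref.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Placeholder key for illustration only; use a random secret and never share it.
key = b"replace-with-a-random-secret"
token_a = pseudonymous_token("C-001", key)
token_b = pseudonymous_token("C-002", key)
```

The same reference always yields the same token under the same key, so the analyst can still join their own tables on it. Rotating the key between shares stops recipients from linking one delivery to the next.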

Can I automate PII removal before a recurring export goes to a partner?

Yes, and it keeps the data on-device. Pair the @jadapps/runner once, then POST the export to 127.0.0.1:9789/v1/tools/csv-column-remover/run with options.columns listing the PII columns by name (matched case-insensitively) or index. It returns the stripped CSV without the data leaving your machine.
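
A request along those lines might be built as in the Python sketch below. Only the address, the path, and options.columns come from this page. The http scheme, the JSON body shape, the "input" field name, and the absence of any pairing credential are assumptions, so check the runner's own documentation before relying on them. The sketch builds the request but does not send it.

```python
import json
import urllib.request

# Address and path as given in this guide; the http scheme is an assumption.
RUNNER_URL = "http://127.0.0.1:9789/v1/tools/csv-column-remover/run"


def build_run_request(csv_text, columns):
    """Build (but do not send) a POST request for the local runner."""
    # ASSUMPTION: the runner accepts a JSON body with the CSV under "input".
    # Only "options.columns" is named in this guide.
    body = json.dumps({"input": csv_text, "options": {"columns": columns}})
    return urllib.request.Request(
        RUNNER_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "FullName,Email,Plan\nJane Doe,jane@x.com,Pro\n",
    ["FullName", "Email"],
)
```

Sending it would be a call to urllib.request.urlopen(request) from a scheduled job on the same machine as the runner, which keeps the export off the network.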

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.


