Dedupe Customer CSV by Email - Browser CSV Deduplicator

How to dedupe a customer CSV by email

Step 1
Pull and combine your customer sources — Export customers from each system (store admin, billing portal, CRM). The deduplicator works on one file, so concatenate them first with csv-merger. Order matters: put the system whose records you trust most at the top so its row survives dedup.
Step 2
Drop the file onto the deduplicator above — Accepts .csv plus .xlsx/.xls/.ods (first sheet is converted to CSV automatically). The delimiter is auto-detected, so a comma export and a semicolon export both load without configuration.
Step 3
Pick Email or Customer ID as the key — Open the Unique key column dropdown and choose the column that identifies a customer. Use Email for most lists. Prefer a Customer ID / Stripe ID if customers can change their email but keep their account — it survives address changes.
Step 4
Leave case-sensitivity off for emails — Keep Case-sensitive keys unchecked so Jane@Shop.com matches jane@shop.com. Only enable it when your key is a case-meaningful identifier (e.g. some externally-generated IDs that distinguish case).
Step 5
Run and check Empty keys — Click Remove duplicates. Read the tiles: Rows in, Rows out, Duplicates, Unique keys, Empty keys. A non-zero Empty keys count means some customers have no value in your key column — they were kept untouched and need manual attention before import.
Step 6
Download the deduplicated customer file — Click Download CSV (or get an .xlsx back if you uploaded a spreadsheet). The first row per customer is kept in original order. Rows out now equals Unique keys plus Empty keys, so when Empty keys is zero every remaining row is a distinct customer and the file is ready to import.

The two real controls

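The deduplicator exposes exactly two options: the Unique key column and Case-sensitive keys. Whitespace trimming and empty-key handling are fixed behaviours, listed in the table below for reference. There is no multi-column key, no keep-last, and no merge-fields-on-collision behaviour.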

Control	What it does	Default	Customer-list guidance
Unique key column	The single column whose value defines a duplicate; first row per value is kept	First column	Choose `Email`, or `Customer ID`/`Stripe ID` if customers keep accounts across email changes
Case-sensitive keys	Off matches `A@x.com` to `a@x.com`; On requires exact case match	Off	Keep off for emails; on only for case-distinct external IDs
Whitespace in the key	Always trimmed before comparison; the stored cell keeps its original text	Always trimmed	Run csv-whitespace-trimmer afterward if you also want the visible value cleaned
Empty key value	Row is never deduped; passes through and increments the Empty keys counter	Always kept	Filter out before dedup if you only want customers with an email
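The four rows above can be read as one small procedure. The following is a minimal sketch of that procedure, not the tool's own code (the tool runs PapaParse in the browser). The function name `dedupe_rows` and the stat names are hypothetical, and treating a whitespace-only key as empty is an assumption.

```python
import csv
import io


def dedupe_rows(rows, key, case_sensitive=False):
    """Keep the first row per key value; rows with an empty key always pass through."""
    seen = set()
    kept = []
    stats = {"rows_in": 0, "rows_out": 0, "duplicates": 0, "unique_keys": 0, "empty_keys": 0}
    for row in rows:
        stats["rows_in"] += 1
        # Trim for comparison only; the stored cell keeps its original text.
        compare_key = (row.get(key) or "").strip()
        if not case_sensitive:
            compare_key = compare_key.lower()
        if compare_key == "":
            # Assumption: a whitespace-only key counts as empty.
            stats["empty_keys"] += 1
            kept.append(row)
            continue
        if compare_key in seen:
            stats["duplicates"] += 1
            continue
        seen.add(compare_key)
        kept.append(row)
    stats["unique_keys"] = len(seen)
    stats["rows_out"] = len(kept)
    return kept, stats


SAMPLE = """email,name,source
jane@shop.com,Jane,Store
bob@shop.com,Bob,Store
Jane@Shop.com,Jane D.,Billing
,Walk-in,Store
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept, stats = dedupe_rows(rows, key="email")
print(stats)
```

On this sample the sketch reports 4 rows in, 3 rows out, 1 duplicate, 2 unique keys and 1 empty key. Rows out equals unique keys plus empty keys.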

Email vs Customer ID as the dedup key

Choose the key column for a customer migration by picking the one that stays stable for the same person over time.

Scenario	Better key	Why
Customer changed their email but kept the account	`Customer ID` / `Stripe ID`	Email-based dedup would treat the old and new address as two people
Sources have no shared internal ID	`Email`	Email is the only field both systems agree on
Same customer, two store accounts, same email	`Email`	Collapses the duplicate accounts into one record
Billing export with subscription IDs but inconsistent emails	`Customer ID`	The billing ID is canonical; emails were entered ad hoc
B2B list where several people share one inbox	neither alone	A shared `info@` inbox would over-collapse; dedupe on a contact-level field instead

Cookbook

Real before/after rows from customer exports. Emails anonymised. The tool keeps the first row per key value, matching case-insensitively unless you turn case-sensitivity on.

Same customer in store + billing exports, casing differs

Example

The store stored the email lowercase; the billing system stored it title-cased from a checkout autofill. Default case-insensitive matching collapses them, keeping the first (store) row.

Input (store rows concatenated above billing):
email,name,source
jane@shop.com,Jane,Store
bob@shop.com,Bob,Store
Jane@Shop.com,Jane D.,Billing

Key column: email   ·   Case-sensitive keys: OFF

Output:
email,name,source
jane@shop.com,Jane,Store
bob@shop.com,Bob,Store

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Unique keys 2

Customer changed email — dedupe on Customer ID instead

Example

A customer updated their email; an old export still has the previous address. Keying on email would keep both. Keying on the stable Customer ID collapses them correctly.

Input:
customer_id,email,name
CUS_001,old@x.com,Lee
CUS_001,new@x.com,Lee
CUS_002,pat@y.com,Pat

Key column: customer_id

Output (first row per ID kept):
customer_id,email,name
CUS_001,old@x.com,Lee
CUS_002,pat@y.com,Pat

Tip: sort by an 'updated_at' column descending first if you
want the NEW email to be the surviving row.

Blank-email rows preserved for manual merge

Example

Some legacy customers have no email on file. They are not duplicates of each other just because the key is blank — every blank-key row is kept and counted as an Empty key.

Input:
email,name,id
,Anon Buyer,CUS_77
,Walk-in,CUS_78
ava@z.com,Ava,CUS_79

Key column: email

Output (both blank-email rows kept):
email,name,id
,Anon Buyer,CUS_77
,Walk-in,CUS_78
ava@z.com,Ava,CUS_79

Stats: Rows in 3 · Rows out 3 · Duplicates 0 · Empty keys 2

Trailing space from a manual spreadsheet edit

Example

An analyst hand-edited one email and left a trailing space. The trim-before-compare behaviour recognises it as the same customer and removes the duplicate.

Input (trailing space on row 1):
email,plan
dee@corp.com ,Pro
dee@corp.com,Pro

Key column: email

Output:
email,plan
dee@corp.com ,Pro

The surviving cell still has its trailing space — trim affects
the key only. Clean the value with csv-whitespace-trimmer next.

Keep the most recent record by pre-sorting

Example

You want the latest customer record (newest plan, current address) to survive, but the tool keeps the FIRST row. Sort by last-updated descending first, then dedupe.

Step 1 — csv-sorter on updated_at, direction desc:
email,plan,updated_at
sam@x.com,Pro,2026-05-01
sam@x.com,Free,2025-11-02

Step 2 — deduplicator, key column: email:
email,plan,updated_at
sam@x.com,Pro,2026-05-01

The newest row is now first, so it's the one kept.
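For readers who script this outside the browser, the same sort-then-dedupe order can be sketched as below. This is an illustration only: the sample rows are made up, and it assumes `updated_at` holds ISO 8601 dates (YYYY-MM-DD), which sort correctly as plain strings.

```python
import csv
import io

SAMPLE = """email,plan,updated_at
sam@x.com,Free,2025-11-02
sam@x.com,Pro,2026-05-01
kim@x.com,Free,2026-01-15
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Step 1: newest first. ISO dates compare correctly as strings.
rows.sort(key=lambda r: r["updated_at"], reverse=True)

# Step 2: keep the first row per trimmed, lowercased email.
seen = set()
latest = []
for row in rows:
    k = row["email"].strip().lower()
    if k:
        if k in seen:
            continue
        seen.add(k)
    latest.append(row)

for row in latest:
    print(row["email"], row["plan"], row["updated_at"])
```

If the dates are in another format (for example DD/MM/YYYY), parse them before sorting. A plain string sort would put them in the wrong order.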

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Find the entry that matches your situation and apply the fix it describes.

Customer file over the free 500-row limit

Pro required

This is a Pro tool; the free tier caps input at 500 rows / 2 MB. Customer exports usually exceed that. Pro raises it to 100,000 rows / 100 MB. Beyond 100k, split with csv-row-splitter, dedupe each chunk, then concatenate and run a final pass.
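The split, dedupe, concatenate, final-pass sequence works because a duplicate pair can land in two different chunks, and only the last pass sees both rows. A minimal sketch, with hypothetical helper names and a tiny chunk size for illustration:

```python
def dedupe(rows, key):
    """First row per trimmed, lowercased key; empty keys pass through."""
    seen, kept = set(), []
    for row in rows:
        k = (row.get(key) or "").strip().lower()
        if k:
            if k in seen:
                continue
            seen.add(k)
        kept.append(row)
    return kept


def dedupe_in_chunks(rows, key, chunk_size):
    partial = []
    for start in range(0, len(rows), chunk_size):
        partial.extend(dedupe(rows[start:start + chunk_size], key))
    # Final pass catches duplicates that landed in different chunks.
    return dedupe(partial, key)


emails = ["a@x.com", "b@x.com", "A@x.com", "c@x.com", "b@x.com", "a@x.com"]
rows = [{"email": e} for e in emails]
result = dedupe_in_chunks(rows, "email", chunk_size=2)
print([r["email"] for r in result])
```

Because chunks are concatenated in their original order, the surviving row for each key is the same one a single pass would have kept.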

Want the newest record kept, not the first

First-row only

Dedup always keeps the first occurrence — there is no 'keep last'. Sort the file by a last-updated/created date descending with csv-sorter so the most recent record sits first, then dedupe on email or Customer ID.

Two customers share one company inbox

Over-collapse risk

If several contacts share orders@bigco.com, deduping on email would merge distinct people into one row. Use a contact-level key (Customer ID, Contact ID) instead, or filter those shared-inbox rows out first with csv-column-filter and handle them separately.
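One way to find shared-inbox rows before deduping is to look for emails that appear with more than one contact ID. The sketch below is a hypothetical pre-check, not a feature of the tool. The `contact_id` column name and the sample rows are assumptions.

```python
from collections import defaultdict

rows = [
    {"contact_id": "C1", "email": "orders@bigco.com", "name": "Ana"},
    {"contact_id": "C2", "email": "Orders@BigCo.com", "name": "Ben"},
    {"contact_id": "C3", "email": "cy@solo.com", "name": "Cy"},
    {"contact_id": "C3", "email": "cy@solo.com", "name": "Cy"},
]


def norm(email):
    return email.strip().lower()


# Collect the distinct contact IDs seen for each normalised email.
ids_by_email = defaultdict(set)
for row in rows:
    if norm(row["email"]):
        ids_by_email[norm(row["email"])].add(row["contact_id"])

# An email used by more than one contact ID is treated as a shared inbox.
shared = {email for email, ids in ids_by_email.items() if len(ids) > 1}

safe_rows = [r for r in rows if norm(r["email"]) not in shared]
review_rows = [r for r in rows if norm(r["email"]) in shared]

print(sorted(shared))
```

Here `safe_rows` can be deduped on email as usual, while `review_rows` holds the shared-inbox contacts to handle separately.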

Customer changed email between exports

By design

Email-keyed dedup treats the old and new address as two customers, because they are different strings. If both rows carry the same stable Customer ID, dedupe on that column instead — it survives email changes. There is no fuzzy person-matching across fields.

Need to merge data on collision, not drop the row

Not supported

When two rows for one customer each hold partial data (one has the phone, the other the address), the deduplicator keeps the first whole row and discards the second — it does NOT field-merge. Reconcile partial records before deduping, or pick the more complete source to place first.
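If you reconcile partial records yourself before deduping, one possible rule is "first non-blank value wins per column". The sketch below illustrates that rule only. The deduplicator does not do this, and the function name, rule and sample data are all assumptions.

```python
def merge_partial_rows(rows, key):
    """Fill blanks in the first row for each key from later rows with the same key."""
    out = []
    by_key = {}
    for row in rows:
        k = (row.get(key) or "").strip().lower()
        if not k:
            out.append(dict(row))  # empty keys are never merged
            continue
        if k not in by_key:
            by_key[k] = dict(row)
            out.append(by_key[k])
            continue
        first = by_key[k]
        for col, value in row.items():
            # Only fill cells that are blank in the surviving row.
            if not (first.get(col) or "").strip() and (value or "").strip():
                first[col] = value
    return out


rows = [
    {"email": "lee@x.com", "phone": "555-0100", "address": ""},
    {"email": "Lee@x.com", "phone": "", "address": "1 Main St"},
    {"email": "pat@y.com", "phone": "555-0199", "address": "9 Oak Ave"},
]
merged = merge_partial_rows(rows, "email")
print(merged[0])
```

When two rows hold different non-blank values for the same column, this rule keeps the first one silently, so conflicting records still need a manual look.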

Blank-email customers all retained

Preserved

Every row with an empty key value passes through untouched and is counted under Empty keys. This prevents accidental deletion of legacy/walk-in customers with no email. If you only want emailed customers in the output, pre-filter with csv-column-filter (email is_not_empty).

Composite key (email + region) needed

Single key only

The key is one column. To dedupe on email-within-region, merge the two fields first with csv-column-merger into one key column, dedupe on it, then split back with csv-column-value-splitter if you need the original columns.
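The composite-key workaround amounts to building a helper column, deduping on it, and discarding it. A rough sketch, in which the `email_region` column name and the `|` separator are arbitrary choices for illustration:

```python
rows = [
    {"email": "sam@x.com", "region": "EU", "plan": "Pro"},
    {"email": "sam@x.com", "region": "US", "plan": "Free"},
    {"email": "Sam@x.com", "region": "EU", "plan": "Free"},
]

SEP = "|"  # pick a separator that cannot occur in either field

# Step 1: merge the two fields into one helper key column.
for row in rows:
    row["email_region"] = row["email"].strip() + SEP + row["region"].strip()

# Step 2: dedupe on the helper column (case-insensitive, first row kept).
seen, kept = set(), []
for row in rows:
    k = row["email_region"].lower()
    if k in seen:
        continue
    seen.add(k)
    kept.append(row)

# Step 3: drop the helper column; the original columns were never removed.
for row in kept:
    del row["email_region"]

print(kept)
```

The separator matters: without one, `ab` + `c` and `a` + `bc` would produce the same key and collapse unrelated rows.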

Excel saved the export as UTF-8 with BOM

Supported

A BOM at the start of the file is handled by the parser and doesn't break the first header. The download is plain CSV; if you need a BOM back for Excel-on-Windows, the spreadsheet round-trip (upload .xlsx, download .xlsx) avoids encoding issues entirely.
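To see why an unhandled BOM matters, the short Python sketch below reads the same bytes twice. It illustrates the general problem and is not the tool's parser: decoding as plain UTF-8 leaves the BOM glued to the first header, while the standard `utf-8-sig` codec strips it.

```python
import csv
import io

# A UTF-8 file with BOM, as some Excel "CSV UTF-8" exports produce.
raw = "\ufeffemail,name\r\njane@shop.com,Jane\r\n".encode("utf-8")

# Plain utf-8 keeps the BOM as part of the first header name.
naive_header = next(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))

# utf-8-sig strips a leading BOM if one is present.
clean_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

# Encoding with utf-8-sig writes the BOM back.
with_bom = "email,name\r\n".encode("utf-8-sig")

print(repr(naive_header[0]), repr(clean_header[0]))
```

With the naive read, a lookup for a column named `email` would fail, because the header is actually `\ufeffemail`.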

Semicolon delimiter from an EU billing system

Supported

Delimiter auto-detection handles ;-separated exports (common from EU-locale billing tools) without any setting. The result is written comma-delimited. No data is altered; only the delimiter is normalised to a comma.
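A simplified picture of what delimiter normalisation involves: guess the delimiter from the header line, parse with it, and write the rows back with commas. The heuristic below is a deliberately naive stand-in, since PapaParse's own detection is more thorough, and `detect_delimiter` is a hypothetical helper.

```python
import csv
import io


def detect_delimiter(text, candidates=(",", ";", "\t", "|")):
    """Naive guess: the candidate that occurs most often in the header line."""
    first_line = text.splitlines()[0]
    return max(candidates, key=first_line.count)


src = "email;name;plan\njane@shop.com;Doe, Jane;Pro\n"

delim = detect_delimiter(src)
rows = list(csv.reader(io.StringIO(src), delimiter=delim))

# Write back comma-delimited; values containing a comma get quoted.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
converted = out.getvalue()

print(converted)
```

The name `Doe, Jane` comes out as `"Doe, Jane"`, so the cell value is unchanged even though the separator around it has changed.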

You only want to audit duplicates, not delete

Use the finder

To review which customers are duplicated before removing anything, use csv-duplicate-finder — it flags each row YES/NO and groups the matches. The deduplicator is the destructive step once you've confirmed.
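A non-destructive audit can be sketched as adding a flag column instead of dropping rows. The `_is_duplicate` column name matches the finder's, but the flagging rule here (YES only for rows after the first occurrence, meaning the rows a dedupe would remove) is an assumption about how you might script it, not a description of csv-duplicate-finder.

```python
rows = [
    {"email": "jane@shop.com", "name": "Jane"},
    {"email": "bob@shop.com", "name": "Bob"},
    {"email": "Jane@Shop.com ", "name": "Jane D."},
    {"email": "", "name": "Walk-in"},
    {"email": "", "name": "Anon Buyer"},
]

seen = set()
for row in rows:
    k = row["email"].strip().lower()
    # Empty keys are never flagged; they are not duplicates of each other.
    is_dup = bool(k) and k in seen
    row["_is_duplicate"] = "YES" if is_dup else "NO"
    if k:
        seen.add(k)

for row in rows:
    print(row["_is_duplicate"], repr(row["email"]), row["name"])
```

Every input row is still present afterwards, so the flagged file can be reviewed or filtered before anything is deleted.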

Frequently asked questions

Should I dedupe customers on Email or Customer ID?

Use Customer ID (or Stripe ID) if you have one and customers can change email while keeping the account — the ID stays stable. Use Email when there's no shared internal ID across your sources, or when you specifically want to merge two accounts that share an address.

Does it match emails regardless of capitalisation?

Yes, by default. With Case-sensitive keys off (the default), the comparison key is trimmed and lowercased, so Jane@Shop.com and jane@shop.com collapse. Turn the checkbox on only if your key is a case-meaningful identifier.

Which duplicate row is kept?

The first one in file order. To keep your authoritative source, concatenate it first with csv-merger. To keep the most recent record, sort by an update date descending with csv-sorter before deduping.

What happens to customers with no email?

They're kept. Rows with a blank key value are excluded from dedup and counted as Empty keys, so you never lose walk-in or legacy customers with no address on file. Pre-filter them out with csv-column-filter if you want only emailed customers.

Will it merge data from the two duplicate rows?

No. It keeps the first complete row and drops the rest — there's no field-level merge on collision. If your duplicates hold complementary data, reconcile them first or place the more complete source on top before deduping.

Is my customer data uploaded to a server?

No. PapaParse runs in your browser; customer emails, names, and purchase data stay on your device. The only server write is an anonymous usage count for signed-in dashboard stats — no row content leaves your machine.

Can I upload an Excel file of customers?

Yes — .xlsx, .xls, and .ods are accepted. The first sheet is converted to CSV, deduped, and can be downloaded back as .xlsx. Plain .csv works directly. The delimiter is detected automatically.

How many customer rows can it process?

Free tier: 500 rows / 2 MB (this is a Pro tool). Pro: 100,000 rows / 100 MB. For larger customer bases, split with csv-row-splitter, dedupe each part, then concatenate and dedupe once more.

Can I dedupe on email and account region together?

Not in one pass — the key is a single column. Merge email and region into one column with csv-column-merger, dedupe on the combined key, then split it back with csv-column-value-splitter if you need the columns separated again.

Does it remove the trailing space from the email value too?

No — trimming applies only to the comparison key, so the matching works, but the surviving cell keeps its original text. Run csv-whitespace-trimmer afterward if you also want the visible values cleaned.

How do I just see which customers are duplicated?

Use csv-duplicate-finder, which adds an _is_duplicate YES/NO column and groups the matches for review. Use this deduplicator when you're ready to actually remove the extra rows.

Why does Rows out look higher than I expected?

Empty-key rows are always kept and don't count as duplicates, so they inflate Rows out relative to a naive 'unique emails' count. Check the Empty keys tile — if it's high, you have many customers with no value in your chosen key column.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

