How to dedupe a customer csv by email
- Step 1: Pull and combine your customer sources. Export customers from each system (store admin, billing portal, CRM). The deduplicator works on one file, so concatenate them first with csv-merger. Order matters: put the system whose records you trust most at the top so its row survives dedup.
- Step 2: Drop the file onto the deduplicator above. It accepts .csv plus .xlsx/.xls/.ods (the first sheet is converted to CSV automatically). The delimiter is auto-detected, so a comma export and a semicolon export both load without configuration.
- Step 3: Pick Email or Customer ID as the key. Open the Unique key column dropdown and choose the column that identifies a customer. Use Email for most lists. Prefer a Customer ID/Stripe ID if customers can change their email but keep their account, because the ID survives address changes.
- Step 4: Leave case-sensitivity off for emails. Keep Case-sensitive keys unchecked so Jane@Shop.com matches jane@shop.com. Only enable it when your key is a case-meaningful identifier (e.g. some externally generated IDs that distinguish case).
- Step 5: Run and check Empty keys. Click Remove duplicates. Read the tiles: Rows in, Rows out, Duplicates, Unique keys, Empty keys. A non-zero Empty keys count means some customers have no value in your key column; they were kept untouched and need manual attention before import.
- Step 6: Download the deduplicated customer file. Click Download CSV (or get an .xlsx back if you uploaded a spreadsheet). The first row per customer is kept in original order. Rows out now equals Unique keys plus Empty keys, so if Empty keys is zero, every remaining row is a distinct customer.
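The concatenation in Step 1 can be sketched in Python. This only illustrates the ordering idea and is not how csv-merger is implemented; the function name `concat_csv_sources` is hypothetical, and the sketch assumes every export already shares the same header.

```python
import csv
import io


def concat_csv_sources(sources: list[str]) -> str:
    """Concatenate CSV texts that share one header, most trusted source first."""
    out = io.StringIO()
    writer = None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # The first (most trusted) source defines the output header.
            writer = csv.DictWriter(
                out, fieldnames=reader.fieldnames, lineterminator="\n"
            )
            writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return out.getvalue()


store = "email,name,source\njane@shop.com,Jane,Store\nbob@shop.com,Bob,Store\n"
billing = "email,name,source\nJane@Shop.com,Jane D.,Billing\n"

# Store rows go first, so they are the ones a keep-first dedup would retain.
combined = concat_csv_sources([store, billing])
print(combined)
```

Because a keep-first dedup retains the earliest row per key, the order of the list passed in decides which system's record survives.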
The two real controls
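The deduplicator exposes exactly two options: the key column and case sensitivity. There is no multi-column key, no keep-last, and no merge-fields-on-collision behaviour. The last two rows of the table below are fixed behaviours, not settings.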
| Control | What it does | Default | Customer-list guidance |
|---|---|---|---|
| Unique key column | The single column whose value defines a duplicate; first row per value is kept | First column | Choose Email, or Customer ID/Stripe ID if customers keep accounts across email changes |
| Case-sensitive keys | Off matches A@x.com to a@x.com; On requires exact case match | Off | Keep off for emails; on only for case-distinct external IDs |
| Whitespace in the key | Always trimmed before comparison; the stored cell keeps its original text | Always trimmed | Run csv-whitespace-trimmer afterward if you also want the visible value cleaned |
| Empty key value | Row is never deduped; passes through and increments the Empty keys counter | Always kept | Filter out before dedup if you only want customers with an email |
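
The rules in the table above can be summarised as a small Python sketch. It is a model of the documented behaviour (trim the key, lowercase it unless case-sensitive, keep the first row, pass empty keys through), not the tool's actual code; `dedupe_rows` and the stats key names are hypothetical.

```python
def dedupe_rows(rows, key_column, case_sensitive=False):
    """Keep the first row per key; rows with an empty key always pass through."""
    seen = set()
    kept = []
    stats = {"rows_in": len(rows), "duplicates": 0, "empty_keys": 0}
    for row in rows:
        # Trim only the comparison key; the stored cell is left as it was.
        key = (row.get(key_column) or "").strip()
        if not key:
            stats["empty_keys"] += 1
            kept.append(row)
            continue
        if not case_sensitive:
            key = key.lower()
        if key in seen:
            stats["duplicates"] += 1
            continue
        seen.add(key)
        kept.append(row)
    stats["rows_out"] = len(kept)
    stats["unique_keys"] = len(seen)
    return kept, stats


rows = [
    {"email": "jane@shop.com", "name": "Jane"},
    {"email": "Jane@Shop.com ", "name": "Jane D."},
    {"email": "", "name": "Walk-in"},
    {"email": "bob@shop.com ", "name": "Bob"},
    {"email": "", "name": "Anon"},
]

kept, stats = dedupe_rows(rows, "email")
strict_kept, strict_stats = dedupe_rows(rows, "email", case_sensitive=True)
print(stats)
```

In this sample the second Jane row is dropped with case sensitivity off but kept with it on, both blank-email rows survive either way, and Bob's cell keeps its trailing space.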
Email vs Customer ID as the dedup key
How to choose the key column for a customer migration: pick the one that stays stable for the same person over time.
| Scenario | Better key | Why |
|---|---|---|
| Customer changed their email but kept the account | Customer ID / Stripe ID | Email-based dedup would treat the old and new address as two people |
| Sources have no shared internal ID | Email | Email is the only field both systems agree on |
| Same customer, two store accounts, same email | Email | Collapses the duplicate accounts into one record |
| Billing export with subscription IDs but inconsistent emails | Customer ID | The billing ID is canonical; emails were entered ad hoc |
| B2B list where several people share one inbox | Neither alone | A shared info@ inbox would over-collapse; dedupe on a contact-level field instead |
Cookbook
Real before/after rows from customer exports. Emails anonymised. The tool keeps the first row per key value, matching case-insensitively unless you turn case-sensitivity on.
Same customer in store + billing exports, casing differs
Example: The store stored the email lowercase; the billing system stored it title-cased from a checkout autofill. Default case-insensitive matching collapses them, keeping the first (store) row.
Input (store rows concatenated above billing):
    email,name,source
    jane@shop.com,Jane,Store
    bob@shop.com,Bob,Store
    Jane@Shop.com,Jane D.,Billing
Key column: email · Case-sensitive keys: OFF
Output:
    email,name,source
    jane@shop.com,Jane,Store
    bob@shop.com,Bob,Store
Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Unique keys 2
Customer changed email — dedupe on Customer ID instead
Example: A customer updated their email; an old export still has the previous address. Keying on email would keep both. Keying on the stable Customer ID collapses them correctly.
Input:
    customer_id,email,name
    CUS_001,old@x.com,Lee
    CUS_001,new@x.com,Lee
    CUS_002,pat@y.com,Pat
Key column: customer_id
Output (first row per ID kept):
    customer_id,email,name
    CUS_001,old@x.com,Lee
    CUS_002,pat@y.com,Pat
Tip: sort by an 'updated_at' column descending first if you want the NEW email to be the surviving row.
Blank-email rows preserved for manual merge
Example: Some legacy customers have no email on file. They are not duplicates of each other just because the key is blank: every blank-key row is kept and counted as an Empty key.
Input:
    email,name,id
    ,Anon Buyer,CUS_77
    ,Walk-in,CUS_78
    ava@z.com,Ava,CUS_79
Key column: email
Output (both blank-email rows kept):
    email,name,id
    ,Anon Buyer,CUS_77
    ,Walk-in,CUS_78
    ava@z.com,Ava,CUS_79
Stats: Rows in 3 · Rows out 3 · Duplicates 0 · Empty keys 2
Trailing space from a manual spreadsheet edit
Example: An analyst hand-edited one email and left a trailing space. The trim-before-compare behaviour recognises it as the same customer and removes the duplicate.
Input (trailing space on row 1):
    email,plan
    dee@corp.com ,Pro
    dee@corp.com,Pro
Key column: email
Output:
    email,plan
    dee@corp.com ,Pro
The surviving cell still has its trailing space, because trimming affects the key only. Clean the value with csv-whitespace-trimmer next.
Keep the most recent record by pre-sorting
Example: You want the latest customer record (newest plan, current address) to survive, but the tool keeps the FIRST row. Sort by last-updated descending first, then dedupe.
Step 1: csv-sorter on updated_at, direction desc:
    email,plan,updated_at
    sam@x.com,Pro,2026-05-01
    sam@x.com,Free,2025-11-02
Step 2: deduplicator, key column: email:
    email,plan,updated_at
    sam@x.com,Pro,2026-05-01
The newest row is now first, so it's the one kept.
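The sort-then-dedupe recipe above can be sketched in Python. This is a minimal illustration rather than the tools' implementation: `keep_latest` is a hypothetical name, and the sketch assumes every row has a non-empty key and an ISO-formatted date (YYYY-MM-DD), which sorts correctly as plain text.

```python
def keep_latest(rows, key_column, date_column):
    """Sort newest first, then keep the first row per key."""
    # ISO dates compare correctly as strings; sorted() is stable, so rows
    # with equal dates keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[date_column], reverse=True)
    seen = set()
    kept = []
    for row in ordered:
        key = row[key_column].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    {"email": "sam@x.com", "plan": "Free", "updated_at": "2025-11-02"},
    {"email": "sam@x.com", "plan": "Pro", "updated_at": "2026-05-01"},
]

latest = keep_latest(rows, "email", "updated_at")
print(latest)
```

If your dates are not ISO-formatted (for example 05/01/2026), parse them into real dates before sorting, otherwise the string order will not match the chronological order.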
Errors and edge cases
Limits, by-design behaviours, and silent failures you may hit when deduping a customer file. Find the heading that matches your situation and apply the fix described under it.
Customer file over the free 500-row limit
Pro required: This is a Pro tool; the free tier caps input at 500 rows / 2 MB. Customer exports usually exceed that. Pro raises it to 100,000 rows / 100 MB. Beyond 100k, split with csv-row-splitter, dedupe each chunk, then concatenate and run a final pass.
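The split, dedupe, concatenate, final-pass workflow can be sketched as follows. It is a rough model, not the tools' code: `dedupe_first` and `dedupe_in_chunks` are hypothetical names, and the tiny chunk size is only for demonstration. The point it shows is why the final pass matters: a key that appears in two different chunks survives both per-chunk passes.

```python
def dedupe_first(rows, key_column):
    """Keep the first row per trimmed, lowercased key; keep all empty keys."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_column].strip().lower()
        if key and key in seen:
            continue
        if key:
            seen.add(key)
        kept.append(row)
    return kept


def dedupe_in_chunks(rows, key_column, chunk_size):
    """Dedupe each chunk, concatenate in order, then run one final pass."""
    partial = []
    for start in range(0, len(rows), chunk_size):
        partial.extend(dedupe_first(rows[start:start + chunk_size], key_column))
    # Cross-chunk duplicates are still present here; the final pass removes them.
    return dedupe_first(partial, key_column)


emails = ["a@x.com", "b@x.com", "a@x.com", "c@x.com", "b@x.com", "a@x.com"]
rows = [{"email": e, "row": str(i)} for i, e in enumerate(emails)]

chunked = dedupe_in_chunks(rows, "email", chunk_size=2)
single_pass = dedupe_first(rows, "email")
print([r["email"] for r in chunked])
```

As long as the chunks are concatenated in their original order, the chunked result matches a single pass over the whole file.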
Want the newest record kept, not the first
First-row only: Dedup always keeps the first occurrence; there is no 'keep last'. Sort the file by a last-updated/created date descending with csv-sorter so the most recent record sits first, then dedupe on email or Customer ID.
Two customers share one company inbox
Over-collapse risk: If several contacts share orders@bigco.com, deduping on email would merge distinct people into one row. Use a contact-level key (Customer ID, Contact ID) instead, or filter those shared-inbox rows out first with csv-column-filter and handle them separately.
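One way to set shared-inbox rows aside before deduping is sketched below. This is an assumption-heavy illustration, not a feature of csv-column-filter: `split_shared_inboxes` is a hypothetical helper, and the list of "shared" local parts (info, orders, sales, billing) is only an example that you would replace with the addresses you actually know to be shared.

```python
def split_shared_inboxes(
    rows,
    email_column,
    shared_local_parts=("info", "orders", "sales", "billing"),
):
    """Return (personal_rows, shared_rows) based on the part before the @."""
    personal, shared = [], []
    for row in rows:
        local_part = row[email_column].strip().lower().partition("@")[0]
        if local_part in shared_local_parts:
            shared.append(row)
        else:
            personal.append(row)
    return personal, shared


rows = [
    {"email": "orders@bigco.com", "name": "Ann"},
    {"email": "Orders@bigco.com", "name": "Raj"},
    {"email": "kim@bigco.com", "name": "Kim"},
]

personal, shared = split_shared_inboxes(rows, "email")
print(len(personal), len(shared))
```

The personal rows can then be deduped on email as usual, while the shared rows are handled on a contact-level key.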
Customer changed email between exports
By design: Email-keyed dedup treats the old and new address as two customers, because they are different strings. If both rows carry the same stable Customer ID, dedupe on that column instead, since it survives email changes. There is no fuzzy person-matching across fields.
Need to merge data on collision, not drop the row
Not supported: When two rows for one customer each hold partial data (one has the phone, the other the address), the deduplicator keeps the first whole row and discards the second; it does NOT field-merge. Reconcile partial records before deduping, or pick the more complete source to place first.
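If you reconcile partial records yourself before deduping, one possible approach is to fill blank cells in the first row from later rows with the same key. The sketch below is a separate pre-processing idea, explicitly not something the deduplicator does; `coalesce_by_key` is a hypothetical name, and the rule "first non-blank value wins" is an assumption you may want to change.

```python
def coalesce_by_key(rows, key_column):
    """Merge rows sharing a key: the first non-blank value per column wins."""
    merged = {}  # comparison key -> the output row being filled in
    out = []
    for row in rows:
        key = row[key_column].strip().lower()
        if not key:
            # Blank keys are never merged with each other.
            out.append(dict(row))
            continue
        if key not in merged:
            merged[key] = dict(row)
            out.append(merged[key])
            continue
        target = merged[key]
        for column, value in row.items():
            # Only fill cells that are still blank in the surviving row.
            if not target.get(column, "").strip() and value.strip():
                target[column] = value
    return out


rows = [
    {"email": "lee@x.com", "phone": "555-0100", "address": ""},
    {"email": "Lee@x.com", "phone": "", "address": "1 Main St"},
    {"email": "", "phone": "555-0199", "address": ""},
]

reconciled = coalesce_by_key(rows, "email")
print(reconciled)
```

After a step like this, each key already appears once, so the deduplicator has nothing left to discard for those customers.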
Blank-email customers all retained
Preserved: Every row with an empty key value passes through untouched and is counted under Empty keys. This prevents accidental deletion of legacy/walk-in customers with no email. If you only want emailed customers in the output, pre-filter with csv-column-filter (email is_not_empty).
Composite key (email + region) needed
Single key only: The key is one column. To dedupe on email-within-region, merge the two fields first with csv-column-merger into one key column, dedupe on it, then split back with csv-column-value-splitter if you need the original columns.
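The composite-key workaround can be sketched as: build a combined key column, dedupe on it, then drop it. This is an illustration under assumptions, not the behaviour of csv-column-merger; the helper names, the `_composite_key` column, and the `|` separator are all hypothetical. Unlike the tool workflow, the sketch keeps the original columns alongside the helper column, so there is nothing to split back afterward.

```python
def with_composite_key(rows, columns, key_column="_composite_key", sep="|"):
    """Add one column that joins the trimmed, lowercased values of `columns`."""
    return [
        {**row, key_column: sep.join(row[c].strip().lower() for c in columns)}
        for row in rows
    ]


def dedupe_on_columns(rows, columns):
    """Keep the first row per combination of values in `columns`."""
    seen = set()
    kept = []
    for row in with_composite_key(rows, columns):
        key = row.pop("_composite_key")  # remove the helper column again
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    {"email": "a@x.com", "region": "EU"},
    {"email": "a@x.com", "region": "US"},
    {"email": "A@x.com", "region": "eu"},
]

unique = dedupe_on_columns(rows, ["email", "region"])
print(unique)
```

Choose a separator that cannot occur in the values themselves; otherwise two different column pairs could join into the same combined key.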
Excel saved the export as UTF-8 with BOM
Supported: A BOM at the start of the file is handled by the parser and doesn't break the first header. The download is plain CSV; if you need a BOM back for Excel-on-Windows, the spreadsheet round-trip (upload .xlsx, download .xlsx) avoids encoding issues entirely.
Semicolon delimiter from an EU billing system
Supported: Delimiter auto-detection handles ;-separated exports (common from EU-locale billing tools) without any setting. The result is written comma-delimited. No data is altered; only the delimiter is normalised.
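For readers who want to reproduce the BOM and delimiter handling outside the tool, here is a rough Python analogue using only the standard library. It is not the tool's parser (the page says it uses PapaParse in the browser); `normalise_to_comma_csv` is a hypothetical name, and the candidate delimiter list is an assumption.

```python
import csv
import io


def normalise_to_comma_csv(raw: bytes) -> str:
    """Strip a UTF-8 BOM, detect the delimiter, and rewrite as comma CSV."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = raw.decode("utf-8-sig")
    # Guess the delimiter from a sample, limited to likely candidates.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    reader = csv.reader(io.StringIO(text), dialect)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


raw = "\ufeffemail;name\njane@shop.com;Jane\nbob@shop.com;Bob\n".encode("utf-8")
result = normalise_to_comma_csv(raw)
print(result)
```

`csv.Sniffer` is a heuristic and can fail on very short or irregular samples, in which case it raises `csv.Error`; passing the delimiter explicitly is the safer fallback.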
You only want to audit duplicates, not delete
Use the finder: To review which customers are duplicated before removing anything, use csv-duplicate-finder, which flags each row YES/NO and groups the matches. The deduplicator is the destructive step once you've confirmed.
Frequently asked questions
Should I dedupe customers on Email or Customer ID?
Use Customer ID (or Stripe ID) if you have one and customers can change email while keeping the account — the ID stays stable. Use Email when there's no shared internal ID across your sources, or when you specifically want to merge two accounts that share an address.
Does it match emails regardless of capitalisation?
Yes, by default. With Case-sensitive keys off (the default), the comparison key is trimmed and lowercased, so Jane@Shop.com and jane@shop.com collapse. Turn the checkbox on only if your key is a case-meaningful identifier.
Which duplicate row is kept?
The first one in file order. To keep your authoritative source, concatenate it first with csv-merger. To keep the most recent record, sort by an update date descending with csv-sorter before deduping.
What happens to customers with no email?
They're kept. Rows with a blank key value are excluded from dedup and counted as Empty keys, so you never lose walk-in or legacy customers with no address on file. Pre-filter them out with csv-column-filter if you want only emailed customers.
Will it merge data from the two duplicate rows?
No. It keeps the first complete row and drops the rest — there's no field-level merge on collision. If your duplicates hold complementary data, reconcile them first or place the more complete source on top before deduping.
Is my customer data uploaded to a server?
No. PapaParse runs in your browser; customer emails, names, and purchase data stay on your device. The only server write is an anonymous usage count for signed-in dashboard stats — no row content leaves your machine.
Can I upload an Excel file of customers?
Yes — .xlsx, .xls, and .ods are accepted. The first sheet is converted to CSV, deduped, and can be downloaded back as .xlsx. Plain .csv works directly. The delimiter is detected automatically.
How many customer rows can it process?
Free tier: 500 rows / 2 MB (this is a Pro tool). Pro: 100,000 rows / 100 MB. For larger customer bases, split with csv-row-splitter, dedupe each part, then concatenate and dedupe once more.
Can I dedupe on email and account region together?
Not in one pass — the key is a single column. Merge email and region into one column with csv-column-merger, dedupe on the combined key, then split it back with csv-column-value-splitter if you need the columns separated again.
Does it remove the trailing space from the email value too?
No — trimming applies only to the comparison key, so the matching works, but the surviving cell keeps its original text. Run csv-whitespace-trimmer afterward if you also want the visible values cleaned.
How do I just see which customers are duplicated?
Use csv-duplicate-finder, which adds an _is_duplicate YES/NO column and groups the matches for review. Use this deduplicator when you're ready to actually remove the extra rows.
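A non-destructive audit of this kind can be approximated in Python. The sketch reuses the `_is_duplicate` column name mentioned above, but the rest is an assumption: in particular, it marks every row whose key occurs more than once as YES, which may differ from how csv-duplicate-finder treats the first occurrence. `flag_duplicates` is a hypothetical name.

```python
from collections import Counter


def flag_duplicates(rows, key_column):
    """Add an _is_duplicate column without removing any rows."""
    keys = [row[key_column].strip().lower() for row in rows]
    # Blank keys are ignored, so blank-key rows are never flagged.
    counts = Counter(key for key in keys if key)
    return [
        {**row, "_is_duplicate": "YES" if key and counts[key] > 1 else "NO"}
        for row, key in zip(rows, keys)
    ]


rows = [
    {"email": "jane@shop.com", "name": "Jane"},
    {"email": "bob@shop.com", "name": "Bob"},
    {"email": "Jane@Shop.com", "name": "Jane D."},
    {"email": "", "name": "Walk-in"},
]

flagged = flag_duplicates(rows, "email")
print([r["_is_duplicate"] for r in flagged])
```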
Why does Rows out look higher than I expected?
Empty-key rows are always kept and don't count as duplicates, so they inflate Rows out relative to a naive 'unique emails' count. Check the Empty keys tile — if it's high, you have many customers with no value in your chosen key column.
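The relationship between the tiles can be written down as two checks. This is inferred from the behaviour described on this page (first row per key kept, empty keys always kept) rather than taken from the tool's code, and `tiles_are_consistent` is a hypothetical helper. The Unique keys value of 1 in the second call is likewise inferred from the blank-email cookbook example, whose stats line does not list it.

```python
def tiles_are_consistent(rows_in, rows_out, duplicates, unique_keys, empty_keys):
    """Check that the five tile values agree with each other."""
    # Every input row is either kept or counted as a duplicate.
    kept_plus_dropped = rows_in == rows_out + duplicates
    # Every kept row is either the first of its key or an empty-key row.
    kept_breakdown = rows_out == unique_keys + empty_keys
    return kept_plus_dropped and kept_breakdown


# Casing example from the cookbook: 3 in, 2 out, 1 duplicate, 2 unique keys.
casing_example = tiles_are_consistent(3, 2, 1, 2, 0)
# Blank-email example: 3 in, 3 out, 0 duplicates, 2 empty keys, 1 unique key.
blank_example = tiles_are_consistent(3, 3, 0, 1, 2)
print(casing_example, blank_example)
```

So if Rows out is larger than your expected count of unique emails, the difference should match the Empty keys tile.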
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.