How to dedupe marketing leads by email
- Step 1: Combine your lead sources into one CSV first. The deduplicator works on a single file. If your leads are spread across a webinar export, an ad-platform CSV, and a CRM pull, concatenate them first with csv-merger. List them in priority order (most-trusted source first) so the kept row is the one you want.
- Step 2: Drop the merged file onto the deduplicator above. Drag in a .csv (or an .xlsx, .xls, or .ods; the tool reads the first sheet and converts it to CSV automatically). PapaParse auto-detects the delimiter, so comma- and semicolon-delimited exports both work.
- Step 3: Choose your unique key column. Pick the column that uniquely identifies a lead from the Unique key column dropdown. For lead lists this is almost always Email. If emails are unreliable, dedupe on Phone or a CRM ID instead, but pick the single most reliable identifier, because the tool keys on one column only.
- Step 4: Decide on case sensitivity. Leave Case-sensitive keys unchecked (the default) for email dedup: Sue@Acme.com and sue@acme.com are the same person and should collapse. Only check it when the key is genuinely case-meaningful (rare for leads; relevant for some case-sensitive external IDs).
- Step 5: Run and read the five stat tiles. Click Remove duplicates. The result panel shows Rows in, Rows out, Duplicates (removed), Unique keys, and Empty keys. If Empty keys is high, you have leads with no email. Review them before importing, since they pass through dedup untouched.
- Step 6: Download and import. Click Download CSV (or download back to .xlsx if you uploaded a spreadsheet). The output keeps the first row per key in original order. Import the deduplicated file into your CRM or ESP. As long as Empty keys is 0, your unique-contact count now matches the row count exactly.
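The priority ordering from Step 1 can be sketched outside the tool. The following is a minimal Python illustration, not csv-merger's implementation. The function name `concat_csv_texts` is hypothetical, and the sketch assumes sources are aligned by header name.

```python
import csv
import io

def concat_csv_texts(texts):
    """Concatenate CSV texts in the order given, aligning columns by header name."""
    fieldnames = []
    rows = []
    for text in texts:
        reader = csv.DictReader(io.StringIO(text))
        # Collect every header seen, preserving first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Most-trusted source first: the CRM rows land above the cold-list rows.
crm = "email,owner,stage\ntom@beta.com,Dana,SQL\nava@beta.com,Raj,MQL\n"
cold = "email,owner,stage\ntom@beta.com,,Cold\n"
merged = concat_csv_texts([crm, cold])
```

Because the CRM text is passed first, its row for tom@beta.com sits above the cold-list copy, which is what a first-occurrence dedupe needs.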
What the two options actually do
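The deduplicator has exactly two controls: the key column dropdown and the case-sensitivity checkbox. Everything else, including whitespace trimming and blank-key handling, is automatic. There is no multi-column key, no 'keep last', and no fuzzy matching.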
| Control | Effect | Default | When to change it |
|---|---|---|---|
| Unique key column (dropdown) | Rows sharing this column's value (after trim + optional lowercase) are duplicates; only the first is kept | First column (index 0) | Set it to Email — or Phone/CRM ID if emails are unreliable |
| Case-sensitive keys (checkbox) | Off = Sue@x.com matches sue@x.com. On = only byte-exact values match | Off (case-insensitive) | Leave off for emails; turn on only for case-meaningful external IDs |
| Whitespace handling | Always trimmed for the comparison key (" a@x.com " matches "a@x.com"); the output cell keeps its original text | Always on (not configurable) | n/a |
| Blank-key rows | Never deduped — every row with an empty key value passes through untouched and is counted as an Empty key | Always preserved | n/a |
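The behaviour in the table can be expressed as a short Python sketch. This is an illustration of the rules as described here (trim, optional lowercase, first occurrence kept, blank keys passed through), not the tool's actual source. The function name `dedupe_rows` and the stat key names are assumptions.

```python
import csv
import io

def dedupe_rows(rows, key_index=0, case_sensitive=False):
    """rows: list of lists with the header first. Returns (kept_rows, stats)."""
    header, body = rows[0], rows[1:]
    seen = set()
    kept = [header]
    duplicates = 0
    empty = 0
    for row in body:
        cell = row[key_index] if key_index < len(row) else ""
        key = cell.strip()            # trimming affects the key only
        if not case_sensitive:
            key = key.lower()
        if key == "":
            empty += 1                # blank keys are never deduped
            kept.append(row)
        elif key in seen:
            duplicates += 1           # later occurrence is dropped
        else:
            seen.add(key)
            kept.append(row)          # original cell text is preserved
    stats = {
        "rows_in": len(body),
        "rows_out": len(kept) - 1,
        "duplicates": duplicates,
        "unique_keys": len(seen),
        "empty_keys": empty,
    }
    return kept, stats

text = """email,first_name,source
Sue@Acme.com,Sue,Webinar
bob@globex.io,Bob,LinkedIn
sue@acme.com,Sue,WebForm
"""
rows = list(csv.reader(io.StringIO(text)))
kept, stats = dedupe_rows(rows, key_index=0)
```

In this sketch, rows out always equals rows in minus duplicates, and also equals unique keys plus empty keys.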
Lead source quirks and the right key column
Typical multi-source lead lists and which column makes the most reliable dedup key.
| Lead source | Common duplicate cause | Best key column | Notes |
|---|---|---|---|
| Facebook / Meta Lead Ads CSV | Same person fills two different lead forms | email | Meta lowercases emails on capture, but other sources won't — keep case-insensitive on |
| Webinar platform (Zoom, GoTo) | Registered + attended exported separately | Email | Casing often differs from the original signup; default case-insensitive matching handles it |
| Gated-content / form fills | Repeat downloads by the same prospect | Email | Trailing spaces from autofill are ignored in matching |
| CRM export (HubSpot/Salesforce) | Re-imported list created near-duplicates | Record ID / CRM ID | If you have a stable internal ID, it's a cleaner key than email |
| Cold-outreach purchased list | Overlap with your existing CRM | Email | Concatenate CRM first so your owned record wins (first occurrence kept) |
Cookbook
Real before/after rows from multi-source lead lists. Emails and names anonymised. The deduplicator keeps the first row per key value.
Case-different email duplicate across two ad sources
Example: A prospect signed up through one source that preserved their capitalised email (the webinar list in the input below) and later filled a web form in lowercase. To your CRM these look like two contacts. With case-insensitive matching (the default) they collapse to one, and the first row in the file wins.
Input (webinar list concatenated above the ad list):

    email,first_name,source
    Sue@Acme.com,Sue,Webinar
    bob@globex.io,Bob,LinkedIn
    sue@acme.com,Sue,WebForm

Key column: email · Case-sensitive keys: OFF

Output (first occurrence kept):

    email,first_name,source
    Sue@Acme.com,Sue,Webinar
    bob@globex.io,Bob,LinkedIn

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Unique keys 2 · Empty keys 0
Trailing space from spreadsheet copy-paste
Example: An ops teammate pasted emails from another sheet, dragging in a trailing space on some cells. Visually identical, byte-different. The deduplicator trims the key before comparing, so the padded copy is recognised as a duplicate.

Input (note the trailing space on row 1):

    email,campaign
    jan@umbrella.co ,Q2 Nurture
    jan@umbrella.co,Q2 Nurture

Key column: email · Case-sensitive keys: OFF

Output:

    email,campaign
    jan@umbrella.co ,Q2 Nurture

Note: the surviving cell keeps its original trailing space; trim only affects the matching key, not the stored value. To clean the visible value too, run csv-whitespace-trimmer after.
Dedupe on phone when emails are missing
Example: An event-booth list captured phone numbers but inconsistent emails. Switch the key column to phone. Rows with the same phone collapse; rows missing a phone are kept as Empty keys for follow-up.

Input:

    name,email,phone
    Lee,,555-0100
    Lee,lee@x.com,555-0100
    Pat,pat@y.com,

Key column: phone · Case-sensitive keys: OFF

Output (first row per phone kept; blank-phone row preserved):

    name,email,phone
    Lee,,555-0100
    Pat,pat@y.com,

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Empty keys 1
Source priority: keep the CRM row, drop the cold-list copy
Example: You want your owned CRM record (with lifecycle stage and owner) to survive over a purchased cold-list row. Because the first occurrence wins, concatenate the CRM file ABOVE the cold list before deduping.

Input (CRM rows first, cold list appended below):

    email,owner,stage
    tom@beta.com,Dana,SQL
    ava@beta.com,Raj,MQL
    tom@beta.com,,Cold

Key column: email

Output (CRM row for Tom kept, cold copy dropped):

    email,owner,stage
    tom@beta.com,Dana,SQL
    ava@beta.com,Raj,MQL
Same person, different domains — NOT a duplicate
Example: A lead used a work email and a personal email. These are different key values, so the deduplicator correctly keeps both. The tool dedupes on exact key equality, not identity-resolution heuristics.

Input:

    email,name
    maya@workco.com,Maya
    maya.l@gmail.com,Maya

Key column: email

Output (both kept, because the keys differ):

    email,name
    maya@workco.com,Maya
    maya.l@gmail.com,Maya

For cross-field identity matching, dedupe separately on each candidate key, or pre-normalise with csv-find-replace first.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Lead list exceeds the free 500-row limit
Pro required: The CSV Deduplicator is a Pro tool, and the free tier caps input at 500 rows / 2 MB. A real merged lead list is usually far bigger. Pro raises this to 100,000 rows / 100 MB. For lists beyond 100k rows, split with csv-row-splitter, dedupe each chunk, then concatenate and dedupe once more.
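The split, dedupe, concatenate, dedupe-again recipe can be checked with a small Python sketch. The helper names `keep_first` and `dedupe_in_chunks` are hypothetical, and the matching rule (trim plus lowercase, blank keys kept) follows the behaviour described on this page.

```python
def keep_first(rows, key):
    """Keep the first row per normalised key; rows with a blank key always pass."""
    seen, kept = set(), []
    for row in rows:
        k = row[key].strip().lower()
        if k and k in seen:
            continue
        if k:
            seen.add(k)
        kept.append(row)
    return kept

def dedupe_in_chunks(rows, key, chunk_size):
    """Dedupe each chunk, concatenate in order, then run one final pass."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    partial = [row for chunk in chunks for row in keep_first(chunk, key)]
    return keep_first(partial, key)

rows = [{"email": e} for e in ["a@x.com", "b@x.com", "A@x.com", "c@x.com", "b@x.com", ""]]
result = dedupe_in_chunks(rows, "email", chunk_size=2)
```

The final pass matters because duplicates that straddle two chunks survive the per-chunk step. With the chunks concatenated in their original order, the result matches a single pass over the whole list.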
Same person under two different email addresses
By design: The tool matches on exact key equality (after trim + optional lowercase). maya@workco.com and maya.l@gmail.com are different keys, so both are kept. There is no fuzzy or cross-field identity resolution. If you need to collapse a person across multiple emails, normalise to a single canonical address first with csv-find-replace.
Gmail dot/plus aliases treated as distinct
By design: s.ue@gmail.com, sue@gmail.com, and sue+ads@gmail.com all deliver to the same Gmail inbox, but they are different strings, so the deduplicator keeps all three. Gmail-style normalisation (stripping dots and +tags) is not built in. If your audience is Gmail-heavy, pre-normalise with csv-find-replace: the regex \+[^@]* → empty strips the +tag, and dots in the local part need a separate rule.
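If you pre-normalise in a script instead, the rule might look like the sketch below. The function name `normalise_gmail` is hypothetical, and restricting the rule to the gmail.com domain is an assumption made here so that dots in other providers' addresses are left alone.

```python
def normalise_gmail(email):
    """Return a comparison form: lowercase, and for gmail.com drop dots and +tags."""
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep or domain != "gmail.com":
        return cleaned                 # other domains: dots and tags may be significant
    local = local.split("+", 1)[0]     # drop the +tag
    local = local.replace(".", "")     # drop dots in the local part
    return f"{local}@{domain}"

variants = ["s.ue@gmail.com", "sue@gmail.com", "sue+ads@gmail.com"]
canonical = {normalise_gmail(v) for v in variants}
```

Writing the normalised value to a separate column and deduping on that column keeps the original address intact for sending.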
You wanted to keep the LAST duplicate, not the first
First-row only: The deduplicator always keeps the FIRST occurrence of each key. There is no 'keep last' option. To make the most recent record win, sort the file so the row you want to keep comes first (use csv-sorter on a date column, descending), then dedupe.
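The sort-then-dedupe workaround can be sketched in Python as follows. The column name `updated_at` and the sample dates are hypothetical, and the sketch assumes ISO 8601 date strings so that plain string sorting is chronological.

```python
def keep_most_recent(rows, key, date_col):
    """Sort newest first, then keep the first row per normalised key."""
    ordered = sorted(rows, key=lambda r: r[date_col], reverse=True)
    seen, kept = set(), []
    for row in ordered:
        k = row[key].strip().lower()
        if k and k in seen:
            continue                   # an older copy of a key already kept
        if k:
            seen.add(k)
        kept.append(row)
    return kept

rows = [
    {"email": "tom@beta.com", "stage": "MQL", "updated_at": "2024-01-05"},
    {"email": "ava@beta.com", "stage": "MQL", "updated_at": "2024-02-01"},
    {"email": "Tom@beta.com", "stage": "SQL", "updated_at": "2024-03-10"},
]
latest = keep_most_recent(rows, "email", "updated_at")
```

Note that the output comes back in newest-first order, not the original file order.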
Need to dedupe on email AND campaign together
Single key only: The key is one column. There is no composite/multi-column key. To dedupe on a combination, first merge the columns into one with csv-column-merger (e.g. email|campaign), dedupe on that merged column, then split it back with csv-column-value-splitter if needed.
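The merged-column idea can be illustrated in Python. This sketch builds the combined key in memory rather than adding and splitting a real column. The function name `dedupe_on_columns`, the `|` separator, and the "Q3 Launch" sample value are assumptions for illustration.

```python
def dedupe_on_columns(rows, columns, sep="|"):
    """Keep the first row per combination of the given columns."""
    seen, kept = set(), []
    for row in rows:
        # Normalise each part separately so padding around a value is ignored.
        key = sep.join(row[c].strip().lower() for c in columns)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

rows = [
    {"email": "jan@umbrella.co", "campaign": "Q2 Nurture"},
    {"email": "jan@umbrella.co ", "campaign": "Q2 Nurture"},
    {"email": "jan@umbrella.co", "campaign": "Q3 Launch"},
]
unique = dedupe_on_columns(rows, ["email", "campaign"])
```

Pick a separator that never appears in the data, otherwise two different combinations could produce the same merged key.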
Rows with a blank email all kept
Preserved: Any row whose key cell is empty (after trimming) is never treated as a duplicate. It passes straight through and is counted under Empty keys. This is intentional so you don't lose leads that are merely missing an email. Filter them out first with csv-column-filter (email is_not_empty) if you only want keyed leads.
Just wanting to SEE the duplicates, not remove them
Use the finder: The deduplicator removes duplicate rows and gives counts, but it doesn't list each duplicate group. To audit which leads are duplicated and how many times before deciding to delete, use csv-duplicate-finder, which marks every row YES/NO and groups the matches.
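A rough Python equivalent of that audit step is shown below. It is not csv-duplicate-finder's implementation. In particular, flagging every member of a duplicate group (including the first occurrence) and the function name `flag_duplicates` are assumptions.

```python
from collections import Counter

def flag_duplicates(rows, key):
    """Return copies of the rows with an _is_duplicate YES/NO column added."""
    norm = [row[key].strip().lower() for row in rows]
    counts = Counter(k for k in norm if k)   # blank keys are never counted
    return [
        dict(row, _is_duplicate="YES" if counts[k] > 1 else "NO")
        for row, k in zip(rows, norm)
    ]

rows = [
    {"email": "Sue@Acme.com", "source": "Webinar"},
    {"email": "bob@globex.io", "source": "LinkedIn"},
    {"email": "sue@acme.com", "source": "WebForm"},
]
flagged = flag_duplicates(rows, "email")
```

Nothing is removed here, so the flagged list can be reviewed before any row is deleted.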
Semicolon-delimited EU export
Supported: PapaParse auto-detects the delimiter, so a European-locale lead export using ; is parsed correctly without any setting. The output is written with standard comma delimiting. If a downstream system needs semicolons back, convert the delimiter afterwards, taking care not to touch commas inside quoted fields.
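If the conversion is scripted, a CSV-aware reader and writer avoids rewriting commas that sit inside field values. The sketch below uses Python's standard csv module, not PapaParse. The function name `convert_delimiter` is hypothetical, and detection is limited to comma and semicolon.

```python
import csv
import io

def convert_delimiter(text, out_delimiter):
    """Detect , or ; as the input delimiter and rewrite with out_delimiter."""
    try:
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;")
    except csv.Error:
        dialect = csv.excel            # fall back to comma-delimited
    rows = csv.reader(io.StringIO(text), dialect)
    out = io.StringIO()
    # Default quoting is minimal: only fields that need quotes get them.
    csv.writer(out, delimiter=out_delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()

eu = "email;company\nsue@acme.com;Acme, Inc.\nbob@globex.io;Globex\n"
as_commas = convert_delimiter(eu, ",")
back = convert_delimiter(as_commas, ";")
```

When the file is rewritten with commas, the "Acme, Inc." value gains quotes because it now contains the delimiter. Converting back to semicolons drops them again.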
Quotes get stripped from already-quoted fields
Expected: Output is written with quotes only where required. A field that needs quoting, such as "Acme, Inc." (it contains a comma), is quoted again in the output, while fields that were quoted defensively but did not need it lose their quotes. Cell values are unchanged; only the surrounding quoting is normalised.
Header row counted as data
Header-aware: The first row is always treated as the header and is never deduped against the body. If your file has a leading metadata/banner row before the real header, remove it first (e.g. with a spreadsheet or csv-row-limiter with an offset) so the true header lands on row 1.
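Banner rows can also be dropped in a script before upload. The sketch below looks for the first row that contains an expected header name and discards everything above it. The function name `strip_banner_rows` and the banner text are hypothetical.

```python
import csv
import io

def strip_banner_rows(text, expected_header):
    """Return parsed rows starting at the first row containing expected_header."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        if expected_header in [cell.strip() for cell in row]:
            return rows[i:]
    raise ValueError(f"no row contains the header {expected_header!r}")

export = "Exported by Marketing Ops\nemail,name\nsue@acme.com,Sue\n"
cleaned = strip_banner_rows(export, "email")
```

Matching on a known column name is more robust than skipping a fixed number of lines when different exports carry banners of different lengths.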
Frequently asked questions
Does the deduplicator match emails case-insensitively by default?
Yes. The default has Case-sensitive keys unchecked, so the comparison key is the cell value trimmed and lowercased. Sue@Acme.com, sue@acme.com, and SUE@acme.com all match and collapse to one lead. Only check the box if you genuinely need byte-exact matching.
Which row survives when there are duplicates?
The first occurrence of each key value, in file order. Later duplicates are dropped. To control which row wins, order your sources before deduping (most-trusted first) or sort by a recency column with csv-sorter.
Can I dedupe on email and phone at the same time?
Not directly — the key is a single column. Run it once on email, then on the result run it again on phone if you want both passes. For a true composite key, merge the two columns first with csv-column-merger, dedupe on the merged column, then split it back.
What happens to leads with no email address?
They're kept. Any row with a blank key value is excluded from dedup entirely and counted under Empty keys in the result panel. This avoids silently deleting leads that are just missing an email. Filter empties out beforehand with csv-column-filter if you only want keyed records.
Will this catch Gmail dot and plus-alias duplicates?
No. sue@gmail.com, s.ue@gmail.com, and sue+ads@gmail.com are different strings and all survive. Gmail address normalisation isn't built in. If your list is Gmail-heavy, strip the +tag and dots first with csv-find-replace using a regex pattern, then dedupe.
Does my lead data get uploaded anywhere?
No. Parsing and deduplication run entirely in your browser via PapaParse. Lead emails, phones, and names never reach a server. The only server-side write is an anonymous usage counter for signed-in dashboard stats — no row content. This is important for GDPR/CCPA-covered prospect data.
What file formats can I upload?
CSV directly, plus Excel .xlsx/.xls and OpenDocument .ods. For spreadsheets the tool reads the first sheet, converts it to CSV, dedupes, and can download the result back as .xlsx. The delimiter is auto-detected, so comma and semicolon files both work.
How big a lead list can it handle?
Free tier caps at 500 rows / 2 MB, and this is a Pro tool. Pro raises the limit to 100,000 rows / 100 MB. For larger lists, split with csv-row-splitter, dedupe each part, then concatenate and run one final dedupe pass.
Can I just preview which leads are duplicated before deleting?
Use csv-duplicate-finder for that. It adds an _is_duplicate YES/NO column and groups matches so you can review them. The deduplicator is for actually removing the extra rows once you've decided.
Does it remove duplicate columns or duplicate headers?
No — it operates on rows only, keyed by one column. Duplicate columns or repeated header names aren't touched. For removing columns use csv-column-remover; for renaming clashing headers use csv-header-rename.
Why does my output have fewer rows than 'Rows in' minus 'Duplicates'?
It shouldn't — Rows out always equals Rows in minus Duplicates removed (empty-key rows are kept and don't count as duplicates). If the numbers look off, check that you picked the intended key column and that your header row is on line 1, not buried under a banner row.
Can I automate this in a pipeline?
The dedup logic is the same browser-local engine used in the tool. For a repeatable workflow, pair it with csv-merger (combine sources) upstream and your CRM import downstream. There is no required server round-trip — your lead PII stays on your machine throughout.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.