How to find duplicate leads in a CSV before sales outreach
- Step 1: Export or assemble the lead CSV. Download from your lead-gen tool, list provider, webinar platform, or form. If you have several lists, append them first with csv-merger so duplicates across sources are caught in one pass.
- Step 2: Drop the file onto the tool. Parsing runs in your browser via PapaParse, so contact data never reaches a server. Free runs handle up to 2 MB / 500 rows; Pro handles 100 MB / 100,000 rows.
- Step 3: Select the Email column. In Find duplicates in column, choose Email (or Email Address). One key column per run, so do email first.
- Step 4: Keep case-insensitive on (the default). Leave Case-sensitive matching off so email casing is ignored, which is the correct setting for addresses. Click Find duplicates.
- Step 5: Download, then run the phone pass. Click Download Marked CSV. Then re-run the tool on the same source, selecting the Phone column to catch same-number, different-email duplicates. Reconcile the two marked files.
- Step 6: Clean and import. Filter _is_duplicate = YES, decide which record to keep (usually the most enriched), remove the rest, and import the de-duplicated list into your CRM or sequencer.
What the lead duplicate finder does
The full control set. One key column per pass, one checkbox, flag-only output. No fuzzy matching and no cross-column key.
| Control | Behaviour | Default |
|---|---|---|
| Find duplicates in column | Single key column (e.g. Email, then Phone on a second run); values are grouped to find repeats | First column |
| Case-sensitive matching | Off lowercases before comparing — correct for email. On requires identical casing | Off |
| _is_duplicate column | YES if the key value appears 2+ times, NO if once; first occurrence is YES as well | Always added |
| Removal / merge | None — leads are flagged, not merged or deleted. Use csv-deduplicator to drop surplus rows | Zero removed |
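The flagging rule in the table can be sketched in a few lines of Python. This illustrates the described behaviour only; it is not the tool's source (which runs in the browser with PapaParse), and the function name `mark_duplicates` and its parameters are hypothetical.

```python
import csv
import io
from collections import Counter


def mark_duplicates(csv_text, key_column, case_sensitive=False):
    """Return CSV text with a trailing _is_duplicate column (YES/NO).

    Hypothetical helper mirroring the table above: one key column,
    whole-cell matching, no trimming, every member of a group flagged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or [])

    def key_of(row):
        value = row[key_column]
        # Case-insensitive mode lowercases before comparing; nothing is trimmed.
        return value if case_sensitive else value.lower()

    counts = Counter(key_of(row) for row in rows)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames + ["_is_duplicate"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # First occurrences are flagged too: the test is "appears 2+ times".
        row["_is_duplicate"] = "YES" if counts[key_of(row)] >= 2 else "NO"
        writer.writerow(row)
    return out.getvalue()
```

No row is ever dropped in this sketch, which matches the flag-only design: removal is a separate, deliberate step.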
Two-pass plan for lead lists
Because the key is one column, run separate passes to cover both identifiers. Reconcile the two marked outputs afterwards.
| Pass | Key column | Catches | Note |
|---|---|---|---|
| 1 | Email | Same inbox imported twice (case ignored by default) | Plus-addressing (me+a@x.com vs me@x.com) is NOT merged — different text |
| 2 | Phone | Same person under a different email but identical phone | Normalise phone format first (strip spaces and +, and reconcile country code vs leading zero) so +44 7… and 07… match |
| Optional | Combined Email\|Phone | Exact same email AND phone | Build the column with csv-column-merger first |
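Reconciling the two marked outputs amounts to asking, per row, whether either pass flagged it. A minimal sketch, assuming the rows are already loaded as dictionaries with `Email` and `Phone` columns; the names `flag_by`, `reconcile`, `dup_email`, `dup_phone`, and `needs_review` are hypothetical and not part of the tool's output.

```python
from collections import Counter


def flag_by(rows, column):
    """One pass: True for every row whose key value appears 2+ times."""
    keys = [row[column].lower() for row in rows]
    counts = Counter(keys)
    return [counts[key] >= 2 for key in keys]


def reconcile(rows):
    """Combine an Email pass and a Phone pass into one review list."""
    email_flags = flag_by(rows, "Email")
    phone_flags = flag_by(rows, "Phone")
    return [
        {**row, "dup_email": e, "dup_phone": p, "needs_review": e or p}
        for row, e, p in zip(rows, email_flags, phone_flags)
    ]
```

A row flagged by only the phone pass is the "same person, different email" case that a single email pass would miss.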
Cookbook
Before/after rows from real prospect exports. Emails and phones anonymised; the _is_duplicate column is exactly what the tool appends.
Case-different email captured by the default
Example: The same prospect signed up via two forms, once with an autocapitalised email and once in lowercase. Case-insensitive matching (default) treats them as one inbox and flags both rows.

Input (leads.csv):

    Email,First Name,Source
    Sue@Acme.com,Sue,Webinar
    jon@borex.io,Jon,Cold list
    sue@acme.com,Sue,Newsletter

Key column: Email · Case-sensitive: off (default)

Output (leads.duplicates-marked.csv):

    Email,First Name,Source,_is_duplicate
    Sue@Acme.com,Sue,Webinar,YES
    jon@borex.io,Jon,Cold list,NO
    sue@acme.com,Sue,Newsletter,YES
Plus-addressing is NOT treated as the same lead
Example: Matching is whole-cell exact, so me+webinar@x.com and me@x.com are different strings and are not flagged, even though they reach the same inbox. If you want them merged, strip the plus-tag first.

Input:

    Email,Campaign
    me+webinar@x.com,Spring
    me@x.com,Spring

Key column: Email · Case-sensitive: off

Output (not flagged, because the text differs):

    Email,Campaign,_is_duplicate
    me+webinar@x.com,Spring,NO
    me@x.com,Spring,NO

Fix: use csv-find-replace with pattern \+[^@]+ -> empty, then re-run on Email.
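To preview what the `\+[^@]+` pattern does before applying it to a whole column, the same regular expression can be tried in Python. This is a sketch of the substitution only; `strip_plus_tag` is a hypothetical helper, not a csv-find-replace feature.

```python
import re

# Same pattern as in the fix above: a literal "+" followed by everything up to "@".
PLUS_TAG = re.compile(r"\+[^@]+")


def strip_plus_tag(email):
    """Remove the first +tag from an address; other text is left untouched."""
    return PLUS_TAG.sub("", email, count=1)


print(strip_plus_tag("me+webinar@x.com"))  # me@x.com
print(strip_plus_tag("me@x.com"))          # me@x.com
```

After the substitution both addresses are the same string, so the email pass groups them.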
Trailing space hides a duplicate email
Example: Mobile autocomplete added a trailing space to one email. Because matching does not trim, the pair is not detected. Trim the column first so genuine duplicates surface.

Input (space after the first .com):

    Email,Name
    lead@x.com ,A
    lead@x.com,B

Key column: Email · Case-sensitive: off

Output (missed):

    Email,Name,_is_duplicate
    lead@x.com ,A,NO
    lead@x.com,B,NO

Fix: run csv-whitespace-trimmer first, then re-check.
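The effect of trimming can be checked on a couple of values before re-running the pass. A minimal sketch in Python; `trim_cell` is a hypothetical helper and makes no claim about how csv-whitespace-trimmer is implemented. Python's `str.strip()` removes leading and trailing Unicode whitespace, which includes the non-breaking space (U+00A0).

```python
def trim_cell(value):
    """Strip leading/trailing whitespace, including non-breaking spaces."""
    return value.strip()


cells = ["lead@x.com ", "lead@x.com", "lead@x.com\u00a0"]

# Before trimming there are three distinct strings; afterwards, one.
print(len(set(cells)))                        # 3
print(len({trim_cell(c) for c in cells}))     # 1
```

Whitespace inside the value is left alone, so this only closes the gap caused by stray characters at either end of the cell.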
Phone pass after the email pass
Example: Two leads have different emails but the same phone, so only the second (Phone) pass catches them. Normalise phone formatting first so spacing and country prefixes don't block the match.

Input (after normalising phone to bare digits):

    Email,Phone
    work@x.com,447700900111
    personal@y.com,447700900111

Key column: Phone · Case-sensitive: off

Output:

    Email,Phone,_is_duplicate
    work@x.com,447700900111,YES
    personal@y.com,447700900111,YES
Reading the list-health summary
Example: For a 900-row merged lead list, the summary tells you how much overlap the sources had before you decide what to import.

Summary after Find duplicates (Email pass):

    Duplicate groups : 64   (email addresses that repeat)
    Extra copies     : 71   (surplus rows to review/remove)
    Unique values    : 765  (emails appearing exactly once)

Meaning: 829 distinct emails across 900 rows; 64 addresses repeat, some more than twice. 71 rows are surplus copies.
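The three numbers are related: distinct values = unique values + duplicate groups (765 + 64 = 829), and total rows = distinct values + extra copies (829 + 71 = 900). A sketch of how such counts can be derived from a key column, assuming the definitions given in the summary labels; `summarise` and its dictionary keys are hypothetical names.

```python
from collections import Counter


def summarise(keys):
    """Count repeat groups, surplus copies, and once-only values."""
    counts = Counter(keys)
    return {
        # Values that occur 2+ times.
        "duplicate_groups": sum(1 for n in counts.values() if n >= 2),
        # Rows beyond the first in each repeating group.
        "extra_copies": sum(n - 1 for n in counts.values() if n >= 2),
        # Values that occur exactly once.
        "unique_values": sum(1 for n in counts.values() if n == 1),
    }


# One made-up distribution that reproduces the example: 765 singletons,
# 57 addresses appearing twice, and 7 addresses appearing three times.
keys = (
    [f"u{i}@x.com" for i in range(765)]
    + [f"d{i}@x.com" for i in range(57) for _ in range(2)]
    + [f"t{i}@x.com" for i in range(7) for _ in range(3)]
)
print(len(keys), summarise(keys))
```

The split between pairs and triples here is invented to fit the totals; the summary alone does not reveal how the 71 extra copies are spread across the 64 groups.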
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
You expected duplicate leads merged automatically
By design: This tool flags only. It appends _is_duplicate so you choose which record to keep (the more enriched one usually wins). To physically remove duplicate leads and keep one per group, use csv-deduplicator.
Plus-addressed or dotted Gmail variants
Not matched: me+a@x.com vs me@x.com, or j.smith@gmail.com vs jsmith@gmail.com, are different text and won't group, even though they hit the same inbox. Normalise with csv-find-replace (strip +tag; for Gmail, remove dots in the local part) before running.
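Dot removal should be limited to Gmail addresses, since dots in the local part are significant at most other providers. A sketch under that assumption; `undot_gmail` is a hypothetical helper, and it only treats the literal domain gmail.com as Gmail.

```python
def undot_gmail(email):
    """Remove dots from the local part, but only for gmail.com addresses."""
    local, sep, domain = email.rpartition("@")
    if not sep or domain.lower() != "gmail.com":
        return email  # not Gmail (or not an address): leave unchanged
    return local.replace(".", "") + "@" + domain


print(undot_gmail("j.smith@gmail.com"))  # jsmith@gmail.com
print(undot_gmail("j.smith@acme.com"))   # j.smith@acme.com
```

Applying the same replacement to every domain would merge genuinely different mailboxes, which is why the domain check comes first.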
Phone numbers in mixed formats
Not matched: +44 7700 900111 and 07700900111 are different strings. The phone pass only matches identical text, so normalise to a single format (strip spaces, +, and leading-zero/country-code differences) with csv-find-replace first.
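What "a single format" means depends on the list. The sketch below assumes every number without a country code belongs to one country (the UK, code 44, to match the example) and that a leading 0 is a national trunk prefix; `normalise_phone` and its `country_code` parameter are hypothetical. Forms such as "+44 (0)7700 900111" are not handled.

```python
import re


def normalise_phone(raw, country_code="44"):
    """Reduce a phone number to bare digits with a country code.

    Assumption: numbers starting with a single 0 are national numbers
    for `country_code`; numbers starting with 00 use the international
    dialling prefix.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, +, brackets
    if digits.startswith("00"):
        digits = digits[2:]                   # 0044... -> 44...
    elif digits.startswith("0"):
        digits = country_code + digits[1:]    # 07700... -> 447700...
    return digits


print(normalise_phone("+44 7700 900111"))  # 447700900111
print(normalise_phone("07700900111"))      # 447700900111
```

For a list that mixes countries, the default country code would have to come from another column rather than a single parameter.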
Trailing spaces on emails from form fills
Not matched: Whole-cell matching does not trim, so "lead@x.com " (with a trailing space) and "lead@x.com" look distinct. Run csv-whitespace-trimmer before the email pass to avoid missing real duplicates.
First occurrence marked YES too
Expected: All members of a duplicate group are flagged YES, including the first. This is so you can compare every record and keep the best one. For surplus-only removal, use csv-deduplicator.
Empty email cells
Grouped together: Rows with a blank email all share one empty key and get flagged YES together; the duplicate list shows (empty). Filter out blanks (or use a different identifier) before importing.
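Setting blank-key rows aside before the pass keeps them out of the duplicate groups. A small sketch, assuming rows are dictionaries and that "blank" means an exactly empty cell (a cell holding only spaces is a different key unless trimmed first); `split_blank_keys` is a hypothetical helper.

```python
def split_blank_keys(rows, key_column="Email"):
    """Separate rows with an empty key from rows that can be matched."""
    keyed, blank = [], []
    for row in rows:
        (blank if row[key_column] == "" else keyed).append(row)
    return keyed, blank


rows = [
    {"Email": "a@x.com", "Phone": "447700900111"},
    {"Email": "", "Phone": "447700900222"},
    {"Email": "", "Phone": "447700900333"},
]
keyed, blank = split_blank_keys(rows)
print(len(keyed), len(blank))  # 1 2
```

The blank-email rows can then be checked on another identifier, such as Phone, instead of being treated as duplicates of each other.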
Need to match on email AND phone in one pass
Single key only: The key is one column. Build a combined Email|Phone column with csv-column-merger and key on it for an exact composite match, or do two separate passes and reconcile.
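A composite key is simply the two cells joined with a separator that cannot appear in either value. The sketch below shows the idea on in-memory rows; `add_composite_key` and its parameters are hypothetical, and it does not describe how csv-column-merger builds the column.

```python
def add_composite_key(rows, left="Email", right="Phone", name="Email|Phone", sep="|"):
    """Return copies of the rows with a combined key column appended."""
    return [{**row, name: f"{row[left]}{sep}{row[right]}"} for row in rows]


rows = [
    {"Email": "work@x.com", "Phone": "447700900111"},
    {"Email": "work@x.com", "Phone": "447700900111"},
    {"Email": "work@x.com", "Phone": "447700900999"},
]
for row in add_composite_key(rows):
    print(row["Email|Phone"])
```

Keyed on the combined column, only the first two rows match. The third shares the email but not the phone, so a composite pass is stricter than either single-column pass.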
Lead list over the free 500-row / 2 MB cap
Upgrade required: Free runs cap at 2 MB and 500 rows; bigger lists are blocked with a Pro prompt. Pro raises it to 100 MB / 100,000 rows. Splitting with csv-row-splitter works for a one-off but won't catch duplicates across chunks.
Frequently asked questions
Should I check email and phone as separate passes?
Yes. The key is a single column, so run Email first, then run Phone on the same source to catch leads that have different emails but the same number. Reconcile the two marked files. For an exact composite match, combine the columns with csv-column-merger and key on that.
Does this work across multiple lists merged into one CSV?
Yes, and it's the recommended workflow. Append your lists with csv-merger first (they should share a header schema), then run the duplicate finder on the combined file so cross-source duplicates are caught in one pass.
Does it merge or delete the duplicate leads?
Neither. It appends an _is_duplicate column (YES/NO) and keeps every row so you decide which record to retain. To actually remove duplicates and keep one per group, use csv-deduplicator.
Will it catch plus-addressed emails like me+tag@x.com?
No. Matching is exact text, so me+tag@x.com and me@x.com are treated as different even though they share an inbox. Strip the plus-tag with csv-find-replace (pattern \+[^@]+) before the email pass if you want them merged.
Why didn't it flag two leads with the same email?
Usually an invisible difference: a trailing space, a non-breaking space, or different casing with case-sensitive matching accidentally on. Keep case-sensitive off for email and trim with csv-whitespace-trimmer first.
Is contact data uploaded?
No. All parsing and detection happen in your browser. Prospect names, emails, and phone numbers never reach a server. Only an anonymous usage counter is recorded when signed in, and it can be turned off in settings.
How do I make phone numbers match across formats?
Normalise to a single format before the phone pass — strip spaces, dashes, and the +, and reconcile country-code vs leading-zero forms — using csv-find-replace. Then +44 7700 900111 and 07700900111 reduce to the same digits and will match.
What does each summary number mean for my list?
Duplicate groups = how many email (or phone) values repeat. Extra copies = surplus rows you'd remove on a clean import. Unique values = leads appearing exactly once. Together they tell you how much overlap your sources had.
How large a lead list can I check?
Free runs handle up to 2 MB and 500 rows; larger files are blocked with a Pro prompt. Pro handles 100 MB and 100,000 rows. For lists beyond that, split with csv-row-splitter, accepting that cross-chunk duplicates won't be detected.
What's the output file?
Your original CSV plus a trailing _is_duplicate column, saved as <yourfile>.duplicates-marked.csv. Filter that column to YES in your spreadsheet to see only the duplicate leads.
Can I keep the most recently captured duplicate?
Sort the file by a capture-date column descending with csv-sorter before running, so the first occurrence within each group is the newest. The flag still marks all copies; you then keep whichever row your policy prefers.
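The sort-then-keep-first policy can be sketched as follows. This is the manual clean-up step, not something the duplicate finder does; the `Captured` column, its ISO date format, and the `newest_per_key` helper are assumptions for illustration.

```python
from datetime import date


def newest_per_key(rows, key_column="Email", date_column="Captured"):
    """Keep one row per key: the one with the latest capture date."""
    ordered = sorted(
        rows, key=lambda r: date.fromisoformat(r[date_column]), reverse=True
    )
    kept = {}
    for row in ordered:
        # After the descending sort, the first row seen per key is the newest.
        kept.setdefault(row[key_column].lower(), row)
    return list(kept.values())


rows = [
    {"Email": "sue@acme.com", "Captured": "2024-01-10", "Source": "Webinar"},
    {"Email": "Sue@Acme.com", "Captured": "2024-03-02", "Source": "Newsletter"},
    {"Email": "jon@borex.io", "Captured": "2024-02-15", "Source": "Cold list"},
]
for row in newest_per_key(rows):
    print(row["Email"], row["Captured"])
```

If "most enriched" matters more than "most recent", the sort key would be a completeness score instead of the date.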
Should I dedupe before or after CRM import?
Before. Most CRMs either reject or silently merge duplicate emails on import, which makes your row counts unpredictable. Flagging and cleaning the CSV first gives you a list whose count matches what lands in the CRM.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.