Dedupe Marketing Leads by Email

How to dedupe marketing leads by email

Step 1
Combine your lead sources into one CSV first: The deduplicator works on a single file. If your leads are spread across a webinar export, an ad-platform CSV, and a CRM pull, concatenate them first with csv-merger, listing them in priority order (most-trusted source first) so the kept row is the one you want.
Step 2
Drop the merged file onto the deduplicator above: Drag in a .csv, or an .xlsx/.xls/.ods (the tool reads the first sheet and converts it to CSV automatically). PapaParse auto-detects the delimiter, so comma- and semicolon-delimited exports both work.
Step 3
Choose your unique key column: Pick the column that uniquely identifies a lead from the Unique key column dropdown. For lead lists this is almost always Email. If emails are unreliable, dedupe on Phone or a CRM ID instead, but pick the single most reliable identifier, because the tool keys on one column only.
Step 4
Decide on case sensitivity: Leave Case-sensitive keys unchecked (the default) for email dedup. Sue@Acme.com and sue@acme.com are the same person and should collapse. Only check it when the key is genuinely case-meaningful (rare for leads; relevant for some case-sensitive external IDs).
Step 5
Run and read the five stat tiles: Click Remove duplicates. The result panel shows Rows in, Rows out, Duplicates (removed), Unique keys, and Empty keys. If Empty keys is high, you have leads with no email. Review them before importing, since they pass through dedup untouched.
Step 6
Download and import: Click Download CSV (or download back to .xlsx if you uploaded a spreadsheet). The output keeps the first row per key in original order. Import the deduplicated file into your CRM or ESP. Every non-empty key now appears exactly once, so the row count equals your unique-contact count plus any Empty keys rows.
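
The priority-ordered concatenation from Step 1 can be sketched outside the tool as follows. This is a minimal Python illustration, not csv-merger's actual implementation: `concat_csv_sources` is a hypothetical helper, and it assumes every source shares the same header row.

```python
import csv
import io

def concat_csv_sources(sources: list[str]) -> str:
    """Concatenate CSV texts in list order (hypothetical helper).

    Assumes all sources share one header; the first source's rows come first,
    so they win a later first-occurrence dedupe.
    """
    out = io.StringIO()
    writer = None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # Take the header from the first (most-trusted) source.
            writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
            writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return out.getvalue()

crm = "email,source\ntom@beta.com,CRM\n"
cold = "email,source\ntom@beta.com,Cold\nava@beta.com,Cold\n"
merged = concat_csv_sources([crm, cold])  # CRM listed first, so its Tom row sits on top
print(merged)
```

Listing the CRM export first is what later lets its row survive, because the dedupe step keeps the first occurrence of each key.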

What the two options actually do

The deduplicator has exactly two controls. Everything else is automatic. There is no multi-column key, no 'keep last', and no fuzzy matching.

Control	Effect	Default	When to change it
Unique key column (dropdown)	Rows sharing this column's value (after trim + optional lowercase) are duplicates; only the first is kept	First column (index 0)	Set it to `Email` — or `Phone`/`CRM ID` if emails are unreliable
Case-sensitive keys (checkbox)	Off = `Sue@x.com` matches `sue@x.com`. On = only byte-exact values match	Off (case-insensitive)	Leave off for emails; turn on only for case-meaningful external IDs
Whitespace handling	Always trimmed for the comparison key (`a@x.com ` with a trailing space matches `a@x.com`); the output cell keeps its original text	Always on (not configurable)	n/a
Blank-key rows	Never deduped — every row with an empty key value passes through untouched and is counted as an Empty key	Always preserved	n/a
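
The matching rules in the table above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the tool's source code: `dedupe_rows` and the stat names are hypothetical, and it assumes Unique keys counts distinct non-empty keys.

```python
def dedupe_rows(rows, key, case_sensitive=False):
    """Keep the first row per key; pass blank-key rows through untouched."""
    seen = set()
    kept = []
    stats = {"rows_in": len(rows), "duplicates": 0, "empty_keys": 0}
    for row in rows:
        k = (row.get(key) or "").strip()  # trimmed for comparison only
        if not case_sensitive:
            k = k.lower()
        if not k:
            stats["empty_keys"] += 1      # blank keys are never deduped
            kept.append(row)
        elif k in seen:
            stats["duplicates"] += 1      # later occurrence is dropped
        else:
            seen.add(k)
            kept.append(row)              # original cell text is preserved
    stats["rows_out"] = len(kept)
    stats["unique_keys"] = len(seen)      # assumption: non-empty keys only
    return kept, stats

rows = [
    {"email": "Sue@Acme.com", "source": "Webinar"},
    {"email": "bob@globex.io", "source": "LinkedIn"},
    {"email": "sue@acme.com ", "source": "WebForm"},
]
kept, stats = dedupe_rows(rows, "email")
print([r["source"] for r in kept], stats)
```

With `case_sensitive=True` the same input keeps all three rows, because `Sue@Acme.com` and `sue@acme.com` no longer compare equal.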

Lead source quirks and the right key column

Typical multi-source lead lists and which column makes the most reliable dedup key.

Lead source	Common duplicate cause	Best key column	Notes
Facebook / Meta Lead Ads CSV	Same person fills two different lead forms	`email`	Meta lowercases emails on capture, but other sources won't — keep case-insensitive on
Webinar platform (Zoom, GoTo)	Registered + attended exported separately	`Email`	Casing often differs from the original signup; default case-insensitive matching handles it
Gated-content / form fills	Repeat downloads by the same prospect	`Email`	Trailing spaces from autofill are ignored in matching
CRM export (HubSpot/Salesforce)	Re-imported list created near-duplicates	`Record ID` / `CRM ID`	If you have a stable internal ID, it's a cleaner key than email
Cold-outreach purchased list	Overlap with your existing CRM	`Email`	Concatenate CRM first so your owned record wins (first occurrence kept)

Cookbook

Real before/after rows from multi-source lead lists. Emails and names anonymised. The deduplicator keeps the first row per key value.

Case-different email duplicate across two ad sources

Example

A prospect registered for a webinar (which preserved their capitalised email) and later filled in a web form (lowercase). To your CRM these look like two contacts. With case-insensitive matching (the default) they collapse to one, and the first row in the file wins.

Input (webinar list concatenated above the ad list):
email,first_name,source
Sue@Acme.com,Sue,Webinar
bob@globex.io,Bob,LinkedIn
sue@acme.com,Sue,WebForm

Key column: email   ·   Case-sensitive keys: OFF

Output (first occurrence kept):
email,first_name,source
Sue@Acme.com,Sue,Webinar
bob@globex.io,Bob,LinkedIn

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Unique keys 2 · Empty keys 0

Trailing space from spreadsheet copy-paste

Example

An ops teammate pasted emails from another sheet, dragging in a trailing space on some cells. Visually identical, byte-different. The deduplicator trims the key before comparing, so the padded copy is recognised as a duplicate.

Input (note the trailing space on row 1):
email,campaign
jan@umbrella.co ,Q2 Nurture
jan@umbrella.co,Q2 Nurture

Key column: email   ·   Case-sensitive keys: OFF

Output:
email,campaign
jan@umbrella.co ,Q2 Nurture

Note: the surviving cell keeps its original trailing space —
trim only affects the matching key, not the stored value.
To clean the visible value too, run csv-whitespace-trimmer after.

Dedupe on phone when emails are missing

Example

An event-booth list captured phone numbers but inconsistent emails. Switch the key column to phone. Rows with the same phone collapse; rows missing a phone are kept as Empty keys for follow-up.

Input:
name,email,phone
Lee,,555-0100
Lee,lee@x.com,555-0100
Pat,pat@y.com,

Key column: phone   ·   Case-sensitive keys: OFF

Output (first row per phone kept; blank-phone row preserved):
name,email,phone
Lee,,555-0100
Pat,pat@y.com,

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Empty keys 1

Source priority: keep the CRM row, drop the cold-list copy

Example

You want your owned CRM record (with lifecycle stage and owner) to survive over a purchased cold-list row. Because the first occurrence wins, concatenate the CRM file ABOVE the cold list before deduping.

Input (CRM rows first, cold list appended below):
email,owner,stage
tom@beta.com,Dana,SQL
ava@beta.com,Raj,MQL
tom@beta.com,,Cold

Key column: email

Output (CRM row for Tom kept, cold copy dropped):
email,owner,stage
tom@beta.com,Dana,SQL
ava@beta.com,Raj,MQL

Same person, different domains — NOT a duplicate

Example

A lead used a work email and a personal email. These are different key values, so the deduplicator correctly keeps both. The tool dedupes on exact key equality, not identity-resolution heuristics.

Input:
email,name
maya@workco.com,Maya
maya.l@gmail.com,Maya

Key column: email

Output (both kept — different keys):
email,name
maya@workco.com,Maya
maya.l@gmail.com,Maya

For cross-field identity matching, dedupe separately on each
candidate key, or pre-normalise with csv-find-replace first.

Errors and edge cases

Limits and by-design behaviours you are likely to hit with lead lists. Find the entry that matches your symptom and apply the fix it describes.

Lead list exceeds the free 500-row limit

Pro required

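The CSV Deduplicator is a Pro tool: the free tier caps input at 500 rows / 2 MB, and a real merged lead list is usually far bigger. Pro raises the cap to 100,000 rows / 100 MB. For lists beyond 100,000 rows, split with csv-row-splitter, dedupe each chunk, then concatenate the deduplicated chunks and dedupe once more to catch duplicates that span chunks. The final pass only fits if the concatenated result is back under the 100,000-row cap.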
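
Why the split, dedupe, concatenate, dedupe-again route gives the same answer as one big pass can be checked with a small sketch. The helper names are hypothetical and the chunk size of 2 only stands in for the real row cap.

```python
def dedupe(rows, key):
    """First occurrence wins; keys are trimmed and lowercased; blanks pass through."""
    seen, kept = set(), []
    for row in rows:
        k = row[key].strip().lower()
        if k and k in seen:
            continue
        if k:
            seen.add(k)
        kept.append(row)
    return kept

def dedupe_in_chunks(rows, key, chunk_size):
    """Dedupe each chunk, concatenate in order, then run one final pass."""
    partial = []
    for start in range(0, len(rows), chunk_size):
        partial.extend(dedupe(rows[start:start + chunk_size], key))
    return dedupe(partial, key)  # catches duplicates that span chunks

emails = ["a@x.com", "b@x.com", "A@x.com", "c@x.com", "b@x.com", ""]
rows = [{"email": e} for e in emails]
chunked = dedupe_in_chunks(rows, "email", chunk_size=2)
print([r["email"] for r in chunked])
```

Because chunks are concatenated in their original order, the first occurrence overall is still the first one the final pass sees.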

Same person under two different email addresses

By design

The tool matches on exact key equality (after trim + optional lowercase). maya@workco.com and maya.l@gmail.com are different keys, so both are kept. There is no fuzzy or cross-field identity resolution. If you need to collapse a person across multiple emails, normalise to a single canonical address first with csv-find-replace.

Gmail dot/plus aliases treated as distinct

By design

s.ue@gmail.com, sue@gmail.com, and sue+ads@gmail.com all deliver to the same Gmail inbox, but they are different strings — the deduplicator keeps all three. Gmail-style normalisation (stripping dots and +tags) is not built in. Pre-normalise with csv-find-replace (regex \+[^@]* → empty) if your audience is Gmail-heavy.
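
As an illustration of the pre-normalisation step, the sketch below builds a canonical matching key for Gmail addresses. It is an assumption-laden example, not a feature of the tool: `normalise_email` is a hypothetical helper, and unlike the regex above it strips +tags only on Gmail domains so that other providers' addresses are left alone.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}

def normalise_email(email: str) -> str:
    """Return a lowercase matching key; collapse Gmail dot and +tag aliases."""
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:
        return cleaned                      # not an address; leave as-is
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]      # drop the +tag
        local = local.replace(".", "")      # Gmail ignores dots in the local part
    return f"{local}@{domain}"

for raw in ["s.ue@gmail.com", "sue@gmail.com", "Sue+ads@Gmail.com", "first.last+news@workco.com"]:
    print(raw, "->", normalise_email(raw))
```

Writing the result into a separate helper column and deduping on that column keeps the original address intact for sending.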

You wanted to keep the LAST duplicate, not the first

First-row only

The deduplicator always keeps the FIRST occurrence of each key. There is no 'keep last' option. To make the most recent record win, sort the file so the row you want to keep comes first — use csv-sorter on a date column descending — then dedupe.
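
The sort-then-dedupe workaround can be sketched like this. The `updated_at` column and the `keep_latest` helper are hypothetical, and the dates are assumed to be ISO formatted (YYYY-MM-DD).

```python
from datetime import date

def keep_latest(rows, key, date_field):
    """Sort newest first, then keep the first occurrence of each key."""
    newest_first = sorted(
        rows, key=lambda r: date.fromisoformat(r[date_field]), reverse=True
    )
    seen, kept = set(), []
    for row in newest_first:
        k = row[key].strip().lower()
        if k and k in seen:
            continue  # an older copy of a key that was already kept
        if k:
            seen.add(k)
        kept.append(row)
    return kept

rows = [
    {"email": "sue@acme.com", "stage": "MQL", "updated_at": "2024-01-05"},
    {"email": "bob@globex.io", "stage": "Lead", "updated_at": "2024-02-01"},
    {"email": "Sue@Acme.com", "stage": "SQL", "updated_at": "2024-03-10"},
]
latest = keep_latest(rows, "email", "updated_at")
print([(r["email"], r["stage"]) for r in latest])
```

Note that the output is in newest-first order, not the original file order.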

Need to dedupe on email AND campaign together

Single key only

The key is one column. There is no composite/multi-column key. To dedupe on a combination, first merge the columns into one with csv-column-merger (e.g. email|campaign), dedupe on that merged column, then split it back with csv-column-value-splitter if needed.
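
The merge, dedupe, drop sequence for a composite key can be sketched as below. The `_key` column name and both helpers are hypothetical stand-ins for csv-column-merger and the deduplicator.

```python
def with_composite_key(rows, columns, name="_key", sep="|"):
    """Add one merged column built from the trimmed values of several columns."""
    return [{**row, name: sep.join(row[c].strip() for c in columns)} for row in rows]

def dedupe(rows, key):
    """First occurrence wins; keys are trimmed and lowercased; blanks pass through."""
    seen, kept = set(), []
    for row in rows:
        k = row[key].strip().lower()
        if k and k in seen:
            continue
        if k:
            seen.add(k)
        kept.append(row)
    return kept

rows = [
    {"email": "a@x.com", "campaign": "Q1"},
    {"email": "A@x.com", "campaign": "Q1"},
    {"email": "a@x.com", "campaign": "Q2"},
]
keyed = with_composite_key(rows, ["email", "campaign"])
kept = [{c: v for c, v in r.items() if c != "_key"} for r in dedupe(keyed, "_key")]
print(kept)
```

Pick a separator that cannot occur in the data. Also note that a row whose email and campaign are both blank gets the key `|`, which is not empty, so such rows would be deduped against each other.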

Rows with a blank email all kept

Preserved

Any row whose key cell is empty (after trimming) is never treated as a duplicate — it passes straight through and is counted under Empty keys. This is intentional so you don't lose leads that are merely missing an email. Filter them out first with csv-column-filter (email is_not_empty) if you only want keyed leads.

Just wanting to SEE the duplicates, not remove them

Use the finder

The deduplicator removes duplicate rows and gives counts, but it doesn't list each duplicate group. To audit which leads are duplicated and how many times — before deciding to delete — use csv-duplicate-finder, which marks every row YES/NO and groups the matches.
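
To make the difference from removal concrete, here is a sketch of flagging instead of dropping. It is not csv-duplicate-finder's implementation: `flag_duplicates` is hypothetical, and it assumes every row in a group of two or more (including the first occurrence) is marked YES, while blank keys are marked NO.

```python
from collections import Counter

def flag_duplicates(rows, key):
    """Add an _is_duplicate column without removing any row."""
    norm = [row[key].strip().lower() for row in rows]
    counts = Counter(k for k in norm if k)  # blank keys are not counted
    return [
        {**row, "_is_duplicate": "YES" if counts[k] > 1 else "NO"}
        for row, k in zip(rows, norm)
    ]

rows = [{"email": e} for e in ["Sue@Acme.com", "bob@globex.io", "sue@acme.com", ""]]
flagged = flag_duplicates(rows, "email")
print([(r["email"], r["_is_duplicate"]) for r in flagged])
```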

Semicolon-delimited EU export

Supported

PapaParse auto-detects the delimiter, so a European-locale lead export using ; is parsed correctly without any setting. The output is written with standard comma delimiting. If a downstream system needs semicolons back, convert the delimiter afterwards with a CSV-aware tool; a plain find/replace on commas also rewrites commas inside quoted values such as "Acme, Inc.".
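
If semicolons are needed downstream, converting through a CSV parser is safer than a text replace, since a comma inside a value must not be touched. A minimal sketch using Python's standard `csv` module follows; `convert_delimiter` is a hypothetical helper.

```python
import csv
import io

def convert_delimiter(text: str, src: str = ",", dst: str = ";") -> str:
    """Re-write CSV text with another delimiter, leaving cell values intact."""
    rows = csv.reader(io.StringIO(text), delimiter=src)
    out = io.StringIO()
    csv.writer(out, delimiter=dst, lineterminator="\n").writerows(rows)
    return out.getvalue()

comma_csv = 'email,company\nsue@acme.com,"Acme, Inc."\n'
converted = convert_delimiter(comma_csv)
naive = comma_csv.replace(",", ";")  # also rewrites the comma inside the company name
print(converted)
print(naive)
```

The parsed route keeps `Acme, Inc.` as one value, while the naive replace turns it into `Acme; Inc.`.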

Quotes get stripped from already-quoted fields

Expected

Output is written with quotes only where they are required. A field that contains a comma ("Acme, Inc.") stays quoted, while a field that was quoted defensively but did not need it loses its quotes. Cell values are unchanged; only the surrounding quoting is normalised.
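
The same quote-only-when-needed behaviour can be reproduced with Python's `csv` module, which is used here purely as an analogy for what the tool's writer does; `rewrite_with_minimal_quoting` is a hypothetical helper.

```python
import csv
import io

def rewrite_with_minimal_quoting(text: str) -> str:
    """Parse CSV text and write it back, quoting only fields that need it."""
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerows(rows)
    return out.getvalue()

source = '"email","company"\n"sue@acme.com","Acme, Inc."\n'
result = rewrite_with_minimal_quoting(source)
print(result)
```

Parsing both versions yields identical cell values, which is why the change in quoting is harmless for an import.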

Header row counted as data

Header-aware

The first row is always treated as the header and is never deduped against the body. If your file has a leading metadata/banner row before the real header, remove it first (e.g. with a spreadsheet or csv-row-limiter with an offset) so the true header lands on row 1.

Frequently asked questions

Does the deduplicator match emails case-insensitively by default?

Yes. The default has Case-sensitive keys unchecked, so the comparison key is the cell value trimmed and lowercased. Sue@Acme.com, sue@acme.com, and SUE@acme.com all match and collapse to one lead. Only check the box if you genuinely need byte-exact matching.

Which row survives when there are duplicates?

The first occurrence of each key value, in file order. Later duplicates are dropped. To control which row wins, order your sources before deduping (most-trusted first) or sort by a recency column with csv-sorter.

Can I dedupe on email and phone at the same time?

Not directly — the key is a single column. Run it once on email, then on the result run it again on phone if you want both passes. For a true composite key, merge the two columns first with csv-column-merger, dedupe on the merged column, then split it back.
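
The two approaches give different results, which a small sketch makes visible: sequential passes drop a row that matches on either column, while a composite key drops a row only when both columns match. The `dedupe` helper and the `_key` column are hypothetical.

```python
def dedupe(rows, key):
    """First occurrence wins; keys are trimmed and lowercased; blanks pass through."""
    seen, kept = set(), []
    for row in rows:
        k = row[key].strip().lower()
        if k and k in seen:
            continue
        if k:
            seen.add(k)
        kept.append(row)
    return kept

rows = [
    {"email": "lee@x.com", "phone": "555-0100"},
    {"email": "lee@work.com", "phone": "555-0100"},
    {"email": "Lee@x.com", "phone": "555-0199"},
]

# Pass 1 on email drops row 3; pass 2 on phone then drops row 2.
two_passes = dedupe(dedupe(rows, "email"), "phone")

# Composite key: no two rows share both email and phone, so all three stay.
keyed = [{**r, "_key": r["email"] + "|" + r["phone"]} for r in rows]
composite = dedupe(keyed, "_key")

print(len(two_passes), len(composite))
```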

What happens to leads with no email address?

They're kept. Any row with a blank key value is excluded from dedup entirely and counted under Empty keys in the result panel. This avoids silently deleting leads that are just missing an email. Filter empties out beforehand with csv-column-filter if you only want keyed records.

Will this catch Gmail dot and plus-alias duplicates?

No. sue@gmail.com, s.ue@gmail.com, and sue+ads@gmail.com are different strings and all survive. Gmail address normalisation isn't built in. If your list is Gmail-heavy, strip the +tag and dots first with csv-find-replace using a regex pattern, then dedupe.

Does my lead data get uploaded anywhere?

No. Parsing and deduplication run entirely in your browser via PapaParse. Lead emails, phones, and names never reach a server. The only server-side write is an anonymous usage counter for signed-in dashboard stats — no row content. This is important for GDPR/CCPA-covered prospect data.

What file formats can I upload?

CSV directly, plus Excel .xlsx/.xls and OpenDocument .ods. For spreadsheets the tool reads the first sheet, converts it to CSV, dedupes, and can download the result back as .xlsx. The delimiter is auto-detected, so comma and semicolon files both work.

How big a lead list can it handle?

The deduplicator is a Pro tool, and the free tier caps input at 500 rows / 2 MB. Pro raises the limit to 100,000 rows / 100 MB. For larger lists, split with csv-row-splitter, dedupe each part, then concatenate and run one final dedupe pass.

Can I just preview which leads are duplicated before deleting?

Use csv-duplicate-finder for that. It adds an _is_duplicate YES/NO column and groups matches so you can review them. The deduplicator is for actually removing the extra rows once you've decided.

Does it remove duplicate columns or duplicate headers?

No — it operates on rows only, keyed by one column. Duplicate columns or repeated header names aren't touched. For removing columns use csv-column-remover; for renaming clashing headers use csv-header-rename.

Why does my output have fewer rows than 'Rows in' minus 'Duplicates'?

It shouldn't — Rows out always equals Rows in minus Duplicates removed (empty-key rows are kept and don't count as duplicates). If the numbers look off, check that you picked the intended key column and that your header row is on line 1, not buried under a banner row.

Can I automate this in a pipeline?

The dedup logic is the same browser-local engine used in the tool. For a repeatable workflow, pair it with csv-merger (combine sources) upstream and your CRM import downstream. There is no required server round-trip — your lead PII stays on your machine throughout.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

