How to remove near-duplicate contacts from Excel before an email marketing import
- Step 1: Export your contact list. Download the subscriber list from your platform (or your source spreadsheet) as .xlsx or .csv, with both the email and name columns present. Fuzzy Dedup reads the first sheet only.
- Step 2: Run exact email dedup first. Before fuzzy matching, dedup identical emails with the exact csv-deduplicator on the email column. This removes the unambiguous duplicates cheaply and shrinks the list before the fuzzy pass.
- Step 3: Open Fuzzy Dedup and set the name Key column. Drop the email-deduped file onto this tool and type the name column's exact header into the Key column field (free text), e.g. full_name or Name.
- Step 4: Choose a threshold for names. The default is 85; for personal names, 88–95 is safer to avoid merging different people. Enter a value from 50 to 100. Lower values catch more variants but raise false positives on short names.
- Step 5: Process and review the merged-contact report. The panel shows {removedCount} removed · {keptCount} kept and previews up to 5 merges (50 in the downloadable report). Scan for false merges (two different subscribers with similar names) before trusting the cut.
- Step 6: Download and import. Download deduped-fuzzy.xlsx (sheet Deduped, kept contacts with all columns). Import to Mailchimp/Klaviyo/HubSpot. If real subscribers merged, raise the threshold and re-run on the original.
Two-pass dedup: email then name
Exact email dedup and fuzzy name dedup catch different duplicates. Run both for the cleanest list.
| Pass | Tool | Column | Catches |
|---|---|---|---|
| 1 (exact) | csv-deduplicator | email | Identical emails (a@x.com twice) |
| 2 (fuzzy) | Fuzzy Dedup (this tool) | name | Same person, different emails (Rob/Robert Johnson) |
| Optional (composite) | Fuzzy Dedup on a name+email key | concatenated | Similar name AND same email (safer) |
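The first pass can be pictured as a keep-first filter on the email column. The sketch below is only an illustration of that idea, not the csv-deduplicator's actual code: the helper name `dedup_exact_email` is hypothetical, and trimming and lowercasing emails before comparing is an assumption made for the example.

```python
import csv
import io


def dedup_exact_email(rows, column="email"):
    """Keep the first row for each distinct email (hypothetical helper)."""
    seen = set()
    kept = []
    for row in rows:
        # Assumption: emails are compared after trimming and lowercasing.
        key = row[column].strip().lower()
        if key in seen:
            continue  # this email was already kept, so drop the row
        seen.add(key)
        kept.append(row)
    return kept


sample = io.StringIO(
    "full_name,email\n"
    "Rob Johnson,a@x.com\n"
    "Robert Johnson,a@x.com\n"
    "Sara Cohen,sara@a.com\n"
)
rows = list(csv.DictReader(sample))
kept = dedup_exact_email(rows)
print([r["email"] for r in kept])
```

Only rows whose email differs survive this pass, which is why same-person, different-email duplicates still need the fuzzy name pass.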
Email-list name patterns and scores
Normalized Levenshtein similarity (case-/whitespace-insensitive) for common subscriber-name duplicates.
| Pattern | Example | Approx. similarity | Removed at 88%? |
|---|---|---|---|
| Trailing space / case | Rob Johnson / rob johnson | 100% | Yes |
| Nickname (long form) | Rob Johnson / Robert Johnson | ~73% | No (needs ~70%) |
| Nickname (short diff) | Mike Lee / Michael Lee | ~67% | No |
| Single typo | Jennifer / Jenifer | ~89% | Yes |
| Different people, similar name | Sara Cohen / Sarah Cohen | ~91% | Yes (possible false merge) |
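To make the scoring concrete, here is a minimal sketch of a normalized Levenshtein similarity. It assumes the common normalization of dividing the edit distance by the longer string's length; the tool's exact formula is not documented here, so the function names and that normalization are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after trimming and lowercasing both values."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two blank values count as identical
    # Assumption: normalize by the longer string's length.
    return 1.0 - levenshtein(a, b) / longest


print(round(similarity("Rob Johnson", "rob johnson "), 3))  # 1.0
print(round(similarity("Sara Cohen", "Sarah Cohen"), 3))    # 0.909
```

Under this normalization Sara Cohen / Sarah Cohen comes out at about 91%, in line with the table. Other rows can land a few points away from the approximate figures if the tool normalizes differently, so treat the tool's own report as the reference.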
Tier and behavior
Fuzzy Dedup is Pro-gated and dedups one column only.
| Aspect | Behavior |
|---|---|
| Tier required | Pro minimum (Free blocked) |
| Pro capacity | 50 MB · 100,000 rows · 5 files |
| Key columns | One only (concatenate for name+email) |
| Survivor | First contact of each cluster (file order) |
| Email awareness | None — scores the chosen column's string only |
| Output | deduped-fuzzy.xlsx, sheet Deduped, all columns kept |
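The survivor rule in the table can be sketched as a single pass that keeps the first row of each cluster. This is an illustration under assumptions, not the tool's implementation: `fuzzy_dedup` and `stand_in_score` are hypothetical names, each row is compared only against rows already kept, and the scorer is Python's `difflib` ratio standing in for the tool's Levenshtein metric, so borderline scores will differ from the tool's.

```python
from difflib import SequenceMatcher


def stand_in_score(a: str, b: str) -> float:
    """Stand-in scorer (difflib ratio), NOT the tool's Levenshtein metric."""
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_dedup(rows, key, threshold, score=stand_in_score):
    """Return (kept, report); the first row of each cluster survives."""
    kept, report = [], []
    # Row numbers are 1-based and count the header, so data starts at row 2.
    for number, row in enumerate(rows, start=2):
        match = None
        for survivor in kept:  # assumption: compare against survivors only
            pct = score(row[key], survivor[key]) * 100
            if pct >= threshold:
                match = (number, row[key], survivor[key], round(pct))
                break
        if match:
            report.append(match)  # the later row is removed
        else:
            kept.append(row)      # first occurrence wins
    return kept, report


rows = [
    {"full_name": "Rob Johnson", "email": "rob@x.com"},
    {"full_name": "rob johnson ", "email": "rob@x.com"},
    {"full_name": "Ann Lee", "email": "ann@y.com"},
]
kept, report = fuzzy_dedup(rows, "full_name", 95)
print([r["full_name"] for r in kept])
print(report)
```

Because nothing from a removed row is copied into its survivor, file order decides which email and which extra columns remain.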
Cookbook
Real email-list duplicate patterns, the threshold and pass that catch each, and the report. Report row numbers are 1-based including the header row.
Same person, two emails (the fuzzy-name win)
Exact email dedup can't catch this — the emails differ. Fuzzy on the name surfaces Rob Johnson and Robert Johnson as likely the same subscriber. Note the ~73% score means you need a threshold around 70%, lower than the default.
After exact email dedup, remaining rows:

    full_name,email
    Rob Johnson,rob@work.com
    Robert Johnson,robert@home.com

Fuzzy Dedup on full_name, threshold 70

Report:

    1 near-duplicate row(s) removed · 1 rows kept.
    Row 3 "Robert Johnson" ≈ "Rob Johnson" (73%)

Output keeps the FIRST row (Rob Johnson, rob@work.com). The second email is dropped, so review the report first.
Signup-form casing/whitespace (always collapses)
Double signups often produce the same name with different casing or a trailing space. Trim+lowercase make these score 100%, so they collapse at any threshold — a safe, automatic win.
Input (column: full_name)

    full_name,email
    Rob Johnson,rob@x.com
    rob johnson ,rob@x.com

threshold: 95

Report:

    1 near-duplicate row(s) removed · 1 rows kept.
    Row 3 "rob johnson " ≈ "Rob Johnson" (100%)

Output: one Rob Johnson row. (Exact email dedup would also catch this since the email is identical; fuzzy is a backstop.)
False-merge danger: Sara vs Sarah Cohen
Sara Cohen and Sarah Cohen differ by one letter (~91%) and would merge at 88% — but they could be two different people. For email lists, prefer a name+email composite key so only same-email near-names collapse.
Input (column: full_name)

    full_name,email
    Sara Cohen,sara@a.com
    Sarah Cohen,sarah@b.com

Fuzzy on full_name, threshold 88

Report:

    Row 3 "Sarah Cohen" ≈ "Sara Cohen" (91%) removed
    -> deletes a possibly-real subscriber (different email!)

Safer: build a name|email key and dedup on that (see next).
Composite name+email key (safer for marketing)
To require a similar name AND the same email before merging, concatenate them into one column first. Then two different Cohens with different emails stay separate.
Step 1: add a key column

    full_name,email,key
    Rob Johnson,rob@x.com,Rob Johnson|rob@x.com
    robert johnson,rob@x.com,robert johnson|rob@x.com
    Sara Cohen,sara@a.com,Sara Cohen|sara@a.com
    Sarah Cohen,sarah@b.com,Sarah Cohen|sarah@b.com

Step 2: Fuzzy Dedup on key, threshold 85

    Rows 1&2 (same email) -> high score, collapse.
    Rows 3&4 (different email) -> lower score, BOTH kept.
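If the list is large, the key column is easier to build with a script than by hand. The following is a small sketch that assumes a CSV with full_name and email headers; the helper name `add_composite_key` and the `|` separator mirror the example above but are otherwise arbitrary choices.

```python
import csv
import io


def add_composite_key(rows, name_col="full_name", email_col="email", sep="|"):
    """Return new rows with a 'key' column of name + sep + email."""
    return [
        {**row, "key": f"{row[name_col]}{sep}{row[email_col]}"}
        for row in rows
    ]


source = io.StringIO(
    "full_name,email\n"
    "Rob Johnson,rob@x.com\n"
    "Sara Cohen,sara@a.com\n"
)
rows = add_composite_key(csv.DictReader(source))

# Write the result back out as CSV, ready to load into Fuzzy Dedup.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["full_name", "email", "key"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Then type key into the Key column field. How far a different email pulls the score down depends on how much of the key it makes up, so short or similar-looking addresses still deserve a look in the report.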
Keep the subscriber record you want
First-occurrence-wins decides who survives. To keep the contact with the engaged email or richer profile, sort that row to the top before processing.
Before (stale record first):

    full_name,email,last_open
    Robert Johnson,old@x.com,2023-02-01
    Rob Johnson,rob@x.com,2026-05-20

Sort by last_open DESC, then Fuzzy Dedup (threshold 70):

    full_name,email,last_open
    Rob Johnson,rob@x.com,2026-05-20     <- kept (engaged)
    Robert Johnson,old@x.com,2023-02-01  <- removed

No "keep most engaged" option exists; sorting is the lever.
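The pre-sort can be done in the spreadsheet itself, or with a few lines of script. This sketch assumes last_open holds ISO dates (YYYY-MM-DD) and treats a blank date as the oldest possible; the helper name `sort_most_recent_first` is hypothetical.

```python
from datetime import date


def sort_most_recent_first(rows, column="last_open"):
    """Sort rows so the most recently engaged contact comes first."""
    def opened(row):
        value = row[column].strip()
        # Assumption: blank dates sort last, as if never opened.
        return date.fromisoformat(value) if value else date.min

    # sorted() is stable, so rows with equal dates keep their file order.
    return sorted(rows, key=opened, reverse=True)


rows = [
    {"full_name": "Robert Johnson", "email": "old@x.com", "last_open": "2023-02-01"},
    {"full_name": "Rob Johnson", "email": "rob@x.com", "last_open": "2026-05-20"},
    {"full_name": "Ann Lee", "email": "ann@y.com", "last_open": ""},
]
ordered = sort_most_recent_first(rows)
print([r["full_name"] for r in ordered])
```

With the engaged record on top, first-occurrence-wins keeps it and removes the stale one.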
Edge cases and what actually happens
Fuzzy name dedup ignores the email
False merge: Fuzzy Dedup scores only the column you name. Run on the name column, it has no idea the emails differ, so two different people with similar names (Sara/Sarah Cohen) can collapse and you'd lose a real subscriber. Use a name+email composite key, or review the report, before importing.
Long-form nicknames score below the default
Missed duplicates: Rob/Robert (~73%) and Mike/Michael (~67%) score below the 85% default, so they survive. Lower the threshold to ~65–70% to catch them, but that raises the false-merge rate on short names, so review carefully.
Free tier marketer
Pro required: The processor throws "Fuzzy Deduplicator requires Pro tier." for Free accounts. Email lists also often exceed Free's 10,000-row Excel cap. Pro gives 100,000 rows / 50 MB / 5 files; for the exact email pass, the csv-deduplicator is available on lower tiers.
Name column header typed wrong
Empty matches: The Key column is free text. If it doesn't match a header, every name reads empty, all blanks score 100%, and the whole list collapses to one contact. Copy the header verbatim and confirm the kept count looks right before importing.
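A quick pre-flight check on the exported CSV can catch a mistyped header before the list collapses. The sketch below is a standalone helper, not part of the tool; `check_key_column` is a hypothetical name and the check assumes the header must match exactly, as described above.

```python
import csv
import io


def check_key_column(csv_text: str, key_column: str) -> None:
    """Raise ValueError if key_column is not an exact header in the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    if key_column in headers:
        return
    # Suggest a header that differs only by case or surrounding spaces.
    wanted = key_column.strip().lower()
    near = [h for h in headers if h.strip().lower() == wanted]
    hint = f" Did you mean {near[0]!r}?" if near else ""
    raise ValueError(f"No column named {key_column!r}; found {headers}.{hint}")


data = "full_name,email\nRob Johnson,rob@x.com\n"

check_key_column(data, "full_name")  # exact match, passes silently
try:
    check_key_column(data, "Full_Name")  # wrong case, rejected
except ValueError as exc:
    print(exc)
```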
Wanting to dedup on email only
Wrong tool: For identical-email duplicates, fuzzy matching is overkill and risks false positives (a@x.com vs a@x.con). Use the exact csv-deduplicator on the email column. Reserve Fuzzy Dedup for the name column to catch the same person across different emails.
Survivor has the wrong email
Order-dependent: First-occurrence-wins keeps whichever row is first, which may carry a stale or unengaged email. Sort by last-open or signup date before processing so the preferred record survives; there's no "keep most engaged" setting.
Merged contact loses the second email/tags
By design: Only the first contact's row is kept; the duplicate's unique data (second email, extra tags, a phone) is not merged in and exists only in the report. If you need to combine subscriber attributes, reconcile from the report or use your platform's merge feature.
Platform also dedups on import
Expected: Mailchimp and others merge exact-email duplicates silently on import, so your row count may not match their contact count even after this tool. Do the email exact-dedup first so the numbers are predictable; fuzzy name dedup catches what the platform's email-only merge won't.
Multi-sheet export
First sheet only: Fuzzy Dedup reads only the first sheet. If your export has a summary tab, move the subscriber rows to the first sheet or export them alone before deduplicating.
Reconciling two lists before a migration
Wrong tool: To match subscribers across two separate exports (e.g. old platform vs new) by approximate name and merge their columns, use excel-fuzzy-merger (Developer tier), not this single-file deduper.
Frequently asked questions
Should I dedup on email or name?
Both, in order. First exact-dedup the email column with the csv-deduplicator to remove identical-email duplicates. Then run Fuzzy Dedup on the name column to catch the same person under different email addresses, which the email pass can't see.
What if the threshold flags too many legitimate unique contacts?
Raise it to 92–95%. You can also review the report (count plus up to 50 previewed Row N "value" ≈ "matched" (score%) lines) and re-run on the original at a higher threshold. For safety, dedup on a name+email composite key so different-email near-names don't merge.
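When reviewing many merges, it can help to pull out the low-scoring ones first. This sketch assumes the report lines are available as plain text in the Row N "value" ≈ "matched" (score%) shape quoted above; the layout of the downloadable report file is not described here, and `Merge`, `parse_report_line`, and `needs_review` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class Merge(NamedTuple):
    row: int      # 1-based row number, header row included
    removed: str  # value of the removed row
    kept: str     # value of the surviving row it matched
    score: int    # similarity percentage


# Matches lines shaped like: Row 3 "Sarah Cohen" ≈ "Sara Cohen" (91%)
LINE = re.compile(r'^Row (\d+) "(.*)" ≈ "(.*)" \((\d+)%\)$')


def parse_report_line(line: str) -> Optional[Merge]:
    """Parse one report line, or return None if it has another shape."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return Merge(int(m.group(1)), m.group(2), m.group(3), int(m.group(4)))


def needs_review(lines: List[str], below: int = 100) -> List[Merge]:
    """Return merges scoring under `below`, the likeliest false merges."""
    merges = (parse_report_line(line) for line in lines)
    return [m for m in merges if m is not None and m.score < below]


lines = [
    '1 near-duplicate row(s) removed · 1 rows kept.',
    'Row 3 "rob johnson " ≈ "Rob Johnson" (100%)',
    'Row 3 "Sarah Cohen" ≈ "Sara Cohen" (91%)',
]
for merge in needs_review(lines, below=95):
    print(merge)
```

Anything the filter returns is a candidate for a manual check of the two contacts' emails before you import.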
Why didn't 'Rob Johnson' and 'Robert Johnson' merge at the default threshold?
They score about 73%, below the 85% default — Robert adds three characters to Rob. Lower the threshold to roughly 70% to catch long-form nicknames, then review the report because shorter names get riskier as you lower the bar.
Will it merge two different people with similar names?
Yes — it scores the name string only and ignores the email. Sara Cohen and Sarah Cohen (~91%) would merge at 88% even with different emails. Use a name+email composite key or a higher threshold, and review the report before importing.
Does it combine the two emails or tags when it merges contacts?
No. Only the first contact's row is kept; the duplicate's second email, extra tags, or phone are dropped from the file and appear only in the report. Reconcile from the report or use your email platform's merge feature for attribute-level merging.
How do I make sure the engaged record survives?
Sort the file so that row is first (e.g. by last-open date descending) before processing — first-occurrence-wins keeps it. There is no "keep most engaged" or "keep most complete" option in the tool.
Is the matching case-sensitive?
No. Names are lowercased and trimmed before scoring, so "Rob Johnson", "rob johnson", and "Rob Johnson " (with a trailing space) all score 100% and collapse even at a 100% threshold. Signup-form casing and whitespace artifacts disappear automatically.
Which platforms can I import the cleaned list to?
Any that accept .xlsx/.csv — Mailchimp, Klaviyo, HubSpot, ActiveCampaign, and others. The clean output is a standard .xlsx; export to CSV if your platform prefers it.
How big a list can I process?
Pro tier handles 100,000 rows / 50 MB / 5 files; Pro-media 500,000 rows; Developer is unlimited. Free tier cannot run Fuzzy Dedup, though the exact email pass via csv-deduplicator is available on lower tiers.
Can I preview the merges before they happen?
Deduplication runs when you process; the panel then shows what merged (count plus up to 5 previews, up to 50 in the report). There's no per-contact confirm. To change the result, adjust the threshold or build a composite key and re-run on the original.
Is my subscriber data uploaded anywhere?
No. Everything runs in your browser via SheetJS — names and emails stay on your machine, and the clean .xlsx is generated and downloaded locally.
What if I deduped too aggressively?
Your input is untouched; the output is a separate deduped-fuzzy.xlsx. Re-process the original at a higher threshold (or with a composite key) to keep more contacts, and compare the reports before importing either version.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.