How to clean duplicate CRM contacts in Excel using approximate name matching
- Step 1: Export the contact list from your CRM. Salesforce: Reports/Data Export → CSV/XLSX. HubSpot: Contacts → Export → CSV/XLSX. Zoho: Contacts → Export. Pick a format the tool accepts (.xlsx or .csv) and make sure the name column is present.
- Step 2: Drop the file onto the tool. The tool reads the first sheet only and treats the top row as headers. If your export has a cover/summary tab, move the contact rows to the first sheet or export them alone.
- Step 3: Type the name column into the Key column field. The Key column is a free-text input: type the exact header, e.g. Full Name, Name, or Contact Name. Case and spelling must match. Only this column is scored; email, phone, account, and owner columns ride along unchanged.
- Step 4: Set the threshold for personal names. Default is 85; for personal names a higher 90–95 is safer because short names like Jackson/Jason are close in edit distance. Enter a value from 50 to 100 and re-run to compare results.
- Step 5: Process and review the merged-contacts report. After processing, the panel shows how many contacts were merged and previews up to 5 as Row N "value" ≈ "matchedValue" (score%) (up to 50 in the downloadable report). Scan it for false merges: two genuinely different people who share a similar name.
- Step 6: Download and re-import. Download deduped-fuzzy.xlsx (sheet Deduped, kept records with all columns). If a real pair merged, raise the threshold and re-run; if obvious duplicates survived, lower it. Then re-import to your CRM.
Common CRM name patterns and how they score
Normalized Levenshtein similarity (case-/whitespace-insensitive) for typical CRM duplicate names. Approximate scores; verify in your own report.
| Pattern | Example pair | Approx. similarity | Removed at 85% / 92%? |
|---|---|---|---|
| Trailing space / case | "John Smith " / "john smith" | 100% | Yes / Yes |
| Nickname spelling | Jon Smith / John Smith | ~90% | Yes / No |
| Initial vs full | J. Smith / John Smith | ~70% | No / No |
| Mc/Mac variant | McDonald / MacDonald | ~89% | Yes / No |
| Single typo | Catherine / Catherne | ~89% | Yes / No |
| Different people, similar name | Jackson / Jason | ~71% | No / No |
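The scores in this table can be approximated with a short sketch. It assumes the similarity is `1 - distance / length of the longer value`, computed after trimming and lowercasing; the exact formula the tool uses is not stated here, and the function names are illustrative. Under that assumption the example pairs come out within about a point of the approximate figures above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Percent similarity; normalizing by the longer value is an assumption."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty values count as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


pairs = [
    ("John Smith ", "john smith"),
    ("Jon Smith", "John Smith"),
    ("J. Smith", "John Smith"),
    ("McDonald", "MacDonald"),
    ("Catherine", "Catherne"),
    ("Jackson", "Jason"),
]
for left, right in pairs:
    print(f"{left!r} / {right!r}: {similarity(left, right):.0f}%")
```

Treat the printed values as a sanity check only; the figures in your own report are the ones that decide what gets removed.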
Threshold choice for contact data
The threshold is the only similarity setting (50–100, default 85). Higher protects against false merges of real people; lower catches more spelling drift.
| Goal | Threshold | Trade-off |
|---|---|---|
| Conservative — avoid merging real people | 92–95 | Misses some nickname/typo duplicates; safest for re-import |
| Balanced default | 85 | Catches most nickname and typo variants; review the report for short-name collisions |
| Aggressive cleanup before manual review | 75–80 | Surfaces more candidates but raises false-positive rate on short names |
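To see the trade-off concretely, the sketch below applies three thresholds to rounded, approximate scores for the example pairs. It assumes a pair collapses when its score is greater than or equal to the threshold, which matches the behavior described for 100% scores at a 100% threshold. The `SCORES` values and the `removed_at` helper are illustrative, not part of the tool.

```python
# Approximate, rounded scores for the example name pairs (assumed values).
SCORES = {
    "Jon Smith / John Smith": 90,
    "McDonald / MacDonald": 89,
    "Catherine / Catherne": 89,
    "Jackson / Jason": 71,
    "J. Smith / John Smith": 70,
}


def removed_at(threshold: int) -> list[str]:
    """Pairs that would collapse, assuming score >= threshold is a match."""
    return [pair for pair, score in SCORES.items() if score >= threshold]


for threshold in (70, 85, 92):
    pairs = removed_at(threshold)
    print(f"threshold {threshold}: {len(pairs)} pair(s) collapse -> {pairs}")
```

At 92 nothing in this sample collapses, at 85 the three spelling variants do, and at 70 the Jackson / Jason pair is swept in as well, which is the false merge a high threshold is meant to prevent.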
Limits and behavior for CRM-sized lists
Fuzzy Dedup is Pro-gated. Free tier cannot run it. Single key column only — no composite name+email key.
| Aspect | Behavior |
|---|---|
| Tier required | Pro minimum (Free is blocked) |
| Pro capacity | 50 MB · 100,000 rows · 5 files |
| Pro-media / Developer capacity | Pro-media: 200 MB · 500,000 rows. Developer: 500 MB · unlimited rows |
| Key columns | One only — concatenate name+email into a column first for composite matching |
| Survivor | First record of each cluster (file order) |
| Output | deduped-fuzzy.xlsx, sheet Deduped, all columns preserved |
Cookbook
Real CRM contact patterns, the threshold that catches each, and what the report looks like. Report row numbers are 1-based and count the header row, so the first contact is Row 2.
Nickname spelling: Jon vs John
The classic CRM duplicate. Jon Smith vs John Smith is one inserted character in a 10-character string, scoring around 90%: caught at the default 85% but not at a strict 92%.
Input (column: Full Name, threshold: 85):

    Full Name,Email
    Jon Smith,jon@acme.com
    John Smith,jsmith@acme.com

Report:

    1 near-duplicate row(s) removed · 1 rows kept.
    Row 3 "John Smith" ≈ "Jon Smith" (90%)

Output keeps the FIRST row (Jon Smith, jon@acme.com). Note: the two emails differ; fuzzy dedup ignores email and keeps only the first record's columns.
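The first-occurrence-wins behavior in this example can be sketched as a greedy loop. This is a minimal illustration, not the tool's actual code: it assumes similarity is normalized by the longer value, that a row is removed as soon as it meets the threshold against any kept representative, and that report rows are numbered with the header as row 1. `fuzzy_dedup` and the helpers are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Percent similarity after trim + lowercase (assumed normalization)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - levenshtein(a, b) / longest)


def fuzzy_dedup(rows: list[dict], key: str, threshold: float):
    """Keep the first record of each cluster; describe every removed row."""
    kept, report = [], []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        value = row.get(key, "")
        for representative in kept:
            rep_value = representative.get(key, "")
            score = similarity(value, rep_value)
            if score >= threshold:
                report.append(
                    f'Row {row_number} "{value}" ≈ "{rep_value}" ({score:.0f}%)'
                )
                break
        else:
            kept.append(row)  # no kept representative was close enough
    return kept, report


contacts = [
    {"Full Name": "Jon Smith", "Email": "jon@acme.com"},
    {"Full Name": "John Smith", "Email": "jsmith@acme.com"},
]
kept, report = fuzzy_dedup(contacts, "Full Name", threshold=85)
print(f"{len(report)} near-duplicate row(s) removed · {len(kept)} rows kept.")
for line in report:
    print(line)
```

Note that only the `key` value is ever compared, so the surviving row carries its own email and the removed row's email is dropped.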
Re-import artifact: trailing space + case
A re-import or copy-paste often produces "John Smith " (trailing space) or "JOHN SMITH". Because values are trimmed and lowercased first, these always score 100% and collapse regardless of threshold.
Input (column: Name, threshold: 95):

    Name,Owner
    John Smith,Alice
    JOHN SMITH,Bob
    John Smith ,Carol

Report:

    2 near-duplicate row(s) removed · 1 rows kept.
    Row 3 "JOHN SMITH" ≈ "John Smith" (100%)
    Row 4 "John Smith " ≈ "John Smith" (100%)

Output keeps John Smith / Owner: Alice (first row).
Protecting different people with a high threshold
Jackson and Jason score around 71%. At the default 85% they stay separate — good. But at an aggressive 70% they'd merge, deleting a real contact. For personal names, keep the bar high.
Input (column: First Name):

    First Name
    Jackson
    Jason

threshold: 85 -> 0 removed (71% < 85) [correct]

threshold: 70 -> report:

    1 near-duplicate row(s) removed · 1 rows kept.
    Row 3 "Jason" ≈ "Jackson" (71%) [FALSE MERGE]

Lesson: don't drop below ~90% for short personal names.
Composite name + email key (preparation step)
The tool dedups on one column. To require BOTH a similar name and the same email (so two different John Smiths stay separate), build a combined key column before processing, then point the Key column at it.
Step 1: add a combined column in your sheet:

    Full Name,Email,namekey
    John Smith,a@x.com,John Smith|a@x.com
    Jon Smith,a@x.com,Jon Smith|a@x.com
    John Smith,b@y.com,John Smith|b@y.com

Step 2: run Fuzzy Dedup with Key column namekey, threshold 90.

- Data rows 1 and 2 (same email, similar name) score ~94% and collapse.
- Data row 3 (same name, different email) scores ~89% against data row 1, just under the threshold, so it is KEPT. With emails this short the margin is small, so check the score in your report.

Result: the two real John Smiths stay separate.
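If you would rather build the combined column with a script than by hand in the sheet, a minimal sketch using only the Python standard library follows. The column names (`Full Name`, `Email`, `namekey`) and the `|` separator come from the example above; the function name `add_composite_key` is illustrative.

```python
import csv
import io

SAMPLE = """Full Name,Email
John Smith,a@x.com
Jon Smith,a@x.com
John Smith,b@y.com
"""


def add_composite_key(csv_text: str, name_col: str = "Full Name",
                      email_col: str = "Email", key_col: str = "namekey") -> str:
    """Return the CSV with an extra column holding 'name|email'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [key_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Strip stray whitespace so it does not leak into the key.
        row[key_col] = f"{row[name_col].strip()}|{row[email_col].strip()}"
        writer.writerow(row)
    return out.getvalue()


print(add_composite_key(SAMPLE))
```

Save the result as a .csv, then type namekey into the Key column field.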
Sort so the best record wins
First-occurrence-wins means whichever record sits first survives. To keep the most complete or most recently modified contact, sort that record to the top before processing.
Before sort (sparse record first):

    Full Name,Phone,Last Modified
    John Smith,,2024-01-01
    John Smith,+1 555 0100,2026-05-01

After sorting by Last Modified DESC (newest first), re-run:

    Full Name,Phone,Last Modified
    John Smith,+1 555 0100,2026-05-01    <- kept (first now)
    John Smith,,2024-01-01               <- removed

The tool itself has no "keep most complete" option; sorting is the lever.
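The pre-sort can also be done outside the spreadsheet. The sketch below assumes the Last Modified column holds ISO dates (YYYY-MM-DD), which sort correctly as plain text; other date formats would need parsing first. `sort_newest_first` is an illustrative name.

```python
import csv
import io

SAMPLE = """Full Name,Phone,Last Modified
John Smith,,2024-01-01
John Smith,+1 555 0100,2026-05-01
"""


def sort_newest_first(csv_text: str, date_col: str = "Last Modified") -> str:
    """Reorder rows so the most recently modified record comes first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # ISO dates compare correctly as strings; the sort is stable for ties.
    rows = sorted(reader, key=lambda row: row[date_col], reverse=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(sort_newest_first(SAMPLE))
```

Sorting by a completeness measure instead (for example, the number of non-empty fields) works the same way with a different `key` function.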
Edge cases and what actually happens
Two genuinely different people with the same name
False merge: Fuzzy Dedup scores the name string only; it has no awareness of email, phone, or account. Two distinct customers both named John Smith score 100% and one is removed. Raising the threshold does not help, because identical names match at any setting. If your list can contain real same-name contacts, build a name+email composite key first and review the report; the tool cannot distinguish them on the name alone.
Free tier account
Pro required: The processor throws "Fuzzy Deduplicator requires Pro tier." for Free users before reading any rows. CRM lists also tend to exceed Free's 10,000-row Excel cap; Pro raises that to 100,000 rows / 50 MB / 5 files.
Name column header typed incorrectly
Empty matches: The Key column is free text; if it doesn't match a header exactly, every row reads an empty name. All empty values score 100% against each other and the whole list collapses to one row. Copy the header verbatim (mind capitalization and spaces) and confirm the kept count is plausible before re-importing.
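A quick pre-flight check on your own machine can catch a mistyped header before the list collapses. The sketch below is a hypothetical helper, not something the tool provides: it accepts only an exact header match and, when it finds a header that differs only in case or surrounding spaces, names it in the error.

```python
def resolve_key_column(headers: list[str], key: str) -> str:
    """Return the key if it matches a header exactly, else raise with a hint."""
    if key in headers:
        return key
    # Look for a header that differs only by case or surrounding whitespace.
    close = [h for h in headers if h.strip().lower() == key.strip().lower()]
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Key column {key!r} not found in {headers}.{hint}")


headers = ["Full Name", "Email", "Owner"]
print(resolve_key_column(headers, "Full Name"))
try:
    resolve_key_column(headers, "full name")
except ValueError as error:
    print(error)
```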
Composite key needed but only one column scored
By design: There is no multi-column option. To dedup on name AND email together, concatenate them into a new column before uploading and point the Key column at that combined field. Scoring then reflects both parts.
Last name vs first name ordering
Order-sensitive: "Smith, John" and "John Smith" are a large edit distance apart and won't match at a sensible threshold. Normalize name order before deduplicating (e.g. split and recombine columns) so the same person is represented the same way.
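One way to normalize name order before uploading is sketched below. It assumes the only reordered form is "Last, First" with a single comma; names with suffixes such as "Smith, Jr." or multiple commas would be rearranged incorrectly and need handling of their own. `normalize_name_order` is an illustrative name.

```python
def normalize_name_order(name: str) -> str:
    """Turn 'Last, First' into 'First Last'; leave other names trimmed."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        if first and last:
            return f"{first} {last}"
    return name.strip()


for raw in ["Smith, John", "John Smith", "  Doe,Jane "]:
    print(f"{raw!r} -> {normalize_name_order(raw)!r}")
```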
Email or phone is the real identity
Wrong column: If the reliable identifier is the email, fuzzy-matching names is the wrong approach; emails should be exact-deduped. Run the exact csv-deduplicator on the email column instead, and reserve fuzzy name matching for catching the same person across different emails.
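For comparison with fuzzy matching, exact deduplication on email is a simple set lookup. The sketch below is a generic illustration and does not describe how csv-deduplicator works internally. It assumes emails should be compared case-insensitively after trimming, and that rows with an empty email are always kept; both are choices you may want to change.

```python
def exact_dedup_by_email(rows: list[dict], email_col: str = "Email") -> list[dict]:
    """Keep the first row for each distinct email; keep all rows lacking one."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        email = row.get(email_col, "").strip().lower()
        if email and email in seen:
            continue  # identical email already kept
        if email:
            seen.add(email)
        kept.append(row)
    return kept


contacts = [
    {"Full Name": "John Smith", "Email": "jsmith@acme.com"},
    {"Full Name": "J. Smith", "Email": "JSmith@acme.com "},
    {"Full Name": "Jon Smith", "Email": "jon@acme.com"},
    {"Full Name": "Jane Doe", "Email": ""},
]
for row in exact_dedup_by_email(contacts):
    print(row)
```

Here "J. Smith" is removed on the email match even though its name would score only about 70% against "John Smith", which is the case fuzzy name matching misses.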
Survivor is the wrong record
Order-dependent: Because the first row of a cluster wins, an older or sparser record can survive over a newer, fuller one. Sort the export so the preferred record is first (e.g. by Last Modified descending) before processing; the tool has no "keep most complete" setting.
Merging across two CRM exports
Wrong tool: Fuzzy Dedup cleans one file. To reconcile contacts across two separate exports (e.g. Salesforce vs HubSpot) by approximate name and bring columns from both, use excel-fuzzy-merger (Developer tier).
Output keeps only one record per cluster
By design: The clean .xlsx contains the kept record's full row; the removed duplicates' unique data (a phone the survivor lacks, a second email) is gone from the file. The report identifies each removed row by row number and key value, so you can look up its other fields in the original export. If you need to merge field values, review the report and reconcile manually or in your CRM's merge UI.
Very large mostly-unique list feels slow
Expected: Each row is compared to the list of kept representatives, which grows as distinct names accumulate. A 100,000-row list of mostly-unique names does far more comparisons than a heavily-duplicated one. It still runs in the browser; close other heavy tabs if the UI stalls.
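A rough count shows why. The sketch below assumes every row is compared against each kept representative until one matches, with no indexing shortcut; the tool's actual implementation may do less work. In an all-unique list, row k is compared with the k - 1 rows before it, giving n(n - 1)/2 comparisons; in a list where every row duplicates the first, each later row needs only one.

```python
def all_unique_comparisons(n_rows: int) -> int:
    """Upper bound: every row is kept, so row k meets k - 1 representatives."""
    return n_rows * (n_rows - 1) // 2


def all_duplicate_comparisons(n_rows: int) -> int:
    """Lower bound: one representative, one comparison per later row."""
    return max(n_rows - 1, 0)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} rows: up to {all_unique_comparisons(n):,} comparisons, "
          f"as few as {all_duplicate_comparisons(n):,}")
```

For 100,000 all-unique rows the bound is about 5 billion string comparisons, against just under 100,000 for a fully duplicated list.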
Frequently asked questions
What threshold works best for personal names?
90–95% is recommended for personal names. Short names like Jackson/Jason or Jan/Jon are close in edit distance, so a high bar avoids merging different people. The default 85% catches more nickname/typo variants but increases the chance of a false merge on short names — read the report.
Can I deduplicate on name + email together?
Not directly — the tool scores one Key column. Concatenate name and email into a new column in your sheet (e.g. John Smith|a@x.com) and point the Key column at it. Then two different John Smiths with different emails stay separate while a similar name with the same email collapses.
Will it merge two different people who happen to share a name?
Yes. It scores the name string only and has no idea about email, phone, or account. Identical names score 100% and one is removed at any threshold, so raising the threshold does not help. If real same-name contacts are possible, use a composite key and verify the report before re-importing.
Which contact does it keep?
The first record of each cluster in file order. To keep the most complete or most recent contact, sort that record to the top before processing — there is no "keep most complete" option in the tool.
Does the merged contact combine fields from both records?
No. Only the first record's row is kept, with all its columns. The removed duplicate's unique fields (e.g. a phone the survivor lacks) are not merged into the survivor. The report identifies each removed row by row number and key value, so you can look up those fields in the original export and reconcile them manually or in your CRM's merge tool.
Is matching case-sensitive?
No. Values are lowercased and trimmed before scoring, so "John Smith", "john smith", and "John Smith " (trailing space) all score 100% and collapse even at a 100% threshold. That makes re-import casing/whitespace artifacts disappear automatically.
Which CRMs does this work with?
Any that export .xlsx or .csv — Salesforce, HubSpot, Zoho, Pipedrive, Dynamics, and others. The tool is column-agnostic: point the Key column at whatever the export calls the name field (Full Name, Name, Contact Name).
How many contacts can I process?
Pro tier handles 100,000 rows / 50 MB / 5 files; Pro-media 500,000 rows / 200 MB; Developer is unlimited rows / 500 MB. Free tier cannot run the tool at all.
Can I preview the merges before they happen?
Deduplication runs when you process; the panel then shows what was merged (count plus up to 5 previewed pairs, up to 50 in the report). There is no confirm-each-merge step. To change the result, adjust the threshold and re-run on the original file.
Is my contact data uploaded anywhere?
No. Everything runs in your browser via SheetJS — names, emails, and phone numbers stay on your machine, and the clean .xlsx is generated and downloaded locally.
What if I dedup the wrong way — can I undo?
Your input file is untouched; the output is a separate deduped-fuzzy.xlsx. To recover, just re-process the original with a different threshold. Keep the original export until you've confirmed the clean list.
Should I exact-dedup emails first?
Often yes. Run the exact csv-deduplicator on the email column to collapse identical-email records cheaply, then run Fuzzy Dedup on the name column to catch the same person appearing under two different emails.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.