How to excel fuzzy join vs exact join: choosing the right merge strategy
- Step 1Classify your join key — Numeric ID, UUID, or code that's identical in both files? Use exact join. Name, address, or description with formatting variation? Use fuzzy join.
- Step 2Try exact first when a clean key exists — Run the Sheet Joiner (left or inner join) on the shared key. It's fast and produces zero false positives.
- Step 3Isolate the rows exact join missed — If exact join leaves rows unmatched (or you used an inner join and rows fell out), pull those rows into a separate file.
- Step 4Fuzzy-join the remainder on a text field — Run Fuzzy Merger on the unmatched rows using a name or address column, starting at threshold 80.
- Step 5Audit the fuzzy matches — Sort the fuzzy output by
__match_scoreascending and verify the weakest pairings before merging the two passes back together. - Step 6Stack the results — Append the exact-matched rows and the validated fuzzy-matched rows into one final dataset.
Exact join vs fuzzy join at a glance
Side-by-side behavior of the Sheet Joiner (exact) and Fuzzy Merger (approximate). Both are in-browser and accept XLSX or CSV.
| Aspect | Sheet Joiner (exact) | Fuzzy Merger (approximate) |
|---|---|---|
| Match test | Key value identical | Levenshtein similarity ≥ threshold |
| Join types | left or inner | left only (all File A rows kept) |
| False positives | None | Possible — tune threshold, review scores |
| Cardinality | Standard key join | Best single File B match per File A row |
| Tier required | (see Sheet Joiner) | Developer |
| Output | XLSX | XLSX with __match_score / __matched_value |
Which tool for which key
Decision guide by join-key type. When in doubt, prefer exact on a clean key and fuzzy only on the residue.
| Join key | Recommended tool | Why |
|---|---|---|
| Numeric ID / UUID | Sheet Joiner | Identical keys; exact is fast and safe |
| Company name | Fuzzy Merger | Legal/trade-name variation breaks exact |
| Street address | Fuzzy Merger (low threshold) | Abbreviations vary; needs similarity |
| Product SKU | Sheet Joiner | Codes should match exactly if present |
| Product name | Fuzzy Merger | Word order and attributes vary |
Fuzzy Merger's real controls and limits
If you do choose fuzzy, these are the actual settings and constraints — no presets or algorithm picker exist.
| Setting | Value | Notes |
|---|---|---|
| Threshold range | 50–100 | Default 80; clamped by the input |
| Files required | Two (File A + File B) | Both join columns required |
| Match per row | Single best | No many-to-many, no candidate list |
| Complexity | O(n × m) | Slow on very large pairs |
| Tier | Developer | Free/Pro/Pro+Media cannot run it |
Cookbook
The exact-then-fuzzy two-pass pattern shown step by step, plus the behavioral differences that drive the choice.
Pass 1 — exact join on the clean ID
Start with the Sheet Joiner on a shared numeric ID. Rows that match are done and false-positive-free.
Sheet Joiner — leftKey: account_id, rightKey: account_id, joinType: left account_id name right_revenue 1001 Acme Corp 52000 1002 Globex null <- no exact ID match in File B Rows like 1002 fall through to pass 2.
Pass 2 — fuzzy join the leftovers on name
Take the rows exact join couldn't match and run Fuzzy Merger on the name field.
Fuzzy Merger — Left: name, Right: company_name, Threshold: 80 name __match_score __matched_value right_revenue Globex 100 globex 41000 Panel: '1 matched · 0 unmatched (threshold: 80%)'. Append to pass 1.
Why exact join would have dropped a row
An exact name join treats any difference as a miss; fuzzy recovers it. This is the core reason fuzzy exists.
Exact name join: 'Globex' vs 'globex' -> no match (case differs) Fuzzy join: 'Globex' vs 'globex' -> 100 (lowercased) -> match Exact would silently drop the row; fuzzy keeps it with a score.
Why a loose fuzzy join can mislead
Set the threshold too low and distinct records pair up. Always verify with the score column.
Threshold 55: 'Acme Corp' matched 'Acme Group' (58) <- wrong company Threshold 80: 'Acme Corp' -> (no match) <- correctly flagged Lower deliberately, then audit __match_score before trusting matches.
Choosing the tool by inspecting the key
A quick rule: if the key is numeric and clean, go exact; if it's free text, go fuzzy.
Key sample: 'A1042-X', 'A1043-X', 'A1044-X' -> clean codes -> exact join
Key sample: 'Acme Corp', 'Acme Corporation' -> text variation -> fuzzy join
For numeric keys, Levenshtein gives false highs ('1001' vs '1010' = 75%).Edge cases and what actually happens
Fuzzy chosen for a clean numeric ID
Wrong toolLevenshtein on '1001' vs '1010' scores 75% — a false positive waiting to happen. For clean numeric or code keys, always use the exact Sheet Joiner.
Exact join used on case-different keys
Silent drop riskExact join treats 'Globex' and 'globex' as different and drops the row. If your key has casing or punctuation variation, fuzzy join (which lowercases and trims) recovers it.
Fuzzy Merger needs Developer tier
Requires DeveloperFuzzy Merger is gated to Developer tier; the exact Sheet Joiner has its own (lower) tier requirement. Check tier before planning a two-pass workflow.
Inner vs left join confusion
By designSheet Joiner offers left or inner join; Fuzzy Merger is always effectively a left join — every File A row is kept, matched or '(no match)'. Don't expect an inner-join option in Fuzzy Merger.
Expecting many-to-many from fuzzy
Best-match onlyFuzzy Merger returns only the single best File B match per File A row. If you need every candidate above threshold, fuzzy join can't do it — it's strictly one match per row.
Both passes have overlapping matches
Watch for duplicatesIf you don't remove exact-matched rows before the fuzzy pass, a row can be matched twice. Filter out pass-1 matches before running the fuzzy pass to avoid duplicate rows.
Threshold left at default for a hard field
Tune required80% suits company names but is often too strict for addresses and too loose for short codes. Set the threshold to the field, don't accept the default blindly.
Very large datasets on the fuzzy pass
SlowFuzzy is O(n × m) while exact join scales far better. Run exact first to shrink the set, then fuzzy only the small remainder — never fuzzy-join two huge files directly.
Mixed-type key column (numbers stored as text)
Use exact, normalize firstIf IDs are sometimes text, sometimes numbers, exact join can miss them. Normalize the key type (e.g. via formula-to-value or a text cast) before an exact join rather than reaching for fuzzy.
Output expected as a single merged sheet
SupportedBoth tools output a single XLSX sheet. To combine the exact and fuzzy passes, append the two result files — there's no built-in step that runs both passes for you.
Frequently asked questions
When should I use exact join over fuzzy?
Whenever both files share a clean, identical key — numeric ID, UUID, or code. Exact join via the Sheet Joiner is faster and has zero false positives. Reserve fuzzy for text keys with formatting variation.
When is fuzzy join the right choice?
When the join key is a name, address, or description that varies between systems and there's no clean shared ID. Fuzzy Merger scores similarity so 'Google LLC' can match 'Google Inc.'
Can I use both tools together?
Yes — that's the recommended pattern. Run exact join on the clean key first, isolate the unmatched rows, fuzzy-join those on a text field, then append the two result sets.
Which scales better for large datasets?
Exact join, by far. Fuzzy Merger is O(n × m), so it slows on big pairs. Use exact first to reduce the set, then fuzzy only the small remainder.
Does fuzzy join produce false positives?
It can, if the threshold is too low — distinct records may score above it. That's why every output row carries __match_score: set the threshold deliberately and audit the low scores.
What join types does each tool support?
Sheet Joiner supports left and inner joins. Fuzzy Merger is always effectively a left join — every File A row is kept, matched or marked '(no match)'.
Can fuzzy join return multiple matches per row?
No. It returns only the single highest-scoring File B match per File A row. If you need all candidates, fuzzy join isn't the tool for that.
Why is fuzzy join bad for numeric IDs?
Levenshtein measures character edits, so '1001' and '1010' score 75% despite being different accounts. Numeric and code keys belong in an exact join.
Do both tools need the same tier?
No. Fuzzy Merger requires Developer tier; the Sheet Joiner has its own, lower requirement. Confirm both before designing a two-pass workflow.
What threshold should I set for the fuzzy pass?
Match it to the field: ~80% for company names, 70–80% for addresses, higher for clean text. The control accepts 50–100; the default is 80.
How do I avoid double-matching in the two-pass pattern?
Remove the rows that matched in the exact pass before running the fuzzy pass, so each row is matched at most once. Then append the two result files.
What output do both tools produce?
Both download a binary .xlsx workbook in your browser. The fuzzy output additionally carries __match_score and __matched_value columns for auditing.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.