How to match and merge excel address fields with fuzzy joining for geo enrichment
- Step 1Normalize both address columns first — Trim whitespace with the whitespace trimmer and lowercase or unify casing — fewer formatting differences means higher, more honest scores.
- Step 2Drop your addresses as File A — File A holds the addresses you want to enrich; every File A address becomes one output row.
- Step 3Drop the reference file as File B — File B is the lookup — for example a delivery-zone or geocode table with the coordinates you want to append.
- Step 4Enter the address join columns — Type the address header into Left join col (e.g.
customer_address) and the reference header into Right join col (e.g.address). - Step 5Set Threshold % to 75 and run — Addresses tolerate a lower threshold than names. The panel reports 'X matched · Y unmatched (threshold: 75%)'.
- Step 6Audit low scores before trusting the geo data — Download, sort by
__match_scoreascending, and verify the weakest matches — a wrong address match means wrong coordinates.
Why addresses score the way they do
Scores from the live formula after lowercasing and trimming. Address abbreviations and unit syntax cost edits even when the location is identical.
| Address A | Address B | Score | Matched at 75%? |
|---|---|---|---|
| 123 Main St | 123 Main Street | 79% | Yes |
| 742 Evergreen Ave | 742 Evergreen Avenue | 85% | Yes |
| 10 Oak Blvd Ste 200 | 10 Oak Boulevard Suite 200 | 73% | No |
| 55 5th Ave Apt 3B | 55 Fifth Avenue Apt 3B | 68% | No |
| 1 Infinite Loop | 1 infinite loop | 100% | Yes |
Threshold guidance for address joins
Addresses carry more legitimate variation than names, so lower thresholds are common — but accept that each step down adds false positives. The control accepts 50–100.
| Threshold | Effect on addresses | Review burden |
|---|---|---|
| 85–90 | Only near-identical strings join | Low — but misses St/Street pairs |
| 75–80 | Common abbreviation variants join | Moderate — spot-check the tail |
| 65–70 | Aggressive — partial addresses join | High — verify every match |
When fuzzy join is the wrong choice
Fuzzy Merger is a string matcher, not a geocoder. Pick the right approach for the job.
| Situation | Use Fuzzy Merger? | Better alternative |
|---|---|---|
| Small file, no API budget | Yes | — |
| 100K+ addresses needing accuracy | No | Geocoding API (Google, HERE, Pelias) |
| Need rooftop coordinates | No | Address normalization + geocoder |
| Joining to a small zone lookup | Yes | — |
Cookbook
Address enrichment shown end-to-end. Note how reference columns arrive prefixed right_ and how every row keeps its score.
Appending coordinates from a zone file
Each customer address picks up the best-matching reference row's latitude and longitude, prefixed right_.
File A (customers.xlsx) File B (zones.xlsx) customer_address order address lat lng 123 Main St A-1 123 Main Street 40.71 -74.00 742 Evergreen Ave A-2 742 Evergreen Avenue 34.05 -118.24 Left: customer_address Right: address Threshold: 75 Output: customer_address order __match_score __matched_value right_lat right_lng 123 Main St A-1 79 123 Main Street 40.71 -74.00 742 Evergreen Ave A-2 85 742 Evergreen Avenue 34.05 -118.24
Catching the unit-syntax miss
Suite/apartment spelling differences can drop a real match below threshold. Sort by score to find them.
Threshold 75, sorted by __match_score: customer_address __match_score __matched_value 10 Oak Blvd Ste 200 73 (no match) 55 5th Ave Apt 3B 68 (no match) Both are real locations that just missed — normalize 'Ste'->'Suite' and re-run.
Pre-normalizing abbreviations to lift scores
Expanding St/Ave/Blvd before the merge pushes genuine matches above the threshold.
Before: '10 Oak Blvd Ste 200' vs '10 Oak Boulevard Suite 200' -> 73 (no match)
After expanding Blvd->Boulevard, Ste->Suite in File A:
'10 Oak Boulevard Suite 200' vs same -> 100 -> matches
Do the expansion in a helper column, then merge on it.Lowering the threshold deliberately
Drop to 70% to catch more, but accept that you must verify each match — a wrong address join means wrong geo data.
Threshold 75: 12 matched · 8 unmatched Threshold 70: 17 matched · 3 unmatched The 5 new matches scored 70-74 — eyeball each before using their coordinates.
Keeping a same-named column distinct
If both files have a 'city' column, the reference one becomes right_city, so you can compare the typed city against the matched zone's city as a sanity check.
File A headers: customer_address, city
File B headers: address, city, zone
Right join col: address
Output: customer_address | city | __match_score | __matched_value
| right_city | right_zone
If city != right_city on a match, the address pairing is likely wrong.Edge cases and what actually happens
Not on Developer tier
Requires DeveloperAddress enrichment via Fuzzy Merger needs Developer tier; lower tiers get 'Fuzzy Merger requires Developer tier.' before processing.
Abbreviations not normalized first
Lower scoresSt/Street, Ave/Avenue, Ste/Suite all cost edits. '123 Main St' vs '123 Main Street' is only 79%. Expand abbreviations in a helper column before merging to raise scores.
Only one address file uploaded
Two files requiredBoth the address file and the reference file are required. One file fails with 'Two files are required for fuzzy merging.'
Join column header wrong or empty
RequiredBoth Left and Right join col are required. A misspelled header reads as empty for every row, so nothing matches — verify the exact address-column header in each file.
Two reference rows on the same street
By designOnly the single best-scoring reference row is returned per address. Two nearby addresses on the same street won't both match — the closer string wins, ties go to the first encountered.
Address has no good reference match
PreservedThe address row is kept with __matched_value: "(no match)" and the best score found, so you can decide whether to lower the threshold or geocode it separately.
Low threshold matches the wrong building
False positive riskBelow ~75%, distinct addresses can score above threshold and pick up wrong coordinates. Always sort by __match_score and verify the tail before trusting low-confidence geo data.
Tens of thousands of addresses
Slow / use a geocoderMatching is O(n × m). For 50K+ addresses a dedicated geocoding API is both faster and more accurate. Fuzzy Merger fits small-to-medium files with no API budget.
Accented street or city names
Counts as edits'Montréal' vs 'Montreal' costs an edit. Strip accents with the non-ASCII remover first if you want them to score as identical.
Reference file is huge but addresses are few
Supportedn × m is symmetric, so a small File A against a large File B is the same cost as the reverse. Trim the reference to the relevant region to speed it up.
Frequently asked questions
What threshold is best for addresses?
Lower than for names — try 70–80%. Addresses carry legitimate variation (St/Street, Suite/Ste), so the default 80% can miss real matches. Trade a lower threshold for more manual review.
Should I use fuzzy join or a geocoding API?
For 100K+ addresses or where accuracy is critical, a geocoder (Google Maps, HERE, Pelias) is better. Fuzzy Merger is for small-to-medium files where API cost is prohibitive and approximate matching is acceptable.
Does it understand that 'St' means 'Street'?
No — it scores raw character similarity, not address semantics. Expand abbreviations yourself in a helper column before merging to get higher, more reliable scores.
Why did '123 Main St' not match '123 Main Street'?
It scores ~79%, which clears 75% but not the default 80%. Either lower the threshold to 75 or expand 'St' to 'Street' first.
What columns does the reference file add?
Every File B column except the right join column, each prefixed right_ — so lat becomes right_lat and zone becomes right_zone, never colliding with your own headers.
Are unmatched addresses kept?
Yes. They stay in the output with __matched_value: "(no match)" and the best score found, so you can isolate addresses that still need geocoding.
Can one address match several reference rows?
No. Only the single highest-scoring reference row is returned per address. There is no list of candidates.
Are my customer addresses uploaded?
No. Parsing and matching happen entirely in your browser via SheetJS. Address data never leaves your machine.
What output do I get?
A binary .xlsx file (fuzzy-merged.xlsx, sheet 'FuzzyMerged') with reference columns appended. Save-as CSV from Excel afterward if a downstream tool needs CSV.
How do I match on address plus city together?
Concatenate address and city into one helper column in each file, then set that as the join column — the tool joins on a single column per side only.
What tier do I need?
Developer tier. Free, Pro, and Pro + Media cannot run Fuzzy Merger.
How do I cut false positives at a low threshold?
Add a sanity column: include city in both files so the reference city arrives as right_city, then flag any match where city differs from right_city.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.