Fuzzy Join Address Data in Excel for Geolocation Enrichment

How to match and merge excel address fields with fuzzy joining for geo enrichment

Step 1
Normalize both address columns first — Trim whitespace with the whitespace trimmer and lowercase or unify casing — fewer formatting differences means higher, more honest scores.
Step 2
Drop your addresses as File A — File A holds the addresses you want to enrich; every File A address becomes one output row.
Step 3
Drop the reference file as File B — File B is the lookup — for example a delivery-zone or geocode table with the coordinates you want to append.
Step 4
Enter the address join columns — Type the address header into Left join col (e.g. customer_address) and the reference header into Right join col (e.g. address).
Step 5
Set Threshold % to 75 and run — Addresses tolerate a lower threshold than names. The panel reports 'X matched · Y unmatched (threshold: 75%)'.
Step 6
Audit low scores before trusting the geo data — Download, sort by __match_score ascending, and verify the weakest matches — a wrong address match means wrong coordinates.

Why addresses score the way they do

Scores from the live formula after lowercasing and trimming. Address abbreviations and unit syntax cost edits even when the location is identical.

Address A	Address B	Score	Matched at 75%?
123 Main St	123 Main Street	79%	Yes
742 Evergreen Ave	742 Evergreen Avenue	85%	Yes
10 Oak Blvd Ste 200	10 Oak Boulevard Suite 200	73%	No
55 5th Ave Apt 3B	55 Fifth Avenue Apt 3B	68%	No
1 Infinite Loop	1 infinite loop	100%	Yes

Threshold guidance for address joins

Addresses carry more legitimate variation than names, so lower thresholds are common — but accept that each step down adds false positives. The control accepts 50–100.

Threshold	Effect on addresses	Review burden
85–90	Only near-identical strings join	Low — but misses St/Street pairs
75–80	Common abbreviation variants join	Moderate — spot-check the tail
65–70	Aggressive — partial addresses join	High — verify every match

When fuzzy join is the wrong choice

Fuzzy Merger is a string matcher, not a geocoder. Pick the right approach for the job.

Situation	Use Fuzzy Merger?	Better alternative
Small file, no API budget	Yes	—
100K+ addresses needing accuracy	No	Geocoding API (Google, HERE, Pelias)
Need rooftop coordinates	No	Address normalization + geocoder
Joining to a small zone lookup	Yes	—

Cookbook

Address enrichment shown end-to-end. Note how reference columns arrive prefixed right_ and how every row keeps its score.

Appending coordinates from a zone file

Each customer address picks up the best-matching reference row's latitude and longitude, prefixed right_.

File A (customers.xlsx)        File B (zones.xlsx)
customer_address     order   address            lat      lng
123 Main St          A-1     123 Main Street    40.71   -74.00
742 Evergreen Ave    A-2     742 Evergreen Avenue 34.05 -118.24

Left: customer_address  Right: address  Threshold: 75

Output:
customer_address  order __match_score __matched_value      right_lat right_lng
123 Main St       A-1   79            123 Main Street      40.71     -74.00
742 Evergreen Ave A-2   85            742 Evergreen Avenue 34.05     -118.24

Catching the unit-syntax miss

Suite/apartment spelling differences can drop a real match below threshold. Sort by score to find them.

Threshold 75, sorted by __match_score:

customer_address      __match_score  __matched_value
10 Oak Blvd Ste 200   73             (no match)
55 5th Ave Apt 3B     68             (no match)

Both are real locations that just missed — normalize 'Ste'->'Suite' and re-run.

Pre-normalizing abbreviations to lift scores

Expanding St/Ave/Blvd before the merge pushes genuine matches above the threshold.

Before: '10 Oak Blvd Ste 200' vs '10 Oak Boulevard Suite 200' -> 73 (no match)
After expanding Blvd->Boulevard, Ste->Suite in File A:
        '10 Oak Boulevard Suite 200' vs same                  -> 100 -> matches

Do the expansion in a helper column, then merge on it.

Lowering the threshold deliberately

Drop to 70% to catch more, but accept that you must verify each match — a wrong address join means wrong geo data.

Threshold 75: 12 matched · 8 unmatched
Threshold 70: 17 matched · 3 unmatched

The 5 new matches scored 70-74 — eyeball each before using their coordinates.

Keeping a same-named column distinct

If both files have a 'city' column, the reference one becomes right_city, so you can compare the typed city against the matched zone's city as a sanity check.

File A headers: customer_address, city
File B headers: address, city, zone
Right join col: address

Output: customer_address | city | __match_score | __matched_value
        | right_city | right_zone

If city != right_city on a match, the address pairing is likely wrong.

Edge cases and what actually happens

Not on Developer tier

Requires Developer

Address enrichment via Fuzzy Merger needs Developer tier; lower tiers get 'Fuzzy Merger requires Developer tier.' before processing.

Abbreviations not normalized first

Lower scores

St/Street, Ave/Avenue, Ste/Suite all cost edits. '123 Main St' vs '123 Main Street' is only 79%. Expand abbreviations in a helper column before merging to raise scores.

Only one address file uploaded

Two files required

Both the address file and the reference file are required. One file fails with 'Two files are required for fuzzy merging.'

Join column header wrong or empty

Required

Both Left and Right join col are required. A misspelled header reads as empty for every row, so nothing matches — verify the exact address-column header in each file.

Two reference rows on the same street

By design

Only the single best-scoring reference row is returned per address. Two nearby addresses on the same street won't both match — the closer string wins, ties go to the first encountered.

Address has no good reference match

Preserved

The address row is kept with __matched_value: "(no match)" and the best score found, so you can decide whether to lower the threshold or geocode it separately.

Low threshold matches the wrong building

False positive risk

Below ~75%, distinct addresses can score above threshold and pick up wrong coordinates. Always sort by __match_score and verify the tail before trusting low-confidence geo data.

Tens of thousands of addresses

Slow / use a geocoder

Matching is O(n × m). For 50K+ addresses a dedicated geocoding API is both faster and more accurate. Fuzzy Merger fits small-to-medium files with no API budget.

Accented street or city names

Counts as edits

'Montréal' vs 'Montreal' costs an edit. Strip accents with the non-ASCII remover first if you want them to score as identical.

Reference file is huge but addresses are few

Supported

n × m is symmetric, so a small File A against a large File B is the same cost as the reverse. Trim the reference to the relevant region to speed it up.

Frequently asked questions

What threshold is best for addresses?

Lower than for names — try 70–80%. Addresses carry legitimate variation (St/Street, Suite/Ste), so the default 80% can miss real matches. Trade a lower threshold for more manual review.

Should I use fuzzy join or a geocoding API?

For 100K+ addresses or where accuracy is critical, a geocoder (Google Maps, HERE, Pelias) is better. Fuzzy Merger is for small-to-medium files where API cost is prohibitive and approximate matching is acceptable.

Does it understand that 'St' means 'Street'?

No — it scores raw character similarity, not address semantics. Expand abbreviations yourself in a helper column before merging to get higher, more reliable scores.

Why did '123 Main St' not match '123 Main Street'?

It scores ~79%, which clears 75% but not the default 80%. Either lower the threshold to 75 or expand 'St' to 'Street' first.

What columns does the reference file add?

Every File B column except the right join column, each prefixed right_ — so lat becomes right_lat and zone becomes right_zone, never colliding with your own headers.

Are unmatched addresses kept?

Yes. They stay in the output with __matched_value: "(no match)" and the best score found, so you can isolate addresses that still need geocoding.

Can one address match several reference rows?

No. Only the single highest-scoring reference row is returned per address. There is no list of candidates.

Are my customer addresses uploaded?

No. Parsing and matching happen entirely in your browser via SheetJS. Address data never leaves your machine.

What output do I get?

A binary .xlsx file (fuzzy-merged.xlsx, sheet 'FuzzyMerged') with reference columns appended. Save-as CSV from Excel afterward if a downstream tool needs CSV.

How do I match on address plus city together?

Concatenate address and city into one helper column in each file, then set that as the join column — the tool joins on a single column per side only.

What tier do I need?

Developer tier. Free, Pro, and Pro + Media cannot run Fuzzy Merger.

How do I cut false positives at a low threshold?

Add a sanity column: include city in both files so the reference city arrives as right_city, then flag any match where city differs from right_city.

Privacy first

Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.

How to match and merge excel address fields with fuzzy joining for geo enrichment

Step 1
Normalize both address columns first — Trim whitespace with the whitespace trimmer and lowercase or unify casing — fewer formatting differences means higher, more honest scores.
Step 2
Drop your addresses as File A — File A holds the addresses you want to enrich; every File A address becomes one output row.
Step 3
Drop the reference file as File B — File B is the lookup — for example a delivery-zone or geocode table with the coordinates you want to append.
Step 4
Enter the address join columns — Type the address header into Left join col (e.g. customer_address) and the reference header into Right join col (e.g. address).
Step 5
Set Threshold % to 75 and run — Addresses tolerate a lower threshold than names. The panel reports 'X matched · Y unmatched (threshold: 75%)'.
Step 6
Audit low scores before trusting the geo data — Download, sort by __match_score ascending, and verify the weakest matches — a wrong address match means wrong coordinates.

Why addresses score the way they do

Scores from the live formula after lowercasing and trimming. Address abbreviations and unit syntax cost edits even when the location is identical.

Address A	Address B	Score	Matched at 75%?
123 Main St	123 Main Street	79%	Yes
742 Evergreen Ave	742 Evergreen Avenue	85%	Yes
10 Oak Blvd Ste 200	10 Oak Boulevard Suite 200	73%	No
55 5th Ave Apt 3B	55 Fifth Avenue Apt 3B	68%	No
1 Infinite Loop	1 infinite loop	100%	Yes

Threshold guidance for address joins

Addresses carry more legitimate variation than names, so lower thresholds are common — but accept that each step down adds false positives. The control accepts 50–100.

Threshold	Effect on addresses	Review burden
85–90	Only near-identical strings join	Low — but misses St/Street pairs
75–80	Common abbreviation variants join	Moderate — spot-check the tail
65–70	Aggressive — partial addresses join	High — verify every match

When fuzzy join is the wrong choice

Fuzzy Merger is a string matcher, not a geocoder. Pick the right approach for the job.

Situation	Use Fuzzy Merger?	Better alternative
Small file, no API budget	Yes	—
100K+ addresses needing accuracy	No	Geocoding API (Google, HERE, Pelias)
Need rooftop coordinates	No	Address normalization + geocoder
Joining to a small zone lookup	Yes	—

Cookbook

Address enrichment shown end-to-end. Note how reference columns arrive prefixed right_ and how every row keeps its score.

Appending coordinates from a zone file

Each customer address picks up the best-matching reference row's latitude and longitude, prefixed right_.

File A (customers.xlsx)        File B (zones.xlsx)
customer_address     order   address            lat      lng
123 Main St          A-1     123 Main Street    40.71   -74.00
742 Evergreen Ave    A-2     742 Evergreen Avenue 34.05 -118.24

Left: customer_address  Right: address  Threshold: 75

Output:
customer_address  order __match_score __matched_value      right_lat right_lng
123 Main St       A-1   79            123 Main Street      40.71     -74.00
742 Evergreen Ave A-2   85            742 Evergreen Avenue 34.05     -118.24

Catching the unit-syntax miss

Suite/apartment spelling differences can drop a real match below threshold. Sort by score to find them.

Threshold 75, sorted by __match_score:

customer_address      __match_score  __matched_value
10 Oak Blvd Ste 200   73             (no match)
55 5th Ave Apt 3B     68             (no match)

Both are real locations that just missed — normalize 'Ste'->'Suite' and re-run.

Pre-normalizing abbreviations to lift scores

Expanding St/Ave/Blvd before the merge pushes genuine matches above the threshold.

Before: '10 Oak Blvd Ste 200' vs '10 Oak Boulevard Suite 200' -> 73 (no match)
After expanding Blvd->Boulevard, Ste->Suite in File A:
        '10 Oak Boulevard Suite 200' vs same                  -> 100 -> matches

Do the expansion in a helper column, then merge on it.

Lowering the threshold deliberately

Drop to 70% to catch more, but accept that you must verify each match — a wrong address join means wrong geo data.

Threshold 75: 12 matched · 8 unmatched
Threshold 70: 17 matched · 3 unmatched

The 5 new matches scored 70-74 — eyeball each before using their coordinates.

Keeping a same-named column distinct

If both files have a 'city' column, the reference one becomes right_city, so you can compare the typed city against the matched zone's city as a sanity check.

File A headers: customer_address, city
File B headers: address, city, zone
Right join col: address

Output: customer_address | city | __match_score | __matched_value
        | right_city | right_zone

If city != right_city on a match, the address pairing is likely wrong.

Edge cases and what actually happens

Not on Developer tier

Requires Developer

Address enrichment via Fuzzy Merger needs Developer tier; lower tiers get 'Fuzzy Merger requires Developer tier.' before processing.

Abbreviations not normalized first

Lower scores

St/Street, Ave/Avenue, Ste/Suite all cost edits. '123 Main St' vs '123 Main Street' is only 79%. Expand abbreviations in a helper column before merging to raise scores.

Only one address file uploaded

Two files required

Both the address file and the reference file are required. One file fails with 'Two files are required for fuzzy merging.'

Join column header wrong or empty

Required

Both Left and Right join col are required. A misspelled header reads as empty for every row, so nothing matches — verify the exact address-column header in each file.

Two reference rows on the same street

By design

Only the single best-scoring reference row is returned per address. Two nearby addresses on the same street won't both match — the closer string wins, ties go to the first encountered.

Address has no good reference match

Preserved

The address row is kept with __matched_value: "(no match)" and the best score found, so you can decide whether to lower the threshold or geocode it separately.

Low threshold matches the wrong building

False positive risk

Below ~75%, distinct addresses can score above threshold and pick up wrong coordinates. Always sort by __match_score and verify the tail before trusting low-confidence geo data.

Tens of thousands of addresses

Slow / use a geocoder

Matching is O(n × m). For 50K+ addresses a dedicated geocoding API is both faster and more accurate. Fuzzy Merger fits small-to-medium files with no API budget.

Accented street or city names

Counts as edits

'Montréal' vs 'Montreal' costs an edit. Strip accents with the non-ASCII remover first if you want them to score as identical.

Reference file is huge but addresses are few

Supported

n × m is symmetric, so a small File A against a large File B is the same cost as the reverse. Trim the reference to the relevant region to speed it up.

Frequently asked questions

What threshold is best for addresses?

Lower than for names — try 70–80%. Addresses carry legitimate variation (St/Street, Suite/Ste), so the default 80% can miss real matches. Trade a lower threshold for more manual review.

Should I use fuzzy join or a geocoding API?

Does it understand that 'St' means 'Street'?

No — it scores raw character similarity, not address semantics. Expand abbreviations yourself in a helper column before merging to get higher, more reliable scores.

Why did '123 Main St' not match '123 Main Street'?

It scores ~79%, which clears 75% but not the default 80%. Either lower the threshold to 75 or expand 'St' to 'Street' first.

What columns does the reference file add?

Every File B column except the right join column, each prefixed right_ — so lat becomes right_lat and zone becomes right_zone, never colliding with your own headers.

Are unmatched addresses kept?

Yes. They stay in the output with __matched_value: "(no match)" and the best score found, so you can isolate addresses that still need geocoding.

Can one address match several reference rows?

No. Only the single highest-scoring reference row is returned per address. There is no list of candidates.

Are my customer addresses uploaded?

No. Parsing and matching happen entirely in your browser via SheetJS. Address data never leaves your machine.

What output do I get?

A binary .xlsx file (fuzzy-merged.xlsx, sheet 'FuzzyMerged') with reference columns appended. Save-as CSV from Excel afterward if a downstream tool needs CSV.

How do I match on address plus city together?

Concatenate address and city into one helper column in each file, then set that as the join column — the tool joins on a single column per side only.

What tier do I need?

Developer tier. Free, Pro, and Pro + Media cannot run Fuzzy Merger.

How do I cut false positives at a low threshold?

Add a sanity column: include city in both files so the reference city arrives as right_city, then flag any match where city differs from right_city.

Match and Merge Excel Address Fields with Fuzzy Joining for Geo Enrichment

How to match and merge excel address fields with fuzzy joining for geo enrichment

Why addresses score the way they do

Threshold guidance for address joins

When fuzzy join is the wrong choice

Cookbook

Appending coordinates from a zone file

Catching the unit-syntax miss

Pre-normalizing abbreviations to lift scores

Lowering the threshold deliberately

Keeping a same-named column distinct

Edge cases and what actually happens

Not on Developer tier

Abbreviations not normalized first

Only one address file uploaded

Join column header wrong or empty

Two reference rows on the same street

Address has no good reference match

Low threshold matches the wrong building

Tens of thousands of addresses

Accented street or city names

Reference file is huge but addresses are few

Frequently asked questions

What threshold is best for addresses?

Should I use fuzzy join or a geocoding API?

Does it understand that 'St' means 'Street'?

Why did '123 Main St' not match '123 Main Street'?

What columns does the reference file add?

Are unmatched addresses kept?

Can one address match several reference rows?

Are my customer addresses uploaded?

What output do I get?

How do I match on address plus city together?

What tier do I need?

How do I cut false positives at a low threshold?

Privacy first

Related guides

Match and Merge Excel Address Fields with Fuzzy Joining for Geo Enrichment

How to match and merge excel address fields with fuzzy joining for geo enrichment

Why addresses score the way they do

Threshold guidance for address joins

When fuzzy join is the wrong choice

Cookbook

Appending coordinates from a zone file

Catching the unit-syntax miss

Pre-normalizing abbreviations to lift scores

Lowering the threshold deliberately

Keeping a same-named column distinct

Edge cases and what actually happens

Not on Developer tier

Abbreviations not normalized first

Only one address file uploaded

Join column header wrong or empty

Two reference rows on the same street

Address has no good reference match

Low threshold matches the wrong building

Tens of thousands of addresses

Accented street or city names

Reference file is huge but addresses are few

Frequently asked questions

What threshold is best for addresses?

Should I use fuzzy join or a geocoding API?

Does it understand that 'St' means 'Street'?

Why did '123 Main St' not match '123 Main Street'?

What columns does the reference file add?

Are unmatched addresses kept?

Can one address match several reference rows?

Are my customer addresses uploaded?

What output do I get?

How do I match on address plus city together?

What tier do I need?

How do I cut false positives at a low threshold?

Privacy first

Related guides