Merge Two Excel Files Without Exact ID Match Using Fuzzy Join

How to join two excel datasets using approximate string matching when ids don't match

Step 1
Drop File A (the left file) — File A is the primary dataset you want to enrich. Every row in File A becomes exactly one row in the output, matched or not.
Step 2
Drop File B (the right file) — File B is the reference dataset. Columns from the best-matching File B row are appended to File A rows, each prefixed right_.
Step 3
Enter Left join col and Right join col — Type the exact header text of the match column in each file — e.g. Left join col company_name, Right join col name. Both are required; matching is case-insensitive and trimmed.
Step 4
Set Threshold % — Type a number from 50 to 100 in the Threshold % box. 80 is the default and works well for company names; raise it to cut false positives, lower it to catch more variation.
Step 5
Run the merge — Fuzzy Merger compares every File A row against every File B row, keeps each row's single best match, and reports 'X matched · Y unmatched (threshold: Z%)'.
Step 6
Download and review — Download fuzzy-merged.xlsx, then sort by __match_score ascending to hand-check the lowest-confidence matches before trusting the join.

What each control does

The Fuzzy Merger UI exposes exactly three inputs plus two file drop zones. There are no presets, no algorithm picker, and no candidate-list chooser — these are the real controls.

Control	Type	Default	What it does
File A	File drop	—	Left dataset; one output row per File A row
File B	File drop	—	Right dataset; best-match columns appended as `right_*`
Left join col	Text	(empty)	Exact header name of the match column in File A — required
Right join col	Text	(empty)	Exact header name of the match column in File B — required
Threshold %	Number	80	Minimum similarity score to accept a match; clamped to 50–100

How a percentage score is computed

Both values are lowercased and trimmed first, then scored by Levenshtein distance over the longer string's length. These are computed from the real formula.

File A value	File B value	Score	Matched at 80%?
Google LLC	Google Inc.	55%	No (below 80)
Acme Corporation	Acme Corp	56%	No
Microsoft Corp	microsoft corp.	93%	Yes
Jonathan Smith	Jonathon Smith	93%	Yes
Café Roma	Cafe Roma	89%	Yes

Tier limits that apply

Fuzzy Merger is gated to Developer tier (it calls isDeveloper). Free, Pro, and Pro + Media accounts cannot run it, so the limits that matter in practice are the Developer ones.

Tier	Can run Fuzzy Merger?	Max file size (excel)	Max rows
Free	No — requires Developer	5 MB	10,000
Pro	No — requires Developer	50 MB	100,000
Pro + Media	No — requires Developer	200 MB	500,000
Developer	Yes	500 MB	Unlimited

Cookbook

Worked merges showing the exact output columns. The merge is best-match-only: each File A row is paired with its single highest-scoring File B row.

Joining customers across two systems

File A (CRM) and File B (billing) describe the same companies with different name spellings. Fuzzy Merger appends billing columns to each CRM row, prefixed right_.

File A (customers.xlsx)        File B (billing.xlsx)
company_name   region          name              mrr
Google LLC     EMEA            Google Inc.       42000
GitHub         AMER            GitHub, Inc.      18000

Left join col: company_name   Right join col: name   Threshold: 80

Output (fuzzy-merged.xlsx):
company_name  region  __match_score  __matched_value  right_mrr
Google LLC    EMEA    55             (no match)       
GitHub        AMER    86             GitHub, Inc.     18000

Reading the __match_score column

Every output row carries a numeric score. Unmatched rows still show the best score that was found, not zero — useful for deciding whether to lower the threshold.

Sort the output by __match_score ascending:

company_name      __match_score  __matched_value
Unknown Vendor    31             (no match)
Google LLC        55             (no match)
GitHub            86             GitHub, Inc.
Microsoft Corp    93             microsoft corp.

The 55% row was close but below 80 — re-run at 50 to see if it joins.

Lowering the threshold to catch a near-miss

'Google LLC' vs 'Google Inc.' scores 55%. At the default 80% it stays unmatched; at 50% it joins. Lower deliberately and re-audit.

Threshold: 80   ->  Google LLC  __matched_value: (no match)
Threshold: 50   ->  Google LLC  __matched_value: Google Inc.   right_mrr: 42000

Results panel: '2 matched · 0 unmatched (threshold: 50%)'

Handling a right column name collision

File B's columns are appended prefixed right_, and the right join column itself is dropped from the merge. So a region in both files stays separate.

File A headers: company_name, region
File B headers: name, region, mrr
Right join col: name

Output headers:
company_name | region | __match_score | __matched_value | right_region | right_mrr

'name' is not duplicated; File A's region and File B's region coexist.

Pre-cleaning before the merge

Matching is case-insensitive and trimmed already, but stray punctuation and mojibake still cost similarity points. Normalize first for tighter scores.

Before: "ACME  Corp." vs "acme corp"   -> score 75 (extra spaces hurt)
After trimming internal whitespace via the cleaning step:
        "ACME Corp." vs "acme corp"    -> score 83 -> matches at 80

Use the whitespace and case helpers, then merge.

Edge cases and what actually happens

Free / Pro / Pro + Media account tries to run it

Requires Developer

Fuzzy Merger is gated by isDeveloper(tier). Lower tiers get the error 'Fuzzy Merger requires Developer tier.' before any processing. There is no way to run this specific tool on Free, Pro, or Pro + Media.

Only one file is uploaded

Two files required

Fuzzy Merger needs both File A and File B. With a single file the run fails with 'Two files are required for fuzzy merging.' Drop a reference file into the File B zone.

Left or right join column left blank

Required

Both Left join col and Right join col must be filled in. If either is empty the run fails with 'Left and right join columns are required.' Type the exact header text from each file.

Join column name is misspelled or absent

Scores as empty

If the named column doesn't exist in a file, every cell reads as empty string. Empty-vs-empty scores 100 and empty-vs-text scores 0, so you get either all matches or all '(no match)'. Double-check the header spelling against the file.

File A row matches nothing above the threshold

Preserved

Unmatched File A rows are always kept in the output with __matched_value: "(no match)" and __match_score set to the best score that was found — never dropped, never zeroed if a partial match existed.

Two File B rows tie for the best score

By design

The scan keeps a strictly-greater comparison (score > bestScore), so on an exact tie the first File B row encountered wins. Only one match is ever returned per File A row.

Numeric or ID join keys

Use exact join instead

Levenshtein on '1001' vs '1010' gives a misleadingly high score. For clean IDs use the exact-equality Sheet Joiner; reserve Fuzzy Merger for text keys with formatting variation.

Very large File A × File B

Slow but supported

The algorithm is O(n × m): every File A row is compared to every File B row. 5,000 × 5,000 is 25 million comparisons. Developer tier removes the row cap, but expect noticeable in-browser time on six-figure inputs — pre-filter where you can.

Accented or non-ASCII names

Counts as edits

'Café' vs 'Cafe' differ by one character, costing similarity points. Strip accents first with the non-ASCII remover if you want é and e to score as identical.

Output opened expecting CSV

XLSX only

Output is always a binary .xlsx workbook (sheet 'FuzzyMerged'), not CSV or JSON. It downloads in the browser; there is no server-side text endpoint for this tool.

Frequently asked questions

Does Fuzzy Merger do exact matching too?

Yes, implicitly — identical (after lowercasing and trimming) values score 100, which clears any threshold. But if you only need exact equality, the Sheet Joiner is faster and produces zero false positives.

How is the similarity score calculated?

Both join values are lowercased and trimmed, Levenshtein edit distance is computed, then converted with round((1 - distance / max(lenA, lenB)) * 100). Identical strings give 100; completely different strings approach 0.

What threshold should I start with?

80% is the default and a good starting point for company names. Raise it toward 90–95% to reduce false positives, or drop toward 50–60% to catch more variation. Always re-audit the __match_score column after changing it.

What happens to File A rows with no match?

They are kept in the output with __matched_value: "(no match)". The __match_score shows the best score found even if it was below the threshold, so you can judge how close it came.

Can one File A row match several File B rows?

No. For each File A row only the single highest-scoring File B row is returned. There is no many-to-many or candidate-list output — it is strictly best-match.

How are File B's columns named in the output?

Every File B column except the right join column is appended with a right_ prefix — mrr becomes right_mrr. This guarantees File B columns never overwrite File A columns of the same name.

What output format do I get?

A binary .xlsx file named fuzzy-merged.xlsx with a sheet called 'FuzzyMerged'. It downloads directly in your browser; there is no CSV or JSON output option.

Are my files uploaded anywhere?

No. Parsing and matching run entirely in your browser using SheetJS. Neither File A nor File B leaves your machine.

Is there a row or file-size limit?

The tool requires Developer tier, which allows files up to 500 MB and an unlimited row count for the excel family. Practical performance is bounded by the O(n × m) comparison cost, not a hard row cap.

Does it match across multiple columns at once?

No — exactly one Left join col and one Right join col. To match on a combination, concatenate the columns into a single key first (for example with the regex extractor or a helper column), then merge on that.

Why did two obviously-similar names not match?

Levenshtein penalizes length differences heavily. 'Acme Corporation' vs 'Acme Corp' only scores ~56% because 'oration' is seven extra edits. Trim suffixes like Inc/LLC/Corp beforehand, or lower the threshold.

Can I see which rows ended up unmatched?

Yes. After running, the results panel reports 'X matched · Y unmatched'. In the file, filter __matched_value for '(no match)' to isolate every unjoined File A row.

Privacy first

Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.

How to join two excel datasets using approximate string matching when ids don't match

Step 1
Drop File A (the left file) — File A is the primary dataset you want to enrich. Every row in File A becomes exactly one row in the output, matched or not.
Step 2
Drop File B (the right file) — File B is the reference dataset. Columns from the best-matching File B row are appended to File A rows, each prefixed right_.
Step 3
Enter Left join col and Right join col — Type the exact header text of the match column in each file — e.g. Left join col company_name, Right join col name. Both are required; matching is case-insensitive and trimmed.
Step 4
Set Threshold % — Type a number from 50 to 100 in the Threshold % box. 80 is the default and works well for company names; raise it to cut false positives, lower it to catch more variation.
Step 5
Run the merge — Fuzzy Merger compares every File A row against every File B row, keeps each row's single best match, and reports 'X matched · Y unmatched (threshold: Z%)'.
Step 6
Download and review — Download fuzzy-merged.xlsx, then sort by __match_score ascending to hand-check the lowest-confidence matches before trusting the join.

What each control does

The Fuzzy Merger UI exposes exactly three inputs plus two file drop zones. There are no presets, no algorithm picker, and no candidate-list chooser — these are the real controls.

Control	Type	Default	What it does
File A	File drop	—	Left dataset; one output row per File A row
File B	File drop	—	Right dataset; best-match columns appended as `right_*`
Left join col	Text	(empty)	Exact header name of the match column in File A — required
Right join col	Text	(empty)	Exact header name of the match column in File B — required
Threshold %	Number	80	Minimum similarity score to accept a match; clamped to 50–100

How a percentage score is computed

Both values are lowercased and trimmed first, then scored by Levenshtein distance over the longer string's length. These are computed from the real formula.

File A value	File B value	Score	Matched at 80%?
Google LLC	Google Inc.	55%	No (below 80)
Acme Corporation	Acme Corp	56%	No
Microsoft Corp	microsoft corp.	93%	Yes
Jonathan Smith	Jonathon Smith	93%	Yes
Café Roma	Cafe Roma	89%	Yes

Tier limits that apply

Fuzzy Merger is gated to Developer tier (it calls isDeveloper). Free, Pro, and Pro + Media accounts cannot run it, so the limits that matter in practice are the Developer ones.

Tier	Can run Fuzzy Merger?	Max file size (excel)	Max rows
Free	No — requires Developer	5 MB	10,000
Pro	No — requires Developer	50 MB	100,000
Pro + Media	No — requires Developer	200 MB	500,000
Developer	Yes	500 MB	Unlimited

Cookbook

Worked merges showing the exact output columns. The merge is best-match-only: each File A row is paired with its single highest-scoring File B row.

Joining customers across two systems

File A (CRM) and File B (billing) describe the same companies with different name spellings. Fuzzy Merger appends billing columns to each CRM row, prefixed right_.

File A (customers.xlsx)        File B (billing.xlsx)
company_name   region          name              mrr
Google LLC     EMEA            Google Inc.       42000
GitHub         AMER            GitHub, Inc.      18000

Left join col: company_name   Right join col: name   Threshold: 80

Output (fuzzy-merged.xlsx):
company_name  region  __match_score  __matched_value  right_mrr
Google LLC    EMEA    55             (no match)       
GitHub        AMER    86             GitHub, Inc.     18000

Reading the __match_score column

Every output row carries a numeric score. Unmatched rows still show the best score that was found, not zero — useful for deciding whether to lower the threshold.

Sort the output by __match_score ascending:

company_name      __match_score  __matched_value
Unknown Vendor    31             (no match)
Google LLC        55             (no match)
GitHub            86             GitHub, Inc.
Microsoft Corp    93             microsoft corp.

The 55% row was close but below 80 — re-run at 50 to see if it joins.

Lowering the threshold to catch a near-miss

'Google LLC' vs 'Google Inc.' scores 55%. At the default 80% it stays unmatched; at 50% it joins. Lower deliberately and re-audit.

Threshold: 80   ->  Google LLC  __matched_value: (no match)
Threshold: 50   ->  Google LLC  __matched_value: Google Inc.   right_mrr: 42000

Results panel: '2 matched · 0 unmatched (threshold: 50%)'

Handling a right column name collision

File B's columns are appended prefixed right_, and the right join column itself is dropped from the merge. So a region in both files stays separate.

File A headers: company_name, region
File B headers: name, region, mrr
Right join col: name

Output headers:
company_name | region | __match_score | __matched_value | right_region | right_mrr

'name' is not duplicated; File A's region and File B's region coexist.

Pre-cleaning before the merge

Matching is case-insensitive and trimmed already, but stray punctuation and mojibake still cost similarity points. Normalize first for tighter scores.

Before: "ACME  Corp." vs "acme corp"   -> score 75 (extra spaces hurt)
After trimming internal whitespace via the cleaning step:
        "ACME Corp." vs "acme corp"    -> score 83 -> matches at 80

Use the whitespace and case helpers, then merge.

Edge cases and what actually happens

Free / Pro / Pro + Media account tries to run it

Requires Developer

Only one file is uploaded

Two files required

Fuzzy Merger needs both File A and File B. With a single file the run fails with 'Two files are required for fuzzy merging.' Drop a reference file into the File B zone.

Left or right join column left blank

Required

Both Left join col and Right join col must be filled in. If either is empty the run fails with 'Left and right join columns are required.' Type the exact header text from each file.

Join column name is misspelled or absent

Scores as empty

File A row matches nothing above the threshold

Preserved

Two File B rows tie for the best score

By design

The scan keeps a strictly-greater comparison (score > bestScore), so on an exact tie the first File B row encountered wins. Only one match is ever returned per File A row.

Numeric or ID join keys

Use exact join instead

Levenshtein on '1001' vs '1010' gives a misleadingly high score. For clean IDs use the exact-equality Sheet Joiner; reserve Fuzzy Merger for text keys with formatting variation.

Very large File A × File B

Slow but supported

Accented or non-ASCII names

Counts as edits

'Café' vs 'Cafe' differ by one character, costing similarity points. Strip accents first with the non-ASCII remover if you want é and e to score as identical.

Output opened expecting CSV

XLSX only

Output is always a binary .xlsx workbook (sheet 'FuzzyMerged'), not CSV or JSON. It downloads in the browser; there is no server-side text endpoint for this tool.

Frequently asked questions

Does Fuzzy Merger do exact matching too?

How is the similarity score calculated?

What threshold should I start with?

What happens to File A rows with no match?

They are kept in the output with __matched_value: "(no match)". The __match_score shows the best score found even if it was below the threshold, so you can judge how close it came.

Can one File A row match several File B rows?

No. For each File A row only the single highest-scoring File B row is returned. There is no many-to-many or candidate-list output — it is strictly best-match.

How are File B's columns named in the output?

Every File B column except the right join column is appended with a right_ prefix — mrr becomes right_mrr. This guarantees File B columns never overwrite File A columns of the same name.

What output format do I get?

A binary .xlsx file named fuzzy-merged.xlsx with a sheet called 'FuzzyMerged'. It downloads directly in your browser; there is no CSV or JSON output option.

Are my files uploaded anywhere?

No. Parsing and matching run entirely in your browser using SheetJS. Neither File A nor File B leaves your machine.

Is there a row or file-size limit?

Does it match across multiple columns at once?

Why did two obviously-similar names not match?

Can I see which rows ended up unmatched?

Yes. After running, the results panel reports 'X matched · Y unmatched'. In the file, filter __matched_value for '(no match)' to isolate every unjoined File A row.

Join Two Excel Datasets Using Approximate String Matching When IDs Don't Match

How to join two excel datasets using approximate string matching when ids don't match

What each control does

How a percentage score is computed

Tier limits that apply

Cookbook

Joining customers across two systems

Reading the __match_score column

Lowering the threshold to catch a near-miss

Handling a right column name collision

Pre-cleaning before the merge

Edge cases and what actually happens

Free / Pro / Pro + Media account tries to run it

Only one file is uploaded

Left or right join column left blank

Join column name is misspelled or absent

File A row matches nothing above the threshold

Two File B rows tie for the best score

Numeric or ID join keys

Very large File A × File B

Accented or non-ASCII names

Output opened expecting CSV

Frequently asked questions

Does Fuzzy Merger do exact matching too?

How is the similarity score calculated?

What threshold should I start with?

What happens to File A rows with no match?

Can one File A row match several File B rows?

How are File B's columns named in the output?

What output format do I get?

Are my files uploaded anywhere?

Is there a row or file-size limit?

Does it match across multiple columns at once?

Why did two obviously-similar names not match?

Can I see which rows ended up unmatched?

Privacy first

Related guides

Join Two Excel Datasets Using Approximate String Matching When IDs Don't Match

How to join two excel datasets using approximate string matching when ids don't match

What each control does

How a percentage score is computed

Tier limits that apply

Cookbook

Joining customers across two systems

Reading the __match_score column

Lowering the threshold to catch a near-miss

Handling a right column name collision

Pre-cleaning before the merge

Edge cases and what actually happens

Free / Pro / Pro + Media account tries to run it

Only one file is uploaded

Left or right join column left blank

Join column name is misspelled or absent

File A row matches nothing above the threshold

Two File B rows tie for the best score

Numeric or ID join keys

Very large File A × File B

Accented or non-ASCII names

Output opened expecting CSV

Frequently asked questions

Does Fuzzy Merger do exact matching too?

How is the similarity score calculated?

What threshold should I start with?

What happens to File A rows with no match?

Can one File A row match several File B rows?

How are File B's columns named in the output?

What output format do I get?

Are my files uploaded anywhere?

Is there a row or file-size limit?

Does it match across multiple columns at once?

Why did two obviously-similar names not match?

Can I see which rows ended up unmatched?

Privacy first

Related guides