How to join two excel datasets using approximate string matching when ids don't match
- Step 1Drop File A (the left file) — File A is the primary dataset you want to enrich. Every row in File A becomes exactly one row in the output, matched or not.
- Step 2Drop File B (the right file) — File B is the reference dataset. Columns from the best-matching File B row are appended to File A rows, each prefixed
right_. - Step 3Enter Left join col and Right join col — Type the exact header text of the match column in each file — e.g. Left join col
company_name, Right join colname. Both are required; matching is case-insensitive and trimmed. - Step 4Set Threshold % — Type a number from 50 to 100 in the Threshold % box. 80 is the default and works well for company names; raise it to cut false positives, lower it to catch more variation.
- Step 5Run the merge — Fuzzy Merger compares every File A row against every File B row, keeps each row's single best match, and reports 'X matched · Y unmatched (threshold: Z%)'.
- Step 6Download and review — Download
fuzzy-merged.xlsx, then sort by__match_scoreascending to hand-check the lowest-confidence matches before trusting the join.
What each control does
The Fuzzy Merger UI exposes exactly three inputs plus two file drop zones. There are no presets, no algorithm picker, and no candidate-list chooser — these are the real controls.
| Control | Type | Default | What it does |
|---|---|---|---|
| File A | File drop | — | Left dataset; one output row per File A row |
| File B | File drop | — | Right dataset; best-match columns appended as right_* |
| Left join col | Text | (empty) | Exact header name of the match column in File A — required |
| Right join col | Text | (empty) | Exact header name of the match column in File B — required |
| Threshold % | Number | 80 | Minimum similarity score to accept a match; clamped to 50–100 |
How a percentage score is computed
Both values are lowercased and trimmed first, then scored by Levenshtein distance over the longer string's length. These are computed from the real formula.
| File A value | File B value | Score | Matched at 80%? |
|---|---|---|---|
| Google LLC | Google Inc. | 55% | No (below 80) |
| Acme Corporation | Acme Corp | 56% | No |
| Microsoft Corp | microsoft corp. | 93% | Yes |
| Jonathan Smith | Jonathon Smith | 93% | Yes |
| Café Roma | Cafe Roma | 89% | Yes |
Tier limits that apply
Fuzzy Merger is gated to Developer tier (it calls isDeveloper). Free, Pro, and Pro + Media accounts cannot run it, so the limits that matter in practice are the Developer ones.
| Tier | Can run Fuzzy Merger? | Max file size (excel) | Max rows |
|---|---|---|---|
| Free | No — requires Developer | 5 MB | 10,000 |
| Pro | No — requires Developer | 50 MB | 100,000 |
| Pro + Media | No — requires Developer | 200 MB | 500,000 |
| Developer | Yes | 500 MB | Unlimited |
Cookbook
Worked merges showing the exact output columns. The merge is best-match-only: each File A row is paired with its single highest-scoring File B row.
Joining customers across two systems
File A (CRM) and File B (billing) describe the same companies with different name spellings. Fuzzy Merger appends billing columns to each CRM row, prefixed right_.
File A (customers.xlsx) File B (billing.xlsx) company_name region name mrr Google LLC EMEA Google Inc. 42000 GitHub AMER GitHub, Inc. 18000 Left join col: company_name Right join col: name Threshold: 80 Output (fuzzy-merged.xlsx): company_name region __match_score __matched_value right_mrr Google LLC EMEA 55 (no match) GitHub AMER 86 GitHub, Inc. 18000
Reading the __match_score column
Every output row carries a numeric score. Unmatched rows still show the best score that was found, not zero — useful for deciding whether to lower the threshold.
Sort the output by __match_score ascending: company_name __match_score __matched_value Unknown Vendor 31 (no match) Google LLC 55 (no match) GitHub 86 GitHub, Inc. Microsoft Corp 93 microsoft corp. The 55% row was close but below 80 — re-run at 50 to see if it joins.
Lowering the threshold to catch a near-miss
'Google LLC' vs 'Google Inc.' scores 55%. At the default 80% it stays unmatched; at 50% it joins. Lower deliberately and re-audit.
Threshold: 80 -> Google LLC __matched_value: (no match) Threshold: 50 -> Google LLC __matched_value: Google Inc. right_mrr: 42000 Results panel: '2 matched · 0 unmatched (threshold: 50%)'
Handling a right column name collision
File B's columns are appended prefixed right_, and the right join column itself is dropped from the merge. So a region in both files stays separate.
File A headers: company_name, region File B headers: name, region, mrr Right join col: name Output headers: company_name | region | __match_score | __matched_value | right_region | right_mrr 'name' is not duplicated; File A's region and File B's region coexist.
Pre-cleaning before the merge
Matching is case-insensitive and trimmed already, but stray punctuation and mojibake still cost similarity points. Normalize first for tighter scores.
Before: "ACME Corp." vs "acme corp" -> score 75 (extra spaces hurt)
After trimming internal whitespace via the cleaning step:
"ACME Corp." vs "acme corp" -> score 83 -> matches at 80
Use the whitespace and case helpers, then merge.Edge cases and what actually happens
Free / Pro / Pro + Media account tries to run it
Requires DeveloperFuzzy Merger is gated by isDeveloper(tier). Lower tiers get the error 'Fuzzy Merger requires Developer tier.' before any processing. There is no way to run this specific tool on Free, Pro, or Pro + Media.
Only one file is uploaded
Two files requiredFuzzy Merger needs both File A and File B. With a single file the run fails with 'Two files are required for fuzzy merging.' Drop a reference file into the File B zone.
Left or right join column left blank
RequiredBoth Left join col and Right join col must be filled in. If either is empty the run fails with 'Left and right join columns are required.' Type the exact header text from each file.
Join column name is misspelled or absent
Scores as emptyIf the named column doesn't exist in a file, every cell reads as empty string. Empty-vs-empty scores 100 and empty-vs-text scores 0, so you get either all matches or all '(no match)'. Double-check the header spelling against the file.
File A row matches nothing above the threshold
PreservedUnmatched File A rows are always kept in the output with __matched_value: "(no match)" and __match_score set to the best score that was found — never dropped, never zeroed if a partial match existed.
Two File B rows tie for the best score
By designThe scan keeps a strictly-greater comparison (score > bestScore), so on an exact tie the first File B row encountered wins. Only one match is ever returned per File A row.
Numeric or ID join keys
Use exact join insteadLevenshtein on '1001' vs '1010' gives a misleadingly high score. For clean IDs use the exact-equality Sheet Joiner; reserve Fuzzy Merger for text keys with formatting variation.
Very large File A × File B
Slow but supportedThe algorithm is O(n × m): every File A row is compared to every File B row. 5,000 × 5,000 is 25 million comparisons. Developer tier removes the row cap, but expect noticeable in-browser time on six-figure inputs — pre-filter where you can.
Accented or non-ASCII names
Counts as edits'Café' vs 'Cafe' differ by one character, costing similarity points. Strip accents first with the non-ASCII remover if you want é and e to score as identical.
Output opened expecting CSV
XLSX onlyOutput is always a binary .xlsx workbook (sheet 'FuzzyMerged'), not CSV or JSON. It downloads in the browser; there is no server-side text endpoint for this tool.
Frequently asked questions
Does Fuzzy Merger do exact matching too?
Yes, implicitly — identical (after lowercasing and trimming) values score 100, which clears any threshold. But if you only need exact equality, the Sheet Joiner is faster and produces zero false positives.
How is the similarity score calculated?
Both join values are lowercased and trimmed, Levenshtein edit distance is computed, then converted with round((1 - distance / max(lenA, lenB)) * 100). Identical strings give 100; completely different strings approach 0.
What threshold should I start with?
80% is the default and a good starting point for company names. Raise it toward 90–95% to reduce false positives, or drop toward 50–60% to catch more variation. Always re-audit the __match_score column after changing it.
What happens to File A rows with no match?
They are kept in the output with __matched_value: "(no match)". The __match_score shows the best score found even if it was below the threshold, so you can judge how close it came.
Can one File A row match several File B rows?
No. For each File A row only the single highest-scoring File B row is returned. There is no many-to-many or candidate-list output — it is strictly best-match.
How are File B's columns named in the output?
Every File B column except the right join column is appended with a right_ prefix — mrr becomes right_mrr. This guarantees File B columns never overwrite File A columns of the same name.
What output format do I get?
A binary .xlsx file named fuzzy-merged.xlsx with a sheet called 'FuzzyMerged'. It downloads directly in your browser; there is no CSV or JSON output option.
Are my files uploaded anywhere?
No. Parsing and matching run entirely in your browser using SheetJS. Neither File A nor File B leaves your machine.
Is there a row or file-size limit?
The tool requires Developer tier, which allows files up to 500 MB and an unlimited row count for the excel family. Practical performance is bounded by the O(n × m) comparison cost, not a hard row cap.
Does it match across multiple columns at once?
No — exactly one Left join col and one Right join col. To match on a combination, concatenate the columns into a single key first (for example with the regex extractor or a helper column), then merge on that.
Why did two obviously-similar names not match?
Levenshtein penalizes length differences heavily. 'Acme Corporation' vs 'Acme Corp' only scores ~56% because 'oration' is seven extra edits. Trim suffixes like Inc/LLC/Corp beforehand, or lower the threshold.
Can I see which rows ended up unmatched?
Yes. After running, the results panel reports 'X matched · Y unmatched'. In the file, filter __matched_value for '(no match)' to isolate every unjoined File A row.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.