How to detect duplicate products in a product catalogue CSV
- Step 1: Export the product catalogue CSV. Download from your PIM, ERP, supplier feed, or Shopify (Products → Export). The delimiter is auto-detected; standard comma-delimited feeds work as-is.
- Step 2: Drop the file onto the tool. Parsing runs in your browser via PapaParse, so catalogue data never reaches a server. Free runs handle up to 2 MB / 500 rows; Pro handles 100 MB / 100,000 rows.
- Step 3: Select the SKU column. In Find duplicates in column, choose SKU (or Variant SKU for Shopify). One key column per run, so do SKU first, barcode second.
- Step 4: Choose case sensitivity. Leave Case-sensitive matching off (default) unless your SKUs treat casing as distinct. Click Find duplicates.
- Step 5: Compare the flagged rows. Read the summary cards, then Download Marked CSV and filter _is_duplicate = YES. Compare the conflicting rows to identify which holds the correct variant, price, and stock data.
- Step 6: Resolve and re-export the clean feed. Remove the incorrect duplicates (or fix the SKUs), then run a second pass on Barcode/EAN to catch barcode-level conflicts before importing the clean catalogue.
What the catalogue duplicate finder does
The full option set. One key column per pass, one checkbox, flag-only output. No fuzzy match, no multi-column key, no auto-removal.
| Control | Behaviour | Default |
|---|---|---|
| Find duplicates in column | Single key column (SKU, then Barcode on a second run); values grouped to find repeats | First column |
| Case-sensitive matching | Off lowercases before comparing (ABC-1 = abc-1); on requires identical casing | Off |
| _is_duplicate column | YES if the SKU/barcode appears 2+ times, NO if once; first occurrence is YES too | Always added |
| Removal | None — duplicates are flagged for review. Use csv-deduplicator to drop surplus rows | Zero removed |
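The behaviour in the table can be sketched in a few lines of Python. This is an illustration of the described rules only, not the tool's own code (which runs in the browser via PapaParse); the function name `mark_duplicates` and its parameters are hypothetical, and it assumes every row has the same number of cells as the header.

```python
import csv
import io
from collections import Counter


def mark_duplicates(csv_text, key_column, case_sensitive=False):
    """Return the CSV with a trailing _is_duplicate column (YES/NO)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["_is_duplicate"]

    def norm(row):
        value = row.get(key_column) or ""
        # Assumption: with case sensitivity off, keys are lowercased first.
        return value if case_sensitive else value.lower()

    # Count how often each (normalised) key value occurs.
    counts = Counter(norm(row) for row in rows)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Every member of a repeated group is YES, including the first.
        row["_is_duplicate"] = "YES" if counts[norm(row)] > 1 else "NO"
        writer.writerow(row)  # no row is removed
    return out.getvalue()


feed = (
    "SKU,Title,Price\n"
    "ABC-01,Blue Mug,9.99\n"
    "DEF-02,Red Mug,9.99\n"
    "abc-01,Blue Mug,11.99\n"
)
print(mark_duplicates(feed, "SKU"))                       # ABC-01 and abc-01 both YES
print(mark_duplicates(feed, "SKU", case_sensitive=True))  # every row NO
```

All rows are written back in their original order, which mirrors the flag-only design: the output is for review, not a deduplicated feed.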
Identifier passes for a catalogue
Run separate passes per identifier since the key is one column. Normalise formats first so genuine duplicates are not split by formatting.
| Pass | Key column | Catches | Pre-step |
|---|---|---|---|
| 1 | SKU / Variant SKU | Same SKU listed twice — the classic PIM overwrite conflict | Trim whitespace; confirm casing policy |
| 2 | Barcode / EAN / UPC | Same barcode under different SKUs (true product collision) | Fix scientific notation (5.06E+12) back to full digits first |
| Optional | Combined SKU\|Barcode | Rows identical on both identifiers | Build with csv-column-merger |
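For the optional composite pass, the combined column could be built along these lines. This is a rough sketch rather than how csv-column-merger works internally; `add_composite_key`, the `SKU|Barcode` column name, and the choice to trim each part before joining are all assumptions made for illustration.

```python
import csv
import io


def add_composite_key(csv_text, columns, new_column="SKU|Barcode", sep="|"):
    """Append a column that joins the given columns with a separator."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumption: trim each part so stray spaces do not split a match.
        parts = [(row.get(col) or "").strip() for col in columns]
        row[new_column] = sep.join(parts)
        writer.writerow(row)
    return out.getvalue()


feed = (
    "SKU,Barcode\n"
    "A1,5060000000123\n"
    "A1,5060000000123\n"
    "A1,5060000000456\n"
)
merged = add_composite_key(feed, ["SKU", "Barcode"])
print(merged)
```

Keying the duplicate finder on the new column would then flag only the first two rows, since the third shares the SKU but not the barcode.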
Cookbook
Before/after rows from real product feeds. SKUs and barcodes anonymised; the _is_duplicate column is exactly what the tool appends.
Duplicate SKU — the silent overwrite
Example: Two rows share the SKU ABC-01 with different prices. A PIM would upsert the second over the first; the finder flags both so you choose the correct price before import.
Input (catalogue.csv):
    SKU,Title,Price
    ABC-01,Blue Mug,9.99
    DEF-02,Red Mug,9.99
    ABC-01,Blue Mug,11.99
Key column: SKU · Case-sensitive: off
Output (catalogue.duplicates-marked.csv):
    SKU,Title,Price,_is_duplicate
    ABC-01,Blue Mug,9.99,YES
    DEF-02,Red Mug,9.99,NO
    ABC-01,Blue Mug,11.99,YES
Shopify variant rows look like duplicates
Example: Shopify exports repeat the Handle across each variant row, leaving most product-level cells blank on continuation rows. Key on Variant SKU (which is unique per variant) rather than Handle to avoid flagging legitimate variants as duplicates.
Input (Shopify export, abridged):
    Handle,Title,Variant SKU
    blue-mug,Blue Mug,MUG-BLU-S
    blue-mug,,MUG-BLU-L
Key column: Variant SKU · Case-sensitive: off
Output (variants kept distinct):
    Handle,Title,Variant SKU,_is_duplicate
    blue-mug,Blue Mug,MUG-BLU-S,NO
    blue-mug,,MUG-BLU-L,NO
Barcode mangled into scientific notation
Example: Excel coerced a 13-digit EAN to 5.06E+12, so the true barcode and the mangled one are different text and won't group. Fix the format before the barcode pass.
Input:
    SKU,Barcode
    A1,5060000000123
    A2,5.06E+12
Key column: Barcode · Case-sensitive: off
Output (not flagged, because the text differs):
    SKU,Barcode,_is_duplicate
    A1,5060000000123,NO
    A2,5.06E+12,NO
Fix: re-export with the barcode column stored as text, or repair with csv-find-replace, then re-run the barcode pass.
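Before the barcode pass it can help to list which cells were mangled. The sketch below is one possible pre-check, not a feature of the tool; `find_mangled_barcodes` and the regular expression are hypothetical. Note that a value like 5.06E+12 expands to 5060000000000, so the trailing digits of the original barcode are already lost and must come from a fresh export.

```python
import re

# Matches text such as 5.06E+12 or 5E12 (digits, optional fraction, exponent).
SCI_NOTATION = re.compile(r"^\s*\d+(?:\.\d+)?[eE]\+?\d+\s*$")


def find_mangled_barcodes(rows, column="Barcode"):
    """Return (row_number, value) pairs that look like scientific notation."""
    hits = []
    for number, row in enumerate(rows, start=1):
        value = row.get(column) or ""
        if SCI_NOTATION.match(value):
            hits.append((number, value))
    return hits


rows = [
    {"SKU": "A1", "Barcode": "5060000000123"},
    {"SKU": "A2", "Barcode": "5.06E+12"},
]
print(find_mangled_barcodes(rows))  # [(2, '5.06E+12')]
```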
Case-sensitive run for case-meaningful SKUs
Example: A supplier uses casing to distinguish lines, so ABC-01 and abc-01 are different products. Ticking Case-sensitive matching keeps them as two unique values.
Input:
    SKU,Supplier
    ABC-01,LineA
    abc-01,LineB
Key column: SKU · Case-sensitive: ON
Output (no duplicates):
    SKU,Supplier,_is_duplicate
    ABC-01,LineA,NO
    abc-01,LineB,NO
Reading the conflict summary
Example: For a 2,000-row feed (Pro tier) with 18 SKUs duplicated, the summary shows the overwrite exposure before you push the feed.
Summary after Find duplicates (SKU pass):
    Duplicate groups : 18     (SKUs that repeat)
    Extra copies     : 22     (surplus rows = sum of count-1)
    Unique values    : 1,960  (SKUs appearing exactly once)
Meaning: 1,978 distinct SKUs across 2,000 rows; 18 repeat, some more than twice, so 22 rows would overwrite on import.
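The three summary figures follow directly from per-SKU counts. The sketch below recomputes them on a synthetic feed; the function name `summarise` is hypothetical, and the split of the 18 groups into 14 SKUs appearing twice and 4 appearing three times is an assumption, chosen only because it is one combination that yields 22 extra copies.

```python
from collections import Counter


def summarise(keys):
    """Compute the three summary figures from a list of key values."""
    counts = Counter(keys)
    repeated = [n for n in counts.values() if n > 1]
    return {
        "duplicate_groups": len(repeated),            # values that repeat
        "extra_copies": sum(n - 1 for n in repeated),  # surplus rows
        "unique_values": sum(1 for n in counts.values() if n == 1),
    }


# Synthetic feed: 1,960 singles + 14 SKUs x2 + 4 SKUs x3 = 2,000 rows.
keys = [f"U{i}" for i in range(1960)]
keys += [f"D{i}" for i in range(14)] * 2
keys += [f"T{i}" for i in range(4)] * 3

print(len(keys))        # 2000
print(summarise(keys))  # groups 18, extra copies 22, unique 1960
```

The row total checks out as unique values plus duplicate groups plus extra copies: 1,960 + 18 + 22 = 2,000.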
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
You wanted a deduplicated feed, not flags
By design: This tool flags only. It appends _is_duplicate so you can compare variant data before deciding. To produce a clean feed with one row per SKU, use csv-deduplicator, which keeps the first of each group.
Barcodes shown as scientific notation
Not matched: Excel coerces long barcodes to 5.06E+12; that text won't match the full-digit form. Re-export the barcode column as text, or repair with csv-find-replace, before the barcode pass.
Shopify variant rows flagged as duplicates
Avoidable: Shopify repeats the Handle across variant rows. If you key on Handle, every multi-variant product looks duplicated. Key on Variant SKU (unique per variant) instead to flag only genuine duplicates.
Whitespace around the SKU
Not matched: Matching is whole-cell exact, so "ABC-01" and "ABC-01 " (with a trailing space) are different values. Run csv-whitespace-trimmer before checking so leading/trailing spaces don't hide real SKU duplicates.
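A small Python illustration of why untrimmed cells hide duplicates under whole-cell exact matching. The sample values are made up, and `str.strip()` stands in here for whatever csv-whitespace-trimmer does.

```python
from collections import Counter

# Two real duplicates, each hidden by a stray leading or trailing space.
skus = ["ABC-01", "ABC-01 ", " DEF-02", "DEF-02"]

raw_counts = Counter(skus)
trimmed_counts = Counter(s.strip() for s in skus)

raw_dupes = [k for k, n in raw_counts.items() if n > 1]
trimmed_dupes = [k for k, n in trimmed_counts.items() if n > 1]

print(raw_dupes)      # [] because every cell is distinct text
print(trimmed_dupes)  # ['ABC-01', 'DEF-02']
```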
First occurrence marked YES
Expected: Every member of a duplicate group is flagged YES, including the first, so you can compare all conflicting rows. To keep one and drop the rest automatically, use csv-deduplicator.
Need to check SKU and barcode together
Single key only: Only one key column per pass. Run SKU, then run Barcode, and reconcile, or build a combined SKU|Barcode column with csv-column-merger for an exact composite match in one pass.
Blank SKU cells
Grouped together: Rows with an empty SKU all share one empty key and are flagged YES together (shown as (empty) in the list). Fill or filter blank SKUs before pushing the feed.
Catalogue over the free 500-row / 2 MB cap
Upgrade required: Free runs cap at 2 MB and 500 rows; larger feeds are blocked with a Pro prompt. Pro lifts it to 100 MB / 100,000 rows. Splitting with csv-row-splitter handles a one-off but won't detect duplicates across chunks.
Frequently asked questions
Can I check for duplicates on both SKU and barcode at once?
No — the key is a single column per run. Run SKU first, then run Barcode/EAN, and reconcile the two marked files. For an exact composite check, combine the columns with csv-column-merger and key on the combined value.
Does it remove the duplicate products or just flag them?
It flags them. An _is_duplicate column (YES/NO) is appended and all rows are kept so you can compare variant data and decide. To physically remove duplicates and keep one per SKU, use csv-deduplicator.
What if I want to keep the most recent duplicate row?
Sort by a date column descending with csv-sorter before running, so the first occurrence of each SKU is the newest. The finder still marks all copies YES; you then keep whichever row your policy prefers, or feed the sorted file to csv-deduplicator which keeps the first.
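The sort-then-keep-first approach can be sketched as follows. This is a rough illustration under stated assumptions: the `Updated` column with ISO dates (YYYY-MM-DD) is hypothetical, the function `keep_newest` is not part of any of the tools, and keys are compared case-insensitively to match the finder's default.

```python
from datetime import date


def keep_newest(rows, key="SKU", date_column="Updated"):
    """Sort newest first, then keep the first row seen for each key."""
    ordered = sorted(
        rows,
        key=lambda r: date.fromisoformat(r[date_column]),
        reverse=True,
    )
    seen = set()
    kept = []
    for row in ordered:
        k = row[key].lower()  # assumption: case-insensitive keys
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


rows = [
    {"SKU": "ABC-01", "Price": "9.99", "Updated": "2024-01-05"},
    {"SKU": "DEF-02", "Price": "9.99", "Updated": "2024-02-01"},
    {"SKU": "ABC-01", "Price": "11.99", "Updated": "2024-03-10"},
]
for row in keep_newest(rows):
    print(row["SKU"], row["Price"], row["Updated"])
```

Here the ABC-01 row priced 11.99 survives because it carries the latest date; the older 9.99 row is dropped.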
Does this work for Shopify product exports with variants?
Yes — select the Variant SKU column, which is unique per variant, to flag true variant-level duplicates. Don't key on Handle, because Shopify repeats it across every variant row and that would flag legitimate variants as duplicates.
Why didn't it flag two products with the same barcode?
Most often the barcode was mangled into scientific notation (5.06E+12) by Excel, or there's a stray space, so the two values are different text. Re-export the barcode column as text, fix with csv-find-replace, or trim with csv-whitespace-trimmer, then re-run.
Is product and supplier data uploaded?
No. Parsing and detection run entirely in your browser. SKUs, prices, supplier names, and stock counts never reach a server. Only an anonymous usage counter is recorded when signed in, and it can be disabled in settings.
Does case-sensitive matching matter for SKUs?
Sometimes. Many catalogues treat SKUs as case-insensitive, where the default catches ABC-01 vs abc-01. Turn case-sensitive matching on only if your supplier uses casing to distinguish different products.
How large a catalogue can I check?
Free runs handle up to 2 MB and 500 rows; larger feeds are blocked with a Pro prompt. Pro handles 100 MB and 100,000 rows. Beyond that, split with csv-row-splitter, accepting that duplicates spanning chunks won't be caught.
What does the output file look like?
Your original feed with a trailing _is_duplicate column, saved as <yourfile>.duplicates-marked.csv. Filter that column to YES in your spreadsheet to review the conflicting product rows.
How do I combine supplier feeds before checking?
Append them with csv-merger (they should share a header schema) into one file, then run the duplicate finder so SKUs that collide across suppliers are caught in a single pass.
What do the summary numbers tell me about my feed?
Duplicate groups = how many SKUs (or barcodes) repeat. Extra copies = surplus rows that would overwrite on import. Unique values = SKUs appearing exactly once. Together they size your overwrite risk before you push the feed.
Should I clean the feed before or after the PIM import?
Before. PIMs and marketplaces upsert on the SKU key and overwrite silently, so a duplicate SKU corrupts data without an error. Flagging and resolving in the CSV first means what you import matches what you intended.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.