Find Duplicate Products in a Product Catalogue CSV

How to detect duplicate products in a product catalogue CSV

Step 1
Export the product catalogue CSV — Download from your PIM, ERP, supplier feed, or Shopify (Products → Export). The delimiter is auto-detected; standard comma-delimited feeds work as-is.
Step 2
Drop the file onto the tool — Parsing runs in your browser via PapaParse — catalogue data never reaches a server. Free runs handle up to 2 MB / 500 rows; Pro handles 100 MB / 100,000 rows.
Step 3
Select the SKU column — In Find duplicates in column, choose SKU (or Variant SKU for Shopify). One key column per run, so do SKU first, barcode second.
Step 4
Choose case sensitivity — Leave Case-sensitive matching off (default) unless your SKUs treat casing as distinct. Click Find duplicates.
Step 5
Compare the flagged rows — Read the summary cards, then Download Marked CSV and filter _is_duplicate = YES. Compare the conflicting rows to identify which holds the correct variant, price, and stock data.
Step 6
Resolve and re-export the clean feed — Remove the incorrect duplicates (or fix the SKUs), then run a second pass on Barcode/EAN to catch barcode-level conflicts before importing the clean catalogue.
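The comparison in Step 5 can also be scripted once the marked file is downloaded. The following is a minimal Python sketch and not part of the tool: `conflicting_rows` is a hypothetical helper, and it assumes the marked file has a header row, the `_is_duplicate` column described in this guide, and a case-insensitive run (keys are lowercased for grouping).

```python
import csv
import io
from collections import defaultdict


def conflicting_rows(marked_csv_text, key_column):
    """Group the rows flagged YES by their key so conflicts sit side by side.

    Hypothetical helper: assumes a header row and an _is_duplicate column.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(marked_csv_text)):
        if row["_is_duplicate"] == "YES":
            # Lowercase the key to mirror a case-insensitive run (assumption).
            groups[row[key_column].lower()].append(row)
    return dict(groups)


marked = (
    "SKU,Title,Price,_is_duplicate\n"
    "ABC-01,Blue Mug,9.99,YES\n"
    "DEF-02,Red Mug,9.99,NO\n"
    "ABC-01,Blue Mug,11.99,YES\n"
)
groups = conflicting_rows(marked, "SKU")
for sku, rows in groups.items():
    print(sku, [row["Price"] for row in rows])
```

Each printed line lists one repeated SKU with the competing prices, which is usually enough to decide which row to keep.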

What the catalogue duplicate finder does

The full option set. One key column per pass, one checkbox, flag-only output. No fuzzy match, no multi-column key, no auto-removal.

Control	Behaviour	Default
Find duplicates in column	Single key column (`SKU`, then `Barcode` on a second run); values grouped to find repeats	First column
Case-sensitive matching	Off lowercases before comparing (`ABC-1` = `abc-1`); on requires identical casing	Off
`_is_duplicate` column	`YES` if the SKU/barcode appears 2+ times, `NO` if once; first occurrence is `YES` too	Always added
Removal	None — duplicates are flagged for review. Use csv-deduplicator to drop surplus rows	Zero removed
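
The behaviour in the table can be approximated in a few lines of Python. This is a sketch of the described behaviour under stated assumptions, not the tool's actual implementation (which runs in the browser with PapaParse); `mark_duplicates` and its parameters are hypothetical names.

```python
import csv
import io
from collections import Counter


def mark_duplicates(csv_text, key_column, case_sensitive=False):
    """Return the CSV text with a trailing _is_duplicate column (YES/NO).

    Hypothetical sketch: one key column, whole-cell comparison, no trimming.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["_is_duplicate"]

    def key_of(row):
        value = row[key_column] or ""
        # With case sensitivity off, values are lowercased before comparing.
        return value if case_sensitive else value.lower()

    counts = Counter(key_of(row) for row in rows)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        # Every member of a repeated group is flagged, including the first.
        row["_is_duplicate"] = "YES" if counts[key_of(row)] > 1 else "NO"
        writer.writerow(row)
    return out.getvalue()


sample = (
    "SKU,Title,Price\n"
    "ABC-01,Blue Mug,9.99\n"
    "DEF-02,Red Mug,9.99\n"
    "ABC-01,Blue Mug,11.99\n"
)
marked = mark_duplicates(sample, "SKU")
print(marked)
```

No row is dropped: the output has the same rows in the same order, with only the flag column added.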

Identifier passes for a catalogue

Run separate passes per identifier since the key is one column. Normalise formats first so genuine duplicates are not split by formatting.

Pass	Key column	Catches	Pre-step
1	`SKU` / `Variant SKU`	Same SKU listed twice — the classic PIM overwrite conflict	Trim whitespace; confirm casing policy
2	`Barcode` / `EAN` / `UPC`	Same barcode under different SKUs (true product collision)	Fix scientific notation (`5.06E+12`) back to full digits first
Optional	Combined `SKU|Barcode`	Rows identical on both identifiers	Build with csv-column-merger
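
For the optional composite pass, the combined column can be thought of as a simple join of the two identifiers. The sketch below is illustrative only and does not describe how csv-column-merger works internally; `add_composite_key`, the `|` separator, and the new column name are assumptions.

```python
import csv
import io


def add_composite_key(csv_text, columns, new_column="SKU|Barcode", separator="|"):
    """Append a column that joins the given columns with a separator.

    Hypothetical helper for building a single key out of two identifiers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + [new_column]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        # Missing cells are treated as empty strings so the join never fails.
        row[new_column] = separator.join((row[c] or "") for c in columns)
        writer.writerow(row)
    return out.getvalue()


feed = (
    "SKU,Barcode\n"
    "A1,5060000000123\n"
    "A1,5060000000123\n"
    "A2,5060000000123\n"
)
combined = add_composite_key(feed, ["SKU", "Barcode"])
print(combined)
```

Keying a pass on the new column flags only rows that match on both identifiers; the third row above shares a barcode but not a SKU, so it would need the separate barcode pass to surface.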

Cookbook

Before/after rows from real product feeds. SKUs and barcodes anonymised; the _is_duplicate column is exactly what the tool appends.

Duplicate SKU — the silent overwrite

Example

Two rows share the SKU ABC-01 with different prices. A PIM would upsert the second over the first; the finder flags both so you choose the correct price before import.

Input (catalogue.csv):
SKU,Title,Price
ABC-01,Blue Mug,9.99
DEF-02,Red Mug,9.99
ABC-01,Blue Mug,11.99

Key column: SKU  ·  Case-sensitive: off

Output (catalogue.duplicates-marked.csv):
SKU,Title,Price,_is_duplicate
ABC-01,Blue Mug,9.99,YES
DEF-02,Red Mug,9.99,NO
ABC-01,Blue Mug,11.99,YES

Shopify variant rows look like duplicates

Example

Shopify exports repeat the Handle across each variant row, leaving most product-level cells blank on continuation rows. Key on Variant SKU (which is unique per variant) rather than Handle to avoid flagging legitimate variants as duplicates.

Input (Shopify export, abridged):
Handle,Title,Variant SKU
blue-mug,Blue Mug,MUG-BLU-S
blue-mug,,MUG-BLU-L

Key column: Variant SKU  ·  Case-sensitive: off

Output (variants kept distinct):
Handle,Title,Variant SKU,_is_duplicate
blue-mug,Blue Mug,MUG-BLU-S,NO
blue-mug,,MUG-BLU-L,NO
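
To see why the choice of key matters here, the sketch below counts repeats on each column of the same abridged export. It is a hypothetical illustration (`repeated_values` is not a tool function) and assumes case-insensitive comparison.

```python
import csv
import io
from collections import Counter


def repeated_values(csv_text, column):
    """Return the sorted values that occur more than once in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Lowercase to mirror the default case-insensitive matching (assumption).
    counts = Counter((row[column] or "").lower() for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


export = (
    "Handle,Title,Variant SKU\n"
    "blue-mug,Blue Mug,MUG-BLU-S\n"
    "blue-mug,,MUG-BLU-L\n"
)
by_handle = repeated_values(export, "Handle")
by_sku = repeated_values(export, "Variant SKU")
print(by_handle)  # the handle repeats across variant rows
print(by_sku)     # no variant SKU repeats
```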

Barcode mangled into scientific notation

Example

Excel coerced a 13-digit EAN to 5.06E+12, so the true barcode and the mangled one are different text and won't group. Fix the format before the barcode pass.

Input:
SKU,Barcode
A1,5060000000123
A2,5.06E+12

Key column: Barcode  ·  Case-sensitive: off

Output (not flagged — text differs):
SKU,Barcode,_is_duplicate
A1,5060000000123,NO
A2,5.06E+12,NO

Fix: re-export with the barcode column stored as text, or
repair with csv-find-replace, then re-run the barcode pass.
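
Before the barcode pass it can help to locate the mangled cells. The sketch below only detects them: a value such as 5.06E+12 no longer contains the original digits, so they cannot be recovered from the CSV text and have to come from a fresh export. `find_mangled_barcodes` is a hypothetical helper, and the line numbers assume one record per line with the header on line 1.

```python
import csv
import io
import re

# Matches text such as 5.06E+12 or 5E12: digits, optional fraction, exponent.
SCI_NOTATION = re.compile(r"[+-]?\d+(?:\.\d+)?[eE][+-]?\d+")


def find_mangled_barcodes(csv_text, barcode_column):
    """Return (line_number, value) for cells that look like scientific notation."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # start=2 because line 1 is the header (assumes no multi-line cells).
    for line_number, row in enumerate(reader, start=2):
        value = (row[barcode_column] or "").strip()
        if SCI_NOTATION.fullmatch(value):
            hits.append((line_number, value))
    return hits


feed = "SKU,Barcode\nA1,5060000000123\nA2,5.06E+12\n"
mangled = find_mangled_barcodes(feed, "Barcode")
print(mangled)
```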

Case-sensitive run for case-meaningful SKUs

Example

A supplier uses casing to distinguish lines, so ABC-01 and abc-01 are different products. Ticking Case-sensitive matching keeps them as two unique values.

Input:
SKU,Supplier
ABC-01,LineA
abc-01,LineB

Key column: SKU  ·  Case-sensitive: ON

Output (no duplicates):
SKU,Supplier,_is_duplicate
ABC-01,LineA,NO
abc-01,LineB,NO

Reading the conflict summary

Example

For a 2,000-row feed (Pro tier) with 18 SKUs duplicated, the summary shows the overwrite exposure before you push the feed.

Summary after Find duplicates (SKU pass):
  Duplicate groups : 18    (SKUs that repeat)
  Extra copies     : 22    (surplus rows = sum of count-1)
  Unique values    : 1,960 (SKUs appearing exactly once)

Meaning: 1,978 distinct SKUs across 2,000 rows; 18 repeat,
some more than twice — 22 rows would overwrite on import.
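
The three figures are tied together by simple arithmetic: rows = unique values + duplicate groups + extra copies, so 1,960 + 18 + 22 = 2,000. The sketch below recomputes them from a list of keys. It is an illustration with synthetic keys, not the tool's code; `summarise` and the 14/4 split between groups of two and groups of three are assumptions chosen to reproduce the totals above.

```python
from collections import Counter


def summarise(keys):
    """Compute the summary figures from a list of key values."""
    counts = Counter(keys)
    return {
        "duplicate_groups": sum(1 for n in counts.values() if n > 1),
        "extra_copies": sum(n - 1 for n in counts.values() if n > 1),
        "unique_values": sum(1 for n in counts.values() if n == 1),
        "distinct": len(counts),
        "rows": sum(counts.values()),
    }


# Synthetic feed: 1,960 one-off SKUs, 14 SKUs listed twice, 4 listed three times.
keys = [f"U-{i}" for i in range(1960)]
keys += [f"D2-{i}" for i in range(14) for _ in range(2)]
keys += [f"D3-{i}" for i in range(4) for _ in range(3)]

summary = summarise(keys)
print(summary)
```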

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, then apply the fix that row describes.

You wanted a deduplicated feed, not flags

By design

This tool flags only — it appends _is_duplicate so you compare variant data before deciding. To produce a clean feed with one row per SKU, use csv-deduplicator, which keeps the first of each group.

Barcodes shown as scientific notation

Not matched

Excel coerces long barcodes to 5.06E+12; that text won't match the full-digit form. Re-export the barcode column as text, or repair with csv-find-replace, before the barcode pass.

Shopify variant rows flagged as duplicates

Avoidable

Shopify repeats the Handle across variant rows. If you key on Handle, every multi-variant product looks duplicated. Key on Variant SKU (unique per variant) instead to flag only genuine duplicates.

Whitespace around the SKU

Not matched

Matching is whole-cell exact, so `ABC-01` and `ABC-01 ` (with a trailing space) are different values. Run csv-whitespace-trimmer before checking so leading/trailing spaces don't hide real SKU duplicates.
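
The effect of trimming can be sketched as follows. This is a hypothetical stand-in for the trimming step, not the code of csv-whitespace-trimmer; `trim_cells` strips leading and trailing whitespace from every cell, including the header.

```python
import csv
import io


def trim_cells(csv_text):
    """Strip leading and trailing whitespace from every cell."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


# The second SKU carries a trailing space, so the raw values differ.
raw = "SKU,Title\nABC-01,Blue Mug\nABC-01 ,Blue Mug\n"
trimmed = trim_cells(raw)
print(trimmed)
```

After trimming, both rows carry the same SKU text and a duplicate pass on SKU would group them.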

First occurrence marked YES

Expected

Every member of a duplicate group is flagged YES, including the first, so you can compare all conflicting rows. To keep one and drop the rest automatically, use csv-deduplicator.

Need to check SKU and barcode together

Single key only

Only one key column per pass. Run SKU, then run Barcode, and reconcile — or build a combined SKU|Barcode column with csv-column-merger for an exact composite match in one pass.

Blank SKU cells

Grouped together

Rows with an empty SKU all share one empty key and are flagged YES together (shown as (empty) in the list). Fill or filter blank SKUs before pushing the feed.

Catalogue over the free 500-row / 2 MB cap

Upgrade required

Free runs cap at 2 MB and 500 rows; larger feeds are blocked with a Pro prompt. Pro lifts it to 100 MB / 100,000 rows. Splitting with csv-row-splitter handles a one-off but won't detect duplicates across chunks.

Frequently asked questions

Can I check for duplicates on both SKU and barcode at once?

No — the key is a single column per run. Run SKU first, then run Barcode/EAN, and reconcile the two marked files. For an exact composite check, combine the columns with csv-column-merger and key on the combined value.

Does it remove the duplicate products or just flag them?

It flags them. An _is_duplicate column (YES/NO) is appended and all rows are kept so you can compare variant data and decide. To physically remove duplicates and keep one per SKU, use csv-deduplicator.

What if I want to keep the most recent duplicate row?

Sort by a date column descending with csv-sorter before running, so the first occurrence of each SKU is the newest. The finder still marks all copies YES; you then keep whichever row your policy prefers, or feed the sorted file to csv-deduplicator, which keeps the first row of each group.
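
The sort-then-keep-first approach can be sketched in Python. This is an illustration under assumptions, not the behaviour of csv-sorter or csv-deduplicator themselves: `keep_newest` is a hypothetical helper, the `Updated` column is invented for the example, and dates are assumed to be ISO 8601 (YYYY-MM-DD) so that they sort correctly as plain text.

```python
import csv
import io


def keep_newest(csv_text, key_column, date_column):
    """Sort newest first, then keep the first row seen for each key."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO 8601 dates compare correctly as strings (assumption about the feed).
    rows.sort(key=lambda row: row[date_column], reverse=True)

    seen = set()
    kept = []
    for row in rows:
        key = row[key_column].lower()  # case-insensitive, as in the default run
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


feed = (
    "SKU,Price,Updated\n"
    "ABC-01,9.99,2024-01-05\n"
    "DEF-02,9.99,2024-02-01\n"
    "ABC-01,11.99,2024-03-10\n"
)
kept = keep_newest(feed, "SKU", "Updated")
for row in kept:
    print(row["SKU"], row["Price"], row["Updated"])
```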

Does this work for Shopify product exports with variants?

Yes — select the Variant SKU column, which is unique per variant, to flag true variant-level duplicates. Don't key on Handle, because Shopify repeats it across every variant row and that would flag legitimate variants as duplicates.

Why didn't it flag two products with the same barcode?

Most often the barcode was mangled into scientific notation (5.06E+12) by Excel, or there's a stray space, so the two values are different text. Re-export the barcode column as text, fix with csv-find-replace, or trim with csv-whitespace-trimmer, then re-run.

Is product and supplier data uploaded?

No. Parsing and detection run entirely in your browser. SKUs, prices, supplier names, and stock counts never reach a server. Only an anonymous usage counter is recorded when signed in, and it can be disabled in settings.

Does case-sensitive matching matter for SKUs?

Sometimes. Many catalogues treat SKUs as case-insensitive, where the default catches ABC-01 vs abc-01. Turn case-sensitive matching on only if your supplier uses casing to distinguish different products.

How large a catalogue can I check?

Free runs handle up to 2 MB and 500 rows; larger feeds are blocked with a Pro prompt. Pro handles 100 MB and 100,000 rows. Beyond that, split with csv-row-splitter, accepting that duplicates spanning chunks won't be caught.

What does the output file look like?

Your original feed with a trailing _is_duplicate column, saved as <yourfile>.duplicates-marked.csv. Filter that column to YES in your spreadsheet to review the conflicting product rows.

How do I combine supplier feeds before checking?

Append them with csv-merger (they should share a header schema) into one file, then run the duplicate finder so SKUs that collide across suppliers are caught in a single pass.
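
As a rough picture of what appending feeds involves, the sketch below concatenates CSV texts only when their headers match exactly. It is a hypothetical illustration, not how csv-merger is implemented; `append_feeds` and its strict header check are assumptions.

```python
import csv
import io


def append_feeds(feeds):
    """Concatenate CSV texts that share one header; raise if a header differs."""
    header = None
    merged = []
    for index, text in enumerate(feeds):
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty feeds
        if header is None:
            header = rows[0]
        elif rows[0] != header:
            raise ValueError(f"feed {index} has a different header: {rows[0]}")
        merged.extend(rows[1:])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(merged)
    return out.getvalue()


supplier_a = "SKU,Price\nABC-01,9.99\n"
supplier_b = "SKU,Price\nABC-01,11.99\nDEF-02,9.99\n"
combined = append_feeds([supplier_a, supplier_b])
print(combined)
```

In the combined text ABC-01 appears once from each supplier, which is exactly the cross-supplier collision a single SKU pass would then flag.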

What do the summary numbers tell me about my feed?

Duplicate groups = how many SKUs (or barcodes) repeat. Extra copies = surplus rows that would overwrite on import. Unique values = SKUs appearing exactly once. Together they size your overwrite risk before you push the feed.

Should I clean the feed before or after the PIM import?

Before. PIMs and marketplaces upsert on the SKU key and overwrite silently, so a duplicate SKU corrupts data without an error. Flagging and resolving in the CSV first means what you import matches what you intended.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
