How to remove duplicate SKU rows from a CSV
- Step 1: Combine product feeds into one file. The tool dedupes within a single CSV. If you have a master catalogue plus supplier feeds, concatenate them with csv-merger, placing the feed whose data you trust (your master price list) first so its row wins.
- Step 2: Drop the catalogue CSV onto the deduplicator. Accepts .csv and Excel/ODS (.xlsx/.xls/.ods; first sheet only, auto-converted). PapaParse auto-detects whether the file is comma- or semicolon-delimited.
- Step 3: Select your SKU / identifier column. From the Unique key column dropdown, pick the column that uniquely identifies a product variant, usually SKU. Use ASIN for Amazon feeds, Barcode/UPC for retail, or an internal Product ID if SKUs are inconsistent across suppliers.
- Step 4: Set case sensitivity to match your SKU scheme. Leave Case-sensitive keys off for typed SKUs where case is incidental. Turn it on if your identifier scheme treats AB12 and ab12 as different products (some internal and Amazon codes are case-distinct).
- Step 5: Run and verify against your variant count. Click Remove duplicates and read the tiles: Rows in, Rows out, Duplicates, Unique keys, Empty keys. The Unique keys count should match the number of distinct products you expect. A non-zero Empty keys count means some rows have no SKU; fix those before import.
- Step 6: Download and import the deduplicated feed. Click Download CSV (or .xlsx if you uploaded a spreadsheet). One row per SKU survives, in original order. Import into your store or ERP; the duplicate-SKU rejections disappear.
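The master-first concatenation in Step 1 can be sketched in Python as follows. This only illustrates the idea and is not csv-merger's implementation: the function name `concat_feeds` and the sample feeds are hypothetical, and the sketch assumes every feed has the same header row.

```python
import csv
import io


def concat_feeds(*feeds: str) -> str:
    """Concatenate CSV texts that share one header; earlier feeds come first."""
    out = io.StringIO()
    writer = None
    for text in feeds:
        reader = csv.DictReader(io.StringIO(text))
        if writer is None:
            # The first feed defines the header for the combined file.
            writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
            writer.writeheader()
        elif reader.fieldnames != writer.fieldnames:
            raise ValueError("feeds must share the same header")
        writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample data: the trusted master feed is passed first.
master = "sku,price\nSKU-1001,12.00\n"
supplier = "sku,price\nSKU-1001,11.50\nSKU-1002,9.00\n"
combined = concat_feeds(master, supplier)
```

Because the master row for SKU-1001 sits above the supplier row, a first-occurrence-wins dedupe on `sku` would keep the 12.00 price.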
The deduplicator's two controls
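There are exactly two options: the key column and the case-sensitivity toggle. Whitespace trimming and blank-key handling, also listed below, are fixed behaviours rather than settings. There is no multi-column key, no merge-on-collision, and no quantity-summing; the tool removes whole duplicate rows only.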
| Control | Effect | Default | SKU-feed guidance |
|---|---|---|---|
| Unique key column | Rows sharing this value (trimmed, optionally lowercased) are duplicates; first is kept | First column | Set to SKU, or ASIN/Barcode/Product ID per channel |
| Case-sensitive keys | Off: AB12 matches ab12. On: exact case required | Off | Off for typed SKUs; on for case-distinct codes |
| Whitespace in key | Trimmed before comparison; the stored SKU value is unchanged | Always trimmed | A trailing-space SKU still matches its clean form |
| Blank SKU rows | Never deduped; preserved and counted as Empty keys | Always kept | Filter out rows with no SKU first if your importer rejects them |
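
The rules in the table can be read as a short algorithm. The Python sketch below is one way to express them, not the tool's actual code (which runs in the browser): the function name `dedupe`, the stats dictionary keys, and the sample feed are hypothetical, and it assumes the Unique keys tile counts only non-blank keys.

```python
import csv
import io


def dedupe(text: str, key: str, case_sensitive: bool = False):
    """Keep the first row per key value; return (kept_rows, stats)."""
    seen = set()
    kept = []
    stats = {"rows_in": 0, "duplicates": 0, "empty_keys": 0}
    for row in csv.DictReader(io.StringIO(text)):
        stats["rows_in"] += 1
        k = (row[key] or "").strip()  # trimmed for comparison only
        if not case_sensitive:
            k = k.lower()
        if k == "":
            stats["empty_keys"] += 1  # blank keys are never deduped
            kept.append(row)
        elif k in seen:
            stats["duplicates"] += 1  # later occurrence: whole row dropped
        else:
            seen.add(k)
            kept.append(row)          # first occurrence wins, value stored as-is
    stats["rows_out"] = len(kept)
    stats["unique_keys"] = len(seen)  # assumption: blanks are not counted here
    return kept, stats


# Hypothetical feed: a casing variant with a trailing space, plus one blank SKU.
feed = "sku,supplier\nabc-12,Acme\nABC-12 ,Acme\nxyz-99,Globex\n,Pending\n"
rows, stats = dedupe(feed, "sku")
strict_rows, strict_stats = dedupe(feed, "sku", case_sensitive=True)
```

With the default settings the second row collapses into the first; with `case_sensitive=True` all four rows survive, because `abc-12` and `ABC-12` no longer match.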
Which identifier column to use per channel
Match the dedup key to how each marketplace or system uniquely identifies a product.
| Channel / system | Identifier column | Notes |
|---|---|---|
| Shopify product import | Variant SKU | Shopify keys variants on SKU; duplicate SKUs across products cause overwrite/rejection |
| Amazon Seller flat file | seller-sku (or asin) | Dedupe on seller-sku for your listings; asin to find catalogue overlaps |
| Retail / POS export | Barcode / UPC / EAN | GTIN barcodes (UPC/EAN) are designed to be globally unique; reliable when SKUs are inconsistent |
| Supplier price-list merge | Product ID or MPN | Use the field both suppliers share; concatenate master feed first to keep your pricing |
| WooCommerce product CSV | SKU | Woo requires unique SKUs; a duplicate blocks the whole import row |
Cookbook
Real before/after rows from product and inventory feeds. The tool keeps the first row per identifier and removes whole duplicate rows — it does not sum quantities or merge fields.
Marketplace sync re-added an existing SKU
Example: A nightly sync appended rows that already existed in the master feed. Concatenating master-first and deduping on SKU keeps the master row and drops the re-added copy.

Input (master rows above synced rows):

    sku,title,price
    SKU-1001,Blue Mug,12.00
    SKU-1002,Red Mug,12.00
    SKU-1001,Blue Mug,11.50

Key column: sku · Case-sensitive keys: OFF

Output (master price kept):

    sku,title,price
    SKU-1001,Blue Mug,12.00
    SKU-1002,Red Mug,12.00

Stats: Rows in 3 · Rows out 2 · Duplicates 1 · Unique keys 2
Hand-typed SKU casing differs
Example: Two team members entered the same product with different SKU casing. With default case-insensitive matching they collapse to one product.

Input:

    sku,supplier
    abc-12,Acme
    ABC-12,Acme
    xyz-99,Globex

Key column: sku · Case-sensitive keys: OFF

Output:

    sku,supplier
    abc-12,Acme
    xyz-99,Globex

If abc-12 and ABC-12 are genuinely different variants in your scheme, turn Case-sensitive keys ON to keep both.
Trailing space on a barcode from a CSV export
Example: A POS export left a trailing space on some barcodes. Trim-before-compare recognises the padded barcode as the same product and removes the duplicate.

Input (trailing space on row 1):

    barcode,name
    5012345678900 ,Widget
    5012345678900,Widget

Key column: barcode

Output:

    barcode,name
    5012345678900 ,Widget

The surviving barcode still has its space, because trimming affects only the comparison key. Clean the value afterward with csv-whitespace-trimmer.
Blank-SKU rows kept for cataloguing
Example: New products awaiting a SKU have a blank identifier. They aren't duplicates of each other: every blank-key row is preserved and counted as an Empty key.

Input:

    sku,title
    ,New Hat (pending)
    ,New Scarf (pending)
    SKU-7,Old Belt

Key column: sku

Output (both blank-SKU rows kept):

    sku,title
    ,New Hat (pending)
    ,New Scarf (pending)
    SKU-7,Old Belt

Stats: Rows in 3 · Rows out 3 · Duplicates 0 · Empty keys 2
Two suppliers, same UPC: keep the preferred supplier's row
Example: Both suppliers list the same UPC. You want the row from the supplier you placed first (your preferred vendor) to survive. First-occurrence-wins does exactly that.

Input (preferred supplier rows first):

    upc,supplier,cost
    0049000000443,VendorA,3.10
    0049000000443,VendorB,3.45

Key column: upc

Output (VendorA kept):

    upc,supplier,cost
    0049000000443,VendorA,3.10

To keep the cheapest row instead, regardless of supplier order, sort ascending by cost with csv-sorter first, then dedupe.
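
The sort-then-dedupe recipe for keeping the cheapest row can be sketched in Python as follows. This is an illustration under assumptions, not what csv-sorter or the deduplicator run internally: the sample feed is hypothetical and deliberately lists the dearer supplier first to show the effect of the sort.

```python
import csv
import io
from decimal import Decimal

# Hypothetical feed: the more expensive VendorB row happens to come first.
feed = (
    "upc,supplier,cost\n"
    "0049000000443,VendorB,3.45\n"
    "0049000000443,VendorA,3.10\n"
)
rows = list(csv.DictReader(io.StringIO(feed)))

# Step 1: sort ascending by cost so the cheapest row per UPC sits first.
rows.sort(key=lambda r: Decimal(r["cost"]))

# Step 2: first-occurrence-wins dedupe on the trimmed, lowercased key.
seen = set()
cheapest = []
for row in rows:
    k = row["upc"].strip().lower()
    if k not in seen:
        seen.add(k)
        cheapest.append(row)
```

Note that the output follows the sorted order, not the original file order.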
Errors and edge cases
Limits, silent failures, and edge cases you may hit with product feeds. Find the heading that matches your situation and apply the fix described beneath it.
Product feed over the free 500-row limit
Pro required: This is a Pro tool; the free tier is capped at 500 rows / 2 MB, which many real catalogues exceed. Pro raises the limit to 100,000 rows / 100 MB. For feeds beyond 100k rows, split with csv-row-splitter, dedupe each chunk, concatenate, then dedupe once more to catch duplicates that span chunks.
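
The final pass matters because a duplicate can sit in a different chunk from its first occurrence. The sketch below shows why split, dedupe, concatenate, dedupe gives the same result as one pass over the whole file. The helper names `first_per_key` and `dedupe_in_chunks` are hypothetical, and a chunk size of 2 stands in for the real row limit.

```python
def first_per_key(rows, key):
    """Keep the first row per trimmed, lowercased key; blank keys always pass."""
    seen, kept = set(), []
    for row in rows:
        k = row[key].strip().lower()
        if k and k in seen:
            continue
        if k:
            seen.add(k)
        kept.append(row)
    return kept


def dedupe_in_chunks(rows, key, chunk_size):
    """Dedupe each chunk, concatenate the results, then run a final pass."""
    partial = []
    for start in range(0, len(rows), chunk_size):
        partial.extend(first_per_key(rows[start:start + chunk_size], key))
    # Chunks cannot see each other's keys, so cross-chunk duplicates remain
    # in `partial` until this last pass removes them.
    return first_per_key(partial, key)


# Hypothetical feed with duplicates spread across three chunks of two rows.
rows = [{"sku": s} for s in ["A1", "B2", "a1", "C3", "B2 ", "A1"]]
chunked = dedupe_in_chunks(rows, "sku", chunk_size=2)
single_pass = first_per_key(rows, "sku")
```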
Same SKU, different sizes/colours that you want to keep
Over-collapse risk: If your file repeats one parent SKU across size/colour variants in a single column, deduping on that column would wrongly collapse legitimate variants. Dedupe on the variant-level identifier (Variant SKU, Barcode) instead, which is unique per variant.
Need to sum quantities across duplicate SKUs
Not supported: The deduplicator removes whole rows; it does not aggregate. If SKU-1001 appears twice with quantities 5 and 3, you get one row with whichever quantity came first, not 8. For summing, use a spreadsheet pivot or your ERP's import aggregation; the deduplicator is for collapsing, not totalling.
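
To make the difference concrete, the sketch below contrasts first-row-wins with a per-SKU total. The aggregation happens outside the deduplicator, as a stand-in for the spreadsheet pivot; the column name `qty` and the sample feed are hypothetical.

```python
import csv
import io

# Hypothetical inventory feed: SKU-1001 appears twice.
feed = "sku,qty\nSKU-1001,5\nSKU-1002,2\nSKU-1001,3\n"

first_seen = {}  # what first-row-wins deduplication keeps
totals = {}      # what aggregation (a pivot) would produce instead
for row in csv.DictReader(io.StringIO(feed)):
    sku = row["sku"].strip()
    qty = int(row["qty"])
    first_seen.setdefault(sku, qty)         # later rows are ignored
    totals[sku] = totals.get(sku, 0) + qty  # later rows are added
```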
Want the cheapest/most-recent row kept
First-row only: Only the first occurrence is kept. To keep the lowest price or newest entry, sort first with csv-sorter (price ascending, or date descending) so the desired row sits first, then dedupe on the SKU.
SKUs differ only by case and that matters
Toggle case-sensitive: By default AB12 and ab12 collapse. If your scheme treats them as distinct products, enable Case-sensitive keys so only byte-exact SKUs match. This is the one place case sensitivity commonly matters for product data.
Leading-zero barcode lost as a number
Preserved: The tool is text-only and never reinterprets a barcode as a number, so a leading-zero UPC like 0049000000443 is preserved exactly. (If Excel mangled it to 4.9E+10 before export, fix that in the source; the deduplicator can't recover digits the spreadsheet already dropped.)
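
The difference between text and numeric handling is easy to reproduce. In this Python sketch (the sample row is hypothetical), the value read as text keeps its zeros, while a round trip through a number loses them, and a scientific rendering loses most of the digits as well.

```python
import csv
import io

feed = "upc,name\n0049000000443,Cola 12-pack\n"
row = next(csv.DictReader(io.StringIO(feed)))

kept_as_text = row["upc"]                    # csv values are read as strings
round_tripped = str(int(row["upc"]))         # numeric conversion drops leading zeros
scientific = format(int(row["upc"]), ".1E")  # the kind of rendering a spreadsheet may apply
```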
Composite key (SKU + warehouse) needed
Single key only: One key column only. To dedupe per SKU per warehouse, merge the two columns first with csv-column-merger into a combined key, dedupe on it, then split back with csv-column-value-splitter.
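
The merge, dedupe, split workaround can be sketched as follows. This is not the implementation of csv-column-merger or csv-column-value-splitter: the separator `|`, the column name `sku_wh`, and the sample rows are hypothetical, and the sketch assumes the separator never occurs inside a SKU or warehouse value.

```python
SEP = "|"  # assumption: never appears in SKU or warehouse values

rows = [
    {"sku": "SKU-1", "warehouse": "EAST", "qty": "5"},
    {"sku": "SKU-1", "warehouse": "WEST", "qty": "7"},
    {"sku": "sku-1", "warehouse": "EAST", "qty": "9"},
]

# 1. Merge the two columns into one combined key column.
for row in rows:
    row["sku_wh"] = row["sku"] + SEP + row["warehouse"]

# 2. Dedupe on the combined key (trimmed, case-insensitive, first row wins).
seen, kept = set(), []
for row in rows:
    k = row["sku_wh"].strip().lower()
    if k not in seen:
        seen.add(k)
        kept.append(row)

# 3. Split the combined key back into its parts and drop the helper column.
for row in kept:
    row["sku"], row["warehouse"] = row.pop("sku_wh").split(SEP)
```

SKU-1 survives once per warehouse; the third row is dropped because its combined key matches the first row's when case is ignored.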
Blank-SKU rows all kept
Preserved: Rows with no SKU value pass through untouched and count as Empty keys; they are never treated as duplicates of each other. If your importer rejects blank SKUs, filter them out first with csv-column-filter (sku is_not_empty).
Semicolon-delimited supplier feed
Supported: Delimiter auto-detection handles ;-separated supplier exports without configuration. Output is comma-delimited. No values are altered.
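
The tool relies on PapaParse for detection in the browser. As a rough Python analogue, the standard library's `csv.Sniffer` can guess the delimiter from a sample; the sketch below uses it to read a semicolon feed and write it back comma-delimited. The sample feed is hypothetical, and `csv.Sniffer` uses its own heuristics, which are not the same as PapaParse's.

```python
import csv
import io

# Hypothetical semicolon-delimited supplier export.
semicolon_feed = "sku;title;price\nSKU-1;Blue Mug;12.00\nSKU-2;Red Mug;9.50\n"

# Guess the delimiter, considering only comma and semicolon as candidates.
dialect = csv.Sniffer().sniff(semicolon_feed, delimiters=",;")
rows = list(csv.reader(io.StringIO(semicolon_feed), dialect))

# Write the same values back out comma-delimited; cell contents are unchanged.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
comma_feed = out.getvalue()
```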
You only want to find duplicate SKUs, not remove them
Use the finder: To audit which SKUs are duplicated and how many times before deleting, use csv-duplicate-finder, which marks rows YES/NO and groups matches. The deduplicator is the cleanup step once you've reviewed.
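
The idea behind such an audit is a per-key count. The sketch below is not csv-duplicate-finder's implementation or output format; it only shows, with a hypothetical list of SKU values, how to see which keys repeat and how often, using the same trimmed, case-insensitive comparison and skipping blanks.

```python
from collections import Counter

# Hypothetical SKU column, including a casing variant and a blank value.
skus = ["SKU-1001", "SKU-1002", "sku-1001 ", "", "SKU-1003", "SKU-1002", "SKU-1001"]

# Count each non-blank key after trimming and lowercasing.
counts = Counter(s.strip().lower() for s in skus if s.strip())

# Keys that occur more than once are the ones a dedupe would collapse.
repeated = {key: n for key, n in counts.items() if n > 1}
```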
Frequently asked questions
Which column should I use to dedupe a product feed?
The variant-level unique identifier: SKU (or Variant SKU) for most stores, ASIN/seller-sku for Amazon, Barcode/UPC/EAN for retail, or an internal Product ID/MPN when SKUs are inconsistent across suppliers. Pick one column — the tool keys on a single field.
Does it match SKUs case-insensitively?
Yes by default. abc-12 and ABC-12 collapse to one product unless you turn on Case-sensitive keys. Enable that checkbox only if your SKU scheme genuinely treats case as significant.
Will it sum the quantities of duplicate SKUs?
No. It removes whole duplicate rows and keeps the first — it does not aggregate or total quantities. If SKU-1001 appears with qty 5 then qty 3, the output keeps one row with qty 5. For summing, use a spreadsheet pivot or your ERP's import aggregation.
Which duplicate row survives?
The first occurrence in file order. To keep the cheapest, sort ascending by price with csv-sorter first; to keep the newest, sort descending by date — then dedupe so the desired row is first.
What happens to products with no SKU yet?
They're kept. Blank-key rows are never deduped and are counted as Empty keys. If your import tool rejects blank SKUs, pre-filter them out with csv-column-filter (sku is_not_empty).
Will leading zeros on my UPC/barcode survive?
Yes — the tool is text-only and never converts identifiers to numbers, so 0049000000443 is preserved. If a spreadsheet already stripped the zeros before you exported, that loss happened upstream and can't be recovered here.
Can I dedupe per SKU per warehouse in one pass?
Not directly — the key is one column. Merge SKU and warehouse into a single key with csv-column-merger, dedupe on that, then split it back with csv-column-value-splitter if needed.
How large a catalogue can it process?
Free tier: 500 rows / 2 MB (this is a Pro tool). Pro: 100,000 rows / 100 MB. For larger feeds, split with csv-row-splitter, dedupe each part, concatenate, and run a final pass.
Can I upload an Excel product sheet?
Yes — .xlsx, .xls, and .ods are accepted; the first sheet is converted to CSV, deduped, and downloadable back as .xlsx. Plain .csv works too, and the delimiter is auto-detected.
Is my wholesale cost / supplier data uploaded?
No. Parsing and deduplication run in your browser. Costs, supplier names, and SKUs never reach a server — only an anonymous run counter is recorded for signed-in dashboards.
How do I just see which SKUs are duplicated first?
Use csv-duplicate-finder — it adds an _is_duplicate YES/NO column and groups duplicate SKUs so you can review before removing anything. Then use this deduplicator to collapse them.
It collapsed legitimate size variants — what went wrong?
You likely keyed on a parent SKU shared across variants. Dedupe on the variant-level identifier instead (Variant SKU, Barcode), which is unique per size/colour, so genuine variants are preserved.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.