How to normalise product category values in a product csv
- Step 1Decide your canonical taxonomy — List the final category names your store or PIM will use, e.g.
Men's Shoes,Women's Shoes,Kids' Shoes. Every supplier variant maps onto one of these. - Step 2Export the product CSV — Download the merged feed from your PIM, ERP, or supplier. The dropzone takes
.csv,.tsv,.txtwith auto delimiter detection — useful for tab-delimited marketplace feeds. - Step 3Map the first variant — Find
Mens Footwear, Replace withMen's Shoes. Leave Case sensitive off somens footwearandMENS FOOTWEARare caught too. The matched text is replaced in every data cell. - Step 4Check the counter, then chain the next variant — Run, read Replacements made, then download and re-feed the
.replaced.csvto map the next variant (Mens Shoes→Men's Shoes). Repeat per variant, or use Regex mode to do them together. - Step 5Handle the apostrophe forms — If your source mixes straight
'and curly ’ apostrophes, run a normalising pass first (Find ’ Replace with') soMen’sandMen'sbecome one string before you map categories. - Step 6Download and upload to your PIM or marketplace — Click Download CSV (saved as
<name>.replaced.csv) and import. Re-check the storefront navigation to confirm the duplicate facets are gone.
Category-variant folding plan
Typical multi-supplier category variants and the canonical value to standardise on. Each row is one literal pass — or fold same-target rows into one Regex alternation.
| Find (variant) | Replace with (canonical) | Why it appears | Regex tip |
|---|---|---|---|
| Mens Footwear | Men's Shoes | Supplier uses 'Footwear' segment naming | ^Mens? (Footwear|Shoes)$ |
| Mens Shoes | Men's Shoes | Missing possessive apostrophe | Same alternation as above |
| Men’s Shoes | Men's Shoes | Curly apostrophe from a Word/Mac source | Normalise ’ to ' first |
| T Shirts | T-Shirts | Hyphen dropped on export | ^T[ -]?Shirts?$ |
| Tees | T-Shirts | Casual/marketing synonym | Literal pass |
| Accessory | Accessories | Singular vs plural mismatch | Anchor with ^Accessory$ to avoid 'Accessories' |
Option surface for category work
The full set of controls. No column picker, no mapping grid, no preset taxonomy — scope and many-to-one mapping are done with anchors and multiple passes.
| Control | What it does | Default | Category use |
|---|---|---|---|
| Find | Text/pattern searched in every data cell | (empty) | The supplier's category string |
| Replace with | Substitute text; empty = delete | (empty) | Your canonical category name |
| Case sensitive | Off = casing ignored on the match | Off | Leave off — supplier casing is inconsistent |
| Regex mode | Find treated as a JS regex | Off | Whole-cell anchors ^...$ and apostrophe-optional Men'?s |
Cookbook
Real before/after category cells from multi-supplier product feeds. The header row is preserved; matched text in data cells is replaced.
Fold three men's-shoe variants in one regex pass
ExampleWith Regex mode on, an anchored alternation maps every spelling of the men's-shoe segment onto the canonical value at once. The anchors keep Mens Shoes Polish (an accessory) out of the match.
Input: SKU,Category A1,Mens Footwear A2,Mens Shoes A3,Men's Shoes A4,Mens Shoes Polish Find (Regex ON): ^Men'?s (Footwear|Shoes)$ Replace with: Men's Shoes Output: SKU,Category A1,Men's Shoes A2,Men's Shoes A3,Men's Shoes A4,Mens Shoes Polish Replacements made: 2 (A4 untouched — anchors excluded it)
Normalise the curly apostrophe before mapping
ExampleFeeds exported from Word, Pages, or some PIMs write a curly ’ instead of a straight '. To the matcher these are different bytes. A normalising pass first lets a single canonical value win.
Pass 1 — Find: ’ Replace with: ' (Regex OFF) Before: Men’s Shoes After pass 1: Men's Shoes Pass 2 — Find: Men's Shoes Replace with: Men's Shoes (now both rows share the straight apostrophe and dedupe in your PIM)
Category word also appears in product titles
ExampleA literal Find Shoes would also hit the word inside a Title cell like Waterproof Shoes — Black. Because there's no column selector, restrict the match with whole-cell anchors so only the Category cell value Shoes is folded.
Input: Title,Category Waterproof Shoes - Black,Shoes Leather Belt,Accessory Find (Regex ON): ^Shoes$ Replace with: Footwear Output: Title,Category Waterproof Shoes - Black,Footwear Leather Belt,Accessory Replacements made: 1 (the Title 'Shoes' was not at cell boundaries)
Header named 'Category' stays put
ExampleFind & Replace skips row 0, so even if your canonical word also appears as a column heading, the heading is preserved while the data is normalised.
Input: Category,Price Mens Footwear,49.99 Mens Shoes,52.00 Find: Mens Footwear Replace with: Men's Shoes Output (header untouched): Category,Price Men's Shoes,49.99 Mens Shoes,52.00 Replacements made: 1
Singular-to-plural without catching the plural
ExampleMapping Accessory→Accessories with a literal Find would also turn Accessories into Accessoriesies. Anchor the cell with ^Accessory$ so only the singular form changes.
Input: SKU,Category B1,Accessory B2,Accessories Find (Regex ON): ^Accessory$ Replace with: Accessories Output: SKU,Category B1,Accessories B2,Accessories Replacements made: 1
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Curly vs straight apostrophe are different characters
By designMen's Shoes (U+0027 straight quote) and Men’s Shoes (U+2019 curly quote) will not match each other. A Find for one leaves the other unchanged. Fix: run a normalising pass first — Find ’ Replace with ' — to unify the apostrophe across the file before mapping categories.
No column selector — category words leak into titles
By designReplacement spans every data cell. A literal Find Shoes will also alter the word inside Title or Description cells. Fix: use Regex mode with whole-cell anchors ^Shoes$ so only cells whose entire value is the category string are folded, or isolate rows first with csv-column-filter.
Plural overreach — `Accessory`→`Accessories`
By designWithout anchors, a literal Find Accessory matches the start of Accessories and produces Accessoriesies. Fix: anchor the whole cell with ^Accessory$ in Regex mode, or replace the longer form first and let the shorter one fall through.
Header row never changes
PreservedRow 0 is returned verbatim, so a Category / Product Type / google_product_category heading is safe. If your feed has no header row, the first product row is mistaken for a header and skipped — prepend a placeholder header line.
Invalid regex returns the file unchanged
No changesA malformed pattern (unbalanced ( or [) is caught and the original rows are returned with 0 replacements — no crash, no partial edit. Fix: if the counter reads 0 unexpectedly, validate the regex; an unescaped apostrophe-handling group is the common slip.
One pass, one variant — no paste-in mapping grid
By designYou cannot enter Mens Shoes→Men's Shoes, Tees→T-Shirts as a table. Each distinct variant is its own pass (re-feed the output) unless it shares a target and can be folded into one Regex alternation. For a large taxonomy remap, a spreadsheet lookup may be faster than many passes.
Free tier limit reached on a big feed
Tier limitFree caps at 2 MB / 500 rows. A 20,000-SKU marketplace feed exceeds it. Fix: upgrade to Pro, or split with csv-row-splitter, normalise each chunk identically, then recombine with csv-merger.
Tab-delimited feed parses fine, comma-delimited output
SupportedMarketplace feeds (Google Merchant Center TSV, some Amazon templates) are tab-delimited. PapaParse auto-detects the tab on input. The serialised output uses PapaParse's unparse default (comma) — if your destination needs tabs, re-export from there or convert afterwards.
Frequently asked questions
What if a category name also appears inside product titles?
Use Regex mode with whole-cell anchors. Find ^Shoes$ (Regex on) only matches cells whose entire value is Shoes, so the word inside a Title like Waterproof Shoes is left alone. There is no column selector, so anchors are how you scope the match to a clean category cell.
Can I use regex patterns for the find value?
Yes — tick Regex mode. Then Men'?s matches both Mens and Men's, and an alternation ^(Mens Footwear|Mens Shoes)$ folds several variants onto one canonical value in a single pass. Without Regex mode, the Find text is treated literally (so . is a real dot, not 'any character').
Is there a limit to the number of replacement pairs?
There is one pair per pass — there is no grid for multiple pairs. To apply many variants you run consecutive passes, feeding each output back into the tool, or you combine same-target variants into one Regex alternation. Plan your taxonomy as a sequence of passes.
Why didn't `Men’s Shoes` get folded with `Men's Shoes`?
They use different apostrophe characters — a curly ’ (U+2019) versus a straight ' (U+0027). The matcher treats them as distinct text. Run a normalising pass first (Find the curly ’, replace with the straight '), then map categories so both rows share one canonical string.
Does case matter when folding categories?
By default no — matching is case-insensitive, so MENS SHOES, mens shoes, and Mens Shoes all match one Find. The replacement is always written exactly as you typed it, giving uniform casing in the output. Tick Case sensitive only if a supplier uses casing meaningfully.
Will the header row get rewritten if my canonical word is also a column name?
No. Row 0 is never modified, so a Category heading (or any header that happens to match your Find) is preserved. The trade-off is that headerless files will have their first data row skipped — add a placeholder header if your feed has none.
How do I avoid turning `Accessory` into `Accessoriesies`?
Anchor the match. In Regex mode, Find ^Accessory$ only changes cells equal to Accessory, leaving Accessories alone. Without anchors, a literal Find matches the prefix inside the longer word and double-suffixes it.
What does the replacements number mean?
It's the count of cells that changed, not occurrences. A category that appeared in 40 product rows shows 40 — a quick check that you caught the variant you expected. All occurrences within a single cell are replaced (global match), but that still counts as one cell.
Which file types and how big?
.csv, .tsv, and .txt, with delimiter auto-detection (handy for tab-delimited marketplace feeds). Free tier allows up to 2 MB / 500 rows; Pro lifts both. For large feeds, split with csv-row-splitter and recombine with csv-merger.
Is my supplier feed sent to a server?
No. Parsing and replacement run in your browser via PapaParse. Product data, costs, and supplier identifiers stay on-device. Only an anonymous run counter is stored server-side for signed-in dashboard stats, and you can opt out in settings.
After folding categories, how do I confirm the duplicates are gone?
Run csv-duplicate-finder on the category column to confirm only canonical values remain, or csv-sorter to eyeball them grouped. For broader feed cleanup (whitespace, empty rows) before mapping, csv-cleaner is the companion tool.
Can I script category normalisation for a recurring feed?
Yes. GET /api/v1/tools/csv-find-replace returns the option schema (find, replace, caseSensitive, useRegex); pair the @jadapps/runner and POST to 127.0.0.1:9789/v1/tools/csv-find-replace/run. Since each call is one pair, a normalisation pipeline strings together one runner call per regex pattern on each scheduled supplier export.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.