Normalise Product Category Values in a CSV

How to normalise product category values in a product csv

Step 1
Decide your canonical taxonomy — List the final category names your store or PIM will use, e.g. Men's Shoes, Women's Shoes, Kids' Shoes. Every supplier variant maps onto one of these.
Step 2
Export the product CSV — Download the merged feed from your PIM, ERP, or supplier. The dropzone takes .csv, .tsv, .txt with auto delimiter detection — useful for tab-delimited marketplace feeds.
Step 3
Map the first variant — Find Mens Footwear, Replace with Men's Shoes. Leave Case sensitive off so mens footwear and MENS FOOTWEAR are caught too. The matched text is replaced in every data cell.
Step 4
Check the counter, then chain the next variant — Run, read Replacements made, then download and re-feed the .replaced.csv to map the next variant (Mens Shoes→Men's Shoes). Repeat per variant, or use Regex mode to do them together.
Step 5
Handle the apostrophe forms — If your source mixes straight ' and curly ’ apostrophes, run a normalising pass first (Find ’ Replace with ') so Men’s and Men's become one string before you map categories.
Step 6
Download and upload to your PIM or marketplace — Click Download CSV (saved as <name>.replaced.csv) and import. Re-check the storefront navigation to confirm the duplicate facets are gone.

Category-variant folding plan

Typical multi-supplier category variants and the canonical value to standardise on. Each row is one literal pass — or fold same-target rows into one Regex alternation.

Find (variant)	Replace with (canonical)	Why it appears	Regex tip
Mens Footwear	Men's Shoes	Supplier uses 'Footwear' segment naming	`^Mens? (Footwear\|Shoes)$`
Mens Shoes	Men's Shoes	Missing possessive apostrophe	Same alternation as above
Men’s Shoes	Men's Shoes	Curly apostrophe from a Word/Mac source	Normalise ’ to ' first
T Shirts	T-Shirts	Hyphen dropped on export	`^T[ -]?Shirts?$`
Tees	T-Shirts	Casual/marketing synonym	Literal pass
Accessory	Accessories	Singular vs plural mismatch	Anchor with `^Accessory$` to avoid 'Accessories'

Option surface for category work

The full set of controls. No column picker, no mapping grid, no preset taxonomy — scope and many-to-one mapping are done with anchors and multiple passes.

Control	What it does	Default	Category use
Find	Text/pattern searched in every data cell	(empty)	The supplier's category string
Replace with	Substitute text; empty = delete	(empty)	Your canonical category name
Case sensitive	Off = casing ignored on the match	Off	Leave off — supplier casing is inconsistent
Regex mode	Find treated as a JS regex	Off	Whole-cell anchors `^...$` and apostrophe-optional `Men'?s`

Cookbook

Real before/after category cells from multi-supplier product feeds. The header row is preserved; matched text in data cells is replaced.

Fold three men's-shoe variants in one regex pass

Example

With Regex mode on, an anchored alternation maps every spelling of the men's-shoe segment onto the canonical value at once. The anchors keep Mens Shoes Polish (an accessory) out of the match.

Input:
SKU,Category
A1,Mens Footwear
A2,Mens Shoes
A3,Men's Shoes
A4,Mens Shoes Polish

Find (Regex ON): ^Men'?s (Footwear|Shoes)$
Replace with: Men's Shoes

Output:
SKU,Category
A1,Men's Shoes
A2,Men's Shoes
A3,Men's Shoes
A4,Mens Shoes Polish

Replacements made: 2 (A4 untouched — anchors excluded it)

Normalise the curly apostrophe before mapping

Example

Feeds exported from Word, Pages, or some PIMs write a curly ’ instead of a straight '. To the matcher these are different bytes. A normalising pass first lets a single canonical value win.

Pass 1 — Find: ’   Replace with: '   (Regex OFF)

Before: Men’s Shoes
After pass 1: Men's Shoes

Pass 2 — Find: Men's Shoes   Replace with: Men's Shoes
(now both rows share the straight apostrophe and dedupe in your PIM)

Category word also appears in product titles

Example

A literal Find Shoes would also hit the word inside a Title cell like Waterproof Shoes — Black. Because there's no column selector, restrict the match with whole-cell anchors so only the Category cell value Shoes is folded.

Input:
Title,Category
Waterproof Shoes - Black,Shoes
Leather Belt,Accessory

Find (Regex ON): ^Shoes$    Replace with: Footwear

Output:
Title,Category
Waterproof Shoes - Black,Footwear
Leather Belt,Accessory

Replacements made: 1 (the Title 'Shoes' was not at cell boundaries)

Header named 'Category' stays put

Example

Find & Replace skips row 0, so even if your canonical word also appears as a column heading, the heading is preserved while the data is normalised.

Input:
Category,Price
Mens Footwear,49.99
Mens Shoes,52.00

Find: Mens Footwear    Replace with: Men's Shoes

Output (header untouched):
Category,Price
Men's Shoes,49.99
Mens Shoes,52.00

Replacements made: 1

Singular-to-plural without catching the plural

Example

Mapping Accessory→Accessories with a literal Find would also turn Accessories into Accessoriesies. Anchor the cell with ^Accessory$ so only the singular form changes.

Input:
SKU,Category
B1,Accessory
B2,Accessories

Find (Regex ON): ^Accessory$    Replace with: Accessories

Output:
SKU,Category
B1,Accessories
B2,Accessories

Replacements made: 1

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

Curly vs straight apostrophe are different characters

By design

Men's Shoes (U+0027 straight quote) and Men’s Shoes (U+2019 curly quote) will not match each other. A Find for one leaves the other unchanged. Fix: run a normalising pass first — Find ’ Replace with ' — to unify the apostrophe across the file before mapping categories.

No column selector — category words leak into titles

By design

Replacement spans every data cell. A literal Find Shoes will also alter the word inside Title or Description cells. Fix: use Regex mode with whole-cell anchors ^Shoes$ so only cells whose entire value is the category string are folded, or isolate rows first with csv-column-filter.

Plural overreach — `Accessory`→`Accessories`

By design

Without anchors, a literal Find Accessory matches the start of Accessories and produces Accessoriesies. Fix: anchor the whole cell with ^Accessory$ in Regex mode, or replace the longer form first and let the shorter one fall through.

Header row never changes

Preserved

Row 0 is returned verbatim, so a Category / Product Type / google_product_category heading is safe. If your feed has no header row, the first product row is mistaken for a header and skipped — prepend a placeholder header line.

Invalid regex returns the file unchanged

No changes

A malformed pattern (unbalanced ( or [) is caught and the original rows are returned with 0 replacements — no crash, no partial edit. Fix: if the counter reads 0 unexpectedly, validate the regex; an unescaped apostrophe-handling group is the common slip.

One pass, one variant — no paste-in mapping grid

By design

You cannot enter Mens Shoes→Men's Shoes, Tees→T-Shirts as a table. Each distinct variant is its own pass (re-feed the output) unless it shares a target and can be folded into one Regex alternation. For a large taxonomy remap, a spreadsheet lookup may be faster than many passes.

Free tier limit reached on a big feed

Tier limit

Free caps at 2 MB / 500 rows. A 20,000-SKU marketplace feed exceeds it. Fix: upgrade to Pro, or split with csv-row-splitter, normalise each chunk identically, then recombine with csv-merger.

Tab-delimited feed parses fine, comma-delimited output

Supported

Marketplace feeds (Google Merchant Center TSV, some Amazon templates) are tab-delimited. PapaParse auto-detects the tab on input. The serialised output uses PapaParse's unparse default (comma) — if your destination needs tabs, re-export from there or convert afterwards.

Frequently asked questions

What if a category name also appears inside product titles?

Use Regex mode with whole-cell anchors. Find ^Shoes$ (Regex on) only matches cells whose entire value is Shoes, so the word inside a Title like Waterproof Shoes is left alone. There is no column selector, so anchors are how you scope the match to a clean category cell.

Can I use regex patterns for the find value?

Yes — tick Regex mode. Then Men'?s matches both Mens and Men's, and an alternation ^(Mens Footwear|Mens Shoes)$ folds several variants onto one canonical value in a single pass. Without Regex mode, the Find text is treated literally (so . is a real dot, not 'any character').

Is there a limit to the number of replacement pairs?

There is one pair per pass — there is no grid for multiple pairs. To apply many variants you run consecutive passes, feeding each output back into the tool, or you combine same-target variants into one Regex alternation. Plan your taxonomy as a sequence of passes.

Why didn't `Men’s Shoes` get folded with `Men's Shoes`?

They use different apostrophe characters — a curly ’ (U+2019) versus a straight ' (U+0027). The matcher treats them as distinct text. Run a normalising pass first (Find the curly ’, replace with the straight '), then map categories so both rows share one canonical string.

Does case matter when folding categories?

By default no — matching is case-insensitive, so MENS SHOES, mens shoes, and Mens Shoes all match one Find. The replacement is always written exactly as you typed it, giving uniform casing in the output. Tick Case sensitive only if a supplier uses casing meaningfully.

Will the header row get rewritten if my canonical word is also a column name?

No. Row 0 is never modified, so a Category heading (or any header that happens to match your Find) is preserved. The trade-off is that headerless files will have their first data row skipped — add a placeholder header if your feed has none.

How do I avoid turning `Accessory` into `Accessoriesies`?

Anchor the match. In Regex mode, Find ^Accessory$ only changes cells equal to Accessory, leaving Accessories alone. Without anchors, a literal Find matches the prefix inside the longer word and double-suffixes it.

What does the replacements number mean?

It's the count of cells that changed, not occurrences. A category that appeared in 40 product rows shows 40 — a quick check that you caught the variant you expected. All occurrences within a single cell are replaced (global match), but that still counts as one cell.

Which file types and how big?

.csv, .tsv, and .txt, with delimiter auto-detection (handy for tab-delimited marketplace feeds). Free tier allows up to 2 MB / 500 rows; Pro lifts both. For large feeds, split with csv-row-splitter and recombine with csv-merger.

Is my supplier feed sent to a server?

No. Parsing and replacement run in your browser via PapaParse. Product data, costs, and supplier identifiers stay on-device. Only an anonymous run counter is stored server-side for signed-in dashboard stats, and you can opt out in settings.

After folding categories, how do I confirm the duplicates are gone?

Run csv-duplicate-finder on the category column to confirm only canonical values remain, or csv-sorter to eyeball them grouped. For broader feed cleanup (whitespace, empty rows) before mapping, csv-cleaner is the companion tool.

Can I script category normalisation for a recurring feed?

Yes. GET /api/v1/tools/csv-find-replace returns the option schema (find, replace, caseSensitive, useRegex); pair the @jadapps/runner and POST to 127.0.0.1:9789/v1/tools/csv-find-replace/run. Since each call is one pair, a normalisation pipeline strings together one runner call per regex pattern on each scheduled supplier export.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

How to normalise product category values in a product csv

Step 1
Decide your canonical taxonomy — List the final category names your store or PIM will use, e.g. Men's Shoes, Women's Shoes, Kids' Shoes. Every supplier variant maps onto one of these.
Step 2
Export the product CSV — Download the merged feed from your PIM, ERP, or supplier. The dropzone takes .csv, .tsv, .txt with auto delimiter detection — useful for tab-delimited marketplace feeds.
Step 3
Map the first variant — Find Mens Footwear, Replace with Men's Shoes. Leave Case sensitive off so mens footwear and MENS FOOTWEAR are caught too. The matched text is replaced in every data cell.
Step 4
Check the counter, then chain the next variant — Run, read Replacements made, then download and re-feed the .replaced.csv to map the next variant (Mens Shoes→Men's Shoes). Repeat per variant, or use Regex mode to do them together.
Step 5
Handle the apostrophe forms — If your source mixes straight ' and curly ’ apostrophes, run a normalising pass first (Find ’ Replace with ') so Men’s and Men's become one string before you map categories.
Step 6
Download and upload to your PIM or marketplace — Click Download CSV (saved as <name>.replaced.csv) and import. Re-check the storefront navigation to confirm the duplicate facets are gone.

Category-variant folding plan

Typical multi-supplier category variants and the canonical value to standardise on. Each row is one literal pass — or fold same-target rows into one Regex alternation.

Find (variant)	Replace with (canonical)	Why it appears	Regex tip
Mens Footwear	Men's Shoes	Supplier uses 'Footwear' segment naming	`^Mens? (Footwear\|Shoes)$`
Mens Shoes	Men's Shoes	Missing possessive apostrophe	Same alternation as above
Men’s Shoes	Men's Shoes	Curly apostrophe from a Word/Mac source	Normalise ’ to ' first
T Shirts	T-Shirts	Hyphen dropped on export	`^T[ -]?Shirts?$`
Tees	T-Shirts	Casual/marketing synonym	Literal pass
Accessory	Accessories	Singular vs plural mismatch	Anchor with `^Accessory$` to avoid 'Accessories'

Option surface for category work

The full set of controls. No column picker, no mapping grid, no preset taxonomy — scope and many-to-one mapping are done with anchors and multiple passes.

Control	What it does	Default	Category use
Find	Text/pattern searched in every data cell	(empty)	The supplier's category string
Replace with	Substitute text; empty = delete	(empty)	Your canonical category name
Case sensitive	Off = casing ignored on the match	Off	Leave off — supplier casing is inconsistent
Regex mode	Find treated as a JS regex	Off	Whole-cell anchors `^...$` and apostrophe-optional `Men'?s`

Cookbook

Real before/after category cells from multi-supplier product feeds. The header row is preserved; matched text in data cells is replaced.

Fold three men's-shoe variants in one regex pass

Example

With Regex mode on, an anchored alternation maps every spelling of the men's-shoe segment onto the canonical value at once. The anchors keep Mens Shoes Polish (an accessory) out of the match.

Input:
SKU,Category
A1,Mens Footwear
A2,Mens Shoes
A3,Men's Shoes
A4,Mens Shoes Polish

Find (Regex ON): ^Men'?s (Footwear|Shoes)$
Replace with: Men's Shoes

Output:
SKU,Category
A1,Men's Shoes
A2,Men's Shoes
A3,Men's Shoes
A4,Mens Shoes Polish

Replacements made: 2 (A4 untouched — anchors excluded it)

Normalise the curly apostrophe before mapping

Example

Feeds exported from Word, Pages, or some PIMs write a curly ’ instead of a straight '. To the matcher these are different bytes. A normalising pass first lets a single canonical value win.

Pass 1 — Find: ’   Replace with: '   (Regex OFF)

Before: Men’s Shoes
After pass 1: Men's Shoes

Pass 2 — Find: Men's Shoes   Replace with: Men's Shoes
(now both rows share the straight apostrophe and dedupe in your PIM)

Category word also appears in product titles

Example

Input:
Title,Category
Waterproof Shoes - Black,Shoes
Leather Belt,Accessory

Find (Regex ON): ^Shoes$    Replace with: Footwear

Output:
Title,Category
Waterproof Shoes - Black,Footwear
Leather Belt,Accessory

Replacements made: 1 (the Title 'Shoes' was not at cell boundaries)

Header named 'Category' stays put

Example

Find & Replace skips row 0, so even if your canonical word also appears as a column heading, the heading is preserved while the data is normalised.

Input:
Category,Price
Mens Footwear,49.99
Mens Shoes,52.00

Find: Mens Footwear    Replace with: Men's Shoes

Output (header untouched):
Category,Price
Men's Shoes,49.99
Mens Shoes,52.00

Replacements made: 1

Singular-to-plural without catching the plural

Example

Mapping Accessory→Accessories with a literal Find would also turn Accessories into Accessoriesies. Anchor the cell with ^Accessory$ so only the singular form changes.

Input:
SKU,Category
B1,Accessory
B2,Accessories

Find (Regex ON): ^Accessory$    Replace with: Accessories

Output:
SKU,Category
B1,Accessories
B2,Accessories

Replacements made: 1

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

Curly vs straight apostrophe are different characters

By design

No column selector — category words leak into titles

By design

Plural overreach — `Accessory`→`Accessories`

By design

Header row never changes

Preserved

Invalid regex returns the file unchanged

No changes

One pass, one variant — no paste-in mapping grid

By design

Free tier limit reached on a big feed

Tier limit

Free caps at 2 MB / 500 rows. A 20,000-SKU marketplace feed exceeds it. Fix: upgrade to Pro, or split with csv-row-splitter, normalise each chunk identically, then recombine with csv-merger.

Tab-delimited feed parses fine, comma-delimited output

Supported

Frequently asked questions

What if a category name also appears inside product titles?

Can I use regex patterns for the find value?

Is there a limit to the number of replacement pairs?

Why didn't `Men’s Shoes` get folded with `Men's Shoes`?

Does case matter when folding categories?

Will the header row get rewritten if my canonical word is also a column name?

How do I avoid turning `Accessory` into `Accessoriesies`?

What does the replacements number mean?

Which file types and how big?

Is my supplier feed sent to a server?

After folding categories, how do I confirm the duplicates are gone?

Can I script category normalisation for a recurring feed?

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

Normalise Product Category Values in a Product CSV

How to normalise product category values in a product csv

Category-variant folding plan

Option surface for category work

Cookbook

Fold three men's-shoe variants in one regex pass

Normalise the curly apostrophe before mapping

Category word also appears in product titles

Header named 'Category' stays put

Singular-to-plural without catching the plural

Errors and edge cases

Curly vs straight apostrophe are different characters

No column selector — category words leak into titles

Plural overreach — `Accessory`→`Accessories`

Header row never changes

Invalid regex returns the file unchanged

One pass, one variant — no paste-in mapping grid

Free tier limit reached on a big feed

Tab-delimited feed parses fine, comma-delimited output

Frequently asked questions

What if a category name also appears inside product titles?

Can I use regex patterns for the find value?

Is there a limit to the number of replacement pairs?

Why didn't `Men’s Shoes` get folded with `Men's Shoes`?

Does case matter when folding categories?

Will the header row get rewritten if my canonical word is also a column name?

How do I avoid turning `Accessory` into `Accessoriesies`?

What does the replacements number mean?

Which file types and how big?

Is my supplier feed sent to a server?

After folding categories, how do I confirm the duplicates are gone?

Can I script category normalisation for a recurring feed?

Privacy first

Related guides

Normalise Product Category Values in a Product CSV

How to normalise product category values in a product csv

Category-variant folding plan

Option surface for category work

Cookbook

Fold three men's-shoe variants in one regex pass

Normalise the curly apostrophe before mapping

Category word also appears in product titles

Header named 'Category' stays put

Singular-to-plural without catching the plural

Errors and edge cases

Curly vs straight apostrophe are different characters

No column selector — category words leak into titles

Plural overreach — `Accessory`→`Accessories`

Header row never changes

Invalid regex returns the file unchanged

One pass, one variant — no paste-in mapping grid

Free tier limit reached on a big feed

Tab-delimited feed parses fine, comma-delimited output

Frequently asked questions

What if a category name also appears inside product titles?

Can I use regex patterns for the find value?

Is there a limit to the number of replacement pairs?

Why didn't `Men’s Shoes` get folded with `Men's Shoes`?

Does case matter when folding categories?

Will the header row get rewritten if my canonical word is also a column name?

How do I avoid turning `Accessory` into `Accessoriesies`?

What does the replacements number mean?

Which file types and how big?

Is my supplier feed sent to a server?

After folding categories, how do I confirm the duplicates are gone?

Can I script category normalisation for a recurring feed?

Privacy first

Related guides