How to split a product csv into one file per category
- Step 1Export the master product feed — Pull the full catalogue from your PIM, ERP, or store: Shopify Products → Export → Plain CSV; Akeneo / Plytix / other PIM export to CSV; Magento / WooCommerce product export. Keep the header row — it is copied into every output. Free tier handles 2 MB / 500 rows; Pro raises that to 100 MB / 100,000 rows.
- Step 2Drop the feed onto the splitter above — PapaParse parses it locally — product and cost data never leave the tab. The first row becomes the headers and fills the Split by column dropdown.
- Step 3Select the Category column — Choose
Category,Product Type,Collection, or whichever column carries the categorical value. The dropdown shows every header by name (blank headers appear asCol N). Only one column can be selected per run; pickSubcategoryinstead if you want a finer split. - Step 4Decide on case-sensitive grouping — Case-sensitive grouping (default ON) keeps
Apparelandapparelseparate — useful for catching PIM casing drift. Untick it to merge case variants into one file named after the first spelling seen. It never trims whitespace, soApparelwith a trailing space stays its own group. - Step 5Run the split and read the group list — Click Split into groups. The result panel shows Total rows and Groups (the distinct-value count), then each category with its row count. If the group count is higher than your real taxonomy, you have casing, whitespace, or multi-value cells to clean up first.
- Step 6Download each category file — Each category has its own Download button — files save individually (no ZIP/bulk download). The filename is
<feed-name>.<category>.csvwith the category sanitised to letters/digits/hyphen/underscore (other characters become_) and truncated to 50 characters.
The two controls the splitter exposes
Two inputs only — a single-column selector and a case toggle. No taxonomy mapping, no multi-column split, no value-rename, no delimiter picker.
| Control | What it does for a product feed | Default |
|---|---|---|
| Split by column (dropdown) | Selects the single column to partition on — Category, Product Type, Collection, Subcategory. Lists every header by name; blank headers show Col N | First column (index 0) |
| Case-sensitive grouping (checkbox) | ON: Apparel and apparel are separate files. OFF: case variants merge (key lowercased); the merged file is named after the first-seen spelling. Never trims whitespace | On (ticked) |
How category cells group
The split partitions on the whole raw cell. Multi-value category strings are NOT decomposed — the entire string is one group value.
| Category cells | Result (case-sensitive default) | Note |
|---|---|---|
Apparel, Apparel, Footwear | 2 files: Apparel (2), Footwear (1) | Clean single-value taxonomy |
Apparel, apparel | 2 files (untick case-sensitivity → 1) | PIM casing drift |
Apparel|Shirts, Apparel|Pants | 2 files — full path is the group key | Path is not split on the pipe |
Apparel,Sale, Apparel | 2 files — Apparel,Sale is one value | Multi-category cell stays whole |
Footwear, `` (blank) | 2 files: Footwear (1), (empty) (1) | Uncategorised SKUs grouped |
Free vs Pro limits for catalogue splits
Limits apply to the source feed before the split. Output file count is uncapped — one per distinct category.
| Limit | Free | Pro |
|---|---|---|
| Source file size | 2 MB | 100 MB |
| Source row count | 500 rows | 100,000 rows |
| Output category files | No cap | No cap |
Cookbook
Before/after splits from product feeds. Representative columns only; costs/margins anonymised.
Shopify product feed split by Product Type
ExampleEach category team uploads their own products. Selecting Product Type gives one file per type, header included.
Input (products.csv): Handle,Title,Product Type,Price tee-1,Logo Tee,Apparel,19.00 mug-1,Logo Mug,Drinkware,9.00 tee-2,V-Neck,Apparel,21.00 Split by column: Product Type Groups (2): Apparel — 2 rows → products.Apparel.csv Drinkware — 1 row → products.Drinkware.csv products.Apparel.csv: Handle,Title,Product Type,Price tee-1,Logo Tee,Apparel,19.00 tee-2,V-Neck,Apparel,21.00
Multi-value category cell is treated as one category
ExampleIf a SKU lists two categories in one cell (Apparel|Sale), the splitter does not break it into Apparel and Sale — the whole string is the group value. Split the cell into real columns first if you need per-category routing.
Input (Category column): Apparel Apparel|Sale Apparel Split by Category → 2 groups: Apparel — 2 rows → products.Apparel.csv Apparel|Sale — 1 row → products.Apparel_Sale.csv To route the |Sale item into the Apparel file too, normalise the multi-value cell to a single primary category first — e.g. with csv-find-replace (/tool/csv-find-replace) strip `|Sale` to leave `Apparel` — then split on the clean column.
Casing drift inflating the category count
ExampleTwo PIM editors typed Footwear and footwear. Default grouping treats them as two categories. Untick case-sensitivity to merge.
Category cells: Footwear, footwear, FOOTWEAR, Apparel Default (case-sensitive): 4 groups Untick 'Case-sensitive grouping': 2 groups Footwear (3 rows) — named after first spelling Apparel (1 row) → products.Footwear.csv
Uncategorised SKUs land in the (empty) file
ExampleProducts with a blank category group together under (empty) so you can triage them before distributing the rest.
Category cells: Apparel, (blank), Drinkware, (blank) Split by Category → 3 groups: Apparel — 1 row → products.Apparel.csv Drinkware — 1 row → products.Drinkware.csv (empty) — 2 rows → products._empty_.csv
Subcategory split after a category split
ExampleThe tool splits by one column per run. For category-then-subcategory, split by category first, then re-run on each category file using the subcategory column.
Pass 1 — split products.csv by Category → products.Apparel.csv, products.Footwear.csv Pass 2 — drop products.Apparel.csv back in, split by Subcategory → products.Apparel.Shirts.csv products.Apparel.Pants.csv For pulling a single category for one channel only, the csv-column-filter tool (/tool/csv-column-filter) is faster.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Multi-value category cell not decomposed
By designA cell like Apparel|Shirts or Apparel,Sale is treated as a single group value — the splitter partitions on whole cells, it does not parse delimiters inside them. If you need true per-category routing for multi-category SKUs, normalise the cell to one primary category first (for example with csv-find-replace to strip the secondary value), then split on the clean column.
Casing variants from multiple PIM editors
By designFootwear and footwear are two groups with the default case-sensitive setting. Untick Case-sensitive grouping to merge; the combined file is named after the first spelling seen. This is also a useful diagnostic — a higher-than-expected group count exposes taxonomy drift to fix in the PIM.
Trailing spaces in the category cell
PreservedThe splitter never trims. Apparel and Apparel (trailing space) are always separate groups, even with case-sensitivity off. Trim the column first with the csv-cleaner if your feed has padded category values.
Blank / uncategorised products
PreservedRows with an empty category are grouped into one file labelled (empty). They are never dropped — this is where you find SKUs that need categorising before they go to a channel.
Splitting on a near-unique column (SKU, title)
ExpectedChoosing a column that is almost unique per row yields one tiny file per product, each with its own download button (no bulk/ZIP). Pick a genuine categorical column. To extract a single category instead of every value, use csv-column-filter.
Feed over the free 2 MB / 500-row limit
Rejected (free tier)Full catalogues usually exceed 500 rows, which blocks the run on free with an upgrade overlay. Pro raises the ceiling to 100 MB / 100,000 rows. The size/row check is on the source feed, not on the per-category outputs.
Special characters in category names
SupportedCategory values like Home & Garden or Kids' Toys keep their exact text in the on-screen group and in the CSV cells; only the download filename sanitises non-alphanumerics to underscores and truncates at 50 characters. The split itself is unaffected.
Two long category labels sharing their first 50 characters
Filename collision riskFilenames truncate to 50 sanitised characters, so two very long category paths with the same 50-character prefix produce the same filename. The contents are correct; rename on save if your taxonomy uses very long path strings.
Frequently asked questions
What if a product belongs to multiple categories?
It depends how the categories are stored. If the Category column holds one value per row, the product goes into that category's file. If a single cell lists several categories (e.g. Apparel|Shirts), the whole string is treated as one group value — the product appears once, in a file named after that combined string. To route multi-category SKUs into each category, normalise the cell to one primary category first (for example with csv-find-replace to strip the secondary value), then split.
Can I split by subcategory instead of top-level category?
Yes — just select the Subcategory column in the dropdown. For category-then-subcategory, split by category first, then re-run the splitter on each category file using the subcategory column. The tool partitions by one column per run.
Does this work for Shopify, WooCommerce, and Magento exports?
Yes. Any CSV with a category-type column works — the splitter is column-agnostic and reads whatever header you select. PapaParse auto-detects the delimiter, so comma and semicolon exports both parse. The tool doesn't care which platform produced the feed.
Why did I get more category files than my real taxonomy has?
Almost always casing drift (Apparel vs apparel), trailing spaces, or multi-value cells. Casing merges if you untick Case-sensitive grouping; whitespace and multi-value cells need cleaning first — use the csv-cleaner to trim, or csv-find-replace to normalise multi-value cells down to a single primary category.
Is the master header in every category file?
Yes. Each output begins with the catalogue's original header row, followed by that category's rows only. The files import into Shopify, a marketplace, or Google Merchant with the full column set intact.
Does the splitter download all category files in one ZIP?
No. Each category has its own Download button and saves a single CSV. There is no bulk or ZIP download. If your taxonomy has many categories, decide which teams actually need files rather than downloading all of them.
Are uncategorised products dropped?
No. Products with a blank category are grouped into one file shown as (empty). Use it as a triage list — categorise those SKUs in your PIM before they go to a channel.
Does it rename or remap my categories?
No. The on-screen group name and the cell contents keep the exact feed value. Only the download filename is sanitised (non-alphanumerics become underscores, truncated to 50 characters). There is no taxonomy mapping feature — the split is a faithful partition.
How big a feed can it handle?
Free tier: 2 MB / 500 rows. Pro: 100 MB / 100,000 rows. The limit is on the source feed. A catalogue with thousands of SKUs needs Pro, or pre-chunk it with the csv-row-splitter (which splits by row count, not category).
Is my product and cost data uploaded?
No. PapaParse runs in your browser; titles, costs, margins, and supplier columns never reach a server. Only an anonymous run counter is recorded server-side when you are signed in.
I only need one category for one channel — should I use this?
If you want every category as its own file, yes. If you only need a single category, the csv-column-filter tool extracts that one slice in one step (equals/contains conditions) without producing files for every other category.
Can I run the category split automatically?
Yes — GET /api/v1/tools/csv-column-value-splitter returns the option schema (columnIndex, caseSensitive). Pair the @jadapps/runner once and POST your feed to 127.0.0.1:9789/v1/tools/csv-column-value-splitter/run. A typical pipeline: nightly PIM export → runner-side split by Category → drop each file to the owning team's folder. Product data stays on your machine — the runner runs locally.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.