How to validate a Shopify product CSV before import
- Step 1Export or build the Shopify product CSV — In Shopify admin go to Products → Export → 'All products' / 'Selected' → 'Plain CSV file', or fill in Shopify's official product-import template. If you edited the export in Excel, use Save As → 'CSV UTF-8 (Comma delimited)' — plain 'CSV (Comma delimited)' on Excel-for-Windows can write CP1252 and corrupt accented text. The validator does not read
.xlsxbinaries, so save to a text CSV first. - Step 2Drop the CSV onto the validator above — Free tier: 2 MB / 500 rows per file (a row here is a CSV line, so a 200-product catalog with 3 variants each is ~800 rows — that needs Pro). Pro: 100 MB / 100,000 rows. The dropzone accepts
.csv,.tsv, and.txt; the file is read in-browser and validation runs automatically the moment it lands. - Step 3Read the four summary tiles — The header shows Health score, Rows (total lines including the header), Columns (header column count — the 'expected width' the row-width check compares against), and Issues (errors + warnings). The score badge turns green at 90% and amber below it, but treat the number as secondary: a 96% score with one
row_widtherror still means a product will be skipped, so read the issues, not just the score. - Step 4Walk the Top issues panel — fix row-shape and encoding blockers first — The panel lists the first 8 issues; red badges are
errorseverity (row_width,encoding) and amber arewarning(type_mismatch,duplicate_header,duplicate_row,duplicate_key,empty_cell,suspicious_value). Each line shows Row, Col, the column name, and a detail string. Row-shape errors are the ones that make Shopify silently drop a product, so clear those first. - Step 5Read the Column profile panel — Every column shows its inferred type plus filled / empty counts. A
stringprofile onVariant Priceis a red flag — it means most of that column is non-numeric, not the reverse. Columns whose names match the unique-key heuristic (Variant SKU,Variant Barcode, anything containingid) automatically get duplicate-key checking on top of the whole-row duplicate check. - Step 6Fix in source, download the report, re-validate — Click 'Download report' for a
<filename>.health-report.json(full stats, column profile, every reported issue with row/col references, and a 10-row preview) to keep alongside the file. Fix in your spreadsheet, not the CSV: strip currency symbols with csv-find-replace, de-dupe repeatedHandlerows with csv-deduplicator, and re-save as CSV UTF-8. Re-drop to confirmerrorsis zero, then import in Shopify admin.
What the validator checks on a Shopify CSV
Eight issue types, mapped to Shopify import failure modes. Error severity blocks a row (Shopify skips the product). Warning severity is a quality flag (the file may import but a value lands wrong).
| Issue type | Severity | What triggers it | Why Shopify files hit it |
|---|---|---|---|
| `row_width` | Error | A row's column count differs from the header's column count | An unescaped comma or hard line break inside Body (HTML) splits one product line into two, throwing off every column after it |
| `encoding` | Error | Cell contains \uFFFD or a control character below 0x20 (except tab/newline/carriage-return) | An Excel CP1252 save read back as UTF-8 — é, ü, ñ, ™ in titles or descriptions come back as garbage |
| `duplicate_header` | Warning | The same header name appears twice in row 1 (case-insensitive) | Pasting a second Variant Price or both Tags and tags columns from two templates |
| `duplicate_row` | Warning | A whole data row is identical to an earlier one after trimming whitespace and lowercasing | Copy-pasting the same variant line twice, or merging two exports without de-duping |
| `duplicate_key` | Warning | Same value in a column whose name matches the unique-key heuristic (Variant SKU, Variant Barcode, names containing id) | Two different products sharing one SKU, or a barcode collision across variants |
| `type_mismatch` | Warning | A cell does not match the column's dominant number / boolean / date profile (85% dominance) | £12.99 or 12,99 in Variant Price; out of stock in Variant Inventory Qty; TRUE/FALSE mixed with blanks in Published |
| `empty_cell` | Warning | An empty cell in a column that is at least 85% populated | A skipped Variant Price mid-file; deliberately-blank child-row columns stay quiet because they exceed the 15% empty threshold |
| `suspicious_value` | Warning | A value appearing once that is substring-similar to a value appearing twice or more, in a 2–12 distinct-value categorical column | active (trailing space) next to active in Status; Default Title vs Default Title in Option1 Value |
Shopify columns the validator profiles
How common Shopify product-CSV columns are inferred and what the validator flags. The unique-key detector matches column names case-insensitively and separator-insensitively (Variant SKU, variant_sku, variant-sku all match the sku token).
| Shopify column | Inferred type | Unique-key checked? | Common flag |
|---|---|---|---|
| `Handle` | string | No (no id/sku token) | Not treated as a key, so genuinely repeated Handle rows (one per variant) are correct and won't be flagged as duplicate-key; only whole-row duplicates surface |
| `Variant SKU` | string | Yes (sku token) | duplicate_key when two variants share a SKU — fix before import or Shopify merges inventory unexpectedly |
| `Variant Barcode` | string (digits) | Yes (barcode token) | duplicate_key on barcode collisions; an Excel-dropped leading zero shows as a shorter, possibly-colliding value |
| `Variant Price` / `Variant Compare At Price` | number | No | type_mismatch on £12.99, $12.99, 12,99 (EU comma decimal), or 12.99 GBP — Shopify expects a bare decimal like 12.99 |
| `Variant Inventory Qty` | number | No | type_mismatch on text like out of stock / TBD; note Shopify often manages inventory separately, so blanks may be legitimate |
| `Published` / `Variant Requires Shipping` / `Variant Taxable` | boolean | No | type_mismatch when TRUE/FALSE is mixed with 1/0 or yes/no — the parser reads true/false/yes/no/y/n as boolean, so a stray published string stands out |
| `Body (HTML)` | string | No | Source of most row_width errors — embedded commas, quotes, and line breaks in the HTML split the row unless the cell is properly double-quoted |
| `Tags` | string | No | Usually high-cardinality, so it skips the suspicious-value scan (too many distinct values); duplicate header (Tags + tags) is the realistic flag |
What the validator catches vs what Shopify import catches
The two layers are complementary. Fixing structural issues here doesn't guarantee a clean import, but skipping it means burning an import attempt to surface them.
| Check | JAD CSV Validator | Shopify Import |
|---|---|---|
Row split by comma/line-break in Body (HTML) | Yes — row_width error before import | Skips the product, vague banner, no row number |
| Encoding mojibake in titles/descriptions | Yes — encoding error before import | Often silent — product imports with garbled text |
Currency symbol or text in Variant Price | Yes — type_mismatch warning | Row error or price defaults oddly |
Duplicate Variant SKU / Variant Barcode | Yes — duplicate_key warning | May merge inventory across products |
A column missing entirely (no Variant Price column) | No — it profiles only the columns present | Yes — import rejects or defaults the field |
| Image URL reachable / image actually exists | No — content not fetched | Yes — broken image URLs flagged at import |
| Valid product type / collection rules | No — business logic out of scope | Yes — Shopify validates against your store |
| File size cap | Free 2 MB / Pro 100 MB | Shopify caps product CSV at 15 MB / 50,000 variants per import |
Cookbook
Real Shopify export scenarios mapped to the exact validator output. Column names match Shopify's documented product-CSV headers verbatim; every code block is a representative excerpt, not a full file.
Comma inside Body (HTML) splits a product row
ExampleShopify's Body (HTML) column holds rich HTML. If a cell contains an unescaped comma and the export wasn't quoted, the row splits and every column shifts. The validator reports it as a row_width error with the exact column-count delta.
Header (6 cols):
Handle,Title,Body (HTML),Vendor,Variant Price,Variant SKU
Row (7 cols — one extra comma inside the HTML):
widget,Blue Widget,<p>Durable, compact frame</p>,Acme,12.99,WID-BLU
^ this comma was inside Body (HTML)
but the cell was not quoted
Validator output:
Row 2, Col 1 (row width)
type: row_width severity: error
detail: "Row has 7 columns; header has 6."
Fix: re-export from Shopify (Shopify quotes the field correctly), or in
the spreadsheet wrap Body (HTML) cells in double quotes before saving.Currency symbol in Variant Price
ExampleShopify expects bare decimals in Variant Price — no symbol, no thousands separator, no inline currency code. The validator infers the column as number and flags every cell that doesn't parse.
Header:
Handle,Variant Price,Variant Inventory Qty
Rows:
widget-blue,12.99,100 OK
widget-red,£14.99,50 type_mismatch (currency symbol)
widget-green,15,40 OK (whole numbers are fine)
widget-black,18,50 GBP,30 type_mismatch (inline code + EU comma)
Validator output (2 issues):
Row 3: Expected number based on the column profile, but found "£14.99".
Row 5: Expected number based on the column profile, but found "18,50 GBP".
Fix: csv-find-replace regex [£$€¥] -> (empty), then fix EU comma decimals
with (\d+),(\d+)$ -> $1.$2 before re-importing.Duplicate Variant SKU across two products
ExampleTwo different products that share a Variant SKU cause Shopify to merge inventory tracking. Because the column name contains the sku token, it's checked as a unique key.
Header:
Handle,Title,Variant SKU,Variant Price
Rows:
blue-tee,Blue Tee,TEE-001,19.99
red-tee,Red Tee,TEE-001,19.99 <- same SKU, different product
Validator output:
Row 3, Col 3 Variant SKU
type: duplicate_key severity: warning
detail: 'Value "TEE-001" also appears on row 2.
This column looks like a unique key.'
Fix: give each variant a unique SKU, or de-dupe with csv-deduplicator on
the Variant SKU column if the second row is a true duplicate.Mojibake from an Excel CP1252 save
ExampleEditing a Shopify export in Excel-for-Windows and saving as plain 'CSV (Comma delimited)' can write CP1252. The validator reads the file as UTF-8 (browser File.text() always decodes UTF-8), so CP1252 bytes for accented characters surface as the replacement character.
Header:
Handle,Title,Vendor
Rows (CP1252 bytes decoded as UTF-8):
cafe-mug,Caf\uFFFD Mug,Acme encoding error (was 'Café')
creme-set,Cr\uFFFDme Brulee Set,Acme encoding error (was 'Crème')
Validator output:
Row 2, Col 2 (Title): Cell contains a replacement or control character,
which usually means an encoding problem.
Row 3, Col 2 (Title): same.
Fix: re-save from Excel as 'CSV UTF-8 (Comma delimited)', or re-export
from Shopify (always UTF-8). The csv-cleaner tool can also normalise
text before re-import.Suspicious near-duplicate in a low-cardinality column
ExampleWhen a column has only a handful of distinct values, the validator runs a suspicious-value scan: a value that appears once and is substring-similar to a value appearing two-plus times is flagged. Trailing-space typos in Status-style columns are the classic case.
Header:
Handle,Status
Rows (12 rows, Status column has 2 common values + 1 rare):
widget-1,active
widget-2,active
... (8 more 'active' / 'draft' rows)
widget-12,active <- trailing space, appears once
^ note the trailing space
Validator output:
Row 12, Col 2 Status
type: suspicious_value severity: warning
detail: 'Rare value "active" looks close to common category "active".'
Fix: trim the trailing space with csv-whitespace-trimmer (or csv-find-replace),
then re-validate so the value count collapses to the real two values.Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Repeated `Handle` rows (one per variant) are NOT flagged as duplicate keys
By designShopify's multi-row product format repeats the Handle across every variant of a product, so a 4-variant product has 4 rows sharing one Handle. Because Handle does not contain an id / sku / email token, the unique-key detector does not treat it as a key — so those legitimate repeats produce no duplicate_key warnings. Only the whole-row duplicate check could fire, and only if two variant rows are byte-for-byte identical. This is correct behaviour; do not try to de-dupe by Handle.
Unescaped comma or line break in `Body (HTML)` splits the row
row_width errorTab/comma-delimited files can't hold a literal delimiter or hard newline inside a cell unless the cell is double-quoted. Rich-text Body (HTML) pasted from a CMS often carries both. The validator reports the affected row with row_width and the exact column-count delta. The cleanest fix is to re-export from Shopify (which quotes the field), or wrap Body (HTML) cells in double quotes; replacing the inner commas is a last resort because it edits your content.
Excel-for-Europe wrote a semicolon-delimited file
SupportedOn systems with a European locale, Excel often saves CSVs using semicolons as the field separator. PapaParse auto-detects the delimiter, so a semicolon file validates correctly as long as it's consistent. If a file mixes delimiters (some lines comma, some semicolon), the detector locks onto one and the other lines show up as row_width errors — that's the signal to re-export cleanly.
Blank `Variant Inventory Qty` or `Variant Compare At Price` on child rows
Expected emptiness — verify, don't bulk-fillMany Shopify columns are legitimately blank on variant child rows or for products that don't use inventory tracking. The empty_cell warning only fires when a column is at least 85% populated, so a mostly-blank optional column stays quiet. If you do see empty-cell warnings on Variant Price, those are real — a priced column with a few blanks. Verify each warning corresponds to an intentionally-untracked variant before changing anything.
Currency symbol or thousands separator in `Variant Price`
type_mismatch warningThe number classifier strips a leading $ and removes commas before testing, so $1,299 parses as a number, but £1,299 (non-dollar symbol) does not and is flagged. EU comma-decimal like 12,99 also fails because the comma is stripped, leaving 1299, which is a different number than intended. The validator can only tell you the cell isn't a clean decimal — fix the formatting in source so the price is unambiguous.
A required Shopify column is missing entirely
Not detectedThe validator profiles only the columns that exist in the file. If you deleted the Variant Price column, the validator simply won't report on it — there is no schema to compare against. You are responsible for matching Shopify's expected header set. Open Shopify's sample product CSV, compare the header row, and add any missing columns before import.
File exceeds the free 500-row limit
Blocked on free tierFree tier is capped at 2 MB and 500 rows, where a row is a CSV line. Because Shopify spreads one product across multiple variant rows, even a modest catalog blows past 500 lines quickly (e.g. 170 products averaging 3 variants = ~680 lines). The runner blocks the file before validating and shows the limit message. Pro raises this to 100 MB / 100,000 rows; to validate a large file on free tier, split it with csv-row-splitter and validate each chunk.
Only the first 250 issues are reported
By designThe validator caps the reported issue list at 250 to keep the browser responsive, and the Issues tile reflects that capped count. If a file has a single systemic problem — wrong delimiter, a whole-column type drift, an encoding mismatch — you can easily exceed 250. Fix that one root cause and re-validate; the count usually drops to single digits. The downloaded JSON report contains the same capped issue list.
`.xlsx` workbook dropped instead of a CSV
Unsupported inputThe validator reads text formats only (.csv, .tsv, .txt) via the browser's text decoder — it does not parse Excel binary workbooks. Dropping an .xlsx produces garbage rows or a parse failure, not a useful report. Save your workbook to CSV first (CSV UTF-8 recommended), then validate the saved file.
Frequently asked questions
Will this catch the 'We couldn't import some of your products' error from Shopify?
It catches the most common structural causes of that banner before you import: rows split by an unescaped comma or line break in Body (HTML) (flagged as row_width errors) and encoding corruption in titles or descriptions (flagged as encoding errors). Shopify's banner doesn't tell you which row failed; this report does, with a Row and Col reference. It can't catch business-rule rejections (an invalid collection, a broken image URL) — those still need Shopify's own import to surface.
Do I need to convert my Shopify export to a specific format first?
No. The validator accepts .csv, .tsv, and .txt, and PapaParse auto-detects whether the file uses commas, tabs, or semicolons as the field separator. A standard Shopify export (comma) and an Excel-for-Europe re-save (semicolon) both validate without renaming. The only requirement is that it's a text file — .xlsx workbooks must be saved to CSV first.
Why are my repeated `Handle` rows not flagged as duplicates?
That's intentional and correct. Shopify lists each variant of a product on its own row sharing the same Handle, so repeats are expected. The unique-key detector only treats a column as a key when its name contains a token like id, sku, email, barcode, upc, or ean — and Handle contains none of those. So legitimate variant rows produce no duplicate_key warnings. Only a byte-for-byte identical whole row would trigger a duplicate_row warning.
Can it tell me if my `Variant Price` column has bad values?
Yes. If most of the Variant Price column parses as numbers, the validator infers the column type as number and flags any cell that doesn't — £14.99, 12,99 (EU comma decimal), 18.50 GBP, or text. The number classifier does strip a leading $ and remove commas, so $1,299 is accepted, but a non-dollar currency symbol fails. The detail string shows the exact offending value so you can find it in your spreadsheet.
How does it decide a column is a number versus a string?
For each column it classifies every non-empty cell as number, boolean, date, or string, then takes the dominant class if it covers at least 85% of cells; otherwise the column is mixed. So a Variant Price column that's 99% decimals and 1% £14.99 is inferred as number, and that 1% gets type_mismatch warnings. If the column profile shows string for a price column, it means most of it is non-numeric — the opposite of what you'd want.
Does it detect a missing required column, like a deleted `Variant Price`?
No. The validator profiles only the columns present in the file; there's no built-in Shopify schema to compare against, so a deleted column simply isn't reported. You're responsible for matching Shopify's expected header set — compare your header row against Shopify's sample product CSV before importing. The validator's job is the shape and content of the columns you do have.
Is my product data sent anywhere?
No. The file is read with the browser's own File.text() and parsed by PapaParse entirely client-side. Cell contents — titles, costs, SKUs, inventory counts — never reach a JAD server. The tool is marked 100% client-side. The only thing recorded server-side is a usage counter for signed-in dashboard stats, never the file contents.
Can I download the health report?
Yes. The 'Download report' button saves a <filename>.health-report.json containing the full result object: stats (rows, columns, issue counts, duration), the per-column profile, every reported issue with its row/column references, the health score, and a 10-row preview. The issue list in the file is the same capped-at-250 list shown in the UI. Keep it in version control alongside the CSV for an audit trail before re-importing.
What does the health score actually measure?
It's round((1 - reportedIssues / totalCells) * 100), where total cells is data rows times columns. So it's a density measure — the share of cells that are clean — not a pass/fail. The score badge turns green at 90% and amber below it, but a high score can still hide a single row_width error that will make Shopify skip a product. Always read the errors panel rather than trusting the score alone.
Why does the validator only show 8 issues in the panel?
The on-page 'Top issues' panel shows the first 8 for a quick read; the Issues tile shows the total reported count (capped at 250). For the complete list, download the JSON report. If you're seeing far more than 8 and they all look alike, you usually have one systemic problem — a mis-detected delimiter or a whole-column type drift — so fix that root cause and re-run.
It flagged a value as 'suspicious' but it looks fine — what's that about?
The suspicious-value scan only runs on low-cardinality columns (a string/mixed column with 2–12 distinct values and not too many uniques). It flags a value that appears exactly once and is substring-similar to a value appearing two-plus times — the classic case is a trailing space or stray character, like active next to active in a status column. It's a high-precision tip-off for typos, not exhaustive validation; trim and re-validate to confirm.
Where do I fix the issues — in the CSV or in Shopify?
Fix in your source spreadsheet, then re-export to CSV. For mechanical fixes you can chain JAD tools: strip currency symbols with csv-find-replace, remove duplicate variant rows with csv-deduplicator, trim stray whitespace with csv-whitespace-trimmer, and rename mismatched headers with csv-header-rename. Re-drop the cleaned file here to confirm errors is zero, then import in Shopify admin.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.