How to extract a product price list from PDF to CSV
- Step 1: Check the catalogue has selectable text. Try selecting a SKU or price in the PDF. If the text highlights, the catalogue is born-digital and extracts well. If it's a scan, run OCR first to add a text layer.
- Step 2: Drop the price-list PDF onto the converter. Add the file above. There are no options: extraction runs as soon as the file is read, in your browser, so pricing never leaves your device.
- Step 3: Review the extracted rows. The preview shows the first ~5,000 characters of CSV. Confirm product rows came through and spot the category headers, repeated column headers, and footers you'll need to remove.
- Step 4: Download the CSV. Click Download; the file saves as .txt with CSV contents. Rename it to .csv so your import template recognises it.
- Step 5: Clean the data for import. Remove category-header rows, repeated column headers, and page footers; split merged cells; strip currency symbols from prices; and rename headers to match your platform's import columns.
- Step 6: Import into your platform. Upload the cleaned CSV to Shopify (Products → Import), WooCommerce (Product CSV importer), or your ERP, mapping each column to the destination field.
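If you would rather script Step 4 than rename the file by hand, the following is a minimal sketch. It assumes the download is UTF-8 text, and the function name `txt_to_csv` is hypothetical, not part of the tool. It reads the .txt, drops blank lines, and writes a fully quoted .csv next to it.

```python
import csv
from pathlib import Path


def txt_to_csv(txt_path: Path) -> Path:
    """Write a .csv copy of the downloaded .txt, dropping blank lines.

    Assumes the download is UTF-8 and already CSV-formatted.
    """
    csv_path = txt_path.with_suffix(".csv")
    with txt_path.open(newline="", encoding="utf-8") as src:
        # csv.reader yields an empty list for a blank line; skip those.
        rows = [row for row in csv.reader(src) if row]
    with csv_path.open("w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, quoting=csv.QUOTE_ALL).writerows(rows)
    return csv_path
```

Dropping blank lines also removes the separators between page blocks, so skip that filter if you still need to tell pages apart.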
Catalogue quirks and how to handle them
Supplier price lists are rarely clean grids. These are the common messes and the cleanup each needs.
| Quirk | How it extracts | Cleanup |
|---|---|---|
| Category headers (e.g. 'Fasteners') | Extract as one-cell rows between product blocks | Delete, or copy into a Category column before deleting |
| Column header repeated on every page | Repeats as a row at the top of each page block | Keep the first; delete the repeats before import |
| Merged 'from / to' or tiered-price cells | Misaligned: the row yields fewer fragments, so later columns shift left | Split into separate columns manually |
| Per-page footer (page numbers, contact info) | Extracts as a trailing row per page | Delete the footer rows |
| Prices with currency symbols (£, $, €) | Kept with the value as text | Strip symbol + separators; format as number |
Mapping to common import templates
After cleanup, rename the extracted columns to what each platform expects.
| Platform | Typical key columns to map to |
|---|---|
| Shopify | Handle, Title, Variant SKU, Variant Price |
| WooCommerce | SKU, Name, Regular price |
| Magento | sku, name, price |
| Generic ERP / procurement | Item code, Description, Unit cost |
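As a rough illustration of the rename step, here is a sketch that maps extracted headers to platform headers. It assumes the extracted header row reads Code, Description, Price, as in the cookbook example; the mapping dictionaries and the `rename_headers` helper are hypothetical names. It does not generate a Shopify Handle, which you would still need to add.

```python
# Assumed extracted headers (left) mapped to platform headers (right).
SHOPIFY_COLUMNS = {"Code": "Variant SKU", "Description": "Title", "Price": "Variant Price"}
WOOCOMMERCE_COLUMNS = {"Code": "SKU", "Description": "Name", "Regular price": "Regular price", "Price": "Regular price"}


def rename_headers(rows: list[list[str]], mapping: dict[str, str]) -> list[list[str]]:
    """Return a copy of rows with the first (header) row renamed via mapping.

    Headers missing from the mapping are left unchanged.
    """
    header, *body = rows
    return [[mapping.get(name, name) for name in header], *body]


rows = [["Code", "Description", "Price"], ["FB-100", "M6 hex bolt", "4.20"]]
print(rename_headers(rows, SHOPIFY_COLUMNS)[0])
# ['Variant SKU', 'Title', 'Variant Price']
```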
File-size and page limits by tier
Long catalogues can exceed the lower tiers' limits; check the limit before uploading.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 pages |
| Pro | 50 MB | 500 pages |
| Pro + Media | 500 MB | 2,000 pages |
| Developer | 2 GB | 10,000 pages |
| Enterprise | Unlimited | Unlimited |
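To check a catalogue against the table above before uploading, a small lookup like the following can help. It is a sketch under two assumptions: that 1 MB means 1024 × 1024 bytes (the page doesn't say), and that a file must fit both the size and the page limit. The `smallest_tier` function is a hypothetical helper, not part of the tool.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB

# (tier name, max bytes, max pages); None means unlimited.
TIERS = [
    ("Free", 2 * MB, 50),
    ("Pro", 50 * MB, 500),
    ("Pro + Media", 500 * MB, 2_000),
    ("Developer", 2 * GB, 10_000),
    ("Enterprise", None, None),
]


def smallest_tier(size_bytes: int, pages: int) -> str:
    """Return the first tier whose size and page limits both fit."""
    for name, max_bytes, max_pages in TIERS:
        fits_size = max_bytes is None or size_bytes <= max_bytes
        fits_pages = max_pages is None or pages <= max_pages
        if fits_size and fits_pages:
            return name
    raise ValueError("no tier fits")  # unreachable while an unlimited tier is listed


print(smallest_tier(3 * MB, 40))   # Pro
print(smallest_tier(1 * MB, 600))  # Pro + Media
```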
Cookbook
Real price-list extractions. Output is shown verbatim so you can see the category rows, repeated headers, and currency formatting you'll clean up.
A clean product table
A well-aligned catalogue table extracts cleanly: code, description, price each in its column.
CSV output:
"Code","Description","Price"
"FB-100","M6 hex bolt (box 100)","4.20"
"FB-205","M8 hex bolt (box 100)","5.80"
"WS-010","M6 washer (box 200)","2.10"
Category headers between product blocks
A catalogue grouped by category extracts the category names as their own rows. Copy them into a Category column before deleting, if you want to preserve grouping.
CSV output:
"Fasteners"
"FB-100","M6 hex bolt","4.20"
"FB-205","M8 hex bolt","5.80"
"Washers"
"WS-010","M6 washer","2.10"
→ "Fasteners" and "Washers" are category rows, not products.
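One way to carry the category into its own column is sketched below. It assumes every one-cell row is a category header; footers and wrapped descriptions can also extract as one-cell rows, so remove or merge those first. The `tag_categories` name is hypothetical.

```python
def tag_categories(rows: list[list[str]]) -> list[list[str]]:
    """Append the most recent category name to each product row.

    Assumes a one-cell row is a category header; such rows are dropped
    from the output after their text is remembered.
    """
    category = ""
    tagged = []
    for row in rows:
        if len(row) == 1:
            category = row[0]
            continue
        tagged.append([*row, category])
    return tagged


rows = [
    ["Fasteners"],
    ["FB-100", "M6 hex bolt", "4.20"],
    ["FB-205", "M8 hex bolt", "5.80"],
    ["Washers"],
    ["WS-010", "M6 washer", "2.10"],
]
for row in tag_categories(rows):
    print(row)
# ['FB-100', 'M6 hex bolt', '4.20', 'Fasteners']
# ['FB-205', 'M8 hex bolt', '5.80', 'Fasteners']
# ['WS-010', 'M6 washer', '2.10', 'Washers']
```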
Currency symbols on prices
Prices keep their symbol as text. Strip it so the importer reads a number.
Extracted: "£4.20"
Clean in Excel: Find & Replace "£" → empty, then format the Price column as Number / Currency.
Shopify/WooCommerce expect a bare number: 4.20
Merged tiered-price cell misaligns
A tiered price spanning two visual columns (e.g. 1-9 / 10+) extracts with fewer fragments, shifting later columns. Split manually.
PDF row: PV-50 Pump valve 12.00 10.50
(10.50 = price for qty 10+)
CSV (may shift if the tier cell is merged):
"PV-50","Pump valve","12.00 10.50"
→ split "12.00 10.50" into two price columns.
Scanned catalogue needs OCR
A photographed or scanned price sheet has no text layer; OCR first, then extract.
Input: supplier_pricelist_scan.pdf
Output: (empty)
Fix:
1. /pdf-tools/pdf-ocr
2. re-run this tool → product rows extract
(check OCR'd SKUs and prices carefully)
Edge cases and what actually happens
Category headers mixed into the rows
Noisy output: A catalogue grouped by category extracts the category names as one-cell rows because the tool returns every line of text by position. Before deleting them, consider copying each into a Category column so you don't lose the grouping your store needs.
Column header repeats on every page
Manual fixup: Each page is processed independently, so a header row printed at the top of every page repeats in the output. Keep the first occurrence and delete the rest before import.
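Removing the repeats is easy to script. This sketch assumes the first row of the output is the column header and that each repeat matches it exactly; `drop_repeated_headers` is a hypothetical helper name.

```python
def drop_repeated_headers(rows: list[list[str]]) -> list[list[str]]:
    """Keep the first row as the header and drop later rows identical to it."""
    if not rows:
        return []
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]


rows = [
    ["Code", "Description", "Price"],
    ["FB-100", "M6 hex bolt", "4.20"],
    ["Code", "Description", "Price"],  # repeated at the top of page 2
    ["WS-010", "M6 washer", "2.10"],
]
print(len(drop_repeated_headers(rows)))  # 3
```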
Merged or tiered-price cells misalign
Manual fixup: Columns are rebuilt per row from text positions, not a shared grid. A merged cell or a tiered price block has a different fragment count, so later columns shift. Split those rows into the correct columns manually.
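When the merged cell holds space-separated prices, as in the "12.00 10.50" cookbook row, the split can be scripted. The sketch below assumes the tier prices share one cell at a known column index and are separated by whitespace; `split_tiered_price` and its parameters are hypothetical.

```python
def split_tiered_price(row: list[str], price_index: int = 2, tiers: int = 2) -> list[str]:
    """Split a whitespace-separated price cell into one column per tier.

    Rows with fewer prices than tiers are padded with empty strings so
    every row ends up with the same number of columns.
    """
    parts = row[price_index].split()
    parts += [""] * max(0, tiers - len(parts))
    return [*row[:price_index], *parts, *row[price_index + 1:]]


print(split_tiered_price(["PV-50", "Pump valve", "12.00 10.50"]))
# ['PV-50', 'Pump valve', '12.00', '10.50']
print(split_tiered_price(["FB-100", "M6 hex bolt", "4.20"]))
# ['FB-100', 'M6 hex bolt', '4.20', '']
```

Rows where the shift has already pushed values into the wrong columns still need a manual check.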
Currency symbols block numeric import
Manual fixup: Prices keep their symbol and thousands separator as text (£1,299.00). E-commerce importers expect a bare number, so strip the symbol and separators and format the column as Number before uploading.
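The same clean-up can be done in code. This sketch assumes UK/US formatting, where the comma is a thousands separator and the dot is the decimal point; a price written as 1.299,00 would need different handling. The `clean_price` name is hypothetical.

```python
import re
from decimal import Decimal


def clean_price(text: str) -> str:
    """Strip currency symbols, commas, and spaces, returning a bare number.

    Decimal validates the result and raises decimal.InvalidOperation if
    what remains is not a number.
    """
    stripped = re.sub(r"[£$€,\s]", "", text)
    return str(Decimal(stripped))


print(clean_price("£1,299.00"))  # 1299.00
print(clean_price("£4.20"))      # 4.20
```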
Scanned or image-based catalogue
No text found: Image-only price lists have no selectable text, so extraction returns nothing. OCR adds a text layer; then verify OCR'd SKUs and prices, since a misread digit changes a price.
Long product descriptions wrap onto two rows
By design: A description that wraps to a second visual line sits at a different Y-position and becomes a separate CSV row. Merge it back into the product row above before import.
Footers and contact blocks extract as rows
Noisy output: Page footers (page numbers, supplier contact details) print at the bottom of each page and extract as trailing rows. Delete them so only product rows reach your import template.
Catalogue exceeds the free 2 MB / 50-page limit
Blocked on free tier: The free tier caps at 2 MB / 50 pages, which a full catalogue can exceed. Upgrade (Pro is 50 MB / 500 pages), or extract the relevant pages first and process that subset.
Frequently asked questions
Does it export CSV directly?
The output is CSV-formatted text — every cell quoted, comma-separated — but it downloads with a .txt extension. Rename it to .csv and it imports anywhere CSV is accepted (Shopify, WooCommerce, Magento, an ERP). You can also open it in Excel/Sheets and re-save as .csv after cleanup.
What if the PDF has multiple price tables, one per category?
Each page is extracted in turn and appended with a blank line between blocks; category names that print as headings come through as their own rows. There are no separate sheets — it's one CSV stream. Use the category rows to split or tag products during cleanup, then import.
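Because pages are separated by a blank line, you can split the stream back into per-page blocks before cleaning. The following is a sketch that relies only on that blank-line separator; `split_page_blocks` is a hypothetical helper.

```python
import csv
import io


def split_page_blocks(text: str) -> list[list[list[str]]]:
    """Split CSV text into blocks of rows, one block per blank-line gap."""
    blocks: list[list[list[str]]] = []
    current: list[list[str]] = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            current.append(row)
        elif current:  # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


text = '"Fasteners"\n"FB-100","M6 hex bolt","4.20"\n\n"Washers"\n"WS-010","M6 washer","2.10"\n'
print(len(split_page_blocks(text)))  # 2
```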
Will product codes and descriptions stay aligned with prices?
For clean, grid-aligned catalogues, yes — codes, descriptions, and prices land on the same row. Rows with merged cells, tiered prices, or wrapped descriptions can misalign because columns are rebuilt from text positions, so budget a short alignment pass on those.
How do I get prices to import as numbers?
Strip the currency symbol and thousands separators (Find & Replace £ / , → empty) and format the column as Number. E-commerce importers expect a bare numeric price like 4.20, not £4.20.
Does it work on scanned price sheets?
Not directly — a scan has no text layer, so you'll get nothing. Run OCR first, then re-run this tool. Verify OCR'd SKUs and prices carefully, because a single misread digit changes a price.
Can I extract just one category or page?
No — there are no options; the tool processes the whole document. To limit input, extract the pages you need first, then run the extraction on that subset.
How do I map the columns to Shopify or WooCommerce?
After cleanup, rename the extracted columns to the platform's import names — Shopify uses Handle / Title / Variant SKU / Variant Price; WooCommerce uses SKU / Name / Regular price. The importer then maps each column to its product field.
Is my pricing data uploaded anywhere?
No. Extraction runs entirely in your browser via pdf.js, so your buy prices and supplier terms never reach a server, which matters for commercially sensitive supplier pricing. Only anonymous usage counters are recorded, and only when you're signed in.
Why do I have to clean up the output so much?
The tool returns every line of text on each page by position — it doesn't know which lines are products and which are category headers, repeated column headers, or footers. That's why a cleanup pass is normal. On a stable supplier template the same junk rows land in the same places each update, so you can templatise the cleanup.
How large a catalogue can I process?
Free tier handles up to 2 MB / 50 pages. Pro raises that to 50 MB / 500 pages, with higher tiers above — enough for most full catalogues.
Can I get JSON instead, for an integration?
Yes. The "extract a price-list table to JSON" tool returns structured objects keyed on the header row, which are easier to push into a store API or sync script than CSV. Use CSV for spreadsheet imports, JSON for code.
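To see what "objects keyed on the header row" looks like, here is a sketch that builds that shape from cleaned CSV text using Python's standard library. It only illustrates the data shape and is not the JSON tool's implementation; `rows_to_json` is a hypothetical name.

```python
import csv
import io
import json


def rows_to_json(csv_text: str) -> str:
    """Turn CSV text into a JSON array of objects keyed on the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


csv_text = '"Code","Description","Price"\n"FB-100","M6 hex bolt","4.20"\n'
print(rows_to_json(csv_text))
```

Values stay as strings here, so convert prices to numbers separately if your integration needs them.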
Are formulas or tiered-pricing rules preserved?
No. PDFs store printed values only. Tiered prices come through as the printed numbers (and may need splitting into columns); you'll need to rebuild any pricing logic in your store or spreadsheet.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.