Extract a PDF Price List to CSV (Excel) — Free Online

How to extract a product price list from PDF to CSV

Step 1
Check the catalogue has selectable text — Try selecting a SKU or price in the PDF. If text highlights, the catalogue is born-digital and extracts well. If it's a scan, run OCR first to add a text layer.
Step 2
Drop the price-list PDF onto the converter — Add the file above. There are no options — extraction runs as soon as the file is read, in your browser, so pricing never leaves your device.
Step 3
Review the extracted rows — The preview shows the first ~5,000 characters of CSV. Confirm product rows came through and spot the category headers, repeated column headers, and footers you'll need to remove.
Step 4
Download the CSV — Click Download — the file saves as .txt with CSV contents. Rename to .csv so your import template recognises it.
Step 5
Clean the data for import — Remove category-header rows, repeated column headers, and page footers; split merged cells; strip currency symbols from prices; and rename headers to match your platform's import columns.
Step 6
Import into your platform — Upload the cleaned CSV to Shopify (Products → Import), WooCommerce (Product CSV importer), or your ERP, mapping each column to the destination field.
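Steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming the download is a UTF-8 text file with one CSV row per line; the function names `rename_to_csv` and `load_rows` are illustrative and not part of the tool.

```python
import csv
import io
from pathlib import Path


def rename_to_csv(txt_path: Path) -> Path:
    """Rename the downloaded .txt file so it carries a .csv extension."""
    csv_path = txt_path.with_suffix(".csv")
    txt_path.rename(csv_path)
    return csv_path


def load_rows(csv_text: str) -> list[list[str]]:
    """Parse CSV text into rows, skipping the blank lines between blocks."""
    return [row for row in csv.reader(io.StringIO(csv_text)) if row]
```

Loading the rows into plain lists first makes the later cleanup passes (junk rows, currency symbols, header renames) easy to apply one at a time.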

Catalogue quirks and how to handle them

Supplier price lists are rarely clean grids. These are the common messes and the cleanup each needs.

Quirk	How it extracts	Cleanup
Category headers (e.g. 'Fasteners')	Extract as one-cell rows between product blocks	Delete, or copy into a Category column before deleting
Column header repeated on every page	Repeats as a row at the top of each page block	Keep the first; delete the repeats during cleanup, before import
Merged 'from / to' or tiered-price cells	Misalign — fewer fragments shift later columns left	Split into separate columns manually
Per-page footer (page numbers, contact info)	Extracts as a trailing row per page	Delete the footer rows
Prices with currency symbols (£, $, €)	Kept with the value as text	Strip symbol + separators; format as number
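Most of the quirks in the table show up as rows whose cell count differs from the product rows. As a rough sketch (the `triage` helper is hypothetical, and it assumes product rows are the most common row width), you could flag suspect rows for review like this:

```python
import csv
import io
from collections import Counter


def triage(csv_text: str) -> tuple[int, list[tuple[int, list[str]]]]:
    """Return the most common row width and the rows that deviate from it."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    widths = Counter(len(row) for row in rows)
    expected = widths.most_common(1)[0][0]  # assumed to be the product-row width
    suspects = [(i, row) for i, row in enumerate(rows) if len(row) != expected]
    return expected, suspects
```

The flagged rows still need a human decision: a one-cell row may be a category header worth keeping as a tag, or a footer to delete.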

Mapping to common import templates

After cleanup, rename the extracted columns to what each platform expects.

Platform	Typical key columns to map to
Shopify	Handle, Title, Variant SKU, Variant Price
WooCommerce	SKU, Name, Regular price
Magento	sku, name, price
Generic ERP / procurement	Item code, Description, Unit cost
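As an illustration of the renaming step, here is a sketch that maps the extracted `Code`, `Description`, and `Price` columns to the Shopify names from the table. The choice of which extracted column feeds which field, and the slug rule used to derive `Handle` from the title, are assumptions for this example rather than Shopify requirements.

```python
import csv
import io
import re

# Assumed mapping from extracted headers to the Shopify columns in the table.
SHOPIFY_MAP = {"Code": "Variant SKU", "Description": "Title", "Price": "Variant Price"}


def slugify(title: str) -> str:
    """Lower-case the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def to_shopify(csv_text: str) -> str:
    """Rewrite cleaned CSV text with Shopify-style column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fields = ["Handle", "Title", "Variant SKU", "Variant Price"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        mapped = {SHOPIFY_MAP[k]: v for k, v in row.items() if k in SHOPIFY_MAP}
        mapped["Handle"] = slugify(mapped["Title"])  # hypothetical handle rule
        writer.writerow(mapped)
    return out.getvalue()
```

The same pattern works for the other platforms by swapping the mapping dictionary and field list.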

File-size and page limits by tier

Long catalogues can produce large files; check the limit for your tier before uploading.

Tier	Max file size	Max pages
Free	2 MB	50 pages
Pro	50 MB	500 pages
Pro + Media	500 MB	2,000 pages
Developer	2 GB	10,000 pages
Enterprise	Unlimited	Unlimited
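To check a file against the table before uploading, a small lookup is enough. This sketch assumes binary units (1 MB = 1,048,576 bytes), which the table does not specify, and the `smallest_tier` helper is hypothetical.

```python
MB = 1024 * 1024  # assumption: the limits use binary megabytes

# (tier name, max bytes, max pages), in ascending order, from the table above.
TIER_LIMITS = [
    ("Free", 2 * MB, 50),
    ("Pro", 50 * MB, 500),
    ("Pro + Media", 500 * MB, 2_000),
    ("Developer", 2 * 1024 * MB, 10_000),
]


def smallest_tier(size_bytes: int, pages: int) -> str:
    """Return the lowest tier whose size and page limits both fit the file."""
    for name, max_bytes, max_pages in TIER_LIMITS:
        if size_bytes <= max_bytes and pages <= max_pages:
            return name
    return "Enterprise"  # listed as unlimited
```

The file size can come from `os.path.getsize`; the page count has to come from your PDF viewer or a PDF library.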

Cookbook

Real price-list extractions. Output is shown verbatim so you can see the category rows, repeated headers, and currency formatting you'll clean up.

A clean product table

A well-aligned catalogue table extracts cleanly: code, description, price each in its column.

CSV output:
"Code","Description","Price"
"FB-100","M6 hex bolt (box 100)","4.20"
"FB-205","M8 hex bolt (box 100)","5.80"
"WS-010","M6 washer (box 200)","2.10"

Category headers between product blocks

A catalogue grouped by category extracts the category names as their own rows. Copy them into a Category column before deleting, if you want to preserve grouping.

CSV output:
"Fasteners"
"FB-100","M6 hex bolt","4.20"
"FB-205","M8 hex bolt","5.80"
"Washers"
"WS-010","M6 washer","2.10"

→ "Fasteners" and "Washers" are category rows, not products.
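Copying the category into its own column can be done in one pass. The sketch below assumes every one-cell row is a category header, so footers and other one-cell junk should be removed first; `tag_categories` is an illustrative name.

```python
def tag_categories(rows: list[list[str]]) -> list[list[str]]:
    """Append the most recent one-cell row's text to each product row."""
    category = ""
    tagged = []
    for row in rows:
        if len(row) == 1:
            category = row[0]  # remember the heading, then drop the row
        else:
            tagged.append(row + [category])
    return tagged
```

Applied to the output above, each product row gains a fourth cell holding "Fasteners" or "Washers", and the category rows themselves disappear.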

Currency symbols on prices

Prices keep their symbol as text. Strip it so the importer reads a number.

Extracted:  "£4.20"

Clean in Excel:  Find & Replace "£" → empty,
then format the Price column as Number / Currency.
Shopify/WooCommerce expect a bare number: 4.20
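Outside a spreadsheet, the same cleanup is a short function. This sketch assumes prices use a dot as the decimal mark and a comma as the thousands separator; a list that writes 1.299,00 would need different handling. `clean_price` is an illustrative name.

```python
import re
from decimal import Decimal


def clean_price(text: str) -> Decimal:
    """Drop everything except digits, the decimal point, and a minus sign."""
    return Decimal(re.sub(r"[^\d.\-]", "", text))
```

`Decimal` is used rather than `float` so that a price such as 4.20 keeps its trailing zero and is not subject to binary rounding.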

Merged tiered-price cell misaligns

A tiered price spanning two visual columns (e.g. 1-9 / 10+) extracts with fewer fragments, shifting later columns. Split manually.

PDF row:  PV-50  Pump valve   12.00   10.50
          (10.50 = price for qty 10+)

CSV (may shift if the tier cell is merged):
"PV-50","Pump valve","12.00 10.50"

→ split "12.00 10.50" into two price columns.
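For rows like this one, the split can be sketched as follows, assuming the merged cell separates its tiers with whitespace and that you know which column holds it; `split_tier` and its `price_index` parameter are hypothetical.

```python
def split_tier(row: list[str], price_index: int = 2) -> list[str]:
    """Split a whitespace-joined price cell into one cell per tier."""
    tiers = row[price_index].split()
    return row[:price_index] + tiers + row[price_index + 1:]
```

Rows whose price cell holds a single value pass through unchanged, so the function can be applied to every row.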

Scanned catalogue needs OCR

A photographed or scanned price sheet has no text layer; OCR first, then extract.

Input:  supplier_pricelist_scan.pdf
Output: (empty)

Fix:
  1. /pdf-tools/pdf-ocr
  2. re-run this tool → product rows extract
     (check OCR'd SKUs and prices carefully)

Edge cases and what actually happens

Category headers mixed into the rows

Noisy output

A catalogue grouped by category extracts the category names as one-cell rows because the tool returns every line of text by position. Before deleting them, consider copying each into a Category column so you don't lose the grouping your store needs.

Column header repeats on every page

Manual fixup

Each page is processed independently, so a header row printed at the top of every page repeats in the output. Keep the first occurrence and delete the rest during cleanup, before importing into your platform.
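A sketch of that cleanup, assuming the first row of the output is the column header and that the repeats match it exactly (`drop_repeated_headers` is an illustrative name):

```python
def drop_repeated_headers(rows: list[list[str]]) -> list[list[str]]:
    """Keep the first row as the header and remove later identical rows."""
    if not rows:
        return []
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]
```

If a category row precedes the first header in your catalogue, pass the header row in explicitly instead of relying on position.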

Merged or tiered-price cells misalign

Manual fixup

Columns are rebuilt per row from text positions, not a shared grid. A merged cell or a tiered price block has a different fragment count, so later columns shift. Split those rows into the correct columns manually.

Currency symbols block numeric import

Manual fixup

Prices keep their symbol and thousands separator as text (£1,299.00). E-commerce importers expect a bare number, so strip the symbol and separators and format the column as Number before uploading.

Scanned or image-based catalogue

No text found

Image-only price lists have no selectable text, so extraction returns nothing. OCR adds a text layer — then verify OCR'd SKUs and prices, since a misread digit changes a price.

Long product descriptions wrap onto two rows

By design

A description that wraps to a second visual line sits at a different Y-position and becomes a separate CSV row. Merge it back with the product row above during cleanup.
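One way to merge wrapped lines back is sketched below. It assumes category and footer rows have already been removed, so any row shorter than the product width is a continuation of the description above it; the `width` and `desc_index` defaults match a three-column Code / Description / Price layout and are assumptions.

```python
def merge_wrapped(rows: list[list[str]], width: int = 3, desc_index: int = 1) -> list[list[str]]:
    """Fold short continuation rows into the description of the row above."""
    merged: list[list[str]] = []
    for row in rows:
        if len(row) < width and merged:
            merged[-1][desc_index] += " " + " ".join(row)
        else:
            merged.append(list(row))  # copy so the input is not modified
    return merged
```

Check the result by eye on a few products, since a short row that is not a continuation would be folded in as well.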

Footers and contact blocks extract as rows

Noisy output

Page footers (page numbers, supplier contact details) print at the bottom of each page and extract as trailing rows. Delete them so only product rows reach your import template.
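Footer rows can often be matched by pattern. The patterns in this sketch (a "Page N" prefix, an "N / M" counter, an email address, a "Tel" prefix) are guesses at typical footers, not something the tool defines, so adjust them to your supplier's layout.

```python
import re

# Hypothetical footer patterns; tune these per supplier template.
FOOTER = re.compile(r"^(page\s+\d+|\d+\s*/\s*\d+)|@|^tel[:.\s]", re.IGNORECASE)


def drop_footers(rows: list[list[str]], width: int = 3) -> list[list[str]]:
    """Remove short rows whose text looks like a page footer."""
    kept = []
    for row in rows:
        is_short = len(row) < width
        if is_short and FOOTER.search(" ".join(row)):
            continue
        kept.append(row)
    return kept
```

Only rows narrower than the product width are tested, so a product description that happens to contain "@" is left alone.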

Catalogue exceeds the free 2 MB / 50-page limit

Blocked on free tier

Free tier caps at 2 MB / 50 pages, which a full catalogue can exceed. Upgrade (Pro is 50 MB / 500 pages), or extract the relevant pages first and process that subset.

Frequently asked questions

Does it export CSV directly?

The output is CSV-formatted text — every cell quoted, comma-separated — but it downloads with a .txt extension. Rename it to .csv and it imports anywhere CSV is accepted (Shopify, WooCommerce, Magento, an ERP). You can also open it in Excel/Sheets and re-save as .csv after cleanup.

What if the PDF has multiple price tables, one per category?

Each page is extracted in turn and appended with a blank line between blocks; category names that print as headings come through as their own rows. There are no separate sheets — it's one CSV stream. Use the category rows to split or tag products during cleanup, then import.
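Because blocks are separated by a blank line, the stream can be split back into per-page groups before cleanup. This is a sketch that assumes blank lines occur only between blocks; `page_blocks` is an illustrative name.

```python
import csv
import io


def page_blocks(csv_text: str) -> list[list[list[str]]]:
    """Group parsed rows into blocks, starting a new block at each blank line."""
    blocks: list[list[list[str]]] = []
    current: list[list[str]] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            current.append(row)
        elif current:  # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

Working per block makes it easier to strip the repeated header at the top of each page and the footer at the bottom.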

Will product codes and descriptions stay aligned with prices?

For clean, grid-aligned catalogues, yes — codes, descriptions, and prices land on the same row. Rows with merged cells, tiered prices, or wrapped descriptions can misalign because columns are rebuilt from text positions, so budget a short alignment pass on those.

How do I get prices to import as numbers?

Strip the currency symbol and thousands separators (Find & Replace £ / , → empty) and format the column as Number. E-commerce importers expect a bare numeric price like 4.20, not £4.20.

Does it work on scanned price sheets?

Not directly — a scan has no text layer, so you'll get nothing. Run OCR first, then re-run this tool. Verify OCR'd SKUs and prices carefully, because a single misread digit changes a price.

Can I extract just one category or page?

No — there are no options; the tool processes the whole document. To limit input, extract the pages you need first, then run the extraction on that subset.

How do I map the columns to Shopify or WooCommerce?

After cleanup, rename the extracted columns to the platform's import names — Shopify uses Handle / Title / Variant SKU / Variant Price; WooCommerce uses SKU / Name / Regular price. The importer then maps each column to its product field.

Is my pricing data uploaded anywhere?

No. Extraction runs entirely in your browser via pdf.js, so your buy prices and supplier terms never reach a server, which matters for commercially sensitive supplier pricing. Only anonymous usage counters are recorded when you're signed in.

Why do I have to clean up the output so much?

The tool returns every line of text on each page by position — it doesn't know which lines are products and which are category headers, repeated column headers, or footers. That's why a cleanup pass is normal. On a stable supplier template the same junk rows land in the same places each update, so you can templatise the cleanup.

How large a catalogue can I process?

Free tier handles up to 2 MB / 50 pages. Pro raises that to 50 MB / 500 pages, with higher tiers above — enough for most full catalogues.

Can I get JSON instead, for an integration?

Yes. The "extract a price-list table to JSON" tool returns structured objects keyed on the header row, which are easier to push into a store API or sync script than CSV. Use CSV for spreadsheet imports and JSON for code.
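If you already have a cleaned CSV, objects keyed on the header row can also be produced locally. This sketch illustrates the general shape only and does not reproduce the JSON tool's actual output format; `csv_to_json` is a hypothetical helper.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text to a JSON array of objects keyed on the header row."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(records, indent=2)
```

All values stay as strings, so convert prices to numbers separately if your integration needs them.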

Are formulas or tiered-pricing rules preserved?

No. PDFs store printed values only. Tiered prices come through as the printed numbers (and may need splitting into columns); you'll need to rebuild any pricing logic in your store or spreadsheet.

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

