Extract a PDF Price List Table to JSON

How to extract a pdf price list table to json

Step 1
Open the tool and drop the supplier PDF — Load the price list into the PDF Table to JSON tool. It extracts immediately in the browser — no options to configure.
Step 2
Confirm the product columns in the preview — Check the first 20 objects: the keys should be your real columns (code, description, price, pack size). If the keys are a logo/title line, the header detection picked the wrong first line — see the edge cases.
Step 3
Download the JSON array — Save <name>.json — a flat array of product objects covering every page.
Step 4
Drop repeated header and section rows — Filter out reprinted header rows and category banners (e.g. r.Code !== 'Code' and rows where the price column is empty). These appear because each page's lines are emitted as rows.
Step 5
Strip the currency symbol and cast the price — Prices are strings like "£12.50". Strip the symbol and thousands separators, then Number() it: Number(r.Price.replace(/[^0-9.]/g, '')). Keep Code as a string so leading zeros survive.
Step 6
Import via your platform's API — Map the cleaned objects to your platform's product schema and push them — Shopify Admin API, WooCommerce REST, or your ERP's bulk-upsert endpoint.

Price-list fields: in vs. out

How the tool's generic row/column extraction lands on a typical supplier price list, and what your import step must finish.

Price-list element	What you get	Your import step
Product code / SKU	String, exactly as printed (leading zeros preserved)	Keep as string; never `Number()` a SKU
Description	String; long descriptions that wrap may split into a second row	Merge wrapped continuation rows
Price	String with currency symbol, e.g. `"£12.50"`	Strip symbol + separators, then `Number()`
Tiered pricing columns	Each tier is its own key (`Qty 1`, `Qty 10`, …)	Cast each tier; reshape to your price-break model
Category banner rows	Emitted as a row with mostly empty cells	Filter rows where the price cell is empty
Reprinted page headers	Become data rows from page 2 onward	Filter `r.Code !== 'Code'`

Tier limits for price-list PDFs

Most supplier price lists are small and well within free limits; large multi-hundred-page catalogues need a higher tier or a split first.

Tier	Max file size	Max pages
Free	2 MB	50
Pro	50 MB	500
Pro + Media	500 MB	2,000
Developer	2 GB	10,000

Cookbook

A real wholesale price list and the exact cleanup that turns the string array into an import-ready product feed.

Supplier list with a currency column

Note prices keep their symbol and stay strings, and the category banner 'FASTENERS' becomes a near-empty row.

PDF:
Code     Description            Price
FASTENERS
BLT-06   M6 Bolt, zinc          £0.12
BLT-08   M8 Bolt, zinc          £0.18
WSH-06   M6 Washer              £0.04

Downloaded JSON:
[
  { "Code": "FASTENERS", "Description": "",            "Price": "" },
  { "Code": "BLT-06",    "Description": "M6 Bolt, zinc", "Price": "£0.12" },
  { "Code": "BLT-08",    "Description": "M8 Bolt, zinc", "Price": "£0.18" },
  { "Code": "WSH-06",    "Description": "M6 Washer",     "Price": "£0.04" }
]

Clean to a numeric, import-ready feed

Drop banner rows (empty price), strip the currency symbol, and cast — while keeping the code as a string.

const rows = JSON.parse(json);
const feed = rows
  .filter(r => r.Price && r.Code !== "Code")          // drop banners + repeated headers
  .map(r => ({
    sku:   r.Code,                                     // string — keep leading zeros
    title: r.Description,
    price: Number(r.Price.replace(/[^0-9.]/g, "")),   // "£0.12" -> 0.12
  }));

// feed -> Shopify/WooCommerce product upsert

Tiered (volume) pricing preserved as columns

Each quantity break in the PDF becomes its own key, so volume pricing survives extraction intact.

PDF:
Code    Description     Qty1    Qty10   Qty100
C-22    Cable tie       0.05    0.04    0.03

JSON:
[ { "Code": "C-22", "Description": "Cable tie",
    "Qty1": "0.05", "Qty10": "0.04", "Qty100": "0.03" } ]

// reshape to price breaks:
breaks: [
  { min: 1,   price: 0.05 },
  { min: 10,  price: 0.04 },
  { min: 100, price: 0.03 },
]

Merge a wrapped description back together

A long product name that wraps to a second visual line shows up as an extra row with an empty Code. Stitch it onto the row above.

const merged = [];
for (const r of feedRows) {
  if (!r.Code && merged.length) {
    merged[merged.length - 1].Description += " " + r.Description;
  } else {
    merged.push({ ...r });
  }
}

Detect price drift against last month

Because the output is plain JSON, diffing this month's feed against last month's is a one-liner per SKU — useful for catching supplier increases before they hit your margins.

const prev = JSON.parse(lastMonthJson);
const prevBySku = Object.fromEntries(prev.map(p => [p.sku, p.price]));

for (const r of feed) {
  if (prevBySku[r.sku] != null && prevBySku[r.sku] !== r.price) {
    console.log(`${r.sku}: ${prevBySku[r.sku]} -> ${r.price}`);
  }
}

Edge cases and what actually happens

Currency symbol stuck to the price

Expected

Prices extract as strings with their symbol, e.g. "£12.50" or "$9.99", because the tool never reformats a value. Strip it in your import step: Number(r.Price.replace(/[^0-9.]/g, '')). Watch for thousands separators ("1,299.00") and decimal commas in EU lists.

Category banner rows mixed in with products

Filter needed

A 'FASTENERS' or 'ELECTRICAL' section header spanning the row width comes through as a row with empty price/description cells. Filter on a required column being non-empty (r.Price) to drop banners cleanly.

Repeated header on every page

By design

Long catalogues reprint the column header on each page; from page 2 onward those header lines are emitted as data rows. Filter r.Code !== 'Code' (or whatever your first column is named) before importing.

Scanned / faxed price list

Empty array

An image-only PDF has no text layer, so extraction returns nothing. Many supplier price lists are scans — run PDF OCR first to add a text layer, then extract. Verify the OCR'd prices, since OCR can confuse 8/B and 0/O in codes.

Multi-line product description

Split row

A description that wraps onto a second line becomes a separate row with an empty code. Merge continuation rows (an empty key column means it belongs to the row above) before mapping to your product schema.

Price list larger than the free limit

Blocked

A catalogue over 2 MB or more than 50 pages is blocked on the free tier before extraction. Upgrade to Pro (50 MB / 500 pages) or split it first with PDF Split and extract section by section.

Decimal comma in European price lists

Cast carefully

A list printed as 1.299,00 (EU format) extracts verbatim as a string. Number() won't parse it directly — normalise first: remove dots, swap the comma for a dot ('1.299,00'.replace(/\./g,'').replace(',', '.') → 1299.00).

Price column merges with the next column

Misaligned

If the price and a 'pack size' column sit very close horizontally, the position-based grouping can merge them into one cell. Inspect the preview; if you see "£0.12 /100" in one key, split it in post-processing, or re-export the list with wider column spacing if you control the source.

Frequently asked questions

Will the currency symbol be in the price field?

Yes. Prices extract as strings exactly as printed, symbol included ("£12.50"). The tool never reformats values. Strip the symbol and any separators in your import step — Number(r.Price.replace(/[^0-9.]/g, '')) — then store the numeric value.

Are tiered / volume prices preserved?

Yes. Each price column in the PDF becomes its own JSON key, so Qty 1, Qty 10, Qty 100 come through as three separate fields per product. Reshape them into your platform's price-break model after casting each to a number.

Will my product codes lose their leading zeros?

No — and this is a key advantage of string output. A SKU like 0042 stays "0042" because the tool never converts values to numbers. Keep codes as strings in your import; only cast the price column.

Can I import the result into Shopify or WooCommerce directly?

After two quick steps: filter repeated header/banner rows and cast the price to a number, then map the keys to the platform's field names. The array shape itself is import-ready; the platforms just need numeric prices and their own field naming.

How do I handle the section/category banner rows?

They appear as rows with empty price and description cells. Filter on a required column — rows.filter(r => r.Price) removes banners. If you want to keep the category as a field, carry the last seen banner value down onto the following product rows in your script.

My supplier sends a scanned PDF — will it work?

Not directly: a scan has no text layer, so extraction returns an empty array. Run PDF OCR first to add a text layer, then extract. Double-check OCR'd codes and prices, since OCR can misread similar characters.

Is my pricing data uploaded anywhere?

No. Everything runs in your browser via PDF.js. Confidential supplier pricing never leaves your machine; only anonymous usage counters are recorded when you're signed in.

Can I do this every month automatically?

The tool auto-runs on drop, so the manual step is just drop-and-download. The cleanup (filter + cast + map) is a small reusable script. On paid tiers, runs can route to the local JAD runner. For a full unattended pipeline, reuse the same extraction and the cookbook's cast/diff steps in your own job.

What if I'd rather have a spreadsheet than JSON?

Use PDF to Excel, which applies the same row/column detection but outputs CSV for Excel or Google Sheets. Choose JSON when you're feeding an API or script, CSV when a human will work in a spreadsheet.

Why did some prices end up in the description column?

Column assignment is based on horizontal position. If columns are very close together or right-aligned numbers drift left, the grouping can put a value under the wrong key. Check the preview; if it's systematic, fix it with a small remap, or re-export the list with clearer spacing if you control the source.

What are the file-size and page limits?

Free: 2 MB and 50 pages. Pro: 50 MB / 500 pages. Pro + Media: 500 MB / 2,000 pages. Developer: 2 GB / 10,000 pages. Oversized files are blocked before extraction; split large catalogues with PDF Split first.

How do I diff this month's prices against last month's?

Because both extractions are plain JSON arrays, build a SKU→price map from last month and compare. The cookbook shows a few lines that print every changed SKU — handy for catching supplier increases before they erode margin.

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

How to extract a pdf price list table to json

Step 1
Open the tool and drop the supplier PDF — Load the price list into the PDF Table to JSON tool. It extracts immediately in the browser — no options to configure.
Step 2
Confirm the product columns in the preview — Check the first 20 objects: the keys should be your real columns (code, description, price, pack size). If the keys are a logo/title line, the header detection picked the wrong first line — see the edge cases.
Step 3
Download the JSON array — Save <name>.json — a flat array of product objects covering every page.
Step 4
Drop repeated header and section rows — Filter out reprinted header rows and category banners (e.g. r.Code !== 'Code' and rows where the price column is empty). These appear because each page's lines are emitted as rows.
Step 5
Strip the currency symbol and cast the price — Prices are strings like "£12.50". Strip the symbol and thousands separators, then Number() it: Number(r.Price.replace(/[^0-9.]/g, '')). Keep Code as a string so leading zeros survive.
Step 6
Import via your platform's API — Map the cleaned objects to your platform's product schema and push them — Shopify Admin API, WooCommerce REST, or your ERP's bulk-upsert endpoint.

Price-list fields: in vs. out

How the tool's generic row/column extraction lands on a typical supplier price list, and what your import step must finish.

Price-list element	What you get	Your import step
Product code / SKU	String, exactly as printed (leading zeros preserved)	Keep as string; never `Number()` a SKU
Description	String; long descriptions that wrap may split into a second row	Merge wrapped continuation rows
Price	String with currency symbol, e.g. `"£12.50"`	Strip symbol + separators, then `Number()`
Tiered pricing columns	Each tier is its own key (`Qty 1`, `Qty 10`, …)	Cast each tier; reshape to your price-break model
Category banner rows	Emitted as a row with mostly empty cells	Filter rows where the price cell is empty
Reprinted page headers	Become data rows from page 2 onward	Filter `r.Code !== 'Code'`

Tier limits for price-list PDFs

Most supplier price lists are small and well within free limits; large multi-hundred-page catalogues need a higher tier or a split first.

Tier	Max file size	Max pages
Free	2 MB	50
Pro	50 MB	500
Pro + Media	500 MB	2,000
Developer	2 GB	10,000

Cookbook

A real wholesale price list and the exact cleanup that turns the string array into an import-ready product feed.

Supplier list with a currency column

Note prices keep their symbol and stay strings, and the category banner 'FASTENERS' becomes a near-empty row.

PDF:
Code     Description            Price
FASTENERS
BLT-06   M6 Bolt, zinc          £0.12
BLT-08   M8 Bolt, zinc          £0.18
WSH-06   M6 Washer              £0.04

Downloaded JSON:
[
  { "Code": "FASTENERS", "Description": "",            "Price": "" },
  { "Code": "BLT-06",    "Description": "M6 Bolt, zinc", "Price": "£0.12" },
  { "Code": "BLT-08",    "Description": "M8 Bolt, zinc", "Price": "£0.18" },
  { "Code": "WSH-06",    "Description": "M6 Washer",     "Price": "£0.04" }
]

Clean to a numeric, import-ready feed

Drop banner rows (empty price), strip the currency symbol, and cast — while keeping the code as a string.

const rows = JSON.parse(json);
const feed = rows
  .filter(r => r.Price && r.Code !== "Code")          // drop banners + repeated headers
  .map(r => ({
    sku:   r.Code,                                     // string — keep leading zeros
    title: r.Description,
    price: Number(r.Price.replace(/[^0-9.]/g, "")),   // "£0.12" -> 0.12
  }));

// feed -> Shopify/WooCommerce product upsert

Tiered (volume) pricing preserved as columns

Each quantity break in the PDF becomes its own key, so volume pricing survives extraction intact.

PDF:
Code    Description     Qty1    Qty10   Qty100
C-22    Cable tie       0.05    0.04    0.03

JSON:
[ { "Code": "C-22", "Description": "Cable tie",
    "Qty1": "0.05", "Qty10": "0.04", "Qty100": "0.03" } ]

// reshape to price breaks:
breaks: [
  { min: 1,   price: 0.05 },
  { min: 10,  price: 0.04 },
  { min: 100, price: 0.03 },
]

Merge a wrapped description back together

A long product name that wraps to a second visual line shows up as an extra row with an empty Code. Stitch it onto the row above.

const merged = [];
for (const r of feedRows) {
  if (!r.Code && merged.length) {
    merged[merged.length - 1].Description += " " + r.Description;
  } else {
    merged.push({ ...r });
  }
}

Detect price drift against last month

Because the output is plain JSON, diffing this month's feed against last month's is a one-liner per SKU — useful for catching supplier increases before they hit your margins.

const prev = JSON.parse(lastMonthJson);
const prevBySku = Object.fromEntries(prev.map(p => [p.sku, p.price]));

for (const r of feed) {
  if (prevBySku[r.sku] != null && prevBySku[r.sku] !== r.price) {
    console.log(`${r.sku}: ${prevBySku[r.sku]} -> ${r.price}`);
  }
}

Edge cases and what actually happens

Currency symbol stuck to the price

Expected

Category banner rows mixed in with products

Filter needed

Repeated header on every page

By design

Scanned / faxed price list

Empty array

Multi-line product description

Split row

Price list larger than the free limit

Blocked

A catalogue over 2 MB or more than 50 pages is blocked on the free tier before extraction. Upgrade to Pro (50 MB / 500 pages) or split it first with PDF Split and extract section by section.

Decimal comma in European price lists

Cast carefully

Price column merges with the next column

Misaligned

Frequently asked questions

Will the currency symbol be in the price field?

Are tiered / volume prices preserved?

Will my product codes lose their leading zeros?

Can I import the result into Shopify or WooCommerce directly?

How do I handle the section/category banner rows?

My supplier sends a scanned PDF — will it work?

Is my pricing data uploaded anywhere?

No. Everything runs in your browser via PDF.js. Confidential supplier pricing never leaves your machine; only anonymous usage counters are recorded when you're signed in.

Can I do this every month automatically?

What if I'd rather have a spreadsheet than JSON?

Why did some prices end up in the description column?

What are the file-size and page limits?

How do I diff this month's prices against last month's?

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

How to extract a pdf price list table to json

Price-list fields: in vs. out

Tier limits for price-list PDFs

Cookbook

Supplier list with a currency column

Clean to a numeric, import-ready feed

Tiered (volume) pricing preserved as columns

Merge a wrapped description back together

Detect price drift against last month

Edge cases and what actually happens

Currency symbol stuck to the price

Category banner rows mixed in with products

Repeated header on every page

Scanned / faxed price list

Multi-line product description

Price list larger than the free limit

Decimal comma in European price lists

Price column merges with the next column

Frequently asked questions

Will the currency symbol be in the price field?

Are tiered / volume prices preserved?

Will my product codes lose their leading zeros?

Can I import the result into Shopify or WooCommerce directly?

How do I handle the section/category banner rows?

My supplier sends a scanned PDF — will it work?

Is my pricing data uploaded anywhere?

Can I do this every month automatically?

What if I'd rather have a spreadsheet than JSON?

Why did some prices end up in the description column?

What are the file-size and page limits?

How do I diff this month's prices against last month's?

Privacy first

Related guides

Extract a PDF Price List Table to JSON

How to extract a pdf price list table to json

Price-list fields: in vs. out

Tier limits for price-list PDFs

Cookbook

Supplier list with a currency column

Clean to a numeric, import-ready feed

Tiered (volume) pricing preserved as columns

Merge a wrapped description back together

Detect price drift against last month

Edge cases and what actually happens

Currency symbol stuck to the price

Category banner rows mixed in with products

Repeated header on every page

Scanned / faxed price list

Multi-line product description

Price list larger than the free limit

Decimal comma in European price lists

Price column merges with the next column

Frequently asked questions

Will the currency symbol be in the price field?

Are tiered / volume prices preserved?

Will my product codes lose their leading zeros?

Can I import the result into Shopify or WooCommerce directly?

How do I handle the section/category banner rows?

My supplier sends a scanned PDF — will it work?

Is my pricing data uploaded anywhere?

Can I do this every month automatically?

What if I'd rather have a spreadsheet than JSON?

Why did some prices end up in the description column?

What are the file-size and page limits?

How do I diff this month's prices against last month's?

Privacy first

Related guides