How to extract a pdf price list table to json
- Step 1Open the tool and drop the supplier PDF — Load the price list into the PDF Table to JSON tool. It extracts immediately in the browser — no options to configure.
- Step 2Confirm the product columns in the preview — Check the first 20 objects: the keys should be your real columns (code, description, price, pack size). If the keys are a logo/title line, the header detection picked the wrong first line — see the edge cases.
- Step 3Download the JSON array — Save
<name>.json— a flat array of product objects covering every page. - Step 4Drop repeated header and section rows — Filter out reprinted header rows and category banners (e.g.
r.Code !== 'Code'and rows where the price column is empty). These appear because each page's lines are emitted as rows. - Step 5Strip the currency symbol and cast the price — Prices are strings like
"£12.50". Strip the symbol and thousands separators, thenNumber()it:Number(r.Price.replace(/[^0-9.]/g, '')). KeepCodeas a string so leading zeros survive. - Step 6Import via your platform's API — Map the cleaned objects to your platform's product schema and push them — Shopify Admin API, WooCommerce REST, or your ERP's bulk-upsert endpoint.
Price-list fields: in vs. out
How the tool's generic row/column extraction lands on a typical supplier price list, and what your import step must finish.
| Price-list element | What you get | Your import step |
|---|---|---|
| Product code / SKU | String, exactly as printed (leading zeros preserved) | Keep as string; never Number() a SKU |
| Description | String; long descriptions that wrap may split into a second row | Merge wrapped continuation rows |
| Price | String with currency symbol, e.g. "£12.50" | Strip symbol + separators, then Number() |
| Tiered pricing columns | Each tier is its own key (Qty 1, Qty 10, …) | Cast each tier; reshape to your price-break model |
| Category banner rows | Emitted as a row with mostly empty cells | Filter rows where the price cell is empty |
| Reprinted page headers | Become data rows from page 2 onward | Filter r.Code !== 'Code' |
Tier limits for price-list PDFs
Most supplier price lists are small and well within free limits; large multi-hundred-page catalogues need a higher tier or a split first.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 |
| Pro | 50 MB | 500 |
| Pro + Media | 500 MB | 2,000 |
| Developer | 2 GB | 10,000 |
Cookbook
A real wholesale price list and the exact cleanup that turns the string array into an import-ready product feed.
Supplier list with a currency column
Note prices keep their symbol and stay strings, and the category banner 'FASTENERS' becomes a near-empty row.
PDF:
Code Description Price
FASTENERS
BLT-06 M6 Bolt, zinc £0.12
BLT-08 M8 Bolt, zinc £0.18
WSH-06 M6 Washer £0.04
Downloaded JSON:
[
{ "Code": "FASTENERS", "Description": "", "Price": "" },
{ "Code": "BLT-06", "Description": "M6 Bolt, zinc", "Price": "£0.12" },
{ "Code": "BLT-08", "Description": "M8 Bolt, zinc", "Price": "£0.18" },
{ "Code": "WSH-06", "Description": "M6 Washer", "Price": "£0.04" }
]Clean to a numeric, import-ready feed
Drop banner rows (empty price), strip the currency symbol, and cast — while keeping the code as a string.
const rows = JSON.parse(json);
const feed = rows
.filter(r => r.Price && r.Code !== "Code") // drop banners + repeated headers
.map(r => ({
sku: r.Code, // string — keep leading zeros
title: r.Description,
price: Number(r.Price.replace(/[^0-9.]/g, "")), // "£0.12" -> 0.12
}));
// feed -> Shopify/WooCommerce product upsertTiered (volume) pricing preserved as columns
Each quantity break in the PDF becomes its own key, so volume pricing survives extraction intact.
PDF:
Code Description Qty1 Qty10 Qty100
C-22 Cable tie 0.05 0.04 0.03
JSON:
[ { "Code": "C-22", "Description": "Cable tie",
"Qty1": "0.05", "Qty10": "0.04", "Qty100": "0.03" } ]
// reshape to price breaks:
breaks: [
{ min: 1, price: 0.05 },
{ min: 10, price: 0.04 },
{ min: 100, price: 0.03 },
]Merge a wrapped description back together
A long product name that wraps to a second visual line shows up as an extra row with an empty Code. Stitch it onto the row above.
const merged = [];
for (const r of feedRows) {
if (!r.Code && merged.length) {
merged[merged.length - 1].Description += " " + r.Description;
} else {
merged.push({ ...r });
}
}Detect price drift against last month
Because the output is plain JSON, diffing this month's feed against last month's is a one-liner per SKU — useful for catching supplier increases before they hit your margins.
const prev = JSON.parse(lastMonthJson);
const prevBySku = Object.fromEntries(prev.map(p => [p.sku, p.price]));
for (const r of feed) {
if (prevBySku[r.sku] != null && prevBySku[r.sku] !== r.price) {
console.log(`${r.sku}: ${prevBySku[r.sku]} -> ${r.price}`);
}
}Edge cases and what actually happens
Currency symbol stuck to the price
ExpectedPrices extract as strings with their symbol, e.g. "£12.50" or "$9.99", because the tool never reformats a value. Strip it in your import step: Number(r.Price.replace(/[^0-9.]/g, '')). Watch for thousands separators ("1,299.00") and decimal commas in EU lists.
Category banner rows mixed in with products
Filter neededA 'FASTENERS' or 'ELECTRICAL' section header spanning the row width comes through as a row with empty price/description cells. Filter on a required column being non-empty (r.Price) to drop banners cleanly.
Repeated header on every page
By designLong catalogues reprint the column header on each page; from page 2 onward those header lines are emitted as data rows. Filter r.Code !== 'Code' (or whatever your first column is named) before importing.
Scanned / faxed price list
Empty arrayAn image-only PDF has no text layer, so extraction returns nothing. Many supplier price lists are scans — run PDF OCR first to add a text layer, then extract. Verify the OCR'd prices, since OCR can confuse 8/B and 0/O in codes.
Multi-line product description
Split rowA description that wraps onto a second line becomes a separate row with an empty code. Merge continuation rows (an empty key column means it belongs to the row above) before mapping to your product schema.
Price list larger than the free limit
BlockedA catalogue over 2 MB or more than 50 pages is blocked on the free tier before extraction. Upgrade to Pro (50 MB / 500 pages) or split it first with PDF Split and extract section by section.
Decimal comma in European price lists
Cast carefullyA list printed as 1.299,00 (EU format) extracts verbatim as a string. Number() won't parse it directly — normalise first: remove dots, swap the comma for a dot ('1.299,00'.replace(/\./g,'').replace(',', '.') → 1299.00).
Price column merges with the next column
MisalignedIf the price and a 'pack size' column sit very close horizontally, the position-based grouping can merge them into one cell. Inspect the preview; if you see "£0.12 /100" in one key, split it in post-processing, or re-export the list with wider column spacing if you control the source.
Frequently asked questions
Will the currency symbol be in the price field?
Yes. Prices extract as strings exactly as printed, symbol included ("£12.50"). The tool never reformats values. Strip the symbol and any separators in your import step — Number(r.Price.replace(/[^0-9.]/g, '')) — then store the numeric value.
Are tiered / volume prices preserved?
Yes. Each price column in the PDF becomes its own JSON key, so Qty 1, Qty 10, Qty 100 come through as three separate fields per product. Reshape them into your platform's price-break model after casting each to a number.
Will my product codes lose their leading zeros?
No — and this is a key advantage of string output. A SKU like 0042 stays "0042" because the tool never converts values to numbers. Keep codes as strings in your import; only cast the price column.
Can I import the result into Shopify or WooCommerce directly?
After two quick steps: filter repeated header/banner rows and cast the price to a number, then map the keys to the platform's field names. The array shape itself is import-ready; the platforms just need numeric prices and their own field naming.
How do I handle the section/category banner rows?
They appear as rows with empty price and description cells. Filter on a required column — rows.filter(r => r.Price) removes banners. If you want to keep the category as a field, carry the last seen banner value down onto the following product rows in your script.
My supplier sends a scanned PDF — will it work?
Not directly: a scan has no text layer, so extraction returns an empty array. Run PDF OCR first to add a text layer, then extract. Double-check OCR'd codes and prices, since OCR can misread similar characters.
Is my pricing data uploaded anywhere?
No. Everything runs in your browser via PDF.js. Confidential supplier pricing never leaves your machine; only anonymous usage counters are recorded when you're signed in.
Can I do this every month automatically?
The tool auto-runs on drop, so the manual step is just drop-and-download. The cleanup (filter + cast + map) is a small reusable script. On paid tiers, runs can route to the local JAD runner. For a full unattended pipeline, reuse the same extraction and the cookbook's cast/diff steps in your own job.
What if I'd rather have a spreadsheet than JSON?
Use PDF to Excel, which applies the same row/column detection but outputs CSV for Excel or Google Sheets. Choose JSON when you're feeding an API or script, CSV when a human will work in a spreadsheet.
Why did some prices end up in the description column?
Column assignment is based on horizontal position. If columns are very close together or right-aligned numbers drift left, the grouping can put a value under the wrong key. Check the preview; if it's systematic, fix it with a small remap, or re-export the list with clearer spacing if you control the source.
What are the file-size and page limits?
Free: 2 MB and 50 pages. Pro: 50 MB / 500 pages. Pro + Media: 500 MB / 2,000 pages. Developer: 2 GB / 10,000 pages. Oversized files are blocked before extraction; split large catalogues with PDF Split first.
How do I diff this month's prices against last month's?
Because both extractions are plain JSON arrays, build a SKU→price map from last month and compare. The cookbook shows a few lines that print every changed SKU — handy for catching supplier increases before they erode margin.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.