How to extract PDF invoice line items into a spreadsheet
- Step 1: Check the invoice is machine-generated. Try to select a number in the line-item table with your cursor. If text highlights, the invoice has a text layer and extracts well. If it's a scan or photo, run OCR first to add a text layer.
- Step 2: Drop the invoice PDF onto the converter. Add the file above. There are no options: extraction starts as soon as the file is read, and parsing happens in your browser, so the invoice never leaves your device.
- Step 3: Review the extracted rows. The preview shows the first ~5,000 characters of CSV. Confirm the line items came through and note where the header block (supplier, invoice number, date) and totals sit relative to the line-item rows.
- Step 4: Download the CSV. Click Download. The file saves as .txt with CSV contents. Rename it to .csv if your AP importer keys off the extension.
- Step 5: Import into your AP workbook or ERP. Use Excel's Data → From Text/CSV (so amounts and codes aren't coerced), or your ERP's CSV import template. Map the line-item columns to the destination fields.
- Step 6: Separate header rows from line items. The invoice header (supplier, invoice #, date) and the totals usually extract as their own rows above and below the line items. Move them into header fields, or delete them so the line-item rows import as a clean table.
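If you would rather script the import than go through Excel, the steps above can be sketched in Python. This is a minimal sketch using only the standard library; the `read_rows` helper and the sample text are hypothetical, not part of the tool. Every cell stays a string, so invoice numbers and codes are not coerced.

```python
import csv
import io


def read_rows(csv_text: str) -> list[list[str]]:
    """Parse CSV text into rows of strings, dropping blank separator lines."""
    reader = csv.reader(io.StringIO(csv_text))
    # csv.reader yields an empty list for a blank line; skip those.
    return [row for row in reader if row]


# Illustrative sample, not real tool output.
sample = (
    '"Invoice No","007781"\n'
    "\n"
    '"Description","Qty","Amount"\n'
    '"Hex bolt M8","200","8.00"\n'
)
rows = read_rows(sample)
print(rows[0][1])  # prints 007781 (leading zeros intact)
```

For a downloaded file, pass its contents instead, for example `read_rows(Path("invoice.txt").read_text(encoding="utf-8"))`. The file name and the UTF-8 encoding are assumptions; adjust them to what your download actually contains.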
What extracts well — and what doesn't
Invoice layout drives accuracy. Machine-generated invoices with grid-aligned line items are the sweet spot.
| Invoice type | Result |
|---|---|
| Xero / QuickBooks / SAP / Sage PDF | Strong — consistent column structure, clean line-item extraction |
| Custom-template digital invoice (text layer) | Good — extracts, but verify column alignment on rows with blank cells |
| Multi-page invoice (many line items) | Works — each page appended with a blank-line separator; reconcile after import |
| Scanned / photographed paper invoice | Nothing until OCR'd — no text layer to read |
| Invoice where totals sit in a side box | Totals extract as separate rows by position, not aligned to the line-item columns |
What the output is (and isn't)
Set expectations before you download so the file fits your AP workflow.
| Aspect | Reality |
|---|---|
| File format | CSV-formatted text, downloaded as .txt (rename to .csv if needed) |
| Field labelling | None — rows are returned by position; no "invoice number" / "VAT" tagging |
| Header block | Supplier, invoice #, date extract as their own short rows, not as columns |
| Currency symbols | Kept with the value as text (e.g. "£340.00") — reformat in your sheet |
| Multiple invoices at once | No — single file per run; batch is not offered in this tool's UI |
File-size and page limits by tier
Free-tier limits apply per invoice. High-volume AP needs a paid tier.
| Tier | Max file size | Max pages |
|---|---|---|
| Free | 2 MB | 50 pages |
| Pro | 50 MB | 500 pages |
| Pro + Media | 500 MB | 2,000 pages |
| Developer | 2 GB | 10,000 pages |
| Enterprise | Unlimited | Unlimited |
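To see which tier a given file needs, the table can be turned into a small lookup. This is a sketch only: the `smallest_tier` helper is hypothetical, and it assumes the limits are inclusive and that 2 GB means 2,048 MB, neither of which is confirmed here.

```python
# (tier name, max file size in MB, max pages), copied from the table above.
TIERS = [
    ("Free", 2, 50),
    ("Pro", 50, 500),
    ("Pro + Media", 500, 2_000),
    ("Developer", 2_048, 10_000),  # assumption: 2 GB taken as 2,048 MB
]


def smallest_tier(size_mb: float, pages: int) -> str:
    """Return the first tier whose size and page limits both fit the file."""
    for name, max_mb, max_pages in TIERS:
        # Assumption: a file exactly at the limit is still accepted.
        if size_mb <= max_mb and pages <= max_pages:
            return name
    return "Enterprise"  # listed as unlimited


print(smallest_tier(1.2, 3))    # prints Free
print(smallest_tier(1.2, 120))  # prints Pro
```

Note that either limit can push a file up a tier: a small file with many pages needs Pro just as a large file with few pages does.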
Cookbook
Real invoice extractions. Supplier names and amounts are illustrative; output is shown verbatim so you can see the quoting and where header rows land.
A clean QuickBooks-style line-item table
Machine-generated invoices have evenly aligned columns, so each line item maps cleanly to a CSV row. The header block extracts above the items.
CSV output:

    "Acme Supplies Ltd"
    "Invoice","INV-2041","Date","03/06/2026"
    "Description","Qty","Unit","Amount"
    "Steel bracket","10","1.50","15.00"
    "Hex bolt M8","200","0.04","8.00"
    "Subtotal","","","23.00"
    "VAT 20%","","","4.60"
    "Total","","","27.60"
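Once the rows are in, it is worth checking that the printed totals agree with the line items. The sketch below does that for the illustrative rows in this example. The `reconcile` helper is hypothetical, and it assumes that summary rows are the ones with a blank quantity and that VAT is rounded half-up to two decimals; your supplier's rounding may differ.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_UP

# Line-item grid and totals from the example above (header block removed).
CSV_TEXT = """\
"Description","Qty","Unit","Amount"
"Steel bracket","10","1.50","15.00"
"Hex bolt M8","200","0.04","8.00"
"Subtotal","","","23.00"
"VAT 20%","","","4.60"
"Total","","","27.60"
"""


def reconcile(csv_text: str, vat_rate: Decimal = Decimal("0.20")) -> dict[str, bool]:
    """Compare the printed subtotal, VAT, and total with the line items."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    body = rows[1:]  # skip the column header
    # Assumption: line items carry a quantity; summary rows leave it blank.
    items = [r for r in body if r[1] != ""]
    summary = {r[0]: Decimal(r[3]) for r in body if r[1] == ""}
    line_sum = sum((Decimal(r[3]) for r in items), Decimal("0"))
    vat = (line_sum * vat_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {
        "subtotal_ok": line_sum == summary["Subtotal"],
        "vat_ok": vat == summary["VAT 20%"],
        "total_ok": line_sum + vat == summary["Total"],
    }


print(reconcile(CSV_TEXT))
```

Here 15.00 + 8.00 = 23.00, 20% of 23.00 is 4.60, and 23.00 + 4.60 = 27.60, so all three checks pass. `Decimal` is used rather than `float` so that currency arithmetic stays exact.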
Currency symbols come through as text
Amounts keep their currency symbol as part of the cell value. Strip or reformat in your spreadsheet so they're numeric for totalling.
CSV output:

    "Consulting — June","1","£1,200.00"
    "Expenses","1","£84.50"

In Excel: use Find & Replace to replace "£" and "," with nothing, then format the column as Currency to total it.
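The same cleanup can be scripted. A minimal sketch, assuming UK/US number formatting (comma as thousands separator, dot as decimal point); the `parse_amount` helper is hypothetical and would misread amounts like "1.200,00".

```python
from decimal import Decimal


def parse_amount(cell: str, symbols: str = "£$€") -> Decimal:
    """Strip currency symbols and thousands separators, return an exact Decimal."""
    cleaned = cell
    # Assumption: "," is only ever a thousands separator, never a decimal mark.
    for ch in symbols + ",":
        cleaned = cleaned.replace(ch, "")
    return Decimal(cleaned.strip())


amounts = ["£1,200.00", "£84.50"]
total = sum((parse_amount(a) for a in amounts), Decimal("0"))
print(total)  # prints 1284.50
```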
A multi-page invoice with many line items
Each page is processed separately and appended with a blank line. The column header repeats if it repeats on the PDF page.
CSV output:

    "Description","Qty","Amount"
    "Item 1","2","19.98"
    "...","...","..."
    "Description","Qty","Amount"
    "Item 41","5","12.50"
    "...","...","..."

Delete the repeated header row from page 2 after import.
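Removing the repeated header by hand gets tedious on long invoices. A minimal sketch of doing it in code follows; the `drop_repeated_headers` helper is hypothetical and assumes the first row is the column header, so strip any supplier block first.

```python
def drop_repeated_headers(rows: list[list[str]]) -> list[list[str]]:
    """Keep the first header row; drop later copies and blank page separators."""
    if not rows:
        return []
    header = rows[0]  # assumption: row 0 is the column header
    return [header] + [r for r in rows[1:] if r and r != header]


# Illustrative rows; the empty list stands for the blank line between pages.
pages = [
    ["Description", "Qty", "Amount"],
    ["Item 1", "2", "19.98"],
    [],
    ["Description", "Qty", "Amount"],
    ["Item 41", "5", "12.50"],
]
print(drop_repeated_headers(pages))
```

A line item that happens to be identical to the header row would also be dropped, which is unlikely on a real invoice but worth knowing.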
Header block separated from line items
Supplier, invoice number, and date sit at the top of the page, so they extract as rows above the line-item grid. Lift them into header fields in your AP record.
CSV output:

    "Northwind Trading"
    "VAT Reg: GB123456789"
    "Invoice No: 7781 Due: 30/06/2026"
    "SKU","Item","Qty","Price"
    "..."

The first three rows are header metadata, not line items.
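Because the tool does no field labelling, one way to split the header block off in code is to look for the column-header row you expect. This is a sketch under that assumption; `split_header_block` is a hypothetical helper, and the line-item row in the sample is made up for illustration.

```python
def split_header_block(
    rows: list[list[str]], column_header: list[str]
) -> tuple[list[list[str]], list[list[str]]]:
    """Split rows into (header metadata, line-item table) at the column header."""
    for i, row in enumerate(rows):
        if row == column_header:
            return rows[:i], rows[i:]
    raise ValueError("column header row not found")


rows = [
    ["Northwind Trading"],
    ["VAT Reg: GB123456789"],
    ["Invoice No: 7781 Due: 30/06/2026"],
    ["SKU", "Item", "Qty", "Price"],
    ["A-100", "Widget", "2", "9.99"],  # made-up line item
]
meta, table = split_header_block(rows, ["SKU", "Item", "Qty", "Price"])
print(len(meta), len(table))  # prints 3 2
```

Raising an error when the header is missing is deliberate: a silently empty table is harder to notice during an import than a failed run.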
Scanned paper invoice needs OCR
A photographed or scanned invoice has no text layer, so extraction returns nothing. OCR adds the text layer; then this tool extracts the line items.
Input: supplier_invoice_scan.pdf (photo)
Output: (empty)
Fix:
1. /pdf-tools/pdf-ocr
2. Re-run this tool → line items extract
Edge cases and what actually happens
Scanned or photographed invoice
No text found: Image-only invoices have no selectable text, so there is nothing to extract and the output is empty. Run OCR to add a text layer first; OCR PDF for data extraction is the matching workflow for invoices you receive as scans.
Totals in a boxed summary misalign
Manual fixup: When subtotal / VAT / total live in a side box rather than the line-item columns, they extract as separate rows positioned by where the text sits, not aligned to the item columns. Move them into the right cells after import.
Header metadata mixed with line items
By design: Supplier, invoice number, and date extract as their own rows because the tool returns every line of text by position with no field labelling. Split those rows out into header fields during import; there's no automatic tagging of invoice fields.
A wrapped long description splits across rows
By design: A line-item description that wraps onto a second visual line sits at a different Y-position, so it becomes a second CSV row. Merge it back with the line above, or shorten descriptions at the source.
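Merging wrapped descriptions can be scripted if the continuation rows are recognisable. The sketch below assumes a continuation row has text only in the description column and nothing in the others; that shape is an assumption about the output, and `merge_wrapped` is a hypothetical helper. Run it on the line-item rows only, since single-cell header rows would match the same pattern.

```python
def merge_wrapped(rows: list[list[str]], desc_col: int = 0) -> list[list[str]]:
    """Fold description-only rows back into the row above them."""
    merged: list[list[str]] = []
    for row in rows:
        others = [c for i, c in enumerate(row) if i != desc_col]
        has_desc = len(row) > desc_col and row[desc_col].strip() != ""
        # Assumption: a continuation row has a description and no other values.
        is_continuation = has_desc and not any(c.strip() for c in others)
        if is_continuation and merged and len(merged[-1]) > desc_col:
            merged[-1][desc_col] += " " + row[desc_col].strip()
        else:
            merged.append(list(row))
    return merged


# Illustrative rows with one wrapped description.
items = [
    ["Mounting bracket, heavy", "10", "15.00"],
    ["duty, zinc plated", "", ""],
    ["Hex bolt M8", "200", "8.00"],
]
print(merge_wrapped(items))
```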
Amounts coerced by Excel on open
Excel coercion: The CSV holds amounts as text exactly as printed, but Excel reformats on open. Long invoice numbers can show in scientific notation and leading zeros drop. Import via Data → From Text/CSV and set invoice-number columns to Text.
You need to process a stack of invoices
One at a time: This tool accepts a single file per run; there's no batch in its UI. For high-volume AP, process invoices individually, or use the API and local runner workflow on a paid tier to script extraction across a folder.
Currency symbol prevents totalling
Manual fixup: Symbols like £, $, € extract as part of the cell text, so Excel treats the value as text and won't sum it. Strip the symbol and thousands separators (Find & Replace), then format the column as a number.
Invoice exceeds the free 2 MB / 50-page limit
Blocked on free tier: The free tier caps at 2 MB / 50 pages; the dropzone blocks larger files before processing. Most single invoices are well under this. Long consolidated statements may need Pro (50 MB / 500 pages) or higher.
Frequently asked questions
Does it output a real Excel file?
No — it outputs CSV-formatted text that downloads as a .txt file. CSV imports directly into Excel, Google Sheets, an AP template, or an ERP, so you get a spreadsheet of invoice data; it just isn't a native .xlsx workbook with sheets and formatting.
Will it work on invoices from QuickBooks, Xero, or SAP?
Yes — those systems produce machine-generated PDFs with consistent, grid-aligned line-item tables, which is exactly what the position-based extraction handles best. Verify the columns after import, especially on rows with blank cells.
What about scanned paper invoices?
They have no text layer, so extraction returns nothing. Run OCR to add a text layer first, then re-run this tool. For receiving scans regularly, OCR PDF for data extraction is the right starting point.
Does it identify the invoice number, date, and VAT automatically?
No — there's no field-mapping intelligence. It returns the table's text by position. The header block (supplier, invoice #, date) and totals come through as their own rows, which you map to fields during import. It speeds data entry; it doesn't replace your AP coding.
Can I batch-process many invoices at once?
Not in this tool's UI — it takes one file per run. For volume, process invoices individually, or script extraction via the API and the local runner on a paid tier (Pro and above unlock API access).
Why do my amounts show as text in Excel?
If the value includes a currency symbol or thousands separator (£1,200.00), Excel treats it as text. Use Find & Replace to strip the symbol and commas, then format the column as Currency or Number so it totals correctly.
Are formulas (line totals = qty × price) preserved?
No — PDFs store the printed result, not the calculation. You get the extended amounts as printed. Re-create any formulas you need in your spreadsheet after import.
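If you want to confirm the printed amounts rather than just re-create the formulas, a short check of qty × unit price against each extended amount is enough. This is a sketch, assuming a four-column row of description, quantity, unit price, and amount with plain numeric text; `check_line_totals` is a hypothetical helper.

```python
from decimal import Decimal


def check_line_totals(items: list[list[str]]) -> list[tuple[str, str, str]]:
    """Return (description, expected, printed) for rows where qty x unit != amount."""
    mismatches = []
    for desc, qty, unit, amount in items:
        # Assumption: amounts are printed to two decimal places.
        expected = (Decimal(qty) * Decimal(unit)).quantize(Decimal("0.01"))
        if expected != Decimal(amount):
            mismatches.append((desc, str(expected), amount))
    return mismatches


items = [
    ["Steel bracket", "10", "1.50", "15.00"],
    ["Hex bolt M8", "200", "0.04", "8.00"],
]
print(check_line_totals(items))  # prints []
```

In a spreadsheet the equivalent is a helper column such as `=B2*C2` compared against the imported amount.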
Are my invoices uploaded anywhere?
No. Extraction runs entirely in your browser via pdf.js — supplier names, amounts, and totals never reach a server. Only anonymous usage counters are recorded when you're signed in.
Why are some line items split across two rows?
A long description that wraps onto a second visual line sits at a different vertical position, so it becomes a separate CSV row. Merge it back with the row above after import, or shorten descriptions before exporting the PDF.
How large an invoice can I process?
Free tier handles up to 2 MB and 50 pages — fine for almost any single invoice. Pro raises that to 50 MB / 500 pages for long consolidated statements, with higher tiers above.
Can I get JSON instead, to push into our ERP API?
Yes — extract a PDF table to JSON returns an array of objects keyed on the first row, which is easier to map to an API payload than CSV. Use it when you're integrating rather than importing into a spreadsheet.
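To picture the shape described here, an array of objects keyed on the first row, the sketch below builds the same structure from CSV text with the standard library. It illustrates the data shape only; it is not the JSON tool's implementation, and `rows_to_records` is a hypothetical helper.

```python
import csv
import io
import json


def rows_to_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into a list of dicts keyed on the first (header) row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Illustrative line-item grid with the header block and totals already removed.
csv_text = (
    '"Description","Qty","Amount"\n'
    '"Steel bracket","10","15.00"\n'
    '"Hex bolt M8","200","8.00"\n'
)
records = rows_to_records(csv_text)
print(json.dumps(records, indent=2))
```

Values stay as strings here, so convert quantities and amounts to numbers before building an API payload if your ERP expects numeric fields.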
The header and totals clutter my line-item table — how do I clean it up?
After import, delete or move the short rows that hold the supplier block and the subtotal/VAT/total lines, leaving only the line-item grid. It's a one-pass cleanup; on a repeating invoice template the rows land in the same place each time.
Privacy first
All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.