Extract Tables from a PDF to Excel (CSV) — Free Browser Tool

How to extract tables from a PDF into a spreadsheet

Step 1
Confirm the PDF has selectable text — Open the PDF and try to select a value inside a table with your cursor. If text highlights, it has a text layer and will extract. If you can only draw a box (it's an image), run OCR first to add a text layer, then come back.
Step 2
Drop the PDF onto the converter — Add the file to the tool above. There are no options to set — the extraction starts automatically as soon as the file is read. Parsing happens entirely in your browser.
Step 3
Review the preview — The result panel shows the first ~5,000 characters of the CSV in a scrollable box. Scan it to confirm the columns line up and the rows you expected are present before downloading.
Step 4
Download the CSV file — Click Download. The file saves with a .txt extension and CSV-formatted contents (every cell quoted, comma-separated). Rename it to .csv if your importer keys off the extension.
Step 5
Open or import into your spreadsheet — In Excel use Data → From Text/CSV (don't just double-click, so Excel doesn't coerce IDs or dates). In Google Sheets, File → Import → Upload and choose comma as the separator.
Step 6
Clean up ragged rows — Page headers, footers, and titles extract as their own short rows because the tool treats every line of text on the page as a row. Delete those non-data rows and re-align any column that drifted, then save.

What the tool produces vs. what people assume

The name says Excel; the output is CSV text. Knowing the real shape avoids surprises on download.

Expectation	Actual behaviour
Native `.xlsx` workbook	No — output is CSV-formatted text. Every cell is double-quoted, values comma-separated, rows newline-separated
Multiple sheets (one per table)	No — all pages go into one CSV stream; a blank line separates each page's block of rows
Download named `.xlsx` or `.csv`	Downloads as `<filename>.txt` with CSV contents. Rename to `.csv` if your importer needs the extension
Cell formatting / colours / fonts	Not preserved — CSV carries values only, no styling
Formulas restored	No — PDFs store calculated values, not formulas. Only the printed numbers are extracted
Configurable column count / page range	No options exist — the tool extracts all pages with no settings

How a page becomes rows and columns

The exact extraction pipeline, so you can predict the output for a given layout.

Step	What happens
1. Read text fragments	pdf.js returns every text run on the page with its position (an x/y transform)
2. Group into rows	Fragments are bucketed by their Y-coordinate (rounded to the nearest point). Each Y-bucket becomes one CSV row
3. Order columns	Within a row, fragments are sorted by X-coordinate (left to right) — that order becomes the column order
4. Order rows	Rows are emitted top-to-bottom (highest Y first), matching reading order
5. Emit CSV	Each cell is wrapped in quotes (internal quotes doubled) and cells are joined by commas. Only pages with two or more rows are kept, and pages are separated by a blank line
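
The five steps above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the tool's source (the tool runs on pdf.js in the browser). The `Fragment` tuple and the function names are hypothetical, and the sketch assumes fragments arrive as (x, y, text) with Y increasing up the page.

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical stand-in for one text run and its position."""
    x: float
    y: float
    text: str


def quote(cell: str) -> str:
    # Wrap the cell in quotes and double any internal quote.
    return '"' + cell.replace('"', '""') + '"'


def page_to_csv(fragments: list[Fragment]) -> str:
    # Step 2: bucket fragments by Y rounded to the nearest point.
    rows: dict[int, list[Fragment]] = defaultdict(list)
    for frag in fragments:
        rows[round(frag.y)].append(frag)

    lines = []
    # Step 4: highest Y first, which is the top of the page.
    for y in sorted(rows, reverse=True):
        # Step 3: left to right within the row.
        cells = sorted(rows[y], key=lambda f: f.x)
        lines.append(",".join(quote(f.text) for f in cells))

    # Step 5: keep the page only if it has two or more rows.
    return "\n".join(lines) if len(lines) >= 2 else ""


def document_to_csv(pages: list[list[Fragment]]) -> str:
    # Page blocks are joined with a blank line between them.
    blocks = [page_to_csv(p) for p in pages]
    return "\n\n".join(b for b in blocks if b)
```

Because rows are keyed on the rounded Y value alone, two fragments whose baselines differ by about a point can land in different rows. That is consistent with the wrapped-cell behaviour described under edge cases.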

File-size and page limits by tier

The dropzone checks a file against your tier's limits and blocks it before any processing starts. PDFs over the free limits need a paid tier.

Tier	Max file size	Max pages
Free	2 MB	50 pages
Pro	50 MB	500 pages
Pro + Media	500 MB	2,000 pages
Developer	2 GB	10,000 pages
Enterprise	Unlimited	Unlimited
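
To check which tier a file needs before you try it, the table can be encoded as a small lookup. This is a sketch only: the `TIER_LIMITS` name and `lowest_tier_for` function are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which the page does not state.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB

# (max bytes, max pages) per tier, smallest first; None means unlimited.
TIER_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "Free": (2 * MB, 50),
    "Pro": (50 * MB, 500),
    "Pro + Media": (500 * MB, 2_000),
    "Developer": (2 * GB, 10_000),
    "Enterprise": (None, None),
}


def lowest_tier_for(size_bytes: int, pages: int) -> str:
    """Return the first tier whose size and page limits both fit."""
    for tier, (max_bytes, max_pages) in TIER_LIMITS.items():
        size_ok = max_bytes is None or size_bytes <= max_bytes
        pages_ok = max_pages is None or pages <= max_pages
        if size_ok and pages_ok:
            return tier
    # Not reachable while the last tier is unlimited.
    raise ValueError("no tier fits this file")
```

Both limits have to pass, so a small 60-page PDF needs Pro even though its file size fits the free tier.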

Cookbook

Real extractions showing exactly what comes out for a given table layout. Output is shown verbatim — note the quoting and the blank line between pages.

A clean single-page table

A born-digital table with evenly aligned columns is the best case. Each printed row becomes one CSV row; each value lands in its column.

PDF page (Order summary):
  SKU      Item            Qty   Price
  A-100    Widget           2    9.99
  A-205    Bracket         10    1.50

CSV output:
"SKU","Item","Qty","Price"
"A-100","Widget","2","9.99"
"A-205","Bracket","10","1.50"

A two-page table

Each page is processed independently and its rows appended, separated by a blank line. The header repeats if it repeats in the PDF. There is no single merged sheet — you reconcile the two blocks after import.

CSV output:
"Date","Description","Amount"
"01/03","Opening balance","1200.00"
"02/03","Invoice 4471","-340.00"

"Date","Description","Amount"
"15/03","Invoice 4480","-90.00"
"28/03","Refund","45.00"

(blank line between page 1 and page 2 blocks)
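
If you would rather reconcile the blocks before import, a short script can do it. The sketch below is one possible approach, not a feature of the tool. The `merge_page_blocks` name is hypothetical, and it assumes the first row of the first block is the table header.

```python
import csv
import io


def merge_page_blocks(csv_text: str) -> list[list[str]]:
    """Stitch per-page blocks into one table, dropping repeated headers."""
    merged: list[list[str]] = []
    header = None
    # Normalise line endings, then split on the blank line between pages.
    for block in csv_text.replace("\r\n", "\n").strip().split("\n\n"):
        rows = list(csv.reader(io.StringIO(block)))
        if not rows:
            continue
        if header is None:
            # Assumption: the very first row is the header.
            header = rows[0]
            merged.append(header)
            rows = rows[1:]
        elif rows[0] == header:
            # The same header printed again on this page: skip it.
            rows = rows[1:]
        merged.extend(rows)
    return merged
```

Applied to the output above, this gives one header row followed by the four data rows in page order. If a page starts with a title row instead of the header, the comparison fails and the repeated header stays in the result, so check the merged rows before relying on them.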

Page title and footer extracted as stray rows

The tool treats every line of text on the page as a row — including the report title and the page-number footer. These appear as short rows you delete after import.

CSV output:
"Q3 Sales Report — Confidential"
"Region","Revenue","Units"
"North","42000","310"
"South","38500","288"
"Page 1 of 4"

→ delete the title row and the footer row in your sheet.
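
The same cleanup can be scripted. As a rough heuristic, the sketch below treats the most common cell count as the table's width and sets aside every row that does not match. The `split_stray_rows` name is hypothetical. The heuristic also sets aside real data rows that lost a cell, so review the stray list before discarding it.

```python
import csv
import io
from collections import Counter


def split_stray_rows(csv_text: str) -> tuple[list[list[str]], list[list[str]]]:
    """Return (kept, stray): rows matching the usual width, and the rest."""
    # csv.reader yields an empty list for blank lines; skip those.
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if not rows:
        return [], []
    # Treat the most common cell count as the table's width.
    width = Counter(len(r) for r in rows).most_common(1)[0][0]
    kept = [r for r in rows if len(r) == width]
    stray = [r for r in rows if len(r) != width]
    return kept, stray
```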

Quotes inside a cell are escaped

A value containing a double quote is preserved by doubling the quote, per CSV convention, so spreadsheets parse it correctly.

PDF cell value:  6" pipe fitting

CSV output:
"P-77","6"" pipe fitting","3.20"

Excel / Sheets display:  6" pipe fitting  (correct)
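
Python's standard `csv` module follows the same doubling convention, so it can confirm the round trip. The `encode_row` and `decode_row` helpers are illustrative names, not part of the tool.

```python
import csv
import io


def encode_row(cells: list[str]) -> str:
    """Write one row with every cell quoted, as the tool's output does."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(cells)
    return buf.getvalue()


def decode_row(line: str) -> list[str]:
    """Parse one CSV line back into its cell values."""
    return next(csv.reader(io.StringIO(line)))


row = ["P-77", '6" pipe fitting', "3.20"]
line = encode_row(row)
# The doubled quote collapses back to a single quote on read.
assert decode_row(line) == row
```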

A scanned table yields nothing

If the PDF is an image with no text layer, pdf.js finds no text fragments, so no rows are produced. The fix is OCR, then re-run extraction.

Input:  scan_of_invoice.pdf  (photo of a printed table)
Output: (empty — no selectable text on the page)

Fix:
  1. /pdf-tools/pdf-ocr  → adds text layer
  2. re-run this tool on the OCR'd PDF        → rows appear

Edge cases and what actually happens

Scanned / image-only PDF (no text layer)

No text found

pdf.js extracts text fragments, not pixels. A scanned page is an image, so there is nothing to group into rows and the output is empty. Run OCR to add a text layer first, then re-run this tool. For data extraction specifically, see "OCR PDF for data extraction", which covers that workflow.

Columns drift on rows with merged or blank cells

Manual fixup

Columns are reconstructed purely from each fragment's X-position within its own row — there is no fixed column grid shared across rows. When a cell is empty or two cells are merged, that row has fewer fragments, so later values shift left and land in the wrong column. Re-align those rows in your spreadsheet after import.
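
The drift is easy to reproduce. In the sketch below, `by_row_order` mirrors the per-row ordering described above. `snapped_to_grid` is a hypothetical alternative that the tool does not implement; it is shown only to make the contrast with a shared column grid concrete. The X-positions are invented for illustration.

```python
def by_row_order(fragments: list[tuple[float, str]]) -> list[str]:
    # Described behaviour: sort by X within the row and ignore gaps.
    return [text for _, text in sorted(fragments)]


def snapped_to_grid(
    fragments: list[tuple[float, str]], column_xs: list[float]
) -> list[str]:
    # Hypothetical alternative, NOT what the tool does: place each
    # fragment in the column whose X-position is nearest.
    cells = [""] * len(column_xs)
    for x, text in fragments:
        nearest = min(range(len(column_xs)), key=lambda i: abs(column_xs[i] - x))
        cells[nearest] = text
    return cells


# Invented X-positions for the columns SKU, Item, Qty, Price.
column_xs = [50.0, 150.0, 250.0, 320.0]
# A row whose Qty cell is blank in the PDF.
row = [(50.0, "A-300"), (150.0, "Gasket"), (318.0, "0.75")]

drifted = by_row_order(row)              # price ends up in the third column
aligned = snapped_to_grid(row, column_xs)  # price stays in the fourth column
```

The CSV output carries no positions, so this kind of snapping cannot be applied after download. Re-aligning the affected rows by hand remains the fix.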

A wrapped cell splits into two rows

By design

Rows are grouped by Y-position. A cell whose text wraps onto a second visual line sits at a different Y, so it becomes a separate CSV row beneath the first. Merge the two rows manually, or widen the column at the source before exporting the PDF.

Text not laid out as a grid still produces rows

Noisy output

The tool keeps any page with two or more text rows — it doesn't verify the page actually contains a table. A page of prose produces one CSV row per line of text. If you only want narrative text, use extract text from PDF instead.

Multi-page table looks like separate tables

Expected

Each page is processed on its own and appended with a blank-line separator; there is no logic that stitches a table continuing across a page break into one block. Concatenate the blocks in your spreadsheet (and delete any repeated header rows) after import.

Numbers and IDs may be coerced by Excel

Excel coercion

The CSV carries every value as text exactly as printed, but Excel applies its own formatting on open — leading zeros drop from codes like 00734, and long numbers can show in scientific notation. Import via Data → From Text/CSV and set those columns to Text to preserve them.
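
If the data is headed for a script rather than Excel, the same principle applies: read every value as text and convert only the columns you know are numeric. A minimal sketch with Python's standard library follows. The `load_typed` name and the column names in the usage line are illustrative, and the sketch assumes the first row is the header.

```python
import csv
import io
from decimal import Decimal


def load_typed(csv_text: str, numeric_columns: list[str]) -> list[dict[str, object]]:
    """Read rows as text, converting only the named columns to Decimal."""
    records: list[dict[str, object]] = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        typed: dict[str, object] = dict(record)
        for name in numeric_columns:
            # Decimal keeps the printed precision, e.g. 12.50 stays 12.50.
            typed[name] = Decimal(record[name])
        records.append(typed)
    return records


rows = load_typed('"Code","Amount"\n"00734","12.50"\n', ["Amount"])
# The code column is untouched, so its leading zeros survive.
```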

Downloaded file has a .txt extension

By design

The output is CSV-formatted but downloads as <filename>.txt. Excel and Sheets can import it as-is; rename it to .csv if a downstream importer keys off the file extension.

PDF exceeds the free 2 MB / 50-page limit

Blocked on free tier

The free tier accepts up to 2 MB and 50 pages; the dropzone blocks larger files before any processing. Pro raises this to 50 MB / 500 pages, with higher ceilings on the tiers above. Alternatively, split the PDF first with a page-extraction tool and process only the part you need.

Frequently asked questions

Does this produce a real Excel (.xlsx) file?

No. Despite the name, the output is CSV-formatted text — every cell double-quoted, values comma-separated, a blank line between pages. It downloads as a .txt file. CSV opens natively in Excel, Google Sheets, and Numbers, so you get a spreadsheet immediately; it just isn't a native .xlsx workbook with sheets and formatting.

How does it know where the columns are?

It uses each text fragment's position. Fragments on the same vertical line (Y-coordinate) become one row; within that row they're ordered left to right by horizontal position (X-coordinate) to form columns. There's no machine-learning table model — it's a deterministic position-based reconstruction, which is why clean, grid-aligned tables extract best.

Will it work on a scanned PDF?

Not directly. A scan is an image with no selectable text, so there's nothing to extract and you'll get an empty result. Run OCR to add a text layer first, then re-run this tool. OCR PDF for data extraction is purpose-built for that pipeline.

Are formulas recovered into Excel?

No — PDFs store only the calculated results, never the underlying formulas. You get the printed numbers. If you need live formulas, re-create them in your spreadsheet after importing the values.

Can I pick which page or which table to extract?

No — there are no options. The tool processes every page and outputs all of them. To narrow the input, extract the pages you want first with a tool like extract a single page from a PDF, then run the extraction on that subset.

Why are there extra short rows in my output?

The tool treats every line of text on a page as a row, including titles, headers, and page-number footers. Those appear as their own short rows. Delete the non-data rows in your spreadsheet after import — it's a quick cleanup pass.

My columns are misaligned on some rows — why?

Columns are rebuilt from positions within each individual row, not from a shared grid. A row with an empty cell or a merged cell has fewer fragments, so the remaining values shift. Re-align those specific rows after import, or fix the layout at the source before exporting the PDF.

Is my document uploaded anywhere?

No. Parsing and extraction run entirely in your browser using pdf.js. The PDF never leaves your device; only anonymous usage counters are recorded when you're signed in.

What's the difference between this and PDF table to JSON?

Same position-based detection, different output. This tool emits CSV rows; extract a PDF table to JSON emits an array of objects using the first row as keys — better for feeding an API or a data pipeline. Choose CSV for spreadsheets, JSON for code.
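
For a sense of the JSON shape, the sketch below converts one CSV block into an array of objects keyed by the first row. It is a local illustration of that mapping, not the JSON tool's code, and `csv_block_to_json` is a hypothetical name.

```python
import csv
import io
import json


def csv_block_to_json(csv_text: str) -> str:
    """Turn a CSV block into a JSON array of objects keyed by the first row."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    if not rows:
        return "[]"
    header, *data = rows
    # zip stops at the shorter sequence, so a short row simply omits keys.
    records = [dict(zip(header, row)) for row in data]
    return json.dumps(records, indent=2)
```

Every value stays a string, matching the CSV output, so any numeric conversion is left to the consuming code.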

How big a PDF can I process?

The free tier allows up to 2 MB and 50 pages. Pro raises that to 50 MB / 500 pages, Pro + Media to 500 MB / 2,000 pages, and Developer to 2 GB / 10,000 pages; Enterprise is unlimited. The dropzone blocks oversize files before processing rather than failing midway.

How do I avoid Excel mangling my numbers and codes?

Don't double-click the file — import it via Data → From Text/CSV and mark code/ID columns as Text. That stops Excel dropping leading zeros from values like 00734 and showing long numbers in scientific notation. In Google Sheets, untick "Convert text to numbers, dates, and formulas" during import.

Can I get plain text or Word instead of a table?

Yes. If the document is prose rather than a grid, extract text from the PDF or convert the PDF to editable Word. Use this table tool only when the content is genuinely tabular.

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

