Convert a PDF Financial Report to Excel (CSV)

How to extract financial report tables from a pdf into a spreadsheet

Step 1
Confirm the report has a text layer — Select a figure inside a statement table with your cursor. If text highlights, the report is born-digital and extracts well. If it's a scanned filing, run OCR first to add a text layer.
Step 2
Drop the report PDF onto the converter — Add the file above. There are no options — extraction runs as soon as the file is read, and parsing happens in your browser so the financials never leave your device.
Step 3
Review the extracted statements — The preview shows the first ~5,000 characters of CSV. Confirm the statement rows and the period columns came through, and note where headings and footnotes landed.
Step 4
Download the CSV — Click Download — the file saves as .txt with CSV contents. Rename to .csv if your importer needs it.
Step 5
Import into your model — Use Excel's Data → From Text/CSV so figures aren't coerced. Map the period columns and line-item rows into the relevant statement tabs of your model.
Step 6
Convert negatives and remove non-data rows — Convert parenthetical negatives (123) to numeric -123, strip currency symbols and thousands separators, and delete heading/footnote rows so the figures calculate in your model.

Financial-statement specifics

How common financial-report conventions come through the extraction.

Convention	How it extracts
Parenthetical negatives `(1,234)`	Kept as text — not converted to `-1234`. Reformat in your model
Thousands separators `1,234,567`	Kept as text with commas — strip them so the figure is numeric
Currency / unit symbols (£, $, €, 'm)	Kept with the value as text — reformat the column after import
Footnote markers (¹, *, (a))	Extract attached to the cell or as separate short rows — filter out
Section headings (e.g. 'Operating expenses')	Extract as their own one-cell rows above the figures
Multi-period columns (FY24, FY25)	Preserved as adjacent columns when aligned in the source

What the output is (and isn't)

Set expectations so the extraction fits a modelling workflow.

Aspect	Reality
File format	CSV-formatted text, downloaded as `.txt` (rename to `.csv`)
Sheets	None — all pages in one CSV stream, blank line between pages
Number typing	Everything is text; you convert to numbers in your model
Statement labelling	None — rows returned by position, no "this is the balance sheet" tagging
Formulas	Not recoverable — PDFs store results, not formulas

File-size and page limits by tier

Annual reports can be large; check the tier limit before uploading.

Tier	Max file size	Max pages
Free	2 MB	50 pages
Pro	50 MB	500 pages
Pro + Media	500 MB	2,000 pages
Developer	2 GB	10,000 pages
Enterprise	Unlimited	Unlimited

Cookbook

Real financial-statement extractions. Output is shown verbatim — note how negatives, separators, and headings come through as text.

A two-period P&L

When period columns are aligned in the source, they reconstruct as adjacent CSV columns. Negatives stay parenthetical text.

CSV output:
"","FY25","FY24"
"Revenue","48,200","41,900"
"Cost of sales","(29,100)","(25,400)"
"Gross profit","19,100","16,500"
"Operating expenses","(12,300)","(11,200)"
"Operating profit","6,800","5,300"

Parenthetical negatives need converting

Accounting-style negatives extract as text. Convert them in Excel so the figures calculate.

Extracted cell:  (29,100)

Excel formula to convert a column of these:
  =-1*VALUE(SUBSTITUTE(SUBSTITUTE(A2,"(",""),")",""))
  → -29100  (after also removing the comma)

Section headings extract as their own rows

A statement subheading sits on its own line, so it becomes a one-cell row above the figures it groups. Keep it as a label or delete it.

CSV output:
"Non-current assets"
"Property, plant & equipment","34,500"
"Intangible assets","8,200"
"Current assets"
"Inventory","6,100"
"Trade receivables","9,400"

Footnote markers attach to cells

Footnote references extract as part of the cell or as a trailing row. Strip them so the figure is clean.

CSV output:
"Deferred tax","1,240 (note 14)"
"Pension obligation","3,880 *"

→ remove " (note 14)" and " *" before using the figures.

A scanned filing yields nothing

Older filings scanned to PDF have no text layer. OCR first, then extract.

Input:  2019_annual_report_scan.pdf
Output: (empty)

Fix:
  1. /pdf-tools/pdf-ocr
  2. re-run this tool → statement rows extract
     (verify OCR'd digits carefully on financials)

Edge cases and what actually happens

Parenthetical negatives stay as text

Manual fixup

The tool extracts (1,234) exactly as printed and does not convert it to -1234. Excel treats it as text. Convert the column with SUBSTITUTE/VALUE or Find & Replace before the figures feed any calculation in your model.

Thousands separators block numeric use

Manual fixup

Figures like 1,234,567 keep their commas as text, so Excel won't sum them until you remove the separators. Strip commas (and currency symbols) and format the column as Number after import.

Footnotes and headings become extra rows

Noisy output

Section headings extract as one-cell rows and footnote markers attach to cells or trail as rows — the tool returns every line of text by position with no statement structure. Filter these out before the data is model-ready.

Scanned or image-based filing

No text found

Image-only filings have no selectable text, so extraction returns nothing. OCR adds a text layer — but verify OCR'd digits carefully on financial figures, where a misread 8/3 materially changes the number.

Misaligned columns where a period is blank

Manual fixup

Columns are rebuilt per row from text positions, not a shared grid. A line with a blank period cell has fewer fragments, so later figures shift left into the wrong period column. Re-align those rows after import.

Statement continues across a page break

Expected

Each page is processed independently and appended with a blank line — there's no stitching of a statement that runs across pages. Concatenate the blocks in your model and remove any repeated header rows.

Large annual report exceeds the free limit

Blocked on free tier

Free tier caps at 2 MB / 50 pages, which a full annual report often exceeds. Either upgrade (Pro is 50 MB / 500 pages), or extract only the statement pages first and run the extraction on that subset.

Figures coerced by Excel on open

Excel coercion

Excel reformats on a double-click open — long figures can show in scientific notation. Import via Data → From Text/CSV and control column types to keep figures exact for the model.

Frequently asked questions

Does this give me a real Excel workbook?

No — it outputs CSV-formatted text that downloads as a .txt file. CSV imports straight into Excel or Google Sheets, so you get the figures in a spreadsheet; it just isn't a native .xlsx with separate statement tabs or formatting. Build your model structure on the imported data.

Are negative numbers in parentheses recognised?

They're extracted exactly as printed — (1,234) comes through as text, not as -1234. Convert the column in Excel (SUBSTITUTE to strip the brackets and comma, then negate) so the figures calculate. This is the most common post-import step on financial statements.

What happens to currency symbols and thousands separators?

They're kept as part of the cell text (£1,234,567), which means Excel treats the value as text. Strip the symbol and commas with Find & Replace, then format the column as Number or Currency so it sums.

Can I extract footnotes with the tables?

Footnote markers and footnote text extract as part of cells or as separate rows, because the tool returns every line of text on the page by position. Filter or delete them after import so they don't pollute the figures.

Will it label which table is the P&L vs the balance sheet?

No — there's no statement-recognition logic. It returns the tables' text by position, page by page. Section headings extract as rows, which helps you orient, but you assign each block to the right statement tab during import.

Does it work on scanned filings?

Not directly — scans have no text layer and yield nothing. Run OCR first. On financials, double-check OCR'd digits: a single misread character changes the number, so verification matters more here than on prose.

Are formulas preserved?

No — PDFs store the printed results, never the formulas behind them. You get the reported figures; rebuild any calculations in your model.

Are my financials uploaded anywhere?

No. Extraction runs entirely in your browser via pdf.js — sensitive or pre-release figures never reach a server. Only anonymous usage counters are recorded when you're signed in, which matters when you're handling material non-public information.

Why are some period columns misaligned?

Columns are rebuilt from text positions per row, not from a fixed grid. A row with a blank period cell has fewer fragments, so the remaining figures shift into the wrong column. Re-align those specific rows after import.

My annual report is too big for the free tier — what now?

Free tier caps at 2 MB / 50 pages. Either upgrade (Pro is 50 MB / 500 pages, higher tiers above), or extract just the statement pages first and run the extraction on those few pages.

Can I get JSON for a data pipeline instead of a spreadsheet?

Yes — extract a financial table to JSON returns structured objects, which is easier to feed into a database or analysis script than CSV. Use CSV for modelling in Excel, JSON for code.

How do I keep the extraction clean for repeated reports?

On a recurring report (e.g. monthly management accounts) the layout is stable, so the heading/footnote rows land in the same places each time. Save your post-import cleanup steps (column conversions, row deletions) as an Excel macro or a Sheets script and reapply them each period.

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

How to extract financial report tables from a pdf into a spreadsheet

Step 1
Confirm the report has a text layer — Select a figure inside a statement table with your cursor. If text highlights, the report is born-digital and extracts well. If it's a scanned filing, run OCR first to add a text layer.
Step 2
Drop the report PDF onto the converter — Add the file above. There are no options — extraction runs as soon as the file is read, and parsing happens in your browser so the financials never leave your device.
Step 3
Review the extracted statements — The preview shows the first ~5,000 characters of CSV. Confirm the statement rows and the period columns came through, and note where headings and footnotes landed.
Step 4
Download the CSV — Click Download — the file saves as .txt with CSV contents. Rename to .csv if your importer needs it.
Step 5
Import into your model — Use Excel's Data → From Text/CSV so figures aren't coerced. Map the period columns and line-item rows into the relevant statement tabs of your model.
Step 6
Convert negatives and remove non-data rows — Convert parenthetical negatives (123) to numeric -123, strip currency symbols and thousands separators, and delete heading/footnote rows so the figures calculate in your model.

Financial-statement specifics

How common financial-report conventions come through the extraction.

Convention	How it extracts
Parenthetical negatives `(1,234)`	Kept as text — not converted to `-1234`. Reformat in your model
Thousands separators `1,234,567`	Kept as text with commas — strip them so the figure is numeric
Currency / unit symbols (£, $, €, 'm)	Kept with the value as text — reformat the column after import
Footnote markers (¹, *, (a))	Extract attached to the cell or as separate short rows — filter out
Section headings (e.g. 'Operating expenses')	Extract as their own one-cell rows above the figures
Multi-period columns (FY24, FY25)	Preserved as adjacent columns when aligned in the source

What the output is (and isn't)

Set expectations so the extraction fits a modelling workflow.

Aspect	Reality
File format	CSV-formatted text, downloaded as `.txt` (rename to `.csv`)
Sheets	None — all pages in one CSV stream, blank line between pages
Number typing	Everything is text; you convert to numbers in your model
Statement labelling	None — rows returned by position, no "this is the balance sheet" tagging
Formulas	Not recoverable — PDFs store results, not formulas

File-size and page limits by tier

Annual reports can be large; check the tier limit before uploading.

Tier	Max file size	Max pages
Free	2 MB	50 pages
Pro	50 MB	500 pages
Pro + Media	500 MB	2,000 pages
Developer	2 GB	10,000 pages
Enterprise	Unlimited	Unlimited

Cookbook

Real financial-statement extractions. Output is shown verbatim — note how negatives, separators, and headings come through as text.

A two-period P&L

When period columns are aligned in the source, they reconstruct as adjacent CSV columns. Negatives stay parenthetical text.

CSV output:
"","FY25","FY24"
"Revenue","48,200","41,900"
"Cost of sales","(29,100)","(25,400)"
"Gross profit","19,100","16,500"
"Operating expenses","(12,300)","(11,200)"
"Operating profit","6,800","5,300"

Parenthetical negatives need converting

Accounting-style negatives extract as text. Convert them in Excel so the figures calculate.

Extracted cell:  (29,100)

Excel formula to convert a column of these:
  =-1*VALUE(SUBSTITUTE(SUBSTITUTE(A2,"(",""),")",""))
  → -29100  (after also removing the comma)

Section headings extract as their own rows

A statement subheading sits on its own line, so it becomes a one-cell row above the figures it groups. Keep it as a label or delete it.

CSV output:
"Non-current assets"
"Property, plant & equipment","34,500"
"Intangible assets","8,200"
"Current assets"
"Inventory","6,100"
"Trade receivables","9,400"

Footnote markers attach to cells

Footnote references extract as part of the cell or as a trailing row. Strip them so the figure is clean.

CSV output:
"Deferred tax","1,240 (note 14)"
"Pension obligation","3,880 *"

→ remove " (note 14)" and " *" before using the figures.

A scanned filing yields nothing

Older filings scanned to PDF have no text layer. OCR first, then extract.

Input:  2019_annual_report_scan.pdf
Output: (empty)

Fix:
  1. /pdf-tools/pdf-ocr
  2. re-run this tool → statement rows extract
     (verify OCR'd digits carefully on financials)

Edge cases and what actually happens

Parenthetical negatives stay as text

Manual fixup

Thousands separators block numeric use

Manual fixup

Figures like 1,234,567 keep their commas as text, so Excel won't sum them until you remove the separators. Strip commas (and currency symbols) and format the column as Number after import.

Footnotes and headings become extra rows

Noisy output

Scanned or image-based filing

No text found

Misaligned columns where a period is blank

Manual fixup

Statement continues across a page break

Expected

Large annual report exceeds the free limit

Blocked on free tier

Figures coerced by Excel on open

Excel coercion

Excel reformats on a double-click open — long figures can show in scientific notation. Import via Data → From Text/CSV and control column types to keep figures exact for the model.

Frequently asked questions

Does this give me a real Excel workbook?

Are negative numbers in parentheses recognised?

What happens to currency symbols and thousands separators?

Can I extract footnotes with the tables?

Will it label which table is the P&L vs the balance sheet?

Does it work on scanned filings?

Are formulas preserved?

No — PDFs store the printed results, never the formulas behind them. You get the reported figures; rebuild any calculations in your model.

Are my financials uploaded anywhere?

Why are some period columns misaligned?

My annual report is too big for the free tier — what now?

Free tier caps at 2 MB / 50 pages. Either upgrade (Pro is 50 MB / 500 pages, higher tiers above), or extract just the statement pages first and run the extraction on those few pages.

Can I get JSON for a data pipeline instead of a spreadsheet?

Yes — extract a financial table to JSON returns structured objects, which is easier to feed into a database or analysis script than CSV. Use CSV for modelling in Excel, JSON for code.

How do I keep the extraction clean for repeated reports?

Privacy first

All PDF processing runs locally in your browser using PDF-lib and pdf.js. No file is ever uploaded — only metadata counters are saved for signed-in dashboard stats.

Extract Financial Report Tables from a PDF into a Spreadsheet

How to extract financial report tables from a pdf into a spreadsheet

Financial-statement specifics

What the output is (and isn't)

File-size and page limits by tier

Cookbook

A two-period P&L

Parenthetical negatives need converting

Section headings extract as their own rows

Footnote markers attach to cells

A scanned filing yields nothing

Edge cases and what actually happens

Parenthetical negatives stay as text

Thousands separators block numeric use

Footnotes and headings become extra rows

Scanned or image-based filing

Misaligned columns where a period is blank

Statement continues across a page break

Large annual report exceeds the free limit

Figures coerced by Excel on open

Frequently asked questions

Does this give me a real Excel workbook?

Are negative numbers in parentheses recognised?

What happens to currency symbols and thousands separators?

Can I extract footnotes with the tables?

Will it label which table is the P&L vs the balance sheet?

Does it work on scanned filings?

Are formulas preserved?

Are my financials uploaded anywhere?

Why are some period columns misaligned?

My annual report is too big for the free tier — what now?

Can I get JSON for a data pipeline instead of a spreadsheet?

How do I keep the extraction clean for repeated reports?

Privacy first

Related guides

Extract Financial Report Tables from a PDF into a Spreadsheet

How to extract financial report tables from a pdf into a spreadsheet

Financial-statement specifics

What the output is (and isn't)

File-size and page limits by tier

Cookbook

A two-period P&L

Parenthetical negatives need converting

Section headings extract as their own rows

Footnote markers attach to cells

A scanned filing yields nothing

Edge cases and what actually happens

Parenthetical negatives stay as text

Thousands separators block numeric use

Footnotes and headings become extra rows

Scanned or image-based filing

Misaligned columns where a period is blank

Statement continues across a page break

Large annual report exceeds the free limit

Figures coerced by Excel on open

Frequently asked questions

Does this give me a real Excel workbook?

Are negative numbers in parentheses recognised?

What happens to currency symbols and thousands separators?

Can I extract footnotes with the tables?

Will it label which table is the P&L vs the balance sheet?

Does it work on scanned filings?

Are formulas preserved?

Are my financials uploaded anywhere?

Why are some period columns misaligned?

My annual report is too big for the free tier — what now?

Can I get JSON for a data pipeline instead of a spreadsheet?

How do I keep the extraction clean for repeated reports?

Privacy first

Related guides