Apply OCR to an Image-Only PDF — Free Online Tool

How to apply OCR to an image-only PDF to add a text layer

Step 1
Confirm the PDF is image-only. Open it and try to select text. If nothing selects (or PDF to Plain Text returns empty), it is image-only and OCR will help. If text already selects, OCR is unnecessary.
Step 2
Drop the image-only PDF into the OCR tool. pdf.js and Tesseract.js process it in your browser with no upload.
Step 3
Choose the OCR language. Select the page language from the dropdown: English (eng), French (fra), German (deu), Spanish (spa), Italian (ita), Portuguese (por), Dutch (nld), Russian (rus), Chinese Simplified (chi_sim), or Japanese (jpn). English is the default. The first use of a language downloads about 10 MB of Tesseract data, which is then cached.
Step 4
Run OCR to add the text layer. Each page is rendered to a 2× canvas, recognised, and rebuilt with an invisible Helvetica text layer over the re-embedded JPEG page image.
Step 5
Download the OCR-processed PDF. Save the searchable PDF: visually the same image pages, now with text behind them.
Step 6
Proceed with downstream processing. Extract text, convert, or index the document; those steps now work because the text layer exists.

Where image-only PDFs come from

Different sources, same problem: pixels with no text. OCR is the common fix.

Source	Why it has no text	OCR result
Flatbed / sheet-fed scanner	Scanner captures a raster image of the page	Recognised text layer added (best on 300 DPI+ output)
Phone photo saved to PDF	Camera produces an image, not characters	Recognised if sharp and well-lit; glare/blur hurt accuracy
Image-to-PDF export	Each JPG/PNG became a full-page image	Text recognised from each image page
Fax / copier PDF	Low-resolution raster scan	Recognised but accuracy limited by low DPI

The single OCR option

The OCR tool exposes exactly one control. Everything else is fixed in the pipeline.

Control	Choices	Default	Notes
OCR language	eng, fra, deu, spa, ita, por, nld, rus, chi_sim, jpn	eng (English)	First use downloads ~10 MB per language, then cached
Page selection	(none — all pages)	all pages	Extract pages first if you need a subset
Output format	(none — searchable PDF only)	PDF	Use a converter afterward for text/Markdown/JSON
DPI / deskew / threshold	(none)	fixed 2× render	Improve the source scan instead of tuning the tool

Cookbook

Recipes for the common image-only PDF sources and what to do after OCR.

Tell whether a PDF is image-only

The fastest programmatic check is a plain-text extract. Empty means image-only.

/pdf-tools/pdf-to-text on document.pdf:
  output: ""   (empty)
  -> image-only, OCR needed

(or: open in a viewer and try to select a word —
 if nothing highlights, it's image-only)
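
The same check can be scripted once you have per-page text from any extractor. The following is a minimal Python sketch, not the tool's own code. It assumes a hypothetical list, page_texts, holding one string per page (the extractor itself is not shown), and treats a page as image-only when its text is empty or only whitespace.

```python
def image_only_pages(page_texts):
    """Return 1-based numbers of pages whose extracted text is empty.

    page_texts is a hypothetical input: one string per page, as produced
    by whatever plain-text extractor you use.
    """
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


def is_image_only(page_texts):
    """True when the document has pages and none of them has any text."""
    return len(page_texts) > 0 and len(image_only_pages(page_texts)) == len(page_texts)


print(image_only_pages(["", "Invoice 42", "  \n"]))  # [1, 3]
print(is_image_only(["", "\n"]))                      # True
```

A result that lists only some pages points to a mixed image + text PDF rather than a fully image-only one.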

OCR a photographed document

Phone photos work when sharp and evenly lit. Save the photo as a PDF first, then OCR.

photo.jpg -> /pdf-tools/image-to-pdf -> photo.pdf
photo.pdf -> pdf-ocr (lang: eng) -> photo-searchable.pdf
  Ctrl+F "Total" -> match (if the photo was clear)

OCR an image-to-PDF batch export

When several photos were combined into one PDF, OCR adds a text layer to every page in one pass.

receipts.pdf (8 image pages)
  -> pdf-ocr -> receipts-searchable.pdf
  every page now searchable and selectable

Make an image PDF accessible-ready

OCR adds the text a screen reader needs to announce. Full PDF/UA compliance still needs tags and reading order set elsewhere.

image-only.pdf -> pdf-ocr -> has text layer
  screen reader: now reads recognised words
  (still add heading tags / alt text / reading order
   in a dedicated accessibility tool for PDF/UA)

Compress an image PDF after OCR for sharing

Image PDFs are heavy. OCR keeps them heavy; compress only if you no longer need the text layer.

scan.pdf (22 MB) -> pdf-ocr -> 21 MB
  -> /pdf-tools/pdf-compress-lossy (target ~1 MB)
  warning: lossy re-rasterises pages and removes
           the OCR text layer

Edge cases and what actually happens

PDF is not actually image-only

By design

If pages already contain text, OCR re-renders them to JPEG images and re-recognises — wasting time and softening the page. Verify with selection or PDF to Plain Text first; only OCR genuinely image-only files.

Image pages exceed the free 2 MB / 50-page cap

Blocked

Image PDFs are large by nature. Free allows 2 MB and 50 pages; Pro 50 MB and 500 pages; Pro+Media 500 MB and 2,000 pages. Split big image PDFs with PDF Split by Range before OCR.

Photo is blurry, skewed, or poorly lit

Degraded

OCR works on the image it is given — there is no deskew, perspective fix, or lighting correction. Retake the photo straight-on with even light, or rescan at 300 DPI+, then OCR.

Output is a JPEG re-render, not the original image

Expected

Each page is re-rendered at 2× and re-encoded as JPEG (quality 0.92). For most scans this is visually fine, but it is a re-compression — do not expect the exact original image bytes.
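
To get a feel for what a 2× render means in pixels, here is a small Python sketch of the arithmetic. It assumes that 2× is a scale factor of 2 pixels per PDF point (a point is 1/72 inch), which is how a pdf.js viewport scale is normally interpreted; the function names are illustrative and not part of the tool.

```python
def render_size(width_pt, height_pt, scale=2.0):
    """Pixel size of a page rendered at `scale` pixels per PDF point."""
    return round(width_pt * scale), round(height_pt * scale)


def effective_dpi(scale=2.0):
    """A PDF point is 1/72 inch, so `scale` px per point is 72 * scale px per inch."""
    return 72 * scale


print(render_size(612, 792))  # US Letter page -> (1224, 1584)
print(effective_dpi())        # 144.0
```

Under that assumption the recognised and re-embedded image is about 144 pixels per inch of page, regardless of the resolution the page was originally scanned at.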

Non-Latin image text

Limited

Russian, Chinese, and Japanese can be recognised but cannot be encoded into the Helvetica (WinAnsi) text layer, so the searchable layer for those scripts may be empty. The tool is most reliable for Latin-script image PDFs.
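
The encoding limit can be illustrated with a short Python sketch. It uses Python's cp1252 codec as a stand-in for WinAnsi, which is an approximation, and it is not the tool's own logic; the function names are hypothetical. It shows why Latin-script text fits the layer while Cyrillic or CJK text does not.

```python
def winansi_encodable(text):
    """True if every character fits cp1252, used here as a stand-in for WinAnsi."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def unencodable_chars(text):
    """Distinct characters that would not fit, in first-seen order."""
    seen = []
    for ch in text:
        if ch not in seen and not winansi_encodable(ch):
            seen.append(ch)
    return seen


print(winansi_encodable("Total: 12,50 €"))  # True
print(winansi_encodable("Итого"))           # False
print(unencodable_chars("Tokyo 東京"))       # ['東', '京']
```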

Handwritten image content

Poor accuracy

Tesseract is designed for printed text; handwriting in an image is recognised unreliably. See the handwritten OCR guide.

First run pauses before processing

Expected

The selected language's ~10 MB training data downloads once before recognition starts. Cached afterward.

Mixed image + text PDF

Re-rasterised

OCR processes all pages, so existing text pages get rasterised into images during the pass. If only some pages are images, extract those pages, OCR them, and recombine with the untouched text pages.
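
For the extract, OCR, and recombine workflow it helps to turn the list of image pages into page ranges to extract. A minimal Python sketch, assuming you already know the 1-based numbers of the image-only pages (the helper name is illustrative):

```python
def contiguous_ranges(pages):
    """Group page numbers into inclusive (start, end) runs."""
    ranges = []
    for p in sorted(set(pages)):
        if ranges and p == ranges[-1][1] + 1:
            # Page continues the current run: extend its end.
            ranges[-1] = (ranges[-1][0], p)
        else:
            # Gap before this page: start a new run.
            ranges.append((p, p))
    return ranges


print(contiguous_ranges([3, 4, 5, 9, 11, 12]))  # [(3, 5), (9, 9), (11, 12)]
```

Each range can be extracted as one file, run through OCR, and merged back in its original position.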

Run outside a browser

Passthrough

OCR needs a DOM canvas; in a non-browser context the input buffer is returned unchanged. Run the tool in a browser instead.

Frequently asked questions

How do I know if a PDF is image-only?

Open it and try to select text — if nothing selects, it is image-only. Programmatically, run PDF to Plain Text: an empty result means there is no text layer and OCR is needed.

Does OCR work on a phone photo turned into a PDF?

Yes, if the photo is clear and well-lit. Convert the photo with Image to PDF first, then OCR. Glare, shadow, blur, and perspective skew lower accuracy, and the tool has no image-correction controls.

What does the text layer actually enable?

Search (Ctrl+F), text selection and copy, and screen-reader announcement of the recognised words. The visible page is unchanged — the text sits invisibly behind it at opacity 0.

Can I use this for accessibility compliance?

OCR provides the text a screen reader needs, which is a prerequisite for accessibility. Full WCAG 2.1 / PDF/UA compliance also requires heading-structure tags, alt text, and a defined reading order, which you add with a dedicated accessibility tool after OCR.

Which languages are supported?

Ten in the dropdown: English (eng), French (fra), German (deu), Spanish (spa), Italian (ita), Portuguese (por), Dutch (nld), Russian (rus), Chinese Simplified (chi_sim), and Japanese (jpn), with English as the default. Text in Latin-script languages is placed reliably in the searchable layer; Cyrillic, Chinese, and Japanese text may be recognised but cannot be encoded into the Helvetica text layer.

Will OCR change my image pages?

Visually, hardly at all; technically, yes. Each page is re-rendered at 2× and re-encoded as JPEG (quality 0.92), so the output image is a re-compression of the source, plus the invisible text layer. It is not the byte-for-byte original image.

Can I OCR only certain pages?

No — OCR processes every page. To target a subset, use PDF Extract Pages to pull those pages out, OCR the extract, then merge back if needed.

Why is the OCR'd file still large?

Image PDFs are inherently big, and OCR re-embeds each page as a JPEG. To shrink one for email, run Aggressive PDF Compression afterward — but it re-rasterises pages and removes the searchable text layer.

Is anything uploaded?

No. pdf.js, Tesseract.js, and pdf-lib run in your browser; the image pages and recognised text never leave your device. The only network call is the one-time language-data download.

How large an image PDF can I OCR?

Free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages, Enterprise unlimited. Use PDF Split by Range for oversized files.
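
The caps above can be checked before you start. The Python sketch below is illustrative only: it assumes decimal units (1 MB = 1,000,000 bytes, 2 GB = 2,000 MB), which the limits do not specify, and the function names are made up for this example.

```python
MB = 1_000_000  # assumption: decimal megabytes

# (tier name, max file size in bytes, max pages), smallest tier first
TIERS = [
    ("Free", 2 * MB, 50),
    ("Pro", 50 * MB, 500),
    ("Pro+Media", 500 * MB, 2_000),
    ("Developer", 2_000 * MB, 10_000),
]


def smallest_tier(size_bytes, pages):
    """Name of the smallest tier whose size and page caps both fit."""
    for name, max_bytes, max_pages in TIERS:
        if size_bytes <= max_bytes and pages <= max_pages:
            return name
    return "Enterprise"  # listed as unlimited


def page_chunks(pages, max_pages=50):
    """Inclusive 1-based page ranges of at most max_pages pages each."""
    return [(s, min(s + max_pages - 1, pages)) for s in range(1, pages + 1, max_pages)]


print(smallest_tier(22 * MB, 8))  # Pro
print(page_chunks(120))           # [(1, 50), (51, 100), (101, 120)]
```

Splitting by page count does not guarantee that each part is under the size cap, so check the size of each part after splitting.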

Can I convert the image PDF to Word after OCR?

Yes. OCR adds the text layer; then run PDF to Word (a .txt for Word) or PDF to Plain Text. Without OCR, those converters would produce empty output on an image PDF.

Can I automate OCR of image PDFs?

Yes — fetch the schema from GET /api/v1/tools/pdf-ocr, pair the @jadapps/runner, and POST the file with { "lang": "eng" } to 127.0.0.1:9789/v1/tools/pdf-ocr/run. The runner processes the image PDF locally on your machine; nothing is uploaded.
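
As a starting point for a script, the Python sketch below only prepares the two pieces named above: the run URL and the JSON options. The http scheme is an assumption, and how the file itself is attached to the POST is deliberately left out because it depends on the schema returned by GET /api/v1/tools/pdf-ocr; the names RUN_URL, LANGS, and ocr_options are illustrative.

```python
import json

# Assumption: the local runner is reached over plain http on this address.
RUN_URL = "http://127.0.0.1:9789/v1/tools/pdf-ocr/run"

# The ten language codes offered in the dropdown.
LANGS = ("eng", "fra", "deu", "spa", "ita", "por", "nld", "rus", "chi_sim", "jpn")


def ocr_options(lang="eng"):
    """Return the JSON options string, rejecting unsupported language codes."""
    if lang not in LANGS:
        raise ValueError(f"unsupported OCR language: {lang!r}")
    return json.dumps({"lang": lang})


print(RUN_URL)
print(ocr_options("deu"))  # {"lang": "deu"}
```

Validating the language code before the request avoids sending a run that the tool would reject or that would trigger an unexpected language-data download.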

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.

