How to OCR a handwritten PDF document
- Step 1: Set expectations and prep the scan. This is printed-text OCR applied to handwriting, so results range from partial to poor. Scan at 300 DPI or higher with strong contrast and minimal background noise to give it the best chance.
- Step 2: Drop the handwritten PDF into the OCR tool. Load the scan; recognition runs locally in your browser. Nothing is uploaded, which matters for personal or confidential notes. If your handwriting is a photo, turn it into a PDF first with Image to PDF.
- Step 3: Choose the writing's language. Select the language from the dropdown: English (eng), French (fra), German (deu), Spanish (spa), Italian (ita), Portuguese (por), Dutch (nld), Russian (rus), Chinese Simplified (chi_sim), or Japanese (jpn). English is the default. There is no handwriting-specific profile; the language only swaps the Tesseract model. The first use of a language downloads about 10 MB of model data, which is then cached.
- Step 4: Run OCR and download the searchable PDF. Each page is rendered, recognised, and rebuilt with an invisible text layer. The output is a searchable PDF, not a tidy transcript.
- Step 5: Extract the rough text. Run the result through PDF to Plain Text, or PDF to Markdown if you want page headings, to pull out whatever was recognised as editable text.
- Step 6: Proofread and correct every line. Compare the extracted text against the original handwriting and fix the (many) errors. For legal, medical, or official documents, full human transcription review is mandatory.
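The proofreading step can be supported by a small comparison script. The sketch below is not part of the tool: it assumes you already have the extracted OCR text and a hand-typed reference for the same page as plain strings, and it uses Python's standard `difflib` to score similarity and list the lines that differ. The function names and the sample strings are made up for illustration.

```python
import difflib


def similarity(ocr_text: str, reference: str) -> float:
    """Return a 0.0-1.0 similarity ratio between OCR output and a reference."""
    return difflib.SequenceMatcher(None, ocr_text, reference).ratio()


def lines_to_review(ocr_text: str, reference: str) -> list[tuple[int, str, str]]:
    """List (line_number, ocr_line, reference_line) for every line that differs.

    Lines are paired by position, so this assumes both texts use the same
    line breaks; a line missing on either side is reported as an empty string.
    """
    ocr_lines = ocr_text.splitlines()
    ref_lines = reference.splitlines()
    flagged = []
    for i in range(max(len(ocr_lines), len(ref_lines))):
        ocr_line = ocr_lines[i] if i < len(ocr_lines) else ""
        ref_line = ref_lines[i] if i < len(ref_lines) else ""
        if ocr_line != ref_line:
            flagged.append((i + 1, ocr_line, ref_line))
    return flagged


if __name__ == "__main__":
    # Illustrative strings only, not real tool output.
    ocr = "NAME: J SM1TH\nDATE: 03 / 06 / 2026"
    ref = "NAME: J SMITH\nDATE: 03 / 06 / 2026"
    print(f"similarity: {similarity(ocr, ref):.2f}")
    for number, got, expected in lines_to_review(ocr, ref):
        print(f"line {number}: OCR {got!r} vs reference {expected!r}")
```

A high similarity score only narrows down where to look; it does not replace reading every line against the original handwriting.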
Realistic handwriting OCR expectations
Tesseract is a printed-text engine; these are honest, qualitative outcomes, not guarantees.
| Writing style | Likely outcome | Recommended approach |
|---|---|---|
| Neat block capitals, high contrast | Partial recognition — usable draft with errors | OCR, then proofread thoroughly |
| Neat lowercase printing | Hit-and-miss recognition | OCR as a starting point, expect heavy edits |
| Cursive / joined writing | Largely fails | Manual transcription, or a dedicated HTR service |
| Mixed print + cursive form | Only the printed parts recognise reliably | OCR for printed labels, transcribe the handwriting |
| Faint pencil / low contrast | Poor | Rescan darker, or transcribe manually |
What this tool does and does NOT have
Avoid assuming features that are not in the tool.
| Feature | Present? | Detail |
|---|---|---|
| Handwriting / HTR mode | No | Single Tesseract pipeline; no handwriting profile to enable |
| OCR language selection | Yes | Ten languages, English default — swaps the recognition model only |
| Confidence score / review UI | No | Every recognised word is placed as-is; proofread externally |
| Clean transcript output | No | Output is a searchable PDF; extract text with PDF to Plain Text |
| Image deskew / enhancement | No | Improve the scan before uploading instead |
Cookbook
Honest, workable recipes for handwriting — including when to stop and transcribe manually.
Block-printed form labels: partial win
Neat block capitals on a form can recover enough to seed a draft. Always check the result.
form-scan.pdf -> pdf-ocr (eng) -> /pdf-tools/pdf-to-text -> NAME: J SMITH DATE: 03 / 06 / 2026 (recognised roughly; verify every field by eye)
Cursive notes: recognise the limit
Cursive typically fails. The honest move is to stop and transcribe manually rather than trust garbled output.
cursive-notes.pdf -> pdf-ocr -> /pdf-tools/pdf-to-text -> "Hu meebng ntu..." (unusable) -> transcribe by hand or use a dedicated HTR service
Mixed printed + handwritten form
Printed labels recognise; the handwritten entries usually do not. Use OCR for structure, transcribe the answers.
OCR output:
Patient Name: [garbled handwriting]
Date of Birth: [garbled handwriting]
-> printed labels recovered, fill in answers manually
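One way to act on "OCR for structure, transcribe the answers" is to keep only the printed labels and discard whatever text followed them. This is a minimal sketch, assuming the extracted text has one `Label: value` pair per line (real OCR output is often less tidy). `blank_form` is a hypothetical helper, not a feature of the tool, and the garbled values are invented.

```python
def blank_form(ocr_text: str) -> dict[str, str]:
    """Keep printed labels from 'Label: value' lines and drop the OCR'd values.

    The values are discarded on purpose: on a mixed form they come from
    handwriting and should be typed in by a person, not trusted from OCR.
    """
    form = {}
    for line in ocr_text.splitlines():
        label, sep, _value = line.partition(":")
        label = label.strip()
        if sep and label:  # skip lines with no colon or an empty label
            form[label] = ""
    return form


if __name__ == "__main__":
    # Invented example of garbled handwritten answers after printed labels.
    ocr_output = "Patient Name: Jhn Sm1h\nDate of Birth: O3/0b/l9B0"
    for label in blank_form(ocr_output):
        print(f"{label}: ____")
```

The resulting blank template gives the transcriber the form's structure without tempting anyone to keep a misread value.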
Give it the best possible input
Since the tool has no image enhancement, do the prep yourself before uploading.
Before OCR:
- scan at 300 DPI+
- maximise contrast (dark ink on white)
- straighten the page
- remove background lines/shading if you can
Then -> pdf-ocr -> proofread
Critical document: OCR as draft only
For legal/medical/official handwriting, OCR is a typing aid at best — the record is the verified human transcription.
field-log.pdf -> pdf-ocr -> rough draft -> human transcribes against original -> second reviewer verifies -> verified transcript is the record (not the OCR)
Edge cases and what actually happens
Expecting a handwriting-specific mode
Not available: There is no handwriting/HTR toggle or profile. The OCR pipeline is the same Tesseract printed-text path regardless of input; the only control is the language dropdown. Set expectations accordingly.
Cursive handwriting
Largely fails: Tesseract is not trained for joined cursive script and typically produces unusable output. Transcribe cursive manually, or use a dedicated handwriting-recognition (HTR) service designed for it.
Output text is mostly wrong
Expected for handwriting: Low accuracy on handwriting is the norm, not a bug. There is no confidence threshold to filter bad reads, so every recognised token is placed as-is. Proofread the extracted text against the original in full.
Non-Latin handwriting
Limited: On top of Tesseract's weak handwriting recognition, Cyrillic and CJK characters cannot be encoded into the Helvetica (WinAnsi) text layer, so those scripts will not be searchable even when partially recognised. Use a Unicode-capable HTR tool.
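The encoding limit can be checked on a sample string before running OCR on non-Latin material. The sketch below uses Python's `cp1252` codec as a stand-in for WinAnsi. The two are closely related, but this is an approximation and not the tool's own check; `fits_winansi` is a hypothetical name.

```python
def fits_winansi(text: str) -> bool:
    """Return True if every character can be encoded as Windows-1252.

    cp1252 is used here as an approximation of the PDF WinAnsi encoding.
    """
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["café", "Grüße", "Привет", "日本語"]:
        status = "fits" if fits_winansi(sample) else "does not fit"
        print(f"{sample!r}: {status}")
```

Western European accented text passes, while Cyrillic and CJK samples fail, which matches the behaviour described for Russian, Chinese Simplified, and Japanese.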
Faint pencil or low-contrast writing
Degraded: The tool has no contrast enhancement. Rescan with darker settings or transcribe by hand; OCR works only on the image it is handed.
Critical legal / medical document
Do not rely on OCR: Never use unreviewed handwriting OCR for legal, medical, or official records. Recognition errors in names, dosages, dates, or amounts carry real consequences, so require human transcription and a second-reviewer check.
Free-tier scan over the cap
Blocked: The Free tier allows 2 MB / 50 pages. Multi-page handwritten logs can exceed it; upgrade to Pro (50 MB / 500 pages) or split the file with PDF Split by Range.
First run downloads language data
Expected: The selected language's ~10 MB Tesseract model downloads once before recognition. It is cached afterward; this is unrelated to handwriting accuracy.
Output page slightly recompressed
Expected: As with all OCR here, each page is re-rendered at 2× and re-encoded as JPEG (quality 0.92), so the output image is a re-compression of the original scan plus the invisible text layer.
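As a worked example of the re-render, the arithmetic below assumes that "2×" means a scale factor of 2 applied to the page size in PDF points (72 points per inch), which is how a pdf.js viewport scale behaves. The text does not spell this out, so treat the numbers as illustrative; the function names are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF user-space unit


def rendered_size(width_pt: float, height_pt: float, scale: float = 2.0) -> tuple[int, int]:
    """Pixel size of a page rendered at the given scale (assumed meaning of '2x')."""
    return round(width_pt * scale), round(height_pt * scale)


def effective_ppi(scale: float = 2.0) -> float:
    """Pixels per inch implied by the scale, under the same assumption."""
    return POINTS_PER_INCH * scale


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(rendered_size(612, 792))  # (1224, 1584)
    print(effective_ppi())          # 144.0
```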
Run outside a browser
Passthrough: OCR needs a DOM canvas; in a non-browser context the buffer is returned unchanged. Use the live browser tool.
Frequently asked questions
Is there a handwriting recognition mode?
No. The tool runs Tesseract, a printed-text OCR engine, with a single pipeline and no handwriting/HTR profile. Selecting a language only swaps the recognition model; it does not switch to handwriting recognition. Expect the behaviour of a printed-text engine applied to handwriting.
What accuracy can I realistically expect on handwriting?
Highly variable and generally low. Neat block capitals at high contrast may give a partial, error-laden draft; lowercase printing is hit-and-miss; cursive largely fails. There is no confidence score to filter errors, so plan to proofread everything.
Should I trust handwriting OCR for important documents?
No. For legal, medical, official, or any consequential handwritten document, treat OCR output as a rough draft only. Misread names, numbers, dates, or dosages can cause real harm — require full human transcription and a second-reviewer check.
Does handwriting OCR work in all languages?
The dropdown offers English (eng), French (fra), German (deu), Spanish (spa), Italian (ita), Portuguese (por), Dutch (nld), Russian (rus), Chinese Simplified (chi_sim), and Japanese (jpn), but handwriting recognition is weak in every case because Tesseract is a printed-text engine. Additionally, Cyrillic and CJK cannot be written into the Helvetica text layer, so those scripts will not be searchable even if recognised.
Why is the output a PDF and not a transcript?
OCR always produces a searchable PDF with an invisible text layer. To get the rough recognised text as editable plain text, run the result through PDF to Plain Text — then proofread it against the original.
How can I improve handwriting recognition?
Improve the input, since the tool has no image enhancement: scan at 300 DPI+, maximise contrast (dark ink on white), straighten the page, and reduce background lines or shading. Then accept that cursive will still likely fail and budget time for manual correction.
What should I use instead for serious handwriting digitisation?
A dedicated handwritten text recognition (HTR) service, which uses models trained specifically on handwriting and supports Unicode output. This tool is best treated as a free, private first attempt on neat printing, not as a production HTR solution.
Are my handwritten documents uploaded?
No. pdf.js, Tesseract.js, and pdf-lib run in your browser, so personal notes, journals, and confidential records never leave your device. The only network call is the one-time language-data download.
Can it read a mixed printed-and-handwritten form?
It tends to recognise the printed labels reasonably and miss the handwritten answers. A practical workflow: OCR to recover the printed structure, then transcribe the handwritten entries manually.
Will OCR change my original scan?
It returns a new file; your original is untouched. The output page is a 2× re-render re-encoded as JPEG (quality 0.92) with the invisible text layer added — visually close to the scan but technically a re-compression.
How many pages of handwriting can I process?
By tier: Free 2 MB / 50 pages, Pro 50 MB / 500 pages, Pro+Media 500 MB / 2,000 pages, Developer 2 GB / 10,000 pages. Split long handwritten logs with PDF Split by Range if needed — though for cursive, manual transcription is usually faster than correcting OCR.
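The tier limits above can be turned into a quick pre-flight check. This is a sketch only: it assumes decimal megabytes (1 MB = 1,000,000 bytes) and reads "2 GB" as 2,000 MB, neither of which the text confirms, and `smallest_tier` is a hypothetical helper rather than anything the tool exposes.

```python
from typing import Optional

# Limits copied from the tier list: (name, max megabytes, max pages).
TIERS = [
    ("Free", 2, 50),
    ("Pro", 50, 500),
    ("Pro+Media", 500, 2_000),
    ("Developer", 2_000, 10_000),  # "2 GB" taken as 2,000 MB (assumption)
]

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def smallest_tier(size_bytes: int, pages: int) -> Optional[str]:
    """Return the first tier whose size and page caps both fit, else None."""
    for name, max_mb, max_pages in TIERS:
        if size_bytes <= max_mb * BYTES_PER_MB and pages <= max_pages:
            return name
    return None


if __name__ == "__main__":
    # A 1.5 MB log with 120 pages: the size fits Free, the page count does not.
    print(smallest_tier(1_500_000, 120))  # Pro
```

When the function returns a tier above the one you have, splitting the file with PDF Split by Range is the alternative to upgrading.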
Can I automate handwriting OCR?
You can script it via the local runner (GET /api/v1/tools/pdf-ocr for the schema; POST to 127.0.0.1:9789/v1/tools/pdf-ocr/run with { "lang": "eng" }), but given the low handwriting accuracy, automated output still needs human review before use. The runner keeps documents local to your machine.
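A minimal sketch of preparing that POST with Python's standard library follows. The address, path, and `{ "lang": "eng" }` body come from the answer above; the `http` scheme and the JSON content type are assumptions, and how the PDF itself is handed to the runner is not stated here, so the sketch only builds the request and leaves sending it commented out. `build_run_request` is a hypothetical name.

```python
import json
import urllib.request

# Address and path as given in the text; the "http" scheme is an assumption.
RUN_URL = "http://127.0.0.1:9789/v1/tools/pdf-ocr/run"


def build_run_request(lang: str = "eng") -> urllib.request.Request:
    """Build (but do not send) the POST request with the documented option.

    How the PDF is supplied to the runner is not stated in the text; check
    the tool schema before adding anything else to the payload.
    """
    body = json.dumps({"lang": lang}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request("eng")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it once the runner is up and the payload is complete:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Whatever the runner returns for handwriting should still go through the same human review as output from the browser tool.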
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.