How to read the fields in a PDF form
- Step 1: Open the form reader. Go to the PDF Form Field Extractor. It runs in your browser via pdf-lib; nothing is sent to a server.
- Step 2: Drop the filled PDF. Drag the form onto the dropzone. Whether it's blank or completed makes no difference to what's read: the tool maps the field structure either way. It runs automatically on drop; there's nothing to configure.
- Step 3: Scan the field list. The preview shows the first 20 fields as { name, type, value } objects, with the total field count below.
- Step 4: Download the JSON. Save the complete field list as a .json file to share with a developer or attach to a ticket describing the form.
- Step 5: Identify field types. Use the type key to tell text boxes from checkboxes, radios, dropdowns, and option lists. This determines how each field must be read or filled.
- Step 6: Pass the schema to whoever processes the form. Hand the JSON to your team as the definitive field inventory before they write fill, validation, or value-reading code.
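Because the download is a plain JSON array, whoever receives it can type it and sanity-check it before building on it. The following is a minimal sketch, assuming only the { name, type, value } shape described in the steps; `FieldEntry` and `parseFieldMap` are hypothetical names introduced here, not part of the tool.

```typescript
// Shape of one entry in the downloaded field list (as described in the steps).
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Hypothetical helper: parse the JSON text and reject anything that is not
// an array of { name, type, value } objects with string members.
function parseFieldMap(jsonText: string): FieldEntry[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of field entries");
  }
  return data.map((item: unknown, index: number): FieldEntry => {
    // Treat non-objects as an empty object so the checks below fail cleanly.
    const candidate = (typeof item === "object" && item !== null ? item : {}) as {
      name?: unknown;
      type?: unknown;
      value?: unknown;
    };
    const { name, type, value } = candidate;
    if (typeof name !== "string" || typeof type !== "string" || typeof value !== "string") {
      throw new Error("Entry " + index + " is not a { name, type, value } object");
    }
    return { name, type, value };
  });
}
```

The length of the returned array should match the total field count shown under the preview, which gives a quick check that the complete file was saved.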
What you get vs. what you don't
Being precise about scope avoids surprises. Verified against lib/pdf/pdfEngine.ts.
| Question about the form | Answered? | How |
|---|---|---|
| What fields does it have? | Yes | One JSON entry per interactive AcroForm field |
| What's each field's name? | Yes | name = pdf-lib getName() (fully qualified) |
| What type is each field? | Yes | type = pdf-lib class name (7 types) |
| What did the user type / select? | No | value is always "" — structure only, not content |
| Which fields are required? | No | Required flags are not reported by this tool |
Reading the seven field types
What each type value tells you about how the field behaves.
| type | What it is | How a value would be captured |
|---|---|---|
| PDFTextField | Free text entry | A string |
| PDFCheckBox | On/off toggle | Checked or unchecked (Boolean) |
| PDFRadioGroup | Single choice of several | The selected option's export value |
| PDFDropdown | Combo box | The chosen option (sometimes free text) |
| PDFOptionList | List box | One or more selected options |
| PDFButton | Action button (not data) | No captured value |
| PDFSignature | Signature placeholder | Signed/unsigned; verify cryptographically |
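The table above can be encoded as a lookup so downstream code can decide, per type, what kind of value to expect. This is only a sketch: the seven type strings come from the tool's output, while the `ValueKind` labels and the `valueKindFor` helper are illustrative names made up for this example.

```typescript
// The seven type strings the tool reports (pdf-lib class names).
type FieldType =
  | "PDFTextField"
  | "PDFCheckBox"
  | "PDFRadioGroup"
  | "PDFDropdown"
  | "PDFOptionList"
  | "PDFButton"
  | "PDFSignature";

// Hypothetical labels summarising the "how a value would be captured" column.
type ValueKind = "string" | "boolean" | "single-option" | "multi-option" | "none" | "signature";

const VALUE_KIND: Record<FieldType, ValueKind> = {
  PDFTextField: "string",
  PDFCheckBox: "boolean",
  PDFRadioGroup: "single-option",
  PDFDropdown: "single-option", // may also hold free text on editable combo boxes
  PDFOptionList: "multi-option",
  PDFButton: "none",
  PDFSignature: "signature", // signed/unsigned state, verified separately
};

// Returns undefined for any type string outside the seven listed above.
function valueKindFor(type: string): ValueKind | undefined {
  return Object.prototype.hasOwnProperty.call(VALUE_KIND, type)
    ? VALUE_KIND[type as FieldType]
    : undefined;
}
```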
Cookbook
Reading a real form's structure, step by step. Field names shown are illustrative — your form's actual names appear verbatim.
Inventory of a returned form
A vendor sends back a completed onboarding form. You read its structure to confirm it's the version your pipeline expects.
Output:
[
{ "name": "vendor_name", "type": "PDFTextField", "value": "" },
{ "name": "tax_id", "type": "PDFTextField", "value": "" },
{ "name": "w9_received", "type": "PDFCheckBox", "value": "" },
{ "name": "payment_terms","type": "PDFDropdown", "value": "" }
]
The value is "" for all: this confirms the FIELDS, not the answers.
Spotting a missing field after a form revision
The form was re-authored. Reading its fields tells you a field your code depends on is gone before that breaks anything downstream.
My code reads: vendor_name, tax_id, w9_received, payment_terms
Form now has: vendor_name, tax_id, w9_received
'payment_terms' is missing → the new form dropped it.
Fix the reader before it silently produces no payment data.
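The comparison above can be automated as a small check run against the downloaded JSON. This is a sketch under the assumption that your code keeps an explicit list of the field names it reads; `missingFields` is a hypothetical helper and the sample data reuses the illustrative names from this recipe.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Hypothetical helper: names the reader code depends on that the form no longer has.
function missingFields(expected: string[], fieldMap: FieldEntry[]): string[] {
  const present = fieldMap.map((f) => f.name);
  return expected.filter((name) => present.indexOf(name) === -1);
}

// Field names the pipeline reads (illustrative).
const expected = ["vendor_name", "tax_id", "w9_received", "payment_terms"];

// Field map of the re-authored form (illustrative).
const revisedForm: FieldEntry[] = [
  { name: "vendor_name", type: "PDFTextField", value: "" },
  { name: "tax_id", type: "PDFTextField", value: "" },
  { name: "w9_received", type: "PDFCheckBox", value: "" },
];

const missing = missingFields(expected, revisedForm); // ["payment_terms"]
```

Failing a build or intake step when the result is non-empty surfaces the revision before any payment data goes missing.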
Distinguishing a radio group from checkboxes
Visually a set of options can look like either. The type tells you how to read it — a radio group is one field, separate checkboxes are several.
Single-choice (radio):
{ "name": "shipping", "type": "PDFRadioGroup", "value": "" }
→ ONE field; reads the selected option.
Multi-select (separate checkboxes):
{ "name": "extras.gift", "type": "PDFCheckBox", "value": "" }
{ "name": "extras.giftwrap", "type": "PDFCheckBox", "value": "" }
→ SEVERAL fields; read each independently.
Filtering out non-data fields
Buttons and signature placeholders aren't data. Drop them by type so your value-reading step only touches real inputs.
Keep: PDFTextField, PDFCheckBox, PDFRadioGroup,
PDFDropdown, PDFOptionList
Drop: PDFButton (submit/reset), PDFSignature (verify separately)
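The keep/drop split above could look like this in code. It is a sketch that assumes the field map has already been parsed into an array; `NON_DATA_TYPES` and `dataFieldsOnly` are hypothetical names.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Types that carry no form data: buttons and signature placeholders.
const NON_DATA_TYPES = ["PDFButton", "PDFSignature"];

// Hypothetical helper: keep only entries whose type is a real input.
function dataFieldsOnly(fieldMap: FieldEntry[]): FieldEntry[] {
  return fieldMap.filter((f) => NON_DATA_TYPES.indexOf(f.type) === -1);
}
```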
The field map's 'type' key makes this filter a one-liner.
Confirming the form is interactive at all
If you're not sure a PDF is a real fillable form, the field reader settles it instantly.
Output:
[]
Empty array = no interactive fields. The PDF is either flat (printed for handwriting), flattened, or pure-XFA. For a scanned form, run it through PDF OCR instead.
Edge cases and what actually happens
It does not return the entered values
By design: This reads the form's field structure (names and types), not the data a user typed. Every value is an empty string, even on a completed form. Use the field map as the key set for a separate value-reading step.
Handwritten form filled on paper then scanned
No fields: A printed-and-scanned form is an image with no interactive fields, so the reader returns an empty array. Use PDF OCR to recognise the text in the scan, then map the recognised regions to fields yourself.
Form was flattened before you received it
No fields: Flattening converts interactive fields to static page content (see PDF Flatten). The values are visible but no longer live, so there are no fields to read and the array comes back empty. Extract the visible text with PDF to Text.
Pure XFA dynamic form
AcroForm only: pdf-lib reads the AcroForm dictionary; it does not parse the XFA XML layer. A pure-XFA LiveCycle form may report few or no fields. If the form ships an AcroForm fallback, those fields still appear.
Required vs optional is not reported
Not reported: The output is name + type + empty value only. It does not surface the field's 'required' flag, read-only state, or default value. To audit completeness, see the companion guide on auditing form completion.
Empty array means no AcroForm
Empty array: An empty result is a valid answer, not an error. It tells you that the document has no interactive AcroForm fields, so this PDF isn't fillable in the interactive sense this tool can read.
Permission-restricted but readable
Supported: Forms that only restrict editing (no open password) load fine because the engine ignores encryption for parsing. A PDF requiring a password to open may fail; remove the password with PDF Unlock first.
Signature fields show as PDFSignature
Expected: A signature field appears as PDFSignature in the map, but the reader does not tell you whether it's actually signed or whether the signature is valid. To verify a digital signature, use PDF Signature Verify.
One file at a time
Single file: The reader processes a single PDF per run; there's no multi-file batch in the UI. To inventory several forms, run them one after another and combine the JSON arrays in your own script.
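A combining script can be very small. The sketch below assumes each downloaded array has already been parsed and is keyed by a label you choose (for example the file name); `LabeledEntry` and `combineInventories` are hypothetical names, and the `source` property is an addition of this example, not something the tool emits.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// A field entry tagged with the form it came from.
interface LabeledEntry extends FieldEntry {
  source: string;
}

// Hypothetical helper: merge per-form field lists into one inventory,
// keeping track of which form each entry belongs to.
function combineInventories(forms: Record<string, FieldEntry[]>): LabeledEntry[] {
  const all: LabeledEntry[] = [];
  Object.keys(forms).forEach((source) => {
    forms[source].forEach((f) => all.push({ ...f, source }));
  });
  return all;
}
```

Tagging each entry with its source keeps identically named fields from different forms distinguishable in the combined list.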
Frequently asked questions
Does this read the actual answers in a filled form?
No — and this is the most important thing to understand. It reads the form's field structure: every interactive field's name and type. The value for each field is returned as an empty string, even on a fully completed form. Think of it as reading the form's blueprint, not the filled-in copy. The blueprint is what you need to know which fields exist and what type they are before capturing the answers in a separate step.
Do I need Adobe Acrobat?
No. Everything runs in your browser using pdf-lib. There's no software to install, no account required for the free tier, and the PDF never leaves your device.
What information do I actually get per field?
Three things per field: the fully-qualified name, the type (one of seven pdf-lib classes), and a value that is always an empty string. That's enough to inventory the form, classify each field, and produce the key list for a fill or value-reading pipeline.
Will it show empty and hidden fields?
Yes. Because it enumerates the form's interactive fields rather than reading visible answers, every interactive AcroForm field appears — including empty ones and fields that aren't currently shown. You see the full schema, not just what's visible on screen.
Why does my completed form return an empty array?
An empty array means there are no interactive AcroForm fields. The form may have been flattened (values baked into the page), printed and scanned (an image with no fields), or built as a pure-XFA dynamic form. For scanned forms, use PDF OCR to read the text.
Can I read several filled forms at once?
Not in a single run — the tool handles one file at a time. Process each form individually and concatenate the resulting JSON arrays in your own script if you need a combined inventory.
How do I tell a checkbox from a radio button in the output?
By the type key. Independent toggles are PDFCheckBox (you'll see one per checkbox). A single-choice set is a single PDFRadioGroup entry covering all the buttons in that group. This distinction matters because they're read and filled differently.
Does it tell me which fields are required?
No. The output is name, type, and an empty value. It does not report the required flag, read-only status, or default values. If you need to audit which fields must be completed, see the audit PDF form completion guide for the realistic approach.
What about signature fields?
A signature field is listed as PDFSignature. This tool doesn't tell you whether it's signed or verify the signature. To check a digital signature cryptographically, use PDF Signature Verify.
Can I export the field list as CSV?
No — the only output is a JSON array, downloaded as a .json file. Convert it to CSV downstream if you need a spreadsheet.
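A downstream conversion might look like the following sketch. It assumes the three-key entry shape and quotes every cell in the RFC 4180 style so that field names containing commas or quotes survive; `csvCell` and `fieldMapToCsv` are hypothetical helpers, not features of the tool.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Quote a cell: wrap it in double quotes and double any embedded quotes.
function csvCell(text: string): string {
  return '"' + text.replace(/"/g, '""') + '"';
}

// Hypothetical helper: one header row, then one row per field.
function fieldMapToCsv(fieldMap: FieldEntry[]): string {
  const header = "name,type,value";
  const rows = fieldMap.map((f) => [f.name, f.type, f.value].map(csvCell).join(","));
  return [header].concat(rows).join("\r\n");
}
```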
Is my filled form uploaded?
No. Parsing runs entirely in your browser via pdf-lib; the file is never transmitted. The result panel confirms '0 bytes uploaded'. Only an anonymous usage counter is recorded when you're signed in.
Once I know the fields, how do I get the values?
Use the field map as the key set for a value-reading step. In your own code you can load the PDF with pdf-lib and call each field's value getter (getText() for text fields, isChecked() for checkboxes, and so on), keyed by the exact names this tool gave you — which prevents the silent mismatches that come from guessing field names.
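One way to structure that value-reading step is to dispatch on the type key from the field map. The sketch below is written against a small structural interface, `FormLike`, so it runs without pdf-lib installed; the accessor and getter names are intended to mirror pdf-lib's form API (getTextField, getCheckBox, getRadioGroup, getDropdown, getOptionList, getText, isChecked, getSelected), but check them against the pdf-lib version you use. `FormLike`, `FieldValue`, and `readValues` are hypothetical names.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Minimal structural view of a form object (assumed to match pdf-lib's PDFForm).
interface FormLike {
  getTextField(name: string): { getText(): string | undefined };
  getCheckBox(name: string): { isChecked(): boolean };
  getRadioGroup(name: string): { getSelected(): string | undefined };
  getDropdown(name: string): { getSelected(): string[] };
  getOptionList(name: string): { getSelected(): string[] };
}

type FieldValue = string | boolean | string[] | undefined;

// Hypothetical helper: read every data field named in the field map,
// using the getter that matches its reported type.
function readValues(form: FormLike, fieldMap: FieldEntry[]): Record<string, FieldValue> {
  const values: Record<string, FieldValue> = {};
  fieldMap.forEach((f) => {
    switch (f.type) {
      case "PDFTextField":
        values[f.name] = form.getTextField(f.name).getText();
        break;
      case "PDFCheckBox":
        values[f.name] = form.getCheckBox(f.name).isChecked();
        break;
      case "PDFRadioGroup":
        values[f.name] = form.getRadioGroup(f.name).getSelected();
        break;
      case "PDFDropdown":
        values[f.name] = form.getDropdown(f.name).getSelected();
        break;
      case "PDFOptionList":
        values[f.name] = form.getOptionList(f.name).getSelected();
        break;
      default:
        break; // PDFButton and PDFSignature carry no data value to read
    }
  });
  return values;
}
```

With pdf-lib installed, the `form` argument would typically come from loading the document with PDFDocument.load and calling getForm() on the result.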
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.