How to extract a PDF form's field map to JSON
- Step 1: Open the field extractor. Go to the PDF Form Field Extractor. Everything runs locally in your browser via pdf-lib; the PDF is never uploaded.
- Step 2: Drop the PDF form. Drag the form (blank or filled; it makes no difference to the field map) onto the dropzone. There are no options to configure: the tool reads the form and runs automatically as soon as the file is added.
- Step 3: Review the field list preview. The result panel shows the first 20 entries as `{ "name", "type", "value" }` objects, with a count of the total number of fields detected below the preview.
- Step 4: Download the JSON. Click Download to save the complete array as a `.json` file named after your PDF. Every field is included, in the order pdf-lib enumerates them.
- Step 5: Map names to your schema. The `name` keys are the PDF's internal field names, often machine-generated. Build a lookup that maps them to your database columns or API fields. Keep this map in source control alongside your fill/validation code.
- Step 6: Feed it into your fill or validation pipeline. Use the field list to generate a pre-fill template, or as the authoritative key set when you later read submitted values with a dedicated value-reading library.
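The name-to-schema lookup from Step 5 could be sketched roughly as follows. The column names and the `FIELD_TO_COLUMN` and `mapFields` identifiers are hypothetical; only the `name`/`type`/`value` entry shape comes from the tool's output.

```typescript
// Shape of one entry in the extractor's JSON output.
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Hypothetical lookup: PDF field name -> column in your own schema.
const FIELD_TO_COLUMN: Record<string, string> = {
  "topmostSubform[0].Page1[0].first[0]": "first_name",
  "topmostSubform[0].Page1[0].dob[0]": "date_of_birth",
};

// Split the extracted fields into those with a known column and those
// that still need a mapping decision.
function mapFields(
  fields: FieldEntry[],
  lookup: Record<string, string>
): { mapped: Record<string, string>; unmapped: string[] } {
  const mapped: Record<string, string> = {};
  const unmapped: string[] = [];
  for (const field of fields) {
    const column = Object.prototype.hasOwnProperty.call(lookup, field.name)
      ? lookup[field.name]
      : undefined;
    if (typeof column === "string") {
      mapped[field.name] = column;
    } else {
      unmapped.push(field.name);
    }
  }
  return { mapped, unmapped };
}
```

Reviewing the `unmapped` list after each form revision is one way to notice new or renamed fields before they reach your pipeline.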
What each JSON field means
Every entry in the output array has exactly three keys. Verified against the engine in lib/pdf/pdfEngine.ts.
| Key | What it holds | Notes |
|---|---|---|
| name | The field's fully-qualified PDF name, e.g. applicant.email or topmostSubform[0].Page1[0].dob[0] | Comes from pdf-lib's field.getName(). Hierarchical fields are dotted; array-style names keep their [0] indices. |
| type | The pdf-lib class name for the field: PDFTextField, PDFCheckBox, PDFRadioGroup, PDFDropdown, PDFOptionList, PDFButton, or PDFSignature | Comes from field.constructor.name. This is the field's structural type, not a friendly label. |
| value | Always an empty string ("") | This tool maps structure, not content. It does not read the typed-in value; see the FAQ on reading values. |
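If you load the downloaded file in your own code, the three-key shape can be checked at the boundary. This is a minimal sketch; `isFieldEntry` and `parseFieldMap` are illustrative names, not part of the tool.

```typescript
// One entry of the extractor output: exactly three string-valued keys.
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Type guard: true only for objects with exactly name, type and value strings.
function isFieldEntry(item: unknown): item is FieldEntry {
  if (typeof item !== "object" || item === null) return false;
  const record = item as Record<string, unknown>;
  return (
    typeof record["name"] === "string" &&
    typeof record["type"] === "string" &&
    typeof record["value"] === "string" &&
    Object.keys(record).length === 3
  );
}

// Parse the downloaded JSON text and reject anything that is not a field map.
function parseFieldMap(json: string): FieldEntry[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of field entries");
  }
  const entries: FieldEntry[] = [];
  for (const item of parsed) {
    if (!isFieldEntry(item)) {
      throw new Error("Entry is missing name, type or value");
    }
    entries.push(item);
  }
  return entries;
}
```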
PDF field types you'll see in the `type` key
The seven pdf-lib field classes and what each represents in a real form.
| type value | Form control | Typical use |
|---|---|---|
| PDFTextField | Single- or multi-line text box | Names, addresses, dates typed as text, comments |
| PDFCheckBox | Independent on/off checkbox | "I agree", opt-ins, yes/no toggles |
| PDFRadioGroup | Mutually-exclusive radio button set | Single-choice questions (one option of several) |
| PDFDropdown | Combo box (pick one, sometimes editable) | Country, state, category pickers |
| PDFOptionList | Scrollable list box (single or multi-select) | Long option lists, multi-select selections |
| PDFButton | Push button (submit, reset, JavaScript action) | Form-action buttons; usually not data fields |
| PDFSignature | Digital signature field | Placeholder for a cryptographic signature |
Limits and behaviour
Real numbers from lib/tier-limits.ts and the tool client.
| Aspect | Free | Pro |
|---|---|---|
| Max file size | 2 MB | 50 MB |
| Max pages | 50 | 500 |
| Files per run | 1 | 1 (this tool processes a single file) |
| Output format | JSON only | JSON only |
| Options to configure | None — runs on drop | None — runs on drop |
Cookbook
What the JSON actually looks like for common form shapes. Field names are illustrative; your form's real names appear verbatim in the output.
A simple application form
A flat AcroForm with text fields, a checkbox, and a dropdown. Note that every value is an empty string — the array describes the form's fields, not anyone's answers.
Output (extract-pdf-form-fields-to-json):
[
{ "name": "first_name", "type": "PDFTextField", "value": "" },
{ "name": "last_name", "type": "PDFTextField", "value": "" },
{ "name": "email", "type": "PDFTextField", "value": "" },
{ "name": "country", "type": "PDFDropdown", "value": "" },
{ "name": "agree", "type": "PDFCheckBox", "value": "" }
]
Hierarchical / nested field names
Forms authored in Acrobat or LiveCycle often use dotted, fully-qualified names. The tool reports them exactly as pdf-lib's getName() returns them.
[
{ "name": "topmostSubform[0].Page1[0].first[0]", "type": "PDFTextField", "value": "" },
{ "name": "topmostSubform[0].Page1[0].dob[0]", "type": "PDFTextField", "value": "" },
{ "name": "topmostSubform[0].Page1[0].sex[0]", "type": "PDFRadioGroup", "value": "" }
]
Use these full strings as the exact keys when you fill the form.
Turning the map into a fill template
Once you have the names and types, generate a key/value scaffold your team can fill in or feed to a fill library. The extractor gives you the left column; you supply the right.
From the JSON, build a template (pseudocode):
first_name -> ""
last_name -> ""
email -> ""
country -> "" (one of the dropdown options)
agree -> false (PDFCheckBox)
The extractor confirms the names exist and their types, so your fill code won't fail on a typo'd field name.
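A runnable version of that template step might look like the sketch below. The defaults are assumptions chosen for illustration: `false` for checkboxes, an empty array for option lists, an empty string for everything else, and no entry at all for buttons and signatures.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

type TemplateValue = string | boolean | string[];

// Build an empty key/value scaffold from the extracted field map.
function buildFillTemplate(fields: FieldEntry[]): Record<string, TemplateValue> {
  const template: Record<string, TemplateValue> = {};
  for (const field of fields) {
    switch (field.type) {
      case "PDFCheckBox":
        template[field.name] = false; // unchecked by default
        break;
      case "PDFOptionList":
        template[field.name] = []; // list boxes may be multi-select
        break;
      case "PDFButton":
      case "PDFSignature":
        break; // no fillable data, so leave these out
      default:
        template[field.name] = ""; // text fields, dropdowns, radio groups
    }
  }
  return template;
}
```

For the simple application form above, this yields empty strings for `first_name`, `last_name`, `email` and `country`, and `false` for `agree`.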
Counting fields to gauge complexity
The result panel shows the total count below the preview. A quick way to size up an unfamiliar form before you commit to automating it.
Preview shows first 20 entries, then:
... (137 total items)
137 fields means this form is non-trivial: budget time to map names to your schema and to handle radio groups and option lists, which need their option labels handled separately.
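Beyond the total, a per-type breakdown of the downloaded JSON gives a rough sense of where the effort will go. A minimal sketch, with `countByType` as an illustrative helper name:

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Tally how many fields of each pdf-lib type the form contains.
function countByType(fields: FieldEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const field of fields) {
    const current = Object.prototype.hasOwnProperty.call(counts, field.type)
      ? counts[field.type]
      : undefined;
    counts[field.type] = typeof current === "number" ? current + 1 : 1;
  }
  return counts;
}
```

A form dominated by `PDFTextField` entries is usually simpler to automate than one with many `PDFRadioGroup` or `PDFOptionList` entries, since those need their option labels sourced separately.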
Validating field names against your code
Diff the extracted names against the keys your fill script expects. Mismatches (renamed or removed fields after a form revision) surface immediately.
Expected by my script: first_name, last_name, email, phone
Extracted from new form: first_name, last_name, email
Missing: phone
→ The form was re-authored and dropped 'phone'. Update the script before it silently skips that data.
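The diff can be automated in a few lines, for example as a check in CI. In this sketch `diffFieldNames` is an illustrative name, and it reports both directions: names your script expects but the form lacks, and names the form has that your script does not know about.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Compare the keys a fill script expects with the names found in the form.
function diffFieldNames(
  expected: string[],
  fields: FieldEntry[]
): { missing: string[]; unexpected: string[] } {
  const extracted = fields.map((field) => field.name);
  return {
    // Expected by the script but absent from the form.
    missing: expected.filter((name) => extracted.indexOf(name) === -1),
    // Present in the form but unknown to the script.
    unexpected: extracted.filter((name) => expected.indexOf(name) === -1),
  };
}
```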
Edge cases and what actually happens
The `value` field is always empty
By design: This tool extracts the form's structure (names and types), not the data typed into it. Every entry's value is an empty string, even for a fully completed form. To read submitted answers you need a value-reading step; the field map gives you the canonical keys to read against.
PDF has no interactive form
Empty array: If the document has no AcroForm, the tool returns an empty array []. A form that's just lines and boxes printed on the page (a 'flat' form meant for handwriting) has no interactive fields to enumerate; use PDF OCR to read text off such a document instead.
XFA (LiveCycle / dynamic) forms
AcroForm only: The extractor reads the AcroForm dictionary via pdf-lib. Pure XFA forms (dynamic PDFs from Adobe LiveCycle) store their fields in an XML layer pdf-lib does not parse, so they may report few or no fields. Many XFA forms ship an AcroForm fallback layer, and those still extract. For pure-XFA forms, use an XFA-aware desktop tool.
Flattened form
Empty array: If a form has been flattened (interactive fields baked into the page; see PDF Flatten), the interactive fields no longer exist, so the extractor returns an empty array. The visible text is still on the page; extract it with PDF to Text.
Fully-qualified / nested names look cryptic
Expected: Names like topmostSubform[0].Page1[0].field[0] are the form's real internal names, returned verbatim from pdf-lib. They are correct keys for automation even if they aren't human-readable. Build a name-to-label map in your own code.
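One way to seed such a name-to-label map is to take the last dotted segment of each name and strip its trailing index. This is only a starting point under the assumption that the leaf segment is meaningful: leaf names can repeat across pages or subforms, so always keep the full name as the key. `leafName` and `suggestLabels` are illustrative names.

```typescript
// "topmostSubform[0].Page1[0].dob[0]" -> "dob"
function leafName(qualifiedName: string): string {
  // Take everything after the last dot (or the whole name if there is none).
  const last = qualifiedName.slice(qualifiedName.lastIndexOf(".") + 1);
  // Drop a trailing array-style index such as "[0]".
  return last.replace(/\[\d+\]$/, "");
}

// Build a draft map from full field name to a suggested label.
function suggestLabels(names: string[]): Record<string, string> {
  const labels: Record<string, string> = {};
  for (const name of names) {
    labels[name] = leafName(name);
  }
  return labels;
}
```

Treat the generated labels as a draft to edit by hand before showing them to users.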
File exceeds the size or page limit
Rejected: Free tier caps input at 2 MB and 50 pages; Pro raises this to 50 MB and 500 pages. A form larger than your tier's limit is rejected before processing. Form PDFs are usually small, so this rarely bites, but a scanned-image-heavy form can be large.
Encrypted / password-restricted PDF
Often supported: The engine loads with ignoreEncryption, so forms that merely set permission restrictions (no open password) typically still yield their field map. A PDF that requires a password just to open may fail to parse; remove the password first with PDF Unlock.
Push buttons appear in the output
Expected: Submit/reset/JavaScript buttons are real form fields and show up as PDFButton. They carry no data, so filter them out when building a data schema: key off the type to drop PDFButton (and usually PDFSignature) entries.
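Filtering on `type` takes only a few lines. In this sketch the exclusion list is an assumption you may want to adjust, for example to keep `PDFSignature` entries if your workflow tracks signatures.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Field types that normally carry no user-entered data.
const NON_DATA_TYPES: string[] = ["PDFButton", "PDFSignature"];

// Keep only the entries that belong in a data schema.
function dataFieldsOnly(fields: FieldEntry[]): FieldEntry[] {
  return fields.filter((field) => NON_DATA_TYPES.indexOf(field.type) === -1);
}
```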
Duplicate-looking names
Expected: Radio groups expose a single field name for the whole group even though there are several physical buttons. You won't see one entry per button; you'll see one PDFRadioGroup entry. The individual option labels are not part of this output.
Need CSV instead of JSON
JSON only: This tool outputs a JSON array only; there is no CSV export option in the UI. If you need a spreadsheet, convert the JSON downstream, or for tabular content elsewhere in the PDF use PDF Table to JSON or PDF to Excel.
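A downstream conversion can be small. The sketch below quotes every cell and doubles embedded quotes, which is the common CSV convention; `fieldMapToCsv` is an illustrative name and not a feature of the tool.

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Quote a cell and escape embedded double quotes by doubling them.
function csvCell(text: string): string {
  return '"' + text.replace(/"/g, '""') + '"';
}

// Convert the field map to CSV text with a header row.
function fieldMapToCsv(fields: FieldEntry[]): string {
  const rows: string[][] = [["name", "type", "value"]];
  for (const field of fields) {
    rows.push([field.name, field.type, field.value]);
  }
  return rows.map((row) => row.map(csvCell).join(",")).join("\r\n");
}
```

Quoting every cell keeps field names that contain commas or brackets intact when the file is opened in a spreadsheet.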
Frequently asked questions
Does this extract the values someone typed into the form?
No. This tool extracts the form's field map — each field's name and type — and returns the value as an empty string for every field. It tells you the structure of the form (what fields exist and what kind they are), not the data entered. That structure is exactly what you need to build pre-fill payloads, validate against a schema, or know the canonical keys before reading values with a dedicated value-reading library.
What does the JSON output look like?
A JSON array. Each element is an object with three keys: name (the field's fully-qualified PDF name), type (one of the seven pdf-lib field classes — PDFTextField, PDFCheckBox, PDFRadioGroup, PDFDropdown, PDFOptionList, PDFButton, PDFSignature), and value (always ""). It downloads as a .json file named after your PDF.
Why are the field names so cryptic?
The name is the form's internal field name, returned exactly as pdf-lib reports it. Acrobat- and LiveCycle-authored forms often use hierarchical, fully-qualified names like topmostSubform[0].Page1[0].first[0]. These are the correct keys to use when filling the form programmatically. To make them human-readable, build your own map from PDF field name to display label.
Does it work with Adobe XFA forms?
Standard AcroForms extract reliably. Pure XFA forms (dynamic PDFs from Adobe LiveCycle) keep their fields in an XML layer that pdf-lib does not parse, so they may report few or no fields. Many XFA forms include an AcroForm fallback that still extracts. For pure-XFA, use an XFA-aware desktop tool.
Are there any options to configure?
No. The tool runs automatically the moment you drop a file — there are no settings, formats, or toggles. It reads the form and produces the JSON field map. This keeps it fast and deterministic.
Can I export to CSV instead?
Not from this tool — the only output is a JSON array. If you need tabular data, convert the JSON downstream in a script or spreadsheet. For genuinely tabular content elsewhere in the PDF, see PDF Table to JSON or PDF to Excel.
How are checkboxes represented?
A checkbox appears as one entry with type: "PDFCheckBox". Because this tool maps structure rather than reading values, the value is an empty string — it does not report whether the box is checked. When you later read values, normalise checkbox state to a Boolean in your own code.
Why does my form return an empty array?
An empty array means no interactive AcroForm fields were found. Common causes: the form was flattened (fields baked into the page — see PDF Flatten), it's a pure-XFA form, or it's a printed/scanned form with no interactive layer. For scanned forms, use PDF OCR to read the text instead.
Will it handle a password-protected form?
The engine loads with encryption ignored, so permission-restricted forms (no open password) usually still yield their field map. A PDF that needs a password just to open may fail to parse — clear the password first with PDF Unlock, then run the extractor.
What are the file size and page limits?
Free tier accepts up to 2 MB and 50 pages per file; Pro raises this to 50 MB and 500 pages. Form PDFs are usually small, so the free limit covers the vast majority of real forms.
Is my form uploaded anywhere?
No. Parsing happens entirely in your browser via pdf-lib. The file never leaves your device; the result panel notes '0 bytes uploaded'. Only an anonymous usage counter is recorded when you're signed in.
How do I get the data once I have the field map?
The field map gives you the authoritative list of names and types. To capture submitted values, pair it with a value-reading step (for example, pdf-lib's own getText() / isChecked() in your own script, or a desktop form-data export). The map ensures you read against the exact field names the form actually uses, avoiding silent mismatches.
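A value-reading step driven by the field map could look roughly like this. `FormLike` is a hypothetical minimal interface written for this sketch; it lists only the methods the code calls, under the assumption that the form object returned by pdf-lib's `getForm()` provides them with these shapes (`getTextField(name).getText()`, `getCheckBox(name).isChecked()`, and `getSelected()` on dropdowns, radio groups and option lists).

```typescript
interface FieldEntry {
  name: string;
  type: string;
  value: string;
}

// Hypothetical structural view of the form methods this sketch relies on.
interface FormLike {
  getTextField(name: string): { getText(): string | undefined };
  getCheckBox(name: string): { isChecked(): boolean };
  getDropdown(name: string): { getSelected(): string[] };
  getRadioGroup(name: string): { getSelected(): string | undefined };
  getOptionList(name: string): { getSelected(): string[] };
}

type FieldValue = string | boolean | string[] | undefined;

// Read each data field's current value, keyed by the exact extracted name.
function readValues(form: FormLike, fields: FieldEntry[]): Record<string, FieldValue> {
  const values: Record<string, FieldValue> = {};
  for (const field of fields) {
    switch (field.type) {
      case "PDFTextField":
        values[field.name] = form.getTextField(field.name).getText();
        break;
      case "PDFCheckBox":
        values[field.name] = form.getCheckBox(field.name).isChecked();
        break;
      case "PDFDropdown":
        values[field.name] = form.getDropdown(field.name).getSelected();
        break;
      case "PDFRadioGroup":
        values[field.name] = form.getRadioGroup(field.name).getSelected();
        break;
      case "PDFOptionList":
        values[field.name] = form.getOptionList(field.name).getSelected();
        break;
      default:
        break; // PDFButton and PDFSignature carry no data to read
    }
  }
  return values;
}
```

In your own script you would load the submitted PDF with pdf-lib, call `getForm()` on the document, and pass the result together with the downloaded field map to `readValues`.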
Privacy first
All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.