Map PDF Survey Fields for Analysis — Free Online Tool

How to map a PDF survey's questions and field types

Step 1
Open the field extractor — Go to the PDF Form Field Extractor. It runs in your browser; survey data is never uploaded.
Step 2
Drop one copy of the survey — Any copy works — the schema is identical across respondents. The tool runs automatically; there are no options to set.
Step 3
Read the question schema — The preview lists each field as { name, type, value } with the total count below. This is your variable list.
Step 4
Download the JSON schema — Save the field map as your survey's data dictionary. Keep it with your analysis scripts.
Step 5
Classify each question by type — Use the type to decide how to code each variable: numeric for Likert radio groups, categorical for single-choice, multi-response for option lists, free text for comments.
Step 6
Capture answers and load into your tool — Read each respondent's values in a separate step, build a tidy dataset keyed by the schema's field names, then analyse in Excel, R, or pandas.

Survey question type → analysis coding

How each pdf-lib field type maps to a survey variable. The schema gives you type; you decide the coding.

Field type	Typical survey question	Suggested coding
`PDFRadioGroup`	Likert / single-choice (e.g. 1–5 satisfaction)	Ordinal or numeric (map labels to scores)
`PDFDropdown`	Pick-one from a list (e.g. age band)	Categorical
`PDFOptionList`	Select-all-that-apply	Multi-response (one indicator per option)
`PDFCheckBox`	Yes/no or single opt-in	Binary (0/1)
`PDFTextField`	Open comment / short answer	Free text (qualitative coding)
`PDFButton`	Submit/reset	Exclude
`PDFSignature`	Consent signature	Exclude from analysis (verify separately)
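
As a rough illustration, the table above can be applied to a downloaded schema with a small lookup. This is a minimal sketch, not part of the tool: the coding labels, the `build_codebook` helper, and the `consent` field name are assumptions made for the example, and unknown types are flagged for review rather than guessed.

```python
import json

# Assumed coding labels, taken from the table above; adjust to your codebook.
CODING_BY_TYPE = {
    "PDFRadioGroup": "ordinal",
    "PDFDropdown": "categorical",
    "PDFOptionList": "multi-response",
    "PDFCheckBox": "binary",
    "PDFTextField": "free text",
    "PDFButton": "exclude",
    "PDFSignature": "exclude",
}


def build_codebook(schema):
    """Return {field name: suggested coding} for every field in the schema."""
    # Types missing from the lookup are marked "review" instead of guessed.
    return {f["name"]: CODING_BY_TYPE.get(f["type"], "review") for f in schema}


# Illustrative schema in the tool's { name, type, value } shape.
schema = json.loads("""[
  {"name": "q1_satisfaction", "type": "PDFRadioGroup", "value": ""},
  {"name": "comments",        "type": "PDFTextField",  "value": ""},
  {"name": "consent",         "type": "PDFSignature",  "value": ""}
]""")

codebook = build_codebook(schema)
print(codebook)
```

The suggested coding is only a starting point; a radio group that holds nominal categories rather than a rating scale would need its entry changed by hand.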

Workflow split for survey analysis

Where this tool fits and where your analysis pipeline takes over.

Task	This tool	Your pipeline
Build the variable list	Field names + types (once)	Name variables, set coding
Get each answer	Empty strings (schema only)	Value-read each respondent file
Assemble the dataset	JSON schema	Tidy table: one row per respondent
Compute stats	—	AVERAGE in Sheets, mean()/describe() in R/pandas

Cookbook

From a survey's schema to a tidy dataset. Field names are illustrative; your survey's actual names appear verbatim in the JSON.

A Likert survey's schema

Three rating questions plus a comment. The radio groups confirm these are single-choice — the schema models the questions, not anyone's ratings.

[
  { "name": "q1_satisfaction", "type": "PDFRadioGroup", "value": "" },
  { "name": "q2_likelihood",   "type": "PDFRadioGroup", "value": "" },
  { "name": "q3_value",        "type": "PDFRadioGroup", "value": "" },
  { "name": "comments",        "type": "PDFTextField",  "value": "" }
]

Single-choice vs select-all-that-apply

The type tells you how to model the question. Get this wrong and you'll mis-aggregate the data.

Single-choice (radio): code as ONE categorical variable
  { "name": "primary_channel", "type": "PDFRadioGroup", "value": "" }

Select-all (option list): code as MULTIPLE indicators
  { "name": "channels_used",   "type": "PDFOptionList", "value": "" }
  -> channels_used_email, channels_used_phone, ...
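
The expansion of a select-all question into indicators might look like the sketch below. The option names (`email`, `phone`, `chat`) and the `expand_multi_response` helper are hypothetical; the schema does not list options, so you would supply them from the form itself.

```python
def expand_multi_response(field, options, selected):
    """Return one 0/1 indicator per option for a select-all question."""
    chosen = set(selected)
    return {f"{field}_{opt}": int(opt in chosen) for opt in options}


# Hypothetical option list; the schema does not include option labels.
options = ["email", "phone", "chat"]

# One respondent's captured selections for the 'channels_used' field.
row = expand_multi_response("channels_used", options, ["email", "chat"])
print(row)
```

Each indicator can then be summed or averaged on its own, which is what keeps a multi-response question from being mis-aggregated as a single category.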

Schema → R/pandas variable list

Use the field names as column names so your import code matches the survey exactly.

Columns (from the schema):
  q1_satisfaction, q2_likelihood, q3_value, comments

pandas:
  import pandas as pd
  # rows: one list of answers per respondent, in column order
  df = pd.DataFrame(rows, columns=[
    'q1_satisfaction','q2_likelihood','q3_value','comments'])

Rows come from your separate value-reading step.
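
Rather than typing the column list by hand, it could be derived from the schema itself. The following is a sketch using only the Python standard library; the `analysis_columns` helper, the `submit` button field, and the sample answers are assumptions for illustration.

```python
import csv
import io

# Field types that carry no analysable answer (see the coding table).
EXCLUDED_TYPES = {"PDFButton", "PDFSignature"}


def analysis_columns(schema):
    """Field names, in schema order, minus buttons and signatures."""
    return [f["name"] for f in schema if f["type"] not in EXCLUDED_TYPES]


schema = [
    {"name": "q1_satisfaction", "type": "PDFRadioGroup", "value": ""},
    {"name": "q2_likelihood", "type": "PDFRadioGroup", "value": ""},
    {"name": "q3_value", "type": "PDFRadioGroup", "value": ""},
    {"name": "comments", "type": "PDFTextField", "value": ""},
    {"name": "submit", "type": "PDFButton", "value": ""},
]
columns = analysis_columns(schema)

# Hypothetical answers from your separate value-reading step.
rows = [{"q1_satisfaction": "5", "q2_likelihood": "4", "q3_value": "5", "comments": "Fast"}]

# Write a CSV whose header is exactly the schema's field names.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```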

Computing means after capture

Once values are captured into a tidy table, the schema's type tells you which columns are numeric and safe to average.

q1_satisfaction,q2_likelihood,q3_value
5,4,5
3,3,4
4,5,4

Sheets: =AVERAGE(A2:A4) -> 4.0
R:      mean(df$q1_satisfaction)
The schema confirmed these are radio (single-choice) scales.
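
The same means can be computed in plain Python. This sketch assumes the captured values arrive as numeric strings and that blank answers should be skipped; both are assumptions about your value-reading step, not behaviour of the tool.

```python
from statistics import mean

# The three respondents from the table above, as captured strings.
rows = [
    {"q1_satisfaction": "5", "q2_likelihood": "4", "q3_value": "5"},
    {"q1_satisfaction": "3", "q2_likelihood": "3", "q3_value": "4"},
    {"q1_satisfaction": "4", "q2_likelihood": "5", "q3_value": "4"},
]

# Columns the schema reported as PDFRadioGroup.
radio_columns = ["q1_satisfaction", "q2_likelihood", "q3_value"]


def column_means(rows, columns):
    """Mean score per column, ignoring blank or missing answers."""
    result = {}
    for col in columns:
        scores = [int(r[col]) for r in rows if r.get(col) not in (None, "")]
        result[col] = mean(scores) if scores else None
    return result


means = column_means(rows, radio_columns)
print(means)
```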

Detecting a changed survey version

Re-running the schema extraction on a new survey edition surfaces added/removed questions before they corrupt your panel data.

v1 fields: q1..q5, comments
v2 fields: q1..q5, q6_nps, comments

New question 'q6_nps' added in v2.
Keep v1 and v2 responses in separate frames or add the
column with NA for v1 respondents.
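
A version check like this can be sketched as a set comparison of two saved schemas. The `diff_schemas` helper is hypothetical, and the two schemas are shortened versions of the example above.

```python
def diff_schemas(old, new):
    """Compare two schema arrays by field name and type."""
    old_types = {f["name"]: f["type"] for f in old}
    new_types = {f["name"]: f["type"] for f in new}
    shared = old_types.keys() & new_types.keys()
    return {
        "added": sorted(new_types.keys() - old_types.keys()),
        "removed": sorted(old_types.keys() - new_types.keys()),
        # Same name, different type: the question was redesigned.
        "retyped": sorted(n for n in shared if old_types[n] != new_types[n]),
    }


v1 = [
    {"name": "q1", "type": "PDFRadioGroup", "value": ""},
    {"name": "comments", "type": "PDFTextField", "value": ""},
]
v2 = [
    {"name": "q1", "type": "PDFRadioGroup", "value": ""},
    {"name": "q6_nps", "type": "PDFRadioGroup", "value": ""},
    {"name": "comments", "type": "PDFTextField", "value": ""},
]

changes = diff_schemas(v1, v2)
print(changes)
```

Reporting retyped fields as well as added and removed ones catches the case where a question keeps its name but changes from, say, a radio group to a dropdown.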

Edge cases and what actually happens

It does not extract the ratings/answers

Schema only

Every value is an empty string. The tool models the survey's questions (names + types); it does not read what respondents selected or wrote. Capture answers in a separate value-reading step keyed by these field names.

Output is JSON, not CSV

JSON only

The result is a JSON array. There's no CSV/Excel export here — assemble your analysis table downstream in a script or spreadsheet using the schema as the column definition.

Likert scale labels aren't in the output

Not included

A Likert question shows as one PDFRadioGroup, but the option labels/scores (1–5) are not part of this output. You define the score mapping yourself when coding the variable.
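
A score mapping of that kind might be recorded as a simple lookup. The five labels below are hypothetical; use the wording that actually appears on your form.

```python
# Hypothetical labels and scores; record the real ones in your data dictionary.
LIKERT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def score(label):
    """Return the numeric score for a label, or None if blank or unrecognised."""
    return LIKERT_SCORES.get(label)


answers = ["Agree", "Strongly agree", "", "Neutral"]
scores = [score(a) for a in answers]
print(scores)
```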

Comparing many participants

Manual

The tool reads one file per run and returns schema, not data. Extract the schema once for the variable list, capture each participant's values separately, then combine rows into a single tidy dataset for comparison.
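
Combining participants could look like the sketch below, which builds one row per respondent and rejects any captured field that is not in the schema. The respondent IDs, the `to_row` helper, and the sample answers are assumptions for illustration.

```python
def to_row(respondent_id, values, columns):
    """Build one tidy row, keyed by the schema's field names."""
    unknown = sorted(set(values) - set(columns))
    if unknown:
        # A field outside the schema usually means a different survey version.
        raise ValueError(f"fields not in schema: {unknown}")
    row = {"respondent_id": respondent_id}
    for col in columns:
        row[col] = values.get(col)  # None marks an unanswered question
    return row


# Column names taken from the schema.
columns = ["q1_satisfaction", "q2_likelihood", "comments"]

# Hypothetical output of the separate value-reading step.
captured = {
    "r001": {"q1_satisfaction": "5", "q2_likelihood": "4", "comments": "Fast"},
    "r002": {"q1_satisfaction": "3"},
}

dataset = [to_row(rid, vals, columns) for rid, vals in captured.items()]
print(dataset)
```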

Scanned paper survey

No fields

A printed-and-scanned survey is an image with no interactive fields, so the array is empty. Use PDF OCR to recognise the marks/text, then map them to questions manually.

Multi-select option list

Expected

A select-all question is a single PDFOptionList field. The schema flags the type; you expand it into one indicator column per option during analysis. The individual option labels are not in this output.

Pure-XFA survey form

AcroForm only

Dynamic XFA surveys store fields in an XML layer pdf-lib doesn't parse, so they may return few or no fields. AcroForm-based surveys extract reliably.

Consent signature field

Expected

A consent signature appears as PDFSignature. Exclude it from statistical analysis. To confirm it's actually signed and valid, use PDF Signature Verify.

Free tier size/page limit

Rejected

Free tier caps at 2 MB and 50 pages; Pro at 50 MB and 500 pages. Multi-page survey booklets with images can be large — upgrade or compress with PDF Compress (lossy) if a survey is rejected.

Frequently asked questions

Does this pull the survey answers out of the PDF?

No. It extracts the survey's field schema — each question field's name and type — and returns every value as an empty string. It models the questions, not the responses. That model is what you need first: it tells you exactly which variables exist and how to code each one before you capture the actual answers in a separate step.

How does this help survey analysis if it doesn't give me data?

It gives you a correct, reusable data dictionary. The hardest part of analysing PDF surveys is mapping each question to the right variable and coding (single-choice vs multi-select vs numeric). The schema removes that guesswork — field names become your column names and the type tells you how to code each one consistently across every respondent file.

Can I calculate averages from Likert questions with this?

Not directly — the tool doesn't read the selected values. What it does is confirm a Likert question is a single-choice PDFRadioGroup, so you know to treat it as one numeric/ordinal variable. After you capture the chosen values into a table, use AVERAGE() in Sheets or mean() in R on that column.

How are single-choice vs select-all questions represented?

Single-choice questions are a single PDFRadioGroup field. Select-all-that-apply questions are typically a PDFOptionList (or several PDFCheckBox fields). The type key lets you model each correctly — radio as one categorical variable, option list as multiple indicator variables.

Will the option labels (e.g. 'Strongly agree') be in the output?

No. The output is field name + type + empty value. The labels and their numeric scores aren't included — you define that mapping yourself when coding the variable. Inspect the labels in a PDF reader once and record them in your data dictionary.

Can I compare responses from multiple participants?

Yes, with the right workflow. Extract the schema once for the variable list, then capture each participant's values in a separate value-reading step, build a tidy table (one row per participant) keyed by the schema's field names, and compare in your analysis tool. The tool itself processes one file per run.

Does it export to CSV or Excel for analysis?

No — the only output is a JSON array. Use it as the column definition and assemble your CSV/Excel dataset downstream in a script or spreadsheet.

What about a scanned paper survey?

A scanned survey has no interactive fields, so the tool returns an empty array. Use PDF OCR to recognise the text and marks, then map them to questions yourself. For pure-XFA dynamic surveys, an XFA-aware desktop tool is needed.

How do I name variables in R or pandas?

Use the field names from the schema directly. Because they're the survey's real field names, your import code will match the source exactly, avoiding mislabelled columns. If a name is awkward (e.g. a dotted, fully-qualified name), rename it in your code but keep a mapping back to the original.
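
One way to rename awkward names while keeping the mapping back is sketched here. The `safe_name` rule (lowercase, non-word characters collapsed to underscores) and the two example field names are assumptions; any consistent rule works as long as the mapping is saved.

```python
import re


def safe_name(field_name):
    """Lowercase the name and collapse runs of non-word characters to '_'."""
    return re.sub(r"\W+", "_", field_name).strip("_").lower()


# Hypothetical field names as they might appear in a schema.
names = ["form1.page1.q1_satisfaction", "Comments (optional)"]

# original name -> analysis-friendly name
rename_map = {n: safe_name(n) for n in names}

# Two originals must never collapse to the same column name.
if len(set(rename_map.values())) != len(rename_map):
    raise ValueError("renaming produced duplicate column names")

# analysis-friendly name -> original name, for tracing back to the PDF
original_of = {new: old for old, new in rename_map.items()}
print(rename_map)
```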

Are signature/consent fields a problem?

They appear as PDFSignature and should be excluded from statistical analysis. If you need to confirm consent was actually signed, verify it cryptographically with PDF Signature Verify — this tool only reports that the field exists.

Is my survey data uploaded?

No. Parsing runs entirely in your browser via pdf-lib; survey files never reach a server, which matters for sensitive research data. The result panel shows '0 bytes uploaded'. The only thing recorded is an anonymous usage counter, and only when you're signed in.

What are the size and page limits for a survey booklet?

Free tier accepts up to 2 MB and 50 pages; Pro raises that to 50 MB and 500 pages. Image-heavy survey booklets can exceed the free limit — compress them with PDF Compress (lossy) or upgrade.

Privacy first

All PDF processing runs locally in your browser using pdf-lib and pdf.js. No file is ever uploaded; only metadata counters are saved for signed-in dashboard stats.
