Convert Excel Survey Data to a Python List of Dicts for Analysis

How to generate python code from excel survey results for statistical analysis

Step 1
Export responses to Excel — Download the .xlsx (or .csv) response export from Google Forms, SurveyMonkey, or Qualtrics. Make sure the response grid is on the first sheet — Qualtrics sometimes adds extra header tabs.
Step 2
Trim Qualtrics' extra header rows first — Qualtrics exports often carry two extra metadata rows under the question text. The tool treats row 1 as headers, so delete the extra description/ImportId rows in Excel before uploading, or your first data 'row' will be metadata.
Step 3
Upload and keep format = list_of_dicts — Drop the file; the default list_of_dicts is ideal for per-respondent records. Question headers are sanitised to identifiers automatically — the variable is data.
Step 4
Load into pandas — Paste the snippet and run import pandas as pd; df = pd.DataFrame(data). Each respondent is a row; each sanitised question is a column.
Step 5
Cast Likert / numeric columns — Ratings arrive as strings. Convert before testing: df["q1"] = pd.to_numeric(df["q1"], errors="coerce") (coerce turns blanks/garbage into NaN).
Step 6
Run the analysis — Crosstab + chi-square: scipy.stats.chi2_contingency(pd.crosstab(df.group, df.choice)). Group means: df.groupby("group")["q1"].mean() after casting. Prefer a no-code summary? Build the crosstab in Excel with the pivot generator or render response counts as an SVG chart.

How survey cells map to Python

Likert and numeric responses arrive as text — cast before statistics. Open-ended text and blanks map cleanly.

Survey cell	In the dict	For analysis
Likert `4`	`"4"`	`pd.to_numeric(df["q1"])` before mean/test
Multiple-choice `Yes`	`"Yes"`	use directly in `pd.crosstab`
Open text `It was great`	`"It was great"`	string column for sentiment/NLP
Skipped question (blank)	`None`	→ NaN; `dropna()` / `fillna()`
Scale `TRUE`/`FALSE`	`"TRUE"`/`"FALSE"`	`.map({"TRUE": True, ...})`
Free text with "quotes"	`"...\"quotes\"..."`	preserved, quotes escaped

Long question header → sanitised key

There is NO truncation and NO mapping comment — the whole header is converted, character by character. Keep a copy of the original sheet if you need the full question text for a report.

Original question header	Sanitised key	Note
`How likely are you to recommend us? (0-10)`	`How_likely_are_you_to_recommend_us___0_10_`	Full length kept; every non-alnum → `_`
`Q1: Overall satisfaction`	`Q1__Overall_satisfaction`	`:` and space → `_`
`2024 NPS`	`_2024_NPS`	Leading digit → `_` prefix
`Gender / identity`	`Gender___identity`	Spaces and `/` each → `_`

Survey export quirks and limits

Per-platform gotchas plus the generator's fixed caps. The tool needs Pro or above.

Source / limit	What to watch	Action
Google Forms	First column is a Timestamp	Sanitises to `Timestamp`; parse with `pd.to_datetime`
SurveyMonkey	Often two header rows (question + sub-answer)	Flatten to one header row before upload
Qualtrics	Extra description / ImportId rows under headers	Delete those rows so data isn't read as a response
Inline cap	First 500 responses only	Split larger surveys or use read_excel
Free tier	Tool is Pro-gated	Upgrade to generate

Cookbook

From a raw survey export to a statistical test. Ratings are quoted strings, so the cast step is not optional.

Survey export → list of dicts

Three respondents, two Likert questions and a group. Note ratings come out as strings.

Excel (Sheet1):
  group | q1_quality | q2_value | comment
  A     | 4          | 5        | Loved it
  B     | 2          | 3        | Too slow
  A     | 5          | 4        |            <- skipped

format: list_of_dicts → data.py

data = [
    {"group": "A", "q1_quality": "4", "q2_value": "5", "comment": "Loved it"},
    {"group": "B", "q1_quality": "2", "q2_value": "3", "comment": "Too slow"},
    {"group": "A", "q1_quality": "5", "q2_value": "4", "comment": None}
]

Into pandas, cast Likert, compute means

Load the list, coerce ratings to numbers, then compare group means.

import pandas as pd
from data import data   # rename if you like

df = pd.DataFrame(data)
for col in ["q1_quality", "q2_value"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.groupby("group")["q1_quality"].mean())
# A -> 4.5, B -> 2.0

Chi-square on a contingency table

Crosstab two categorical columns and run scipy's chi-square test.

import pandas as pd
from scipy.stats import chi2_contingency
from data import data

df = pd.DataFrame(data)
table = pd.crosstab(df["group"], df["q2_value"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")

Skipped questions as missing data

Blank cells become None → NaN, so missing-response handling is built in.

import pandas as pd
from data import data

df = pd.DataFrame(data)
print(df["comment"].isna().sum())   # 1 skipped comment

# drop respondents who skipped q1, or impute
df["q1_quality"] = pd.to_numeric(df["q1_quality"], errors="coerce")
complete = df.dropna(subset=["q1_quality"])

Open-ended text for a word frequency pass

Text answers stay as strings, ready for a quick token count or any NLP step.

from collections import Counter
from data import data

words = Counter()
for r in data:
    c = r["comment"]
    if c:                      # skip None (skipped)
        words.update(c.lower().split())
print(words.most_common(5))

Edge cases and what actually happens

Likert ratings are strings, not ints

Expected

A 4 rating is emitted as "4". df["q1"].mean() on the raw column errors or gives nonsense because it's an object column. Cast first: pd.to_numeric(df["q1"], errors="coerce"). This is the headline gotcha for survey analysis with this tool.

Long question headers are not truncated

Preserved

Contrary to a common assumption, headers are NOT cut to 40 characters and there is NO original→sanitised mapping comment. A full-sentence question becomes one long key with underscores. Keep the original sheet handy if you need the readable question text for charts or reports.

Skipped questions become None

Supported

Blank cells (unanswered questions) map to None, which pandas reads as NaN. Use dropna(subset=[...]) to drop incomplete respondents or fillna(...) to impute. A cell containing only a space is treated as a real (non-empty) answer — trim those first if respondents entered spaces.

Over 500 responses

Truncated

Only the first 500 responses are inlined; list_of_dicts appends a # ... N more rows truncated comment. For a survey with thousands of responses, analyse via pandas.read_excel() on the file instead of inlining — the generator is best for small or sampled surveys.

A free-text answer contains a line break

Invalid Python

Multi-paragraph open-ended answers contain newlines, which the generator does NOT escape — producing a broken multi-line string literal that won't import. Collapse line breaks in the text column before generating with a whitespace pass, or fix the offending literal afterwards.

Qualtrics extra header rows treated as a response

Misread

Qualtrics inserts description/ImportId rows beneath the question text. The tool treats row 1 as the header and everything after as responses, so those metadata rows become bogus respondent dicts. Delete them in Excel before uploading.

Two header rows from SurveyMonkey

Misaligned

SurveyMonkey grid questions export with a question row and a sub-answer row. Only the first row becomes keys, so the second header row lands as a phantom respondent. Flatten to a single header row before generating.

Duplicate question keys collapse

Collision

If two questions sanitise to the same identifier (e.g. punctuation-only differences), the later one overwrites the earlier inside each respondent dict — you silently lose a question. Rename the headers first with csv-header-rename.

Only the first sheet is read

By design

Many exports keep responses on a 'Sheet1' and a codebook/legend on another tab. The generator reads the first sheet only — move the response grid to the front if it isn't already.

Run on Free tier

Rejected

The generator is Pro-gated (Python Generator requires Pro tier.). Free users can't convert survey data with it. For large surveys you'd want pandas.read_excel() regardless of tier.

Frequently asked questions

How are Likert (1-5) responses represented?

As quoted strings — a 4 becomes "4", not the integer 4, because cells are read with SheetJS formatting on. Cast before any statistic: pd.to_numeric(df["q1"], errors="coerce").

Are long question headers truncated?

No. There is no length cap and no mapping comment. The full header is converted character-by-character into one long valid identifier (non-alphanumerics become _, a leading digit gets an _ prefix). Keep the original sheet if you need the readable question text.

Can I analyse open-ended text answers too?

Yes. Text responses are preserved verbatim as quoted strings (quotes and backslashes escaped), ready for word counts, sentiment, or any NLP step. The one caveat: answers with embedded newlines break the literal — collapse line breaks first.

What happens to skipped questions?

Blank cells become None, which pandas reads as NaN. Use dropna(subset=[...]) to exclude incomplete respondents or fillna(...) to impute. Space-only cells count as answered, so trim them if needed.

How many responses can I include?

Up to 500 are inlined (a fixed constant). list_of_dicts adds a # ... N more rows truncated comment past that. For larger surveys, use pandas.read_excel() on the full export.

How do I run a chi-square test on the output?

Load into pandas, crosstab two categorical columns, and pass to scipy: chi2_contingency(pd.crosstab(df.group, df.choice)). No casting is needed for categorical crosstabs — only for numeric tests.

Why is my mean() failing or wrong?

Because the rating column is strings (object dtype). mean() either errors or concatenates. Convert with pd.to_numeric(df[col], errors="coerce") first — coerce turns any non-numeric blanks into NaN safely.

Does it handle Qualtrics exports cleanly?

Mostly, but delete Qualtrics' extra description/ImportId rows under the question headers first — the tool treats row 1 as headers and everything below as responses, so those metadata rows would become fake respondents.

What about Google Forms timestamps?

The first column is usually a Timestamp; it sanitises to a Timestamp key and comes through as a formatted string. Parse it with pd.to_datetime(df["Timestamp"]) for time-based analysis.

Does it read multiple sheets (e.g. a codebook tab)?

No — only the first sheet. Put the response grid on the first sheet; convert any codebook/legend tab separately if you need it.

Is respondent data uploaded anywhere?

No. All parsing and code generation happen in your browser via SheetJS, so sensitive answers (health, income, opinions) never leave your machine.

What does the file download as?

A plain-text data.py with data = [...]. Copy is also available. Rename the variable to something like responses after pasting.

Privacy first

Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.

How to generate python code from excel survey results for statistical analysis

Step 1
Export responses to Excel — Download the .xlsx (or .csv) response export from Google Forms, SurveyMonkey, or Qualtrics. Make sure the response grid is on the first sheet — Qualtrics sometimes adds extra header tabs.
Step 2
Trim Qualtrics' extra header rows first — Qualtrics exports often carry two extra metadata rows under the question text. The tool treats row 1 as headers, so delete the extra description/ImportId rows in Excel before uploading, or your first data 'row' will be metadata.
Step 3
Upload and keep format = list_of_dicts — Drop the file; the default list_of_dicts is ideal for per-respondent records. Question headers are sanitised to identifiers automatically — the variable is data.
Step 4
Load into pandas — Paste the snippet and run import pandas as pd; df = pd.DataFrame(data). Each respondent is a row; each sanitised question is a column.
Step 5
Cast Likert / numeric columns — Ratings arrive as strings. Convert before testing: df["q1"] = pd.to_numeric(df["q1"], errors="coerce") (coerce turns blanks/garbage into NaN).
Step 6
Run the analysis — Crosstab + chi-square: scipy.stats.chi2_contingency(pd.crosstab(df.group, df.choice)). Group means: df.groupby("group")["q1"].mean() after casting. Prefer a no-code summary? Build the crosstab in Excel with the pivot generator or render response counts as an SVG chart.

How survey cells map to Python

Likert and numeric responses arrive as text — cast before statistics. Open-ended text and blanks map cleanly.

Survey cell	In the dict	For analysis
Likert `4`	`"4"`	`pd.to_numeric(df["q1"])` before mean/test
Multiple-choice `Yes`	`"Yes"`	use directly in `pd.crosstab`
Open text `It was great`	`"It was great"`	string column for sentiment/NLP
Skipped question (blank)	`None`	→ NaN; `dropna()` / `fillna()`
Scale `TRUE`/`FALSE`	`"TRUE"`/`"FALSE"`	`.map({"TRUE": True, ...})`
Free text with "quotes"	`"...\"quotes\"..."`	preserved, quotes escaped

Long question header → sanitised key

There is NO truncation and NO mapping comment — the whole header is converted, character by character. Keep a copy of the original sheet if you need the full question text for a report.

Original question header	Sanitised key	Note
`How likely are you to recommend us? (0-10)`	`How_likely_are_you_to_recommend_us___0_10_`	Full length kept; every non-alnum → `_`
`Q1: Overall satisfaction`	`Q1__Overall_satisfaction`	`:` and space → `_`
`2024 NPS`	`_2024_NPS`	Leading digit → `_` prefix
`Gender / identity`	`Gender___identity`	Spaces and `/` each → `_`

Survey export quirks and limits

Per-platform gotchas plus the generator's fixed caps. The tool needs Pro or above.

Source / limit	What to watch	Action
Google Forms	First column is a Timestamp	Sanitises to `Timestamp`; parse with `pd.to_datetime`
SurveyMonkey	Often two header rows (question + sub-answer)	Flatten to one header row before upload
Qualtrics	Extra description / ImportId rows under headers	Delete those rows so data isn't read as a response
Inline cap	First 500 responses only	Split larger surveys or use read_excel
Free tier	Tool is Pro-gated	Upgrade to generate

Cookbook

From a raw survey export to a statistical test. Ratings are quoted strings, so the cast step is not optional.

Survey export → list of dicts

Three respondents, two Likert questions and a group. Note ratings come out as strings.

Excel (Sheet1):
  group | q1_quality | q2_value | comment
  A     | 4          | 5        | Loved it
  B     | 2          | 3        | Too slow
  A     | 5          | 4        |            <- skipped

format: list_of_dicts → data.py

data = [
    {"group": "A", "q1_quality": "4", "q2_value": "5", "comment": "Loved it"},
    {"group": "B", "q1_quality": "2", "q2_value": "3", "comment": "Too slow"},
    {"group": "A", "q1_quality": "5", "q2_value": "4", "comment": None}
]

Into pandas, cast Likert, compute means

Load the list, coerce ratings to numbers, then compare group means.

import pandas as pd
from data import data   # rename if you like

df = pd.DataFrame(data)
for col in ["q1_quality", "q2_value"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.groupby("group")["q1_quality"].mean())
# A -> 4.5, B -> 2.0

Chi-square on a contingency table

Crosstab two categorical columns and run scipy's chi-square test.

import pandas as pd
from scipy.stats import chi2_contingency
from data import data

df = pd.DataFrame(data)
table = pd.crosstab(df["group"], df["q2_value"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")

Skipped questions as missing data

Blank cells become None → NaN, so missing-response handling is built in.

import pandas as pd
from data import data

df = pd.DataFrame(data)
print(df["comment"].isna().sum())   # 1 skipped comment

# drop respondents who skipped q1, or impute
df["q1_quality"] = pd.to_numeric(df["q1_quality"], errors="coerce")
complete = df.dropna(subset=["q1_quality"])

Open-ended text for a word frequency pass

Text answers stay as strings, ready for a quick token count or any NLP step.

from collections import Counter
from data import data

words = Counter()
for r in data:
    c = r["comment"]
    if c:                      # skip None (skipped)
        words.update(c.lower().split())
print(words.most_common(5))

Edge cases and what actually happens

Likert ratings are strings, not ints

Expected

Long question headers are not truncated

Preserved

Skipped questions become None

Supported

Over 500 responses

Truncated

A free-text answer contains a line break

Invalid Python

Qualtrics extra header rows treated as a response

Misread

Two header rows from SurveyMonkey

Misaligned

Duplicate question keys collapse

Collision

Only the first sheet is read

By design

Many exports keep responses on a 'Sheet1' and a codebook/legend on another tab. The generator reads the first sheet only — move the response grid to the front if it isn't already.

Run on Free tier

Rejected

The generator is Pro-gated (Python Generator requires Pro tier.). Free users can't convert survey data with it. For large surveys you'd want pandas.read_excel() regardless of tier.

Frequently asked questions

How are Likert (1-5) responses represented?

As quoted strings — a 4 becomes "4", not the integer 4, because cells are read with SheetJS formatting on. Cast before any statistic: pd.to_numeric(df["q1"], errors="coerce").

Are long question headers truncated?

Can I analyse open-ended text answers too?

What happens to skipped questions?

How many responses can I include?

Up to 500 are inlined (a fixed constant). list_of_dicts adds a # ... N more rows truncated comment past that. For larger surveys, use pandas.read_excel() on the full export.

How do I run a chi-square test on the output?

Load into pandas, crosstab two categorical columns, and pass to scipy: chi2_contingency(pd.crosstab(df.group, df.choice)). No casting is needed for categorical crosstabs — only for numeric tests.

Why is my mean() failing or wrong?

Does it handle Qualtrics exports cleanly?

What about Google Forms timestamps?

The first column is usually a Timestamp; it sanitises to a Timestamp key and comes through as a formatted string. Parse it with pd.to_datetime(df["Timestamp"]) for time-based analysis.

Does it read multiple sheets (e.g. a codebook tab)?

No — only the first sheet. Put the response grid on the first sheet; convert any codebook/legend tab separately if you need it.

Is respondent data uploaded anywhere?

No. All parsing and code generation happen in your browser via SheetJS, so sensitive answers (health, income, opinions) never leave your machine.

What does the file download as?

A plain-text data.py with data = [...]. Copy is also available. Rename the variable to something like responses after pasting.

Generate Python Code from Excel Survey Results for Statistical Analysis

How to generate python code from excel survey results for statistical analysis

How survey cells map to Python

Long question header → sanitised key

Survey export quirks and limits

Cookbook

Survey export → list of dicts

Into pandas, cast Likert, compute means

Chi-square on a contingency table

Skipped questions as missing data

Open-ended text for a word frequency pass

Edge cases and what actually happens

Likert ratings are strings, not ints

Long question headers are not truncated

Skipped questions become None

Over 500 responses

A free-text answer contains a line break

Qualtrics extra header rows treated as a response

Two header rows from SurveyMonkey

Duplicate question keys collapse

Only the first sheet is read

Run on Free tier

Frequently asked questions

How are Likert (1-5) responses represented?

Are long question headers truncated?

Can I analyse open-ended text answers too?

What happens to skipped questions?

How many responses can I include?

How do I run a chi-square test on the output?

Why is my mean() failing or wrong?

Does it handle Qualtrics exports cleanly?

What about Google Forms timestamps?

Does it read multiple sheets (e.g. a codebook tab)?

Is respondent data uploaded anywhere?

What does the file download as?

Privacy first

Related guides

Generate Python Code from Excel Survey Results for Statistical Analysis

How to generate python code from excel survey results for statistical analysis

How survey cells map to Python

Long question header → sanitised key

Survey export quirks and limits

Cookbook

Survey export → list of dicts

Into pandas, cast Likert, compute means

Chi-square on a contingency table

Skipped questions as missing data

Open-ended text for a word frequency pass

Edge cases and what actually happens

Likert ratings are strings, not ints

Long question headers are not truncated

Skipped questions become None

Over 500 responses

A free-text answer contains a line break

Qualtrics extra header rows treated as a response

Two header rows from SurveyMonkey

Duplicate question keys collapse

Only the first sheet is read

Run on Free tier

Frequently asked questions

How are Likert (1-5) responses represented?

Are long question headers truncated?

Can I analyse open-ended text answers too?

What happens to skipped questions?

How many responses can I include?

How do I run a chi-square test on the output?

Why is my mean() failing or wrong?

Does it handle Qualtrics exports cleanly?

What about Google Forms timestamps?

Does it read multiple sheets (e.g. a codebook tab)?

Is respondent data uploaded anywhere?

What does the file download as?

Privacy first

Related guides