How to convert an excel spreadsheet to python data structures
- Step 1Put the data you want on the first sheet — The tool reads sheet index 0 only. Multi-tab workbooks are not merged — move the table you want to the first tab, or save that tab as its own file. The first row is treated as the header row and supplies the dict keys / DataFrame columns.
- Step 2Upload the .xlsx or .csv — Drop the file on the tool. It accepts both
.xlsxand.csv(Pro tier: up to 50 MB / 100,000 rows per file). The Python Generator requires Pro or above — the Free tier cannot run it. - Step 3Choose the output format — Pick
list_of_dicts(default) for row-by-row iteration,dict_of_listsfor column-wise access, orpandasto get animport pandas as pd+pd.DataFrame({...})constructor. - Step 4Read the preview — The generated code renders in a preview panel. Outputs longer than 4,000 characters are visually truncated in the preview with a
… (truncated — download for full output)note — the Copy and Download actions still give you the complete code. - Step 5Copy or download — Use Copy to put the snippet on your clipboard, or Download to save it as
data.py. The variable isdatafor the dict/list formats anddffor pandas. - Step 6Fix typed numbers if you need real ints — Because cells are read as formatted text, numeric values come out quoted (
"42"). If you need numeric Python, cast them after pasting —int(row["age"]), or for pandasdf = df.astype({"age": int})— or pipe the DataFrame throughpd.to_numeric. Need a different target instead of Python? The same sheet can become an i18n JSON file, a tRPC router, or a Tailwind HTML table.
The three output formats
All three are driven by the single format option. Variable name is data for list_of_dicts and dict_of_lists, df for pandas. Only list_of_dicts appends a truncation comment when a file exceeds 500 rows.
| format value | Shape emitted | Variable | Best for |
|---|---|---|---|
list_of_dicts (default) | data = [ {"col": val, …}, … ] | data | Row iteration, fixtures, pd.DataFrame(data) later |
dict_of_lists | data = { "col": [v1, v2, …], … } | data | Column-wise access, quick pd.DataFrame(data) |
pandas | import pandas as pd then df = pd.DataFrame({ "col": [...] }) | df | Drop-in DataFrame for analysis notebooks |
How each cell value is rendered
Cells are read with SheetJS formatting on (raw mode off), so most Excel/CSV values reach the generator as strings. The generator only emits an unquoted Python number/bool when the JavaScript value is genuinely a number/boolean — which rarely happens for spreadsheet input.
| Cell in Excel/CSV | Value seen by generator | Python emitted | Notes |
|---|---|---|---|
Ann | "Ann" (string) | "Ann" | Quotes inside are escaped to \" |
42 | "42" (string) | "42" | Quoted string, NOT int — cast with int() if needed |
9.5 | "9.5" (string) | "9.5" | Quoted string, not float |
TRUE | "TRUE" (string) | "TRUE" | Not Python True — compare as text or cast |
| (empty cell) | "" (empty string) | None | Empty string maps to None |
C:\Users | "C:\Users" | "C:\\Users" | Backslashes are doubled, so the literal is correct |
Header sanitisation rules
Applied to every header before it becomes a dict key / DataFrame column. There is no length cap and no original→sanitised mapping comment — the rules below are the whole transform.
| Header in Excel | Sanitised identifier | Rule applied |
|---|---|---|
First Name | First_Name | Spaces → _ |
Q1 (score) | Q1__score_ | Each non-alphanumeric char → _ |
2024_total | _2024_total | Leading digit gets _ prefix |
class | class_col | Python keyword → _col suffix |
email@domain | email_domain | @ → _ |
Cookbook
Real before/after fragments showing exactly what the generator emits. Numbers and booleans appear quoted because spreadsheet cells reach the generator as formatted text — plan your casts accordingly.
list_of_dicts from a small lookup table
The default format. One dict per data row, keys taken from the sanitised header row, assigned to data.
Excel (Sheet1):
code | label | active
US | United Sta | TRUE
GB | United Kin | TRUE
format: list_of_dicts → data.py
data = [
{"code": "US", "label": "United Sta", "active": "TRUE"},
{"code": "GB", "label": "United Kin", "active": "TRUE"}
]dict_of_lists for column-wise access
Same data, column-oriented. Hand it straight to pandas with pd.DataFrame(data) or index a column with data["code"].
format: dict_of_lists → data.py
data = {
"code": ["US", "GB"],
"label": ["United Sta", "United Kin"],
"active": ["TRUE", "TRUE"]
}pandas DataFrame constructor
The pandas format prepends the import and wraps the columns in pd.DataFrame({...}), assigned to df. Note the values stay quoted — convert dtypes after construction.
format: pandas → data.py
import pandas as pd
df = pd.DataFrame({
"code": ["US", "GB"],
"label": ["United Sta", "United Kin"],
"active": ["TRUE", "TRUE"]
})
# numbers/bools arrive as text — cast as needed:
# df = df.astype({"active": bool}) # after mapping TRUE/FALSEKeyword and special-character headers stay valid
A header named class would break a pd.DataFrame attribute access; a header with a space or paren is not a legal identifier. The sanitiser fixes both so the snippet runs.
Excel headers: class | Q1 (1-5) | for
list_of_dicts output:
data = [
{"class_col": "A", "Q1__1_5_": "4", "for_col": "yes"}
]
# 'class' → class_col, 'for' → for_col (keywords)
# 'Q1 (1-5)' → Q1__1_5_ (each non-alnum char → _)Truncation past 500 rows (list_of_dicts only)
The generator inlines at most 500 rows. For list_of_dicts a trailing comment records how many were dropped; dict_of_lists and pandas silently slice to 500 with no comment.
Input: 1,500-row roster.xlsx
format: list_of_dicts → data.py
data = [
{"id": "1", "name": "Ann"},
... (498 more rows)
{"id": "500", "name": "Zed"}
]
# ... 1000 more rows truncated
# For the full dataset, use pandas.read_excel("roster.xlsx") instead.Edge cases and what actually happens
Only the first sheet is read
By designThe generator reads sheet index 0 and nothing else. A workbook with Customers, Orders, Products tabs yields Python for Customers only. To convert another tab, move it to the first position or save it as a separate file and run again — there is no sheet picker on this tool.
Numbers come out as quoted strings
ExpectedCells are read with SheetJS number/date formatting applied, so 42 reaches the generator as "42" and is emitted as a quoted string. The typeof v === "number" branch in the code only fires for genuine JS numbers, which spreadsheet input almost never produces. Cast after pasting — int(row["qty"]), or pd.to_numeric(df["qty"]) for a DataFrame. If your numbers are stored as text in Excel because of stray spaces, run a whitespace trim on the sheet first.
Booleans come out as "TRUE"/"FALSE" text
ExpectedLike numbers, boolean cells are formatted to text before the generator sees them, so you get the string "TRUE", not Python True. Map them yourself: df["active"] = df["active"].map({"TRUE": True, "FALSE": False}).
A cell contains a line break
Invalid PythonThe string escaper doubles backslashes and escapes double quotes, but it does NOT escape newlines. A cell with an embedded line break produces a literal newline inside a "..." literal, which Python rejects with SyntaxError: EOL while scanning string literal. Strip newlines before generating (the whitespace/case helpers in the CSV family can collapse them), or post-process the snippet.
Empty cells become None
SupportedAny empty cell (read as "") is emitted as Python None. This is consistent across all three formats, so None-aware code works without extra cleaning. Note a cell that contains only spaces is NOT empty — it becomes a quoted whitespace string.
More than 500 rows in the file
TruncatedExactly 500 rows are inlined regardless of tier — this is a hard code constant, not a tier limit. list_of_dicts appends # ... N more rows truncated; dict_of_lists and pandas slice silently. For full datasets use pandas.read_excel() against the source file instead of inlining.
Empty sheet / no rows
SupportedIf the first sheet has no data rows, the generator returns data = [] (or the empty equivalent) rather than erroring. A header-only sheet still yields an empty collection because there are no data rows to inline.
Duplicate header names
CollisionIf two columns share a header (or two distinct headers sanitise to the same identifier, e.g. Q1 (a) and Q1 (b) both becoming Q1__a_-style collisions are avoided but Q.1 and Q,1 both become Q_1), the later key overwrites the earlier one inside each dict / DataFrame column. Rename the headers first with csv-header-rename so every column survives.
File exceeds the Pro size or row limit
RejectedPro caps uploads at 50 MB / 100,000 rows / 5 files; Pro-media 200 MB / 500,000; Developer 500 MB / unlimited. A file over your tier's ceiling is rejected before processing. Even within the limit, only the first 500 rows are inlined — the upload limit governs what you can load, not what gets written into the snippet.
Free tier upload
RejectedThe Python Generator throws Python Generator requires Pro tier. for Free-tier users. There is no row-count or file-size workaround on Free — the tool is gated at Pro and above.
Frequently asked questions
What variable name does the output use?
list_of_dicts and dict_of_lists assign to data. The pandas format assigns to df and prepends import pandas as pd. Rename after pasting if you prefer something domain-specific (countries, rows, etc.).
Why are my numbers wrapped in quotes?
The spreadsheet is parsed with SheetJS formatting applied, so numeric cells arrive as text and are emitted as quoted strings ("42"). The generator only outputs an unquoted number when the underlying JavaScript value is a real number, which spreadsheet parsing rarely produces. Cast with int()/float() or pd.to_numeric() after pasting.
How many rows are inlined?
Up to 500, and that's a fixed code constant — not a tier setting. list_of_dicts adds a # ... N more rows truncated comment; the other two formats slice to 500 silently. For larger data, use pandas.read_excel() against the file.
Does it read more than one sheet?
No. Only the first sheet (index 0) is read. Move the table you want to the first tab, or split it into its own file.
Can I pass list_of_dicts straight to pandas?
Yes. pd.DataFrame(data) accepts a list of dicts directly, so the list_of_dicts format doubles as DataFrame input without choosing the pandas format. The pandas format just saves you that one line and adds the import.
How are Python keywords in headers handled?
A header that exactly matches a Python keyword (class, for, import, True, None, etc.) gets a _col suffix — class becomes class_col — so the generated dict keys and column names never trigger a syntax error.
What happens to special characters in headers?
Every character outside [a-zA-Z0-9_] is replaced with _, and a leading digit gets an _ prefix. So Q1 (score) becomes Q1__score_ and 2024 total becomes _2024_total. There is no length truncation and no mapping comment.
Are empty cells preserved?
Empty cells become Python None in all three formats. A cell containing only spaces is treated as non-empty and becomes a quoted whitespace string — trim it first if you want it to read as None.
Will a cell with a line break work?
No — newlines inside a cell are not escaped, so they produce an invalid multi-line string literal (SyntaxError). Collapse line breaks before generating, or fix the literal in the output.
Does the output run offline?
Yes. The generated code has no network dependency. The list_of_dicts and dict_of_lists outputs need no third-party package at all; the pandas output needs pandas installed where you run it.
Is my data uploaded anywhere?
No. Parsing and code generation happen entirely in your browser via SheetJS. The file never leaves your machine, which is why salary tables, financial models, and pre-release data are safe to run through it.
What format does the file download as?
A plain-text .py file named data.py. You can also Copy the snippet to your clipboard from the result panel.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.