Convert Excel Spreadsheet to pandas DataFrame Code for Jupyter

How to generate pandas dataframe code from an excel file for jupyter notebooks

Step 1
Put the table on sheet 1 — Only the first sheet is read. If your analysis data is on a Data tab behind a Cover tab, move it to the front or export it alone — the tool will not reach into later sheets.
Step 2
Upload and choose the pandas format — Drop the .xlsx or .csv and set format to pandas. The generator wraps each column as a list inside pd.DataFrame({...}) and assigns it to df.
Step 3
Copy into a notebook cell — Use Copy (or Download data.py) and paste into a Jupyter/Colab cell. Run it — df exists immediately, no read_excel, no upload widget, no path.
Step 4
Cast the typed-as-text columns — Numeric and boolean columns arrive quoted. Add df = df.astype({"score": float, "count": int}) or df["score"] = pd.to_numeric(df["score"]) right after the constructor.
Step 5
Map the booleans — Boolean cells come out as "TRUE"/"FALSE" strings. Convert with df["active"] = df["active"].map({"TRUE": True, "FALSE": False}) before any boolean indexing.
Step 6
Treat it as a snapshot — Inline data is a frozen copy of up to 500 rows. If the source workbook changes, re-run the generator and replace the cell — for live or large data, keep pandas.read_excel() instead. For a quick visual instead of a DataFrame, the same sheet can become an SVG chart.

Inline constructor vs read_excel for notebooks

When the generated pd.DataFrame constructor beats reading the file, and when it doesn't.

Situation	Inline pd.DataFrame({...})	pandas.read_excel()
Notebook shared on GitHub without the .xlsx	Works — data is in the cell	Fails — file not found
Colab/Kaggle kernel with no openpyxl	Works — no Excel reader needed	Needs openpyxl/xlrd installed
Reviewer wants to see the data in the diff	Visible inline	Opaque binary attachment
Dataset over ~500 rows	Truncated to 500	Reads the whole file
Data refreshes weekly	Stale until you re-generate	Always current on re-read
Numeric/date dtypes matter	Arrive as text — cast manually	Inferred on read

What the pandas format emits per value

The constructor is one Python list per column. Spreadsheet cells reach the generator as formatted text, so plan a cast for numeric/boolean columns.

Cell	Emitted in the list	Cast you'll likely add
`Ann` (text)	`"Ann"`	none — already str
`9.5` (number)	`"9.5"`	`pd.to_numeric(df["col"])`
`42` (number)	`"42"`	`df.astype({"col": int})`
`TRUE` (bool)	`"TRUE"`	`.map({"TRUE": True, "FALSE": False})`
`2024-01-15` (date)	`"1/15/24"` (formatted text)	`pd.to_datetime(df["col"])`
(empty)	`None`	reads as `NaN` automatically

Tier upload limits (DataFrame size still caps at 500 rows)

Tier limits govern the file you can upload; the inlined DataFrame is always at most 500 rows. The tool needs Pro or above.

Tier	Max file	Max rows uploaded	Rows inlined in df
Free	n/a — tool gated	n/a	n/a (Pro required)
Pro	50 MB	100,000	500
Pro-media	200 MB	500,000	500
Developer	500 MB	unlimited	500

Cookbook

Constructor fragments and the dtype fixes you'll paste right after them. Values are quoted because cells arrive as formatted text.

Basic DataFrame constructor

The pandas format prepends the import and builds the column lists. df is ready after running the cell.

Excel (Sheet1):
  region | revenue | growth
  EMEA   | 120000  | 0.12
  APAC   | 98000   | 0.31

format: pandas → data.py

import pandas as pd

df = pd.DataFrame({
    "region": ["EMEA", "APAC"],
    "revenue": ["120000", "98000"],
    "growth": ["0.12", "0.31"]
})

Fixing dtypes after the constructor

Numbers are strings. One astype call (or to_numeric per column) restores numeric dtypes for analysis.

import pandas as pd

df = pd.DataFrame({
    "region": ["EMEA", "APAC"],
    "revenue": ["120000", "98000"],
    "growth": ["0.12", "0.31"]
})

df = df.astype({"revenue": int, "growth": float})
# now df["revenue"].sum() works as a number, not string concat

Empty cells become None → NaN

A blank cell is None in the constructor, which pandas treats as NaN — so missing-value methods work without extra cleaning.

Excel:
  user | score
  ann  | 88
  bob  |          <- blank

import pandas as pd
df = pd.DataFrame({
    "user": ["ann", "bob"],
    "score": ["88", None]
})

df["score"].isna()   # bob -> True
df["score"] = pd.to_numeric(df["score"])  # NaN stays NaN

Dates come through as formatted text

Date cells are formatted to a display string by SheetJS, so they arrive quoted. Parse with pd.to_datetime after construction.

import pandas as pd
df = pd.DataFrame({
    "event": ["launch", "review"],
    "on": ["1/15/24", "3/1/24"]
})

df["on"] = pd.to_datetime(df["on"])
# now df.set_index("on").resample("M") etc. works

Keyword-safe columns for attribute access

df.class would be a syntax error; the sanitiser renames the column so dotted access and queries stay legal.

Excel header: class  ->  sanitised to class_col

import pandas as pd
df = pd.DataFrame({
    "class_col": ["A", "B"],
    "count": ["10", "5"]
})

df.class_col      # legal — 'df.class' would not be

Edge cases and what actually happens

No dtypes comment block is produced

By design

The pandas format emits only the import and the pd.DataFrame({...}) constructor — there is no auto-generated dtypes annotation. Because every value arrives as text, add your own astype/to_numeric line for numeric and date columns after pasting.

Numeric columns are strings

Expected

df["revenue"].sum() on a freshly pasted DataFrame concatenates strings instead of adding numbers, because the cells came through as text. Cast first: df = df.astype({"revenue": int}) or pd.to_numeric. This is the single most common surprise with the pandas format.

Only sheet 1 becomes a DataFrame

By design

A multi-sheet model yields a DataFrame for the first tab only. To analyse several tabs, run the tool once per tab (moving each to the front, or saving each as its own file) and concatenate the resulting DataFrames in your notebook. If the tabs share a key and you'd rather merge them into one sheet upstream, the sheet joiner does that before you generate.

DataFrame capped at 500 rows

Truncated

The pandas format slices to the first 500 rows and — unlike list_of_dicts — adds no truncation comment, so a 5,000-row file silently produces a 500-row DataFrame. For the full dataset switch to pandas.read_excel() against the source file.

Line break inside a cell

Invalid Python

Newlines in cells are not escaped, so a multi-line cell yields a broken string literal and the cell won't run. Collapse line breaks before generating, or fix the offending literal in the output.

Boolean columns are 'TRUE'/'FALSE' strings

Expected

Boolean cells arrive as text, so df["active"] is an object column of "TRUE"/"FALSE". Boolean indexing (df[df.active]) won't behave until you map them: df["active"].map({"TRUE": True, "FALSE": False}).

Duplicate / colliding column names

Collision

If two headers sanitise to the same identifier, the second column's list overwrites the first in the constructor dict, so you lose a column. Rename headers first with csv-header-rename to keep them distinct.

Empty sheet

Supported

A sheet with no data rows produces an effectively empty structure rather than an error. With the pandas format you'll get a constructor with empty lists — a valid, empty DataFrame.

File too large for the tier

Rejected

Uploads above the tier ceiling (Pro 50 MB / 100,000 rows; Pro-media 200 MB / 500,000; Developer 500 MB / unlimited) are rejected before generation. Within the limit, still only 500 rows are inlined.

Run on Free tier

Rejected

The generator is Pro-gated and throws Python Generator requires Pro tier. for Free users. Upgrade to Pro to generate DataFrame code.

Frequently asked questions

Why does df["col"].sum() concatenate instead of add?

Because numeric cells are read as formatted text and emitted as quoted strings, the column has object dtype. Cast it first: df["col"] = pd.to_numeric(df["col"]) or df = df.astype({"col": int}).

Is there a dtypes block in the output?

No. The pandas format outputs only the import and the pd.DataFrame({...}) constructor. Add any dtype casting yourself after pasting — it's typically one astype line.

Why not just use pandas.read_excel()?

Use it when the file ships with the notebook and the kernel has openpyxl, and especially for data over 500 rows or data that changes. The inline constructor wins when the .xlsx isn't bundled (shared GitHub/Colab notebooks), when you want the data visible in the diff, or when you want zero Excel-reader dependency.

How many rows make it into the DataFrame?

At most 500 — a fixed code constant, not a tier setting. The pandas format truncates silently (no comment), so a larger file quietly yields a 500-row DataFrame.

Does it read multiple sheets into one DataFrame?

No. Only the first sheet is read. Generate per tab and pd.concat([...]) the results in your notebook.

What variable name is used?

df, and import pandas as pd is included. Rename to taste after pasting.

How are dates handled?

Date cells are formatted to display strings by SheetJS, so they arrive quoted (e.g. "1/15/24"). Parse them with pd.to_datetime(df["col"]) after construction.

What about empty cells?

They become None, which pandas reads as NaN. So df.isna(), df.dropna(), and df.fillna() all work without an extra cleaning step.

Are columns with spaces or keywords safe?

Yes. Headers are sanitised to valid identifiers (First Name → First_Name, class → class_col), so df.column_name attribute access and df.query(...) won't break. If you also want the source workbook's headers cleaned permanently, run csv-header-rename on the file first.

Does the constructor need pandas installed?

Yes — pd.DataFrame requires pandas where you run the cell. If you want zero dependency, use the list_of_dicts format and call pd.DataFrame(data) only where pandas is available.

Is my data sent to a server?

No. Everything runs in your browser via SheetJS. Nothing is uploaded, which is why unpublished experiment data is safe to convert.

What does the download look like?

A plain-text data.py containing the import and constructor. You can also Copy it directly into a notebook cell.

Privacy first

Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.

How to generate pandas dataframe code from an excel file for jupyter notebooks

Step 1
Put the table on sheet 1 — Only the first sheet is read. If your analysis data is on a Data tab behind a Cover tab, move it to the front or export it alone — the tool will not reach into later sheets.
Step 2
Upload and choose the pandas format — Drop the .xlsx or .csv and set format to pandas. The generator wraps each column as a list inside pd.DataFrame({...}) and assigns it to df.
Step 3
Copy into a notebook cell — Use Copy (or Download data.py) and paste into a Jupyter/Colab cell. Run it — df exists immediately, no read_excel, no upload widget, no path.
Step 4
Cast the typed-as-text columns — Numeric and boolean columns arrive quoted. Add df = df.astype({"score": float, "count": int}) or df["score"] = pd.to_numeric(df["score"]) right after the constructor.
Step 5
Map the booleans — Boolean cells come out as "TRUE"/"FALSE" strings. Convert with df["active"] = df["active"].map({"TRUE": True, "FALSE": False}) before any boolean indexing.
Step 6
Treat it as a snapshot — Inline data is a frozen copy of up to 500 rows. If the source workbook changes, re-run the generator and replace the cell — for live or large data, keep pandas.read_excel() instead. For a quick visual instead of a DataFrame, the same sheet can become an SVG chart.

Inline constructor vs read_excel for notebooks

When the generated pd.DataFrame constructor beats reading the file, and when it doesn't.

Situation	Inline pd.DataFrame({...})	pandas.read_excel()
Notebook shared on GitHub without the .xlsx	Works — data is in the cell	Fails — file not found
Colab/Kaggle kernel with no openpyxl	Works — no Excel reader needed	Needs openpyxl/xlrd installed
Reviewer wants to see the data in the diff	Visible inline	Opaque binary attachment
Dataset over ~500 rows	Truncated to 500	Reads the whole file
Data refreshes weekly	Stale until you re-generate	Always current on re-read
Numeric/date dtypes matter	Arrive as text — cast manually	Inferred on read

What the pandas format emits per value

The constructor is one Python list per column. Spreadsheet cells reach the generator as formatted text, so plan a cast for numeric/boolean columns.

Cell	Emitted in the list	Cast you'll likely add
`Ann` (text)	`"Ann"`	none — already str
`9.5` (number)	`"9.5"`	`pd.to_numeric(df["col"])`
`42` (number)	`"42"`	`df.astype({"col": int})`
`TRUE` (bool)	`"TRUE"`	`.map({"TRUE": True, "FALSE": False})`
`2024-01-15` (date)	`"1/15/24"` (formatted text)	`pd.to_datetime(df["col"])`
(empty)	`None`	reads as `NaN` automatically

Tier upload limits (DataFrame size still caps at 500 rows)

Tier limits govern the file you can upload; the inlined DataFrame is always at most 500 rows. The tool needs Pro or above.

Tier	Max file	Max rows uploaded	Rows inlined in df
Free	n/a — tool gated	n/a	n/a (Pro required)
Pro	50 MB	100,000	500
Pro-media	200 MB	500,000	500
Developer	500 MB	unlimited	500

Cookbook

Constructor fragments and the dtype fixes you'll paste right after them. Values are quoted because cells arrive as formatted text.

Basic DataFrame constructor

The pandas format prepends the import and builds the column lists. df is ready after running the cell.

Excel (Sheet1):
  region | revenue | growth
  EMEA   | 120000  | 0.12
  APAC   | 98000   | 0.31

format: pandas → data.py

import pandas as pd

df = pd.DataFrame({
    "region": ["EMEA", "APAC"],
    "revenue": ["120000", "98000"],
    "growth": ["0.12", "0.31"]
})

Fixing dtypes after the constructor

Numbers are strings. One astype call (or to_numeric per column) restores numeric dtypes for analysis.

import pandas as pd

df = pd.DataFrame({
    "region": ["EMEA", "APAC"],
    "revenue": ["120000", "98000"],
    "growth": ["0.12", "0.31"]
})

df = df.astype({"revenue": int, "growth": float})
# now df["revenue"].sum() works as a number, not string concat

Empty cells become None → NaN

A blank cell is None in the constructor, which pandas treats as NaN — so missing-value methods work without extra cleaning.

Excel:
  user | score
  ann  | 88
  bob  |          <- blank

import pandas as pd
df = pd.DataFrame({
    "user": ["ann", "bob"],
    "score": ["88", None]
})

df["score"].isna()   # bob -> True
df["score"] = pd.to_numeric(df["score"])  # NaN stays NaN

Dates come through as formatted text

Date cells are formatted to a display string by SheetJS, so they arrive quoted. Parse with pd.to_datetime after construction.

import pandas as pd
df = pd.DataFrame({
    "event": ["launch", "review"],
    "on": ["1/15/24", "3/1/24"]
})

df["on"] = pd.to_datetime(df["on"])
# now df.set_index("on").resample("M") etc. works

Keyword-safe columns for attribute access

df.class would be a syntax error; the sanitiser renames the column so dotted access and queries stay legal.

Excel header: class  ->  sanitised to class_col

import pandas as pd
df = pd.DataFrame({
    "class_col": ["A", "B"],
    "count": ["10", "5"]
})

df.class_col      # legal — 'df.class' would not be

Edge cases and what actually happens

No dtypes comment block is produced

By design

Numeric columns are strings

Expected

Only sheet 1 becomes a DataFrame

By design

DataFrame capped at 500 rows

Truncated

Line break inside a cell

Invalid Python

Newlines in cells are not escaped, so a multi-line cell yields a broken string literal and the cell won't run. Collapse line breaks before generating, or fix the offending literal in the output.

Boolean columns are 'TRUE'/'FALSE' strings

Expected

Duplicate / colliding column names

Collision

Empty sheet

Supported

A sheet with no data rows produces an effectively empty structure rather than an error. With the pandas format you'll get a constructor with empty lists — a valid, empty DataFrame.

File too large for the tier

Rejected

Uploads above the tier ceiling (Pro 50 MB / 100,000 rows; Pro-media 200 MB / 500,000; Developer 500 MB / unlimited) are rejected before generation. Within the limit, still only 500 rows are inlined.

Run on Free tier

Rejected

The generator is Pro-gated and throws Python Generator requires Pro tier. for Free users. Upgrade to Pro to generate DataFrame code.

Frequently asked questions

Why does df["col"].sum() concatenate instead of add?

Because numeric cells are read as formatted text and emitted as quoted strings, the column has object dtype. Cast it first: df["col"] = pd.to_numeric(df["col"]) or df = df.astype({"col": int}).

Is there a dtypes block in the output?

No. The pandas format outputs only the import and the pd.DataFrame({...}) constructor. Add any dtype casting yourself after pasting — it's typically one astype line.

Why not just use pandas.read_excel()?

How many rows make it into the DataFrame?

At most 500 — a fixed code constant, not a tier setting. The pandas format truncates silently (no comment), so a larger file quietly yields a 500-row DataFrame.

Does it read multiple sheets into one DataFrame?

No. Only the first sheet is read. Generate per tab and pd.concat([...]) the results in your notebook.

What variable name is used?

df, and import pandas as pd is included. Rename to taste after pasting.

How are dates handled?

Date cells are formatted to display strings by SheetJS, so they arrive quoted (e.g. "1/15/24"). Parse them with pd.to_datetime(df["col"]) after construction.

What about empty cells?

They become None, which pandas reads as NaN. So df.isna(), df.dropna(), and df.fillna() all work without an extra cleaning step.

Are columns with spaces or keywords safe?

Does the constructor need pandas installed?

Yes — pd.DataFrame requires pandas where you run the cell. If you want zero dependency, use the list_of_dicts format and call pd.DataFrame(data) only where pandas is available.

Is my data sent to a server?

No. Everything runs in your browser via SheetJS. Nothing is uploaded, which is why unpublished experiment data is safe to convert.

What does the download look like?

A plain-text data.py containing the import and constructor. You can also Copy it directly into a notebook cell.

Generate pandas DataFrame Code from an Excel File for Jupyter Notebooks

How to generate pandas dataframe code from an excel file for jupyter notebooks

Inline constructor vs read_excel for notebooks

What the pandas format emits per value

Tier upload limits (DataFrame size still caps at 500 rows)

Cookbook

Basic DataFrame constructor

Fixing dtypes after the constructor

Empty cells become None → NaN

Dates come through as formatted text

Keyword-safe columns for attribute access

Edge cases and what actually happens

No dtypes comment block is produced

Numeric columns are strings

Only sheet 1 becomes a DataFrame

DataFrame capped at 500 rows

Line break inside a cell

Boolean columns are 'TRUE'/'FALSE' strings

Duplicate / colliding column names

Empty sheet

File too large for the tier

Run on Free tier

Frequently asked questions

Why does df["col"].sum() concatenate instead of add?

Is there a dtypes block in the output?

Why not just use pandas.read_excel()?

How many rows make it into the DataFrame?

Does it read multiple sheets into one DataFrame?

What variable name is used?

How are dates handled?

What about empty cells?

Are columns with spaces or keywords safe?

Does the constructor need pandas installed?

Is my data sent to a server?

What does the download look like?

Privacy first

Related guides

Generate pandas DataFrame Code from an Excel File for Jupyter Notebooks

How to generate pandas dataframe code from an excel file for jupyter notebooks

Inline constructor vs read_excel for notebooks

What the pandas format emits per value

Tier upload limits (DataFrame size still caps at 500 rows)

Cookbook

Basic DataFrame constructor

Fixing dtypes after the constructor

Empty cells become None → NaN

Dates come through as formatted text

Keyword-safe columns for attribute access

Edge cases and what actually happens

No dtypes comment block is produced

Numeric columns are strings

Only sheet 1 becomes a DataFrame

DataFrame capped at 500 rows

Line break inside a cell

Boolean columns are 'TRUE'/'FALSE' strings

Duplicate / colliding column names

Empty sheet

File too large for the tier

Run on Free tier

Frequently asked questions

Why does df["col"].sum() concatenate instead of add?

Is there a dtypes block in the output?

Why not just use pandas.read_excel()?

How many rows make it into the DataFrame?

Does it read multiple sheets into one DataFrame?

What variable name is used?

How are dates handled?

What about empty cells?

Are columns with spaces or keywords safe?

Does the constructor need pandas installed?

Is my data sent to a server?

What does the download look like?

Privacy first

Related guides