How to generated inline python data vs pandas.read_excel() — choosing the right approach
- Step 1Count the rows — Over ~500 rows? Use
read_excel()— the generator only inlines the first 500 regardless of tier. Under 500 and stable? The generator is a candidate. - Step 2Ask if the data changes — If the source updates regularly, prefer
read_excel()so a re-run picks up changes. Inline data is a frozen snapshot you'd have to regenerate by hand. - Step 3Check whether the file travels with the code — Shared notebook on GitHub/Colab where the
.xlsxisn't bundled, or a reviewer who should see the data inline? Generate. File reliably on disk next to the script?read_excel(). - Step 4Decide how much you care about dtypes — Need clean numeric/date columns with no fuss?
read_excel()infers them. Fine to add oneastype/to_numericline? The generator works — it emits values as text. - Step 5If generating, pick the format —
list_of_dictsfor fixtures andpd.DataFrame(data)flexibility,dict_of_listsfor column access,pandasfor a ready DataFrame constructor. All cap at 500 rows. - Step 6Paste and cast (generator path) — Drop the snippet in, rename the
data/dfvariable, and cast numeric/boolean/date columns since they arrive as strings. If Python isn't actually the target, the same sheet also generates SQL INSERTs, an i18n JSON file, or a tRPC router.
Decision matrix
Pick the row that matches your situation. 'Generator' = inline literal from this tool; 'read_excel' = pandas.read_excel() at runtime.
| Your situation | Use | Why |
|---|---|---|
| ≤ 500 stable reference rows | Generator | Portable, no file/reader dependency |
| > 500 rows | read_excel | Generator caps inline at 500 |
| Data changes weekly | read_excel | Re-read picks up changes; inline is frozen |
| Notebook shared without the .xlsx | Generator | Data lives in the cell |
| Need int/float/datetime dtypes | read_excel | Inferred on read; generator emits text |
| Multi-sheet workbook | read_excel | Generator reads first sheet only |
| No openpyxl on the kernel | Generator (list/dict) | List/dict output needs no Excel reader |
| Test fixture maintained by QA | Generator | Regenerate from the shared sheet |
Capability comparison
Concrete differences between the two approaches, grounded in how this generator actually behaves.
| Capability | Generator (inline) | pandas.read_excel() |
|---|---|---|
| Row capacity | First 500 only | Whole file |
| Sheets | First sheet only | Any sheet via sheet_name |
| Numeric dtype | Text ("42") — cast manually | Inferred (int64/float64) |
| Dates | Formatted text ("1/15/24") | Parsed to datetime |
| File dependency | None — data is the code | Needs the file at runtime |
| openpyxl required | No (list/dict output) | Yes |
| Reflects live edits | No — snapshot | Yes — on re-read |
| Visible in diff | Yes | No (binary file) |
Format → what you paste
The generator's single format option and what each gives the consumer.
| format | Consumer gets | pandas dependency |
|---|---|---|
list_of_dicts | data = [{...}] | None (until you call pd.DataFrame) |
dict_of_lists | data = {col: [...]} | None |
pandas | df = pd.DataFrame({...}) | Yes |
Cookbook
Side-by-side recipes for the same dataset done both ways, so the trade-off is concrete.
Small lookup: inline wins
A 12-row currency table that never changes. Inlining removes the file dependency entirely.
# Generator (list_of_dicts), pasted once:
data = [
{"code": "USD", "symbol": "$"},
{"code": "EUR", "symbol": "\u20ac"}
# ... 10 more
]
# vs read_excel — needs the file + openpyxl every run:
# import pandas as pd
# df = pd.read_excel("currencies.xlsx") # FileNotFoundError if not shipped50,000-row export: read_excel wins
The generator would silently truncate to 500. read_excel reads the whole file.
# Generator (pandas) on a 50k-row file:
import pandas as pd
df = pd.DataFrame({ "id": ["1", ... 500 vals], ... })
# only 500 rows — and NO truncation comment in pandas/dict_of_lists format!
# read_excel — all 50,000 rows, real dtypes:
df = pd.read_excel("export.xlsx")
print(len(df)) # 50000Generated DataFrame needs a dtype fix
read_excel infers numerics; the generator emits text, so you add one cast line.
# Generator output:
import pandas as pd
df = pd.DataFrame({"qty": ["10", "5"], "price": ["2.50", "9.00"]})
df = df.astype({"qty": int, "price": float}) # <- needed
# read_excel output (no cast needed):
df = pd.read_excel("orders.xlsx") # qty int64, price float64 alreadyShared Colab notebook: inline wins
A teammate opens your Colab link. With read_excel they'd hit a missing-file error; inline data just runs.
# Inline (works for anyone who opens the notebook):
data = {"month": ["Jan", "Feb"], "sales": ["100", "140"]}
# read_excel (fails unless they also upload sales.xlsx):
# df = pd.read_excel("/content/sales.xlsx") # FileNotFoundErrorHybrid: fixture inline, pipeline via read_excel
Use both deliberately — tiny inline fixture for a unit test, read_excel for the real run.
# tests/fixtures.py (generated, frozen, 6 rows)
SAMPLE = [{"sku": "A1", "qty": "3"}, ...]
# app/pipeline.py (production, live, full file)
import pandas as pd
def load():
return pd.read_excel("data/inventory.xlsx")Edge cases and what actually happens
Dataset over 500 rows
TruncatedThe generator inlines only the first 500 rows (fixed constant). list_of_dicts flags it with a comment; dict_of_lists and pandas truncate silently. If completeness matters, read_excel() is the correct choice — it reads the whole file.
Data changes after you generated it
Stale snapshotInline data is frozen at generation time. A weekly-refreshed source means stale code until you re-run the generator and replace the literal. read_excel() re-reads the current file every run — pick it for living data.
You need real numeric/date dtypes
Generator limitationThe generator emits everything as text ("42", "1/15/24"), so you must cast after pasting. read_excel() infers int64/float64/datetime64 automatically. If dtype inference is the value you want, use read_excel().
Multi-sheet workbook
Generator limitationThe generator reads the first sheet only. read_excel(sheet_name=None) returns a dict of every sheet, or name a specific tab. For multi-tab models, read_excel() wins outright.
No openpyxl on the target machine
Generator advantageread_excel() raises ImportError: Missing optional dependency 'openpyxl' on a minimal image. The generator's list/dict output is plain Python and needs no Excel reader — a real edge where inline beats read_excel.
File not shipped with the notebook
Generator advantageA shared GitHub/Colab notebook without the bundled .xlsx makes read_excel() raise FileNotFoundError for every other reader. Inline data has no path dependency and just runs.
A cell with a line break (generator path)
Invalid PythonIf you choose the generator, newlines inside a cell aren't escaped and break the string literal. read_excel() handles multi-line cells natively — another point for read_excel when your data has free-text fields.
pandas not installed (generator path)
Generator advantage (list/dict)The pandas format needs pandas; the list_of_dicts/dict_of_lists formats don't. If the consumer has no pandas at all, the generator's non-pandas formats still work — read_excel() obviously requires pandas.
Free tier
RejectedThe generator is Pro-gated. On Free tier you can't produce inline code at all, so read_excel() (or upgrading) is your only path.
Sensitive data in a shared repo
Consider read_excelInline data commits the values into source control in plaintext. If the data is sensitive and the repo is shared, keeping it in a gitignored .xlsx read via read_excel() may be safer than baking it into a committed .py. If you do inline it, scrub PII from the sheet first with email/phone redaction.
Frequently asked questions
When should I always use read_excel() instead of inline data?
For files over 500 rows (the generator's inline cap), for data that changes regularly, for multi-sheet workbooks, and whenever you want pandas to infer numeric/date dtypes for you. read_excel re-reads the live file each run.
When is the generator clearly better?
Small, stable reference tables and test fixtures; notebooks shared where the .xlsx won't travel; environments with no openpyxl; and any case where you want the data visible in code review rather than hidden in a binary.
Why does the generator output numbers as strings?
It reads cells with SheetJS formatting applied, so numeric and boolean values arrive as text and are emitted quoted. read_excel, by contrast, infers real numeric and datetime dtypes. With the generator you add a cast; with read_excel you don't.
Does the generated DataFrame require pandas?
The pandas format does. The list_of_dicts and dict_of_lists formats are pure Python and need nothing extra — useful when the consumer has no pandas or no openpyxl.
What if my Excel file changes weekly?
Use read_excel() pointed at the file. Inline data is a snapshot — you'd have to re-run the generator and replace the literal each week, which is more maintenance than a live read.
Can I use both approaches in one project?
Yes, and it's common: generate a tiny inline fixture for unit tests (fast, no file, no openpyxl) and keep read_excel() for the production pipeline that processes the full, current file.
Does the generator handle multiple sheets like read_excel?
No. The generator reads only the first sheet. read_excel can load any sheet by name, or all of them with sheet_name=None. For multi-tab workbooks, read_excel wins.
How does row capacity compare?
The generator inlines at most 500 rows (a fixed constant, not a tier limit). read_excel reads the entire file. Past 500 rows, read_excel is the only complete option.
Is inline data safe to commit?
It's plaintext in your repo. For non-sensitive reference data that's fine and even desirable (visible in diffs). For sensitive values, prefer a gitignored .xlsx read via read_excel().
What about dates specifically?
The generator emits dates as formatted strings (e.g. "1/15/24"); you parse them with pd.to_datetime after pasting. read_excel parses dates into datetime64 columns automatically.
Can I generate without pandas installed locally?
Yes — generation happens in your browser via SheetJS, with no pandas involved at generation time. pandas only matters where you later run the generated code (and only for the pandas format).
Is the generator free?
No — it's Pro tier and above. read_excel() is just pandas, so it's available wherever you run Python. If you're on Free, read_excel() is your route until you upgrade.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.