How to pre-process excel date columns to iso 8601 before pandas.to_datetime()
- Step 1Export the Excel extract — Save the
.xlsx/.csvwith the date column(s) your pipeline ingests. - Step 2
- Step 3Name the pipeline's date columns — Type the exact headers
pandasparses (e.g.order_date, ship_date) so only those are normalised. Or leave blank to auto-detect. - Step 4Pick ISO 8601 — Choose
ISO 8601 (yyyy-MM-dd)— the format that lets you pass an explicitformat='%Y-%m-%d'to pandas. - Step 5Run and read with pandas — Download
dates-standardized.xlsx, thenpd.read_excel(...)andpd.to_datetime(df['order_date'], format='%Y-%m-%d'). - Step 6Handle reported failures in code — The findings report lists unparseable cells. Coerce or drop them deliberately (e.g.
errors='coerce') rather than letting pandas guess.
Why pandas struggles before standardizing
Common Excel values and what raw pandas does with them, versus after the column is normalised to ISO 8601.
| Excel value | Raw pd.to_datetime() | After ISO 8601 standardize |
|---|---|---|
12 January 2024 | Parses, but slow inference per value | 2024-01-12 -> exact %Y-%m-%d |
06/07/2024 (mixed col) | dayfirst guess may flip meaning | 2024-06-07 (month-first, consistent) |
Q3 2024 | ParserError or NaT | Left unchanged + in findings report |
2024-09-29 14:30 | Parses as datetime with time | 2024-09-29 (time dropped) |
| Mixed formats in one column | Falls back to slow object dtype / NaT | Uniform ISO -> clean datetime64[ns] |
What chrono-node does with common date strings
Behaviour verified by running the bundled chrono-node parser. The parser uses its default English settings — month-first for ambiguous slash dates — and there is no locale switch in this tool.
| Input cell value | Parses? | Result (to ISO 8601) | Note |
|---|---|---|---|
2024-09-29 | Yes | 2024-09-29 | Already ISO — passes through unchanged |
09/07/2024 | Yes | 2024-09-07 | Read as month-first (US): Sep 7, not 9 July |
13/07/2024 | Yes | 2024-07-13 | Day-first used only because 13 cannot be a month |
29 September 2024 | Yes | 2024-09-29 | Natural-language text date |
29-Sep-24 | Yes | 2024-09-29 | Two-digit year is expanded |
2024-09-29 14:30 | Yes | 2024-09-29 | Time is parsed then dropped by the date-only token |
next friday | Yes | (date relative to today) | Relative phrases resolve against the run date |
Q3 2024 | No | (left unchanged) | Listed in the findings report |
45899 (number as text) | No | (left unchanged) | A bare serial typed as text does not parse |
Tier limits for the Date Standardizer
The Date Standardizer is gated to Pro and above — a Free account cannot run it. Limits below are the real Excel-family numbers. One file is processed per run.
| Tier | Max file size | Max rows | Runs this tool? |
|---|---|---|---|
| Free | 5 MB | 10,000 | No — Pro required |
| Pro | 50 MB | 100,000 | Yes |
| Pro-media | 200 MB | 500,000 | Yes |
| Developer | 500 MB | Unlimited | Yes |
Cookbook
Standardize in the browser, then parse in Python. The snippets show the pandas call that works because the column is now uniform ISO 8601.
The deterministic parse
Once the column is ISO 8601 text, pass an explicit format. No inference, no dayfirst ambiguity, no per-value guessing.
# after running the Date Standardizer (ISO 8601 output)
import pandas as pd
df = pd.read_excel('dates-standardized.xlsx')
df['order_date'] = pd.to_datetime(df['order_date'], format='%Y-%m-%d')
# dtype: datetime64[ns], zero warningsAvoiding the ParserError that halts a load
A single Q3 2024 can raise ParserError mid-pipeline. Standardizing first leaves it as text (flagged in findings), so you decide how to handle it in code.
Before: pd.to_datetime(df['ship_date']) -> ParserError: 'Q3 2024'
After standardize, the bad value is still text:
df['ship_date'] = pd.to_datetime(df['ship_date'],
format='%Y-%m-%d',
errors='coerce') # 'Q3 2024' -> NaT, no crashNormalise only the columns you load
Name the pipeline's date fields so you don't touch numeric columns pandas reads as floats.
Columns box: order_date, ship_date
order_date, ship_date -> ISO 8601
price, qty, customer_id -> untouched
# pandas:
for c in ['order_date','ship_date']:
df[c] = pd.to_datetime(df[c], format='%Y-%m-%d')Date-only columns: time is dropped (plan for it)
If a column has timestamps you need to keep, do NOT route it through this tool — the date-only formats discard the time. Standardize date-only columns here; parse true timestamps in pandas.
Standardizer on 'event_ts' = 2024-09-29 14:30 -> output 2024-09-29 (time lost) Better: leave 'event_ts' alone, parse in pandas: df['event_ts'] = pd.to_datetime(df['event_ts']) Use the Standardizer only for date-only fields like 'order_date'.
Reconciling failed cells programmatically
Read the findings report, then assert on it in your pipeline so unparseable dates are an explicit, visible decision.
# findings.failedCells (from the tool): # Row 14, ship_date: "TBD" # Row 27, ship_date: "ask ops" bad = df['ship_date'].isna() assert bad.sum() == 2, 'unexpected unparseable dates' df = df[~bad] # or impute, per your rules
Edge cases and what actually happens
Output is text, not datetime64
ExpectedThe tool writes ISO 8601 text. pandas reads that cleanly via pd.to_datetime(col, format='%Y-%m-%d') — text is exactly what you want here.
Time component present
Time droppedOutput formats are date-only, so 2024-09-29 14:30 becomes 2024-09-29. For columns where the time matters, skip this tool and parse them directly in pandas.
Ambiguous slash dates in a mixed column
Parsed (month-first)chrono resolves 06/07/2024 month-first. This is consistent (unlike pandas' per-call dayfirst guessing) but may not match UK intent — verify day-first columns.
Value pandas would ParserError on
Failed parseQ3 2024 and similar are left as text and itemised in failedCells, so they reach pandas as plain strings you can coerce to NaT deliberately.
Bare Excel serial as text
Failed parseA serial number typed as text (45899) doesn't parse here. If it's a true Excel date cell, it's decoded correctly; if it's text, convert it in Excel first.
Relative phrases like `yesterday`
Resolves to todaychrono resolves these against the run date, which can poison a pipeline. Audit free-text columns before standardizing.
Auto-detect picks an unintended column
Over-broadIf a non-date column happens to have >=60% parseable values in its first 20 rows, auto-detect may rewrite it. Name your exact pipeline columns to avoid surprises.
Two-digit years
Expanded12-Jan-24 -> 2024. Verify the century for historical extracts before loading.
Failures beyond 50
Truncated listfailedCells lists the first 50; use failedCount for the true number when validating in code.
Extract over tier limit
RejectedFiles above your tier ceiling are refused — use Pro-media (200 MB / 500,000 rows) or Developer for big extracts.
Frequently asked questions
Why standardize before pandas instead of in pandas?
Pre-normalising to ISO 8601 lets you pass an explicit format='%Y-%m-%d', which is deterministic and fast. pandas' own inference is slower, can pick the wrong dayfirst interpretation per call, and raises ParserError on values like Q3 2024.
Does ISO 8601 work with all pandas date parsing?
Yes. yyyy-MM-dd is pandas' preferred format. pd.to_datetime(col, format='%Y-%m-%d') is unambiguous and skips inference overhead.
The output is text — is that a problem for pandas?
No. Text is exactly what pd.to_datetime expects to parse. ISO 8601 text parses cleanly into datetime64[ns].
What happens to a cell with a time, like `2024-01-12 14:30`?
The time is dropped — the six output formats are date-only. If you need the time, don't route that column through this tool; parse it directly in pandas with pd.to_datetime.
How does it handle ambiguous `06/07/2024`?
chrono reads it month-first (June 7), consistently across the column. That's more predictable than pandas' per-call dayfirst guessing, but verify if your column is UK day-first — there's no locale toggle.
What about values that can't be parsed?
They're left as text and listed in the findings report. They reach pandas as strings, so use errors='coerce' to turn them into NaT deliberately rather than crashing.
Can I normalise only the columns my pipeline reads?
Yes. Name them in the comma-separated box (e.g. order_date, ship_date). Leave it blank to auto-detect, but naming columns is safer for pipelines.
Should I run the Format Inspector first?
It helps. The Format Inspector shows which columns hold text-stored dates so you know exactly what to standardize before the pandas step.
How big an extract can I process?
100,000 rows on Pro, 500,000 on Pro-media, unlimited on Developer. The tool is Pro-gated and runs in the browser.
Is anything uploaded to a server?
No. The whole conversion runs client-side before the data ever reaches Python.
What about reproducibility — relative dates?
Avoid feeding free-text relative phrases (yesterday, next monday) through the tool; chrono resolves them against the run date, which isn't reproducible. Audit those columns first.
Can I generate the pandas loader code from the file too?
Yes — once standardized, the Python generator can emit a pandas DataFrame scaffold for the cleaned workbook to kick-start your loader.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.