How to use a regex pattern to extract substrings from an excel column
- Step 1Upload your file — Drop an
.xlsx,.xls,.ods, or.csvonto the tool. Only the first sheet is read — if your text lives on a later tab, move or copy it to the first sheet before uploading. - Step 2Type the exact source column header — Enter the column header exactly as it appears in row 1, including capitalisation and spaces. Matching is case-sensitive:
Descriptionanddescriptionare different columns, and a typo silently produces an all-empty output column rather than an error. - Step 3Write the regex pattern — Enter a JavaScript regular expression. Wrap the part you want to keep in parentheses to use it as the capture group:
Ref: (\w+)keeps just the ref, not theRef:prefix. With no parentheses, the whole match is returned. - Step 4Name the output column — The default is
extracted. Pick a name that does not already exist in the sheet — reusing an existing header overwrites that column's values for every row. - Step 5Run the extraction (Pro tier) — Click Run. The tool requires the Pro plan or higher; on the Free plan it stops with
Regex Extractor requires Pro tier.Pro processes up to 100,000 rows / 50 MB per file. - Step 6Check the change count and download — The panel shows rows in, rows out, and the change count. If the change count is 0 or far below your row total, your pattern is too strict or the column name is wrong — fix and re-run before downloading
regex-extracted.xlsx.
The three options and exactly what they do
These are the only controls the tool exposes — there are no flags toggles, no multi-pattern rows, and no preset library. Defaults shown are the real defaults from the option schema.
| Option | Type | Default | Behaviour |
|---|---|---|---|
sourceColumn | text (required) | — | Header of the column scanned. Matched case-sensitively against row 1. Wrong name → empty output, no error. |
pattern | text (required) | — | A JavaScript regex. First capture group is extracted, else whole match. Compiled with no flags (case-sensitive, single match per cell). |
outputColumn | text | extracted | Name of the new column written to the right. If it matches an existing header, that column is overwritten. |
What the engine returns per cell
The extraction rule is match ? (match[1] ?? match[0]) : "". Worked through for common pattern shapes against the source value Order ORD-2024-00193 shipped.
| Pattern | Has capture group? | Extracted value | Why |
|---|---|---|---|
ORD-\d{4}-\d{5} | No | ORD-2024-00193 | Whole match used (match[0]) |
(ORD-\d{4}-\d{5}) | Yes | ORD-2024-00193 | Capture group 1 used (match[1]) |
ORD-(\d{4})-\d{5} | Yes | 2024 | Only group 1 is read — later groups are ignored |
\d{4} | No | 2024 | First match only — 00193 is never reached |
INV-\d+ | No | `` (empty) | No match in the cell → empty string written |
Tier limits for the Regex Extractor
The tool is gated to Pro and above. Free-plan users hit the tier error before any file is read. Numbers are per-file.
| Plan | Available? | Max file size | Max rows |
|---|---|---|---|
| Free | No — blocked | 5 MB | 10,000 |
| Pro | Yes | 50 MB | 100,000 |
| Pro + Media | Yes | 200 MB | 500,000 |
| Developer | Yes | 500 MB | Unlimited |
Cookbook
Five recipes for the most common substring-extraction jobs. Each shows the source value, the pattern you type, and the value that lands in the new column. Remember: parentheses define what you keep, and only the first match per cell is returned.
Keep just the bracketed code
Pull the value inside square brackets out of a noisy status string. The capture group keeps the inner text only.
Source cell: "Status [ACTIVE] since 2023" Pattern: \[(\w+)\] Output: ACTIVE
Extract a date in any position
LEFT/MID can't find a date that floats around in the text. A regex with no capture group returns the whole match wherever it appears.
Source cell: "renewed on 2024-11-08 (auto)"
Pattern: \d{4}-\d{2}-\d{2}
Output: 2024-11-08Grab the domain after the @
Use a capture group so the @ is matched but not kept. Only the first email in the cell is processed.
Source cell: "reply to jane@example.co.uk please" Pattern: @([\w.\-]+) Output: example.co.uk
Pull a numeric quantity out of free text
Because cells are read as their displayed text (not raw numbers), a quantity buried in prose is matched as a string.
Source cell: "shipped 12 units of 48" Pattern: shipped (\d+) Output: 12
Empty result when the pattern misses
A cell with no match writes an empty string — useful for flagging rows that don't fit the expected shape so you can filter them in Excel.
Source cell: "no reference on file" Pattern: (ORD-\d+) Output: (blank)
Edge cases and what actually happens
Free plan
Pro requiredThe processor checks the tier before reading the file and throws Regex Extractor requires Pro tier. on Free. Upgrade to Pro to use the tool.
Source column header typo
Empty columnIf sourceColumn doesn't match a header exactly (case included), every cell reads as an empty string, every match fails, and the new column is entirely blank — with no error. Always copy the header verbatim.
Invalid regex syntax
ErrorThe pattern is compiled with new RegExp(...). An unbalanced bracket like ( or [a- throws a SyntaxError, which surfaces as a processing error. Validate your pattern (e.g. on regex101.com) first.
Two matches in one cell
By designThe global flag is stripped internally, so only the first match per cell is returned. A cell with two order IDs yields just the first.
Output column name already exists
OverwritesNaming the output column the same as an existing header overwrites that column's values for all rows. Use a fresh name like extracted to keep the original.
Data on a second sheet
IgnoredOnly the first sheet (index 0) is read. Move the target data to the first tab before uploading or the tool sees no rows.
Case mismatch in the pattern
No matchThe regex has no i flag. (ABC) will not match abc. Either type the correct case or build case-insensitivity into the pattern with [Aa][Bb][Cc] or an inline (?i).
Number / date source column
Matched as textCells are read formatted as strings (not raw values), so a date shows as 2024-11-08 and a number as its displayed form. Patterns run against that text — handy, but be aware locale formatting affects what you match.
Empty source cells
PreservedBlank cells become empty strings, never match, and write an empty extraction — they don't error or get dropped.
All capture groups beyond the first
IgnoredOnly match[1] is read. A pattern like (\d+)-(\d+) returns only the first group; the second is silently discarded. Run a second pass or restructure the pattern if you need both.
Frequently asked questions
What does the tool actually extract from each cell?
The first capture group if your pattern has parentheses (match[1]), otherwise the whole match (match[0]). Non-matching cells get an empty string.
Does it support named capture groups?
No. The engine reads match[1] (the first positional group) or the whole match — named groups like (?<name>...) are not mapped to columns. Use a plain positional group (...).
Can I extract every match in a cell, not just the first?
No. The global flag is removed internally, so exactly one match per cell is returned. To capture more you'd need to restructure your pattern or split the cell first.
Can I run multiple regex patterns in one pass?
No — one pattern per run, writing one output column. Run the tool again on the result file to add a second extracted column.
Is the regex case-sensitive?
Yes. The pattern is compiled with no flags, so (Active) won't match active. Build case-handling into the pattern with character classes or an inline (?i) modifier.
Does it modify my source column?
No. The original column is untouched; extracted values go into a new column appended to the right of the sheet.
What output format do I get?
A single-sheet .xlsx file named regex-extracted.xlsx, regardless of whether you uploaded XLSX or CSV. It's a binary download generated in your browser.
Why is my whole output column empty?
Almost always the source column name doesn't match the header exactly (case-sensitive), or the pattern matches nothing. Check the change count — 0 means no cell matched.
Does any data get uploaded?
No. Parsing and matching run entirely in your browser with SheetJS; the file bytes never reach a server.
Which file types can I upload?
.xlsx, .xls, .ods, and .csv are accepted. Only the first sheet of multi-sheet workbooks is processed.
How many rows can I process?
Pro handles 100,000 rows / 50 MB, Pro + Media 500,000 / 200 MB, Developer is unlimited up to 500 MB. The tool is not available on Free.
What if I just want to clean stray whitespace, not extract?
Use the whitespace trimmer for spreadsheets. For splitting one column into many sheets by value, see the conditional splitter; to reshape wide tables to long, see unpivot.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.