Extract Text from Excel Column Using Regex — Free Pattern Extractor

How to use a regex pattern to extract substrings from an excel column

Step 1
Upload your file — Drop an .xlsx, .xls, .ods, or .csv onto the tool. Only the first sheet is read — if your text lives on a later tab, move or copy it to the first sheet before uploading.
Step 2
Type the exact source column header — Enter the column header exactly as it appears in row 1, including capitalisation and spaces. Matching is case-sensitive: Description and description are different columns, and a typo silently produces an all-empty output column rather than an error.
Step 3
Write the regex pattern — Enter a JavaScript regular expression. Wrap the part you want to keep in parentheses to use it as the capture group: Ref: (\w+) keeps just the ref, not the Ref: prefix. With no parentheses, the whole match is returned.
Step 4
Name the output column — The default is extracted. Pick a name that does not already exist in the sheet — reusing an existing header overwrites that column's values for every row.
Step 5
Run the extraction (Pro tier) — Click Run. The tool requires the Pro plan or higher; on the Free plan it stops with Regex Extractor requires Pro tier. Pro processes up to 100,000 rows / 50 MB per file.
Step 6
Check the change count and download — The panel shows rows in, rows out, and the change count. If the change count is 0 or far below your row total, your pattern is too strict or the column name is wrong — fix and re-run before downloading regex-extracted.xlsx.

The three options and exactly what they do

These are the only controls the tool exposes — there are no flags toggles, no multi-pattern rows, and no preset library. Defaults shown are the real defaults from the option schema.

Option	Type	Default	Behaviour
`sourceColumn`	text (required)	—	Header of the column scanned. Matched case-sensitively against row 1. Wrong name → empty output, no error.
`pattern`	text (required)	—	A JavaScript regex. First capture group is extracted, else whole match. Compiled with no flags (case-sensitive, single match per cell).
`outputColumn`	text	`extracted`	Name of the new column written to the right. If it matches an existing header, that column is overwritten.

What the engine returns per cell

The extraction rule is match ? (match[1] ?? match[0]) : "". Worked through for common pattern shapes against the source value Order ORD-2024-00193 shipped.

Pattern	Has capture group?	Extracted value	Why
`ORD-\d{4}-\d{5}`	No	`ORD-2024-00193`	Whole match used (`match[0]`)
`(ORD-\d{4}-\d{5})`	Yes	`ORD-2024-00193`	Capture group 1 used (`match[1]`)
`ORD-(\d{4})-\d{5}`	Yes	`2024`	Only group 1 is read — later groups are ignored
`\d{4}`	No	`2024`	First match only — `00193` is never reached
`INV-\d+`	No	`` (empty)	No match in the cell → empty string written

Tier limits for the Regex Extractor

The tool is gated to Pro and above. Free-plan users hit the tier error before any file is read. Numbers are per-file.

Plan	Available?	Max file size	Max rows
Free	No — blocked	5 MB	10,000
Pro	Yes	50 MB	100,000
Pro + Media	Yes	200 MB	500,000
Developer	Yes	500 MB	Unlimited

Cookbook

Five recipes for the most common substring-extraction jobs. Each shows the source value, the pattern you type, and the value that lands in the new column. Remember: parentheses define what you keep, and only the first match per cell is returned.

Keep just the bracketed code

Pull the value inside square brackets out of a noisy status string. The capture group keeps the inner text only.

Source cell: "Status [ACTIVE] since 2023"
Pattern:     \[(\w+)\]
Output:      ACTIVE

Extract a date in any position

LEFT/MID can't find a date that floats around in the text. A regex with no capture group returns the whole match wherever it appears.

Source cell: "renewed on 2024-11-08 (auto)"
Pattern:     \d{4}-\d{2}-\d{2}
Output:      2024-11-08

Grab the domain after the @

Use a capture group so the @ is matched but not kept. Only the first email in the cell is processed.

Source cell: "reply to jane@example.co.uk please"
Pattern:     @([\w.\-]+)
Output:      example.co.uk

Pull a numeric quantity out of free text

Because cells are read as their displayed text (not raw numbers), a quantity buried in prose is matched as a string.

Source cell: "shipped 12 units of 48"
Pattern:     shipped (\d+)
Output:      12

Empty result when the pattern misses

A cell with no match writes an empty string — useful for flagging rows that don't fit the expected shape so you can filter them in Excel.

Source cell: "no reference on file"
Pattern:     (ORD-\d+)
Output:      (blank)

Edge cases and what actually happens

Free plan

Pro required

The processor checks the tier before reading the file and throws Regex Extractor requires Pro tier. on Free. Upgrade to Pro to use the tool.

Source column header typo

Empty column

If sourceColumn doesn't match a header exactly (case included), every cell reads as an empty string, every match fails, and the new column is entirely blank — with no error. Always copy the header verbatim.

Invalid regex syntax

Error

The pattern is compiled with new RegExp(...). An unbalanced bracket like ( or [a- throws a SyntaxError, which surfaces as a processing error. Validate your pattern (e.g. on regex101.com) first.

Two matches in one cell

By design

The global flag is stripped internally, so only the first match per cell is returned. A cell with two order IDs yields just the first.

Output column name already exists

Overwrites

Naming the output column the same as an existing header overwrites that column's values for all rows. Use a fresh name like extracted to keep the original.

Data on a second sheet

Ignored

Only the first sheet (index 0) is read. Move the target data to the first tab before uploading or the tool sees no rows.

Case mismatch in the pattern

No match

The regex has no i flag. (ABC) will not match abc. Either type the correct case or build case-insensitivity into the pattern with [Aa][Bb][Cc] or an inline (?i).

Number / date source column

Matched as text

Cells are read formatted as strings (not raw values), so a date shows as 2024-11-08 and a number as its displayed form. Patterns run against that text — handy, but be aware locale formatting affects what you match.

Empty source cells

Preserved

Blank cells become empty strings, never match, and write an empty extraction — they don't error or get dropped.

All capture groups beyond the first

Ignored

Only match[1] is read. A pattern like (\d+)-(\d+) returns only the first group; the second is silently discarded. Run a second pass or restructure the pattern if you need both.

Frequently asked questions

What does the tool actually extract from each cell?

The first capture group if your pattern has parentheses (match[1]), otherwise the whole match (match[0]). Non-matching cells get an empty string.

Does it support named capture groups?

No. The engine reads match[1] (the first positional group) or the whole match — named groups like (?<name>...) are not mapped to columns. Use a plain positional group (...).

Can I extract every match in a cell, not just the first?

No. The global flag is removed internally, so exactly one match per cell is returned. To capture more you'd need to restructure your pattern or split the cell first.

Can I run multiple regex patterns in one pass?

No — one pattern per run, writing one output column. Run the tool again on the result file to add a second extracted column.

Is the regex case-sensitive?

Yes. The pattern is compiled with no flags, so (Active) won't match active. Build case-handling into the pattern with character classes or an inline (?i) modifier.

Does it modify my source column?

No. The original column is untouched; extracted values go into a new column appended to the right of the sheet.

What output format do I get?

A single-sheet .xlsx file named regex-extracted.xlsx, regardless of whether you uploaded XLSX or CSV. It's a binary download generated in your browser.

Why is my whole output column empty?

Almost always the source column name doesn't match the header exactly (case-sensitive), or the pattern matches nothing. Check the change count — 0 means no cell matched.

Does any data get uploaded?

No. Parsing and matching run entirely in your browser with SheetJS; the file bytes never reach a server.

Which file types can I upload?

.xlsx, .xls, .ods, and .csv are accepted. Only the first sheet of multi-sheet workbooks is processed.

How many rows can I process?

Pro handles 100,000 rows / 50 MB, Pro + Media 500,000 / 200 MB, Developer is unlimited up to 500 MB. The tool is not available on Free.

What if I just want to clean stray whitespace, not extract?

Use the whitespace trimmer for spreadsheets. For splitting one column into many sheets by value, see the conditional splitter; to reshape wide tables to long, see unpivot.

Privacy first

Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.

How to use a regex pattern to extract substrings from an excel column

Step 1
Upload your file — Drop an .xlsx, .xls, .ods, or .csv onto the tool. Only the first sheet is read — if your text lives on a later tab, move or copy it to the first sheet before uploading.
Step 2
Type the exact source column header — Enter the column header exactly as it appears in row 1, including capitalisation and spaces. Matching is case-sensitive: Description and description are different columns, and a typo silently produces an all-empty output column rather than an error.
Step 3
Write the regex pattern — Enter a JavaScript regular expression. Wrap the part you want to keep in parentheses to use it as the capture group: Ref: (\w+) keeps just the ref, not the Ref: prefix. With no parentheses, the whole match is returned.
Step 4
Name the output column — The default is extracted. Pick a name that does not already exist in the sheet — reusing an existing header overwrites that column's values for every row.
Step 5
Run the extraction (Pro tier) — Click Run. The tool requires the Pro plan or higher; on the Free plan it stops with Regex Extractor requires Pro tier. Pro processes up to 100,000 rows / 50 MB per file.
Step 6
Check the change count and download — The panel shows rows in, rows out, and the change count. If the change count is 0 or far below your row total, your pattern is too strict or the column name is wrong — fix and re-run before downloading regex-extracted.xlsx.

The three options and exactly what they do

These are the only controls the tool exposes — there are no flags toggles, no multi-pattern rows, and no preset library. Defaults shown are the real defaults from the option schema.

Option	Type	Default	Behaviour
`sourceColumn`	text (required)	—	Header of the column scanned. Matched case-sensitively against row 1. Wrong name → empty output, no error.
`pattern`	text (required)	—	A JavaScript regex. First capture group is extracted, else whole match. Compiled with no flags (case-sensitive, single match per cell).
`outputColumn`	text	`extracted`	Name of the new column written to the right. If it matches an existing header, that column is overwritten.

What the engine returns per cell

The extraction rule is match ? (match[1] ?? match[0]) : "". Worked through for common pattern shapes against the source value Order ORD-2024-00193 shipped.

Pattern	Has capture group?	Extracted value	Why
`ORD-\d{4}-\d{5}`	No	`ORD-2024-00193`	Whole match used (`match[0]`)
`(ORD-\d{4}-\d{5})`	Yes	`ORD-2024-00193`	Capture group 1 used (`match[1]`)
`ORD-(\d{4})-\d{5}`	Yes	`2024`	Only group 1 is read — later groups are ignored
`\d{4}`	No	`2024`	First match only — `00193` is never reached
`INV-\d+`	No	`` (empty)	No match in the cell → empty string written

Tier limits for the Regex Extractor

The tool is gated to Pro and above. Free-plan users hit the tier error before any file is read. Numbers are per-file.

Plan	Available?	Max file size	Max rows
Free	No — blocked	5 MB	10,000
Pro	Yes	50 MB	100,000
Pro + Media	Yes	200 MB	500,000
Developer	Yes	500 MB	Unlimited

Cookbook

Keep just the bracketed code

Pull the value inside square brackets out of a noisy status string. The capture group keeps the inner text only.

Source cell: "Status [ACTIVE] since 2023"
Pattern:     \[(\w+)\]
Output:      ACTIVE

Extract a date in any position

LEFT/MID can't find a date that floats around in the text. A regex with no capture group returns the whole match wherever it appears.

Source cell: "renewed on 2024-11-08 (auto)"
Pattern:     \d{4}-\d{2}-\d{2}
Output:      2024-11-08

Grab the domain after the @

Use a capture group so the @ is matched but not kept. Only the first email in the cell is processed.

Source cell: "reply to jane@example.co.uk please"
Pattern:     @([\w.\-]+)
Output:      example.co.uk

Pull a numeric quantity out of free text

Because cells are read as their displayed text (not raw numbers), a quantity buried in prose is matched as a string.

Source cell: "shipped 12 units of 48"
Pattern:     shipped (\d+)
Output:      12

Empty result when the pattern misses

A cell with no match writes an empty string — useful for flagging rows that don't fit the expected shape so you can filter them in Excel.

Source cell: "no reference on file"
Pattern:     (ORD-\d+)
Output:      (blank)

Edge cases and what actually happens

Free plan

Pro required

The processor checks the tier before reading the file and throws Regex Extractor requires Pro tier. on Free. Upgrade to Pro to use the tool.

Source column header typo

Empty column

Invalid regex syntax

Error

Two matches in one cell

By design

The global flag is stripped internally, so only the first match per cell is returned. A cell with two order IDs yields just the first.

Output column name already exists

Overwrites

Naming the output column the same as an existing header overwrites that column's values for all rows. Use a fresh name like extracted to keep the original.

Data on a second sheet

Ignored

Only the first sheet (index 0) is read. Move the target data to the first tab before uploading or the tool sees no rows.

Case mismatch in the pattern

No match

The regex has no i flag. (ABC) will not match abc. Either type the correct case or build case-insensitivity into the pattern with [Aa][Bb][Cc] or an inline (?i).

Number / date source column

Matched as text

Empty source cells

Preserved

Blank cells become empty strings, never match, and write an empty extraction — they don't error or get dropped.

All capture groups beyond the first

Ignored

Only match[1] is read. A pattern like (\d+)-(\d+) returns only the first group; the second is silently discarded. Run a second pass or restructure the pattern if you need both.

Frequently asked questions

What does the tool actually extract from each cell?

The first capture group if your pattern has parentheses (match[1]), otherwise the whole match (match[0]). Non-matching cells get an empty string.

Does it support named capture groups?

No. The engine reads match[1] (the first positional group) or the whole match — named groups like (?<name>...) are not mapped to columns. Use a plain positional group (...).

Can I extract every match in a cell, not just the first?

No. The global flag is removed internally, so exactly one match per cell is returned. To capture more you'd need to restructure your pattern or split the cell first.

Can I run multiple regex patterns in one pass?

No — one pattern per run, writing one output column. Run the tool again on the result file to add a second extracted column.

Is the regex case-sensitive?

Yes. The pattern is compiled with no flags, so (Active) won't match active. Build case-handling into the pattern with character classes or an inline (?i) modifier.

Does it modify my source column?

No. The original column is untouched; extracted values go into a new column appended to the right of the sheet.

What output format do I get?

A single-sheet .xlsx file named regex-extracted.xlsx, regardless of whether you uploaded XLSX or CSV. It's a binary download generated in your browser.

Why is my whole output column empty?

Almost always the source column name doesn't match the header exactly (case-sensitive), or the pattern matches nothing. Check the change count — 0 means no cell matched.

Does any data get uploaded?

No. Parsing and matching run entirely in your browser with SheetJS; the file bytes never reach a server.

Which file types can I upload?

.xlsx, .xls, .ods, and .csv are accepted. Only the first sheet of multi-sheet workbooks is processed.

How many rows can I process?

Pro handles 100,000 rows / 50 MB, Pro + Media 500,000 / 200 MB, Developer is unlimited up to 500 MB. The tool is not available on Free.

What if I just want to clean stray whitespace, not extract?

Use the whitespace trimmer for spreadsheets. For splitting one column into many sheets by value, see the conditional splitter; to reshape wide tables to long, see unpivot.

Use a Regex Pattern to Extract Substrings from an Excel Column

How to use a regex pattern to extract substrings from an excel column

The three options and exactly what they do

What the engine returns per cell

Tier limits for the Regex Extractor

Cookbook

Keep just the bracketed code

Extract a date in any position

Grab the domain after the @

Pull a numeric quantity out of free text

Empty result when the pattern misses

Edge cases and what actually happens

Free plan

Source column header typo

Invalid regex syntax

Two matches in one cell

Output column name already exists

Data on a second sheet

Case mismatch in the pattern

Number / date source column

Empty source cells

All capture groups beyond the first

Frequently asked questions

What does the tool actually extract from each cell?

Does it support named capture groups?

Can I extract every match in a cell, not just the first?

Can I run multiple regex patterns in one pass?

Is the regex case-sensitive?

Does it modify my source column?

What output format do I get?

Why is my whole output column empty?

Does any data get uploaded?

Which file types can I upload?

How many rows can I process?

What if I just want to clean stray whitespace, not extract?

Privacy first

Related guides

Use a Regex Pattern to Extract Substrings from an Excel Column

How to use a regex pattern to extract substrings from an excel column

The three options and exactly what they do

What the engine returns per cell

Tier limits for the Regex Extractor

Cookbook

Keep just the bracketed code

Extract a date in any position

Grab the domain after the @

Pull a numeric quantity out of free text

Empty result when the pattern misses

Edge cases and what actually happens

Free plan

Source column header typo

Invalid regex syntax

Two matches in one cell

Output column name already exists

Data on a second sheet

Case mismatch in the pattern

Number / date source column

Empty source cells

All capture groups beyond the first

Frequently asked questions

What does the tool actually extract from each cell?

Does it support named capture groups?

Can I extract every match in a cell, not just the first?

Can I run multiple regex patterns in one pass?

Is the regex case-sensitive?

Does it modify my source column?

What output format do I get?

Why is my whole output column empty?

Does any data get uploaded?

Which file types can I upload?

How many rows can I process?

What if I just want to clean stray whitespace, not extract?

Privacy first

Related guides