How to use regex to extract email addresses embedded in excel text cells
- Step 1Upload the contact dataset — Drop the
.xlsxor.csvwhose first sheet holds the text column with embedded addresses. Only the first sheet is read. - Step 2Enter the source column header exactly — Type the header as it appears in row 1, with the same capitalisation —
Notes,Contact Info,Description. A mismatch silently yields an all-empty column. - Step 3Provide the email regex — Use a pattern with a capture group around the address, e.g.
([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}). The parentheses ensure only the address is kept. - Step 4Name the output column — Use a fresh name like
email(the schema default isextracted). Don't reuse an existing header or you'll overwrite it. - Step 5Run on Pro and read the change count — The tool needs the Pro plan or higher. After running, the change count tells you how many rows yielded an address — a low number usually means the wrong source column.
- Step 6Download and finish cleaning — Download
regex-extracted.xlsx. For final hygiene, deduplicate the new email column and trim any inherited whitespace before importing into your platform.
Email extraction option contract
The tool exposes exactly three text inputs — there is no built-in email validator, no 'extract all emails' mode, and no default pattern is pre-seeded by the engine (you type the pattern).
| Option | Required? | Recommended value | Note |
|---|---|---|---|
sourceColumn | Yes | The free-text field, e.g. Notes | Case-sensitive header match |
pattern | Yes | ([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}) | Capture group keeps the address only |
outputColumn | No | email | Defaults to extracted; avoid reusing a header |
How common email-bearing cells extract
Using the recommended pattern above. The engine returns the first capture group and the first match per cell only.
| Source cell | Extracted value | Why |
|---|---|---|
Jane Smith (jane@example.com) | jane@example.com | Address captured, name and brackets dropped |
"Ops" <ops@vendor.co.uk> | ops@vendor.co.uk | Angle brackets are not part of the group |
a@x.com, b@y.com | a@x.com | First match only — second address ignored |
mailto:hi@site.io | hi@site.io | Prefix not matched by the address group |
no email listed | `` (empty) | No match → empty string |
Plan availability and capacity
Email extraction is a Pro-tier feature. Capacity is per uploaded file.
| Plan | Can run? | File size | Rows |
|---|---|---|---|
| Free | No | 5 MB | 10,000 |
| Pro | Yes | 50 MB | 100,000 |
| Pro + Media | Yes | 200 MB | 500,000 |
| Developer | Yes | 500 MB | Unlimited |
Cookbook
Patterns for the email shapes you'll actually meet in CRM and ticket exports. Each is written with a capture group so only the address lands in your new column. Test against a few sample cells first — and remember, one address per cell.
Standard embedded address
The everyday case: an address inside a sentence. The capture group strips the prose.
Source cell: "Please contact jane.doe@example.com today"
Pattern: ([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,})
Output: jane.doe@example.comAddress inside angle brackets
Header-style "Name" <addr> strings from forwarded mail. The brackets aren't in the group, so they're dropped.
Source cell: "\"Support\" <help@acme.io>" Pattern: <([^>@\s]+@[^>\s]+)> Output: help@acme.io
Two addresses, take the first
When a cell lists several addresses, only the first is returned. Plan a second pass or a split if you need them all.
Source cell: "primary a@x.com; cc b@y.com"
Pattern: ([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,})
Output: a@x.comStrip a mailto: prefix
Scraped HTML often leaves mailto:. Match it outside the group so only the address is kept.
Source cell: "mailto:owner@store.shop?subject=Hi" Pattern: mailto:([^?\s]+) Output: owner@store.shop
Flag rows with no address
Cells without an email return empty — sort or filter on the new column to isolate contacts you still need to chase.
Source cell: "phone only - 0207 946 0018"
Pattern: ([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,})
Output: (blank)Edge cases and what actually happens
Free plan
Pro requiredThe extractor is gated; Free users get Regex Extractor requires Pro tier. before the file is read.
Cell with multiple emails
First onlyOnly the first address per cell is returned. A to; cc cell loses the cc. Restructure the pattern or split the column first if you need both.
Uppercase address vs lowercase pattern
May missMatching is case-sensitive. A TLD class of [a-z] won't match JANE@EXAMPLE.COM. Use [A-Za-z] ranges to be safe.
Source column header typo
Empty columnA header mismatch (case included) reads every cell as empty, so no address is found and the column is blank — with no error.
Invalid regex
ErrorAn unbalanced group or class throws a SyntaxError surfaced as a processing error. Validate the pattern before running.
Output column named like an existing field
OverwritesReusing a header overwrites that column. Use a new name such as email.
Addresses on a second worksheet
IgnoredOnly the first sheet is processed; data on later tabs is invisible to the tool.
Trailing punctuation captured
Watch the patternA greedy local part can pull in trailing dots or commas (addr.,). Anchor the TLD with \.[A-Za-z]{2,} and avoid trailing .* to keep the address clean.
No address in the cell
PreservedNon-matching cells write an empty string and are kept as rows — nothing is dropped.
Whitespace around the extracted value
Run a cleanupThe extractor returns exactly what the group matched; if you want a final trim/dedup pass, send the output to the whitespace trimmer and deduplicator.
Frequently asked questions
What email regex should I use?
A solid general pattern is ([A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}). The parentheses make it the capture group so only the address is kept. No pattern is pre-filled — you type it.
Does it validate that the address is real?
No. It extracts whatever matches your regex. It does not check deliverability, MX records, or syntax beyond your pattern.
Can it pull every email out of a cell?
No — only the first match per cell. The global flag is stripped internally. For multiple addresses, split the cell first or run additional passes.
Why did it grab the name too?
Your capture group is too wide. Tighten it to the address shape ([^\s<>]+@...) so surrounding text isn't included.
Is matching case-sensitive?
Yes. Use [A-Za-z] ranges rather than [a-z] so all-caps addresses still match, or add an inline (?i) modifier.
Does it change my original notes column?
No. The source column is untouched; the address goes into a new column you name (default extracted).
What format is the output?
A single-sheet .xlsx named regex-extracted.xlsx, built in your browser whether you uploaded XLSX or CSV.
Can I extract phone numbers with the same tool?
Yes — swap the email pattern for a phone pattern and re-run. See the postcode and phone extraction guide for tested phone regexes.
Is my contact data uploaded anywhere?
No. SheetJS parses and matches entirely in the browser; addresses never reach a server.
Which file types are accepted?
.xlsx, .xls, .ods, and .csv. Only the first sheet is read.
What's the row limit?
Pro 100,000 rows, Pro + Media 500,000, Developer unlimited (up to 500 MB). The tool isn't available on Free.
How do I dedupe the extracted emails?
Run the output file through the CSV deduplicator for exact dedup, or the fuzzy dedup tool for near-duplicate contacts.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.