How to pull order ids and product codes from messy excel text columns with regex
- Step 1Spot the consistent shape of your code — Look at a few rows and write the literal skeleton — e.g.
ORD-then four digits, a dash, five digits →ORD-\d{4}-\d{5}. SKUs likeXB-4401-RED→[A-Z]{2}-\d{4}-[A-Z]+. - Step 2Upload the file — Drop the
.xlsxor.csvwhose first sheet contains the status/text column. Later sheets are ignored. - Step 3Enter the source column and capture pattern — Type the source header exactly (case-sensitive), then the regex with the code wrapped in parentheses:
(ORD-\d{4}-\d{5}). - Step 4Name the output column — Use
order_id,sku, ortracking— anything other than an existing header, which would be overwritten. Default isextracted. - Step 5Run on Pro and verify the match rate — The tool needs Pro or higher. Compare the change count to your row total — a gap means some rows use a different code format.
- Step 6Download and reconcile — Download
regex-extracted.xlsx. Join the clean code column against your ERP/order master to reconcile, or split by status for fulfilment teams.
Patterns for common code formats
Wrap the identifier in a capture group so only the code is kept. The whole match is used if you omit the parentheses.
| Code type | Example value | Pattern (with capture group) |
|---|---|---|
| Dashed order ref | ORD-2024-00193 | (ORD-\d{4}-\d{5}) |
| SKU with colour | XB-4401-RED | ([A-Z]{2}-\d{4}-[A-Z]+) |
| UPS tracking | 1Z999AA10123456784 | (1Z[0-9A-Z]{16}) |
| Invoice no. | INV2024/0042 | (INV\d{4}/\d+) |
| Optional dashes | ORD1234 or ORD-1234 | (ORD-?\d{4}) |
What gets returned per cell
The rule is first capture group, else whole match; first match only; empty string on no match.
| Source cell | Pattern | Output |
|---|---|---|
Order #ORD-2024-00193 received | (ORD-\d{4}-\d{5}) | ORD-2024-00193 |
SKU: XB-4401-RED confirmed | ([A-Z]{2}-\d{4}-[A-Z]+) | XB-4401-RED |
refs ORD-1 and ORD-2 | (ORD-\d+) | ORD-1 (first only) |
ord-2024-00193 | (ORD-\d{4}-\d{5}) | `` (empty — case mismatch) |
pending, no ref yet | (ORD-\d+) | `` (empty) |
Tier limits
Pro-tier feature. Per-file limits below.
| Plan | Runs? | Max size | Max rows |
|---|---|---|---|
| Free | No | 5 MB | 10,000 |
| Pro | Yes | 50 MB | 100,000 |
| Pro + Media | Yes | 200 MB | 500,000 |
| Developer | Yes | 500 MB | Unlimited |
Cookbook
Recipes for the order, SKU, and tracking formats that turn up in support queues and logistics feeds. Build the pattern from a couple of real rows, wrap the code in parentheses, and confirm the change count before you trust the column.
Dashed order reference
The classic Order #... line. The capture group drops everything but the ID.
Source cell: "Order #ORD-2024-00193 received"
Pattern: (ORD-\d{4}-\d{5})
Output: ORD-2024-00193SKU with a variable suffix
Colour/size suffixes vary in length — a regex handles that where MID can't.
Source cell: "SKU: XB-4401-RED confirmed"
Pattern: ([A-Z]{2}-\d{4}-[A-Z]+)
Output: XB-4401-REDCodes with optional dashes
Some systems write ORD1234, others ORD-1234. Make the dash optional with ?.
Source cell: "ref ORD1234 raised"
Pattern: (ORD-?\d{4})
Output: ORD1234Tracking number among words
Anchor on the carrier prefix so the right token is captured regardless of position.
Source cell: "delivered via 1Z999AA10123456784 today"
Pattern: (1Z[0-9A-Z]{16})
Output: 1Z999AA10123456784Spot the un-extractable rows
Rows missing a code come back empty so you can filter and chase them in Excel.
Source cell: "awaiting reference from supplier"
Pattern: (ORD-\d{4}-\d{5})
Output: (blank)Edge cases and what actually happens
Free plan
Pro requiredThe extractor throws Regex Extractor requires Pro tier. on Free before reading the file.
Lower-case code, upper-case pattern
No matchMatching is case-sensitive: (ORD-\d+) misses ord-123. Add an inline (?i) or include both cases in a class if your codes vary.
Two codes in one cell
First onlyOnly the first match per cell is returned — a row mentioning two refs yields the first. Restructure the pattern or split the cell if you need both.
Wrong source column name
Empty columnA header typo (case-sensitive) makes every cell read empty, so no code is found and the new column is blank, silently.
Invalid regex
ErrorUnbalanced parentheses/brackets throw a SyntaxError shown as a processing error. Test the pattern first.
Leading zeros / numeric codes
Matched as textCells are read as displayed text, so a numeric code keeps its leading zeros in the regex — but if your sheet stored it as a number, Excel may have already dropped them before upload.
Output column collides with a header
OverwritesNaming the output the same as an existing column overwrites it. Use order_id or sku instead.
Codes spread across worksheets
IgnoredOnly the first sheet is processed. Consolidate to sheet 1 before uploading.
Greedy pattern over-captures
Watch the patternA pattern ending in .* or \d+ against ORD-1-EXTRA may grab too much. Anchor lengths with {n} quantifiers to bound the capture.
Empty source cells
PreservedBlank cells write an empty extraction and stay as rows — nothing is dropped or errored.
Frequently asked questions
How do I extract an order ID like ORD-2024-00193?
Use (ORD-\d{4}-\d{5}). The parentheses capture just the ID; the surrounding words in the cell are ignored.
What if some codes have dashes and some don't?
Make the dash optional: (ORD-?\d{4}) matches both ORD-1234 and ORD1234.
Can I extract two different codes (order + tracking) at once?
No — one pattern per run, one output column. Run the tool twice: once for the order ID, once for the tracking number, on the result file.
Why are some rows blank in my output?
Those cells didn't match the pattern — often a different code format or a case mismatch. The change count shows how many rows did match.
Is the match case-sensitive?
Yes. (ORD...) won't match ord.... Use (?i) inline or include both cases in a character class.
Does it keep the original status text?
Yes. The source column is unchanged; the code goes into a new column (default extracted).
Can I test my regex before running on the whole file?
Yes — try it on a few sample cell values at regex101.com, then paste the pattern in. The tool applies the same pattern to every row.
What does the output look like?
A single-sheet .xlsx named regex-extracted.xlsx, generated in your browser from either an XLSX or CSV upload.
Does only the first match in a cell get extracted?
Yes. The global flag is stripped, so exactly one (the first) match per cell is returned.
What are the size and row limits?
Pro: 50 MB / 100,000 rows. Pro + Media: 200 MB / 500,000. Developer: 500 MB / unlimited. Not available on Free.
How do I match the extracted code against my product master?
Use the sheet joiner to left/inner-join the extracted column against another file on the code key.
How do I split orders by status after extraction?
Run the result through the conditional splitter to break the rows into one sheet per unique status value.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.