How to redact PII from Excel before sharing with third-party vendors under GDPR
- Step 1: Consolidate to the first sheet. Move the columns the vendor needs onto the first worksheet, because only that sheet is scanned. Leave behind anything they don't need at all.
- Step 2: Open the redactor. The Excel PII Redactor launches the in-browser scrubber. Drag the consolidated file in; SheetJS parses it locally, no upload.
- Step 3: Run the six detectors. Email, IBAN, card, US SSN, UK NI, and phone patterns run in order; checksums gate the IBAN and card matches to cut false positives.
- Step 4: Check the counts against expectations. Confirm the email/phone/iban tallies roughly match the columns you know contain identifiers; a zero where you expected hits is a red flag.
- Step 5: Redact free-text identifiers yourself. Names and postal addresses are not pattern-detectable here; drop or generalise those columns before export to prevent re-identification.
- Step 6: Download and send the scrubbed file. You get <name>-scrubbed.<ext> (scrubbed text/JSON). Strip metadata and comments with the sibling tools, then share.
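The count check in Step 4 can be scripted once you have copied the tallies out of a run. This is a minimal sketch, not part of the tool: the function name `unexpected_zero_counts` and the idea of keeping a set of expected categories are assumptions made for illustration.

```python
def unexpected_zero_counts(counts, expected_categories):
    """Return the categories you expected hits in but the run reported none for."""
    return sorted(c for c in expected_categories if counts.get(c, 0) == 0)


# Tallies copied by hand from a run (hypothetical values).
counts = {"email": 1840, "phone": 1622}
# Categories you know the first sheet contains.
expected = {"email", "phone", "iban"}

for category in unexpected_zero_counts(counts, expected):
    print(f"Red flag: expected {category} hits but the run reported zero")
```

A flagged category usually means the column sits on a sheet that was not scanned, or the values are in a shape the detector does not match.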
What the scanner actually detects (and how)
The six built-in detectors, in the exact order they run. Order matters: specific number patterns (IBAN, card, SSN, NI) run before the generic phone matcher, because each is a subset of 'a run of digits with optional spaces'. Tags are fixed literal strings — there is no mask-format option in this tool.
| Detector | What matches | Validation | Replacement tag |
|---|---|---|---|
| Email | local@domain.tld: letters/digits/._%+- before @, a dotted domain, 2+ letter TLD | Pattern only (no MX/syntax-strict RFC check) | [REDACTED_EMAIL] |
| IBAN | 2 letters + 2 check digits + 4–7 groups of alphanumerics, spaces allowed | ISO 13616 mod-97-10 checksum — fails the check, left untouched | [REDACTED_IBAN] |
| Credit card | 13–19 digits with optional spaces or dashes between them | Luhn checksum — fails Luhn, left untouched | [REDACTED_CARD] |
| US SSN | NNN-NN-NNNN with SSA invalid-block exclusions (no 000/666/9xx area, no 00 group, no 0000 serial) | Structural rules baked into the pattern | [REDACTED_SSN] |
| UK NI number | Two prefix letters + 6 digits + a final A–D letter, e.g. QQ 12 34 56 C | Excludes the disallowed prefixes (BG, GB, NK, KN, TN, NT, ZZ) | [REDACTED_NI] |
| Phone | +CC optional, then 2–4 digit groups separated by space/dot/dash — broad international shape | Pattern only; runs last so card/SSN/IBAN claim their digits first | [REDACTED_PHONE] |
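To make the ordering concrete, here is a rough Python sketch of an ordered detector run with fixed tags and per-category counts. It covers only three of the six detectors, and every regular expression in it is an assumption written to fit the table's descriptions; the tool's actual patterns are not shown in this document.

```python
import re

# Assumed stand-ins for three detectors, listed in run order (specific before generic).
DETECTORS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    ("ssn_us", re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"), "[REDACTED_SSN]"),
    ("phone", re.compile(r"(?<!\w)(?:\+\d{1,3}[ .-]?)?\d{2,4}(?:[ .-]\d{2,4}){1,3}(?!\w)"), "[REDACTED_PHONE]"),
]


def scrub(text):
    """Apply each detector in order, replacing matches with its fixed tag."""
    counts = {}
    for name, pattern, tag in DETECTORS:
        text, hits = pattern.subn(tag, text)
        if hits:
            counts[name] = hits
    return text, counts


scrubbed, counts = scrub("li.wei@shop.eu, 123-45-6789, +34 91 123 4567")
print(scrubbed)
print(counts)
```

Under these assumed patterns the SSN-shaped value also fits the loose phone shape, so running the phone detector first would tag it `[REDACTED_PHONE]` instead of `[REDACTED_SSN]`. That is the reason the specific detectors go first.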
Input, output, and tier limits
The Excel PII Redactor entry routes to the Email & Phone Scrubber, which reads the spreadsheet, flattens the first worksheet to text, scrubs it, and hands back a text file. Tier numbers shown are the Excel family caps from the pricing model.
| Property | Behaviour |
|---|---|
| Accepted inputs | .xlsx, .xls, .ods, .csv, plus pasted/dropped JSON / Markdown / TXT |
| What gets read | The first worksheet only — it is converted to a JSON array of row objects, then scanned |
| Output | A text file named <yourfile>-scrubbed.<ext> (the scrubbed JSON/text), not a rebuilt .xlsx workbook |
| Findings | Per-category counts (email, iban, credit_card, ssn_us, ni_uk, phone) plus a total itemsRedacted — no per-cell address log |
| Configurable options | None. There is no options panel, no custom-regex field, no mask-format picker — the six detectors and their tags are fixed |
| Free tier (Excel family) | 5 MB file, 10,000 rows, 1 file at a time |
| Pro / Pro-media / Developer | 50 MB · 100,000 rows · 5 files / 200 MB · 500,000 rows · 20 files / 500 MB · unlimited rows |
| Where it runs | 100% in your browser — the spreadsheet is never uploaded to a server |
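The "JSON array of row objects" shape is easy to picture with a CSV input. The sketch below uses Python's standard library to mimic it; the tool itself does this in the browser with SheetJS, so treat the function name `first_sheet_to_json` and the exact formatting as illustrative assumptions.

```python
import csv
import io
import json


def first_sheet_to_json(csv_text):
    """Turn CSV text into a JSON array with one object per row, keyed by header."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


sample = "Email,Phone,LTV,Segment\nli.wei@shop.eu,+34 91 123 4567,412.50,VIP\n"
print(first_sheet_to_json(sample))
```

Every value arrives as a string in this sketch, and nothing about formulas, formatting, or other tabs survives the conversion, which matches the output behaviour described in the table.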
Cookbook
Vendor-handoff rows: what the detectors remove automatically, and the residual identifiers you must clear before sharing under GDPR.
Analytics handoff: contact columns scrubbed
Email and phone are removed for the agency, but the segment/value columns they actually need stay intact.
Input (first sheet):
[
{ "Email": "li.wei@shop.eu", "Phone": "+34 91 123 4567",
"LTV": "412.50", "Segment": "VIP" }
]
Output:
[
{ "Email": "[REDACTED_EMAIL]", "Phone": "[REDACTED_PHONE]",
"LTV": "412.50", "Segment": "VIP" }
]
Counts: { email: 1, phone: 1 }
Billing tab IBAN: validated then redacted
A real IBAN passes mod-97-10 and is tagged; an IBAN with one mistyped digit fails the check and is left untouched, which flags a data-quality issue.
Input:
IBAN good: GB82 WEST 1234 5698 7654 32
IBAN typo: GB82 WEST 1234 5698 7654 33
Output:
IBAN good: [REDACTED_IBAN]
IBAN typo: GB82 WEST 1234 5698 7654 33
Counts: { iban: 1 }
Residual re-identification risk the tool can't see
Email is gone, but full name + postcode remain — together they may still identify the subject. The DPO-recommended fix is yours to apply.
Output after redaction (still risky):
{ "Name": "Marta Kowalski", "Postcode": "SW1A 1AA",
"Email": "[REDACTED_EMAIL]" }
Action: drop Name, or coarsen Postcode to outward code (SW1A),
before handing the file to the vendor.
Second tab is ignored
A 'Contacts' tab full of emails is not scanned because only the first sheet is read. Consolidate first.
Workbook tabs: [ Orders (sheet 1) | Contacts (sheet 2) ]
Scanned: Orders -> emails/phones redacted
NOT scanned: Contacts -> emails LEFT IN THE CLEAR
Fix: move Contacts data into sheet 1, or scrub it separately.
Counts as a lightweight sharing record
The per-category tally is useful evidence to file alongside your own note of which export went to whom.
Findings after a run:
{ counts: { email: 1840, phone: 1622, iban: 311 },
itemsRedacted: 3773 }
File this with: file name, sheet processed, recipient, date.
(No per-cell addresses are produced; this is a summary.)
Edge cases and what actually happens
Card or SSN-shaped number that fails its checksum / structure
Preserved. Credit-card detection only fires when the digit run passes the Luhn checksum, and IBAN only when the mod-97-10 check passes. A 16-digit order number or a typo'd card that fails Luhn is left exactly as-is. An SSN-shaped value in a banned block (e.g. 000-12-3456 or 666-...) is likewise skipped by design, because those are never valid SSNs.
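For reference, the Luhn gate is simple enough to reproduce when you want to predict whether a particular number will be tagged. This is a generic sketch of the standard Luhn algorithm, not the tool's code; the function name `luhn_ok` and the 13 to 19 digit length gate (taken from the detector table) are the only assumptions.

```python
def luhn_ok(candidate):
    """Return True if the digits in candidate (13 to 19 of them) pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_ok("4111 1111 1111 1111"))  # widely used test card number
print(luhn_ok("4111 1111 1111 1112"))  # one digit off
print(luhn_ok("1234567890123456"))     # 16-digit order number
```

Only the first value passes, so only it would be eligible for `[REDACTED_CARD]`; the other two are the kind of value that is left as-is.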
Only the first worksheet is scanned
By design. The reader flattens wb.SheetNames[0] to text and scrubs that. PII sitting on a second tab (a hidden 'Contacts' sheet, an 'Audit' tab) is not touched. Split multi-tab workbooks first, or scrub each sheet's data separately, and remember to clear residual sheets with the hidden-sheet destroyer.
Output comes back as text, not a workbook
Expected. Feeding an .xlsx in does not return an .xlsx out. The first sheet becomes a JSON array of row objects, the scrub runs, and you download <name>-scrubbed.xlsx whose contents are scrubbed text/JSON. Treat it as a redacted data dump, not a styled, multi-sheet workbook: formulas, formatting, and extra tabs are gone.
Phone numbers in an unusual local format
May miss. The phone matcher targets a broad international shape (+CC then 2–4 digit groups split by space/dot/dash). Tightly packed or unusually punctuated locals (e.g. parenthesised area codes glued to text, vanity numbers like 1-800-FLOWERS) can slip through. Check the per-category count after a run and spot-check the output.
A full name or street address in a cell
Not detected. There is no name/address NLP here; the six detectors are all pattern-and-checksum based. John Smith, 14 Mill Lane stays in the clear. For free-text identifiers you must redact those columns by other means (drop the column, or generalise it) before sharing.
Email-like token inside a URL or file path
Redacted. The email pattern matches anything shaped like local@domain.tld regardless of context, so mailto:sam@acme.io and //user@host.example/path both get the address portion replaced with [REDACTED_EMAIL]. This is usually what you want, but it can rewrite a host string you meant to keep.
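You can reproduce this context-blind behaviour with a pattern built from the detector table's description. The regular expression below is an assumption that matches that description; it is not necessarily the tool's exact pattern.

```python
import re

# Assumed email pattern: letters/digits/._%+- before @, a dotted domain, 2+ letter TLD.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace every email-shaped token with the fixed tag, whatever surrounds it."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


for sample in ("mailto:sam@acme.io", "//user@host.example/path"):
    print(redact_emails(sample))
```

The scheme prefix and the trailing path are kept, but the user@host portion is replaced, which is how a connection string or repository URL ends up rewritten.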
Phone runs last, so it can swallow a near-miss
By design. Because the generic phone pattern runs after IBAN/card/SSN/NI, any long digit group those earlier detectors rejected (failed Luhn, wrong block) is still eligible to match as a phone number if it fits the loose phone shape. Review counts if a value you expected to keep was tagged [REDACTED_PHONE].
File over the tier row/byte cap
Rejected. A Free-tier scan stops at 5 MB / 10,000 rows / 1 file. Larger exports return a limit error before any scrubbing runs. Split the export, trim to the needed columns, or move to a higher tier (Pro 50 MB / 100k rows, Developer 500 MB / unlimited).
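If you prepare exports in a script, a pre-flight check against the published caps saves a failed run. The sketch below is hypothetical: the `TIER_CAPS` table and `limit_error` function are illustrative names, MB is assumed to mean 2**20 bytes, and the Developer file count is not stated in this document, so it is treated as uncapped here.

```python
MB = 1024 * 1024  # assumption: the caps may instead be defined as 10**6 bytes

# Caps as listed for the Excel family; None means no cap is stated.
TIER_CAPS = {
    "free": {"max_bytes": 5 * MB, "max_rows": 10_000, "max_files": 1},
    "pro": {"max_bytes": 50 * MB, "max_rows": 100_000, "max_files": 5},
    "pro-media": {"max_bytes": 200 * MB, "max_rows": 500_000, "max_files": 20},
    "developer": {"max_bytes": 500 * MB, "max_rows": None, "max_files": None},
}


def limit_error(tier, size_bytes, rows, files=1):
    """Return a message for the first cap exceeded, or None if the export fits."""
    caps = TIER_CAPS[tier]
    checks = (("max_bytes", size_bytes, "bytes"), ("max_rows", rows, "rows"), ("max_files", files, "files"))
    for key, value, label in checks:
        cap = caps[key]
        if cap is not None and value > cap:
            return f"{tier} tier limit exceeded: {value} {label} > {cap}"
    return None


print(limit_error("free", size_bytes=2 * MB, rows=25_000))
print(limit_error("pro", size_bytes=2 * MB, rows=25_000))
```

The same export that is rejected on Free for its row count fits within Pro, which tells you whether to split the file or change tier.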
Quasi-identifiers left after redaction
Not detected. Name, postcode, date of birth, and similar quasi-identifiers are not pattern PII and survive the scrub. In combination they can re-identify a person even with email/phone removed. Generalise or drop them per your DPO's guidance; redaction here is a minimisation step, not a full anonymisation guarantee.
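Dropping and generalising are manual steps, but they are quick to script on the scrubbed rows. This sketch assumes UK postcodes (whose inward part is always the last three characters) and columns named Name and Postcode; the function names are made up for illustration, and which fields to drop is your DPO's call.

```python
def outward_code(postcode):
    """Coarsen a UK postcode to its outward code, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code is the final three characters; leave short values alone.
    return compact[:-3] if len(compact) >= 5 else compact


def generalise_row(row, drop=("Name",), postcode_field="Postcode"):
    """Drop the named columns and coarsen the postcode in one row object."""
    out = {key: value for key, value in row.items() if key not in drop}
    if postcode_field in out:
        out[postcode_field] = outward_code(out[postcode_field])
    return out


row = {"Name": "Marta Kowalski", "Postcode": "SW1A 1AA", "Email": "[REDACTED_EMAIL]"}
print(generalise_row(row))
```

An outward code still narrows a person to a district, so whether that is coarse enough depends on how many subjects share it.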
Frequently asked questions
Does running this make my file 'GDPR compliant'?
It helps with data minimisation by removing six PII categories, but compliance also depends on what remains (names, addresses, quasi-identifiers), your lawful basis, and your DPA with the vendor. Treat it as one control, not a sign-off.
Can the vendor re-identify subjects from the scrubbed file?
Possibly. If full names, postcodes, or DOBs remain, re-identification is feasible even with emails and phones redacted. The tool does not detect those free-text fields — drop or generalise them before sharing.
What exactly gets replaced, and with what?
Six categories: emails, IBANs, credit-card numbers, US SSNs, UK National Insurance numbers, and phone numbers. Each is swapped for a fixed label — [REDACTED_EMAIL], [REDACTED_IBAN], [REDACTED_CARD], [REDACTED_SSN], [REDACTED_NI], [REDACTED_PHONE]. The tags are not configurable.
Can I add my own regex or change the mask string?
No. This tool has no options panel — the detectors and their replacement tags are fixed in code. If you need custom patterns or realistic fake values, generate substitutes downstream (e.g. with a Faker-based scrambler) rather than expecting per-pattern masks here.
Does it produce a redacted .xlsx workbook?
No. It reads the first worksheet, flattens it to a JSON array of rows, scrubs that text, and gives you a <name>-scrubbed.<ext> text download. It is a redacted data export, not a rebuilt styled workbook with all sheets and formulas intact.
Are credit cards and IBANs really validated, not just pattern-matched?
Yes. Card numbers must pass the Luhn checksum and IBANs must pass the ISO 13616 mod-97-10 check before they are redacted. Values that fail are left untouched, which keeps order numbers and SKUs from being mangled — but also means a malformed card slips through.
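The IBAN check can be reproduced in a few lines if you want to confirm why a given value was or was not tagged. This is a generic sketch of the ISO 13616 mod-97-10 procedure, not the tool's source; the function name `iban_ok` and the 15 to 34 character length gate are assumptions.

```python
def iban_ok(candidate):
    """Return True if candidate passes the ISO 13616 mod-97-10 check."""
    iban = "".join(candidate.split()).upper()
    if not (15 <= len(iban) <= 34 and iban.isascii() and iban.isalnum()):
        return False
    # Move country code and check digits to the end, then map A..Z to 10..35.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_ok("GB82 WEST 1234 5698 7654 32"))  # the well-known example IBAN
print(iban_ok("GB82 WEST 1234 5698 7654 33"))  # last digit mistyped
```

The first value passes and the second fails, matching the cookbook example where only the valid IBAN is replaced with `[REDACTED_IBAN]`.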
Does my spreadsheet get uploaded anywhere?
No. Detection and replacement happen entirely in your browser. The file is parsed locally with SheetJS and scrubbed in memory — nothing is sent to a server, which is the whole point for sensitive exports.
Why did a number I wanted to keep get redacted as a phone?
The phone pattern is deliberately broad and runs last, so any long digit group that earlier detectors rejected can still match it. Check the per-category counts; if phone is higher than expected, your data has digit runs that fit the loose phone shape.
Why did a real card / SSN slip through?
Cards must pass Luhn and SSNs must fall outside the SSA-invalid blocks to be redacted. A card with a typo'd digit fails Luhn; an SSN-shaped value in a banned area/group/serial is treated as not-an-SSN. The validation that avoids false positives also means malformed values are kept.
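The SSN exclusions can be expressed as negative lookaheads inside a single pattern. The regular expression below is an assumption that implements the documented rules (no 000/666/9xx area, no 00 group, no 0000 serial); the tool's actual pattern may differ in detail.

```python
import re

# Assumed pattern: NNN-NN-NNNN with the SSA-invalid blocks excluded by lookaheads.
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def looks_like_valid_ssn(value):
    """Return True if value contains an SSN-shaped token outside the banned blocks."""
    return SSN.search(value) is not None


for sample in ("123-45-6789", "000-12-3456", "666-12-3456", "900-12-3456", "123-00-6789", "123-45-0000"):
    print(sample, looks_like_valid_ssn(sample))
```

Only the first sample matches; the rest fall in a banned area, group, or serial and would be treated as not-an-SSN.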
What are the size and row limits?
Excel-family caps apply: Free 5 MB / 10,000 rows / 1 file; Pro 50 MB / 100,000 rows / 5 files; Pro-media 200 MB / 500,000 rows / 20 files; Developer 500 MB / unlimited rows. Oversized files are rejected before scrubbing.
What other privacy tools should I run alongside this?
Strip document metadata with the app-metadata wiper, remove cell comments via the comment purger, and check for data-leaking links with the external-link auditor. Together they cover content, metadata, and references.
Is this the same engine as the Email & Phone Scrubber?
Yes. The Excel PII Redactor entry points at the Email & Phone Scrubber; it is the same detection pipeline, just fed by a spreadsheet that is flattened to text first.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.