How to strip emails, phones & PII from CSV / JSON locally
- Step 1: Export the data you need to anonymise. Pull the file from wherever it lives: a CRM contact export, a support-desk ticket dump, a database SELECT ... INTO OUTFILE, or a logs extract. Any of CSV, JSON, Markdown, TXT, or an Excel/ODS workbook works. The scrubber operates on raw text, so column names and JSON keys don't matter; only the cell and value content is inspected.
- Step 2: Drop the file onto the scrubber above. Click the drop area (it accepts .csv, .json, .txt, .md and also .xlsx / .xls / .ods) or drag the file in. There is no paste-text box for this tool; it works only on a file you select. Processing is entirely in-browser: the email, phone, IBAN, card, SSN, and NI patterns are applied locally and nothing is transmitted.
- Step 3: Run the scrubber. Press Run Email/Phone Scrubber. There are no options to set: the six detectors always run, in a fixed order (email, then IBAN, then card, then SSN, then NI, then phone last). The IBAN and credit-card passes each apply their checksum filter, so only valid IBANs and Luhn-valid PANs are replaced.
- Step 4: Review the scrubbed output and the count. The result appears in a scrollable text panel showing the full redacted file, and a metrics line reports the total items redacted, bytes in/out, and run time. Skim the output to confirm the [REDACTED_*] tags landed where you expected and that no real value slipped through.
- Step 5: Copy or download the clean file. Use Copy to grab the text, or Download to save it. The download is named after the source with a -scrubbed suffix before the extension (contacts.csv → contacts-scrubbed.csv). Note: a spreadsheet input downloads as book-scrubbed.xlsx, but the content is the scrubbed JSON array of the first sheet, not a re-built workbook.
- Step 6: Spot-check the categories you care about. Because the detectors are fixed regexes, search the output for the tags relevant to your jurisdiction: [REDACTED_SSN] for US records, [REDACTED_NI] for UK payroll, [REDACTED_IBAN] / [REDACTED_CARD] for finance. Anything the regex can't shape-match (free-text postal addresses, names, account numbers without a known format) is not redacted; handle those with a different tool or a manual pass.
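The spot-check in Step 6 can be scripted once the scrubbed file is downloaded. The following is a minimal sketch and not part of the tool: it assumes the tags follow the `[REDACTED_<CATEGORY>]` shape listed on this page, and `count_tags` is an illustrative name.

```python
import re
from collections import Counter

# Assumption: every tag looks like [REDACTED_EMAIL], [REDACTED_PHONE], etc.
TAG_RE = re.compile(r"\[REDACTED_([A-Z]+)\]")

def count_tags(text: str) -> Counter:
    """Count how many times each redaction category appears in text."""
    return Counter(TAG_RE.findall(text))

sample = (
    "name,email,phone\n"
    "Sue Adler,[REDACTED_EMAIL],[REDACTED_PHONE]\n"
    "Jon Ek,[REDACTED_EMAIL],[REDACTED_PHONE]\n"
)
counts = count_tags(sample)
print(dict(counts))
```

A category you expected but that shows a count of zero is a prompt to look at the source format, for example SSNs stored without dashes.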
The six PII detectors and what each one actually matches
Every detector and its replacement tag, in the exact order the engine applies them. Order matters: the broad phone matcher runs last so the more specific IBAN / card / SSN / NI passes claim their digit runs first.
| Detector | Replacement tag | What it matches | Verification |
|---|---|---|---|
| Email | [REDACTED_EMAIL] | Standard address shape: local part (letters, digits, . _ % + -), an @, a domain, and a 2+ letter TLD. Catches plus-addressing (sue+news@x.com) and subdomains (a@mail.corp.co.uk) | Pattern only (no DNS / validity check) |
| IBAN | [REDACTED_IBAN] | Two letters + 2 check digits + grouped alphanumerics (15–34 chars total), with or without the usual 4-char spacing | ISO 13616 mod-97-10 checksum — invalid IBANs are left untouched |
| Credit card | [REDACTED_CARD] | 13–19 digit runs, optionally separated by spaces or dashes (4111 1111 1111 1111) | Luhn checksum — a 16-digit order ID that fails Luhn is not redacted |
| US SSN | [REDACTED_SSN] | Strict NNN-NN-NNNN dashed format only, with SSA invalid-block exclusions (no 000/666/9xx area, no 00 group, no 0000 serial) | Pattern + block rules (dashes required) |
| UK NI number | [REDACTED_NI] | Two valid prefix letters + 6 digits + a final letter A–D, spaces optional (QQ 12 34 56 C) | Pattern + disallowed-prefix exclusions (BG GB NK KN TN NT ZZ) |
| Phone | [REDACTED_PHONE] | Optional +, a 1–3 digit prefix, then grouped digit blocks with spaces/dots/dashes/parens (+44 20 7946 0958, (212) 555-0143) | Pattern only — runs last so it can't steal card/IBAN/SSN digits |
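To illustrate why the order matters, here is a rough sketch of an ordered replacement pipeline. The three patterns are simplified stand-ins written for this example, not the tool's real regexes, and `scrub` and `DETECTORS` are hypothetical names. The returned keys mirror the `{ output, redactedCount, counts }` shape this page describes for the API path.

```python
import re

# Simplified stand-in patterns (assumptions), listed in application order.
DETECTORS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # Broad matcher: runs last so it cannot claim digits a stricter pass wants.
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]

def scrub(text, detectors=DETECTORS):
    """Apply each detector in order and tally replacements per category."""
    counts = {}
    for name, pattern in detectors:
        text, n = pattern.subn(f"[REDACTED_{name}]", text)
        counts[name] = n
    return {
        "output": text,
        "redactedCount": sum(counts.values()),
        "counts": counts,
    }

result = scrub("sue@example.com, 123-45-6789, +44 20 7946 0958")
print(result["output"])
```

With these simplified patterns, running the phone matcher first would tag 123-45-6789 as a phone number, which is the situation the fixed order is meant to avoid.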
Inputs, output, and tier limits
What the tool accepts and produces. The scrubber is a free-tier security tool; the file-size ceiling is the only thing that changes by plan. Multiple files can be selected but only the first is processed.
| Aspect | Behaviour |
|---|---|
| Accepted input | .csv, .json, .md, .txt read as text; .xlsx / .xls / .ods accepted (first sheet flattened to a JSON array before scrubbing) |
| Options / controls | None — there is no options panel and no per-category toggle. All six detectors always run |
| Output | Plain text in the original format with [REDACTED_*] tags substituted in place; downloaded as <name>-scrubbed.<ext> |
| Multiple files | The drop area allows selecting several files, but only the first file is scrubbed per run — process the rest one at a time |
| Where it runs | 100% in your browser tab; also has a server-safe API path that returns { output, redactedCount, counts } as JSON |
| Minimum plan | Free — no upgrade needed for the tool itself |
| File-size limit | Free 10 MB · Pro 100 MB · Pro + Media 500 MB · Developer 2 GB (oversize text files throw a clear exceeds the … limit error) |
Cookbook
Real before/after fragments showing exactly what each detector does — including the cases where verification deliberately leaves a value alone. PII values below are fabricated examples.
Plain customer CSV — email and phone redacted
The everyday case: a contacts export with an email and a phone column. The email pattern and the phone pattern each fire once per row. Note the output stays CSV; only the matched substrings are replaced.
Input (contacts.csv):
name,email,phone
Sue Adler,sue.adler@example.com,+44 20 7946 0958
Jon Ek,jon+crm@mail.example.co.uk,(212) 555-0143
Output (contacts-scrubbed.csv):
name,email,phone
Sue Adler,[REDACTED_EMAIL],[REDACTED_PHONE]
Jon Ek,[REDACTED_EMAIL],[REDACTED_PHONE]
Reported: 4 items redacted
JSON support archive stays valid JSON
The scrubber works on raw text regardless of structure, so emails buried in nested JSON values are replaced in place. Because the tag goes inside the existing quotes, the document remains parseable.
Input (tickets.json):
[
{ "id": 1, "from": "alice@acme.io", "note": "call me on 0207 946 0958" },
{ "id": 2, "from": "bob@acme.io", "note": "no contact info" }
]
Output (tickets-scrubbed.json):
[
{ "id": 1, "from": "[REDACTED_EMAIL]", "note": "call me on [REDACTED_PHONE]" },
{ "id": 2, "from": "[REDACTED_EMAIL]", "note": "no contact info" }
]
Luhn check spares an order ID but catches a real card
Both are 16-digit runs, but the credit-card detector only redacts numbers that pass the Luhn checksum. The order ID fails Luhn and survives; the test PAN passes and is redacted. This is why you can scrub finance exports without losing your reference numbers.
Input (orders.csv):
order_id,card
1234567890123456,4111 1111 1111 1111
Output (orders-scrubbed.csv):
order_id,card
1234567890123456,[REDACTED_CARD]
# 1234567890123456 fails Luhn → left untouched
# 4111111111111111 passes Luhn → [REDACTED_CARD]
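The Luhn check itself is short. This is a minimal sketch of the standard algorithm, not the tool's source; `luhn_valid` is an illustrative name, and the 13 to 19 digit bound is taken from the detector table above.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("1234567890123456"))     # False
```

Roughly one in ten random digit runs passes Luhn by chance, so the check reduces false positives rather than eliminating them.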
IBAN verified by mod-97; SSN requires the dashed format
The IBAN pass confirms the ISO 13616 checksum before redacting, and the US SSN pass only matches the strict NNN-NN-NNNN shape with valid SSA blocks. A 9-digit number with no dashes is NOT seen as an SSN.
Input (payroll.csv):
account,ssn,bad_ssn
GB82 WEST 1234 5698 7654 32,123-45-6789,123456789
Output (payroll-scrubbed.csv):
account,ssn,bad_ssn
[REDACTED_IBAN],[REDACTED_SSN],123456789
# 123456789 (no dashes) is not matched by the SSN pattern
# review undashed national IDs manually
Spreadsheet input is flattened to scrubbed JSON
An .xlsx is accepted, but the first sheet is converted to a JSON array of row objects before scrubbing — so the downloaded file carries the .xlsx name yet holds JSON text, not a rebuilt workbook. Plan for that if a downstream step expects binary Excel.
Input: leads.xlsx (first sheet)
| Email | Mobile |
| kit@demo.org | 07700 900123 |
Download: leads-scrubbed.xlsx (CONTENT is JSON, not Excel):
[
{
"Email": "[REDACTED_EMAIL]",
"Mobile": "[REDACTED_PHONE]"
}
]
Edge cases and what actually happens
A postal address or a person's name is left in the file
By design. The scrubber only matches shapes it has a regex for: email, phone, IBAN, card, SSN, and UK NI. Free-text mailing addresses, full names, dates of birth, and account numbers with no fixed format are not detected and pass through unchanged. For realistic field-by-field anonymisation (replacing names/addresses with plausible fakes) use the CSV/JSON Data Scrambler instead.
A 16-digit number that isn't a card stays unredacted
Preserved. Credit-card matching requires a Luhn-valid 13–19 digit run. Order IDs, tracking numbers, and synthetic test data that happen to be 16 digits but fail Luhn are deliberately left alone. This prevents false positives on finance and logistics exports, but it also means a genuine card number containing a typo that breaks the checksum won't be caught.
An SSN written without dashes is not redacted
Not matched. The US SSN detector only matches the strict NNN-NN-NNNN dashed form (plus the SSA invalid-block rules). A nine-digit string like 123456789 is ambiguous (it could be many things), so it is intentionally not treated as an SSN. If your source stores SSNs undashed, normalise them to the dashed form first, or treat them with a manual pass.
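Normalising before scrubbing can be done per column, so that other nine-digit values are not touched. A minimal sketch, assuming you know which CSV column holds the SSNs; `dash_ssn_column` and the column name `ssn` are hypothetical.

```python
import csv
import io
import re

NINE_DIGITS = re.compile(r"[0-9]{9}")

def dash_ssn_column(csv_text: str, column: str) -> str:
    """Rewrite bare nine-digit values in one named column as NNN-NN-NNNN."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = (row.get(column) or "").strip()
        # Only touch values that are exactly nine digits; leave the rest alone.
        if NINE_DIGITS.fullmatch(value):
            row[column] = f"{value[:3]}-{value[3:5]}-{value[5:]}"
        writer.writerow(row)
    return out.getvalue()

source = "account,ssn\nA-17,123456789\nB-42,123-45-6789\n"
normalised = dash_ssn_column(source, "ssn")
print(normalised)
```

Restricting the rewrite to a named column is deliberate: applying it to the whole file would turn unrelated nine-digit IDs into SSN-shaped strings.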
Only the first file is processed when you drop several
First file only. The drop area lets you select multiple files, but a run scrubs only files[0]. The others are ignored. Run each file separately, or use the public API / local runner to loop a batch. Don't assume a multi-file selection produced a multi-file result; check the output filename.
A spreadsheet download has a .xlsx name but JSON content
Expected. For .xlsx / .xls / .ods inputs the first sheet is converted to a JSON array of row objects before scrubbing, and that JSON is what gets returned and downloaded, even though the filename keeps the .xlsx extension. If a downstream tool needs real Excel binary, save the JSON and re-import, or export the sheet to CSV before scrubbing.
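One way to get the scrubbed rows back into a spreadsheet is to convert the JSON array to CSV, which spreadsheet applications can open. A minimal sketch, assuming the download is a JSON array of flat row objects as described above; `json_rows_to_csv` is an illustrative name.

```python
import csv
import io
import json

def json_rows_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat row objects into CSV text."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return out.getvalue()

scrubbed = '[{"Email": "[REDACTED_EMAIL]", "Mobile": "[REDACTED_PHONE]"}]'
csv_text = json_rows_to_csv(scrubbed)
print(csv_text)
```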
Text file is larger than your plan's limit
Rejected (over limit). Text inputs are read through the tier file-size gate (Free 10 MB, Pro 100 MB, Pro + Media 500 MB, Developer 2 GB). An oversize file throws File "…" is N MB — exceeds the … limit for your plan. and nothing is scrubbed. Split the export or upgrade. (The spreadsheet path reads the workbook directly, so very large workbooks can still be memory-bound in the browser tab.)
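Splitting an oversize CSV export can be scripted before the file reaches the tool. This sketch is an assumption-laden helper, not something the tool provides: it assumes each record sits on one line (no quoted fields containing newlines), and `split_csv` is a hypothetical name.

```python
from typing import List

def split_csv(text: str, max_bytes: int) -> List[str]:
    """Split CSV text into chunks of at most max_bytes, repeating the header.

    A single row larger than max_bytes still gets its own oversize chunk.
    """
    lines = text.splitlines(keepends=True)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    header_size = len(header.encode("utf-8"))
    chunks, current, size = [], [header], header_size
    for row in rows:
        row_size = len(row.encode("utf-8"))
        # Start a new chunk when adding this row would exceed the limit.
        if len(current) > 1 and size + row_size > max_bytes:
            chunks.append("".join(current))
            current, size = [header], header_size
        current.append(row)
        size += row_size
    chunks.append("".join(current))
    return chunks

demo = "id,email\n" + "".join(f"{i},a@x.io\n" for i in range(1, 6))
parts = split_csv(demo, max_bytes=30)
print(len(parts))  # 3
```

For the Free tier you would pass a limit a little under 10 MB and scrub each part in its own run, since only one file is processed per run.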
An unusual local phone format slips through
May be missed. The phone pattern targets common international and grouped formats with +, parentheses, and space/dot/dash separators. Very short internal extensions, run-together digit strings with no separators, or heavily localised formats may not match. Review the output for any phone style your dataset uses that isn't a standard grouped number.
A long phone number coincidentally passes Luhn and is tagged as a card
Card wins. Detectors run in a fixed order and the card pass (13–19 digits, Luhn-valid) runs before phone. A long unseparated numeric string that happens to satisfy Luhn could be replaced with [REDACTED_CARD] rather than [REDACTED_PHONE]. The data is still redacted, but if the exact tag label matters to you, be aware the more specific financial pass claims qualifying digit runs first.
Names or domains that look like emails inside free text
Pattern match. Anything matching local@domain.tld is redacted, including a Twitter-style handle written as name@site.com in a note field, or a sample address in documentation. The scrubber can't tell a real contact from an illustrative one, so review redactions in documentation-style files where example addresses are intentional.
You wanted a black box over a signature or face, not text redaction
Wrong tool. This tool only redacts text patterns in text/CSV/JSON files. To burn out a handwritten signature or stamp from a document image use Signature Burner; to redact PII text inside a PDF use the PDF PII Redactor; to blur faces in images/video use Face Pixelate.
Frequently asked questions
Does it really only handle email and phone?
No — that's the name, but the engine runs six detectors: email, phone, IBAN (mod-97 verified), credit card (Luhn verified), US SSN, and UK National Insurance number. Each gets its own tag ([REDACTED_EMAIL], [REDACTED_PHONE], [REDACTED_IBAN], [REDACTED_CARD], [REDACTED_SSN], [REDACTED_NI]) so you can see what category was found.
Does it detect names or postal addresses?
No. There is no name or free-text address detector — only the six shape-matchable categories above. For realistic field-level anonymisation that replaces names, addresses, and other fields with plausible fake values, use the CSV/JSON Data Scrambler.
Can I choose which categories to redact or change the mask text?
No. The tool has no options panel: all six detectors always run and the replacement tags are fixed strings ([REDACTED_EMAIL], etc.). You can't disable a detector or supply a custom mask. You can find-and-replace a tag with different mask text afterward in a text editor, but the original values can't be recovered from the output; if a category should stay unredacted, restore those values from the source file.
Will it break my JSON or CSV structure?
No. Replacements happen inside the existing quoted string values, so JSON stays valid and CSV keeps its columns. The scrubber operates on raw text and replaces only the matched substring with that category's tag; it never reorders or removes fields.
Is my data uploaded anywhere?
No. The browser tool runs the regex passes entirely in your tab using local JavaScript — emails, phones, financial identifiers, and the file itself never reach a server. That's what makes it safe for confidential customer data under GDPR / CCPA. (A separate opt-in API path exists for automation, but the on-page tool is local-only.)
How does it avoid redacting a 16-digit order ID as a credit card?
The card detector applies the Luhn checksum before redacting. A 13–19 digit run is only replaced with [REDACTED_CARD] if it passes Luhn, so most order IDs, SKUs, and tracking numbers — which don't satisfy Luhn — are left untouched. Real card numbers do pass Luhn and are redacted.
How are IBANs validated?
With the ISO 13616 mod-97-10 checksum: the first four characters (country code and check digits) are moved to the end, each letter is replaced by a number (A = 10 through Z = 35), and the resulting integer is reduced modulo 97. Only IBANs whose remainder is 1 (i.e. checksum-valid) are redacted, so random country-code-shaped strings aren't falsely tagged.
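The check can be sketched in a few lines. This is a minimal illustration of the mod-97 procedure, not the tool's code: `iban_valid` is a hypothetical name, and it skips the per-country length rules a full validator would also apply.

```python
def iban_valid(iban: str) -> bool:
    """Return True if iban passes the ISO 13616 mod-97 checksum."""
    s = "".join(iban.split()).upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Base-36 digit values give 0-9 for digits and 10-35 for A-Z.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1

print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_valid("GB83 WEST 1234 5698 7654 32"))  # False
```

Python's arbitrary-precision integers make the single `% 97` possible; languages with fixed-width integers usually reduce the number in chunks instead.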
Why didn't it catch an SSN in my file?
The US SSN detector requires the strict dashed NNN-NN-NNNN format plus the SSA invalid-block rules (no 000/666/900-999 area, no 00 group, no 0000 serial). A nine-digit string with no dashes is too ambiguous to treat as an SSN and is left alone. Normalise undashed SSNs to the dashed form first if you need them redacted.
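The rules above translate into a regex with negative lookaheads. This pattern is an approximation written for illustration; the tool's actual expression is not shown on this page, and `SSN_RE` is a hypothetical name.

```python
import re

SSN_RE = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}"  # area: not 000, 666, or 900-999
    r"-(?!00)\d{2}"              # group: not 00
    r"-(?!0000)\d{4}\b"          # serial: not 0000
)

for candidate in ["123-45-6789", "123456789", "666-12-3456", "123-00-4567"]:
    print(candidate, bool(SSN_RE.search(candidate)))
```

Only the first candidate matches: the second has no dashes, and the last two fall into excluded blocks.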
Which phone formats are detected?
Common international and grouped formats: an optional +, a 1–3 digit prefix, and digit blocks separated by spaces, dots, dashes, or parentheses — e.g. +44 20 7946 0958 or (212) 555-0143. Highly localised or separator-less formats may need manual review; scan the output for any phone style your dataset uses.
Can I scrub a whole folder of files at once?
Not in one click on the page. The drop area accepts multiple files but a single run only processes the first one. Run files one at a time, or call the server-safe API / local runner in a loop to batch them. The output is always a single scrubbed file per run.
It accepts Excel — does it give me Excel back?
It accepts .xlsx / .xls / .ods, but it converts the first sheet to a JSON array of row objects, scrubs that text, and returns the JSON — the download keeps the .xlsx name but the content is JSON, not a rebuilt workbook. If you need binary Excel out, export your sheet to CSV before scrubbing, or re-import the JSON.
How big a file can I scrub, and what happens if it's too big?
Text files are gated by the security tier limit: Free 10 MB, Pro 100 MB, Pro + Media 500 MB, Developer 2 GB. An oversize text file throws a clear exceeds the … limit for your plan error and isn't processed. Spreadsheets are read directly into the browser, so very large workbooks are bounded by your tab's memory rather than a hard byte cap.
What if I need to redact PII inside a PDF or an image instead?
This tool is text/CSV/JSON only. For true text redaction inside a PDF use the PDF PII Redactor; to burn out a signature or stamp on a document use Signature Burner; to pixelate faces in photos or video use Face Pixelate. To encrypt a sensitive file end-to-end rather than redact it, see the AES-256 Encryptor.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.