How to strip non-ASCII characters from every Excel cell
- Step 1: Open the tool and drop your file. Drag an .xlsx, .xls, .ods, or .csv onto the dropzone. The file is read in your browser; for spreadsheets the first sheet is converted to rows internally. Free tier handles up to 5 MB / 10,000 rows / 1 file.
- Step 2: Choose which character classes to keep. Four checkboxes appear under Characters to keep: Letters (incl. accents), Digits (0–9), Spaces, Punctuation (.,!?@-_ etc.). All four are ticked by default; that combination keeps normal text and drops only noise.
- Step 3: Untick a class to strip it too. For a digits-only ID column, untick everything except Digits. For a strict alphanumeric key, keep Letters + Digits only. Each untick widens what gets deleted; there are no other controls (no regex box and no per-column picker; the filter runs on all columns).
- Step 4: Run the strip. Click Strip special chars. The tool filters every data cell character by character, keeping matches and discarding the rest. The header row (row 1) is skipped entirely.
- Step 5: Review the result panel. Two stats show: Cells modified (how many cells actually changed) and Data rows. A first-10-rows preview lets you confirm accents survived and noise is gone before you commit.
- Step 6: Download the cleaned file. Click Download. XLSX input yields <name>.stripped.xlsx; CSV input yields <name>.stripped.csv. The download happens locally; nothing was uploaded.
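The row-level pass in steps 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's own code (which runs in the browser): the names `keep_default` and `strip_rows` are hypothetical, `str.isalpha()` stands in for the \p{L} letter class, and the zero-width space in the sample is written as an explicit `\u200b` escape so it is visible.

```python
import csv
import io

# Fixed set documented for the Punctuation toggle.
PUNCTUATION = ".,-_@/()!?:;'\""

def keep_default(ch):
    """All four toggles on. str.isalpha() is true for Unicode letter
    categories (Lu, Ll, Lt, Lm, Lo), which approximates \\p{L}."""
    return ch.isalpha() or ch in "0123456789" or ch == " " or ch in PUNCTUATION

def strip_rows(rows):
    """Return (cleaned_rows, cells_modified). Row 0 is treated as the
    header and copied through unchanged."""
    if not rows:
        return [], 0
    cleaned = [list(rows[0])]
    modified = 0
    for row in rows[1:]:
        new_row = []
        for cell in row:
            stripped = "".join(ch for ch in cell if keep_default(ch))
            if stripped != cell:
                modified += 1  # counts cells that actually changed
            new_row.append(stripped)
        cleaned.append(new_row)
    return cleaned, modified

# Assumed sample data; \u200b is a zero-width space inside the name.
raw = "name,note\nJosé Müller,Top pick ★\nZoë\u200bChen,☺ happy\n"
rows = list(csv.reader(io.StringIO(raw)))
cleaned, modified = strip_rows(rows)
print("Cells modified:", modified)
print("Data rows:", len(cleaned) - 1)
```

With this sample, three cells change (the star rating, the name containing the zero-width space, and the smiley note) across two data rows, and the header row is returned as it came in.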
The four keep-toggles and what each one preserves
All four are ON by default. Anything not covered by a ticked class is deleted (not replaced). The Letters class is the Unicode \p{L} class, so it is far broader than ASCII a-z.
| Toggle | Default | Keeps | Notable things it does NOT keep |
|---|---|---|---|
| Letters (incl. accents) | On | All Unicode letters via \p{L}: a–z, A–Z, accented Latin (é ñ ü ç å), Greek, Cyrillic, CJK ideographs, etc. | Digits, punctuation, emoji, symbols, marks (so a combining accent that is a separate code point may drop — see edge cases) |
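| Digits (0–9) | On | ASCII digits 0–9 only | Superscript/subscript digits, full-width digits ０１２, Roman numerals (typed as I, V, X they are letters, not digits; dedicated numeral code points such as Ⅳ, U+2163, are neither \p{L} letters nor ASCII digits) |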
| Spaces | On | The ASCII space character (U+0020) only | Tab, NBSP (U+00A0), zero-width space (U+200B), narrow no-break space, and other Unicode spaces — all removed |
| Punctuation | On | Exactly this set: .,-_@/()!?:;'" | Smart quotes “ ” ‘ ’, em/en dashes, # $ % & * + = [ ] { } < > \| ~ ^, bullets, and all other symbols |
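The table above can be read as a single keep-filter with four switches. The Python sketch below is a minimal model of that behaviour, assuming the classes are exactly as documented; the function name `strip_cell` and its keyword arguments are hypothetical, and `unicodedata` general categories stand in for the \p{L} class (the Unicode version bundled with Python may differ slightly from the browser's).

```python
import unicodedata

# Fixed punctuation set quoted in the table above.
PUNCTUATION = set(".,-_@/()!?:;'\"")

def strip_cell(text, letters=True, digits=True, spaces=True, punctuation=True):
    """Keep only characters covered by an enabled class; delete the rest."""
    def keep(ch):
        if letters and unicodedata.category(ch).startswith("L"):
            return True  # any Unicode letter, not just a-z
        if digits and ch in "0123456789":
            return True  # ASCII digits only
        if spaces and ch == " ":
            return True  # U+0020 only
        if punctuation and ch in PUNCTUATION:
            return True
        return False
    return "".join(ch for ch in text if keep(ch))

print(strip_cell("São Paulo★"))
print(strip_cell("+1 (415) 555-0142", letters=False, spaces=False, punctuation=False))
print(strip_cell("AB-1029 / v2", spaces=False, punctuation=False))
```

Because a character is kept if any enabled class matches it, each switch that is turned off can only widen what gets deleted, and turning all four off leaves an empty string.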
Common noise characters and whether they survive (defaults: all four ON)
Behaviour with every keep-toggle enabled — the normal cleanup configuration. Removed characters are deleted with no replacement, so adjacent text closes up.
| Character | Code point | Survives (all 4 on)? | Why |
|---|---|---|---|
| é (accented e) | U+00E9 | Preserved | Matches \p{L} (Letters) |
| 中 (CJK) | U+4E2D | Preserved | CJK ideographs are \p{L} letters |
| Zero-width space | U+200B | Removed | Not a letter/digit/ASCII-space/listed-punct |
| NBSP (non-breaking space) | U+00A0 | Removed | Spaces toggle keeps only ASCII U+0020 |
| Curly quotes “ ” | U+201C / U+201D | Removed | Not in the ASCII punctuation set |
| Em dash — | U+2014 | Removed | Not in the ASCII punctuation set (- U+002D is kept) |
| Emoji 🚀 | U+1F680 | Removed | Symbol, not a letter |
| Bullet • | U+2022 | Removed | Symbol, not listed punctuation |
| Control char (e.g. STX) | U+0002 | Removed | Control characters are never kept |
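To check how a suspicious character would be classified, it can help to look up its Unicode general category. The short Python sketch below does that for the characters in the table; the `samples` dictionary and the `general_category` helper are illustrative names, and the lookup uses Python's own Unicode database rather than the tool's.

```python
import unicodedata

# Characters from the table above, written as escapes so invisible ones show up.
samples = {
    "accented e": "\u00e9",
    "CJK ideograph": "\u4e2d",
    "zero-width space": "\u200b",
    "NBSP": "\u00a0",
    "left curly quote": "\u201c",
    "em dash": "\u2014",
    "rocket emoji": "\U0001F680",
    "bullet": "\u2022",
    "STX control": "\u0002",
}

def general_category(ch):
    """Two-letter Unicode general category, e.g. 'Ll' or 'So'."""
    return unicodedata.category(ch)

for label, ch in samples.items():
    cat = general_category(ch)
    is_letter = cat.startswith("L")  # only L* categories count as letters
    print(f"U+{ord(ch):04X} {label}: category {cat}, letter={is_letter}")
```

Only the first two entries fall in a letter category. The bullet is formally punctuation in Unicode (category Po), but since it is not in the tool's fixed punctuation set the outcome is the same: it is removed.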
Cookbook
Before/after rows from real sanitisation jobs. The left column is the raw cell, the right is the output with all four keep-toggles on. Header rows are shown unchanged on purpose.
Strip invisible + symbol noise, keep the accented name
A contact export with a zero-width space wedged into a name and a trailing star rating glyph. With defaults on, the accented letters survive and only the junk is deleted.
Input (CSV):
    name,note
    José Müller,Top pick ★
    ZoëChen,☺ happy

Output (.stripped.csv, all toggles on):
    name,note
    José Müller,Top pick 
    ZoëChen, happy

Note: the header row 'name,note' is never modified.
Digits-only cleanup of a phone column copied from the web
Phone numbers pasted from a web table arrive with NBSPs, parentheses, dashes, and a leading +. Untick everything except Digits to reduce each cell to bare digits. (The + is never in the kept set; with Punctuation and Spaces off, the parentheses, dashes, and spaces go as well.)

Config: keep DIGITS only (Letters/Spaces/Punctuation OFF)

Input:
    phone
    +1 (415) 555-0142
    +44 20 7946 0958

Output:
    phone
    14155550142
    442079460958
Strict alphanumeric SKU key for a database join
SKUs must match an internal key that allows only letters and digits. Keep Letters + Digits, untick Spaces and Punctuation, so slashes, dashes, and spaces collapse out.
Config: keep LETTERS + DIGITS (Spaces/Punctuation OFF)

Input:
    sku
    AB-1029 / v2
    N-77 (new)

Output:
    sku
    AB1029v2
    N77new
Curly quotes and em dashes from a pasted PDF
Marketing copy pasted out of a PDF carries typographic quotes and em dashes. The ASCII punctuation set keeps straight quotes and hyphen, but drops their fancy cousins entirely.
Input:
    tagline
    “Built for speed” — fast ‘beta’ release

Output (all toggles on):
    tagline
    Built for speed  fast beta release

(The em dash and the curly quotes are deleted, which leaves two spaces where the dash stood; a straight hyphen - would have stayed.)
XLSX in, XLSX out — multi-column sweep
An .xlsx with several text columns. The tool reads the first sheet, strips every data cell, and writes a fresh .stripped.xlsx. Formulas/formatting are not carried (CSV round-trip), so use this on value data, not live workbooks.
Input file: contacts.xlsx (Sheet1)
    row1 (header): Name | City | Bio    <- untouched
    row2:          Anya | São Paulo★ | dev🚀

Download: contacts.stripped.xlsx
    row2 becomes:  Anya | São Paulo | dev
Edge cases and what actually happens
Accented and non-Latin letters are kept by default
By design: The Letters toggle uses Unicode \p{L}, so café, Ñoño, Ångström, Greek, Cyrillic and CJK all survive. This tool is not an ASCII-only purge. If you need to transliterate accents to plain ASCII, it will not do that; it only deletes characters and never substitutes.
Header row is never stripped
Preserved: Row 1 is returned verbatim. If your header itself contains a junk character (e.g. a BOM or a stray bullet), this tool will not remove it; clean headers with the header sanitizer instead.
Removal deletes, it never replaces with a space
Expected: A stripped character is dropped and the surrounding text closes up: A\u200bnya becomes Anya, not A nya. If a removed glyph was acting as a separator, words can fuse, so review the preview before downloading.
NBSP and tabs disappear even with Spaces ON
Expected: The Spaces toggle keeps only the ASCII space U+0020. NBSP (U+00A0), tabs (U+0009), and other Unicode spaces are removed. To convert NBSP/tabs into real spaces rather than delete them, use the whitespace trimmer.
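If the data is prepared in a script before it reaches this tool, the same conversion can be done upstream. The Python sketch below is one possible approach, not the whitespace trimmer's implementation: the helper name `spaces_to_ascii` is hypothetical, and it assumes that tabs plus every character in the Unicode space-separator category (Zs) should become a plain U+0020.

```python
import unicodedata

def spaces_to_ascii(text):
    """Replace tabs and Unicode space separators (category Zs) with U+0020.

    Zero-width space (U+200B) is a format character (Cf), not Zs, so it is
    left alone here and would still be deleted by the strip step.
    """
    out = []
    for ch in text:
        if ch == "\t" or unicodedata.category(ch) == "Zs":
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

print(repr(spaces_to_ascii("São\u00a0Paulo\tBR")))
```

Running this before stripping means words separated by an NBSP or tab stay separated instead of fusing.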
Untick all four toggles
Strips everything: If no class is ticked the keep-pattern matches nothing, so every character in every data cell is deleted and cells become empty strings. This is rarely what you want; keep at least one class on.
4-byte CJK letters are kept; only 4-byte symbols and emoji go
By design: 4-byte (astral-plane) characters are not stripped wholesale. Astral-plane CJK extensions are \p{L} letters and survive; emoji and symbols are removed because they are not letters. If your target system rejects all 4-byte characters, this tool is not the right filter.
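For a target system that rejects every 4-byte character, a separate check is needed. The Python sketch below shows one way to find or drop them, assuming "4-byte" means code points above U+FFFF (which take four bytes in UTF-8); the helper names `astral_chars` and `drop_astral` are hypothetical and not part of this tool.

```python
import unicodedata

def astral_chars(text):
    """Return the characters above U+FFFF (four bytes in UTF-8)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

def drop_astral(text):
    """Delete every character above U+FFFF, letters included."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)

# Rocket emoji (a symbol) plus U+20000, a CJK Extension B ideograph (a letter).
sample = "dev\U0001F680 \U00020000"

for ch in astral_chars(sample):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), len(ch.encode("utf-8")), "bytes")

print(repr(drop_astral(sample)))
```

The loop shows why the tool treats the two differently: both are four bytes long, but only the ideograph falls in a letter category.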
File over the tier limit
Rejected: Free tier caps at 5 MB / 10,000 rows / 1 file. Larger jobs need Pro (50 MB / 100,000 rows / 5 files), Pro-media (200 MB / 500,000 rows / 20 files), or Developer (500 MB / unlimited rows). The dropzone blocks oversized files before any processing.
Multi-sheet workbook
First sheet only: Spreadsheet input is read as the first sheet. Other sheets are not processed and are not present in the downloaded file. Split or rearrange sheets first if you need a different one.
Combining-mark accents (decomposed text)
Edge: Precomposed é (U+00E9) is a single letter and is kept. A decomposed e + combining acute (U+0301) is two code points: the base e is a letter and stays, but the standalone combining mark is \p{M}, not \p{L}, so it is removed, turning é into e. Normalise to NFC upstream if this matters.
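The difference between the two forms, and the effect of NFC normalisation, can be seen with Python's standard `unicodedata` module. This is a sketch of the upstream fix, not something the tool does itself; `letters_only` is a hypothetical stand-in for a Letters-only filter.

```python
import unicodedata

def letters_only(text):
    """Keep characters in a Unicode letter category; drop marks and the rest."""
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))

decomposed = "e\u0301"  # 'e' followed by a combining acute accent (two code points)
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(len(decomposed), repr(letters_only(decomposed)))
print(len(composed), repr(letters_only(composed)))
```

Without normalisation the accent is lost and a bare e remains; after NFC the accented letter is one code point and passes the letter check intact.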
Formulas and formatting in XLSX
Not preserved: Because the spreadsheet is round-tripped through CSV, formulas become their last cached value (or text) and cell formatting/styles are dropped. Run this on exported value data, not on a workbook you intend to keep editing.
Frequently asked questions
Does this remove accented letters like é, ñ, or ü?
No. By default the Letters toggle keeps all Unicode letters (\p{L}), which includes accented Latin, Greek, Cyrillic, and CJK. café and Ñoño come through unchanged. The only way letters are removed is if you untick the Letters toggle.
Is there a 'strict ASCII-only' mode or a code-point whitelist?
No. The tool has exactly four keep-toggles (Letters, Digits, Spaces, Punctuation) — there is no strict mode, no custom code-point range, and no regex box. Older descriptions mentioning 'strict mode' or 'whitelist ranges' are out of date; the real control surface is these four checkboxes.
What exactly counts as 'punctuation'?
A fixed ASCII set: .,-_@/()!?:;'". Anything else — # $ % & * + = [ ] { } < > | ~ ^, curly quotes, em dashes, bullets — is treated as a symbol and removed when other toggles do not cover it.
Are removed characters replaced with a space?
No. They are deleted and the text closes up. A\u200bnya becomes Anya. If a stripped glyph was separating words, the words can merge — check the preview.
Will it strip zero-width spaces and BOM characters inside cells?
Yes. Zero-width space (U+200B), zero-width joiner, BOM (U+FEFF) appearing mid-cell, and soft hyphens are not letters, digits, ASCII spaces, or listed punctuation, so they are removed with all toggles on.
Does it clean the header row too?
No — the first row is preserved verbatim so your column names stay intact. If a header is itself dirty, fix it with the header sanitizer.
What output format do I get?
The same family as the input. XLSX/XLS/ODS input downloads as <name>.stripped.xlsx (first sheet); CSV input downloads as <name>.stripped.csv.
Does the file get uploaded anywhere?
No. Parsing and stripping happen entirely in your browser via SheetJS and PapaParse. Nothing is sent to a server.
Can I strip only one column?
Not in this tool — the filter applies to every data column. If you need per-column control, run a single-column file or use the regex extractor to pull just the field you want first.
How big a file can I process?
Free: 5 MB / 10,000 rows / 1 file. Pro: 50 MB / 100,000 rows / 5 files. Pro-media: 200 MB / 500,000 rows / 20 files. Developer: 500 MB / unlimited rows.
Will it transliterate é to e or ü to u?
No. This tool only deletes characters; it never substitutes. To map accents to plain ASCII you need a transliteration step, which this tool does not perform.
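For readers who do need plain-ASCII output, a separate transliteration step might look like the Python sketch below. It is an assumption-laden illustration, not a feature of this tool: the helper name `fold_accents` is hypothetical, and the technique (decompose to NFD, then drop combining marks) only handles accents that Unicode can decompose. Letters such as ø or ß have no decomposition and pass through unchanged.

```python
import unicodedata

def fold_accents(text):
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_accents("café Müller"))
print(fold_accents("Ñoño"))
print(fold_accents("ø ß"))  # no decomposition exists, so these stay as they are
```

A full transliteration usually needs an explicit mapping table on top of this for the characters that do not decompose.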
What is the difference between this and the whitespace trimmer?
This deletes any character outside the kept classes (including NBSP and tabs). The whitespace trimmer focuses on normalising and collapsing whitespace (e.g. turning NBSP/tabs into spaces) rather than deleting non-text noise. Use this for symbol/control-character removal.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.