How to strip zero-width spaces and invisible characters from web-form Excel exports
- Step 1: Export responses to a spreadsheet. Typeform: Results → Responses → Download (XLSX/CSV). Google Forms: Responses → ⋮ → Download responses (.csv), or open in Sheets → File → Download → .xlsx. Jotform/Tally: Submissions → Export. Drop the file onto the tool.
- Step 2: Keep the default toggles for invisible-character cleanup. Leave all four (Letters, Digits, Spaces, Punctuation) ticked. This keeps normal answer text, including accents, and deletes only invisible characters and stray symbols.
- Step 3: Decide whether NBSP should be deleted or converted. With Spaces on, only the ASCII space is kept; NBSP and other Unicode spaces are deleted. If you would rather convert NBSP to a real space, do that in the whitespace trimmer first, then run this tool for the remaining noise.
- Step 4: Run the strip. Click Strip special chars. Each data cell is filtered character by character; the header row (your question text) is skipped and left exactly as exported.
- Step 5: Verify with the cells-modified count and preview. The result panel shows Cells modified and Data rows, plus a first-10-rows preview. A surprisingly high cells-modified count usually means the form was injecting hidden characters on many responses.
- Step 6: Download the cleaned export. Click Download: .stripped.xlsx for spreadsheet input, .stripped.csv for CSV. Re-import to your CRM, database, or mailing platform with the hidden characters gone.
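The character-by-character filter in Step 4 can be sketched in Python. This is an illustration under stated assumptions, not the tool's actual code (which runs in the browser): the names `keep_char` and `strip_cell` are hypothetical, Letters is modelled as Unicode general categories starting with `L` (Python's standard `re` module has no `\p{L}`), Digits as category `Nd`, Spaces as the single ASCII space, and Punctuation as `string.punctuation`.

```python
import string
import unicodedata

# Assumption: "Punctuation" means the 32 ASCII punctuation characters.
ASCII_PUNCT = set(string.punctuation)


def keep_char(ch, letters=True, digits=True, spaces=True, punctuation=True):
    """Return True if ch belongs to at least one enabled class."""
    category = unicodedata.category(ch)
    if letters and category.startswith("L"):   # any Unicode letter
        return True
    if digits and category == "Nd":            # decimal digits (assumed)
        return True
    if spaces and ch == " ":                   # ASCII space only, not NBSP
        return True
    if punctuation and ch in ASCII_PUNCT:
        return True
    return False


def strip_cell(text, **toggles):
    """Delete every character that no enabled class keeps (no replacement)."""
    return "".join(ch for ch in text if keep_char(ch, **toggles))


print(strip_cell("user@x.com\u200b") == "user@x.com")  # True
```

Deleted characters are dropped rather than replaced, so neighbouring characters close up, which matches the NBSP behaviour described in Step 3.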
Where form builders inject invisible characters
Typical sources of hidden Unicode in web-form exports and whether this tool removes them with default toggles (all four on).
| Source | Injected character | Removed by default? | Symptom it causes |
|---|---|---|---|
| Copy-paste into a text field | Zero-width space U+200B | Yes | Unique-constraint failure; search never matches |
| Mobile keyboard / autocorrect | NBSP U+00A0 | Yes | 'New\u00a0York' ≠ 'New York' in WHERE clauses |
| First field / encoding | BOM U+FEFF (mid-cell) | Yes | Leading \ufeff breaks exact-match joins |
| Paste from Word/PDF | Curly quotes “ ”, em dash — | Yes | API JSON escaping / display glitches |
| Autofill control bytes | C0/C1 control chars | Yes | Boxes, corrupted CRM payloads |
| Emoji in free-text answers | 🙂 and other symbols | Yes | Index bloat, downstream encoding errors |
| Respondent's accented name | é, ñ, ü, ø, CJK | No — kept | Legitimate data, preserved by \p{L} |
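To see which of the invisible characters in the table above are actually present in a cell before cleaning, a small diagnostic along these lines may help. It is a sketch only: `describe_hidden` and `SUSPECT_CATEGORIES` are hypothetical names, and the check covers only control, format, and non-ASCII space categories, not emoji or curly quotes.

```python
import unicodedata

# Assumption: control (Cc), format (Cf) and separator (Zs, Zl, Zp) characters
# are the "invisible" ones worth reporting; the ASCII space is exempt.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def describe_hidden(text):
    """List (index, code point, category, name) for each suspect character."""
    found = []
    for index, ch in enumerate(text):
        if ch == " ":
            continue
        category = unicodedata.category(ch)
        if category in SUSPECT_CATEGORIES:
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((index, f"U+{ord(ch):04X}", category, name))
    return found


print(describe_hidden("New\u00a0York"))
# [(3, 'U+00A0', 'Zs', 'NO-BREAK SPACE')]
```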
Toggle settings for common form-cleanup goals
Pick toggles to match the destination system. Defaults (all on) are right for most form data; untick to be stricter.
| Goal | Letters | Digits | Spaces | Punctuation |
|---|---|---|---|---|
| General invisible-character cleanup (keep readable text) | On | On | On | On |
| Strict alphanumeric IDs from a 'reference' field | On | On | Off | Off |
| Digits-only phone field | Off | On | Off | Off |
| Letters-only first/last name field | On | Off | On | Off |
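The toggle combinations in the table can be modelled as presets passed to a filter. The sketch below is an assumption about how the classes behave, not the tool's implementation: `PRESETS` and `strip_cell` are hypothetical names, letters are any Unicode category beginning with `L`, digits are category `Nd`, and punctuation is `string.punctuation`.

```python
import string
import unicodedata

# One entry per row of the table above.
PRESETS = {
    "general":   dict(letters=True,  digits=True, spaces=True,  punctuation=True),
    "strict_id": dict(letters=True,  digits=True, spaces=False, punctuation=False),
    "phone":     dict(letters=False, digits=True, spaces=False, punctuation=False),
    "name":      dict(letters=True,  digits=False, spaces=True, punctuation=False),
}


def strip_cell(text, letters, digits, spaces, punctuation):
    """Keep only characters that fall in an enabled class."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if ((letters and category.startswith("L"))
                or (digits and category == "Nd")
                or (spaces and ch == " ")
                or (punctuation and ch in string.punctuation)):
            kept.append(ch)
    return "".join(kept)


print(strip_cell("+1 (555) 010-9999", **PRESETS["phone"]))  # 15550109999
print(strip_cell("REF-42 a", **PRESETS["strict_id"]))       # REF42a
```

Note that stricter presets also remove visible characters such as hyphens and plus signs, so they suit single-purpose fields rather than free text.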
Cookbook
Real (anonymised) rows from form exports. The hidden characters are shown as escape sequences for clarity — in the actual file they are invisible.
Zero-width space breaking an email unique key
A Typeform email field with a trailing zero-width space from the respondent pasting their address. Invisible in the sheet, but the row fails your database's unique-email constraint because it is not byte-identical to the same email typed cleanly.
Input (CSV):
email,name
user@x.com\u200b,Ana
user2@x.com,Bram
Output (.stripped.csv, defaults):
email,name
user@x.com,Ana
user2@x.com,Bram
NBSP in a city answer that never matched a filter
Google Forms response where the mobile keyboard inserted a non-breaking space inside 'New York'. Your dashboard's WHERE city = 'New York' returned nothing for this respondent. Spaces-on still deletes NBSP (only ASCII space is kept).
Input:
city
New\u00a0York
Los Angeles
Output (defaults):
city
NewYork
Los Angeles
(If you wanted 'New York' with a normal space, run the whitespace trimmer first.)
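The order of the two passes matters, which a short Python sketch can show. The function `nbsp_to_space` stands in for the whitespace trimmer and `strip_cell` for this tool with default toggles; both names and their exact behaviour are assumptions made for illustration.

```python
import string
import unicodedata


def nbsp_to_space(text):
    """Stand-in for the whitespace trimmer: turn NBSP into an ASCII space."""
    return text.replace("\u00a0", " ")


def strip_cell(text):
    """Stand-in for the default strip: letters, digits, ASCII space, ASCII punctuation."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch).startswith("L")
        or unicodedata.category(ch) == "Nd"
        or ch == " "
        or ch in string.punctuation
    )


city = "New\u00a0York"
print(strip_cell(city))                 # NewYork
print(strip_cell(nbsp_to_space(city)))  # New York
```

Running the strip first loses the information that a separator was ever there, so the conversion has to come before it.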
BOM stuck to the first answer
Jotform export with a BOM (U+FEFF) prepended to the first data cell. The leading invisible byte broke an exact-match join against a clean reference list.
Input:
ref,answer
\ufeffABC123,Yes
ABC124,No
Output (defaults):
ref,answer
ABC123,Yes
ABC124,No
Keep the accented name, drop the emoji
A free-text 'About you' field where the respondent's name is accented and they added an emoji. Defaults keep the name and remove the emoji and any control noise.
Input:
name,about
Søren Müller,Loves coffee ☕ 🙂
José,Dev 🚀
Output (defaults):
name,about
Søren Müller,Loves coffee
José,Dev
XLSX export cleaned and re-saved as XLSX
Typeform XLSX with several text columns. The tool reads the first sheet, strips hidden noise across all data cells, and gives back a .stripped.xlsx. Header (question text) untouched.
Input: typeform-results.xlsx (Sheet1)
header: Email | Full name | Comments
row: a@x.com | An\u00a0ya | great👍
Download: typeform-results.stripped.xlsx
row becomes: a@x.com | Anya | great
Edge cases and what actually happens
Respondent's accented name is preserved
Preserved: José, Søren, Ş, and non-Latin scripts are \p{L} letters and stay. Cleaning hidden characters never strips legitimate multilingual answers as long as Letters is on.
NBSP becomes nothing, not a space
Expected: With Spaces on, only the ASCII space is kept; NBSP (U+00A0) is deleted, so New\u00a0York becomes NewYork. If you need it to become New York, normalise NBSP→space in the whitespace trimmer first.
Header (form question text) is never cleaned
Preserved: Row 1 is returned verbatim, so your question wording survives as column names. If a question itself contains a junk byte, sanitise headers with the header rename tool.
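A CSV pass that leaves row 1 untouched and counts modified cells might look like the following Python sketch. It mirrors the described behaviour (header verbatim, data cells filtered, a cells-modified count) but is not the tool's code; `clean_csv` and `strip_cell` are hypothetical names and the character classes are assumptions.

```python
import csv
import io
import string
import unicodedata


def strip_cell(text):
    """Default toggles: keep letters, digits, ASCII space, ASCII punctuation."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch).startswith("L")
        or unicodedata.category(ch) == "Nd"
        or ch == " "
        or ch in string.punctuation
    )


def clean_csv(text):
    """Return (cleaned CSV text, number of data cells that changed)."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return "", 0
    header, data = rows[0], rows[1:]
    modified = 0
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # header row is written back unchanged
    for row in data:
        cleaned = [strip_cell(cell) for cell in row]
        modified += sum(1 for old, new in zip(row, cleaned) if old != new)
        writer.writerow(cleaned)
    return out.getvalue(), modified


cleaned_text, cells_modified = clean_csv("ref,answer\n\ufeffABC123,Yes\nABC124,No\n")
print(cells_modified)  # 1
```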
Words fuse where a zero-width joiner sat between them
Expected: Removal deletes with no replacement. A zero-width joiner sitting between two visible characters is dropped and they close up. Usually harmless, but verify in the preview for languages that rely on ZWJ.
Emoji and pictographs are removed
Expected: Emoji are symbols, not letters, so they are deleted with default toggles. There is no option to keep them; if you must retain emoji, this is not the tool.
Curly quotes in a comment field are stripped
Expected: Smart quotes “ ” ‘ ’ are not in the ASCII punctuation set, so they are removed. Straight quotes ' and " are kept. The tool deletes the curly glyph rather than converting it to a straight quote.
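If the quotation marks carry meaning, one option is to convert curly quotes to straight ones upstream, before the file reaches this tool. The Python sketch below shows one way to do that; it is a separate preprocessing step with hypothetical names (`QUOTE_MAP`, `straighten_quotes`), not a feature of the tool.

```python
# Map the four common curly quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
})


def straighten_quotes(text):
    """Replace curly quotes with straight quotes; leave everything else alone."""
    return text.translate(QUOTE_MAP)


print(straighten_quotes("\u201cit\u2019s fine\u201d"))  # "it's fine"
```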
Multi-sheet export
First sheet only: Only the first sheet of an XLSX/ODS file is read and written back. Additional response tabs are not included in the output.
File exceeds the free limit
Rejected: Free tier: 5 MB / 10,000 rows / 1 file. Large response sets need Pro (50 MB / 100,000 rows / 5 files) or higher. Oversized files are rejected at the dropzone.
All toggles unticked
Strips everything: If no class is kept, every data cell becomes empty. Keep at least Letters on for form text.
Decomposed accents from some keyboards
Edge: A precomposed é is kept; a decomposed e + combining acute (U+0301) keeps the e but may drop the combining mark (it is \p{M}, not \p{L}). Normalise responses to NFC upstream if exact glyphs matter.
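The effect of NFC normalisation can be demonstrated with Python's `unicodedata.normalize`. The filter below is a stand-in for the default toggles under the assumption that combining marks (category `Mn`) are not kept; `strip_cell` and `strip_cell_nfc` are hypothetical names.

```python
import string
import unicodedata


def strip_cell(text):
    """Stand-in for the default strip; combining marks are not letters."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch).startswith("L")
        or unicodedata.category(ch) == "Nd"
        or ch == " "
        or ch in string.punctuation
    )


def strip_cell_nfc(text):
    """Compose e + U+0301 into a single letter before filtering."""
    return strip_cell(unicodedata.normalize("NFC", text))


decomposed = "Jose\u0301"          # 'e' followed by a combining acute accent
print(strip_cell(decomposed))      # Jose
print(strip_cell_nfc(decomposed))  # José
```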
Frequently asked questions
Will this delete my respondents' accented names?
No. The Letters toggle keeps all Unicode letters (\p{L}), including accented Latin and non-Latin scripts. José and Søren survive. Only invisible characters and symbols are removed.
Does it remove zero-width spaces?
Yes. U+200B (zero-width space), zero-width joiner, BOM (U+FEFF) appearing mid-cell, and soft hyphens are all deleted with default toggles, because none of them is a kept letter/digit/ASCII-space/punctuation.
Why does 'New York' from a form not match my database?
Almost always an NBSP (U+00A0) from a mobile keyboard sitting where a space should be. This tool deletes the NBSP. If you specifically want it turned into a normal space instead, run the whitespace trimmer first, then this tool for the remaining noise.
Does it strip the BOM at the start of the file?
It removes BOM characters that appear inside data cells. A BOM at the very start of a CSV file is normally handled by the parser; if a BOM ends up wedged in the first cell value, it is removed as noise.
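How a parser handles a file-level BOM depends on the decoder. As an analogy in Python (the tool's own browser-based parser may work differently), decoding with `utf-8-sig` consumes a leading BOM, while plain `utf-8` leaves it attached to the first cell. The helper name `first_header` is hypothetical.

```python
import csv
import io

# A CSV file whose bytes start with the UTF-8 BOM (EF BB BF).
raw = "\ufeffref,answer\nABC123,Yes\n".encode("utf-8")


def first_header(raw_bytes, encoding):
    """Decode the bytes and return the first cell of the first row."""
    text = raw_bytes.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))[0]


print(first_header(raw, "utf-8") == "\ufeffref")  # True: BOM wedged in the cell
print(first_header(raw, "utf-8-sig") == "ref")    # True: BOM consumed by the decoder
```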
Are emoji removed?
Yes. Emoji are symbols, not letters, so they are deleted with default toggles. There is no keep-emoji option.
Does it change my form's question text (the headers)?
No. The header row is preserved exactly. To clean a dirty question/header, use the header rename tool.
What formats can I upload and download?
Upload XLSX, XLS, ODS, or CSV. Download mirrors the input: .stripped.xlsx for spreadsheets, .stripped.csv for CSV.
Is respondent data uploaded to a server?
No. All parsing and stripping run in your browser. PII in form responses never leaves your machine.
Can I clean just the email column?
Not directly — the filter runs on all data columns. To isolate one field, export a single-column file, or pull the field first with the regex extractor.
Will it fix duplicate responses?
No. It removes characters; it does not deduplicate rows. After cleaning hidden characters (which is what makes duplicates look distinct), deduplicate with the deduplicator.
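The clean-then-deduplicate order can be illustrated in Python. This sketch uses a stand-in filter for the default toggles and an order-preserving dedupe; `strip_cell` and `dedupe_after_cleaning` are hypothetical names, and the deduplicator tool itself may compare rows differently.

```python
import string
import unicodedata


def strip_cell(text):
    """Stand-in for the default strip: letters, digits, ASCII space, ASCII punctuation."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch).startswith("L")
        or unicodedata.category(ch) == "Nd"
        or ch == " "
        or ch in string.punctuation
    )


def dedupe_after_cleaning(values):
    """Clean each value, then drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(strip_cell(v) for v in values))


emails = ["user@x.com\u200b", "user@x.com", "user2@x.com"]
print(len(set(emails)))               # 3: the hidden character hides the duplicate
print(dedupe_after_cleaning(emails))  # ['user@x.com', 'user2@x.com']
```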
How many responses can I process at once?
Free: 10,000 rows / 5 MB / 1 file. Pro: 100,000 rows / 50 MB / 5 files. Pro-media: 500,000 rows / 200 MB / 20 files. Developer: unlimited rows / 500 MB.
Does removing a character leave a gap?
No. Deleted characters close up — An\u00a0ya becomes Anya. Check the preview if you are worried a removed separator caused words to merge.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.