How to strip encoding-artefact symbols from a CSV
- Step 1: Confirm the file is actually mis-encoded. Open the CSV in a plain text editor (not Excel). If you see Ã© where é should be, or â€™ for an apostrophe, it is a UTF-8/Latin-1 mismatch. Decide whether you can re-export; that is always the better fix.
- Step 2: If re-export is impossible, drop the file onto the stripper. It accepts CSV, XLSX, XLS, and ODS. Free tier: 2 MB / 500 rows. Pro: 100 MB / 100,000 rows. PapaParse auto-detects the delimiter.
- Step 3: Leave all four keep boxes on. Letters, Digits, Spaces, and Punctuation stay checked. This keeps real text and deletes the symbol noise. There is no per-column option, so the strip applies to all data cells.
- Step 4: Run Strip special chars. Symbol fragments of the mojibake are deleted; letter fragments remain. The result is cleaner but lossy; see the cookbook for exactly what survives.
- Step 5: Review the preview critically. Check names and descriptions in the first-10-row preview. If words still look wrong (e.g. Ã© became Ã, not é), the stripper cannot recover them; note which columns need a proper re-encode.
- Step 6: Download and import, or escalate to a re-encode. Download writes <name>.stripped.csv as UTF-8. If too much meaning was lost, go back to the source system and re-export with UTF-8, or use csv-cleaner's encoding handling instead.
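Step 1 can also be approximated in code. The sketch below is a heuristic and not part of the stripper: it assumes the damage came from UTF-8 bytes being decoded as Windows-1252, and the function name `looks_like_mojibake` is made up for illustration.

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if text reads like UTF-8 bytes that were decoded as cp1252."""
    try:
        # Undo the assumed wrong decode, then decode the bytes as UTF-8.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Correctly encoded accented text usually fails this round trip.
        return False
    # Pure ASCII survives unchanged, so only a changed result is suspicious.
    return repaired != text


# "\u00c3\u00a9" is the two-character mojibake for e-acute.
print(looks_like_mojibake("\u00c3\u00a9"))
# A correctly encoded name (c-cedilla written as an escape).
print(looks_like_mojibake("Fran\u00e7ois"))
```

A True result suggests the column needs a re-encode rather than a strip. Other encoding mix-ups would need a different pair of codecs.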
What the stripper actually does to mojibake (all boxes on)
Behaviour verified character by character against the keep-pattern. Note how letter-classified bytes survive — this is why the tool is a partial clean, not a repair.
| Intended char | Mojibake seen | After stripping | Recovered? |
|---|---|---|---|
| é | Ã© | Ã (the © is removed, Ã is a letter) | No, mangled further |
| ’ (curly apostrophe) | â€™ | â (€ and ™ removed) | No, left a stray â |
| € | â‚¬ | â (‚ and ¬ removed) | No |
| ü | Ã¼ | Ã¼ (both are letters, kept) | No, fully preserved garbage |
| BOM at start of cell | \ufeffid | id | Yes, BOM is not a letter |
| Stray Â before NBSP | Â | removed if no letter, Â kept if letter-classified | Partial |
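The keep-pattern behaviour in the table can be approximated with a short Python sketch. This is not the tool's code (the tool runs in the browser). The names `keep`, `strip_cell`, and `ASSUMED_PUNCTUATION` are hypothetical, the punctuation list is a guess at an ASCII subset, and Digits is assumed to mean decimal digits only.

```python
import unicodedata

# Assumed stand-in for the Punctuation box; the tool's real list is not documented here.
ASSUMED_PUNCTUATION = set(".,;:!?'\"()-")


def keep(ch: str) -> bool:
    """True if ch falls in one of the four keep classes (all boxes on)."""
    category = unicodedata.category(ch)
    return (
        category.startswith("L")        # Letters of any script, like \p{L}
        or category == "Nd"             # Digits (assumption: decimal digits only)
        or ch.isspace()                 # Spaces
        or ch in ASSUMED_PUNCTUATION    # Punctuation (assumption: ASCII subset)
    )


def strip_cell(cell: str) -> str:
    """Delete every character outside the keep classes."""
    return "".join(ch for ch in cell if keep(ch))


print(strip_cell("\u00c3\u00a9"))              # mojibake for e-acute
print(strip_cell("O\u00e2\u20ac\u2122Brien"))  # mojibake for a curly apostrophe
print(strip_cell("\ufeffid"))                  # BOM-prefixed cell
```

Under these assumptions the first call leaves only the letter Ã and the second leaves a stray â, which is the partial clean the table describes.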
Choose the right tool for the encoding problem
This stripper is the last-resort symbol deleter. For a real fix, a different tool or step is better.
| Your situation | Best tool / step | Why |
|---|---|---|
| You can re-export from the source system | Re-export with UTF-8 selected | The only true fix — recovers original characters losslessly |
| You need curly quotes / NBSP folded to ASCII | csv-cleaner | It normalises (substitutes) rather than deletes, and handles encoding/BOM |
| You want to delete one specific artefact pattern | csv-find-replace | Targeted find/replace incl. regex — surgical, no collateral deletion |
| You just need the visible symbol garbage gone for an import | This stripper | Fast bulk deletion of all non-keep characters |
| The file won't import because of a BOM in the header | This stripper (header kept, but BOM in data cells removed) or csv-cleaner | BOM is stripped from data cells; csv-cleaner handles the header BOM too |
Cookbook
Real re-encoded exports, before and after. Each shows exactly what the stripper keeps vs. deletes so you can judge whether it is enough or whether you must re-encode.
Apostrophe mojibake from Excel re-save
Example: A UTF-8 CSV opened and saved as plain CSV in Excel turns curly apostrophes into â€™. The stripper deletes € and ™ but keeps the letter â, so O'Brien written as Oâ€™Brien becomes OâBrien: readable-ish, still wrong. Decide if that is acceptable for your import.
Input (mis-encoded):
    id,name
    1,Oâ€™Brien
    2,Dâ€™Angelo
Output (all boxes on):
    id,name
    1,OâBrien
    2,DâAngelo
The apostrophe is gone but a stray 'â' remains. A UTF-8 re-export would give O'Brien correctly.
BOM in the first header cell breaking a match
Example: A BOM prepended to the file shows as \ufeffid in the first header cell, so a script matching on id fails. Important nuance: this stripper never modifies the header row, so a BOM on the header stays. It does strip BOMs that appear inside data cells. For a header BOM, use csv-cleaner.
Header BOM (NOT stripped here; header is protected):
    \ufeffid,name
    1,Acme
Data-cell BOM (stripped):
    id,note
    1,\ufeffstarts with bom
Output:
    id,note
    1,starts with bom
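If the file is read by a script rather than re-saved by hand, a header BOM can be dropped at decode time. A minimal sketch using Python's standard `utf-8-sig` codec; the sample bytes are invented for illustration.

```python
import csv
import io

# Raw bytes of a small CSV that starts with a UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfid,name\r\n1,Acme\r\n"

# Plain "utf-8" keeps the BOM as U+FEFF glued to the first header cell.
with_bom = next(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))

# "utf-8-sig" drops a leading BOM if one is present.
without_bom = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"), newline="")))

print(with_bom[0] == "id")     # False
print(without_bom[0] == "id")  # True
```

`utf-8-sig` only removes a BOM at the very start of the data, so a BOM embedded in a later cell is left alone.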
Euro-symbol mojibake in a price note
Example: A Latin-1 mis-decode renders € as â‚¬. The stripper removes ‚ and ¬ (not letters) but keeps â. The price still reads oddly. If the column is numeric, strip the symbol with find-replace instead so you don't leave a stray letter.
Input:
    id,blurb
    1,Price â‚¬49 incl. VAT
Output (all boxes on):
    id,blurb
    1,Price â49 incl. VAT
Use /tool/csv-find-replace to delete just 'â‚¬' cleanly, or re-export with UTF-8 to get the real €.
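The targeted alternative amounts to replacing one exact string. The sketch below shows the idea in plain Python; it is not csv-find-replace's implementation, and `drop_artefact` and `restore_artefact` are hypothetical names.

```python
# The three-character mojibake for the euro sign, written with escapes.
EURO_MOJIBAKE = "\u00e2\u201a\u00ac"


def drop_artefact(cell: str, artefact: str = EURO_MOJIBAKE) -> str:
    """Remove one exact artefact string and leave every other character alone."""
    return cell.replace(artefact, "")


def restore_artefact(cell: str, artefact: str = EURO_MOJIBAKE, intended: str = "\u20ac") -> str:
    """Swap the artefact for the character it was meant to be."""
    return cell.replace(artefact, intended)


print(drop_artefact("Price " + EURO_MOJIBAKE + "49 incl. VAT"))
print(drop_artefact("Tools & Parts"))  # a genuine & is untouched
```

Unlike the bulk strip, nothing outside the exact match changes, so no stray letter is left behind.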
Fully letter-classified mojibake survives unchanged
Example: When both characters of the mojibake are letters (Ã¼ for ü), the stripper changes nothing; both are kept by \p{L}. This is the clearest case where a stripper cannot help and you must re-encode.
Input:
    id,city
    1,MÃ¼nchen
    2,ZÃ¼rich
Output (all boxes on), UNCHANGED:
    id,city
    1,MÃ¼nchen
    2,ZÃ¼rich
Only a UTF-8 re-export or csv-cleaner can fix these.
Mixing artefact cleanup with genuine symbols you want gone
Example: If your real goal is just to make a noisy export importable, accept that some legitimate symbols also go. Here ™ (artefact-adjacent) and a genuine & are both removed. Confirm the import target tolerates the loss.
Input:
    id,desc
    1,Acmeâ„¢ Tools & Parts
Output (all boxes on):
    id,desc
    1,Acmeâ Tools  Parts
The '™' fragment and '&' are removed; a stray 'â' and a double space remain. Trim with /tool/csv-whitespace-trimmer.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Stripping does not recover the original character
Not fixed: Removing the symbol bytes of mojibake leaves a shortened token (e.g. Ã© → Ã), never the intended é. Recovery requires re-decoding the original bytes. Re-export from the source with UTF-8, or use csv-cleaner, which handles encoding properly.
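Re-decoding, as opposed to stripping, can be sketched in a few lines of Python. This assumes the mix-up was UTF-8 read as Windows-1252; `redecode` is a hypothetical helper, not a function of any tool named here.

```python
def redecode(text: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Reverse a wrong decode: recover the original bytes, then decode them correctly."""
    return text.encode(wrong).decode(right)


# Intact mojibake still carries every original byte, so it can be repaired.
print(redecode("\u00c3\u00a9") == "\u00e9")  # True

# After stripping, only the first byte (the letter) is left and repair fails.
try:
    redecode("\u00c3")
except UnicodeDecodeError:
    print("stripped token cannot be re-decoded")
```

This is why the order matters: re-decode first if there is any chance of recovery, and strip only what is left.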
Both mojibake bytes are letters, so nothing changes
No change: Ã¼ (for ü) and Ã©-style pairs where both characters are letters are fully preserved by \p{L}. The result shows 0 cells modified for those cells. This tool cannot help with that pattern; re-encode at source.
BOM on the header row is not removed
Preserved: The first row is protected, so a BOM that prefixes the first header cell (\ufeffid) survives. Use csv-cleaner, which detects and strips the BOM from the header, or open and re-save the file as UTF-8 without BOM.
BOM inside a data cell is removed
Supported: If a BOM (U+FEFF) ended up inside a data cell rather than at file start, it is deleted because it is not a kept character. This silently fixes the rare case where a concatenation step embedded a BOM mid-file.
Output is always written as UTF-8
By design: Downloads are encoded as UTF-8 regardless of the input's original encoding; the tool does not let you choose the output encoding. If your import target needs Latin-1 or UTF-16, convert after download with a dedicated encoder.
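The conversion after download can be a few lines of Python. A minimal sketch, assuming the target wants Latin-1; `to_latin1` is a hypothetical name, and replacing unrepresentable characters with `?` is one choice among several error handlers.

```python
def to_latin1(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin-1; characters Latin-1 lacks become '?'."""
    return utf8_bytes.decode("utf-8").encode("latin-1", errors="replace")


# u-umlaut exists in Latin-1 and becomes the single byte 0xFC.
print(to_latin1("Z\u00fcrich".encode("utf-8")))

# The euro sign does not exist in Latin-1, so it is replaced.
print(to_latin1("\u20ac49".encode("utf-8")))
```

For a file, read the downloaded bytes, pass them through the function, and write the result in binary mode. Use `errors="strict"` instead if silent replacement is not acceptable.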
Genuine symbols are deleted alongside artefacts
Expected: Because it is a whitelist, &, #, %, +, =, and currency symbols go too, even where they are intentional. If you need to remove only the artefact, use csv-find-replace to target the exact garbled string.
Curly quotes vanish rather than fold to straight
By design: “ ” ‘ ’ are removed, not converted to " or '. If you want them folded, csv-cleaner's smart-quote normalisation is the correct tool.
File over the tier limit is blocked
Blocked: Free is 2 MB / 500 rows; Pro is 100 MB / 100,000 rows. A large re-encoded dump is blocked at upload. Split with csv-row-splitter or upgrade before stripping.
Double spaces remain after deleting a multi-char artefact
Expected: Deleting a space-flanked artefact leaves adjacent spaces. The stripper never collapses whitespace. Chain csv-whitespace-trimmer to tidy up afterwards.
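The tidy-up step is easy to reproduce in a script. A sketch of the idea only; it does not claim to match what csv-whitespace-trimmer does internally, and `collapse_spaces` is a hypothetical name.

```python
import re


def collapse_spaces(cell: str) -> str:
    """Replace runs of two or more spaces with one space and trim both ends."""
    return re.sub(r" {2,}", " ", cell).strip()


# A stripped cell with the double space left where '&' was deleted.
print(collapse_spaces("Acme\u00e2 Tools  Parts"))
```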
All four boxes unchecked does nothing
No change: With no class enabled, the keep-pattern becomes /./ and matches everything, so no character is removed. Keep at least Letters and Spaces on for any artefact cleanup to occur.
Frequently asked questions
Can this recover my original accented characters from mojibake?
No. It deletes the non-letter pieces of a garbled sequence but cannot reconstruct the intended character. The only reliable fix is re-exporting from the source system with UTF-8 encoding, or using csv-cleaner's encoding handling.
Why is there still a stray 'â' or 'Ã' after stripping?
Those bytes are letters under \p{L}, so the keep-list preserves them while removing the surrounding symbol bytes. That is the inherent limitation of a character stripper against mojibake.
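To see which pieces of a garbled sequence count as letters, you can list each character's Unicode general category. A small sketch using Python's standard `unicodedata` module; `categories` is a hypothetical helper. Categories starting with `L` are letters, while `Sc` and `So` are currency and other symbols.

```python
import unicodedata


def categories(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode general category."""
    return [(ch, unicodedata.category(ch)) for ch in text]


# Mojibake for e-acute: a letter followed by the copyright sign.
print(categories("\u00c3\u00a9"))
# Mojibake for a curly apostrophe: a letter, the euro sign, the trademark sign.
print(categories("\u00e2\u20ac\u2122"))
```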
Does it remove the BOM?
It removes a BOM that appears inside a data cell, but not one on the header row, because the header is never modified. For a header BOM, use csv-cleaner or re-save the file as UTF-8 without BOM.
Is this tool a re-encoder?
No. It does not change character encoding or transliterate. It only deletes characters outside your keep-set. The download is always UTF-8.
When should I use csv-cleaner instead?
When you want to normalise rather than delete — fold curly quotes and NBSPs to ASCII, handle BOM and encoding detection. csv-cleaner substitutes; this stripper deletes.
When is csv-find-replace the better choice?
When you want to remove exactly one artefact string (like â€™) without deleting any other symbols. Find-replace is surgical and supports regex; the stripper is bulk and indiscriminate.
Will it touch correctly-encoded accented names elsewhere in the file?
No. Properly-encoded letters of any script are kept by \p{L}. Only non-letter symbols are removed, so a clean François in another column is untouched.
What file formats can I use?
CSV, XLSX, XLS, and ODS. Spreadsheets are converted to rows, stripped, and downloaded in the same format; CSV downloads as UTF-8 with a .stripped.csv suffix.
Is anything uploaded to a server?
No. Parsing and stripping happen in your browser. A sensitive export never leaves your machine.
What are the limits?
Free: 2 MB and 500 data rows. Pro: 100 MB and 100,000 rows. Larger files are blocked at upload.
How do I prevent encoding artefacts in the first place?
When saving from Excel, choose 'CSV UTF-8' rather than the plain 'CSV' option, and confirm the source export uses UTF-8. Never round-trip a UTF-8 file through a Latin-1 save.
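The same rule applies when the export comes from a script rather than Excel: name the encoding explicitly instead of relying on the platform default. A minimal Python sketch with invented sample rows and a temporary file path.

```python
import csv
import os
import tempfile

# Sample rows with non-ASCII characters written as escapes.
rows = [["id", "name"], ["1", "Fran\u00e7ois"], ["2", "O\u2019Brien"]]

path = os.path.join(tempfile.mkdtemp(), "export.csv")

# Name the encoding explicitly; the platform default may not be UTF-8.
with open(path, "w", encoding="utf-8", newline="") as handle:
    csv.writer(handle).writerows(rows)

# Reading back with the same encoding returns the original text.
with open(path, encoding="utf-8", newline="") as handle:
    round_trip = list(csv.reader(handle))

print(round_trip == rows)  # True
```

If the file will be opened in Excel, `encoding="utf-8-sig"` writes a leading BOM, which Excel typically uses to recognise the file as UTF-8. That is the same BOM that can break header matching in other consumers.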
Why do I get double spaces after cleanup?
Deleting a multi-character artefact between two spaces leaves both spaces. The stripper doesn't collapse whitespace; run csv-whitespace-trimmer afterwards.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.