Fix Encoding Artefacts in a CSV — Strip Stray Symbols Online

How to strip encoding-artefact symbols from a CSV

Step 1
Confirm the file is actually mis-encoded. Open the CSV in a plain text editor (not Excel). If you see Ã© where é should be, or â€™ for an apostrophe, the file's UTF-8 bytes were decoded as Windows-1252 (often loosely called Latin-1). Decide whether you can re-export; that is always the better fix.
Step 2
If re-export is impossible, drop the file onto the stripper. It accepts CSV, XLSX, XLS, and ODS. Free tier: 2 MB / 500 rows. Pro: 100 MB / 100,000 rows. PapaParse auto-detects the delimiter.
Step 3
Leave all four keep boxes on. Letters, Digits, Spaces, and Punctuation stay checked. This keeps real text and deletes the symbol noise. There is no per-column option, so the strip applies to all data cells.
Step 4
Run Strip special chars. Symbol fragments of the mojibake are deleted; letter fragments remain. The result is cleaner but lossy; see the cookbook for exactly what survives.
Step 5
Review the preview critically. Check names and descriptions in the first-10-row preview. If words still look wrong (e.g. Ã© became Ã, not é), the stripper cannot recover them; note which columns need a proper re-encode.
Step 6
Download and import, or escalate to a re-encode. Download writes <name>.stripped.csv as UTF-8. If too much meaning was lost, go back to the source system and re-export with UTF-8, or use csv-cleaner's encoding handling instead.
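
The check in Step 1 can also be scripted. The following is a minimal sketch, not part of the stripper: it flags lines that contain the marker sequences shown on this page. The pattern and the helper name suspicious_lines are illustrative assumptions, and the marker list is not exhaustive.

```python
import re

# Marker sequences taken from the examples on this page (illustrative only):
# "Ã" followed by a character in U+00A0..U+00BF, "â€", "â‚¬", or "Â" + NBSP.
MOJIBAKE = re.compile("Ã[\u00a0-\u00bf]|â€|â‚¬|Â\u00a0")


def suspicious_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain a mojibake marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if MOJIBAKE.search(line)
    ]


sample = "id,name\n1,Oâ€™Brien\n2,François\n3,MÃ¼nchen"
for number, line in suspicious_lines(sample):
    print(number, line)
```

A correctly encoded name such as François is not flagged, because ç on its own matches none of the markers.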

What the stripper actually does to mojibake (all boxes on)

Behaviour verified character by character against the keep-pattern. Note how characters classified as letters survive, which is why the tool is a partial clean, not a repair.

Intended char	Mojibake seen	After stripping	Recovered?
`é`	`Ã©`	`Ã` (`©` is a symbol and is removed; `Ã` is a letter)	No, mangled further
`’` (curly apostrophe)	`â€™`	`â` (`€` and `™` removed)	No, leaves a stray `â`
`€`	`â‚¬`	`â` (`‚` and `¬` removed)	No, leaves a stray `â`
`ü`	`Ã¼`	`Ã¼` (`Ã` is a letter; `¼` is a numeric character, Unicode category No, and is kept too)	No, garbage fully preserved
BOM at start of a data cell	`\ufeffid`	`id`	Yes, the BOM (U+FEFF) is not a kept character
Stray `Â` before NBSP	`Â`	`Â` (kept, because it is a letter under \p{L})	No, the stray `Â` remains
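
The table can be reproduced with a small keep-list filter. This is a sketch under stated assumptions, not the tool's actual keep-pattern: Letters is taken as Unicode category L, Digits as category N (which is what lets ¼ survive), Spaces as any whitespace, and Punctuation as a small ASCII set. The names ASCII_PUNCTUATION, keep and strip_cell are hypothetical.

```python
import unicodedata

# Hypothetical stand-in for the Punctuation box: a small ASCII set.
ASCII_PUNCTUATION = set(".,;:!?'\"()-/")


def keep(ch: str) -> bool:
    """Decide whether one character is in the assumed keep-set."""
    category = unicodedata.category(ch)
    return (
        category.startswith("L")      # Letters, e.g. Ã, â, ç
        or category.startswith("N")   # Digits; assumed to include ¼ (category No)
        or ch.isspace()               # Spaces
        or ch in ASCII_PUNCTUATION    # Punctuation (assumed ASCII only)
    )


def strip_cell(cell: str) -> str:
    """Delete every character that is not in the keep-set."""
    return "".join(ch for ch in cell if keep(ch))


for garbled in ["Ã©", "â€™", "â‚¬", "Ã¼", "\ufeffid", "François"]:
    print(repr(garbled), "->", repr(strip_cell(garbled)))
```

Under these assumptions the symbol fragments disappear, the letter fragments stay, and a correctly encoded François passes through untouched.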

Choose the right tool for the encoding problem

This stripper is the last-resort symbol deleter. For a real fix, a different tool or step is better.

Your situation	Best tool / step	Why
You can re-export from the source system	Re-export with UTF-8 selected	The only true fix — recovers original characters losslessly
You need curly quotes / NBSP folded to ASCII	csv-cleaner	It normalises (substitutes) rather than deletes, and handles encoding/BOM
You want to delete one specific artefact pattern	csv-find-replace	Targeted find/replace incl. regex — surgical, no collateral deletion
You just need the visible symbol garbage gone for an import	This stripper	Fast bulk deletion of all non-keep characters
The file won't import because of a BOM in the header	csv-cleaner	This stripper leaves the header row untouched and removes BOMs only from data cells; csv-cleaner strips the header BOM

Cookbook

Real re-encoded exports, before and after. Each shows exactly what the stripper keeps vs. deletes so you can judge whether it is enough or whether you must re-encode.

Apostrophe mojibake from Excel re-save

Example

A UTF-8 CSV opened and saved as plain CSV in Excel turns curly apostrophes into â€™. The stripper deletes € and ™ but keeps the letter â, so O’Brien, garbled to Oâ€™Brien, becomes OâBrien: roughly readable, but still wrong. Decide whether that is acceptable for your import.

Input (mis-encoded):
id,name
1,Oâ€™Brien
2,Dâ€™Angelo

Output (all boxes on):
id,name
1,OâBrien
2,DâAngelo

The apostrophe is gone but a stray 'â' remains.
A UTF-8 re-export would give O'Brien correctly.

BOM in the first header cell breaking a match

Example

A BOM prepended to the file shows as \ufeffid in the first header cell, so a script matching on id fails. Important nuance: this stripper never modifies the header row, so a BOM ON the header stays. It does strip BOMs that appear inside data cells. For a header BOM, use csv-cleaner.

Header BOM (NOT stripped here — header is protected):
\ufeffid,name
1,Acme

Data-cell BOM (stripped):
id,note
1,\ufeffstarts with bom

Output:
id,note
1,starts with bom

Euro-symbol mojibake in a price note

Example

A Windows-1252 mis-decode (often loosely called Latin-1) renders € as â‚¬. The stripper removes ‚ and ¬ (not letters) but keeps â, so the price still reads oddly. If the column is numeric, strip the symbol with find-replace instead so you don't leave a stray letter.

Input:
id,blurb
1,Price â‚¬49 incl. VAT

Output (all boxes on):
id,blurb
1,Price â49 incl. VAT

Use /tool/csv-find-replace to delete just 'â‚¬' cleanly,
or re-export with UTF-8 to get the real €.
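
For comparison, a targeted deletion of the exact artefact string leaves no stray letter. This is a plain-Python sketch of the idea, not the csv-find-replace implementation; ARTEFACT and remove_artefact are names chosen for illustration.

```python
ARTEFACT = "â‚¬"  # the garbled euro sign from the example above


def remove_artefact(cell: str, artefact: str = ARTEFACT) -> str:
    """Delete one exact artefact string and nothing else."""
    return cell.replace(artefact, "")


print(remove_artefact("Price â‚¬49 incl. VAT"))
```

Because only the literal three-character sequence is matched, genuine symbols elsewhere in the cell are left alone.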

Fully letter-classified mojibake survives unchanged

Example

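When every character of the mojibake is in the keep-set (Ã¼ for ü), the stripper changes nothing. Ã is a letter kept by \p{L}; ¼ is strictly a numeric character (Unicode category No) rather than a letter, and it is kept as well. Pairs such as Ãª (for ê) or ÃŸ (for ß) really are two letters and survive the same way. This is the clearest case where a stripper cannot help and you must re-encode.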

Input:
id,city
1,MÃ¼nchen
2,ZÃ¼rich

Output (all boxes on) — UNCHANGED:
id,city
1,MÃ¼nchen
2,ZÃ¼rich

Only a UTF-8 re-export or csv-cleaner can fix these.

Mixing artefact cleanup with genuine symbols you want gone

Example

If your real goal is just to make a noisy export importable, accept that some legitimate symbols also go. Here the „ and ¢ fragments of a garbled ™ and a genuine & are all removed. Confirm the import target tolerates the loss.

Input:
id,desc
1,Acmeâ„¢ Tools & Parts

Output (all boxes on):
id,desc
1,Acmeâ Tools  Parts

'„¢' fragments of the garbled ™ and '&' removed; stray 'â' and
double space remain. Trim with /tool/csv-whitespace-trimmer.

Errors and edge cases

Known limitations and silent failures of the stripper. Find the entry that matches what you see and apply the fix it describes.

Stripping does not recover the original character

Not fixed

Removing the symbol characters of mojibake leaves a shortened token (e.g. Ã© → Ã), never the intended é. Recovery requires re-decoding the original bytes. Re-export from the source with UTF-8, or use csv-cleaner, which handles encoding properly.
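
What "re-decoding the original bytes" means can be sketched in a few lines. This assumes the garbled text is still intact (not yet stripped) and that the wrong decode was Windows-1252; it is an illustration, not how csv-cleaner works. The helper name redecode is hypothetical.

```python
def redecode(garbled: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mistake.

    Returns the input unchanged when the round trip is not possible,
    which is what happens for text that was correctly decoded.
    """
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


for garbled in ["CafÃ©", "Oâ€™Brien", "MÃ¼nchen", "François"]:
    print(garbled, "->", redecode(garbled))
```

Once the symbol fragments have been deleted, the original byte sequence is gone and this round trip no longer works, which is why stripping should come after, not instead of, a repair attempt.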

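Both mojibake characters are kept, so nothing changes

No change

Pairs in which both characters are in the keep-set are fully preserved: Ãª (for ê) and ÃŸ (for ß) are two letters under \p{L}, and in Ã¼ (for ü) the ¼ is a numeric character that is also kept. Ã©, by contrast, loses its © because that is a symbol. The result shows 0 cells modified for fully preserved cells. This tool genuinely cannot help with that pattern; re-encode at source.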

BOM on the header row is not removed

Preserved

The first row is protected, so a BOM that prefixes the first header cell (\ufeffid) survives. Use csv-cleaner, which detects and strips the BOM from the header, or open and re-save the file as UTF-8 without BOM.
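
If you fix the header BOM yourself, Python's "utf-8-sig" codec drops a leading BOM while decoding. A minimal sketch using in-memory bytes (the sample data is made up for illustration):

```python
import csv
import io

raw = b"\xef\xbb\xbfid,name\r\n1,Acme\r\n"  # UTF-8 file that starts with a BOM

with_bom = raw.decode("utf-8")         # BOM stays glued to the first header cell
without_bom = raw.decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM if present

header_with = next(csv.reader(io.StringIO(with_bom, newline="")))
header_without = next(csv.reader(io.StringIO(without_bom, newline="")))

print(header_with[0] == "id", header_without[0] == "id")
```

Writing the decoded text back out with encoding="utf-8" then produces a file without a BOM.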

BOM inside a data cell is removed

Supported

If a BOM (U+FEFF) ended up inside a data cell rather than at file start, it is deleted because it is not a kept character. This silently fixes the rare case where a concatenation step embedded a BOM mid-file.

Output is always written as UTF-8

By design

Downloads are encoded UTF-8 regardless of the input's original encoding — the tool does not let you choose the output encoding. If your import target needs Latin-1 or UTF-16, convert after download with a dedicated encoder.
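
Converting after download can be as simple as decoding and re-encoding. The sketch below is one possible approach, with a hypothetical helper name; note that characters the target encoding cannot represent are replaced by "?", so the conversion can itself be lossy.

```python
def reencode(data: bytes, target: str = "latin-1") -> bytes:
    """Convert UTF-8 bytes to another encoding; unencodable characters become '?'."""
    return data.decode("utf-8").encode(target, errors="replace")


utf8_csv = "id,city\n1,München\n".encode("utf-8")
latin1_csv = reencode(utf8_csv)
print(latin1_csv)
```

Passing target="utf-16" works the same way and prepends a byte order mark.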

Genuine symbols are deleted alongside artefacts

Expected

Because it is a whitelist, &, #, %, +, =, and currency symbols go too, even where they are intentional. If you need to remove only the artefact, use csv-find-replace to target the exact garbled string.

Curly quotes vanish rather than fold to straight

By design

“ ” ‘ ’ are removed, not converted to " or '. If you want them folded, csv-cleaner's smart-quote normalisation is the correct tool.

File over the tier limit is blocked

Blocked

Free is 2 MB / 500 rows; Pro is 100 MB / 100,000 rows. A large re-encoded dump is blocked at upload. Split with csv-row-splitter or upgrade before stripping.

Double spaces remain after deleting a multi-char artefact

Expected

Deleting a space-flanked artefact leaves adjacent spaces. The stripper never collapses whitespace. Chain csv-whitespace-trimmer to tidy up afterwards.
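
The tidy-up step amounts to collapsing runs of spaces. A minimal sketch of that idea (not the csv-whitespace-trimmer implementation; collapse_spaces is an illustrative name):

```python
import re


def collapse_spaces(cell: str) -> str:
    """Collapse runs of two or more spaces to one and trim both ends."""
    return re.sub(r" {2,}", " ", cell).strip()


print(collapse_spaces("Acmeâ Tools  Parts"))
```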

All four boxes unchecked does nothing

No change

With no class enabled, the keep-pattern becomes /./ and matches everything, so no character is removed. Keep at least Letters and Spaces on for any artefact cleanup to occur.

Frequently asked questions

Can this recover my original accented characters from mojibake?

No. It deletes the non-letter pieces of a garbled sequence but cannot reconstruct the intended character. The only reliable fix is re-exporting from the source system with UTF-8 encoding, or using csv-cleaner's encoding handling.

Why is there still a stray 'â' or 'Ã' after stripping?

Those characters are letters under \p{L}, so the keep-list preserves them while removing the surrounding symbol characters. That is the inherent limitation of a character stripper against mojibake.

Does it remove the BOM?

It removes a BOM that appears inside a data cell, but not one on the header row, because the header is never modified. For a header BOM, use csv-cleaner or re-save the file as UTF-8 without BOM.

Is this tool a re-encoder?

No. It does not change character encoding or transliterate. It only deletes characters outside your keep-set. The download is always UTF-8.

When should I use csv-cleaner instead?

When you want to normalise rather than delete — fold curly quotes and NBSPs to ASCII, handle BOM and encoding detection. csv-cleaner substitutes; this stripper deletes.

When is csv-find-replace the better choice?

When you want to remove exactly one artefact string (like â€™) without deleting any other symbols. Find-replace is surgical and supports regex; the stripper is bulk and indiscriminate.

Will it touch correctly-encoded accented names elsewhere in the file?

No. Properly-encoded letters of any script are kept by \p{L}. Only non-letter symbols are removed, so a clean François in another column is untouched.

What file formats can I use?

CSV, XLSX, XLS, and ODS. Spreadsheets are converted to rows, stripped, and downloaded in the same format; CSV downloads as UTF-8 with a .stripped.csv suffix.

Is anything uploaded to a server?

No. Parsing and stripping happen in your browser. A sensitive export never leaves your machine.

What are the limits?

Free: 2 MB and 500 data rows. Pro: 100 MB and 100,000 rows. Larger files are blocked at upload.

How do I prevent encoding artefacts in the first place?

When saving from Excel, choose 'CSV UTF-8' rather than the plain 'CSV' option, and confirm the source export uses UTF-8. Never round-trip a UTF-8 file through a Latin-1 save.
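
When the export comes from a script rather than Excel, the same rule applies: state the encoding explicitly on both write and read. A short sketch with made-up rows and a temporary file:

```python
import csv
import os
import tempfile

rows = [["id", "name"], ["1", "O’Brien"], ["2", "München"]]

# Write with an explicit encoding instead of relying on the platform default.
path = os.path.join(tempfile.mkdtemp(), "export.csv")
with open(path, "w", encoding="utf-8", newline="") as handle:
    csv.writer(handle).writerows(rows)

# Read back with the same encoding; the text survives the round trip.
with open(path, encoding="utf-8", newline="") as handle:
    round_trip = list(csv.reader(handle))

print(round_trip == rows)
```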

Why do I get double spaces after cleanup?

Deleting a multi-character artefact between two spaces leaves both spaces. The stripper doesn't collapse whitespace; run csv-whitespace-trimmer afterwards.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
