How to resolve MySQL utf8 import errors caused by special characters in Excel
- Step 1: Reproduce the 1366 error and note the bytes. MySQL reports the offending byte sequence, e.g. \xF0\x9F\x98\x80 (😀). A leading \xF0 byte means a 4-byte character, which is exactly what utf8mb3 cannot store.
- Step 2: Drop the export onto the tool. Upload the XLSX or CSV you are trying to import. The first sheet of a spreadsheet is read into rows; the free tier handles 5 MB / 10,000 rows / 1 file.
- Step 3: Keep all four toggles on for the standard fix. The defaults (Letters + Digits + Spaces + Punctuation) keep readable text and delete emoji, other symbols and control characters. Of these, emoji and other 4-byte symbols are what trigger 1366 on utf8mb3.
- Step 4: Run the strip. Click Strip special chars. Every data cell is filtered and emoji and symbol characters are dropped. The header row is left untouched.
- Step 5: Confirm via Cells modified and the preview. The Cells modified stat tells you how many cells contained removed characters. Spot-check the preview to confirm accented names and CJK text survived.
- Step 6: Re-import the cleaned file. Download .stripped.csv (or .stripped.xlsx) and re-run your LOAD DATA INFILE or other import. The 1366 errors caused by emoji and 4-byte symbols should be gone; 4-byte letters still need utf8mb4. For a permanent fix, also migrate the column to utf8mb4.
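Step 1 can also be scripted. The sketch below uses two hypothetical helpers (not part of the tool): one pulls the \xNN escapes out of an ERROR 1366 message and decodes them as UTF-8, the other reads the byte length of the first character from its lead byte. A length of 4 means the character cannot be stored in utf8mb3.

```python
import re


def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judged from its lead byte."""
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # e.g. é, ñ, ü: fine in utf8mb3
    if 0xE0 <= lead <= 0xEF:
        return 3  # e.g. 中, ★, curly quotes: fine in utf8mb3
    if 0xF0 <= lead <= 0xF4:
        return 4  # emoji, CJK extensions: needs utf8mb4
    raise ValueError(f"0x{lead:02X} is not a valid UTF-8 lead byte")


def decode_mysql_bytes(error_message: str) -> tuple[str, int]:
    """Decode the \\xNN escapes quoted in an ERROR 1366 message.

    Returns the decoded text and the byte length of its first character.
    """
    hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", error_message)
    if not hex_pairs:
        raise ValueError("no \\xNN escapes found in the message")
    raw = bytes(int(pair, 16) for pair in hex_pairs)
    # MySQL may cut the value off ("..."), so a trailing partial
    # character is decoded as U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    return text, utf8_sequence_length(raw[0])


msg = r"ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x91\x8D...' for column 'body' at row 12"
print(decode_mysql_bytes(msg))  # ('👍', 4)
```

The lead-byte ranges follow the UTF-8 encoding rules, so the check works even when MySQL truncates the quoted value mid-character.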
Why MySQL throws 1366 and what this tool does about it
The tool addresses character-level triggers by deletion. It is a data workaround, not a schema fix — utf8mb4 is the permanent solution.
| Trigger | Byte signature | Removed by defaults? | Permanent fix |
|---|---|---|---|
| Emoji 😀 🚀 ✅ | mostly 4-byte, starting \xF0 (✅ U+2705 is 3-byte) | Yes (symbol) | Migrate column to utf8mb4 |
| Pictographs ▣ ◆ ★ | 3-byte symbols | Yes (symbol) | n/a — safe to remove |
| Box-drawing / control chars | C0/C1 + line glyphs | Yes | n/a — safe to remove |
| Curly quotes “ ” ‘ ’ | 3-byte U+201x | Yes | n/a — or normalise to ASCII |
| Accented letter é, ñ, ü | 2-byte \p{L} | No — kept | Supported by utf8mb3; keep |
| CJK ideograph 中 (3-byte) | 3-byte \p{L} | No — kept | Supported by utf8mb3; keep |
| 4-byte CJK extension letter | starts \xF0, is \p{L} | No — kept (still 4-byte!) | Needs utf8mb4 — this tool will NOT remove it |
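The table boils down to a per-character keep/delete rule. The sketch below approximates the default toggles in Python; it is an illustration under stated assumptions, not the tool's source. Letters is taken as any Unicode letter (category L*), Digits as decimal digits (Nd), Spaces as space separators (Zs), and Punctuation as Python's string.punctuation (the ASCII set). Everything else, including emoji, symbols, curly quotes and control characters, is deleted.

```python
import string
import unicodedata

# Assumption: the tool's "Punctuation" toggle corresponds to the ASCII set.
ASCII_PUNCTUATION = set(string.punctuation)


def keep_char(ch: str) -> bool:
    """Approximate the default toggles: Letters + Digits + Spaces + Punctuation."""
    category = unicodedata.category(ch)
    return (
        category.startswith("L")      # Letters: é, 中, and 4-byte letters such as 𪛖
        or category == "Nd"           # Digits (assumed: decimal digits only)
        or category == "Zs"           # Spaces (assumed: space separators, not tabs/newlines)
        or ch in ASCII_PUNCTUATION    # straight ' and " kept; curly quotes are not
    )


def strip_special(value: str) -> str:
    """Delete every character the defaults do not keep; nothing is substituted."""
    return "".join(ch for ch in value if keep_char(ch))


print(repr(strip_special("Great product 👍 works well")))  # 'Great product  works well'
print(repr(strip_special("Wid\x02get")))                    # 'Widget'
print(repr(strip_special("𪛖 Lee")))                        # '𪛖 Lee'
```

The last line shows the gap in the last table row: the letter test passes for 𪛖, so it survives even though it is 4 bytes long.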
Engine-specific notes
How the same data behaves across common targets, so you know whether deletion is the right move.
| Database | Symptom with 4-byte chars | Does deletion help? |
|---|---|---|
| MySQL utf8/utf8mb3 | ERROR 1366 on emoji/4-byte chars | Yes for emoji/symbols; use utf8mb4 for 4-byte letters |
| MySQL utf8mb4 | Stores 4-byte chars fine | Usually unnecessary; strip only if you want clean text |
| PostgreSQL UTF8 | Rejects NUL bytes and invalid UTF-8, not emoji | Yes for NUL bytes (other control characters are accepted) |
| SQLite | Accepts most UTF-8 | Optional, for downstream cleanliness |
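Before deciding whether to strip, it can help to list what a given target would actually refuse. A minimal sketch covering the two rejecting rows of the table, assuming strict-mode MySQL utf8mb3 (refuses any character whose UTF-8 encoding is 4 bytes) and PostgreSQL UTF8 (refuses NUL); the target labels "mysql-utf8mb3" and "postgresql-utf8" are hypothetical names chosen for this example.

```python
def rejected_chars(value: str, target: str) -> list[str]:
    """Return the characters in value that the target column would refuse."""
    if target == "mysql-utf8mb3":
        # utf8mb3 stores at most 3 bytes per character.
        return [ch for ch in value if len(ch.encode("utf-8")) == 4]
    if target == "postgresql-utf8":
        # PostgreSQL text values cannot contain the NUL character.
        return [ch for ch in value if ch == "\x00"]
    raise ValueError(f"unknown target: {target!r}")


print(rejected_chars("Great 👍 from 𪛖 Lee", "mysql-utf8mb3"))  # ['👍', '𪛖']
print(rejected_chars("Great 👍", "postgresql-utf8"))          # []
```

Both the emoji and the CJK-extension letter are flagged for utf8mb3, but only the emoji is something the default strip removes.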
Cookbook
Real INSERT/import failures and the cleaned result. The MySQL error line is shown so you can match it to your own log.
Emoji in a review column throwing 1366
A product-review export contains a thumbs-up emoji. On a utf8mb3 column the import dies at that row. Defaults delete the emoji; the rest of the review text survives.
MySQL error:
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x91\x8D...' for column 'body' at row 12
Input (CSV):
id,body
12,Great product 👍 works well
Output (.stripped.csv, defaults; the spaces on either side of the emoji remain):
id,body
12,Great product  works well
Accented customer name is kept (no over-stripping)
The same import has accented names. Those are valid in utf8mb3 and must NOT be removed. With defaults on they are preserved; the tool only removes the emoji that caused 1366.
Input:
id,name,body
12,José Müller,Great 👍
13,Zoë,Fine
Output (defaults; row 12 keeps the space that preceded the emoji):
id,name,body
12,José Müller,Great 
13,Zoë,Fine
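Steps 4 and 5 (filter every data cell, leave the header alone, count the cells that changed) can be approximated with Python's csv module. This is a sketch, not the tool's implementation: keep_char repeats an assumed version of the default rule (letters, decimal digits, space separators, ASCII punctuation), and in-memory text stands in for the uploaded file.

```python
import csv
import io
import string
import unicodedata


def keep_char(ch: str) -> bool:
    """Assumed default rule: letters, decimal digits, space separators, ASCII punctuation."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category in ("Nd", "Zs") or ch in string.punctuation


def strip_csv(text: str) -> tuple[str, int]:
    """Filter every data cell, copy the header row verbatim, count modified cells."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return "", 0
    header, data = rows[0], rows[1:]
    modified = 0
    cleaned = []
    for row in data:
        new_row = []
        for cell in row:
            new_cell = "".join(ch for ch in cell if keep_char(ch))
            if new_cell != cell:
                modified += 1
            new_row.append(new_cell)
        cleaned.append(new_row)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # the header is written back unfiltered
    writer.writerows(cleaned)
    return out.getvalue(), modified


source = "id,name,body\n12,José Müller,Great 👍\n13,Zoë,Fine\n"
cleaned_csv, cells_modified = strip_csv(source)
print(cleaned_csv, end="")
print("Cells modified:", cells_modified)  # Cells modified: 1
```

Only the body cell of row 12 changes, so the count is 1; the accented names pass the letter test untouched.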
Control byte from a legacy ERP export
A mainframe-style export wedged a control character into a description. Even with utf8mb4 the strict-mode insert complained. Defaults delete the control byte.
Input (control STX shown as \x02):
sku,desc
A1,Wid\x02get blue
Output (defaults):
sku,desc
A1,Widget blue
When deletion is NOT enough — 4-byte CJK letter
A name uses a 4-byte CJK-extension character. It is a letter, so this tool KEEPS it — and it will still fail on utf8mb3. This case requires migrating the column to utf8mb4.
Input:
id,name
20,𪛖 Lee
Output (defaults): unchanged — the 4-byte letter is KEPT
id,name
20,𪛖 Lee
Fix: ALTER TABLE t MODIFY name VARCHAR(80) CHARACTER SET utf8mb4;
XLSX export cleaned before LOAD DATA
Stakeholders sent an .xlsx. Clean it to .stripped.xlsx, then export to CSV for LOAD DATA, or import the XLSX via your ETL. First sheet only.
Input: orders.xlsx (Sheet1)
header: id | customer | note
row: 9 | Søren | shipped ✅
Download: orders.stripped.xlsx
row becomes: 9 | Søren | shipped
Edge cases and what actually happens
4-byte letter (CJK extension) is kept and still fails utf8mb3
Not a fix here: because Letters keeps all \p{L} characters, a 4-byte CJK-extension letter survives this tool and will still throw 1366 on a utf8mb3 column. The only correct fix is migrating the column/table to utf8mb4. This tool is for emoji/symbol/control-byte 1366s, not 4-byte letters.
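Because the defaults keep these letters, a quick check after stripping tells you whether the cleaned data can go into utf8mb3 at all or whether the migration is unavoidable. A minimal sketch, assuming the cleaned cells are already in memory as rows of strings; needs_utf8mb4 is a hypothetical helper name.

```python
import unicodedata


def needs_utf8mb4(rows: list[list[str]]) -> list[tuple[int, int, str]]:
    """Return (row, column, character) for every 4-byte character still present.

    Anything reported here survived stripping (typically a CJK-extension
    letter) and will still raise ERROR 1366 on a utf8mb3 column.
    """
    hits = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            for ch in cell:
                if len(ch.encode("utf-8")) == 4:
                    hits.append((r, c, ch))
    return hits


cleaned = [["id", "name"], ["20", "𪛖 Lee"], ["21", "José"]]
for r, c, ch in needs_utf8mb4(cleaned):
    print(f"row {r}, column {c}: U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```

An empty result means the stripped file should import into utf8mb3; any hit means the ALTER TABLE is still required.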
Accented Latin letters are preserved
By design: é, ñ, ü are 2-byte and valid in utf8mb3, and Letters keeps them. The tool will not over-strip them, which is what you want, since they are not the cause of 1366.
Emoji and pictographs are deleted
Resolved: emoji are symbols, not letters, so the defaults remove them. This clears the most common 1366 trigger.
NUL bytes / control characters
Removed: control characters (including NUL) are never kept and are deleted, which also helps PostgreSQL imports that reject NUL in text.
Header row is not cleaned
Preserved: column names stay verbatim so your import mapping and LOAD DATA ... (col1, col2) list remain valid. Clean a dirty header separately with the header rename tool.
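Since the header survives verbatim, the column list for the re-import in Step 6 can be generated from it. The helpers below are hypothetical and only build the SQL text. The CHARACTER SET utf8mb4 clause, the comma/double-quote field settings and the \n line terminator are assumptions about how the cleaned CSV is laid out, so adjust them to your file.

```python
import csv


def read_header(csv_path: str) -> list[str]:
    """Read the (unmodified) header row of the cleaned CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def load_data_sql(csv_path: str, table: str, header: list[str], local: bool = False) -> str:
    """Build a LOAD DATA statement whose column list mirrors the CSV header."""
    # Escape backslashes and single quotes for the quoted file-name literal.
    path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    columns = ", ".join(quote_ident(name) for name in header)
    keyword = "LOAD DATA LOCAL INFILE" if local else "LOAD DATA INFILE"
    return (
        f"{keyword} '{path}'\n"
        f"INTO TABLE {quote_ident(table)}\n"
        "CHARACTER SET utf8mb4\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES\n"
        f"({columns});"
    )


print(load_data_sql("reviews.stripped.csv", "reviews", ["id", "body"]))
```

In practice you would pass read_header("reviews.stripped.csv") as the header. With local=True, both the server and the client must allow local_infile; without it, the file must be on the server in a location secure_file_priv permits. If the cleaned file uses CRLF line endings, change the terminator to '\r\n'.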
Curly quotes removed, straight quotes kept
Expected: smart quotes are deleted (they are not in the ASCII punctuation set); straight ' and " are kept. Watch for SQL escaping if your values contain straight quotes.
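Deleting smart quotes can mangle text: don’t becomes dont. If that matters, the "normalise to ASCII" option from the table can be applied before stripping. A minimal sketch using str.translate; the mapping covers only the four curly quotes listed in the table, and this pre-pass is not something the tool does itself.

```python
# Map the four curly quotes from the table to their straight ASCII equivalents.
CURLY_TO_ASCII = str.maketrans({
    "\u2018": "'",  # ‘ left single quotation mark
    "\u2019": "'",  # ’ right single quotation mark (also used as an apostrophe)
    "\u201C": '"',  # “ left double quotation mark
    "\u201D": '"',  # ” right double quotation mark
})


def normalise_quotes(value: str) -> str:
    """Replace curly quotes with straight ones so the default strip keeps them."""
    return value.translate(CURLY_TO_ASCII)


print(normalise_quotes("don’t say “never”"))  # don't say "never"
```

The resulting straight quotes are the ones the escaping warning above applies to, so make sure your CSV quoting or parameterised inserts handle them.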
Removal closes up text
Expected: a deleted emoji is not replaced by any placeholder, but the spaces around it remain because Spaces is on. Great 👍 product therefore becomes Great  product (two spaces), and an emoji at the end of a cell leaves a trailing space.
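If the doubled or trailing spaces left behind matter for your data, a follow-up pass can tidy them. This is an optional extra step, not something the tool does. The sketch collapses runs of ordinary spaces and trims spaces at both ends of a cell, so do not apply it to columns where leading or repeated spaces are meaningful.

```python
import re


def tidy_spaces(value: str) -> str:
    """Collapse runs of spaces left by deletions and trim leading/trailing spaces."""
    return re.sub(r" {2,}", " ", value).strip(" ")


print(repr(tidy_spaces("Great  product")))  # 'Great product'
print(repr(tidy_spaces("shipped ")))        # 'shipped'
```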
File over tier limit
Rejected: the free tier allows 5 MB / 10,000 rows / 1 file. Larger exports need Pro (50 MB / 100,000 rows / 5 files) or higher. Oversized files are rejected before processing.
Multi-sheet workbook
First sheet only: only the first sheet is processed and exported. Move the target data to the first sheet if needed.
Best paired with a schema migration
By design: stripping is a stopgap. For durable support of all Unicode, run ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci and set the connection charset to utf8mb4.
Frequently asked questions
Does this guarantee my MySQL import will stop throwing 1366?
It removes the most common triggers — emoji, pictographs, symbols, and control bytes. It does NOT remove 4-byte characters that are letters (some CJK extensions), because those are kept by the \p{L} rule. If your data has 4-byte letters, migrate the column to utf8mb4 as well.
Why does the tool keep accented characters if I'm fixing an encoding error?
Accented Latin (é, ñ) is 2-byte and perfectly valid in MySQL utf8mb3; it is not what causes 1366. Stripping it would corrupt legitimate data. The actual 1366 culprits are 4-byte characters: the tool removes the emoji and symbols among them, while 4-byte letters are kept and need utf8mb4.
What is the real difference between utf8 and utf8mb4?
MySQL's utf8 (utf8mb3) stores up to 3 bytes per character and cannot hold 4-byte characters like emoji. utf8mb4 stores the full Unicode range. The permanent fix for 1366 is migrating to utf8mb4; this tool is the data-side workaround when you cannot.
Will it remove emoji from review/comment columns?
Yes. Emoji are symbols, not letters, so they are deleted with default toggles, which clears the typical 1366 on a comment/body column.
Does it handle PostgreSQL encoding errors too?
It helps with PostgreSQL errors caused by NUL bytes, which it removes (PostgreSQL accepts other control characters in text). PostgreSQL UTF8 also accepts emoji, so deletion is only needed there if you want cleaner text.
Will my header / column names change?
No. The header row is preserved exactly so your import column list stays valid.
What output do I get for re-import?
CSV input downloads as .stripped.csv (ideal for LOAD DATA INFILE); XLSX input downloads as .stripped.xlsx.
Is my customer data uploaded anywhere?
No. All processing is in-browser; the export never leaves your machine.
Can I strip only the column that's failing?
Not in this tool — it filters all data columns. To target one column, export it alone or extract it first with the regex extractor.
Does removing characters insert a placeholder like '?'?
No. Characters are deleted outright; no ? or space is substituted, although spaces already present in the text remain.
How large a file can I clean before import?
Free: 5 MB / 10,000 rows / 1 file. Pro: 50 MB / 100,000 rows / 5 files. Pro-media: 200 MB / 500,000 rows / 20 files. Developer: 500 MB / unlimited rows.
After cleaning, should I still change the schema?
Ideally yes. Run ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 and use a utf8mb4 connection so future data with emoji or 4-byte letters imports without error.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.