How to strip special characters from a csv before mysql import
- Step 1Generate or export the CSV destined for MySQL — Produce the file you plan to load. If it came from another database or an XML/JSON converter, it is a prime candidate for null bytes and control characters.
- Step 2Drop it onto the stripper — Accepts CSV, XLSX, XLS, ODS. Free: 2 MB / 500 rows. Pro: 100 MB / 100,000 rows. PapaParse auto-detects the delimiter from the first rows.
- Step 3Keep all four boxes checked — Letters, Digits, Spaces, Punctuation on. This removes control/invisible characters and symbol noise while keeping the text, numbers, and separators your INSERT needs. The strip covers every data cell — there is no per-column option.
- Step 4Run Strip special chars — Null bytes, control characters, emoji, and any character outside your keep-set are deleted. The header row is left intact.
- Step 5Verify counts and preview — Check cells modified and the first-10-row preview. Confirm numeric and key columns still look right — especially that no intended symbol (like
&or%) was needed. - Step 6Download and run your import — Download writes
<name>.stripped.csvas UTF-8. PointLOAD DATA INFILE(or your bulk INSERT pipeline) at the cleaned file.
Import-breaking characters and how the stripper handles them
With all four keep boxes on. Verified against the keep-pattern. 'Removed' means deleted entirely from the cell.
| Character | MySQL/Postgres effect | Stripper result (all boxes on) |
|---|---|---|
Null byte \0 | Truncates value or inserts literal \0; can abort LOAD DATA | Removed |
Tab \t | If tab is the FIELD delimiter, an in-cell tab splits the field | Removed |
Carriage return \r | Can break line parsing on LINES TERMINATED BY '\n' | Removed |
| Vertical tab / form feed | Misaligns or corrupts the value | Removed |
| Emoji / 4-byte char | Fails on a utf8 (3-byte) column unless utf8mb4 | Removed |
& # % * = | Insert fine, but may be unwanted noise | Removed (not in keep-set) |
Comma , | The standard field delimiter — kept inside quoted fields | Kept (kept punctuation) |
Hyphen -, period . | Dates, decimals, keys | Kept (kept punctuation) |
Digits 0-9 | Numeric columns, IDs | Kept (Digits on) |
Keep-list choices for SQL-bound data
Why each box matters for an INSERT/LOAD DATA workflow. Leave all on unless you have a specific reason.
| Checkbox | Leave ON because... | Risk if turned OFF |
|---|---|---|
| Letters | Names, statuses, text columns need their characters | All text content is deleted |
| Digits | Primary keys, foreign keys, prices, dates need digits | 2026-01-15 becomes --; IDs gutted |
| Spaces | Multi-word values (New York) keep their spaces | Words collapse: New York → NewYork |
| Punctuation | Decimals, hyphenated dates/keys, commas in quoted fields survive | 19.99 → 1999; 2026-01 → 202601 |
Cookbook
Before/after rows from CSVs that were failing a MySQL import. Each shows the invisible or stray character that caused the failure and what the stripper leaves behind.
Null byte from an XML-to-CSV converter
ExampleSome XML/legacy exporters emit null-terminated strings; the terminator survives into the CSV as \0. LOAD DATA INFILE then truncates the value or inserts a literal \0. The stripper removes it because it is a control character outside every keep class.
Input (null byte shown as \0): id,name 1,Acme Ltd\0 2,Beta Corp MySQL: row 1 value truncated / corrupt. Output (all boxes on): id,name 1,Acme Ltd 2,Beta Corp
Embedded tab inside a comma-delimited field
ExampleA copy-paste from a spreadsheet dragged a literal tab into a note. On a comma-delimited LOAD DATA it is harmless, but on a TAB-delimited import it splits the field. The stripper deletes in-cell tabs so the value is single-token.
Input (tab shown as \t): id,note 1,line1\tline2 TAB-delimited LOAD DATA splits this into two columns. Output (all boxes on): id,note 1,line1line2
Emoji failing on a utf8 (non-mb4) column
ExampleA 4-byte emoji can't insert into a legacy utf8 MySQL column and raises Incorrect string value. Either migrate the column to utf8mb4 or strip the emoji first. The stripper removes it cleanly.
Input: id,review 1,Great product 👍 MySQL utf8 column: 'Incorrect string value: \xF0\x9F...' Output (all boxes on): id,review 1,Great product
Decimals and dates survive when Punctuation and Digits stay on
ExampleThe biggest risk for SQL prep is over-stripping numeric data. With the default boxes, periods, hyphens, and commas are kept so prices and ISO dates pass through intact.
Input: id,price,created 1,1299.50,2026-01-15 Output (all boxes on) — UNCHANGED: id,price,created 1,1299.50,2026-01-15 (Turn Digits or Punctuation OFF and these would be destroyed.)
Stray control characters from a legacy mainframe export
ExampleMainframe and AS/400 exports often carry record-separator and unit-separator control bytes. They are invisible in editors but abort or misalign a bulk import. All such control characters are deleted by the keep-list.
Input (control chars shown as ^): id,status^ 1,ACTIVE^ 2,CLOSED Output (all boxes on): id,status 1,ACTIVE 2,CLOSED Note: a control char in the HEADER (row 1) is NOT stripped — the header row is protected. Clean the header separately.
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Control characters in the header row are not removed
PreservedThe first row is protected from stripping, so a null byte or control character in a header cell survives and can still break a column-name match. Clean the header separately with csv-find-replace, or remove the header, strip, and re-add it.
Turning Digits off destroys keys and dates
Data lossUnticking Digits removes every 0-9, so 2026-01-15 becomes --, prices lose their numbers, and primary/foreign keys are gutted. Always keep Digits on for SQL-bound data. To remove specific numeric noise, use csv-find-replace.
Works for PostgreSQL COPY too
SupportedPostgreSQL COPY also rejects null bytes (invalid byte sequence / unquoted carriage return). The same keep-list clean prepares a CSV for COPY exactly as it does for MySQL LOAD DATA INFILE.
Delimiter and quoting are not changed
By designThe stripper does not alter the field delimiter, line endings, or quote style. If your import needs a specific delimiter or quoting, handle that separately — this tool only deletes non-keep characters from cell values.
A legitimate `&` or `%` in data is deleted
Expected&, #, %, +, =, * are not in the kept punctuation set, so a value like R&D becomes RD and 50% becomes 50. If those symbols matter for the column, use csv-find-replace to target only the truly unwanted characters.
Output is UTF-8 regardless of target collation
By designDownloads are UTF-8. If your table or session uses latin1/utf8mb3, set the import CHARACTER SET to match, or convert the file after download. The tool offers no output-encoding choice.
NBSP between numbers joins them
ExpectedA non-breaking space (U+00A0) used as a thousands separator is removed (only regular U+0020 space is kept), so 1 299 becomes 1299 — which may actually be what you want for a numeric column, but verify. For controlled handling, replace NBSP first with csv-find-replace.
File over the free limit is blocked
BlockedFree is 2 MB / 500 rows; Pro is 100 MB / 100,000 rows. A bulk-import dump usually exceeds free — split with csv-row-splitter or upgrade before stripping.
Empty cells stay empty
PreservedA cell that is already empty has nothing to strip and is left as an empty string. The stripper does not turn empty strings into SQL NULL — configure NULL handling in your LOAD DATA statement (e.g. SET col = NULLIF(col,'')).
Quoted multi-line values lose their internal newlines
ExpectedA properly quoted cell containing a newline is parsed as one value, but the newline character is not in the keep-set, so it is removed and the lines concatenate. If you need to preserve in-cell line breaks for a TEXT column, this tool is not the right step — import without stripping that column.
Frequently asked questions
What causes null bytes in a CSV?
Common sources: binary data pasted into a text column, exports from legacy systems that use null-terminated strings, and some XML-to-CSV converters. The stripper removes null bytes because they are control characters outside every keep class.
Will this affect commas, quotes, and other delimiters?
Commas, periods, hyphens, slashes, parentheses, and straight quotes are in the kept punctuation set, so they survive. The tool does not change the file's delimiter or quoting structure — it only edits cell contents.
Does it work for PostgreSQL COPY imports too?
Yes. PostgreSQL COPY also rejects null bytes and stray carriage returns. The same clean prepares the file for COPY just as for MySQL LOAD DATA INFILE.
Why should I keep the Digits box checked?
Unticking Digits removes every 0-9, which destroys primary keys, foreign keys, prices, and dates. For SQL-bound data, always leave Digits (and Punctuation) on.
Does it remove emoji that break a utf8 column?
Yes. Emoji and other 4-byte characters are deleted, which avoids the 'Incorrect string value' error on a 3-byte utf8 column. Alternatively, migrate the column to utf8mb4 to keep emoji.
Can I strip only one column before import?
No — the tool strips all data cells. To scope the change to one column, isolate it with csv-column-filter first, or use csv-find-replace for targeted edits.
Does it convert empty strings to NULL?
No. Empty cells stay empty strings. Handle NULL conversion in your import statement, e.g. SET col = NULLIF(col,'').
What output encoding does it produce?
UTF-8. If your table uses latin1 or utf8mb3, set the import CHARACTER SET accordingly or convert the file after download.
Is the data uploaded anywhere?
No. Everything runs in your browser via PapaParse. Production data never leaves your machine.
What are the size and row limits?
Free: 2 MB and 500 data rows. Pro: 100 MB and 100,000 rows. Larger dumps are blocked at upload — split them first.
Will it fix control characters in my header row?
No — the header row is never modified. Clean the header separately with csv-find-replace, or strip the header off, run this tool, and re-attach the header.
Does it change line endings or delimiters?
No. It only deletes non-keep characters from cell values. Set delimiters, line endings, and quoting in your LOAD DATA / COPY statement.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.