Clean a Database CSV Export (psql, mysql, sqlite3) — Free Browser Tool

How to clean a database csv export

Step 1
Export from your database CLI — PostgreSQL: \copy table_name TO 'file.csv' WITH (FORMAT csv, HEADER true) — comma-delimited UTF-8. MySQL: SELECT * FROM table_name INTO OUTFILE '/tmp/file.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' — explicitly comma-delimited, otherwise defaults to tab. SQLite: .mode csv then .headers on then .output file.csv then your SELECT. mysqldump tab mode: mysqldump --tab=/tmp --fields-terminated-by=',' database_name — writes one .txt per table.
Step 2
Drop the file onto the cleaner above — PapaParse runs in your browser — database rows never reach a server. The cleaner sniffs encoding from the BOM (or absence) and delimiter from the first row. You don't have to tell it whether the file came from psql, mysql, sqlite3, or SQL Server bcp.
Step 3
For mysql output: confirm the delimiter detected — If you ran SELECT INTO OUTFILE without explicit FIELDS TERMINATED BY ',', the file is tab-separated (TSV with a .csv extension). The cleaner's delimiter: auto detects this from the first row. The output panel shows the detected delimiter — confirm it's \t rather than , before downloading if you intend to convert to comma-CSV. Use the Advanced → Delimiter override to force a specific value if auto-detection guessed wrong (rare on multi-column files).
Step 4
For psql output destined for Excel: enable BOM on output — Excel on Windows can't auto-detect UTF-8 without a BOM. psql's \copy writes UTF-8 without a BOM (good for programmatic use, bad for Excel). The cleaner's output adds a BOM by default — your accented column values (José, Müller) will display correctly when the cleaned CSV is double-clicked in Excel.
Step 5
For Windows-saved files going to Linux tools: choose LF line endings — If psql ran on Windows and produced \r\n line endings, downstream Linux/pandas pipelines may show \r artefacts ('value\r' instead of 'value'). Set outputLineEnding: lf to normalise to Unix. Conversely, for output going back to Windows / QuickBooks, leave the default crlf.
Step 6
Decide whether to keep NULL markers as `\N` or convert to empty — psql's text-format mode uses \N for NULL; CSV mode uses empty string. mysql uses \N in tab-format text files. sqlite3 always uses empty string in CSV mode. The cleaner doesn't auto-convert — it preserves whatever's in the cell. If your downstream tool needs empty-for-NULL, run csv-find-replace with find: \N, replace: empty, caseSensitive: true after cleaning. If it needs \N-for-NULL (a pg target table), leave them.

Per-database export shapes

What each major database CLI writes by default, the platform-specific overrides, and the cleaner option that addresses each quirk. Verified against the official documentation for PostgreSQL 18, MySQL 8.4, and SQLite (latest).

Database	Default delimiter	NULL representation	Encoding / BOM	Line endings	Common quirks
PostgreSQL `\copy ... CSV HEADER`	`,`	Empty string (unquoted) in CSV mode; `\N` in text mode	UTF-8, no BOM	`\n` on Linux/Mac; `\r\n` on Windows (server-side COPY); `\copy ... TO STDOUT` always `\n`	`CHAR(N)` columns pad with trailing spaces; embedded newlines in `TEXT` values are quote-wrapped per RFC 4180 (multi-line cells are valid)
MySQL `SELECT INTO OUTFILE`	`\t` (tab) by default — must specify `FIELDS TERMINATED BY ','` for true CSV	`\N` (per MySQL docs, mirrors text-format)	Server's character set — `utf8mb4` if configured, often `utf8` or `latin1` on older defaults	`\n` by default; explicitly settable via `LINES TERMINATED BY '\r\n'`	Fields not enclosed by default — strings containing the delimiter break rows. Use `OPTIONALLY ENCLOSED BY '"'` for quoting
MySQL `mysqldump --tab`	`\t`	`\N`	Per `--default-character-set`; older mysqldump defaults to `utf8` which can't dump `utf8mb4` correctly — known bug, must specify `--default-character-set=utf8mb4`	`\n`	Writes `.txt` per table (not `.csv`); paired with a separate `.sql` file containing CREATE TABLE
SQLite `.mode csv`	`,`	Empty string	UTF-8, no BOM (Windows users report Excel mojibake)	OS native (`\r\n` on Windows, `\n` on Linux/Mac)	In SQLite 3.53+, CLI flag order matters: `sqlite3 -csv -header` works, `sqlite3 -header -csv` produces no headers. For dot commands inside the REPL, the conventional order is `.mode csv` then `.headers on`
SQL Server `bcp` (queryout)	`\t` (tab) by default; `-t,` for comma	Empty string	UCS-2 / UTF-16 LE if `-w` flag, ASCII if `-c`, UTF-8 if `-c -t, -C 65001`	`\r\n`	Without `-c` flag, output is binary-format (not CSV). With `-c` it's text; quoting only via `-q` and specific FORMAT options
SQL Server `SQLCMD -E -Q ... -o`	Configurable via `-s` (column separator)	`NULL` (literal text) — confusing if downstream tools expect empty	UTF-16 LE by default	`\r\n`	Adds a separator-line of dashes between header and data rows — `skipRows: 1` after the header to remove it

What the cleaner does

Each cleaner option mapped to the database-export problem it solves. Single-pass — all selected options apply together.

Cleaner option	Database-export problem it solves	Default
`delimiter: auto`	MySQL default-tab vs PostgreSQL default-comma vs SQLite default-comma — sniffs from the first row so you don't have to specify per-file	auto
`inputEncoding: auto`	Detects UTF-8 with BOM (Excel-saved), UTF-8 no-BOM (psql, sqlite3, mysql defaults), UTF-16 LE (`bcp -w` from SQL Server). No flag needed for the common cases	auto
`outputLineEnding: crlf`	Convert psql-on-Linux `\n` output to `\r\n` for QuickBooks / Excel-on-Windows / Power BI ingestion	crlf
`outputLineEnding: lf`	Convert mysqldump-on-Windows `\r\n` to `\n` for awk / Python pandas / DuckDB / Linux pipeline ingestion	—
`trimWhitespace`	Strip trailing spaces from `CHAR(N)` column values (PostgreSQL pads `CHAR(10)` to exactly 10 chars even if the data is shorter)	On
`removeEmptyRows`	Drop rows that were empty in the source — common when a `UNION ALL` query left placeholder rows or a partial export was concatenated	On
`repairRows`	Pad short rows / trim long rows caused by `SELECT INTO OUTFILE` writing unquoted strings containing the delimiter character	On
`dedupMode: exact`	Collapse exact-row duplicates from a `UNION ALL` you should have made `UNION DISTINCT` — or accidentally re-running the same export and concatenating	exact (legacy default)
`dedupMode: trim`	Collapse duplicates that differ only by trailing-space padding from `CHAR(N)` columns	—
`normalizeSmartQuotes`	Rare for database exports — only useful if your `TEXT` columns contain user-entered content with curly quotes (e.g. a comments table populated by a Word-paste workflow)	Off
`normalizeHiddenWhitespace`	Rare — only useful if your TEXT columns received user input from web forms that injected CHAR(160) NBSPs	Off
`skipRows: 1`	SQL Server `SQLCMD` writes a separator-line of dashes between header and data — `skipRows: 1` removes it. Also useful for any custom export script that prepends a `-- generated by X` comment row	0

Cookbook

Real before/after rows from common database CLI outputs. Headers and values illustrative.

MySQL default-tab output mistaken for comma-CSV

Example

Running SELECT * FROM users INTO OUTFILE '/tmp/users.csv' writes a TSV (tab-separated values) — the .csv extension is just a filename choice; the delimiter is \t because MySQL defaults to tab. Excel double-click merges every column into row 1 because Excel uses comma as its default delimiter.

Source MySQL command (no FIELDS TERMINATED BY):
SELECT id, name, email FROM users
  INTO OUTFILE '/tmp/users.csv';

Actual file contents (TSV — note the tabs):
1\tJohn\tjohn@x.com
2\tJane\tjane@x.com

Excel double-click: every row appears as one merged cell.

Cleaner output (auto-detects tab delimiter, outputs comma-CSV):
id,name,email
1,John,john@x.com
2,Jane,jane@x.com

psql `\copy` on Windows: CRLF line endings break Linux pandas

Example

psql on Windows writes server-side files with CRLF line endings. pandas.read_csv() accepts either, but some text-only utilities (awk, simple split calls) see \r as part of the last column value. The cleaner's outputLineEnding: lf normalises.

Source (psql on Windows, written to a file):
id,name,email\r\n
1,John,john@x.com\r\n

Python pandas reads correctly (handles either ending), but:
  awk -F, '{print $3}' users.csv
  →  'john@x.com\r'   ← trailing carriage return is part of the value

Cleaner config: outputLineEnding: lf

Output:
id,name,email\n
1,John,john@x.com\n

PostgreSQL `CHAR(N)` column padded with trailing spaces

Example

CHAR(10) columns in PostgreSQL pad every value to exactly 10 characters with trailing spaces. \copy writes the padded form. Downstream string comparisons (WHERE code = 'ABC' vs the padded 'ABC ') fail until trimmed.

Source PostgreSQL table:
  CREATE TABLE products (code CHAR(10), name VARCHAR(100));
  INSERT INTO products VALUES ('ABC', 'Widget');

psql \copy output:
code,name
"ABC       ",Widget

Cleaner config: trimWhitespace: true (default)

Output:
code,name
ABC,Widget

SQLite UTF-8-no-BOM CSV opened in Excel-on-Windows

Example

SQLite's .mode csv writes UTF-8 without a BOM. Excel on Windows defaults to Windows-1252 and mojibakes any accented character. Same issue as Google Forms and Mailchimp — the data isn't corrupted, the encoding declaration is missing.

Source SQLite query result:
id,name
1,José Müller

Excel-on-Windows display (Windows-1252 fallback):
id,name
1,JosÃ© MÃ¼ller

Cleaner adds BOM on output → Excel auto-detects UTF-8 → displays correctly.

psql `\N` NULL marker survives cleaner unchanged

Example

PostgreSQL text-format COPY writes \N for NULL. CSV-format COPY writes unquoted empty string. The cleaner is text-only — it preserves whatever's in the cell. If your downstream tool needs empty-for-NULL, chain with csv-find-replace.

Source (psql in text-format COPY mode):
id\tname\temail
1\tJohn\t\N
2\tJane\tjane@x.com

Cleaner output (preserves `\N` verbatim):
id,name,email
1,John,\N
2,Jane,jane@x.com

Follow-up: csv-find-replace with find: \\N, replace: empty,
  caseSensitive: true → converts to empty-string NULL.

Embedded newline in `TEXT` column from unquoted MySQL output

Example

MySQL's SELECT INTO OUTFILE without OPTIONALLY ENCLOSED BY '"' doesn't quote strings. A TEXT value containing a newline becomes a row break — every subsequent column shifts up one row. The cleaner's repairRows: true pads/trims to header width but the underlying multi-line value needs manual review.

Source MySQL row:
  id=42, notes='Line 1\nLine 2', email='x@y.com'

Unquoted SELECT INTO OUTFILE writes:
42\tLine 1\nLine 2\tx@y.com

  ↓ Parser sees this as two rows:
42\tLine 1
Line 2\tx@y.com

Fix at source: use OPTIONALLY ENCLOSED BY '"' or switch to mysqldump.
Fix at cleaner: repairRows: true (default) pads short rows to
  header width — the data still needs manual review.

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

MySQL `SELECT INTO OUTFILE` default-tab delimiter mistaken for CSV

MySQL default behaviour

MySQL's FIELDS TERMINATED BY defaults to \t (tab). A SELECT INTO OUTFILE 'file.csv' without explicit FIELDS TERMINATED BY ',' produces a TSV with a .csv extension. Excel double-click merges all columns. The cleaner's delimiter: auto detects this; force delimiter: ',' if your downstream needs comma-CSV.

MySQL `SELECT INTO OUTFILE` without `OPTIONALLY ENCLOSED BY '"'`

Unquoted output

By default, MySQL writes string values without enclosure. Any value containing the field delimiter or a newline breaks the row. Always use OPTIONALLY ENCLOSED BY '"' (or just ENCLOSED BY '"' if you don't mind quoting numeric fields too) for safe CSV output. The cleaner can repair some damage via repairRows but multi-line embedded values need manual review.

`mysqldump --default-character-set=utf8` corrupts `utf8mb4` data

Known MySQL bug (#71746)

Older mysqldump versions default to utf8 (which is actually MySQL's 3-byte alias for UTF-8 BMP only — emoji and many CJK chars need 4-byte utf8mb4). If your tables are utf8mb4 but mysqldump uses utf8, 4-byte characters get replaced with ? permanently. Always specify --default-character-set=utf8mb4 for utf8mb4 tables. The cleaner can't recover characters that were truncated at the dump step.

PostgreSQL `\copy` on Windows produces `\r\n`, breaks Linux awk

Cross-platform line endings

Server-side COPY writes \n on Linux and \r\n on Windows. Client-side \copy ... TO STDOUT always writes \n. If a Windows analyst exported via \copy directly to a file (not stdout), downstream awk -F, shells see \r as part of the last column. The cleaner's outputLineEnding: lf normalises to \n.

SQLite CLI flag order: `sqlite3 -header -csv` produces no headers

SQLite 3.53+ behaviour change

Confirmed in the SQLite forum: starting in 3.53.0, the order of CLI flags now matters. sqlite3 -csv -header db.db < query.sql outputs headers correctly; sqlite3 -header -csv db.db < query.sql does not. The workaround is to always put -csv before -header in scripts. For dot commands inside the REPL, the conventional order is .mode csv then .headers on. The cleaner can't recover a missing header — re-run the export with the corrected flag order.

PostgreSQL `\N` NULL marker in CSV-mode output

Configuration

psql's \copy ... CSV writes empty string for NULL. psql's \copy ... CSV NULL '\N' writes the literal \N. Different configs produce different output for the same data — be explicit about which one you used. The cleaner preserves the cell content verbatim; conversion between the two needs csv-find-replace.

PostgreSQL `COPY FROM` rejects mixed line endings

Re-import strictness

Per PostgreSQL docs: 'COPY FROM will complain if the line endings in the input are not all alike.' If a file has some rows ending in \n and others in \r\n (common after manual edits in different editors), re-import fails. The cleaner's outputLineEnding: crlf (or lf) normalises all rows to one ending.

SQL Server `SQLCMD` writes a separator line of dashes between header and data

Pretty-printing artefact

SQLCMD -E -Q '...' -o output.csv writes a line of - characters between the header row and the data rows. This breaks any standard CSV parser. skipRows: 1 after the header drops the separator line. Better: use the -s, flag with -W (no trailing whitespace) and -h -1 (no header repeat) — or use bcp queryout for cleaner output.

SQL Server `bcp -w` writes UTF-16 LE

Encoding non-default

bcp queryout file.csv -w produces a UTF-16 Little-Endian file with a BOM. The .csv extension is misleading — Excel and pandas both default to assuming UTF-8 and may misread. The cleaner's inputEncoding: auto detects the UTF-16 BOM and decodes correctly; the output is UTF-8 (with BOM if outputLineEnding: crlf).

Embedded carriage returns in TEXT values escape as visible `\r`

psql text-mode escaping

PostgreSQL's text-format COPY escapes embedded \r and \n as the two-character sequences \r and \n. CSV-format COPY quote-wraps the entire field and embeds the raw newline. If you see literal \r in your data after a text-mode COPY, the data isn't corrupted — psql preserved the value verbatim. Re-export with WITH (FORMAT csv) to get standard CSV quoting.

Frequently asked questions

Why does my MySQL `SELECT INTO OUTFILE` produce a tab-delimited file?

Because MySQL's default for FIELDS TERMINATED BY is \t (tab), not comma. Even when you name the output file .csv, the contents are TSV. Two fixes: (1) add FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' to the export to produce true CSV; (2) drop the existing file onto the cleaner above — delimiter: auto detects the tab and the output is comma-delimited.

Why does my `psql \copy` output show `\r` characters in Linux tools?

Server-side COPY on Windows writes CRLF (\r\n) line endings. When you grep, awk, or column-slice that file on Linux, the \r becomes part of the last column's value. Three fixes: (1) use \copy ... TO STDOUT > file.csv instead of \copy ... TO 'file.csv' — STDOUT always writes \n regardless of platform; (2) run dos2unix file.csv after the export; (3) drop the file onto the cleaner with outputLineEnding: lf.

How do I handle `\N` NULL markers from psql?

Depends on what your downstream tool expects. PostgreSQL's text-format COPY writes \N for NULL; CSV-format COPY writes empty string. Downstream pandas accepts either via na_values=['\\N']. Downstream re-import to PostgreSQL with \copy ... CSV NULL '\N' accepts \N. For BI tools and Excel, convert to empty: chain the cleaner with csv-find-replace (find: \\N, replace: empty, caseSensitive: true). For schema-aware downstreams (DuckDB, BigQuery), keep \N if their loader recognises it.

Will the cleaner damage data types like dates, booleans, JSON?

No. The cleaner treats every cell as a string from input to output. Dates stay as the string the database emitted (2026-05-15 or 2026-05-15 14:32:00+00:00). Booleans stay as t/f (psql), 1/0 (mysql), or whatever your CAST chose. JSON column values stay as the literal JSON string. There's no type coercion, no precision loss, no locale-based date reformatting.

Why is my emoji column showing `?` after a mysqldump round-trip?

MySQL bug #71746: older mysqldump defaults to --default-character-set=utf8 which is MySQL's 3-byte alias (BMP only). Emoji are 4-byte characters that need utf8mb4. The emoji are replaced with ? at the dump step — the data on disk is already corrupted before the cleaner sees it. Re-run mysqldump with --default-character-set=utf8mb4. The cleaner can't recover lost characters.

Can I clean a SQL Server `bcp` output with this?

Yes, but mind the flags. bcp ... queryout file.csv -c -t, writes ASCII text with comma delimiter — cleans straightforwardly. bcp ... queryout file.csv -w -t, writes UTF-16 LE with BOM — the cleaner's inputEncoding: auto detects this and decodes to UTF-8. Native-format bcp (without -c or -w) is binary and can't be cleaned as CSV — re-run with text mode.

How does the cleaner handle multi-line values inside a TEXT/VARCHAR column?

Correctly, if the source CSV quote-wrapped them per RFC 4180. PostgreSQL's CSV-format COPY and mysqldump (with proper enclosure flags) both do this — multi-line values appear as "Line 1\nLine 2" and the cleaner's parser keeps the embedded newline inside the cell. The breakage case is MySQL SELECT INTO OUTFILE without OPTIONALLY ENCLOSED BY '"' — unquoted multi-line values produce row-misalignment that no parser can fully recover from. repairRows: true pads/trims to header width but the underlying value still needs manual review.

Why does my SQLite CSV export have no header row?

Most likely a CLI flag-order issue in SQLite 3.53.0+. sqlite3 -csv -header db.db < query.sql outputs headers correctly; sqlite3 -header -csv db.db < query.sql does not. Documented in the SQLite forum as a behaviour change introduced in 3.53. Fix: put -csv before -header in your script. For dot commands inside the REPL, run .mode csv first, then .headers on, then .output file.csv, then your SELECT. The cleaner can't recover a missing header — re-run the export.

Can I dedupe rows by a single column (e.g. primary key) instead of whole-row match?

Not with csv-cleaner — its dedup is whole-row. For column-based dedup use csv-deduplicator which supports a columns config and strategy: first | last. Alternatively, dedup at the source: re-run your query with SELECT DISTINCT (psql/mysql/sqlite) or GROUP BY primary_key with appropriate aggregates.

Will database PII be uploaded to JAD Apps?

No. PapaParse runs entirely in your browser. Database rows — customer names, addresses, financial values — never reach a server. The only thing saved server-side is a single counter (file processed, no content) for signed-in dashboard stats. You can opt out in account settings. For corporate compliance, every modern database export tool offers an in-VPC option (RDS Data API, psql via SSH tunnel, etc.) so the file can go directly from database to your machine without intermediate exposure.

How large a database export can the cleaner handle?

Free tier caps at 2 MB — roughly 8k–30k rows depending on column count and value length. Pro removes the limit, with a practical ceiling around 5–10M rows determined by browser memory. For multi-million-row tables, the database-side fix is faster: re-run with a smaller LIMIT for testing, then run the production export through the cleaner via the runner API (POST 127.0.0.1:9789/v1/tools/csv-cleaner/run) to stream through without the browser-memory ceiling.

Can I run this in a CI pipeline before loading data into a warehouse?

Yes — GET /api/v1/tools/csv-cleaner returns the 11-option schema; pair the @jadapps/runner once on your CI runner and POST the payload to 127.0.0.1:9789/v1/tools/csv-cleaner/run. Common pipeline: nightly pg_dump --table → cleaner (outputLineEnding: lf, delimiter: auto, trimWhitespace: true) → load into Snowflake / BigQuery / Redshift via their CSV ingest. Database rows never reach JAD's servers — the cleaner runs locally on your CI runner.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

How to clean a database csv export

Step 1
Export from your database CLI — PostgreSQL: \copy table_name TO 'file.csv' WITH (FORMAT csv, HEADER true) — comma-delimited UTF-8. MySQL: SELECT * FROM table_name INTO OUTFILE '/tmp/file.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' — explicitly comma-delimited, otherwise defaults to tab. SQLite: .mode csv then .headers on then .output file.csv then your SELECT. mysqldump tab mode: mysqldump --tab=/tmp --fields-terminated-by=',' database_name — writes one .txt per table.
Step 2
Drop the file onto the cleaner above — PapaParse runs in your browser — database rows never reach a server. The cleaner sniffs encoding from the BOM (or absence) and delimiter from the first row. You don't have to tell it whether the file came from psql, mysql, sqlite3, or SQL Server bcp.
Step 3
For mysql output: confirm the delimiter detected — If you ran SELECT INTO OUTFILE without explicit FIELDS TERMINATED BY ',', the file is tab-separated (TSV with a .csv extension). The cleaner's delimiter: auto detects this from the first row. The output panel shows the detected delimiter — confirm it's \t rather than , before downloading if you intend to convert to comma-CSV. Use the Advanced → Delimiter override to force a specific value if auto-detection guessed wrong (rare on multi-column files).
Step 4
For psql output destined for Excel: enable BOM on output — Excel on Windows can't auto-detect UTF-8 without a BOM. psql's \copy writes UTF-8 without a BOM (good for programmatic use, bad for Excel). The cleaner's output adds a BOM by default — your accented column values (José, Müller) will display correctly when the cleaned CSV is double-clicked in Excel.
Step 5
For Windows-saved files going to Linux tools: choose LF line endings — If psql ran on Windows and produced \r\n line endings, downstream Linux/pandas pipelines may show \r artefacts ('value\r' instead of 'value'). Set outputLineEnding: lf to normalise to Unix. Conversely, for output going back to Windows / QuickBooks, leave the default crlf.
Step 6
Decide whether to keep NULL markers as `\N` or convert to empty — psql's text-format mode uses \N for NULL; CSV mode uses empty string. mysql uses \N in tab-format text files. sqlite3 always uses empty string in CSV mode. The cleaner doesn't auto-convert — it preserves whatever's in the cell. If your downstream tool needs empty-for-NULL, run csv-find-replace with find: \N, replace: empty, caseSensitive: true after cleaning. If it needs \N-for-NULL (a pg target table), leave them.

Per-database export shapes

Database	Default delimiter	NULL representation	Encoding / BOM	Line endings	Common quirks
PostgreSQL `\copy ... CSV HEADER`	`,`	Empty string (unquoted) in CSV mode; `\N` in text mode	UTF-8, no BOM	`\n` on Linux/Mac; `\r\n` on Windows (server-side COPY); `\copy ... TO STDOUT` always `\n`	`CHAR(N)` columns pad with trailing spaces; embedded newlines in `TEXT` values are quote-wrapped per RFC 4180 (multi-line cells are valid)
MySQL `SELECT INTO OUTFILE`	`\t` (tab) by default — must specify `FIELDS TERMINATED BY ','` for true CSV	`\N` (per MySQL docs, mirrors text-format)	Server's character set — `utf8mb4` if configured, often `utf8` or `latin1` on older defaults	`\n` by default; explicitly settable via `LINES TERMINATED BY '\r\n'`	Fields not enclosed by default — strings containing the delimiter break rows. Use `OPTIONALLY ENCLOSED BY '"'` for quoting
MySQL `mysqldump --tab`	`\t`	`\N`	Per `--default-character-set`; older mysqldump defaults to `utf8` which can't dump `utf8mb4` correctly — known bug, must specify `--default-character-set=utf8mb4`	`\n`	Writes `.txt` per table (not `.csv`); paired with a separate `.sql` file containing CREATE TABLE
SQLite `.mode csv`	`,`	Empty string	UTF-8, no BOM (Windows users report Excel mojibake)	OS native (`\r\n` on Windows, `\n` on Linux/Mac)	In SQLite 3.53+, CLI flag order matters: `sqlite3 -csv -header` works, `sqlite3 -header -csv` produces no headers. For dot commands inside the REPL, the conventional order is `.mode csv` then `.headers on`
SQL Server `bcp` (queryout)	`\t` (tab) by default; `-t,` for comma	Empty string	UCS-2 / UTF-16 LE if `-w` flag, ASCII if `-c`, UTF-8 if `-c -t, -C 65001`	`\r\n`	Without `-c` flag, output is binary-format (not CSV). With `-c` it's text; quoting only via `-q` and specific FORMAT options
SQL Server `SQLCMD -E -Q ... -o`	Configurable via `-s` (column separator)	`NULL` (literal text) — confusing if downstream tools expect empty	UTF-16 LE by default	`\r\n`	Adds a separator-line of dashes between header and data rows — `skipRows: 1` after the header to remove it

What the cleaner does

Each cleaner option mapped to the database-export problem it solves. Single-pass — all selected options apply together.

Cleaner option	Database-export problem it solves	Default
`delimiter: auto`	MySQL default-tab vs PostgreSQL default-comma vs SQLite default-comma — sniffs from the first row so you don't have to specify per-file	auto
`inputEncoding: auto`	Detects UTF-8 with BOM (Excel-saved), UTF-8 no-BOM (psql, sqlite3, mysql defaults), UTF-16 LE (`bcp -w` from SQL Server). No flag needed for the common cases	auto
`outputLineEnding: crlf`	Convert psql-on-Linux `\n` output to `\r\n` for QuickBooks / Excel-on-Windows / Power BI ingestion	crlf
`outputLineEnding: lf`	Convert mysqldump-on-Windows `\r\n` to `\n` for awk / Python pandas / DuckDB / Linux pipeline ingestion	—
`trimWhitespace`	Strip trailing spaces from `CHAR(N)` column values (PostgreSQL pads `CHAR(10)` to exactly 10 chars even if the data is shorter)	On
`removeEmptyRows`	Drop rows that were empty in the source — common when a `UNION ALL` query left placeholder rows or a partial export was concatenated	On
`repairRows`	Pad short rows / trim long rows caused by `SELECT INTO OUTFILE` writing unquoted strings containing the delimiter character	On
`dedupMode: exact`	Collapse exact-row duplicates from a `UNION ALL` you should have made `UNION DISTINCT` — or accidentally re-running the same export and concatenating	exact (legacy default)
`dedupMode: trim`	Collapse duplicates that differ only by trailing-space padding from `CHAR(N)` columns	—
`normalizeSmartQuotes`	Rare for database exports — only useful if your `TEXT` columns contain user-entered content with curly quotes (e.g. a comments table populated by a Word-paste workflow)	Off
`normalizeHiddenWhitespace`	Rare — only useful if your TEXT columns received user input from web forms that injected CHAR(160) NBSPs	Off
`skipRows: 1`	SQL Server `SQLCMD` writes a separator-line of dashes between header and data — `skipRows: 1` removes it. Also useful for any custom export script that prepends a `-- generated by X` comment row	0

Cookbook

Real before/after rows from common database CLI outputs. Headers and values illustrative.

MySQL default-tab output mistaken for comma-CSV

Example

Source MySQL command (no FIELDS TERMINATED BY):
SELECT id, name, email FROM users
  INTO OUTFILE '/tmp/users.csv';

Actual file contents (TSV — note the tabs):
1\tJohn\tjohn@x.com
2\tJane\tjane@x.com

Excel double-click: every row appears as one merged cell.

Cleaner output (auto-detects tab delimiter, outputs comma-CSV):
id,name,email
1,John,john@x.com
2,Jane,jane@x.com

psql `\copy` on Windows: CRLF line endings break Linux pandas

Example

Source (psql on Windows, written to a file):
id,name,email\r\n
1,John,john@x.com\r\n

Python pandas reads correctly (handles either ending), but:
  awk -F, '{print $3}' users.csv
  →  'john@x.com\r'   ← trailing carriage return is part of the value

Cleaner config: outputLineEnding: lf

Output:
id,name,email\n
1,John,john@x.com\n

PostgreSQL `CHAR(N)` column padded with trailing spaces

Example

Source PostgreSQL table:
  CREATE TABLE products (code CHAR(10), name VARCHAR(100));
  INSERT INTO products VALUES ('ABC', 'Widget');

psql \copy output:
code,name
"ABC       ",Widget

Cleaner config: trimWhitespace: true (default)

Output:
code,name
ABC,Widget

SQLite UTF-8-no-BOM CSV opened in Excel-on-Windows

Example

Source SQLite query result:
id,name
1,José Müller

Excel-on-Windows display (Windows-1252 fallback):
id,name
1,JosÃ© MÃ¼ller

Cleaner adds BOM on output → Excel auto-detects UTF-8 → displays correctly.

psql `\N` NULL marker survives cleaner unchanged

Example

Source (psql in text-format COPY mode):
id\tname\temail
1\tJohn\t\N
2\tJane\tjane@x.com

Cleaner output (preserves `\N` verbatim):
id,name,email
1,John,\N
2,Jane,jane@x.com

Follow-up: csv-find-replace with find: \\N, replace: empty,
  caseSensitive: true → converts to empty-string NULL.

Embedded newline in `TEXT` column from unquoted MySQL output

Example

Source MySQL row:
  id=42, notes='Line 1\nLine 2', email='x@y.com'

Unquoted SELECT INTO OUTFILE writes:
42\tLine 1\nLine 2\tx@y.com

  ↓ Parser sees this as two rows:
42\tLine 1
Line 2\tx@y.com

Fix at source: use OPTIONALLY ENCLOSED BY '"' or switch to mysqldump.
Fix at cleaner: repairRows: true (default) pads short rows to
  header width — the data still needs manual review.

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

MySQL `SELECT INTO OUTFILE` default-tab delimiter mistaken for CSV

MySQL default behaviour

MySQL `SELECT INTO OUTFILE` without `OPTIONALLY ENCLOSED BY '"'`

Unquoted output

`mysqldump --default-character-set=utf8` corrupts `utf8mb4` data

Known MySQL bug (#71746)

PostgreSQL `\copy` on Windows produces `\r\n`, breaks Linux awk

Cross-platform line endings

SQLite CLI flag order: `sqlite3 -header -csv` produces no headers

SQLite 3.53+ behaviour change

PostgreSQL `\N` NULL marker in CSV-mode output

Configuration

PostgreSQL `COPY FROM` rejects mixed line endings

Re-import strictness

SQL Server `SQLCMD` writes a separator line of dashes between header and data

Pretty-printing artefact

SQL Server `bcp -w` writes UTF-16 LE

Encoding non-default

Embedded carriage returns in TEXT values escape as visible `\r`

psql text-mode escaping

Frequently asked questions

Why does my MySQL `SELECT INTO OUTFILE` produce a tab-delimited file?

Why does my `psql \copy` output show `\r` characters in Linux tools?

How do I handle `\N` NULL markers from psql?

Will the cleaner damage data types like dates, booleans, JSON?

Why is my emoji column showing `?` after a mysqldump round-trip?

Can I clean a SQL Server `bcp` output with this?

How does the cleaner handle multi-line values inside a TEXT/VARCHAR column?

Why does my SQLite CSV export have no header row?

Can I dedupe rows by a single column (e.g. primary key) instead of whole-row match?

Will database PII be uploaded to JAD Apps?

How large a database export can the cleaner handle?

Can I run this in a CI pipeline before loading data into a warehouse?

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

Clean a Database CSV Export

How to clean a database csv export

Per-database export shapes

What the cleaner does

Cookbook

MySQL default-tab output mistaken for comma-CSV

psql `\copy` on Windows: CRLF line endings break Linux pandas

PostgreSQL `CHAR(N)` column padded with trailing spaces

SQLite UTF-8-no-BOM CSV opened in Excel-on-Windows

psql `\N` NULL marker survives cleaner unchanged

Embedded newline in `TEXT` column from unquoted MySQL output

Errors and edge cases

MySQL `SELECT INTO OUTFILE` default-tab delimiter mistaken for CSV

MySQL `SELECT INTO OUTFILE` without `OPTIONALLY ENCLOSED BY '"'`

`mysqldump --default-character-set=utf8` corrupts `utf8mb4` data

PostgreSQL `\copy` on Windows produces `\r\n`, breaks Linux awk

SQLite CLI flag order: `sqlite3 -header -csv` produces no headers

PostgreSQL `\N` NULL marker in CSV-mode output

PostgreSQL `COPY FROM` rejects mixed line endings

SQL Server `SQLCMD` writes a separator line of dashes between header and data

SQL Server `bcp -w` writes UTF-16 LE

Embedded carriage returns in TEXT values escape as visible `\r`

Frequently asked questions

Why does my MySQL `SELECT INTO OUTFILE` produce a tab-delimited file?

Why does my `psql \copy` output show `\r` characters in Linux tools?

How do I handle `\N` NULL markers from psql?

Will the cleaner damage data types like dates, booleans, JSON?

Why is my emoji column showing `?` after a mysqldump round-trip?

Can I clean a SQL Server `bcp` output with this?

How does the cleaner handle multi-line values inside a TEXT/VARCHAR column?

Why does my SQLite CSV export have no header row?

Can I dedupe rows by a single column (e.g. primary key) instead of whole-row match?

Will database PII be uploaded to JAD Apps?

How large a database export can the cleaner handle?

Can I run this in a CI pipeline before loading data into a warehouse?

Privacy first

Related guides

Clean a Database CSV Export

How to clean a database csv export

Per-database export shapes

What the cleaner does

Cookbook

MySQL default-tab output mistaken for comma-CSV

psql `\copy` on Windows: CRLF line endings break Linux pandas

PostgreSQL `CHAR(N)` column padded with trailing spaces

SQLite UTF-8-no-BOM CSV opened in Excel-on-Windows

psql `\N` NULL marker survives cleaner unchanged

Embedded newline in `TEXT` column from unquoted MySQL output

Errors and edge cases

MySQL `SELECT INTO OUTFILE` default-tab delimiter mistaken for CSV

MySQL `SELECT INTO OUTFILE` without `OPTIONALLY ENCLOSED BY '"'`

`mysqldump --default-character-set=utf8` corrupts `utf8mb4` data

PostgreSQL `\copy` on Windows produces `\r\n`, breaks Linux awk

SQLite CLI flag order: `sqlite3 -header -csv` produces no headers

PostgreSQL `\N` NULL marker in CSV-mode output

PostgreSQL `COPY FROM` rejects mixed line endings

SQL Server `SQLCMD` writes a separator line of dashes between header and data

SQL Server `bcp -w` writes UTF-16 LE

Embedded carriage returns in TEXT values escape as visible `\r`

Frequently asked questions

Why does my MySQL `SELECT INTO OUTFILE` produce a tab-delimited file?

Why does my `psql \copy` output show `\r` characters in Linux tools?

How do I handle `\N` NULL markers from psql?

Will the cleaner damage data types like dates, booleans, JSON?

Why is my emoji column showing `?` after a mysqldump round-trip?

Can I clean a SQL Server `bcp` output with this?

How does the cleaner handle multi-line values inside a TEXT/VARCHAR column?

Why does my SQLite CSV export have no header row?

Can I dedupe rows by a single column (e.g. primary key) instead of whole-row match?

Will database PII be uploaded to JAD Apps?

How large a database export can the cleaner handle?

Can I run this in a CI pipeline before loading data into a warehouse?

Privacy first

Related guides