Clean a Survey-Response CSV (Google Forms, Typeform, SurveyMonkey)

How to clean a survey-response csv

Step 1
Export from your survey platform — Google Forms: Responses tab → ⋮ → Download responses (.csv). Typeform: Results → Responses → Download all responses → CSV. SurveyMonkey: Analyze Results → Save As → All response data → CSV; pick the Condensed Columns layout for the simplest shape. Microsoft Forms: Responses → Open in Excel → File → Save As → CSV UTF-8.
Step 2
Drop the file onto the cleaner above — Everything runs in your browser via PapaParse. The cleaner sniffs encoding from the BOM (or its absence) and the delimiter from the first row — you don't have to tell it which platform the file came from. Survey PII (emails, free-text answers, IP addresses) never reaches a server.
Step 3
Pick the dedup mode for your data — Exact: only collapses byte-identical duplicate rows. Trim: collapses rows that differ only in leading/trailing whitespace. Case-insensitive: collapses on lowercase comparison — Sue@x.com matches sue@x.com. Normalized: trim + lowercase + CHAR(160) fold all applied to the comparison key; output cells keep their original casing. For survey email-based dedup, case-insensitive is the sane default. Leave it off if your survey allowed anonymous repeat submissions intentionally.
Step 4
Enable Advanced → smart-quote + hidden-whitespace if you see textareas in your form — Free-text answer columns are where curly quotes and NBSPs hide. Mobile keyboards autocorrect " to “ and ”; pasted answers from email/Word/Pages bring those plus CHAR(160) NBSPs. Both options are off by default to avoid changing data unnecessarily — turn them on when your form had any textarea / paragraph question.
Step 5
Set the output line-ending for your downstream tool — CRLF for Excel-on-Windows, QuickBooks, Power BI, most legacy importers. LF for Mac terminals, Python pandas (handles either), Linux awk pipelines. Preserve keeps whatever the input had — useful when the file is going back into the same system.
Step 6
Download and verify the row count — The output panel shows: rows in, rows out, duplicates removed, smart quotes normalized, hidden-whitespace cells folded, rows skipped from top, BOM detected (yes/no), encoding detected (utf8/utf16le/utf16be), delimiter detected (,/\t/;). Cross-check the rows-out count against the platform's reported total responses number before importing to your downstream tool.

Per-platform export quirks

What each survey platform writes when you click Export, and the specific cleaner option that addresses each quirk. Information is current as of May 2026.

Platform	Encoding	BOM	Delimiter	Quirks	Cleaner option(s)
Google Forms	UTF-8	No — Excel on Windows mojibakes non-ASCII chars	`,`	Always exports as `Form Responses 1.csv` with the form's name as a prefix; column order matches the question order; multi-select columns hold semicolon-separated values in a single cell	`outputLineEnding: crlf` for Excel; cleaner output adds the missing BOM
Typeform	UTF-8	Yes (newer exports)	`,`	Forms created before June 2025 have no `Ending` column. Multiple-choice with multi-select shows answer choices as separate columns rather than one cell. Some legacy exports prepend a single metadata row (`# Exported from typeform.com on ...`) before the question headers	`skipRows: 1` for the legacy metadata row; `inputEncoding: auto` handles either BOM state
SurveyMonkey (Standard)	UTF-8 with BOM	Yes	`,`	Two header rows by default (question text + answer-choice subhead); Respondent ID is a 16-digit number that displays as scientific notation in Excel if opened by double-click	`skipRows: 1` to drop the second header; cleaner preserves the 16-digit Respondent ID as text
SurveyMonkey (Condensed)	UTF-8 with BOM	Yes	`,`	Single header row; multi-select stored as comma-separated values inside one cell (use `csv-column-splitter` to separate after cleaning)	Same as Standard; no `skipRows` needed
Microsoft Forms	UTF-8	Yes (when exported via Excel → Save As CSV UTF-8)	`,`	Submission timestamp uses the form owner's local timezone — different respondents reading the same Excel file may see different times based on display locale	`outputLineEnding: crlf` for Excel; date fields untouched (cleaner is text-only)
Jotform	UTF-8 with BOM	Yes	`,`	Multi-row addresses are joined with `\n` literals inside one cell — quote-aware parser handles these; `trimWhitespace` only trims cell boundaries, not internal newlines	Default options
Tally	UTF-8 with BOM	Yes	`,`	Includes a Submission ID column that's a 24-character ULID — text-safe in Excel because it's not all-digits	Default options

What the cleaner does

Every cleaner option mapped to the survey-specific problem it solves. The cleaner is a single pass — all selected options apply together.

Cleaner option	Survey-data problem it solves	Default
`trimWhitespace`	Mobile keyboards add a trailing space on autocomplete; trim removes it so dedup catches `john@x.com` matching `john@x.com`	On
`removeEmptyRows`	Optional skipped questions leave blank cells; rows where every cell is blank (a respondent who started but submitted nothing) get dropped	On
`repairRows`	Pads short rows / truncates long rows to the header width. Triggered when an unquoted comma in a free-text answer broke a row	On
`dedupMode: case-insensitive`	Email-based dedup that catches `Sue@gmail.com` matching `sue@gmail.com`. For survey responses where users submitted twice — once on each device — this is the right default	Off (must enable)
`dedupMode: normalized`	Trim + lowercase + CHAR(160) fold applied to the comparison key only. Catches `Sue@gmail.com` (trailing space) matching `sue@gmail.com`. The output cells keep their original form	Off (must enable)
`skipRows: 1`	Strip the metadata row Typeform legacy exports prepend, or the second header row SurveyMonkey Standard layout writes	0 (none)
`inputEncoding: auto`	Detects UTF-8 with/without BOM, UTF-16 LE, UTF-16 BE from the file's first bytes — handles all five major platforms automatically	auto
`delimiter: auto`	Sniffs comma vs semicolon (some EU-locale Typeform exports use semicolon) vs tab from the first 4 KB	auto
`normalizeSmartQuotes`	Converts `“ ” ‘ ’ – —` from mobile autocorrect and Word/Pages paste to ASCII straight quotes / hyphens. Preserves the words, fixes the encoding	Off (must enable)
`normalizeHiddenWhitespace`	Folds CHAR(160) NBSP and U+200B zero-width space (both common from paste-from-mobile and paste-from-webpage) to regular space; collapses internal runs to single space	Off (must enable)
`outputLineEnding: crlf`	Windows / QuickBooks / Power BI line endings. `lf` for Mac terminals / pandas; `preserve` to keep input format	crlf

Cookbook

Real before/after rows from real survey exports. Headers shortened for brevity.

Google Forms response with mojibake when opened in Excel-on-Windows

Example

Google Forms exports UTF-8 without a BOM. Excel on Windows defaults to Windows-1252 and reads é as the two-byte sequence Ã©. The data is fine — the encoding declaration is missing. Cleaner output adds a BOM that Excel auto-detects.

Input (Google Forms UTF-8, no BOM, opened in Excel-on-Windows):
Name,Comment
José Smith,Bonjour à tous

  ↓ Excel-on-Windows displays as:
JosÃ© Smith,Bonjour Ã  tous

Cleaner output (UTF-8 BOM added):
Name,Comment
José Smith,Bonjour à tous   ← Excel reads correctly

Typeform legacy export with prepended metadata row

Example

Some pre-2025 Typeform exports prepend a single metadata row before the actual question headers. The cleaner's skipRows: 1 drops it so row 1 of the output is the real header.

Input (Typeform legacy):
# Exported from typeform.com on 2026-05-15
Submit Date,Respondent,Q1: Name,Q2: Email
2026-05-15,abc123,John Smith,john@x.com

Cleaner config: skipRows: 1

Output:
Submit Date,Respondent,Q1: Name,Q2: Email
2026-05-15,abc123,John Smith,john@x.com

Free-text answer with curly quotes from mobile autocorrect

Example

Mobile keyboards autocorrect straight " to curly “ ”. The data is the same; the encoding isn't. Downstream tools that do exact text matching (BI dashboards, NLP pipelines) will treat "frustrated" and “frustrated” as different strings. normalizeSmartQuotes folds them to ASCII.

Input:
Email,Feedback
user@x.com,"The “frustrated” user said “I don't think this works”"

Cleaner config: normalizeSmartQuotes: true

Output:
Email,Feedback
user@x.com,"The ""frustrated"" user said ""I don't think this works"""

Email duplicates with case + whitespace differences

Example

Respondent filled the form twice — once from desktop (John.Smith@Gmail.com) and once from mobile (john.smith@gmail.com with trailing space from autocomplete). Exact dedup misses both. case-insensitive mode catches the case difference; normalized mode catches case + whitespace + NBSP.

Input:
Email,Response
John.Smith@Gmail.com,Yes
john.smith@gmail.com ,Yes

Cleaner config: dedupMode: normalized

Output (first occurrence kept):
Email,Response
John.Smith@Gmail.com,Yes

CHAR(160) NBSP from a paste-from-email answer

Example

Respondent copied their support ticket from email and pasted it into a textarea. Outlook/Gmail/Apple Mail wrap text with CHAR(160) NBSPs between words — invisible to the eye, visible to any string-matching pipeline. normalizeHiddenWhitespace folds them to regular space.

Input (with NBSPs shown as · for clarity — they're U+00A0 in the actual file):
Email,Description
user@x.com,The·error·says·"Cannot·connect·to·server"

Cleaner config: normalizeHiddenWhitespace: true

Output:
Email,Description
user@x.com,The error says "Cannot connect to server"

Empty rows from respondents who started but didn't submit

Example

Most survey platforms record an empty row when a respondent loads the form but doesn't submit. Skip-the-empties cleanup removes these silently.

Input:
Email,Response
user1@x.com,Yes
,
,
user2@x.com,No

Cleaner config: removeEmptyRows: true (default)

Output:
Email,Response
user1@x.com,Yes
user2@x.com,No

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

`é` displays as `Ã©` after opening Google Forms CSV in Excel

Encoding mismatch

Google Forms exports UTF-8 without a BOM. Excel on Windows defaults to Windows-1252 when no encoding is declared. Cause is not the data — the file is correctly UTF-8. Two fixes: (1) re-save through Google Sheets (which writes UTF-8 with smarter Excel handling); (2) open via Excel's Data → Get Data → From File → From Text/CSV and manually pick UTF-8. The cleaner adds a BOM on output so future opens auto-detect.

SurveyMonkey Respondent ID displays as scientific notation in Excel

Excel precision corruption

SurveyMonkey writes Respondent IDs as 16-digit integers. Excel coerces 12+ digit numbers to scientific notation (1.23457E+15) and 16+ digit values get digit 16 replaced with 0 — permanently after save. SurveyMonkey's docs recommend using the Text Import Wizard with the Respondent ID column set to Text. The cleaner treats every cell as a string so the ID round-trips correctly — but if Excel was used between download and cleaner, the damage is done.

Typeform multi-select questions have one column per choice instead of one cell

Schema choice

Typeform's CSV export writes multi-select (multiple-choice with multi-select enabled) as separate columns, one per answer choice, with Yes/No or 1/0 values. The question text itself doesn't appear as a column header. If your downstream tool expects a single multi-value cell, you'll need csv-column-merger after the cleaner. For most survey-analysis tools, the separate-column shape is preferred.

Comma in a free-text answer breaks the row

Quote-stripped paste

Most survey platforms quote-wrap free-text fields correctly. The exception: when a respondent's answer was copy-pasted into the form from a source that already had RFC 4180 quoting, and the form processed the literal " characters. Result: every comma in the answer shifts subsequent cells one column right. The cleaner's repairRows: true (default on) pads short rows or trims long rows to the header width. The underlying answer text may need manual review.

Survey allowed multiple submissions intentionally — dedup deletes legitimate data

Risk of data loss

If your form allowed respondents to submit multiple times by design (longitudinal surveys, weekly check-ins, multi-stage research), any email-keyed dedup will collapse legitimate distinct responses. Use dedupMode: off for these scenarios. The cleaner's other passes (trim, smart-quote, NBSP, empty-row) are still safe to apply.

Multi-row addresses with embedded newlines

Quote-aware parser handles this

Jotform and Google Forms wrap multi-line address fields with \n inside the cell. RFC 4180 requires those cells to be quoted ("..."). The cleaner's parser is quote-aware so the embedded newlines stay inside the cell. If a downstream tool fails on these cells, the cleaner's outputLineEnding setting doesn't normalise newlines *inside* quoted cells — only between rows. For full sanitisation, run csv-find-replace with regex \r?\n → on the address column.

SurveyMonkey export has two header rows (question text + answer-choice subhead)

Standard layout default

SurveyMonkey's Standard layout writes question text on row 1 and the answer-choice subhead on row 2. The cleaner's skipRows: 1 drops row 1, leaving the subhead as the canonical header. Alternative: switch to Condensed Columns in SurveyMonkey's export options for a single-header layout — generally easier downstream.

Free-text answers contain commas inside semicolon-separated lists

Locale-specific delimiter pitfall

Some EU-locale Typeform exports use ; as the field delimiter (because , is the decimal separator). The cleaner's delimiter: auto sniffs from the first row. Force delimiter: ";" if your file is small enough that the sniffer guessed wrong (1–2 columns of data can confuse heuristics).

Microsoft Forms timestamp displays differently for different respondents

Display locale, not data corruption

Microsoft Forms writes the submission timestamp in the form owner's timezone. Excel displays dates in the local timezone of whoever opens the file. The cleaner treats every cell as text, so the timestamp string is preserved exactly — the perceived difference is downstream display behaviour. If you need a fixed timezone, format the timestamp as text inside Forms before export.

Frequently asked questions

Why does my Google Forms CSV show `Ã©` instead of `é` in Excel?

Google Forms exports as UTF-8 without a Byte Order Mark. Excel on Windows defaults to Windows-1252 (Western European Latin-1) when no encoding indicator is present, and Windows-1252 reads the UTF-8 byte sequence for é (0xC3 0xA9) as the two characters Ã©. The data isn't corrupted — the encoding label is missing. Three fixes, easiest first: (1) open the cleaner above, drop the file, and download — the output adds a BOM Excel auto-detects; (2) Excel's Data → Get Data → From File → From Text/CSV lets you pick UTF-8 manually; (3) re-save through Google Sheets, which writes a UTF-8 CSV Excel handles better.

How do I dedupe survey responses by email when one was submitted with capital letters?

Use the cleaner's dedupMode: case-insensitive option (under Advanced). It lower-cases each cell *only for the comparison key* — the output preserves whatever the respondent typed. If you also want to catch trailing whitespace from mobile autocomplete (Sue@x.com vs Sue@x.com), use dedupMode: normalized instead — it adds trim + NBSP folding to the comparison pipeline. Neither mode modifies the visible cell values; only the dedup decision uses the normalised form.

Why does my SurveyMonkey Respondent ID look like `1.23457E+15` in Excel?

Respondent IDs are 16-digit integers. Excel coerces any 12+ digit number to scientific notation for display, and any 16+ digit number has its 16th digit replaced with 0 permanently after save. This is Excel's documented 15-digit precision limit (IEEE 754 floating-point). The cleaner treats every cell as a string, so if you drop the SurveyMonkey CSV directly onto the cleaner *before* opening it in Excel, the IDs survive. If you've already saved through Excel, the IDs are damaged — re-export from SurveyMonkey.

Should I turn on `normalizeSmartQuotes` for my survey CSV?

Turn it on if your form had any free-text or paragraph question. Mobile keyboards autocorrect straight " to curly “ ”; pasted answers from email or Word bring curly quotes plus en-dashes (–) plus the horizontal ellipsis (…). Downstream tools doing exact text matching — BI dashboards, NLP sentiment pipelines, customer-support search — treat the two strings as different. Folding them to ASCII straight quotes preserves the words and fixes the encoding. Leave it off if your form was all multiple-choice or numeric input.

What about CHAR(160) non-breaking space in responses?

Same advice as smart quotes: turn normalizeHiddenWhitespace on if your form had textarea / paragraph questions. CHAR(160) NBSPs come in via three paths: (1) Outlook / Gmail / Apple Mail wrap pasted text with NBSPs between words; (2) Webpages with   survive copy-paste as U+00A0; (3) Some mobile keyboards on iOS Safari insert NBSP after punctuation. All three produce strings that look identical to human eyes but never match in exact-string comparison. The cleaner folds them to regular U+0020 space.

Does the cleaner work on `.xlsx` exports from Google Forms / Typeform / SurveyMonkey?

Not directly — the cleaner parses CSV. For XLSX, use the platform's CSV export option (every platform offers it). If you only have XLSX, open it in Excel and File → Save As → CSV UTF-8 (Comma delimited). The XLSX-direct dedup tool in our JAD Excel suite is the right home for native XLSX workflows; CSV is faster and lighter-weight for one-off survey-export cleanup.

My survey allows multiple submissions — will the cleaner delete legitimate duplicates?

Only if you turn dedup on. The default mode is exact (byte-for-byte row match) which still collapses exact duplicates — change to none if your form is a weekly check-in or repeat-research design where the same email submitting twice with different answers is expected. The cleaner's other passes (trim, smart-quote, NBSP, empty-row, repair) are non-destructive — they normalise formatting without changing meaning.

Why does my Typeform CSV have a metadata row before the headers?

Some Typeform exports — particularly older forms or files generated via the API rather than the web UI — prepend a single metadata row (# Exported from typeform.com on ...) before the actual question headers. The cleaner's skipRows: 1 drops it so row 1 of the output is the canonical header. SurveyMonkey's Standard layout has the same pattern — two header rows where you usually want the second.

How do I tell which encoding my export is in before dropping it onto the cleaner?

You don't need to. inputEncoding: auto (the default) sniffs the encoding from the first bytes: UTF-8 BOM (EF BB BF), UTF-16 LE BOM (FF FE), UTF-16 BE BOM (FE FF), or content heuristics for no-BOM files. Forced override options exist if you have a specific need — but for the five major survey platforms (Google Forms, Typeform, SurveyMonkey, Microsoft Forms, Jotform), auto-detection works correctly every time.

Will survey PII (emails, free-text answers, IP addresses) be uploaded to JAD Apps?

No. PapaParse runs entirely in your browser. The CSV is never uploaded — the only thing saved server-side is a single counter (file processed, no content) for signed-in dashboard stats. You can opt out in account settings. This matters for survey data because most jurisdictions treat email addresses + free-text responses as PII under GDPR (EU), CCPA (California), CPA (Colorado), VCDPA (Virginia).

How large a survey CSV can the cleaner handle?

Free tier caps at 2 MB — roughly 8,000–15,000 responses depending on answer length. Pro removes the limit, with a practical ceiling around 5–10M rows determined by browser memory. Even the busiest survey rarely exceeds that. The downstream constraint is your BI / accounting / analysis tool, not the cleaner: SurveyMonkey's max single export is 100,000 responses (paginate beyond that); Typeform paginates at 1,000 by default.

Can I run this as part of an automated pipeline?

Yes — the same csv-cleaner is available via your local @jadapps/runner. Hit GET /api/v1/tools/csv-cleaner from any MCP/SDK client to fetch the option schema (returns the same 11 options the UI exposes), then POST the payload to your runner's 127.0.0.1:9789/v1/tools/csv-cleaner/run endpoint with your file. Survey PII never reaches JAD's servers — the cleaner runs on your machine via the runner. Pair the runner once and you can automate the daily survey-export cleanup.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

How to clean a survey-response csv

Step 1
Export from your survey platform — Google Forms: Responses tab → ⋮ → Download responses (.csv). Typeform: Results → Responses → Download all responses → CSV. SurveyMonkey: Analyze Results → Save As → All response data → CSV; pick the Condensed Columns layout for the simplest shape. Microsoft Forms: Responses → Open in Excel → File → Save As → CSV UTF-8.
Step 2
Drop the file onto the cleaner above — Everything runs in your browser via PapaParse. The cleaner sniffs encoding from the BOM (or its absence) and the delimiter from the first row — you don't have to tell it which platform the file came from. Survey PII (emails, free-text answers, IP addresses) never reaches a server.
Step 3
Pick the dedup mode for your data — Exact: only collapses byte-identical duplicate rows. Trim: collapses rows that differ only in leading/trailing whitespace. Case-insensitive: collapses on lowercase comparison — Sue@x.com matches sue@x.com. Normalized: trim + lowercase + CHAR(160) fold all applied to the comparison key; output cells keep their original casing. For survey email-based dedup, case-insensitive is the sane default. Leave it off if your survey allowed anonymous repeat submissions intentionally.
Step 4
Enable Advanced → smart-quote + hidden-whitespace if you see textareas in your form — Free-text answer columns are where curly quotes and NBSPs hide. Mobile keyboards autocorrect " to “ and ”; pasted answers from email/Word/Pages bring those plus CHAR(160) NBSPs. Both options are off by default to avoid changing data unnecessarily — turn them on when your form had any textarea / paragraph question.
Step 5
Set the output line-ending for your downstream tool — CRLF for Excel-on-Windows, QuickBooks, Power BI, most legacy importers. LF for Mac terminals, Python pandas (handles either), Linux awk pipelines. Preserve keeps whatever the input had — useful when the file is going back into the same system.
Step 6
Download and verify the row count — The output panel shows: rows in, rows out, duplicates removed, smart quotes normalized, hidden-whitespace cells folded, rows skipped from top, BOM detected (yes/no), encoding detected (utf8/utf16le/utf16be), delimiter detected (,/\t/;). Cross-check the rows-out count against the platform's reported total responses number before importing to your downstream tool.

Per-platform export quirks

What each survey platform writes when you click Export, and the specific cleaner option that addresses each quirk. Information is current as of May 2026.

Platform	Encoding	BOM	Delimiter	Quirks	Cleaner option(s)
Google Forms	UTF-8	No — Excel on Windows mojibakes non-ASCII chars	`,`	Always exports as `Form Responses 1.csv` with the form's name as a prefix; column order matches the question order; multi-select columns hold semicolon-separated values in a single cell	`outputLineEnding: crlf` for Excel; cleaner output adds the missing BOM
Typeform	UTF-8	Yes (newer exports)	`,`	Forms created before June 2025 have no `Ending` column. Multiple-choice with multi-select shows answer choices as separate columns rather than one cell. Some legacy exports prepend a single metadata row (`# Exported from typeform.com on ...`) before the question headers	`skipRows: 1` for the legacy metadata row; `inputEncoding: auto` handles either BOM state
SurveyMonkey (Standard)	UTF-8 with BOM	Yes	`,`	Two header rows by default (question text + answer-choice subhead); Respondent ID is a 16-digit number that displays as scientific notation in Excel if opened by double-click	`skipRows: 1` to drop the second header; cleaner preserves the 16-digit Respondent ID as text
SurveyMonkey (Condensed)	UTF-8 with BOM	Yes	`,`	Single header row; multi-select stored as comma-separated values inside one cell (use `csv-column-splitter` to separate after cleaning)	Same as Standard; no `skipRows` needed
Microsoft Forms	UTF-8	Yes (when exported via Excel → Save As CSV UTF-8)	`,`	Submission timestamp uses the form owner's local timezone — different respondents reading the same Excel file may see different times based on display locale	`outputLineEnding: crlf` for Excel; date fields untouched (cleaner is text-only)
Jotform	UTF-8 with BOM	Yes	`,`	Multi-row addresses are joined with `\n` literals inside one cell — quote-aware parser handles these; `trimWhitespace` only trims cell boundaries, not internal newlines	Default options
Tally	UTF-8 with BOM	Yes	`,`	Includes a Submission ID column that's a 24-character ULID — text-safe in Excel because it's not all-digits	Default options

What the cleaner does

Every cleaner option mapped to the survey-specific problem it solves. The cleaner is a single pass — all selected options apply together.

Cleaner option	Survey-data problem it solves	Default
`trimWhitespace`	Mobile keyboards add a trailing space on autocomplete; trim removes it so dedup catches `john@x.com` matching `john@x.com`	On
`removeEmptyRows`	Optional skipped questions leave blank cells; rows where every cell is blank (a respondent who started but submitted nothing) get dropped	On
`repairRows`	Pads short rows / truncates long rows to the header width. Triggered when an unquoted comma in a free-text answer broke a row	On
`dedupMode: case-insensitive`	Email-based dedup that catches `Sue@gmail.com` matching `sue@gmail.com`. For survey responses where users submitted twice — once on each device — this is the right default	Off (must enable)
`dedupMode: normalized`	Trim + lowercase + CHAR(160) fold applied to the comparison key only. Catches `Sue@gmail.com` (trailing space) matching `sue@gmail.com`. The output cells keep their original form	Off (must enable)
`skipRows: 1`	Strip the metadata row Typeform legacy exports prepend, or the second header row SurveyMonkey Standard layout writes	0 (none)
`inputEncoding: auto`	Detects UTF-8 with/without BOM, UTF-16 LE, UTF-16 BE from the file's first bytes — handles all five major platforms automatically	auto
`delimiter: auto`	Sniffs comma vs semicolon (some EU-locale Typeform exports use semicolon) vs tab from the first 4 KB	auto
`normalizeSmartQuotes`	Converts `“ ” ‘ ’ – —` from mobile autocorrect and Word/Pages paste to ASCII straight quotes / hyphens. Preserves the words, fixes the encoding	Off (must enable)
`normalizeHiddenWhitespace`	Folds CHAR(160) NBSP and U+200B zero-width space (both common from paste-from-mobile and paste-from-webpage) to regular space; collapses internal runs to single space	Off (must enable)
`outputLineEnding: crlf`	Windows / QuickBooks / Power BI line endings. `lf` for Mac terminals / pandas; `preserve` to keep input format	crlf

Cookbook

Real before/after rows from real survey exports. Headers shortened for brevity.

Google Forms response with mojibake when opened in Excel-on-Windows

Example

Input (Google Forms UTF-8, no BOM, opened in Excel-on-Windows):
Name,Comment
José Smith,Bonjour à tous

  ↓ Excel-on-Windows displays as:
JosÃ© Smith,Bonjour Ã  tous

Cleaner output (UTF-8 BOM added):
Name,Comment
José Smith,Bonjour à tous   ← Excel reads correctly

Typeform legacy export with prepended metadata row

Example

Some pre-2025 Typeform exports prepend a single metadata row before the actual question headers. The cleaner's skipRows: 1 drops it so row 1 of the output is the real header.

Input (Typeform legacy):
# Exported from typeform.com on 2026-05-15
Submit Date,Respondent,Q1: Name,Q2: Email
2026-05-15,abc123,John Smith,john@x.com

Cleaner config: skipRows: 1

Output:
Submit Date,Respondent,Q1: Name,Q2: Email
2026-05-15,abc123,John Smith,john@x.com

Free-text answer with curly quotes from mobile autocorrect

Example

Input:
Email,Feedback
user@x.com,"The “frustrated” user said “I don't think this works”"

Cleaner config: normalizeSmartQuotes: true

Output:
Email,Feedback
user@x.com,"The ""frustrated"" user said ""I don't think this works"""

Email duplicates with case + whitespace differences

Example

Input:
Email,Response
John.Smith@Gmail.com,Yes
john.smith@gmail.com ,Yes

Cleaner config: dedupMode: normalized

Output (first occurrence kept):
Email,Response
John.Smith@Gmail.com,Yes

CHAR(160) NBSP from a paste-from-email answer

Example

Input (with NBSPs shown as · for clarity — they're U+00A0 in the actual file):
Email,Description
user@x.com,The·error·says·"Cannot·connect·to·server"

Cleaner config: normalizeHiddenWhitespace: true

Output:
Email,Description
user@x.com,The error says "Cannot connect to server"

Empty rows from respondents who started but didn't submit

Example

Most survey platforms record an empty row when a respondent loads the form but doesn't submit. Skip-the-empties cleanup removes these silently.

Input:
Email,Response
user1@x.com,Yes
,
,
user2@x.com,No

Cleaner config: removeEmptyRows: true (default)

Output:
Email,Response
user1@x.com,Yes
user2@x.com,No

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

Encoding mismatch

SurveyMonkey Respondent ID displays as scientific notation in Excel

Excel precision corruption

Typeform multi-select questions have one column per choice instead of one cell

Schema choice

Comma in a free-text answer breaks the row

Quote-stripped paste

Survey allowed multiple submissions intentionally — dedup deletes legitimate data

Risk of data loss

Multi-row addresses with embedded newlines

Quote-aware parser handles this

SurveyMonkey export has two header rows (question text + answer-choice subhead)

Standard layout default

Free-text answers contain commas inside semicolon-separated lists

Locale-specific delimiter pitfall

Microsoft Forms timestamp displays differently for different respondents

Display locale, not data corruption

Frequently asked questions

How do I dedupe survey responses by email when one was submitted with capital letters?

Why does my SurveyMonkey Respondent ID look like `1.23457E+15` in Excel?

Should I turn on `normalizeSmartQuotes` for my survey CSV?

What about CHAR(160) non-breaking space in responses?

Does the cleaner work on `.xlsx` exports from Google Forms / Typeform / SurveyMonkey?

My survey allows multiple submissions — will the cleaner delete legitimate duplicates?

Why does my Typeform CSV have a metadata row before the headers?

How do I tell which encoding my export is in before dropping it onto the cleaner?

Will survey PII (emails, free-text answers, IP addresses) be uploaded to JAD Apps?

How large a survey CSV can the cleaner handle?

Can I run this as part of an automated pipeline?

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

Clean a Survey-Response CSV

How to clean a survey-response csv

Per-platform export quirks

What the cleaner does

Cookbook

Google Forms response with mojibake when opened in Excel-on-Windows

Typeform legacy export with prepended metadata row

Free-text answer with curly quotes from mobile autocorrect

Email duplicates with case + whitespace differences

CHAR(160) NBSP from a paste-from-email answer

Empty rows from respondents who started but didn't submit

Errors and edge cases

`é` displays as `Ã©` after opening Google Forms CSV in Excel

SurveyMonkey Respondent ID displays as scientific notation in Excel

Typeform multi-select questions have one column per choice instead of one cell

Comma in a free-text answer breaks the row

Survey allowed multiple submissions intentionally — dedup deletes legitimate data

Multi-row addresses with embedded newlines

SurveyMonkey export has two header rows (question text + answer-choice subhead)

Free-text answers contain commas inside semicolon-separated lists

Microsoft Forms timestamp displays differently for different respondents

Frequently asked questions

Why does my Google Forms CSV show `Ã©` instead of `é` in Excel?

How do I dedupe survey responses by email when one was submitted with capital letters?

Why does my SurveyMonkey Respondent ID look like `1.23457E+15` in Excel?

Should I turn on `normalizeSmartQuotes` for my survey CSV?

What about CHAR(160) non-breaking space in responses?

Does the cleaner work on `.xlsx` exports from Google Forms / Typeform / SurveyMonkey?

My survey allows multiple submissions — will the cleaner delete legitimate duplicates?

Why does my Typeform CSV have a metadata row before the headers?

How do I tell which encoding my export is in before dropping it onto the cleaner?

Will survey PII (emails, free-text answers, IP addresses) be uploaded to JAD Apps?

How large a survey CSV can the cleaner handle?

Can I run this as part of an automated pipeline?

Privacy first

Related guides

Clean a Survey-Response CSV

How to clean a survey-response csv

Per-platform export quirks

What the cleaner does

Cookbook

Google Forms response with mojibake when opened in Excel-on-Windows

Typeform legacy export with prepended metadata row

Free-text answer with curly quotes from mobile autocorrect

Email duplicates with case + whitespace differences

CHAR(160) NBSP from a paste-from-email answer

Empty rows from respondents who started but didn't submit

Errors and edge cases

`é` displays as `Ã©` after opening Google Forms CSV in Excel

SurveyMonkey Respondent ID displays as scientific notation in Excel

Typeform multi-select questions have one column per choice instead of one cell

Comma in a free-text answer breaks the row

Survey allowed multiple submissions intentionally — dedup deletes legitimate data

Multi-row addresses with embedded newlines

SurveyMonkey export has two header rows (question text + answer-choice subhead)

Free-text answers contain commas inside semicolon-separated lists

Microsoft Forms timestamp displays differently for different respondents

Frequently asked questions

Why does my Google Forms CSV show `Ã©` instead of `é` in Excel?

How do I dedupe survey responses by email when one was submitted with capital letters?

Why does my SurveyMonkey Respondent ID look like `1.23457E+15` in Excel?

Should I turn on `normalizeSmartQuotes` for my survey CSV?

What about CHAR(160) non-breaking space in responses?

Does the cleaner work on `.xlsx` exports from Google Forms / Typeform / SurveyMonkey?

My survey allows multiple submissions — will the cleaner delete legitimate duplicates?

Why does my Typeform CSV have a metadata row before the headers?

How do I tell which encoding my export is in before dropping it onto the cleaner?

Will survey PII (emails, free-text answers, IP addresses) be uploaded to JAD Apps?

How large a survey CSV can the cleaner handle?

Can I run this as part of an automated pipeline?

Privacy first

Related guides