How to clean an email-marketing subscriber csv
- Step 1Export the subscriber list from your platform — **Mailchimp**: Audience → All contacts → Export Audience → wait for the email → download the
.zip→ extract the four CSVs (subscribed_members_export_*.csvetc.). **Klaviyo**: Lists & Segments → pick a list → ⋮ → Export List to CSV → email link arrives within minutes. **Kit**: Subscribers → Export → CSV. **Constant Contact**: Contacts → Export → email or download direct. **HubSpot**: Contacts → Actions → Export. - Step 2If exporting from Mailchimp, decide which CSVs to clean — Mailchimp's
.zipcontains four files — one per subscription status. For a re-import or platform migration, you usually wantsubscribed_members_export_*.csvonly. For audit or list-health analysis, drop all four onto the cleaner sequentially or use the csv-merger tool to combine them first, then clean. - Step 3Drop the file onto the cleaner above — PapaParse runs in your browser — subscriber emails, names, and tags never reach a server. The cleaner sniffs encoding from the BOM and delimiter from the first row, so you don't have to tell it which platform the file came from.
- Step 4Pick the right dedup mode for your list — **Case-insensitive**: collapses
Sue@gmail.com↔sue@gmail.cominto one row. This is what Mailchimp does silently on re-import — better to do it explicitly so the row count is predictable. **Normalized**: trim + lowercase + CHAR(160) fold for the comparison key. Catches the trailing-space duplicates Klaviyo rejects. **Exact**: only collapses byte-identical rows (e.g. when you concatenated multiple Mailchimp status CSVs and the same email appears in two statuses). **Off**: when you're auditing list health and want to see every row verbatim. - Step 5Enable Advanced → smart-quote + hidden-whitespace if you have free-text columns —
Tags,Notes, custom fields with respondent-entered text — these are where curly quotes and NBSPs hide. Mobile signup forms autocorrect"to“ ”. Paste-from-email tags drag in CHAR(160) NBSPs. Both options normalise the comparison form without altering the visible cell content unnecessarily. - Step 6Download and verify against the platform's reported subscriber count — The output panel shows: rows in, rows out, duplicates removed, encoding (
utf8/utf16le/utf16be), delimiter (,/;/\t), BOM detected. **Cross-check** the rows-out count against the platform's contact-count display before re-importing. A larger discrepancy than expected usually means the cleaner's dedup found more duplicates than you knew about — review the dedup preview before exporting.
Per-platform export quirks
Each major email-marketing platform writes a slightly different CSV shape. Information is current as of May 2026. Cleaner options listed are the ones most users need to touch — defaults handle most of the rest.
| Platform | Export shape | Encoding / BOM | Common quirks | Cleaner option(s) |
|---|---|---|---|---|
| **Mailchimp** | .zip containing four CSVs: subscribed_members_export_*.csv, unsubscribed_members_export_*.csv, non-subscribed_members_export_*.csv, cleaned_members_export_*.csv | UTF-8, no BOM | MEMBER_RATING column (1–5) reflects engagement; audience fields vary by signup-form configuration. **Documented import limits**: 200 MB / ~1M contacts per file; Mailchimp 'can't process addresses with UTF-8 encoded characters in the email prefix' (so josé@example.com is silently rejected) | dedupMode: case-insensitive matches Mailchimp's Auto-Update behaviour |
| **Klaviyo** | Single CSV per list or segment, delivered by emailed download link | UTF-8 with BOM | **Max 700 columns** per export — properties beyond that are silently dropped. First column should be Email. List-type properties write as JSON arrays inside one cell: ["value1", "value2"]. Per Klaviyo docs, rows with **whitespace anywhere in the email** or **line breaks inside any cell** are skipped on re-import | trimWhitespace (default on) + dedupMode: normalized to catch space-padded emails |
| **Kit (ConvertKit)** | Single CSV with subscribers + tags + location (city, state, country) columns since the 2025 update | UTF-8 with BOM | Tags column is comma-separated within the cell (use csv-column-splitter if you need separate columns); pre-2025 exports lack location columns — migration scripts may need updating | normalizeHiddenWhitespace if tags were typed on mobile (CHAR(160) NBSPs after punctuation) |
| **Constant Contact** | Single CSV per list, immediate download | UTF-8 with BOM | Birthday column writes as MM/DD/YYYY regardless of locale; multi-list subscribers show one row per list (so a contact in 3 lists = 3 rows) — dedup on email if you want unique contacts | dedupMode: case-insensitive on the email column for cross-list dedup |
| **HubSpot** (Contacts) | Single CSV with up to 100 properties selected at export time | UTF-8 with BOM | Default export includes Record ID, Email, First Name, Last Name, Lead Status, Lifecycle Stage. Property selector lets you add up to 100 fields; HubSpot's documented limit. Multi-value lifecycle history is JSON-encoded inside the cell | Default options usually work; dedupMode: case-insensitive for email cross-checks |
| **Brevo (formerly Sendinblue)** | Single CSV per list, immediate download | UTF-8 with BOM | Phone numbers stored as E.164 with + prefix; opt-in date stored as ISO 8601 with timezone; soft-bounces / hard-bounces / unsubscribes flagged in Statistics column with comma-separated codes | Default; treat as text (the cleaner is text-only, so E.164 phones with + are preserved) |
What the cleaner does
Each cleaner option mapped to the email-marketing problem it solves. Single-pass — all selected options apply together.
| Cleaner option | Email-marketing problem it solves | Default |
|---|---|---|
trimWhitespace | Removes the trailing space on emails (user@x.com ) that triggers Klaviyo's Email Syntax Error and Mailchimp's silent reject | On |
removeEmptyRows | Drops blank rows that appear in Mailchimp's cleaned_members_export_*.csv for hard-bounced contacts whose data was scrubbed | On |
repairRows | Pads or trims rows broken by unquoted commas in Notes / custom fields — the second-most-common Klaviyo Duplicate Headers / row-misalignment cause | On |
dedupMode: case-insensitive | Collapses Sue@x.com ↔ sue@x.com into one row. **Mailchimp does this silently on re-import** — better to do it explicitly so your row counts match | Off (must enable) |
dedupMode: normalized | Trim + lowercase + CHAR(160) fold on the comparison key. Catches space-padded + case-different duplicates. Output cells keep original casing | Off (must enable) |
skipRows: 1 | Strip a leading metadata row from custom-export scripts that prepend # Exported on YYYY-MM-DD. Stock platform exports don't need this | 0 |
inputEncoding: auto | Detects BOM (UTF-8 BOM is what Klaviyo, Kit, Constant Contact, HubSpot, Brevo all write) or no-BOM (Mailchimp) | auto |
delimiter: auto | Sniffs comma vs semicolon (some EU-locale Klaviyo exports) from the first 4 KB | auto |
normalizeSmartQuotes | Folds curly quotes in First Name / custom-field text — common when respondents used Apple Mail's autocomplete on signup forms | Off (must enable) |
normalizeHiddenWhitespace | Folds CHAR(160) NBSPs in Tags / Notes columns from mobile-typed input (iOS Safari inserts NBSPs after punctuation) | Off (must enable) |
outputLineEnding: crlf | Excel-on-Windows / QuickBooks-compatible line endings (default). lf for Mac terminals / Python pipelines | crlf |
Cookbook
Real before/after rows from real email-marketing CSV exports. Email and PII fields anonymised.
Klaviyo silent-skip from trailing-space email
ExampleKlaviyo's docs are explicit: any whitespace in the email is treated as invalid format and skipped on re-import. The space usually comes from mobile autocomplete adding a trailing space when the user typed their email — invisible in the platform's UI, fatal on re-import. The cleaner's default trimWhitespace: true removes it.
Input (Klaviyo CSV, exported, re-imported as-is): Email,First Name,Source user1@x.com ,Sue,Signup form user2@x.com,Jon,Signup form Klaviyo re-import result: • 1 contact added (user2@x.com) • 1 record skipped: invalid email format (user1@x.com) Cleaner output (trim default ON): Email,First Name,Source user1@x.com,Sue,Signup form user2@x.com,Jon,Signup form
Mailchimp four-file ZIP concatenated and deduplicated
ExampleMailchimp's export ZIP has four CSVs by status. For a platform migration, you want one combined deduplicated list — Mailchimp's own re-import would silently merge them, so you might as well do it deliberately before the migration to keep counts predictable.
Inputs (extracted from Mailchimp .zip):
subscribed_members_export_2026-05.csv (3,200 rows)
unsubscribed_members_export_2026-05.csv (450 rows)
non-subscribed_members_export_2026-05.csv (180 rows)
cleaned_members_export_2026-05.csv (75 rows)
Workflow:
1. csv-merger combines into one file → 3,905 rows
2. csv-cleaner with dedupMode: case-insensitive →
1 contact appeared as both 'subscribed' and 'unsubscribed'
(the user re-subscribed) — first occurrence wins → 3,904 rowsCase-different email duplicates created by signup-form variation
ExampleRespondent first signed up via desktop (John.Smith@Gmail.com because Chrome autocompleted the capital), then again via mobile (john.smith@gmail.com lowercase). To Mailchimp's import: two distinct addresses. To Mailchimp's send pipeline: same person — they get every campaign twice. Cleaner dedup before re-import catches this.
Input: Email,First Name,MEMBER_RATING John.Smith@Gmail.com,John,4 john.smith@gmail.com,John,3 Cleaner config: dedupMode: case-insensitive Output (first occurrence kept): Email,First Name,MEMBER_RATING John.Smith@Gmail.com,John,4
Klaviyo line-break-in-cell row misalignment
ExampleKlaviyo's docs flag this: if a Notes cell contains a literal \n that wasn't quoted, Klaviyo's parser treats it as a row break — every subsequent column shifts up one row. The cleaner's quote-aware parser handles RFC 4180 quoted newlines correctly; if the source CSV was malformed (unquoted newline in field), repairRows: true pads/trims to the header width but the underlying note text still needs manual review.
Input (unquoted newline mid-field — malformed CSV): Email,Notes user1@x.com,Customer asked about\nrefund policy user2@x.com,Subscribed Jan 2026 Klaviyo parses this as 3 rows, second row malformed. Fix at source (re-quote during export) OR cleaner repair: repairRows: true pads each row to 2 columns; the multi-line note becomes two rows (one with 'Customer asked about', one with 'refund policy') — review and merge manually.
Kit (ConvertKit) tags column with NBSPs from mobile signup
ExampleKit's Tags column is comma-separated within the cell. Mobile-typed tags often have CHAR(160) NBSPs after the comma (iOS auto-spacing). Visually identical; lexically different. Tag-based segmentation in Kit then misses the NBSP-prefixed tags. normalizeHiddenWhitespace folds them.
Input (NBSPs after commas — shown as ·): Email,Tags user@x.com,"newsletter,·premium,·beta" Kit treats this as three tags: `newsletter`, `·premium`, `·beta` Segment filter on `premium` misses the leading-NBSP version. Cleaner config: normalizeHiddenWhitespace: true Output: Email,Tags user@x.com,"newsletter, premium, beta" → trimmed back to three clean tags by Kit's parser.
Constant Contact multi-list contact deduplicated to one row
ExampleConstant Contact writes one row per (contact, list) pair. A subscriber on 3 lists produces 3 rows. For a platform migration where you want unique contacts, dedup on email — but keep the multi-list info in a Lists audit column.
Input (raw Constant Contact export): Email,First Name,List sue@x.com,Sue,Newsletter sue@x.com,Sue,Product Updates sue@x.com,Sue,Beta Testers Cleaner config: dedupMode: case-insensitive (on Email) Output (first occurrence wins — Sue's first list kept): Email,First Name,List sue@x.com,Sue,Newsletter To preserve list membership, use csv-column-merger BEFORE dedup to roll Lists into a single 'Lists' column: sue@x.com,Sue,"Newsletter,Product Updates,Beta Testers"
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
Mailchimp re-import silently merges case-different duplicates
Silent mergeMailchimp treats Sue@x.com and sue@x.com as the same contact on re-import. If you imported a CSV with both, Mailchimp reports '1 contact added' rather than 2, with no warning that one was a case-collision. The cleaner's dedupMode: case-insensitive makes the collision explicit so your row count matches Mailchimp's contact count exactly.
Klaviyo skips rows with whitespace in the email
Klaviyo silent skipKlaviyo's docs are explicit: any whitespace anywhere in the email value causes the row to be skipped as invalid format. user@x.com (trailing space from mobile autocomplete) is the most common trigger — invisible to the user, fatal on re-import. The skipped rows appear in the import-result file Klaviyo provides post-upload. trimWhitespace: true (default) removes the space.
Klaviyo line breaks inside cells misalign rows
Klaviyo silent skip / misalignmentPer Klaviyo docs: line breaks inside a cell that aren't quote-wrapped are interpreted as new rows, causing every subsequent column to shift. A multi-line Notes or address value that wasn't properly RFC-4180-quoted in the source CSV produces silent corruption. The cleaner's parser handles correctly-quoted multi-line values; for unquoted ones, repairRows: true pads/trims to header width but the underlying value still needs manual review.
Mailchimp can't process UTF-8 in the email prefix
Mailchimp silent rejectDocumented per Mailchimp: 'Mailchimp can't process addresses with UTF-8 encoded characters in the email prefix.' So josé@example.com (with accented é in the local part) is silently rejected at import. Technically valid per RFC 6531 (Internationalized Email Addresses), but Mailchimp's pipeline doesn't support it. The cleaner doesn't strip these — review and either drop, ASCII-normalise the local part, or replace the address with the contact's alternative.
Klaviyo `Duplicate Headers` error
Klaviyo rejectionVerbatim error from Klaviyo's docs: a CSV with two columns sharing the same header (e.g. two Email columns) fails import with Duplicate Headers. Common cause: re-exporting after manually adding a column without removing the original. Not a cleaner-fixable issue — review the column headers manually. The cleaner doesn't dedupe headers because doing so silently could collapse intentionally-similar columns (Email + Email Address).
Mailchimp `.zip` extracted but only one CSV imported
Partial migrationMailchimp's export .zip contains four CSVs by status. Migrating only subscribed_members_export_*.csv to a new platform leaves unsubscribed contacts behind — fine for active-list migration, problematic for compliance (you also need the unsub list to honour opt-outs on the new platform). Best practice: import the subscribed list as the active audience; import the unsubscribed list as a suppression list on the new platform.
Klaviyo 700-column export cap
Schema limitKlaviyo caps profile-property exports at 700 columns. Properties beyond that are silently dropped. Workarounds: (1) export properties in two halves, then merge with csv-merger on email key; (2) export only the properties you need by deselecting at export time. The cleaner has no way to recover dropped columns — they never reached the file.
Kit subscriber missing tags + location columns
Pre-2025 schemaKit added tags + location columns (city, state, country) to the export in 2025. If your migration script was written before that, it may not request or process those columns. The cleaner doesn't add missing columns — re-export from Kit to include them, then re-run the migration.
Mailchimp `MEMBER_RATING` column displays as scientific notation
Excel coercionMEMBER_RATING is an integer 1–5 — no Excel precision issues there. But Mailchimp also exports EUID (8-char alphanumeric) and the LEID numeric ID which can be 9–10 digits. These don't hit the 15-digit precision limit, but Excel may right-align them as numbers. Treat both as text in any spreadsheet to avoid format drift. The cleaner is text-only, so values are preserved exactly.
HubSpot export hits the 100-property limit
Schema limitHubSpot caps each export at 100 contact properties — fewer than Klaviyo's 700 but the same principle applies. Properties beyond the 100 selected at export time are not in the file. Workaround: run multiple targeted exports (each with a different 100-property selection) and merge with csv-merger on Record ID.
Excel-on-Windows mojibake on Mailchimp's no-BOM UTF-8
Encoding mismatchMailchimp exports UTF-8 without a BOM. Excel on Windows defaults to Windows-1252 when no encoding marker is present — José becomes José. Same issue as the Google Forms case in the survey-response spoke. The cleaner adds a BOM on output (configurable via outputLineEnding).
Frequently asked questions
Why does Mailchimp report fewer contacts added than I uploaded?
Mailchimp silently merges duplicates on import — including case-different duplicates (Sue@x.com and sue@x.com collapse). The post-import report shows added and updated separately: contacts that already existed under the same email (case-insensitive) appear in updated rather than added. If your math doesn't add up, run the cleaner's dedupMode: case-insensitive on the source CSV first so you know exactly how many unique contacts you're sending in.
Why is Klaviyo rejecting some of my email rows as invalid format?
Klaviyo's documented rules: a valid email needs a local part, an @ symbol, and a domain — and **any whitespace in the email** or **line breaks inside any cell** causes the row to be skipped. The single most common cause is a trailing space from mobile autocomplete on a signup form — the row looks fine in your CSV but Klaviyo's parser flags it. Run the cleaner with trimWhitespace: true (the default) and the trailing spaces disappear. For emails wrapped in <> from an Outlook contacts paste, use csv-find-replace with regex [<>] → empty as a follow-up step.
I'm migrating from Mailchimp to Klaviyo — should I keep the four-CSV zip structure?
Convert it into two CSVs: active subscribers (subscribed_members_export_*.csv only) and suppression list (unsubscribed + cleaned + non-subscribed combined, then deduplicated). Klaviyo treats suppression as a separate concept — uploading a single combined file with mixed statuses won't work cleanly. Use csv-merger to combine the three suppression-status CSVs, then run this cleaner with dedupMode: case-insensitive to produce a single suppression list.
Klaviyo says my export is missing properties — did I hit the 700-column limit?
Yes, almost certainly. Klaviyo caps profile-property exports at 700 columns per file — extra properties are silently dropped. Two workarounds: (1) deselect properties you don't need at export time; (2) export in two halves and merge with csv-merger on the email key. The cleaner can't recover columns that never made it into the file.
How do I dedupe across Mailchimp's four status CSVs?
Use csv-merger to concatenate the four files first — they share a header schema so merging works cleanly. Then drop the merged file onto the cleaner with dedupMode: case-insensitive. Mailchimp's own re-import would do the dedup silently, but doing it explicitly keeps your row count predictable and surfaces edge cases (e.g. someone who appears in both subscribed and unsubscribed because they re-subscribed).
Does the cleaner work on subscriber CSVs from other platforms — Beehiiv, Substack, MailerLite?
Yes. All of them write CSV with comma delimiter, UTF-8 encoding, and at least an email column. The cleaner's auto-detection handles encoding/BOM differences automatically. Platform-specific column names vary (Email vs email vs Email Address) but the cleaner is column-agnostic — it operates on cell content, not column semantics. For column-rename use csv-header-rename as a follow-up step.
Will the cleaner strip my `MEMBER_RATING` / engagement scores?
No. The cleaner is text-only — every cell is preserved exactly in the output (modulo your selected normalisation passes, which only touch whitespace and smart quotes by request). Numeric columns like Mailchimp's MEMBER_RATING (1–5), Klaviyo's Predicted Lifetime CLV, or HubSpot's HubSpot Score survive unchanged. Excel may right-align them on visual display, but the underlying string is identical.
How does the cleaner handle emoji in `First Name` / `Tags`?
Emoji round-trip correctly when input + output are both UTF-8. Some emoji are stored as surrogate pairs (4 bytes), which trips older Windows-1252-only readers — the cleaner's default outputLineEnding: crlf plus adding the UTF-8 BOM lets Excel-on-Windows display them correctly. If you see ? or □ in Excel where there should be an emoji, the encoding chain broke somewhere upstream of the cleaner — the CSV file itself probably has the emoji byte stream intact (verify by opening in Notepad++ or VS Code).
Can the cleaner help with GDPR / suppression-list compliance?
Partially. The cleaner doesn't track opt-in dates or consent status — those are platform-specific columns (Klaviyo's consent_status, Brevo's opt-in date, Constant Contact's Permission) that you handle in your migration logic. What the cleaner does help with: ensuring the suppression list you import to a new platform is deduplicated and trimmed (Klaviyo + Mailchimp + Brevo all require unique emails in suppression lists; trailing-space addresses bypass suppression matching). Run dedupMode: normalized on the suppression list before import.
Will subscriber PII be uploaded to JAD Apps?
No. PapaParse runs entirely in your browser. Email addresses, names, tags, and engagement data never reach a server. The only thing saved server-side is a single counter (file processed, no content) for signed-in dashboard stats. You can opt out in account settings. This matters for email marketing under GDPR / CCPA — subscriber PII handling is on you, and the cleaner provides no exposure surface.
How large a subscriber list can the cleaner handle?
Free tier caps at 2 MB — roughly 8,000–15,000 subscriber rows depending on column count. Pro removes the limit with a practical ceiling around 5–10M rows determined by browser memory. The downstream constraint is usually the platform: **Mailchimp** has no documented row cap; **Klaviyo** caps at 700 columns (not rows); **HubSpot** at 100 properties; **Constant Contact** at 60,000 contacts per upload. Use csv-row-splitter to split large lists into platform-compatible chunks before import.
Can I run this as an automated pipeline before re-importing?
Yes — GET /api/v1/tools/csv-cleaner returns the 11-option schema; pair the @jadapps/runner once and POST the payload to 127.0.0.1:9789/v1/tools/csv-cleaner/run. Subscriber PII never reaches JAD's servers — the cleaner runs locally on your machine via the runner. Common pipeline: scheduled Klaviyo export → cleaner → suppression-list merge → re-import to a new platform, run monthly.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.