How to trim crm excel exports before you load them into sql
- Step 1Export from the CRM (don't pre-clean) — Run your Salesforce report export / Data Loader, HubSpot list export, Zoho or Pipedrive export to
.xlsxor.csv. Leave it as-is — let the trimmer do the cleaning. - Step 2Drop the export on the trimmer — Drag the file in. SheetJS parses it locally; the data never leaves the browser. Multi-sheet exports are processed first-sheet-only, so export one object (Accounts, Contacts, Deals) per file.
- Step 3Let the single pass run — There is nothing to configure. Every cell gets a leading/trailing trim — including edge CHAR 160 — in one in-memory pass.
- Step 4Download the cleaned staging file — An
.xlsxexport comes back as<name>-processed.xlsx; a CSV stays CSV. The result is values-only, ready to stage. - Step 5Load into your database — Run your
\copy/COPY FROM/LOAD DATA INFILE/BULK INSERTagainst the cleaned file. Foreign-key joins on name fields andUNIQUEconstraints on email/account now behave because the keys match byte-for-byte. - Step 6Spot-check the keys — After load, run
SELECT company, COUNT(*) FROM staging GROUP BY company HAVING COUNT(*)<>1(or compare against the CRM's record count). Counts that finally line up confirm the edge-space variants collapsed.
Where CRM exports hide whitespace
Common edge-whitespace sources by platform and the database symptom they cause. The trimmer addresses the edge-space causes; case and duplicate issues need the linked siblings.
| CRM source | Typical edge-whitespace | Database symptom |
|---|---|---|
| Salesforce report / Data Loader | Trailing space in Account Name, Billing fields | JOIN on account name returns fewer rows than expected |
| HubSpot list export | Edge CHAR 160 in rich-text company / notes fields | WHERE email = ? misses; constraint admits a near-duplicate |
| Zoho CRM export | Leading space from copy-pasted phone / address | Phone lookup fails; dedup keeps both variants |
| Pipedrive export | Trailing newline pasted into deal title | GROUP BY splits one deal stage into two buckets |
| Any web-sourced field | Tabs / NBSP from pasted web content | Exact-match key comparison silently fails |
What this tool fixes vs. what needs a sibling tool
Scope it precisely so the pre-load checklist routes each problem to the right tool. All siblings below accept XLSX as well as CSV.
| Problem | Right tool | Link |
|---|---|---|
| Leading/trailing spaces, tabs, newlines, edge CHAR 160 | This trimmer | (current tool) |
Internal Acme Corp double spaces | Find/replace {2,} → | csv-find-replace |
Case mismatch (acme vs Acme) breaking keys | Case converter | csv-case-converter |
| Duplicate account/contact rows | Deduplicator | csv-deduplicator |
| Garbled non-ASCII / smart quotes in keys | Special-char stripper | csv-special-char-stripper |
| Blank rows from empty CRM groups | Empty-row remover | csv-empty-row-remover |
Tier limits for CRM-sized exports
Per-file Excel-family limits. A 10,000-contact Salesforce export fits the free tier; full account/contact dumps usually need Pro.
| Tier | Max file size | Max rows | Files at once |
|---|---|---|---|
| Free | 5 MB | 10,000 | 1 |
| Pro | 50 MB | 100,000 | 5 |
| Pro-media | 200 MB | 500,000 | 20 |
| Developer | 500 MB | Unlimited | Unlimited |
Cookbook
Pre-load recipes for the most common CRM-to-SQL whitespace failures. PII shown is anonymised.
Salesforce account name trailing space breaks a JOIN
Accounts export carries Acme Corp (trailing space). The Opportunities table has Acme Corp. The join on account name silently drops every Acme opportunity. Trim the Accounts export before staging.
accounts.xlsx (before): accounts-processed.xlsx (after): account_name account_name Acme Corp· (trailing) Acme Corp Globex Globex SQL after load: SELECT * FROM opps o JOIN accounts a USING(account_name); -- before: 0 Acme rows after: all Acme opps join
HubSpot CHAR 160 defeats Excel TRIM but not this tool
A rich-text company field pasted from a web page ends in CHAR 160. A teammate already ran Excel =TRIM() and it 'looked clean' — but CHAR 160 survives Excel's TRIM and the unique constraint still admits a duplicate. This tool removes the edge CHAR 160.
Before (· = CHAR 160):
company
Initech·
Initech
Excel =TRIM("Initech·") → "Initech·" (still distinct → 2 rows pass UNIQUE)
Whitespace Trimmer → "Initech" (both collapse → constraint catches dup)Pipedrive deal title trailing newline splits a GROUP BY
A deal title pasted from another tab ends in a newline. After load, GROUP BY stage shows 'Negotiation' and 'Negotiation\n' as two buckets. The trimmer removes the trailing newline.
Before (\n = newline): After: deal_title deal_title Negotiation\n Negotiation Negotiation Negotiation GROUP BY before: 2 buckets after: 1 bucket
Full pre-load pipeline for a Salesforce contact dump
Order matters: trim edges first so case-folding and dedup operate on clean keys. Each step is a separate tool; all accept XLSX.
1. Whitespace Trimmer strip edge spaces/CHAR 160 → contacts-processed.xlsx 2. csv-case-converter lower-case the email column (/tool/csv-case-converter) 3. csv-deduplicator drop duplicate emails (/tool/csv-deduplicator) 4. \copy staging FROM 'contacts-clean.csv' CSV HEADER;
Why not just TRIM() in SQL on every column?
Wrapping every text column in TRIM() inside your load mapping is brittle (you forget a column; TRIM in SQL also ignores CHAR 160 in many engines) and re-runs on every reload. Cleaning the file once before load is simpler and catches CHAR 160 that SQL TRIM misses.
Brittle: INSERT ... SELECT TRIM(c1), TRIM(c2), ... (24 columns, easy to miss one)
-- and Postgres TRIM(' ' ...) does not strip CHAR 160 by default
Clean: run file through Whitespace Trimmer once → \copy directlyEdge cases and what actually happens
SQL TRIM() ignored CHAR 160 but this tool removes it
ExpectedMany database TRIM() implementations strip only the ASCII space by default, leaving the non-breaking space (CHAR 160) that rich-text CRM fields carry. This tool's trim removes edge CHAR 160, so a value that stayed 'distinct' after SELECT TRIM(col) will collapse correctly here before it ever reaches the database.
Internal double spaces in addresses remain
By design12 Main St keeps its internal double spaces — the tool trims edges only. If your address-matching depends on single-spaced strings, follow with csv-find-replace using {2,} → before staging.
Case mismatch is not fixed
Out of scopeAcme and acme will still be two distinct keys after trimming — this tool does not change case. Fold case with csv-case-converter if your join or unique key is case-sensitive in the database.
Duplicate rows are not removed
Out of scopeTrimming makes near-duplicates identical, but it does not delete them. Run csv-deduplicator after trimming so the now-identical rows collapse to one before they hit a UNIQUE constraint.
Multi-object export in one workbook — sheet 1 only
First sheet onlyIf a single .xlsx has Accounts on sheet 1 and Contacts on sheet 2, only Accounts is processed and returned. Export each CRM object to its own file (Salesforce, HubSpot, Zoho, Pipedrive all let you do this) and trim them separately.
Number/date formatting not preserved
Formatting lostBecause the file passes through a plain-text stage, currency/date cell formats are not carried to the output .xlsx. The displayed text is preserved. For loads this is usually fine (you cast types on import), but cast date and numeric columns explicitly in your COPY/LOAD mapping.
Phone numbers with leading zeros
Text preservedIf a phone or account number with a leading zero is stored as text in the source, its text is preserved (only edge whitespace is removed). If it was stored as a number in Excel, the leading zero was already gone before export — that is a CRM/Excel typing issue, not a trimming one.
Export exceeds the free tier
BlockedA full contact dump above 5 MB / 10,000 rows is blocked on the free tier. Either upgrade to Pro (50 MB / 100,000 rows) or split the export into chunks and trim each before loading.
Embedded line breaks inside a quoted CSV field
PreservedA correctly quoted multi-line notes field is parsed as one cell; the trim removes only leading/trailing whitespace of that whole value, leaving internal line breaks intact. If the source CSV had an *unquoted* newline, the row was already malformed at export — re-export with proper quoting.
Encrypted CRM export
Cannot openSome CRMs offer password-protected exports. An encrypted workbook cannot be read by the browser parser — export without encryption (or decrypt in Excel) before trimming.
Frequently asked questions
Does Salesforce really add trailing spaces?
Frequently — usually because a rep typed a stray space into a free-text field (Account Name, Billing Street) and Salesforce stores values verbatim. Report exports and Data Loader output carry those spaces straight into the file, where they silently break joins after load. Trimming the export before staging removes them.
Will this also remove non-breaking spaces from HubSpot rich-text fields?
At the edges, yes. Rich-text CRM fields pasted from web content commonly carry CHAR 160 (the non-breaking space). This tool's trim removes edge CHAR 160 — which both Excel's TRIM() and many database TRIM() functions ignore — so your keys actually match after load.
Should I trim before or after loading into the database?
Before. Cleaning the file once is simpler and more reliable than wrapping every text column in TRIM() in your load mapping (easy to miss a column, and SQL TRIM() often skips CHAR 160). Trim → optionally case-fold → dedup → load.
Does it fix case differences like 'acme' vs 'Acme'?
No — it only removes whitespace. For case-sensitive keys, fold case with csv-case-converter after trimming. Many DBAs lowercase email and code columns as part of staging for exactly this reason.
Will it remove duplicate account/contact rows?
No, but it is the right first step. Trimming makes Acme and Acme identical; then csv-deduplicator collapses the now-identical rows. Running dedup before trimming would miss the whitespace-hidden duplicates.
Is my CRM data sent to a server?
No. Parsing and trimming happen entirely in your browser via SheetJS. Account names, contact emails, deal values, and notes never leave your machine — important for PII and any data-processing-agreement constraints. Only an anonymous run counter is recorded when signed in.
My export has Accounts and Contacts on separate tabs — will both be cleaned?
No. The XLSX path processes the first worksheet only and returns a single-sheet file. Export each CRM object to its own file and trim them individually.
Can I use this for Shopify or WooCommerce order exports too?
Yes. Any export that produces .xlsx or .csv works. E-commerce exports are a common source of whitespace-padded SKUs and product titles that break inventory joins exactly the way CRM keys do.
Will trimming change my numeric fields like amount or quantity?
No. Numeric cells have no surrounding text to trim and are passed through unchanged in value. Note that custom number/currency *formatting* is not preserved across the round-trip, so cast numeric columns explicitly in your load mapping.
What about internal double spaces in addresses?
Those are left in place — the tool trims edges only. If your address matching needs single-spaced strings, run csv-find-replace with the regex {2,} replaced by one space after trimming.
How large a CRM export can I clean?
Free tier handles 5 MB / 10,000 rows — fine for a mid-size contact list. Pro raises it to 50 MB / 100,000 rows, Pro-media to 200 MB / 500,000, and Developer to 500 MB / unlimited. Browser memory is the practical ceiling on very large dumps.
Can I automate this in an ETL pipeline?
The trimming logic is identical to the canonical CSV whitespace trimmer at /tool/csv-whitespace-trimmer, which exposes the runner API for scripted local runs. The Excel page itself is a browser tool — for a headless pipeline, point your job at the canonical CSV tool, which accepts XLSX input.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.