How to remove duplicates from excel before sql import to prevent unique constraint violations
- Step 1Find the constrained column — Check the target DDL for the
UNIQUEorPRIMARY KEYcolumn — e.g.emailinCREATE TABLE users (email text UNIQUE ...). - Step 2Open the deduplicator — Open the Excel deduplicator (it loads the CSV/Excel deduplicator workspace) and drop your .xlsx or .csv source.
- Step 3Key on the constraint column — Select that same column as the unique key. Two rows count as duplicates only when this column matches.
- Step 4Match your index's case rule — Leave
Case-sensitive keysoff only if your DB index is case-insensitive (e.g.citext/LOWER()index). For a default case-sensitive text index, tick it. - Step 5Remove duplicates — Run it. Compare
duplicates removedagainst what you expected; investigate if the number is surprisingly high. - Step 6Load the clean file — Run your
COPY/LOAD DATA/INSERT. With in-file duplicates gone, the constraint won't trip — assuming the table doesn't already hold those keys (see FAQ).
Excel dedup vs SQL UNIQUE semantics
Where the tool's matching and a SQL UNIQUE index agree — and where you must set the option to match. Getting case wrong is the usual gotcha.
| Aspect | This tool's default | SQL UNIQUE (typical) | What to do |
|---|---|---|---|
| Case | Case-insensitive | Case-SENSITIVE for text/varchar | Tick Case-sensitive keys to match |
| Whitespace | Trims key before compare | 'x ' ≠ 'x' (distinct) | Trim cells too before load (see edge cases) |
| Blank / NULL key | Kept (one row each) | NOT NULL rejects; multiple NULLs may be allowed | Strip blanks before load if column is NOT NULL |
| Composite key | Single column only | Multi-column UNIQUE supported | Concatenate into a helper column |
| Existing table rows | Not considered | Conflicts with existing keys still fail | Truncate or use ON CONFLICT / IGNORE |
Loader behaviour on duplicate keys
How each loader reacts when in-file duplicates hit a UNIQUE constraint. Deduping first avoids all of these.
| Loader | On duplicate key | Pre-dedup benefit |
|---|---|---|
PostgreSQL COPY | Aborts entire load with constraint error | Whole load succeeds |
PostgreSQL INSERT ... ON CONFLICT | Skips/updates conflicts (verbose) | Cleaner, smaller load |
MySQL LOAD DATA | Errors, or last-wins with REPLACE | Deterministic first-wins instead |
MySQL INSERT IGNORE | Silently drops conflicting rows | You control which row survives |
pg_restore / bulk | Aborts on first violation | Restore completes |
Tier limits for Excel deduplication
Real per-file limits applied when you drop an .xlsx onto the Excel hub. Row count is the parsed first-sheet body, not bytes.
| Tier | Max file size | Max rows | Files at once |
|---|---|---|---|
| Free | 5 MB | 10,000 | 1 |
| Pro | 50 MB | 100,000 | 5 |
| Pro-media | 200 MB | 500,000 | 20 |
| Developer | 500 MB | unlimited | unlimited |
Cookbook
Pre-load fixes for real constraint errors. SQL shown is illustrative; dedup behaviour is exactly as implemented.
Stop a PostgreSQL COPY from aborting
A source sheet has the same email twice. COPY users FROM ... aborts with a unique-constraint error. Dedup on email first so the whole load succeeds.
Target: CREATE TABLE users (email text UNIQUE, name text); Input (key column = email, case-SENSITIVE to match the index): email,name sue@x.com,Sue sue@x.com,Sue Dup jon@x.com,Jon Output -> COPY users FROM 'clean.csv' CSV HEADER; -> no 'duplicate key value violates unique constraint'
Case mismatch: match your index's rule
A default Postgres text UNIQUE index is case-sensitive, so Sue@x.com and sue@x.com are TWO valid rows to the DB. The tool's default (case-insensitive) would wrongly drop one. Tick case-sensitive to mirror the index.
Index is case-sensitive (plain text column): Key column = email, Case-sensitive keys = ON: email Sue@x.com sue@x.com Sue@x.com Output: email Sue@x.com sue@x.com -> only the exact-case repeat removed (matches DB)
citext / LOWER() index: keep case-insensitive
If the column is citext or has a LOWER(email) functional unique index, the DB treats Sue@x.com = sue@x.com. Here the default case-insensitive dedup is correct — leave the box unchecked.
Index: CREATE UNIQUE INDEX ON users (lower(email)); Key column = email, Case-sensitive keys = OFF (default): email Sue@x.com sue@x.com Output: email Sue@x.com -> collapsed, matching the case-insensitive index
Composite UNIQUE via a helper column
The table has UNIQUE (tenant_id, email). The tool keys on one column, so concatenate the two into a helper, dedup on it, then delete the helper before load.
Target: UNIQUE (tenant_id, email) Add helper 'k' = tenant_id & '|' & email: k,tenant_id,email 1|sue@x.com,1,sue@x.com 1|sue@x.com,1,sue@x.com 2|sue@x.com,2,sue@x.com Key column = k -> 1 duplicate removed (tenant 2 kept). Delete helper, then load.
Trailing space the DB would treat as distinct
'sue@x.com ' and 'sue@x.com' are DIFFERENT to PostgreSQL, so both load fine into a UNIQUE column — but they're the same person. The tool's trim collapses them; trim the cells too so the loaded value is clean.
Input (key column = email):
email
sue@x.com
sue@x.com
This tool: keeps 1 ('sue@x.com ' verbatim) — trim matched.
BEFORE load, run the whitespace trimmer so the stored
value is 'sue@x.com' not 'sue@x.com ' (DB keeps the space).Edge cases and what actually happens
DB index is case-sensitive but dedup default isn't
Mismatch riskA plain text/varchar UNIQUE index in PostgreSQL/MySQL (non-ci collation) is case-sensitive — A@x and a@x are two valid rows. The tool's default case-insensitive dedup would drop one you needed. Tick Case-sensitive keys to match a case-sensitive index, or leave it off for citext / LOWER() indexes.
Trim collapses keys the DB keeps distinct
WatchThe tool always trims the comparison key, so 'x ' and 'x' collapse. But the database treats them as distinct values and stores the space on the surviving row. Run the whitespace trimmer on the column so the loaded value is actually clean, not just deduped.
Table already contains the key
Still failsDeduping the file only removes in-file duplicates. If the target table already holds one of those keys, the load still violates the constraint. Truncate first, or use INSERT ... ON CONFLICT / INSERT IGNORE.
Composite UNIQUE constraint
Single column onlyThe tool keys on one column; a multi-column UNIQUE (a, b) needs a helper column concatenating a and b. Dedup on the helper, then delete it before generating the load.
NOT NULL column with blank cells
Will rejectBlank-key rows are kept by this tool, but a NOT NULL constrained column rejects them at load. Strip blank-key rows with the empty-row remover before loading.
Numeric key parsed inconsistently from XLSX
Type driftAfter XLSX parsing, a key like 00123 may arrive as text in one row and the number 123 in another, so they won't match here but the DB (text column) treats them differently again. Standardise the column's type before deduping.
Only the first worksheet is loaded
First sheet onlyIf your import data is split across sheets, only sheet 1 is deduplicated and exported. Combine sheets with the sheet joiner before deduping so all rows are considered.
File exceeds the tier limit
RejectedFree caps at 5 MB / 10,000 rows / 1 file. For large loads upgrade to Pro (100,000 rows) / Developer (unlimited), or split the file and dedup each chunk — but note cross-chunk duplicates won't be caught.
XLSX output is values-only
Values onlyThe deduplicated .xlsx is rebuilt as a single values-only sheet; formulas become their computed values. That's usually ideal for a load file, but export to CSV if your loader prefers plain text.
Near-duplicate keys (typos)
Not collapsedExact match only — acme and acme ltd stay separate and both load. For approximate dedup before a fuzzy upsert, use the fuzzy deduplicator (Pro).
Frequently asked questions
What if the table has a composite (multi-column) unique constraint?
Single-column key only. Concatenate the constrained columns into a helper column (e.g. tenant_id|email), dedup on the helper, then delete it before loading. That replicates a composite UNIQUE.
Does deduplicating guarantee no constraint violations?
It removes in-file duplicate keys. If the target table already holds some of those keys, the load still fails. Truncate the table first, or load with INSERT ... ON CONFLICT DO NOTHING/UPDATE (Postgres) or INSERT IGNORE (MySQL).
Should I use case-sensitive or case-insensitive dedup for SQL?
Match your index. A plain text/varchar UNIQUE index is case-sensitive — tick Case-sensitive keys. A citext column or a LOWER() functional index is case-insensitive — leave the box off (the default).
Why did the database keep two rows that look identical?
Almost always a trailing space: 'x ' and 'x' are distinct to SQL. The tool trims the comparison key so it dedups them, but it doesn't rewrite the cell — run the whitespace trimmer so the stored value is clean before loading.
Which row survives the dedup?
The first occurrence in row order. Sort the file so the row you want to keep (newest, most complete) is first before deduping, since that's the one that loads.
Can I generate INSERT statements from the cleaned data?
Yes — after deduping, feed the file to the Excel-to-SQL generator to produce INSERT statements (or use COPY/LOAD DATA on the file directly).
What about NULL / blank values in the key column?
The tool keeps blank-key rows (it doesn't merge them). A NOT NULL column will reject them at load, so strip blank-key rows with the empty-row remover first.
Does it read multiple sheets?
No — only the first worksheet. If your load data spans sheets, merge them with the sheet joiner before deduping so all rows are deduplicated together.
How large a dataset can I dedupe before loading?
Free handles 5 MB / 10,000 rows / 1 file. Pro raises it to 50 MB / 100,000 rows, Pro-media to 200 MB / 500,000, and Developer to 500 MB / unlimited — match this to your load size.
Does the tool talk to my database?
No. It runs entirely in your browser and only produces a clean file. You run the COPY / LOAD DATA / INSERT yourself — no credentials are ever involved.
My numeric keys aren't deduping — why?
After XLSX parsing, a key like 00123 may be text in one row and a number in another, so they compare as different. Standardise the column to text (preserve leading zeros) before deduping.
What about near-duplicate keys from typos?
This tool needs an exact match (after trim/case-fold), so acme and acme ltd both survive. For similarity-based dedup before a fuzzy upsert, use the fuzzy deduplicator on Pro.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.