How to find duplicate employee records in an HR CSV
- Step 1: Export the employee CSV. Download it from your HRIS, payroll system, or HR spreadsheet. A standard comma-delimited export works; the delimiter is auto-detected.
- Step 2: Drop the file onto the tool. Parsing runs in your browser via PapaParse. Names, Employee IDs, NI numbers, and salaries never reach a server. Free runs handle up to 2 MB / 500 rows; Pro handles 100 MB / 100,000 rows.
- Step 3: Select the key column. In Find duplicates in column, choose Employee ID (or NI Number, Email). Only one key column is used per run, so start with the strongest unique identifier.
- Step 4: Set case sensitivity. Leave Case-sensitive matching off (default) so casing is ignored. NI numbers and emails are best treated case-insensitively. Click Find duplicates.
- Step 5: Compare the flagged records. Read the summary cards, then Download Marked CSV and filter _is_duplicate = YES. Compare start dates and roles to identify the canonical record versus a rehire or a dual-role entry.
- Step 6: Resolve, then run further passes. Merge or deactivate duplicate records in your HR system. Re-run on NI Number and Email to catch records that duplicate on those fields before migration or the next payroll run.
What the HR duplicate finder does
The complete control set. One key column per pass, one checkbox, flag-only output. No fuzzy matching, no multi-column key, no auto-removal.
| Control | Behaviour | Default |
|---|---|---|
| Find duplicates in column | Single key column (Employee ID, then NI Number, then Email); values grouped to find repeats | First column |
| Case-sensitive matching | Off lowercases before comparing (EMP-1 = emp-1); on requires identical casing | Off |
| _is_duplicate column | YES if the identifier appears 2+ times, NO if once; first occurrence is YES too | Always added |
| Removal | None — records are flagged for HR review, never deleted. Use csv-deduplicator to drop surplus rows | Zero removed |
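
The behaviour in the table can be approximated in a few lines of Python. This is a minimal sketch of the flagging rule as described above, not the tool's actual implementation (which runs in the browser); the function name `flag_duplicates` and its parameters are hypothetical.

```python
import csv
import io
from collections import Counter


def flag_duplicates(rows, key, case_sensitive=False):
    """Return copies of rows with an _is_duplicate column (YES/NO).

    Hypothetical helper: mirrors the documented rule, not the tool's code.
    """
    def norm(value):
        # Whole-cell comparison; lowercase only when matching is case-insensitive.
        return value if case_sensitive else value.lower()

    counts = Counter(norm(row[key]) for row in rows)
    return [
        {**row, "_is_duplicate": "YES" if counts[norm(row[key])] > 1 else "NO"}
        for row in rows
    ]


sample = """Employee ID,Name,Department
EMP-100,A. Patel,Finance
EMP-101,B. Cole,Sales
emp-100,A. Patel,Operations
"""
rows = list(csv.DictReader(io.StringIO(sample)))
flags = [r["_is_duplicate"] for r in flag_duplicates(rows, "Employee ID")]
print(flags)  # ['YES', 'NO', 'YES']
```

With `case_sensitive=True` the same sample yields NO for every row, because `EMP-100` and `emp-100` are then different text. No row is dropped in either mode.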
Identifier passes and what each catches
Run separate passes since the key is one column. Normalise formats first so genuine duplicates aren't split by formatting differences.
| Pass | Key column | Catches | Pre-step / note |
|---|---|---|---|
| 1 | Employee ID | Same ID entered twice — the clearest duplicate-record signal | Confirm casing policy; trim whitespace |
| 2 | NI Number | Same person under two Employee IDs (e.g. a rehire) | Normalise spacing — AB 12 34 56 C vs AB123456C differ |
| 3 | Email | Same work email reused across records | Case-insensitive (default) is correct for email |
Cookbook
Before/after rows from HR/payroll exports. Names, IDs, and NI numbers anonymised; the _is_duplicate column is exactly what the tool appends.
Same Employee ID entered twice
Example: A sync error created the same Employee ID twice with conflicting departments. Selecting Employee ID marks both YES, including the first, so the full record set is preserved for the audit.

Input (employees.csv):

    Employee ID,Name,Department
    EMP-100,A. Patel,Finance
    EMP-101,B. Cole,Sales
    EMP-100,A. Patel,Operations

Key column: Employee ID · Case-sensitive: off

Output (employees.duplicates-marked.csv):

    Employee ID,Name,Department,_is_duplicate
    EMP-100,A. Patel,Finance,YES
    EMP-101,B. Cole,Sales,NO
    EMP-100,A. Patel,Operations,YES
Rehire detected on NI number, not Employee ID
Example: A rehired employee got a new Employee ID, so the ID pass misses them, but the NI-number pass catches both records as one person. Compare start dates in the flagged rows to confirm it's a rehire.

Input:

    Employee ID,NI Number,Start Date
    EMP-200,AB123456C,2021-03-01
    EMP-450,AB123456C,2026-01-15

Key column: NI Number · Case-sensitive: off

Output:

    Employee ID,NI Number,Start Date,_is_duplicate
    EMP-200,AB123456C,2021-03-01,YES
    EMP-450,AB123456C,2026-01-15,YES
NI number spacing hides a duplicate
Example: One record stored the NI number with spaces, the other without. Whole-cell matching treats them as different, so the pair is not flagged until you normalise the format.

Input:

    Employee ID,NI Number
    EMP-300,AB 12 34 56 C
    EMP-301,AB123456C

Key column: NI Number · Case-sensitive: off

Output (missed, because the text differs):

    Employee ID,NI Number,_is_duplicate
    EMP-300,AB 12 34 56 C,NO
    EMP-301,AB123456C,NO

Fix: strip spaces with csv-find-replace, then re-run.
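
If the clean-up is scripted rather than done with csv-find-replace, the same normalisation might look like the sketch below. The helper name `normalise_ni` is hypothetical, and the rule (remove all whitespace, uppercase) is an assumption about what "normalise" should mean for your data.

```python
def normalise_ni(value):
    """Remove all whitespace and uppercase the result (assumed normal form)."""
    return "".join(value.split()).upper()


rows = [
    {"Employee ID": "EMP-300", "NI Number": "AB 12 34 56 C"},
    {"Employee ID": "EMP-301", "NI Number": "AB123456C"},
]

# Rewrite the key column in place so both records carry identical text.
for row in rows:
    row["NI Number"] = normalise_ni(row["NI Number"])

print([row["NI Number"] for row in rows])  # ['AB123456C', 'AB123456C']
```

After this step the two records share the same key text, so an exact-match pass on NI Number can group them.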
Dual-role employee — one record per role
Example: An employee holds two roles, exported as two rows sharing the Employee ID. The tool flags both; the audit decision is whether to keep both rows (dual-role design) or merge.

Input:

    Employee ID,Role,FTE
    EMP-500,Nurse,0.6
    EMP-500,Trainer,0.4

Key column: Employee ID · Case-sensitive: off

Output:

    Employee ID,Role,FTE,_is_duplicate
    EMP-500,Nurse,0.6,YES
    EMP-500,Trainer,0.4,YES
Reading the GDPR exposure summary
Example: For a 480-row HR export with 9 Employee IDs duplicated, the summary quantifies how many surplus records exist before migration.

Summary after Find duplicates (Employee ID pass):

    Duplicate groups : 9 (Employee IDs that repeat)
    Extra copies : 11 (surplus records to review)
    Unique values : 460 (Employee IDs appearing once)

Meaning: 469 distinct IDs across 480 rows; 9 repeat, some more than twice. 11 surplus records to resolve pre-migration.
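
To see how the three numbers relate, the sketch below recomputes them from a list of key values. The function name `summarise` is hypothetical, and the synthetic data assumes one possible split (7 IDs appearing twice, 2 IDs appearing three times); the example only states that some IDs repeat more than twice.

```python
from collections import Counter


def summarise(keys):
    """Compute the three summary figures from a list of key values."""
    counts = Counter(keys)
    return {
        "duplicate_groups": sum(1 for n in counts.values() if n > 1),
        "extra_copies": sum(n - 1 for n in counts.values() if n > 1),
        "unique_values": sum(1 for n in counts.values() if n == 1),
    }


# Synthetic keys shaped like the example (assumed split, not real data).
keys = [f"EMP-{i}" for i in range(460)]        # 460 IDs appearing once
keys += [f"DUP-{i}" for i in range(7)] * 2     # 7 IDs appearing twice
keys += [f"TRI-{i}" for i in range(2)] * 3     # 2 IDs appearing three times

summary = summarise(keys)
print(len(keys), summary)
```

The row count always equals unique values + duplicate groups + extra copies, here 460 + 9 + 11 = 480.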
Errors and edge cases
Behaviours that commonly surprise users, plus cases where a duplicate is silently missed. Find the heading that matches your situation and apply the fix it describes.
You wanted duplicates removed for migration
By design: This is a flag-only audit tool. It keeps every record and appends _is_duplicate, which is what you want for an HR audit trail. To produce a deduplicated file, use csv-deduplicator after you've reviewed the flags.
NI numbers stored with different spacing
Not matched: AB 12 34 56 C and AB123456C are different text and won't group. Strip spaces with csv-find-replace (and standardise casing) before the NI pass so true duplicates surface.
Rehire creates a legitimate-looking duplicate
Expected: A rehired employee may share an NI number but have a new Employee ID and a later start date. The NI pass flags both; compare the start dates in the marked rows to distinguish a rehire from an erroneous duplicate before resolving.
Dual-role employee flagged as a duplicate
Expected: Two rows sharing an Employee ID for two roles are flagged, and correctly so, since the ID repeats. Whether to keep both (dual-role) or merge is an HR decision; the tool surfaces the case but doesn't judge it.
First occurrence marked YES
Expected: Every member of a duplicate group is flagged YES, including the first, so the canonical record stays visible alongside the stray. For surplus-only removal, use csv-deduplicator.
Need to match on Employee ID and NI together
Single key only: Only one key column per pass. Run them separately and reconcile, or build a combined EmployeeID|NI column with csv-column-merger for an exact composite match in one pass.
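
As an illustration of the composite-key idea, the sketch below builds a combined column in Python. The helper `add_composite_key` and the column name `EmployeeID|NI` are hypothetical; csv-column-merger may format its output differently.

```python
from collections import Counter


def add_composite_key(rows, columns, name, sep="|"):
    """Return copies of rows with a new column joining the given columns."""
    return [{**row, name: sep.join(row[c] for c in columns)} for row in rows]


rows = [
    {"Employee ID": "EMP-200", "NI Number": "AB123456C"},
    {"Employee ID": "EMP-450", "NI Number": "AB123456C"},
    {"Employee ID": "EMP-200", "NI Number": "AB123456C"},
]

keyed = add_composite_key(rows, ["Employee ID", "NI Number"], "EmployeeID|NI")
counts = Counter(row["EmployeeID|NI"] for row in keyed)
print(counts["EMP-200|AB123456C"], counts["EMP-450|AB123456C"])  # 2 1
```

Keying on the combined column only groups rows where both identifiers match exactly, so the EMP-450 record above is not grouped with the other two even though it shares their NI number.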
Blank Employee ID or NI cells
Grouped together: Rows with an empty key value all group under one empty key and are flagged YES together (shown as (empty)). Fill or filter blanks before relying on the audit; missing identifiers are themselves a data-quality issue.
HR export over the free 500-row / 2 MB cap
Upgrade required: Free runs cap at 2 MB and 500 rows; larger payroll exports are blocked with a Pro prompt. Pro raises it to 100 MB / 100,000 rows. Splitting with csv-row-splitter handles a one-off but won't detect duplicates across chunks.
Frequently asked questions
Is this safe to use with employee personal data?
Yes. All parsing and detection run entirely in your browser via PapaParse. Names, Employee IDs, NI numbers, emails, and salaries never reach a server — which supports GDPR data-minimisation since no personal data is transmitted. Only an anonymous usage counter is recorded when signed in, and it can be disabled in settings.
Can I check for duplicates on email and Employee ID separately?
Yes, and you should. The key is a single column, so run the finder once per identifier (Employee ID, then NI Number, then Email) and reconcile the marked files. Each pass catches duplicates the others miss.
What if duplicates exist because an employee was rehired?
A rehire often shares an NI number but has a new Employee ID and a later start date. The NI-number pass flags both records; compare the start dates in the marked rows to confirm it's a rehire rather than an error before deciding how to resolve it.
Does it merge or delete the duplicate records?
Neither. It appends an _is_duplicate column (YES/NO) and keeps every record so HR makes the call. To physically remove duplicates and keep one per group, use csv-deduplicator after review.
Why didn't it flag two records I know are the same person?
Matching is exact text (optionally lowercased). NI numbers stored with different spacing (AB 12 34 56 C vs AB123456C) or a trailing space won't group. Normalise with csv-find-replace or csv-whitespace-trimmer and re-run.
Is the first copy of a duplicate record marked too?
Yes. Every member of a duplicate group is marked YES, including the first, so the canonical record and the stray are both visible. To isolate only the surplus records, filter to YES and exclude the earliest per group, or use the deduplicator.
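
A scripted version of "filter to YES and exclude the earliest per group" might look like this. It is a sketch under the assumption that "earliest" means first in file order; the function name `surplus_rows` is hypothetical.

```python
def surplus_rows(marked_rows, key, case_sensitive=False):
    """Return flagged rows, skipping the first occurrence of each key."""
    seen = set()
    surplus = []
    for row in marked_rows:
        if row["_is_duplicate"] != "YES":
            continue
        value = row[key] if case_sensitive else row[key].lower()
        if value in seen:
            surplus.append(row)  # second or later occurrence
        else:
            seen.add(value)      # first occurrence in file order
    return surplus


marked = [
    {"Employee ID": "EMP-100", "Department": "Finance", "_is_duplicate": "YES"},
    {"Employee ID": "EMP-101", "Department": "Sales", "_is_duplicate": "NO"},
    {"Employee ID": "EMP-100", "Department": "Operations", "_is_duplicate": "YES"},
]

extras = surplus_rows(marked, "Employee ID")
print([row["Department"] for row in extras])  # ['Operations']
```

The first row in the file is not necessarily the canonical record, so treat this list as a starting point for review rather than a deletion list.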
Can I match on Employee ID and NI number in one pass?
Not directly — the key is one column. Build a combined EmployeeID|NI column with csv-column-merger and key on it for an exact composite match, or run two passes and reconcile.
How large an HR/payroll export can I check?
Free runs handle up to 2 MB and 500 rows; larger files are blocked with a Pro prompt. Pro handles 100 MB and 100,000 rows. For larger datasets, split with csv-row-splitter, accepting that duplicates spanning chunks won't be detected.
What does the output file look like?
Your original CSV with a trailing _is_duplicate column, saved as <yourfile>.duplicates-marked.csv. Filter that column to YES to review the duplicate employee records.
Should I treat NI numbers and emails as case-sensitive?
No — keep case-sensitive matching off for both. Emails are case-insensitive by convention, and NI numbers are conventionally uppercase, so the default lowercased comparison handles any stray casing correctly.
How do I group all flagged records together for HR review?
The marked CSV keeps original order. After download, sort by the _is_duplicate column or by Employee ID with csv-sorter so all YES records sit together in the review sheet.
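
For a scripted alternative to csv-sorter, the sketch below sorts a marked file so that flagged rows come first and sit together by Employee ID. The sample data is made up for illustration, and the column names assume an Employee ID pass.

```python
import csv
import io

# Made-up marked output, in original file order.
marked_csv = """Employee ID,_is_duplicate
EMP-100,YES
EMP-101,NO
EMP-500,YES
EMP-100,YES
EMP-500,YES
"""

rows = list(csv.DictReader(io.StringIO(marked_csv)))

# YES rows first, then by Employee ID. sorted() is stable, so rows that
# share an ID keep their original relative order.
review = sorted(rows, key=lambda r: (r["_is_duplicate"] != "YES", r["Employee ID"]))

ids = [r["Employee ID"] for r in review]
print(ids)  # ['EMP-100', 'EMP-100', 'EMP-500', 'EMP-500', 'EMP-101']
```

Note that the IDs are compared as text, so EMP-99 would sort after EMP-100 unless the IDs are zero-padded.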
What do the summary numbers mean for a migration audit?
Duplicate groups = how many identifiers repeat. Extra copies = surplus records you'd resolve before migration. Unique values = identifiers appearing exactly once. Together they size the data-quality work needed before moving to a new HRIS.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.