Audit Duplicate Employee Records in an HR CSV

How to find duplicate employee records in an HR CSV

Step 1
Export the employee CSV — Download from your HRIS, payroll system, or HR spreadsheet. A standard comma-delimited export works; the delimiter is auto-detected.
Step 2
Drop the file onto the tool — Parsing runs in your browser via PapaParse. Names, Employee IDs, NI numbers, and salaries never reach a server. Free runs handle up to 2 MB / 500 rows; Pro handles 100 MB / 100,000 rows.
Step 3
Select the key column — In Find duplicates in column, choose Employee ID (or NI Number or Email). One key column per run — start with the strongest unique identifier.
Step 4
Set case sensitivity — Leave Case-sensitive matching off (default) so casing is ignored. NI numbers and emails are best treated case-insensitively. Click Find duplicates.
Step 5
Compare the flagged records — Read the summary cards, then Download Marked CSV and filter _is_duplicate = YES. Compare start dates and roles to identify the canonical record versus a rehire or a dual-role entry.
Step 6
Resolve, then run further passes — Merge or deactivate duplicate records in your HR system. Re-run on NI Number and Email to catch records that duplicate on those fields before migration or the next payroll run.

What the HR duplicate finder does

The complete control set. One key column per pass, one checkbox, flag-only output. No fuzzy matching, no multi-column key, no auto-removal.

Control	Behaviour	Default
Find duplicates in column	Single key column per pass (suggested order: `Employee ID`, `NI Number`, `Email`); values are grouped to find repeats	First column
Case-sensitive matching	Off: values are lowercased before comparing (`EMP-1` = `emp-1`). On: casing must match exactly	Off
`_is_duplicate` column	`YES` if the identifier appears 2+ times, `NO` if once; first occurrence is `YES` too	Always added
Removal	None — records are flagged for HR review, never deleted. Use csv-deduplicator to drop surplus rows	Zero removed
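The behaviour in this table fits in a few lines of code. This is a minimal Python sketch of the rules as described (count each key, flag every row whose key occurs 2+ times, remove nothing), not the tool's own implementation, which runs in the browser on PapaParse; `mark_duplicates` is a hypothetical name, and the sample rows are those of the first cookbook case.

```python
import csv
import io
import sys
from collections import Counter


def mark_duplicates(rows, key, case_sensitive=False):
    """Append _is_duplicate to each row: YES if the key value occurs 2+ times
    (first occurrence included), NO if it occurs once. Nothing is removed."""

    def norm(value):
        # Default (case-insensitive) lowercases before comparing, so EMP-1 == emp-1.
        return value if case_sensitive else value.lower()

    counts = Counter(norm(row[key]) for row in rows)
    return [
        {**row, "_is_duplicate": "YES" if counts[norm(row[key])] > 1 else "NO"}
        for row in rows
    ]


data = """Employee ID,Name,Department
EMP-100,A. Patel,Finance
EMP-101,B. Cole,Sales
EMP-100,A. Patel,Operations
"""
rows = list(csv.DictReader(io.StringIO(data)))
marked = mark_duplicates(rows, "Employee ID")

# Write the original columns plus the trailing _is_duplicate column.
writer = csv.DictWriter(sys.stdout, fieldnames=[*rows[0], "_is_duplicate"], lineterminator="\n")
writer.writeheader()
writer.writerows(marked)
# Employee ID,Name,Department,_is_duplicate
# EMP-100,A. Patel,Finance,YES
# EMP-101,B. Cole,Sales,NO
# EMP-100,A. Patel,Operations,YES
```

Counting first and marking second is what makes the first occurrence YES as well: a row's flag depends on the total count for its key, not on whether the key was seen earlier.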

Identifier passes and what each catches

Run separate passes since the key is one column. Normalise formats first so genuine duplicates aren't split by formatting differences.

Pass	Key column	Catches	Pre-step / note
1	`Employee ID`	Same ID entered twice — the clearest duplicate-record signal	Confirm casing policy; trim whitespace
2	`NI Number`	Same person under two Employee IDs (e.g. a rehire)	Normalise spacing — `AB 12 34 56 C` vs `AB123456C` differ
3	`Email`	Same work email reused across records	Case-insensitive (default) is correct for email
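Reconciling the three passes comes down to recording, for each row, which key columns flagged it. A minimal sketch, assuming the default case-insensitive matching and columns that are already normalised; the helper names and the sample emails are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicated_values(rows, column):
    """Lowercased values of `column` that occur 2+ times (default, case-insensitive pass)."""
    counts = Counter(row[column].lower() for row in rows)
    return {value for value, n in counts.items() if n > 1}


def reconcile_passes(rows, columns):
    """For each row, list the key columns whose pass flagged it as a duplicate."""
    dupes = {col: duplicated_values(rows, col) for col in columns}
    return [[col for col in columns if row[col].lower() in dupes[col]] for row in rows]


# Illustrative, anonymised data: the first two rows are one person under two IDs.
data = """Employee ID,NI Number,Email
EMP-200,AB123456C,a.patel@example.com
EMP-450,AB123456C,a.patel@example.com
EMP-101,CE654321A,b.cole@example.com
"""
rows = list(csv.DictReader(io.StringIO(data)))
passes = ["Employee ID", "NI Number", "Email"]
for row, hits in zip(rows, reconcile_passes(rows, passes)):
    print(row["Employee ID"], hits or "no duplicate on any pass")
# EMP-200 ['NI Number', 'Email']
# EMP-450 ['NI Number', 'Email']
# EMP-101 no duplicate on any pass
```

Here the Employee ID pass alone would flag nothing; only the NI and Email passes reveal the pair, which is why the passes are run separately and compared.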

Cookbook

Before/after rows from HR/payroll exports. Names, IDs, and NI numbers anonymised; the _is_duplicate column is exactly what the tool appends.

Same Employee ID entered twice

Example

A sync error created the same Employee ID twice with conflicting departments. Selecting Employee ID marks both YES, including the first, so the full record set is preserved for the audit.

Input (employees.csv):
Employee ID,Name,Department
EMP-100,A. Patel,Finance
EMP-101,B. Cole,Sales
EMP-100,A. Patel,Operations

Key column: Employee ID  ·  Case-sensitive: off

Output (employees.duplicates-marked.csv):
Employee ID,Name,Department,_is_duplicate
EMP-100,A. Patel,Finance,YES
EMP-101,B. Cole,Sales,NO
EMP-100,A. Patel,Operations,YES

Rehire detected on NI number, not Employee ID

Example

A rehired employee got a new Employee ID, so the ID pass misses them — but the NI-number pass catches both records as one person. Compare start dates in the flagged rows to confirm it's a rehire.

Input:
Employee ID,NI Number,Start Date
EMP-200,AB123456C,2021-03-01
EMP-450,AB123456C,2026-01-15

Key column: NI Number  ·  Case-sensitive: off

Output:
Employee ID,NI Number,Start Date,_is_duplicate
EMP-200,AB123456C,2021-03-01,YES
EMP-450,AB123456C,2026-01-15,YES
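The start-date comparison can be approximated as a triage rule over each NI group. This is a hypothetical heuristic, not something the tool does: it assumes NI numbers are already normalised and start dates use the YYYY-MM-DD format shown above, and it only suggests a rehire; HR still confirms.

```python
from collections import defaultdict
from datetime import date


def classify_ni_groups(rows):
    """Hypothetical heuristic for NI groups of 2+ rows: distinct Employee IDs
    with distinct start dates suggest a rehire; anything else needs review."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["NI Number"]].append(row)

    verdicts = {}
    for ni, members in groups.items():
        if len(members) < 2:
            continue  # NI number appears once: not flagged
        # Order the group chronologically (fromisoformat expects YYYY-MM-DD).
        members.sort(key=lambda r: date.fromisoformat(r["Start Date"]))
        ids = {r["Employee ID"] for r in members}
        starts = {r["Start Date"] for r in members}
        if len(ids) == len(starts) == len(members):
            chain = " -> ".join(r["Employee ID"] for r in members)
            verdicts[ni] = f"possible rehire: {chain}"
        else:
            verdicts[ni] = "review: shared Employee ID or start date"
    return verdicts


rows = [
    {"Employee ID": "EMP-200", "NI Number": "AB123456C", "Start Date": "2021-03-01"},
    {"Employee ID": "EMP-450", "NI Number": "AB123456C", "Start Date": "2026-01-15"},
]
print(classify_ni_groups(rows))
# {'AB123456C': 'possible rehire: EMP-200 -> EMP-450'}
```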

NI number spacing hides a duplicate

Example

One record stored the NI number with spaces, the other without. Whole-cell matching treats them as different, so the pair is not flagged until you normalise the format.

Input:
Employee ID,NI Number
EMP-300,AB 12 34 56 C
EMP-301,AB123456C

Key column: NI Number  ·  Case-sensitive: off

Output (missed — text differs):
Employee ID,NI Number,_is_duplicate
EMP-300,AB 12 34 56 C,NO
EMP-301,AB123456C,NO

Fix: strip spaces with csv-find-replace, then re-run.
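The same fix can be expressed as a normalisation applied to the NI column before the pass. A minimal sketch, assuming every whitespace character should be removed and letters uppercased; `normalise_ni` is a hypothetical helper. In Python 3, `\s` in a str pattern also matches Unicode whitespace such as non-breaking spaces, which replacing only the ordinary space character would leave behind.

```python
import re


def normalise_ni(value):
    """Remove all whitespace and uppercase: 'AB 12 34 56 C' -> 'AB123456C'."""
    return re.sub(r"\s+", "", value).upper()


rows = [
    {"Employee ID": "EMP-300", "NI Number": "AB 12 34 56 C"},
    {"Employee ID": "EMP-301", "NI Number": "AB123456C"},
]
for row in rows:
    row["NI Number"] = normalise_ni(row["NI Number"])

print([r["NI Number"] for r in rows])
# ['AB123456C', 'AB123456C']
```

After normalisation both rows share one key, so the NI pass flags them YES.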

Dual-role employee — one record per role

Example

An employee holds two roles, exported as two rows sharing the Employee ID. The tool flags both; the audit decision is whether to keep both rows (dual-role design) or merge.

Input:
Employee ID,Role,FTE
EMP-500,Nurse,0.6
EMP-500,Trainer,0.4

Key column: Employee ID  ·  Case-sensitive: off

Output:
Employee ID,Role,FTE,_is_duplicate
EMP-500,Nurse,0.6,YES
EMP-500,Trainer,0.4,YES
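To speed up that decision, repeated IDs can be pre-sorted into "probably dual-role" and "probably an erroneous copy". This is a hypothetical triage rule, not tool behaviour: it assumes a `Role` column like the one above and treats a repeated ID with all-distinct roles as a dual-role design. The final call stays with HR.

```python
from collections import defaultdict


def triage_id_groups(rows, key="Employee ID", role_col="Role"):
    """Hypothetical triage for repeated IDs: all-distinct roles look like a
    dual-role design; a repeated role suggests an erroneous copy. HR decides."""
    roles_by_id = defaultdict(list)
    for row in rows:
        roles_by_id[row[key]].append(row[role_col])

    verdicts = {}
    for emp_id, roles in roles_by_id.items():
        if len(roles) < 2:
            continue  # ID appears once: not flagged
        if len(set(roles)) == len(roles):
            verdicts[emp_id] = "likely dual-role: " + ", ".join(roles)
        else:
            verdicts[emp_id] = "likely erroneous copy: role repeats"
    return verdicts


rows = [
    {"Employee ID": "EMP-500", "Role": "Nurse", "FTE": "0.6"},
    {"Employee ID": "EMP-500", "Role": "Trainer", "FTE": "0.4"},
]
print(triage_id_groups(rows))
# {'EMP-500': 'likely dual-role: Nurse, Trainer'}
```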

Reading the GDPR exposure summary

Example

For a 480-row HR export with 9 Employee IDs duplicated, the summary quantifies how many surplus records exist before migration.

Summary after Find duplicates (Employee ID pass):
  Duplicate groups : 9    (Employee IDs that repeat)
  Extra copies     : 11   (surplus records to review)
  Unique values    : 460  (Employee IDs appearing once)

Meaning: 460 once-only IDs + 9 repeating IDs = 469 distinct IDs across 480 rows,
so 480 - 469 = 11 surplus records to resolve pre-migration. Because 11 > 9,
at least one ID appears more than twice.
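The summary arithmetic can be checked in code. A minimal sketch with a hypothetical `summarise` helper; the group sizes used here (seven pairs and two triples) are just one split that reproduces the example's 9 / 11 / 460, since the summary alone does not reveal the real breakdown. It also shows the identity behind the cards: rows = unique values + duplicate groups + extra copies.

```python
from collections import Counter


def summarise(keys):
    """Compute the three summary cards from a list of key values."""
    counts = Counter(keys)
    repeated = [n for n in counts.values() if n > 1]
    return {
        "Duplicate groups": len(repeated),                            # identifiers that repeat
        "Extra copies": sum(n - 1 for n in repeated),                 # rows beyond one per group
        "Unique values": sum(1 for n in counts.values() if n == 1),  # identifiers seen once
    }


# One possible 480-row split consistent with the example summary (hypothetical):
keys = [f"EMP-{i}" for i in range(460)]  # 460 IDs appearing once
for g in range(7):
    keys += [f"DUP-{g}"] * 2             # 7 IDs appearing twice
for g in range(7, 9):
    keys += [f"DUP-{g}"] * 3             # 2 IDs appearing three times

summary = summarise(keys)
print(len(keys), len(set(keys)), summary)
# 480 469 {'Duplicate groups': 9, 'Extra copies': 11, 'Unique values': 460}
```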

Errors and edge cases

Common errors and silent misses when auditing HR exports with this tool. Find the entry that matches what you see and apply the fix it describes.

You wanted duplicates removed for migration

By design

This is a flag-only audit tool — it keeps every record and appends _is_duplicate, which is what you want for an HR audit trail. To produce a deduplicated file, use csv-deduplicator after you've reviewed the flags.

NI numbers stored with different spacing

Not matched

AB 12 34 56 C and AB123456C are different text and won't group. Strip spaces with csv-find-replace (and standardise casing) before the NI pass so true duplicates surface.

Rehire creates a legitimate-looking duplicate

Expected

A rehired employee may share an NI number but have a new Employee ID and a later start date. The NI pass flags both; compare the start dates in the marked rows to distinguish a rehire from an erroneous duplicate before resolving.

Dual-role employee flagged as a duplicate

Expected

Two rows sharing an Employee ID for two roles are flagged — correctly, since the ID repeats. Whether to keep both (dual-role) or merge is an HR decision; the tool surfaces the case, it doesn't judge it.

First occurrence marked YES

Expected

Every member of a duplicate group is flagged YES, including the first, so the canonical record stays visible alongside the stray. For surplus-only removal, use csv-deduplicator.
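If you only want to see which rows would count as surplus (the "filter to YES and exclude the earliest per group" approach), a keep-first pass does it. A minimal sketch; `mark_surplus` and the `_is_surplus` column are hypothetical, not something the tool adds, and "earliest" here means first in file order, which is not necessarily the canonical record.

```python
def mark_surplus(rows, key, case_sensitive=False):
    """Hypothetical surplus-only view: the first occurrence of each key is NO,
    every later occurrence is YES. Rows are kept; only a column is added."""
    seen = set()
    marked = []
    for row in rows:
        k = row[key] if case_sensitive else row[key].lower()
        marked.append({**row, "_is_surplus": "YES" if k in seen else "NO"})
        seen.add(k)
    return marked


rows = [
    {"Employee ID": "EMP-100", "Department": "Finance"},
    {"Employee ID": "EMP-101", "Department": "Sales"},
    {"Employee ID": "EMP-100", "Department": "Operations"},
]
print([r["_is_surplus"] for r in mark_surplus(rows, "Employee ID")])
# ['NO', 'NO', 'YES']
```

The number of YES rows from this pass equals the Extra copies summary figure.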

Need to match on Employee ID and NI together

Single key only

Only one key column per pass. Run them separately and reconcile, or build a combined EmployeeID|NI column with csv-column-merger for an exact composite match in one pass.
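The composite column only needs to join the parts with a separator that cannot appear in the values. A minimal sketch of what the merged column should contain, assuming a `|` separator as in the EmployeeID|NI name; it does not reproduce csv-column-merger's own options, and trimming each part is an added assumption. Without a separator, `EMP-1` + `2AB123456C` and `EMP-12` + `AB123456C` would join to the same string.

```python
from collections import Counter


def add_composite_key(rows, cols=("Employee ID", "NI Number"), name="EmployeeID|NI", sep="|"):
    """Add one column joining the chosen fields, so a single-column pass
    matches only when every part matches. `sep` must not occur in the values."""
    for row in rows:
        row[name] = sep.join(row[c].strip() for c in cols)
    return rows


rows = add_composite_key([
    {"Employee ID": "EMP-200", "NI Number": "AB123456C"},
    {"Employee ID": "EMP-450", "NI Number": "AB123456C"},
    {"Employee ID": "EMP-200", "NI Number": "AB123456C"},
])
counts = Counter(r["EmployeeID|NI"] for r in rows)
print([r["EmployeeID|NI"] for r in rows if counts[r["EmployeeID|NI"]] > 1])
# ['EMP-200|AB123456C', 'EMP-200|AB123456C']
```

The rehire row (EMP-450) shares the NI number but not the ID, so the composite pass leaves it unflagged.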

Blank Employee ID or NI cells

Grouped together

Rows with an empty key value all fall into one empty-key group, shown as (empty), and are flagged YES together. Fill or filter blanks before relying on the audit; missing identifiers are themselves a data-quality issue.
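A minimal sketch of the "filter blanks" step, assuming whitespace-only cells should also count as blank (whether the tool trims before grouping is not stated, so this is an assumption); `split_blank_keys` is hypothetical.

```python
def split_blank_keys(rows, key):
    """Separate rows whose key is empty or whitespace-only, so they can be
    fixed first instead of forming one large empty-key duplicate group."""
    present = [r for r in rows if r[key].strip()]
    blank = [r for r in rows if not r[key].strip()]
    return present, blank


rows = [
    {"Employee ID": "EMP-600", "Name": "C. Ng"},
    {"Employee ID": "", "Name": "D. Roy"},
    {"Employee ID": "  ", "Name": "E. Fox"},
]
present, blank = split_blank_keys(rows, "Employee ID")
print(len(present), [r["Name"] for r in blank])
# 1 ['D. Roy', 'E. Fox']
```

The `blank` list doubles as a worklist of records that need an identifier before the next pass.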

HR export over the free 500-row / 2 MB cap

Upgrade required

Free runs cap at 2 MB and 500 rows; larger payroll exports are blocked with a Pro prompt. Pro raises it to 100 MB / 100,000 rows. Splitting with csv-row-splitter handles a one-off but won't detect duplicates across chunks.

Frequently asked questions

Is this safe to use with employee personal data?

Yes. All parsing and detection run entirely in your browser via PapaParse. Names, Employee IDs, NI numbers, emails, and salaries never reach a server — which supports GDPR data-minimisation since no personal data is transmitted. Only an anonymous usage counter is recorded when signed in, and it can be disabled in settings.

Can I check for duplicates on email and Employee ID separately?

Yes, and you should. The key is a single column, so run the finder once per identifier (Employee ID, then NI Number, then Email) and reconcile the marked files. Each pass catches duplicates the others miss.

What if duplicates exist because an employee was rehired?

A rehire often shares an NI number but has a new Employee ID and a later start date. The NI-number pass flags both records; compare the start dates in the marked rows to confirm it's a rehire rather than an error before deciding how to resolve it.

Does it merge or delete the duplicate records?

Neither. It appends an _is_duplicate column (YES/NO) and keeps every record so HR makes the call. To physically remove duplicates and keep one per group, use csv-deduplicator after review.

Why didn't it flag two records I know are the same person?

Matching compares the exact cell text, lowercased unless Case-sensitive matching is on. NI numbers stored with different spacing (AB 12 34 56 C vs AB123456C) or with a trailing space won't group. Normalise with csv-find-replace or csv-whitespace-trimmer and re-run.

Is the first copy of a duplicate record marked too?

Yes. Every member of a duplicate group is marked YES, including the first, so the canonical record and the stray are both visible. To isolate only the surplus records, filter to YES and exclude the earliest row per group, or use csv-deduplicator.

Can I match on Employee ID and NI number in one pass?

Not directly — the key is one column. Build a combined EmployeeID|NI column with csv-column-merger and key on it for an exact composite match, or run two passes and reconcile.

How large an HR/payroll export can I check?

Free runs handle up to 2 MB and 500 rows; larger files are blocked with a Pro prompt. Pro handles 100 MB and 100,000 rows. For larger datasets, split with csv-row-splitter, accepting that duplicates spanning chunks won't be detected.

What does the output file look like?

Your original CSV with a trailing _is_duplicate column, saved as <yourfile>.duplicates-marked.csv. Filter that column to YES to review the duplicate employee records.

Should I treat NI numbers and emails as case-sensitive?

No — keep case-sensitive matching off for both. Emails are case-insensitive by convention, and NI numbers are conventionally uppercase, so the default lowercased comparison handles any stray casing correctly.

How do I group all flagged records together for HR review?

The marked CSV keeps the original row order. After download, use csv-sorter: sort by the _is_duplicate column to bring all YES records together, or by Employee ID to place each duplicate group's rows next to each other in the review sheet.
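Sorting on both keys at once (YES first, then by Employee ID) puts every flagged record at the top with each duplicate group together. A minimal sketch of that two-level sort; `review_order` is hypothetical and illustrates the ordering rather than csv-sorter's own behaviour.

```python
def review_order(rows, key="Employee ID"):
    """YES rows first, grouped by key. Python's sort is stable, so rows
    within a group keep their original file order."""
    return sorted(rows, key=lambda r: (r["_is_duplicate"] != "YES", r[key].lower()))


marked = [
    {"Employee ID": "EMP-100", "Department": "Finance", "_is_duplicate": "YES"},
    {"Employee ID": "EMP-101", "Department": "Sales", "_is_duplicate": "NO"},
    {"Employee ID": "EMP-100", "Department": "Operations", "_is_duplicate": "YES"},
]
for r in review_order(marked):
    print(r["Employee ID"], r["Department"], r["_is_duplicate"])
# EMP-100 Finance YES
# EMP-100 Operations YES
# EMP-101 Sales NO
```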

What do the summary numbers mean for a migration audit?

Duplicate groups = how many identifiers repeat. Extra copies = surplus records you'd resolve before migration. Unique values = identifiers that appear exactly once. Together they size the data-quality work needed before moving to a new HRIS.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
