Validate an Employee Data CSV Before Workday or BambooHR Import

How to validate an employee data CSV before an HR system import

Step 1
Export the roster or fill the import template — Download the employee roster from your current system, or populate your HR platform's import-template CSV (Workday and BambooHR both publish one). Save from Excel with Save As → 'CSV UTF-8 (Comma delimited)' so accented names survive. The validator reads text CSVs only — convert .xlsx to CSV first.
Step 2
Drop the file onto the validator above — Free tier: 2 MB / 500 rows — enough for a roster of up to ~499 employees plus the header. Pro: 100 MB / 100,000 rows for enterprise headcounts. The dropzone accepts .csv, .tsv, and .txt, and validation runs automatically when the file lands.
Step 3
Read the four summary tiles — Health score, Rows (lines including the header), Columns (the expected width), and Issues (errors + warnings). For HR files, scan both errors (structural blockers) and warnings — a duplicate Employee ID is only a warning here but will hard-fail in Workday, so don't ignore the warning count.
Step 4
Clear errors, then review the key warnings — The Top issues panel shows the first 8 issues. Red error badges (row_width, encoding) block the import; amber warning badges include the ones that matter most for HR: duplicate_key (duplicate employee IDs/emails) and empty_cell (blank required fields). Each line names the column and gives a detail you can act on.
Step 5
Read the Column profile for IDs, emails, and dates — Confirm Employee ID and Email are recognised — they get unique-key checking automatically because of the id/email token in their names. Check Start Date / Date of Birth show as date; a mixed profile means inconsistent date formats. A string profile on a numeric Salary column means non-numeric values are present.
Step 6
Fix in source, download the report, re-validate — Use 'Download report' for a <filename>.health-report.json record, then fix in your spreadsheet: remove duplicate employee rows with csv-deduplicator on the ID column, normalise typo'd codes with csv-find-replace, and re-save as CSV UTF-8. Re-drop to confirm errors is zero and the duplicate-key warnings are gone, then run the HR import.

What the validator checks on an HR CSV

Eight issue types, mapped to HR-import failure modes. Error severity blocks the import; warning severity is a data-quality flag — but for HR files, the duplicate_key and empty_cell warnings are often the ones that hard-fail downstream.

Issue type	Severity	What triggers it	Why HR files hit it
`duplicate_key`	Warning	Same value in a column whose name matches the key heuristic (`Employee ID`, `Email`, names with `id`/`email`/`reference`)	A record imported twice, or two employees sharing an email — Workday rejects duplicate IDs outright
`empty_cell`	Warning	An empty cell in a column that's at least 85% populated	A blank `Employee ID`, `Email`, or `Start Date` in one row of an otherwise-complete roster
`row_width`	Error	A row's column count differs from the header's column count	A copy-paste that gained or lost a column; an unescaped comma in a `Job Title` or `Address` field
`encoding`	Error	Cell contains `\uFFFD` or a control character below `0x20`	A CP1252 Excel save read as UTF-8 — `José`, `Müller`, `Łukasz`, `François` become garbage
`type_mismatch`	Warning	A cell doesn't match the column's dominant `number` / `date` profile (85% dominance)	Text in a numeric `Salary` / `Hours` column; mixed date formats in `Start Date`
`suspicious_value`	Warning	A once-only value substring-similar to a common value, in a 2–12 distinct-value column	`Eng` next to `Eng`, `Sales` next to `Sales`, in a `Department` / `Status` code column
`duplicate_row`	Warning	A whole data row matches an earlier one after trimming whitespace and lowercasing	Merging two department rosters that overlap, duplicating shared staff
`duplicate_header`	Warning	The same header name appears twice in row 1 (case-insensitive)	Two `Email` columns (work + personal both named `Email`), or `Name` and `name`

Which HR columns become unique keys

The unique-key detector matches column names case-insensitively and separator-insensitively. Columns it recognises get duplicate_key checking on top of the whole-row duplicate check.

HR column name	Matched as key?	Why
`Employee ID` / `Emp ID` / `employee_id`	Yes	Contains the `id` token (whole-word/separated match)
`Email` / `Work Email` / `e-mail`	Yes	Matches the `email` / `e-mail` token
`Reference` / `Ref` / `Payroll Ref`	Yes	Matches the `reference` / `ref` token
`Staff Number` / `Employee Number`	Yes	Matches the `employee number` key pattern
`National Insurance No` / `SSN`	No	No matching token — not auto-detected as a key (whole-row duplicate check still applies)
`Department` / `Location` / `Job Title`	No	Categorical, not an identifier — eligible for the suspicious-value scan instead
`First Name` / `Last Name`	No	Names repeat legitimately; not treated as keys
`Salary` / `Hours`	No	Numeric, not an identifier

What the validator catches vs what the HR system catches

Structural pre-flight here; identity and policy validation in Workday/BambooHR. Both layers matter for a clean import.

Check	JAD CSV Validator	Workday / BambooHR import
Blank required field in a mostly-filled column	Yes — `empty_cell` warning before import	Yes — rejects the row, sometimes the file
Duplicate `Employee ID` / `Email`	Yes — `duplicate_key` warning before import	Yes — hard rejection on identity conflict
Row split by a comma in `Address` / `Job Title`	Yes — `row_width` error before import	Yes — misaligned fields, rejected
Encoding mojibake in multilingual names	Yes — `encoding` error before import	Often imports garbled silently
Email is syntactically valid (has `@`, a domain)	No — only duplicates/blanks/type are checked	Yes — format validated by the HR system
Manager ID points to a real employee	No — relational logic out of scope	Yes — org-chart validation
Required field truly missing as a column	No — profiles only present columns	Yes — import mapping requires it
File size cap	Free 2 MB / Pro 100 MB	Varies by HR system / import job

Cookbook

Real HR-roster scenarios mapped to the exact validator output. Each code block is a representative excerpt; column names follow common Workday/BambooHR import conventions. Example values are fictional.

Blank Employee ID in an otherwise-complete roster

Example

The most consequential HR-import error is a missing identifier. Because the Employee ID column is mostly populated (over 85%), a single blank is flagged as an empty_cell warning pointing at the exact row.

Header:
Employee ID,First Name,Last Name,Department

Rows:
E-1001,Ana,Reyes,Engineering
E-1002,Bo,Lim,Sales
,Cara,Okafor,Marketing            <- Employee ID is blank
E-1004,Dev,Shah,Engineering

Validator output:
Row 4, Col 1  Employee ID
  type: empty_cell  severity: warning
  detail: "Cell is empty in a mostly populated column."

Fix: fill the missing ID in source. Empty-cell warnings fire only when
     a column is at least 85% populated, so this catches the one gap
     without flagging genuinely-optional columns.

Duplicate Employee ID from merged rosters

Example

Combining two department exports often duplicates shared staff. Because the column name contains the id token, the repeat is flagged as a unique-key collision.

Header:
Employee ID,Email,Department

Rows:
E-2001,ana@co.com,Engineering
E-2002,bo@co.com,Sales
E-2001,ana@co.com,Platform          <- same employee, two departments

Validator output:
Row 4, Col 1  Employee ID
  type: duplicate_key  severity: warning
  detail: 'Value "E-2001" also appears on row 2.
           This column looks like a unique key.'

Fix: decide which row wins and de-dupe with csv-deduplicator on the
     Employee ID column (keep first or last), or inspect first with
     csv-duplicate-finder to see all the collisions.

Mojibake in multilingual names from a CP1252 save

Example

Employee rosters frequently contain accented and non-Latin names. A CP1252 Excel save read back as UTF-8 turns those into the replacement character, flagged as encoding errors.

Header:
Employee ID,First Name,Last Name

Rows (CP1252 decoded as UTF-8):
E-3001,Jos\uFFFD,Garcia            encoding error (was 'José')
E-3002,\uFFFDukasz,Nowak           encoding error (was 'Łukasz')
E-3003,Fran\uFFFDois,Dubois        encoding error (was 'François')

Validator output:
Row 2, Col 2 (First Name): replacement/control character — encoding problem.
Row 3, Col 2 (First Name): same.
Row 4, Col 2 (First Name): same.

Fix: re-save as 'CSV UTF-8 (Comma delimited)', or re-export from the source
     system in UTF-8. csv-cleaner can normalise the text if you only have
     the corrupted file.

Typo'd department code caught by the suspicious-value scan

Example

A low-cardinality code column (a handful of distinct departments) triggers the suspicious-value scan. A trailing-space or near-duplicate code that appears once, substring-similar to a common code, is flagged.

Header:
Employee ID,Department

Rows (10 rows, Department has 3 common codes + 1 rare):
E-1,ENG
E-2,ENG
E-3,SALES
E-4,MKT
...   (more ENG / SALES / MKT rows)
E-10,ENG                  <- trailing space, appears once
        ^ note the trailing space

Validator output:
Row 10, Col 2  Department
  type: suspicious_value  severity: warning
  detail: 'Rare value "ENG" looks close to common category "ENG".'

Fix: trim with csv-whitespace-trimmer, or fix the specific code with
     csv-find-replace, then re-validate so the value count collapses.

Row-width drift from a copy-paste that lost a column

Example

Pasting a block of cells from another sheet can drop or add a column on some rows. The validator reports the affected rows as row_width errors with the column-count delta.

Header (5 cols):
Employee ID,First Name,Last Name,Department,Start Date

Good row (5 cols):
E-1001,Ana,Reyes,Engineering,2025-02-01

Bad row (4 cols — Department dropped during paste):
E-1002,Bo,Lim,2025-02-03

Validator output:
Row 3, Col 1  (row width)
  type: row_width  severity: error
  detail: "Row has 4 columns; header has 5."

Fix: re-paste the full block, or add the missing Department value so the
     row matches the header width, then re-validate.

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

`Employee ID` and `Email` get automatic duplicate-key checking

Supported

The unique-key detector treats any column whose name contains a token like id, email, reference, or ref (whole-word or separated by space/underscore/hyphen) as a unique key, plus patterns like Employee Number. So Employee ID, Emp ID, Work Email, and Payroll Ref all get duplicate_key checking on top of the whole-row check. A duplicate is reported with the row it first appeared on, which is exactly what you need before an HR import that rejects duplicate identities.

National Insurance / SSN columns are NOT auto-detected as keys

Review manually

Columns named National Insurance No, NI Number, or SSN do not match the key heuristic (no id/reference/email token), so they don't get automatic duplicate_key checking — even though they're effectively unique identifiers. If you need duplicate detection on such a column, rename it to include a recognised token (e.g. add Reference) with csv-header-rename before validating, or run csv-duplicate-finder targeting that column directly.

Blank optional fields don't generate noise

By design

The empty_cell warning only fires when a column is at least 85% populated. So a Middle Name, Phone Extension, or Manager column that's blank for many staff won't flood you with warnings — it exceeds the 15% empty threshold and stays quiet. This is what makes the empty-cell check useful for finding the one row missing a required field, rather than reporting every legitimately-blank optional cell across the roster.

Email format itself is not validated

Not detected

The validator checks Email columns for duplicates and blanks, and infers the column type, but it does not verify that each value is a syntactically valid email (an @, a domain, no spaces). A typo'd address like ana@@co or ana co.com will pass the structural check and only fail in the HR system's own validation. If you need format validation, that's a downstream step; the validator's role is shape, duplicates, and blanks.

Mixed date formats in `Start Date` yield a `mixed` profile

Review — mixed profile

If Start Date or Date of Birth mixes ISO (2025-02-01) and slash (01/02/2025) formats, no single format reaches 85% dominance and the column is profiled as mixed. Per-cell type_mismatch warnings won't fire on a mixed column, so the mixed badge is itself the signal. HR systems want one consistent format — normalise to ISO YYYY-MM-DD to avoid day/month ambiguity (especially important across US/UK roster sources) and re-validate.

Manager-ID references aren't checked against the roster

Out of scope

If your file has a Manager ID column pointing at another Employee ID, the validator won't verify that the referenced manager actually exists in the file — it has no relational/org-chart logic. It will, however, flag a Manager ID column for duplicates if its name contains the id token (which may produce expected duplicates, since many staff share a manager — review those). Org-chart integrity is the HR system's job at import.

A required column is missing entirely

Not detected

The validator profiles only the columns present in the file. If your roster lacks a required Start Date or Employee ID column, the validator can't flag the absence — there's no HR schema to compare against. Check your header row against your platform's import-template required fields before importing. The validator confirms the columns you have are clean; matching the required column set is on you.

Roster exceeds the free 500-row limit

Blocked on free tier

Free tier caps at 2 MB and 500 rows, so it handles a roster of up to ~499 employees plus the header. A larger headcount needs Pro (100 MB / 100,000 rows). If you're on free tier and the file is blocked, split it with csv-row-splitter — for example by department or location — and validate each segment, then import the segments separately.

PII never leaves the browser

By design

The file is read with the browser's own File.text() and parsed by PapaParse client-side; names, salaries, IDs, and contact details are never transmitted. The tool is marked 100% client-side, and only an anonymous run counter is recorded for signed-in dashboard stats. For GDPR/PII-sensitive HR data this is the decisive feature — you can validate a full roster without it touching any server, including JAD's.

Only the first 250 issues are reported

By design

The reported issue list is capped at 250 to keep the browser responsive; the Issues tile shows that capped count. A systemic problem — every name corrupted by a bad encoding, or a wrong delimiter — can exceed 250. Fix the single root cause and re-validate; the count collapses. The downloaded JSON report holds the same capped list, so for very large counts the workflow is fix-and-re-run, not paging through 250 entries.

Frequently asked questions

Will it tell me which rows have a blank Employee ID?

Yes, as long as the Employee ID column is mostly populated. The empty_cell warning fires when a column is at least 85% filled and a cell is blank — so in a normal roster where almost every row has an ID, the one missing one is flagged with its exact row number. If the column were mostly empty (over 15% blank) the check wouldn't fire, on the assumption it's an optional field, but that's rarely the case for an identifier column.

Can it find duplicate employee records?

Yes. Columns named like Employee ID, Email, Reference, or anything containing the id/email/reference token are treated as unique keys, so a repeated value is flagged as duplicate_key with the row it first appeared on. It also flags whole-row duplicates. To actually remove duplicates, use csv-deduplicator on the ID column; to review them first, use csv-duplicate-finder.

Is sensitive HR data uploaded anywhere?

No. The file is read with the browser's File.text() and parsed entirely client-side by PapaParse — employee names, salaries, IDs, and contact details never reach a JAD server. The tool is marked 100% client-side, and only an anonymous usage counter is recorded for signed-in dashboard stats, never the file contents. For PII-bearing HR rosters, that on-device guarantee is the main reason to use a browser-local validator.

Does it check that emails are valid?

Not the format. The validator checks Email columns for duplicates and blanks and infers the column type, but it does not verify that each value is a syntactically valid email address (presence of @, a domain, no spaces). A malformed address will pass the structural check and only fail in your HR system's own validation. Use the validator to catch missing and duplicate emails; rely on the HR import for format enforcement.

My SSN / National Insurance column isn't being checked for duplicates — why?

Because its name doesn't contain a token the unique-key detector recognises (id, email, reference, ref, sku, etc.). Columns like SSN or National Insurance No are effectively unique but aren't auto-detected. To get duplicate checking, either rename the column to include a recognised token using csv-header-rename, or run csv-duplicate-finder directly on that column. The whole-row duplicate check still applies regardless.

Can it catch a typo in a department code?

Often, yes, through the suspicious-value scan. On a low-cardinality column (a string/mixed column with 2–12 distinct values), a value that appears once and is substring-similar to a value appearing two-plus times is flagged — the classic case is a trailing space, like Eng next to Eng. It won't catch a wholly different typo with no overlap (e.g. Enginering vs Engineering differ by a missing letter but still might not substring-match), so treat it as a high-precision tip-off, not exhaustive validation.

Does it detect a missing required column, like a deleted Start Date?

No. The validator profiles only the columns present in the file; there's no HR schema to compare against, so a deleted column simply isn't reported. You're responsible for matching your platform's import-template required fields — compare your header row against Workday's or BambooHR's template before importing. The validator's job is the content and shape of the columns you have, not which columns should exist.

What about date columns like Start Date or Date of Birth?

The validator classifies dates as a type — it recognises ISO (2025-02-01), slash (01/02/2025), and month-name (Feb 1, 2025) formats. If a date column uses a single consistent format, it's profiled as date. If it mixes formats, no format reaches 85% dominance and the column becomes mixed, which is your cue to normalise. HR systems want one consistent format; ISO YYYY-MM-DD is safest because it avoids day/month ambiguity across US/UK sources.

Can I download the report for compliance records?

Yes. 'Download report' saves a <filename>.health-report.json with the full stats, per-column profile, every reported issue (up to 250) with row/column references, the health score, and a 10-row preview. For HR workflows you can keep it alongside the import file as evidence of the file's structural state at import time — useful for audit trails. The issue list in the file matches the capped-at-250 list shown in the UI.

How many employees can I validate at once?

Free tier handles up to 2 MB and 500 rows — roughly 499 employees plus the header. Pro raises that to 100 MB and 100,000 rows for enterprise headcounts. If a roster exceeds the free limit, split it with csv-row-splitter — by department or location, say — and validate each segment, then import the segments separately into your HR system.

Does it read .xlsx files directly?

No. The validator reads text formats only — .csv, .tsv, and .txt — via the browser's text decoder; an .xlsx workbook is binary and would produce garbage or a parse failure. Save your roster to CSV first (CSV UTF-8 recommended so accented names survive), then validate the saved file. Do the spreadsheet-to-CSV export step in Excel or your source HR system before bringing the file here.

What does the health score mean, and should I trust it?

It's round((1 - reportedIssues / totalCells) * 100) — the share of cells that are clean. It's a useful at-a-glance density measure, but for HR files don't trust it alone: a high score can still include a duplicate Employee ID or a blank required field that will hard-fail in Workday. The score badge turns green at 90% and amber below, but always read the Top issues panel and confirm the duplicate-key and empty-cell warnings are resolved before importing.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

How to validate an employee data CSV before an HR system import

Step 1
Export the roster or fill the import template — Download the employee roster from your current system, or populate your HR platform's import-template CSV (Workday and BambooHR both publish one). Save from Excel with Save As → 'CSV UTF-8 (Comma delimited)' so accented names survive. The validator reads text CSVs only — convert .xlsx to CSV first.
Step 2
Drop the file onto the validator above — Free tier: 2 MB / 500 rows — enough for a roster of up to ~499 employees plus the header. Pro: 100 MB / 100,000 rows for enterprise headcounts. The dropzone accepts .csv, .tsv, and .txt, and validation runs automatically when the file lands.
Step 3
Read the four summary tiles — Health score, Rows (lines including the header), Columns (the expected width), and Issues (errors + warnings). For HR files, scan both errors (structural blockers) and warnings — a duplicate Employee ID is only a warning here but will hard-fail in Workday, so don't ignore the warning count.
Step 4
Clear errors, then review the key warnings — The Top issues panel shows the first 8 issues. Red error badges (row_width, encoding) block the import; amber warning badges include the ones that matter most for HR: duplicate_key (duplicate employee IDs/emails) and empty_cell (blank required fields). Each line names the column and gives a detail you can act on.
Step 5
Read the Column profile for IDs, emails, and dates — Confirm Employee ID and Email are recognised — they get unique-key checking automatically because of the id/email token in their names. Check Start Date / Date of Birth show as date; a mixed profile means inconsistent date formats. A string profile on a numeric Salary column means non-numeric values are present.
Step 6
Fix in source, download the report, re-validate — Use 'Download report' for a <filename>.health-report.json record, then fix in your spreadsheet: remove duplicate employee rows with csv-deduplicator on the ID column, normalise typo'd codes with csv-find-replace, and re-save as CSV UTF-8. Re-drop to confirm errors is zero and the duplicate-key warnings are gone, then run the HR import.

What the validator checks on an HR CSV

Issue type	Severity	What triggers it	Why HR files hit it
`duplicate_key`	Warning	Same value in a column whose name matches the key heuristic (`Employee ID`, `Email`, names with `id`/`email`/`reference`)	A record imported twice, or two employees sharing an email — Workday rejects duplicate IDs outright
`empty_cell`	Warning	An empty cell in a column that's at least 85% populated	A blank `Employee ID`, `Email`, or `Start Date` in one row of an otherwise-complete roster
`row_width`	Error	A row's column count differs from the header's column count	A copy-paste that gained or lost a column; an unescaped comma in a `Job Title` or `Address` field
`encoding`	Error	Cell contains `\uFFFD` or a control character below `0x20`	A CP1252 Excel save read as UTF-8 — `José`, `Müller`, `Łukasz`, `François` become garbage
`type_mismatch`	Warning	A cell doesn't match the column's dominant `number` / `date` profile (85% dominance)	Text in a numeric `Salary` / `Hours` column; mixed date formats in `Start Date`
`suspicious_value`	Warning	A once-only value substring-similar to a common value, in a 2–12 distinct-value column	`Eng` next to `Eng`, `Sales` next to `Sales`, in a `Department` / `Status` code column
`duplicate_row`	Warning	A whole data row matches an earlier one after trimming whitespace and lowercasing	Merging two department rosters that overlap, duplicating shared staff
`duplicate_header`	Warning	The same header name appears twice in row 1 (case-insensitive)	Two `Email` columns (work + personal both named `Email`), or `Name` and `name`

Which HR columns become unique keys

The unique-key detector matches column names case-insensitively and separator-insensitively. Columns it recognises get duplicate_key checking on top of the whole-row duplicate check.

HR column name	Matched as key?	Why
`Employee ID` / `Emp ID` / `employee_id`	Yes	Contains the `id` token (whole-word/separated match)
`Email` / `Work Email` / `e-mail`	Yes	Matches the `email` / `e-mail` token
`Reference` / `Ref` / `Payroll Ref`	Yes	Matches the `reference` / `ref` token
`Staff Number` / `Employee Number`	Yes	Matches the `employee number` key pattern
`National Insurance No` / `SSN`	No	No matching token — not auto-detected as a key (whole-row duplicate check still applies)
`Department` / `Location` / `Job Title`	No	Categorical, not an identifier — eligible for the suspicious-value scan instead
`First Name` / `Last Name`	No	Names repeat legitimately; not treated as keys
`Salary` / `Hours`	No	Numeric, not an identifier

What the validator catches vs what the HR system catches

Structural pre-flight here; identity and policy validation in Workday/BambooHR. Both layers matter for a clean import.

Check	JAD CSV Validator	Workday / BambooHR import
Blank required field in a mostly-filled column	Yes — `empty_cell` warning before import	Yes — rejects the row, sometimes the file
Duplicate `Employee ID` / `Email`	Yes — `duplicate_key` warning before import	Yes — hard rejection on identity conflict
Row split by a comma in `Address` / `Job Title`	Yes — `row_width` error before import	Yes — misaligned fields, rejected
Encoding mojibake in multilingual names	Yes — `encoding` error before import	Often imports garbled silently
Email is syntactically valid (has `@`, a domain)	No — only duplicates/blanks/type are checked	Yes — format validated by the HR system
Manager ID points to a real employee	No — relational logic out of scope	Yes — org-chart validation
Required field truly missing as a column	No — profiles only present columns	Yes — import mapping requires it
File size cap	Free 2 MB / Pro 100 MB	Varies by HR system / import job

Cookbook

Blank Employee ID in an otherwise-complete roster

Example

Header:
Employee ID,First Name,Last Name,Department

Rows:
E-1001,Ana,Reyes,Engineering
E-1002,Bo,Lim,Sales
,Cara,Okafor,Marketing            <- Employee ID is blank
E-1004,Dev,Shah,Engineering

Validator output:
Row 4, Col 1  Employee ID
  type: empty_cell  severity: warning
  detail: "Cell is empty in a mostly populated column."

Fix: fill the missing ID in source. Empty-cell warnings fire only when
     a column is at least 85% populated, so this catches the one gap
     without flagging genuinely-optional columns.

Duplicate Employee ID from merged rosters

Example

Combining two department exports often duplicates shared staff. Because the column name contains the id token, the repeat is flagged as a unique-key collision.

Header:
Employee ID,Email,Department

Rows:
E-2001,ana@co.com,Engineering
E-2002,bo@co.com,Sales
E-2001,ana@co.com,Platform          <- same employee, two departments

Validator output:
Row 4, Col 1  Employee ID
  type: duplicate_key  severity: warning
  detail: 'Value "E-2001" also appears on row 2.
           This column looks like a unique key.'

Fix: decide which row wins and de-dupe with csv-deduplicator on the
     Employee ID column (keep first or last), or inspect first with
     csv-duplicate-finder to see all the collisions.

Mojibake in multilingual names from a CP1252 save

Example

Employee rosters frequently contain accented and non-Latin names. A CP1252 Excel save read back as UTF-8 turns those into the replacement character, flagged as encoding errors.

Header:
Employee ID,First Name,Last Name

Rows (CP1252 decoded as UTF-8):
E-3001,Jos\uFFFD,Garcia            encoding error (was 'José')
E-3002,\uFFFDukasz,Nowak           encoding error (was 'Łukasz')
E-3003,Fran\uFFFDois,Dubois        encoding error (was 'François')

Validator output:
Row 2, Col 2 (First Name): replacement/control character — encoding problem.
Row 3, Col 2 (First Name): same.
Row 4, Col 2 (First Name): same.

Fix: re-save as 'CSV UTF-8 (Comma delimited)', or re-export from the source
     system in UTF-8. csv-cleaner can normalise the text if you only have
     the corrupted file.

Typo'd department code caught by the suspicious-value scan

Example

Header:
Employee ID,Department

Rows (10 rows, Department has 3 common codes + 1 rare):
E-1,ENG
E-2,ENG
E-3,SALES
E-4,MKT
...   (more ENG / SALES / MKT rows)
E-10,ENG                  <- trailing space, appears once
        ^ note the trailing space

Validator output:
Row 10, Col 2  Department
  type: suspicious_value  severity: warning
  detail: 'Rare value "ENG" looks close to common category "ENG".'

Fix: trim with csv-whitespace-trimmer, or fix the specific code with
     csv-find-replace, then re-validate so the value count collapses.

Row-width drift from a copy-paste that lost a column

Example

Pasting a block of cells from another sheet can drop or add a column on some rows. The validator reports the affected rows as row_width errors with the column-count delta.

Header (5 cols):
Employee ID,First Name,Last Name,Department,Start Date

Good row (5 cols):
E-1001,Ana,Reyes,Engineering,2025-02-01

Bad row (4 cols — Department dropped during paste):
E-1002,Bo,Lim,2025-02-03

Validator output:
Row 3, Col 1  (row width)
  type: row_width  severity: error
  detail: "Row has 4 columns; header has 5."

Fix: re-paste the full block, or add the missing Department value so the
     row matches the header width, then re-validate.

Errors and edge cases

Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.

`Employee ID` and `Email` get automatic duplicate-key checking

Supported

National Insurance / SSN columns are NOT auto-detected as keys

Review manually

Blank optional fields don't generate noise

By design

Email format itself is not validated

Not detected

Mixed date formats in `Start Date` yield a `mixed` profile

Review — mixed profile

Manager-ID references aren't checked against the roster

Out of scope

A required column is missing entirely

Not detected

Roster exceeds the free 500-row limit

Blocked on free tier

PII never leaves the browser

By design

Only the first 250 issues are reported

By design

Frequently asked questions

Will it tell me which rows have a blank Employee ID?

Can it find duplicate employee records?

Is sensitive HR data uploaded anywhere?

Does it check that emails are valid?

My SSN / National Insurance column isn't being checked for duplicates — why?

Can it catch a typo in a department code?

Does it detect a missing required column, like a deleted Start Date?

What about date columns like Start Date or Date of Birth?

Can I download the report for compliance records?

How many employees can I validate at once?

Does it read .xlsx files directly?

What does the health score mean, and should I trust it?

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.

Validate an Employee Data CSV

How to validate an employee data CSV before an HR system import

What the validator checks on an HR CSV

Which HR columns become unique keys

What the validator catches vs what the HR system catches

Cookbook

Blank Employee ID in an otherwise-complete roster

Duplicate Employee ID from merged rosters

Mojibake in multilingual names from a CP1252 save

Typo'd department code caught by the suspicious-value scan

Row-width drift from a copy-paste that lost a column

Errors and edge cases

`Employee ID` and `Email` get automatic duplicate-key checking

National Insurance / SSN columns are NOT auto-detected as keys

Blank optional fields don't generate noise

Email format itself is not validated

Mixed date formats in `Start Date` yield a `mixed` profile

Manager-ID references aren't checked against the roster

A required column is missing entirely

Roster exceeds the free 500-row limit

PII never leaves the browser

Only the first 250 issues are reported

Frequently asked questions

Will it tell me which rows have a blank Employee ID?

Can it find duplicate employee records?

Is sensitive HR data uploaded anywhere?

Does it check that emails are valid?

My SSN / National Insurance column isn't being checked for duplicates — why?

Can it catch a typo in a department code?

Does it detect a missing required column, like a deleted Start Date?

What about date columns like Start Date or Date of Birth?

Can I download the report for compliance records?

How many employees can I validate at once?

Does it read .xlsx files directly?

What does the health score mean, and should I trust it?

Privacy first

Related guides

Validate an Employee Data CSV

How to validate an employee data CSV before an HR system import

What the validator checks on an HR CSV

Which HR columns become unique keys

What the validator catches vs what the HR system catches

Cookbook

Blank Employee ID in an otherwise-complete roster

Duplicate Employee ID from merged rosters

Mojibake in multilingual names from a CP1252 save

Typo'd department code caught by the suspicious-value scan

Row-width drift from a copy-paste that lost a column

Errors and edge cases

`Employee ID` and `Email` get automatic duplicate-key checking

National Insurance / SSN columns are NOT auto-detected as keys

Blank optional fields don't generate noise

Email format itself is not validated

Mixed date formats in `Start Date` yield a `mixed` profile

Manager-ID references aren't checked against the roster

A required column is missing entirely

Roster exceeds the free 500-row limit

PII never leaves the browser

Only the first 250 issues are reported

Frequently asked questions

Will it tell me which rows have a blank Employee ID?

Can it find duplicate employee records?

Is sensitive HR data uploaded anywhere?

Does it check that emails are valid?

My SSN / National Insurance column isn't being checked for duplicates — why?

Can it catch a typo in a department code?

Does it detect a missing required column, like a deleted Start Date?

What about date columns like Start Date or Date of Birth?

Can I download the report for compliance records?

How many employees can I validate at once?

Does it read .xlsx files directly?

What does the health score mean, and should I trust it?

Privacy first

Related guides