How to validate an employee data CSV before an HR system import
- Step 1Export the roster or fill the import template — Download the employee roster from your current system, or populate your HR platform's import-template CSV (Workday and BambooHR both publish one). Save from Excel with Save As → 'CSV UTF-8 (Comma delimited)' so accented names survive. The validator reads text CSVs only — convert
.xlsxto CSV first. - Step 2Drop the file onto the validator above — Free tier: 2 MB / 500 rows — enough for a roster of up to ~499 employees plus the header. Pro: 100 MB / 100,000 rows for enterprise headcounts. The dropzone accepts
.csv,.tsv, and.txt, and validation runs automatically when the file lands. - Step 3Read the four summary tiles — Health score, Rows (lines including the header), Columns (the expected width), and Issues (errors + warnings). For HR files, scan both
errors(structural blockers) andwarnings— a duplicateEmployee IDis only a warning here but will hard-fail in Workday, so don't ignore the warning count. - Step 4Clear errors, then review the key warnings — The Top issues panel shows the first 8 issues. Red
errorbadges (row_width,encoding) block the import; amberwarningbadges include the ones that matter most for HR:duplicate_key(duplicate employee IDs/emails) andempty_cell(blank required fields). Each line names the column and gives a detail you can act on. - Step 5Read the Column profile for IDs, emails, and dates — Confirm
Employee IDandEmailare recognised — they get unique-key checking automatically because of theid/emailtoken in their names. CheckStart Date/Date of Birthshow asdate; amixedprofile means inconsistent date formats. Astringprofile on a numericSalarycolumn means non-numeric values are present. - Step 6Fix in source, download the report, re-validate — Use 'Download report' for a
<filename>.health-report.jsonrecord, then fix in your spreadsheet: remove duplicate employee rows with csv-deduplicator on the ID column, normalise typo'd codes with csv-find-replace, and re-save as CSV UTF-8. Re-drop to confirmerrorsis zero and the duplicate-key warnings are gone, then run the HR import.
What the validator checks on an HR CSV
Eight issue types, mapped to HR-import failure modes. Error severity blocks the import; warning severity is a data-quality flag — but for HR files, the duplicate_key and empty_cell warnings are often the ones that hard-fail downstream.
| Issue type | Severity | What triggers it | Why HR files hit it |
|---|---|---|---|
| `duplicate_key` | Warning | Same value in a column whose name matches the key heuristic (Employee ID, Email, names with id/email/reference) | A record imported twice, or two employees sharing an email — Workday rejects duplicate IDs outright |
| `empty_cell` | Warning | An empty cell in a column that's at least 85% populated | A blank Employee ID, Email, or Start Date in one row of an otherwise-complete roster |
| `row_width` | Error | A row's column count differs from the header's column count | A copy-paste that gained or lost a column; an unescaped comma in a Job Title or Address field |
| `encoding` | Error | Cell contains \uFFFD or a control character below 0x20 | A CP1252 Excel save read as UTF-8 — José, Müller, Łukasz, François become garbage |
| `type_mismatch` | Warning | A cell doesn't match the column's dominant number / date profile (85% dominance) | Text in a numeric Salary / Hours column; mixed date formats in Start Date |
| `suspicious_value` | Warning | A once-only value substring-similar to a common value, in a 2–12 distinct-value column | Eng next to Eng, Sales next to Sales, in a Department / Status code column |
| `duplicate_row` | Warning | A whole data row matches an earlier one after trimming whitespace and lowercasing | Merging two department rosters that overlap, duplicating shared staff |
| `duplicate_header` | Warning | The same header name appears twice in row 1 (case-insensitive) | Two Email columns (work + personal both named Email), or Name and name |
Which HR columns become unique keys
The unique-key detector matches column names case-insensitively and separator-insensitively. Columns it recognises get duplicate_key checking on top of the whole-row duplicate check.
| HR column name | Matched as key? | Why |
|---|---|---|
Employee ID / Emp ID / employee_id | Yes | Contains the id token (whole-word/separated match) |
Email / Work Email / e-mail | Yes | Matches the email / e-mail token |
Reference / Ref / Payroll Ref | Yes | Matches the reference / ref token |
Staff Number / Employee Number | Yes | Matches the employee number key pattern |
National Insurance No / SSN | No | No matching token — not auto-detected as a key (whole-row duplicate check still applies) |
Department / Location / Job Title | No | Categorical, not an identifier — eligible for the suspicious-value scan instead |
First Name / Last Name | No | Names repeat legitimately; not treated as keys |
Salary / Hours | No | Numeric, not an identifier |
What the validator catches vs what the HR system catches
Structural pre-flight here; identity and policy validation in Workday/BambooHR. Both layers matter for a clean import.
| Check | JAD CSV Validator | Workday / BambooHR import |
|---|---|---|
| Blank required field in a mostly-filled column | Yes — empty_cell warning before import | Yes — rejects the row, sometimes the file |
Duplicate Employee ID / Email | Yes — duplicate_key warning before import | Yes — hard rejection on identity conflict |
Row split by a comma in Address / Job Title | Yes — row_width error before import | Yes — misaligned fields, rejected |
| Encoding mojibake in multilingual names | Yes — encoding error before import | Often imports garbled silently |
Email is syntactically valid (has @, a domain) | No — only duplicates/blanks/type are checked | Yes — format validated by the HR system |
| Manager ID points to a real employee | No — relational logic out of scope | Yes — org-chart validation |
| Required field truly missing as a column | No — profiles only present columns | Yes — import mapping requires it |
| File size cap | Free 2 MB / Pro 100 MB | Varies by HR system / import job |
Cookbook
Real HR-roster scenarios mapped to the exact validator output. Each code block is a representative excerpt; column names follow common Workday/BambooHR import conventions. Example values are fictional.
Blank Employee ID in an otherwise-complete roster
ExampleThe most consequential HR-import error is a missing identifier. Because the Employee ID column is mostly populated (over 85%), a single blank is flagged as an empty_cell warning pointing at the exact row.
Header:
Employee ID,First Name,Last Name,Department
Rows:
E-1001,Ana,Reyes,Engineering
E-1002,Bo,Lim,Sales
,Cara,Okafor,Marketing <- Employee ID is blank
E-1004,Dev,Shah,Engineering
Validator output:
Row 4, Col 1 Employee ID
type: empty_cell severity: warning
detail: "Cell is empty in a mostly populated column."
Fix: fill the missing ID in source. Empty-cell warnings fire only when
a column is at least 85% populated, so this catches the one gap
without flagging genuinely-optional columns.Duplicate Employee ID from merged rosters
ExampleCombining two department exports often duplicates shared staff. Because the column name contains the id token, the repeat is flagged as a unique-key collision.
Header:
Employee ID,Email,Department
Rows:
E-2001,ana@co.com,Engineering
E-2002,bo@co.com,Sales
E-2001,ana@co.com,Platform <- same employee, two departments
Validator output:
Row 4, Col 1 Employee ID
type: duplicate_key severity: warning
detail: 'Value "E-2001" also appears on row 2.
This column looks like a unique key.'
Fix: decide which row wins and de-dupe with csv-deduplicator on the
Employee ID column (keep first or last), or inspect first with
csv-duplicate-finder to see all the collisions.Mojibake in multilingual names from a CP1252 save
ExampleEmployee rosters frequently contain accented and non-Latin names. A CP1252 Excel save read back as UTF-8 turns those into the replacement character, flagged as encoding errors.
Header:
Employee ID,First Name,Last Name
Rows (CP1252 decoded as UTF-8):
E-3001,Jos\uFFFD,Garcia encoding error (was 'José')
E-3002,\uFFFDukasz,Nowak encoding error (was 'Łukasz')
E-3003,Fran\uFFFDois,Dubois encoding error (was 'François')
Validator output:
Row 2, Col 2 (First Name): replacement/control character — encoding problem.
Row 3, Col 2 (First Name): same.
Row 4, Col 2 (First Name): same.
Fix: re-save as 'CSV UTF-8 (Comma delimited)', or re-export from the source
system in UTF-8. csv-cleaner can normalise the text if you only have
the corrupted file.Typo'd department code caught by the suspicious-value scan
ExampleA low-cardinality code column (a handful of distinct departments) triggers the suspicious-value scan. A trailing-space or near-duplicate code that appears once, substring-similar to a common code, is flagged.
Header:
Employee ID,Department
Rows (10 rows, Department has 3 common codes + 1 rare):
E-1,ENG
E-2,ENG
E-3,SALES
E-4,MKT
... (more ENG / SALES / MKT rows)
E-10,ENG <- trailing space, appears once
^ note the trailing space
Validator output:
Row 10, Col 2 Department
type: suspicious_value severity: warning
detail: 'Rare value "ENG" looks close to common category "ENG".'
Fix: trim with csv-whitespace-trimmer, or fix the specific code with
csv-find-replace, then re-validate so the value count collapses.Row-width drift from a copy-paste that lost a column
ExamplePasting a block of cells from another sheet can drop or add a column on some rows. The validator reports the affected rows as row_width errors with the column-count delta.
Header (5 cols):
Employee ID,First Name,Last Name,Department,Start Date
Good row (5 cols):
E-1001,Ana,Reyes,Engineering,2025-02-01
Bad row (4 cols — Department dropped during paste):
E-1002,Bo,Lim,2025-02-03
Validator output:
Row 3, Col 1 (row width)
type: row_width severity: error
detail: "Row has 4 columns; header has 5."
Fix: re-paste the full block, or add the missing Department value so the
row matches the header width, then re-validate.Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the row, fix what the row says to fix.
`Employee ID` and `Email` get automatic duplicate-key checking
SupportedThe unique-key detector treats any column whose name contains a token like id, email, reference, or ref (whole-word or separated by space/underscore/hyphen) as a unique key, plus patterns like Employee Number. So Employee ID, Emp ID, Work Email, and Payroll Ref all get duplicate_key checking on top of the whole-row check. A duplicate is reported with the row it first appeared on, which is exactly what you need before an HR import that rejects duplicate identities.
National Insurance / SSN columns are NOT auto-detected as keys
Review manuallyColumns named National Insurance No, NI Number, or SSN do not match the key heuristic (no id/reference/email token), so they don't get automatic duplicate_key checking — even though they're effectively unique identifiers. If you need duplicate detection on such a column, rename it to include a recognised token (e.g. add Reference) with csv-header-rename before validating, or run csv-duplicate-finder targeting that column directly.
Blank optional fields don't generate noise
By designThe empty_cell warning only fires when a column is at least 85% populated. So a Middle Name, Phone Extension, or Manager column that's blank for many staff won't flood you with warnings — it exceeds the 15% empty threshold and stays quiet. This is what makes the empty-cell check useful for finding the one row missing a required field, rather than reporting every legitimately-blank optional cell across the roster.
Email format itself is not validated
Not detectedThe validator checks Email columns for duplicates and blanks, and infers the column type, but it does not verify that each value is a syntactically valid email (an @, a domain, no spaces). A typo'd address like ana@@co or ana co.com will pass the structural check and only fail in the HR system's own validation. If you need format validation, that's a downstream step; the validator's role is shape, duplicates, and blanks.
Mixed date formats in `Start Date` yield a `mixed` profile
Review — mixed profileIf Start Date or Date of Birth mixes ISO (2025-02-01) and slash (01/02/2025) formats, no single format reaches 85% dominance and the column is profiled as mixed. Per-cell type_mismatch warnings won't fire on a mixed column, so the mixed badge is itself the signal. HR systems want one consistent format — normalise to ISO YYYY-MM-DD to avoid day/month ambiguity (especially important across US/UK roster sources) and re-validate.
Manager-ID references aren't checked against the roster
Out of scopeIf your file has a Manager ID column pointing at another Employee ID, the validator won't verify that the referenced manager actually exists in the file — it has no relational/org-chart logic. It will, however, flag a Manager ID column for duplicates if its name contains the id token (which may produce expected duplicates, since many staff share a manager — review those). Org-chart integrity is the HR system's job at import.
A required column is missing entirely
Not detectedThe validator profiles only the columns present in the file. If your roster lacks a required Start Date or Employee ID column, the validator can't flag the absence — there's no HR schema to compare against. Check your header row against your platform's import-template required fields before importing. The validator confirms the columns you have are clean; matching the required column set is on you.
Roster exceeds the free 500-row limit
Blocked on free tierFree tier caps at 2 MB and 500 rows, so it handles a roster of up to ~499 employees plus the header. A larger headcount needs Pro (100 MB / 100,000 rows). If you're on free tier and the file is blocked, split it with csv-row-splitter — for example by department or location — and validate each segment, then import the segments separately.
PII never leaves the browser
By designThe file is read with the browser's own File.text() and parsed by PapaParse client-side; names, salaries, IDs, and contact details are never transmitted. The tool is marked 100% client-side, and only an anonymous run counter is recorded for signed-in dashboard stats. For GDPR/PII-sensitive HR data this is the decisive feature — you can validate a full roster without it touching any server, including JAD's.
Only the first 250 issues are reported
By designThe reported issue list is capped at 250 to keep the browser responsive; the Issues tile shows that capped count. A systemic problem — every name corrupted by a bad encoding, or a wrong delimiter — can exceed 250. Fix the single root cause and re-validate; the count collapses. The downloaded JSON report holds the same capped list, so for very large counts the workflow is fix-and-re-run, not paging through 250 entries.
Frequently asked questions
Will it tell me which rows have a blank Employee ID?
Yes, as long as the Employee ID column is mostly populated. The empty_cell warning fires when a column is at least 85% filled and a cell is blank — so in a normal roster where almost every row has an ID, the one missing one is flagged with its exact row number. If the column were mostly empty (over 15% blank) the check wouldn't fire, on the assumption it's an optional field, but that's rarely the case for an identifier column.
Can it find duplicate employee records?
Yes. Columns named like Employee ID, Email, Reference, or anything containing the id/email/reference token are treated as unique keys, so a repeated value is flagged as duplicate_key with the row it first appeared on. It also flags whole-row duplicates. To actually remove duplicates, use csv-deduplicator on the ID column; to review them first, use csv-duplicate-finder.
Is sensitive HR data uploaded anywhere?
No. The file is read with the browser's File.text() and parsed entirely client-side by PapaParse — employee names, salaries, IDs, and contact details never reach a JAD server. The tool is marked 100% client-side, and only an anonymous usage counter is recorded for signed-in dashboard stats, never the file contents. For PII-bearing HR rosters, that on-device guarantee is the main reason to use a browser-local validator.
Does it check that emails are valid?
Not the format. The validator checks Email columns for duplicates and blanks and infers the column type, but it does not verify that each value is a syntactically valid email address (presence of @, a domain, no spaces). A malformed address will pass the structural check and only fail in your HR system's own validation. Use the validator to catch missing and duplicate emails; rely on the HR import for format enforcement.
My SSN / National Insurance column isn't being checked for duplicates — why?
Because its name doesn't contain a token the unique-key detector recognises (id, email, reference, ref, sku, etc.). Columns like SSN or National Insurance No are effectively unique but aren't auto-detected. To get duplicate checking, either rename the column to include a recognised token using csv-header-rename, or run csv-duplicate-finder directly on that column. The whole-row duplicate check still applies regardless.
Can it catch a typo in a department code?
Often, yes, through the suspicious-value scan. On a low-cardinality column (a string/mixed column with 2–12 distinct values), a value that appears once and is substring-similar to a value appearing two-plus times is flagged — the classic case is a trailing space, like Eng next to Eng. It won't catch a wholly different typo with no overlap (e.g. Enginering vs Engineering differ by a missing letter but still might not substring-match), so treat it as a high-precision tip-off, not exhaustive validation.
Does it detect a missing required column, like a deleted Start Date?
No. The validator profiles only the columns present in the file; there's no HR schema to compare against, so a deleted column simply isn't reported. You're responsible for matching your platform's import-template required fields — compare your header row against Workday's or BambooHR's template before importing. The validator's job is the content and shape of the columns you have, not which columns should exist.
What about date columns like Start Date or Date of Birth?
The validator classifies dates as a type — it recognises ISO (2025-02-01), slash (01/02/2025), and month-name (Feb 1, 2025) formats. If a date column uses a single consistent format, it's profiled as date. If it mixes formats, no format reaches 85% dominance and the column becomes mixed, which is your cue to normalise. HR systems want one consistent format; ISO YYYY-MM-DD is safest because it avoids day/month ambiguity across US/UK sources.
Can I download the report for compliance records?
Yes. 'Download report' saves a <filename>.health-report.json with the full stats, per-column profile, every reported issue (up to 250) with row/column references, the health score, and a 10-row preview. For HR workflows you can keep it alongside the import file as evidence of the file's structural state at import time — useful for audit trails. The issue list in the file matches the capped-at-250 list shown in the UI.
How many employees can I validate at once?
Free tier handles up to 2 MB and 500 rows — roughly 499 employees plus the header. Pro raises that to 100 MB and 100,000 rows for enterprise headcounts. If a roster exceeds the free limit, split it with csv-row-splitter — by department or location, say — and validate each segment, then import the segments separately into your HR system.
Does it read .xlsx files directly?
No. The validator reads text formats only — .csv, .tsv, and .txt — via the browser's text decoder; an .xlsx workbook is binary and would produce garbage or a parse failure. Save your roster to CSV first (CSV UTF-8 recommended so accented names survive), then validate the saved file. Do the spreadsheet-to-CSV export step in Excel or your source HR system before bringing the file here.
What does the health score mean, and should I trust it?
It's round((1 - reportedIssues / totalCells) * 100) — the share of cells that are clean. It's a useful at-a-glance density measure, but for HR files don't trust it alone: a high score can still include a duplicate Employee ID or a blank required field that will hard-fail in Workday. The score badge turns green at 90% and amber below, but always read the Top issues panel and confirm the duplicate-key and empty-cell warnings are resolved before importing.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.