How to create QA test fixtures from a production CSV
- Step 1. Export real production data: Download a representative export from your system. The closer to real, the more useful the fixture, because production exports carry the actual edge cases your code must handle.
- Step 2. Drop it onto the Row Limiter: It reads locally; nothing uploads. Free accepts files up to 2 MB, Pro up to 100 MB. Larger files are blocked at drop before parsing.
- Step 3. Set Row limit to 20–50: Small fixtures keep CI fast and diffs reviewable (the minimum Row limit is 1). For broader coverage, take a second slice with an offset rather than one large fixture.
- Step 4. Use Row offset to capture different data shapes: Offset 0 gives the first rows; a second run at, say, offset 1000 captures a different region of the file if the data shape varies along its length.
- Step 5. Scrub PII with Column Remover: The Row Limiter does not remove or mask columns. Run the slice through csv-column-remover to drop name, email, and other sensitive columns before committing.
- Step 6. Commit the fixture and reference it in tests: Add the small <name>.rows-1-N.csv to your fixtures folder and point your test loader at it. Because the slice is reproducible, the fixture is stable across regenerations from the same export.
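The slice in steps 3 and 4 can be approximated offline, for instance to regenerate a fixture from a script. This is a minimal Python sketch, not the Row Limiter's own code: `slice_rows` is a hypothetical helper, and Python is an assumption about your test stack. It keeps the header, skips `offset` data rows, then keeps up to `limit` data rows.

```python
import csv
import io


def slice_rows(csv_text: str, limit: int, offset: int = 0) -> str:
    """Return the header plus up to `limit` data rows, after skipping `offset` data rows."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # newline="" lets the csv module handle line endings inside quoted cells.
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    header, data = rows[0], rows[1:]  # row 0 is always treated as the header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data[offset:offset + limit])
    return out.getvalue()


sample = "id,plan\n1,pro\n2,free\n3,pro\n4,free\n"
print(slice_rows(sample, limit=2, offset=1))
```

With `limit=2` and `offset=1` the sketch keeps the header and the second and third data rows; an offset past the end leaves only the header.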
Fixture-building recipes
How to set the two controls for common fixture needs. Counts are data rows; the header is added automatically and stays uncounted.
| Fixture goal | Row limit | Row offset | Result |
|---|---|---|---|
| Minimal smoke fixture | 20 | 0 | First 20 data rows — fast CI, tiny diff |
| Standard fixture | 50 | 0 | First 50 rows — broad enough for most parser tests |
| Second-region fixture | 50 | 1000 | Rows 1001–1050 — captures a different data shape further into the file |
| Edge-case carve-out | 10 | <position of the rows> | Set offset to land on a known cluster of tricky rows (after sorting/filtering them to a known position) |
What the tool does — and what to chain for the rest
The Row Limiter only selects a contiguous block of rows. Scrubbing, validating, and reshaping are separate sibling tools.
| Need | In this tool? | Use instead |
|---|---|---|
| Slice 20–50 rows | Yes | Row limit + offset |
| Remove PII columns | No | csv-column-remover after slicing |
| Random / representative sample | No | Sort or filter to position the rows, then slice |
| Validate fixture shape | No | csv-validator to confirm column counts |
| De-duplicate first | No | csv-deduplicator before slicing |
| Preserve header + columns | Yes | Header always kept; columns untouched |
Tier limits for fixture creation
Row Limiter is a Pro tool with a free allowance. Fixtures are small, so the free tier usually suffices. The free cap applies to OUTPUT rows. The limits below come from lib/tier-limits.ts.
| Tier | Max input file | Max output rows | Note |
|---|---|---|---|
| Free | 2 MB | 500 rows | Comfortably covers 20–500 row fixtures, provided the source export is under 2 MB |
| Pro | 100 MB | 100,000 rows | Needed only when the production export itself exceeds 2 MB |
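The tier table can be turned into a quick pre-check before you export. This is a hedged sketch: the numbers are copied from the table, but the `TIER_LIMITS` layout and the `tier_problems` helper are hypothetical (the actual contents of lib/tier-limits.ts are not shown here), and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
MB = 1024 * 1024  # assumption: the caps are measured in binary megabytes

# Values copied from the tier table; the dictionary layout is hypothetical.
TIER_LIMITS = {
    "free": {"max_input_bytes": 2 * MB, "max_output_rows": 500},
    "pro": {"max_input_bytes": 100 * MB, "max_output_rows": 100_000},
}


def tier_problems(tier: str, input_bytes: int, output_rows: int) -> list[str]:
    """Return the reasons a slice would be blocked on the given tier."""
    limits = TIER_LIMITS[tier]
    problems = []
    if input_bytes > limits["max_input_bytes"]:
        problems.append("input file exceeds the size cap")
    if output_rows > limits["max_output_rows"]:
        problems.append("output exceeds the row cap")
    return problems


print(tier_problems("free", input_bytes=3 * MB, output_rows=50))
```

A 3 MB export fails the free input cap even though a 50-row fixture is far below the output cap, which is the case the Pro row of the table describes.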
Cookbook
Fixture slices using the two real controls, plus the scrub step you'll usually pair with them. Data is illustrative.
20-row smoke fixture
Example: the default fixture is the header plus the first 20 data rows. It is small enough to read in a code review and real enough to exercise the parser.
Source: prod_users.csv (header + 4,000 rows)
Row limit: 20
Row offset: 0
Output (prod_users.rows-1-20.csv):
id,name,email,plan,created_at
1,Ada,ada@x.com,pro,2026-01-04
... (19 more)
Stats: Total rows in 4000 · Rows out 20 · Rows skipped 3980
Slice then scrub PII before committing
Example: The Row Limiter keeps every column. Drop the personal-data columns with Column Remover before the fixture goes into version control.
Step 1 (csv-row-limiter): Row limit 30, Row offset 0
Step 2 (csv-column-remover): drop name, email
Result committed:
id,plan,created_at
1,pro,2026-01-04
2,free,2026-01-05
...
No PII in the repo; column shape still realistic.
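If you want a guard in your own tooling that mirrors the scrub step, the column drop can be sketched in Python. This is an illustration only, not how csv-column-remover is implemented; `drop_columns` is a hypothetical helper and the column names come from the example above.

```python
import csv
import io


def drop_columns(csv_text: str, to_drop: set[str]) -> str:
    """Return the CSV with the named columns removed; column order is otherwise kept."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    keep = [name for name in reader.fieldnames if name not in to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row dict.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sliced = "id,name,email,plan,created_at\n1,Ada,ada@x.com,pro,2026-01-04\n"
print(drop_columns(sliced, {"name", "email"}))
```

A test that asserts the committed fixture's header contains none of the sensitive column names is a cheap way to catch a skipped scrub.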
Second fixture from a different region
Example: If the data shape varies along the file (e.g. older rows lack a column added later), use an offset to capture a fixture from deeper in the file.
Fixture A: Row limit 30, Row offset 0 → newest schema
Fixture B: Row limit 30, Row offset 3000 → older rows that
may have empty/legacy columns
Two committed fixtures cover both schema variants.
Reproducible regeneration
Example: Document the slice parameters so anyone can regenerate the exact fixture from a fresh export; the slice is deterministic.
# fixtures/README
source: prod_users export
slice: Row limit 30, Row offset 0
file: prod_users.rows-1-30.csv
Re-run on a new export with the same limit/offset to refresh the fixture deterministically (then re-scrub PII).
Validate the fixture shape after slicing
Example: Slicing can't tell you if a row is malformed. Run the fixture through the validator so a ragged row (wrong column count) is caught before it's committed.
Step 1 — csv-row-limiter: Row limit 50, Row offset 0
Step 2 — csv-validator: confirm consistent column count,
required columns present
Validator clean → commit. Validator flags → pick a cleaner
slice or fix the source.
Errors and edge cases
Real errors and silent failures you can hit when building fixtures. Find the entry that matches your symptom and apply the fix it describes.
Production export exceeds the file-size cap
Blocked at drop: A large production dump can exceed 2 MB (free) or 100 MB (Pro) and is blocked at drop before parsing. On the free tier, either export a smaller subset from the source system or upgrade to Pro for the 100 MB cap.
Fixture would exceed 500 rows on free
Pro required: Free caps the OUTPUT at 500 rows. Fixtures are normally well under that, but a Row limit above 500 is blocked after processing with a Pro prompt. Keep fixtures small; staying under 500 rows is also better for diff review and CI speed.
PII columns left in the committed fixture
Scrub required: The Row Limiter copies every column verbatim; it has no masking or column-removal feature. Names, emails, and other PII will end up in the repo unless you run the slice through csv-column-remover first. This is the single most common fixture mistake.
Headerless export consumed wrong
First row treated as header: Row 0 is always treated as the header. A headerless export loses its first record to the 'header' slot, so that record never appears as a data row in the fixture (with Row limit 20 and offset 0, the data rows are records 2–21). Add a header row before slicing if the source lacks one.
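Adding the missing header before slicing is a one-line transformation. A minimal Python sketch, assuming you already know the column names (`add_header` and the names used below are hypothetical):

```python
import csv
import io


def add_header(csv_text: str, column_names: list[str]) -> str:
    """Prepend a header line so the first real record is no longer consumed as the header."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(column_names)
    return out.getvalue() + csv_text


headerless = "1,Ada\n2,Bob\n"
print(add_header(headerless, ["id", "name"]))
```

The sketch does not try to detect whether a header is already present; that check is left to whoever knows the export format.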
Slice misses the edge-case rows you wanted
Contiguous only: The first N rows may all be clean, vanilla data, while the gnarly rows (empty cells, unicode, long values) live later in the file. The limiter can't cherry-pick them. Sort or filter the tricky rows to the top first, then slice; or use an offset to land on a known cluster.
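One way to position the tricky rows before slicing is a stable partition that moves them to the top. The sketch below is illustrative Python, not one of the sibling tools; `tricky_rows_first` and `has_empty_cell` are hypothetical names, and "tricky" is defined here only as "has an empty cell".

```python
import csv
import io
from typing import Callable


def tricky_rows_first(csv_text: str, is_tricky: Callable[[list[str]], bool]) -> str:
    """Move rows matching `is_tricky` to the top, keeping file order within each group."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    header, data = rows[0], rows[1:]
    # sorted() is stable: False (tricky) sorts before True (ordinary).
    ordered = sorted(data, key=lambda row: not is_tricky(row))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(ordered)
    return out.getvalue()


def has_empty_cell(row: list[str]) -> bool:
    return any(cell == "" for cell in row)


source = "id,name\n1,Ada\n2,\n3,Bob\n4,\n"
print(tricky_rows_first(source, has_empty_cell))
```

After this reordering, a slice with Row offset 0 and a small Row limit lands on the edge-case rows first.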
Blank rows pad the fixture
Counted as rows: With skipEmptyLines: false, blank lines count toward the slice and end up in the fixture. That can be intentional (testing your parser's empty-row handling) or noise. If unwanted, remove them first with csv-empty-row-remover.
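To see whether blank rows made it into a fixture, you can list their positions. This is a Python sketch using the standard csv module rather than PapaParse, so what counts as "blank" is an assumption: here a row is blank when it has no cells or only whitespace cells. `blank_row_numbers` is a hypothetical helper.

```python
import csv
import io


def blank_row_numbers(csv_text: str) -> list[int]:
    """Return 1-based data-row numbers (header excluded) of rows with no content."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    return [
        number
        for number, row in enumerate(rows[1:], start=1)
        if all(cell.strip() == "" for cell in row)  # also true for a row with no cells
    ]


fixture = "a,b\n1,2\n\n3,4\n"
print(blank_row_numbers(fixture))
```

An empty result means the fixture has no blank rows; a non-empty one tells you which data rows to inspect before deciding whether they are intentional.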
Quoted multi-line cell in the fixture
Preserved: A properly quoted cell containing a newline is one row, parsed and re-emitted intact. This is useful for fixtures that need to test multi-line field handling, because the limiter won't split the cell across rows.
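The difference between physical lines and CSV records is easy to check in a test. The following Python sketch uses the standard csv module (not PapaParse) to show that a quoted cell with an embedded newline parses as a single record; the sample data is made up for illustration.

```python
import csv
import io


def parse_records(csv_text: str) -> list[list[str]]:
    """Parse CSV text into records; newline='' keeps embedded newlines intact."""
    return list(csv.reader(io.StringIO(csv_text, newline="")))


text = 'id,note\n1,"line one\nline two"\n2,plain\n'
records = parse_records(text)
print(len(text.splitlines()), "physical lines,", len(records), "records")
```

A fixture loader that counts lines instead of records would miscount this file, which is exactly the behaviour such a fixture is meant to exercise.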
Offset overshoots the data
Empty fixture: A second-region fixture with an offset past the end of the file yields a header-only output (Rows out 0). Check Total rows in against the offset before committing an empty fixture.
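The expected stats for a given limit and offset are simple arithmetic, so an overshoot can be spotted before slicing. A hedged Python sketch: `slice_stats` is a hypothetical helper, and defining "rows skipped" as total minus rows out is an inference from the 20-row smoke example (4000 in, 20 out, 3980 skipped), not a documented formula.

```python
def slice_stats(total_rows: int, limit: int, offset: int) -> dict[str, int]:
    """Predict the stats for slicing `limit` data rows starting after `offset` data rows."""
    rows_out = max(0, min(limit, total_rows - offset))
    return {
        "rows_in": total_rows,
        "rows_out": rows_out,
        "rows_skipped": total_rows - rows_out,  # assumption: everything not emitted
    }


print(slice_stats(total_rows=4000, limit=20, offset=0))
print(slice_stats(total_rows=4000, limit=30, offset=5000))
```

When `rows_out` comes back as 0, the offset is past the end of the data and the fixture would contain only the header.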
Frequently asked questions
How many rows make a good QA fixture?
20–100 is typical — small enough to review in a diff and parse instantly in CI, large enough to exercise the column shapes. For broader coverage, take a second slice with an offset instead of one big fixture.
Does the fixture include a known edge-case row?
Only if it falls within your slice. The limiter takes a contiguous block, so if the edge cases are deeper in the file, sort or filter them to the top first, or add the row manually in an editor after extracting the base rows.
Is it safe to commit production data as a fixture?
Only after scrubbing PII. The Row Limiter copies every column as-is and never uploads your data, but it doesn't mask anything. Run the slice through csv-column-remover to drop sensitive columns before committing.
Is the fixture reproducible across regenerations?
Yes. The slice is deterministic — the same export with the same limit and offset always produces byte-identical rows. Document the parameters so anyone can refresh the fixture from a new export (then re-scrub).
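If you want to confirm that a regenerated fixture really is byte-identical to the committed one, comparing digests is enough. A small Python sketch; `fixture_digest` is a hypothetical helper and SHA-256 is simply a convenient choice.

```python
import hashlib


def fixture_digest(csv_bytes: bytes) -> str:
    """Return a hex SHA-256 digest of the fixture's raw bytes."""
    return hashlib.sha256(csv_bytes).hexdigest()


committed = b"id,plan\n1,pro\n"
regenerated = b"id,plan\n1,pro\n"
print(fixture_digest(committed) == fixture_digest(regenerated))
```

The digests only match when the export rows, the limit, the offset, and the scrub step are all unchanged, so a mismatch after refreshing from a new export is expected rather than an error.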
Does the fixture keep the header and column order?
Yes. The header is always row 0 and is kept first, uncounted by the limit. Columns and their order are untouched, so the fixture's schema matches production.
Can it remove or mask PII columns for me?
No — that's out of scope. Use csv-column-remover to drop columns, or an anonymizer/cleaner step, after slicing.
Can I take a representative random sample?
No. The output is a contiguous block in file order. For a varied sample, sort with csv-sorter or filter with csv-column-filter to arrange the rows, then slice.
Will it validate that my fixture rows are well-formed?
No. It only selects rows. Run the fixture through csv-validator to confirm consistent column counts and required columns before committing.
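A column-count check is also easy to keep in your own test suite as a backstop. This Python sketch illustrates the idea of a ragged-row check; it is not csv-validator's implementation, and `ragged_rows` is a hypothetical helper.

```python
import csv
import io


def ragged_rows(csv_text: str) -> list[tuple[int, int]]:
    """Return (data_row_number, cell_count) for rows whose width differs from the header's."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    expected = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=1)
        if len(row) != expected
    ]


fixture = "a,b\n1,2\n3\n4,5,6\n"
print(ragged_rows(fixture))
```

Note that a completely blank line parses as a row with zero cells in this sketch, so it is reported as ragged too.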
Is my production data uploaded during fixture creation?
No. Parsing and slicing happen entirely in your browser via PapaParse. Only an anonymous run counter is recorded for signed-in dashboard stats.
What does the fixture file get named?
<name>.rows-<start>-<end>.csv — e.g. a 30-row first slice of prod_users.csv downloads as prod_users.rows-1-30.csv. Rename it to your fixture convention after download.
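The naming pattern can be reproduced in a script that documents or checks fixture names. This Python sketch is an assumption-laden illustration: `fixture_name` is hypothetical, and computing start as offset + 1 and end as offset + limit is inferred from the 30-row example and the "Rows 1001–1050" recipe. It may not match the tool when the file has fewer rows than the limit.

```python
def fixture_name(source_name: str, limit: int, offset: int = 0) -> str:
    """Build <name>.rows-<start>-<end>.csv from the source file name and slice parameters."""
    stem = source_name[:-4] if source_name.lower().endswith(".csv") else source_name
    start = offset + 1        # rows are numbered from 1, excluding the header
    end = offset + limit      # assumption: the slice was not cut short by end of file
    return f"{stem}.rows-{start}-{end}.csv"


print(fixture_name("prod_users.csv", limit=30))
print(fixture_name("prod_users.csv", limit=50, offset=1000))
```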
What's the largest fixture I can build?
500 output rows on free, 100,000 on Pro. For fixtures you'll rarely need more than a few dozen — large fixtures slow CI and bloat diffs.
How do I cover two schema variants in fixtures?
Take two slices: one at offset 0 (newest schema) and one with an offset deep into the file (older rows, possibly with legacy/empty columns). Commit both and reference each in the relevant tests.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.