How to split a transaction CSV into processing batches
- Step 1: Export the transaction or settlement file as CSV. Download the period's transactions from your processor, gateway, or ledger: Stripe (Payments / Balance export), a bank statement CSV, or an accounting-system export. Keep it sorted as your pipeline expects (usually by timestamp).
- Step 2: Validate amounts and structure first (recommended). The splitter copies rows verbatim and does not check for ragged rows or malformed amounts. Run the csv-validator to catch row-width and encoding problems before splitting, so a bad row does not poison a downstream batch.
- Step 3: Drop the file onto the splitter. PapaParse reads it in your browser and auto-detects the delimiter (comma, or semicolon for some EU exports). The first 10 rows are previewed so you can confirm the header and ordering before splitting.
- Step 4: Set Rows per chunk to your batch size. Type the records-per-pass count into the single Rows per chunk field (default 1000, minimum 1). Match it to your pipeline's batch size or your downstream API's per-call record limit.
- Step 5: Click Split into chunks. The result panel reports total data rows, chunk count, and rows per chunk, then lists each chunk's rows X–Y range, which is useful for logging exactly which records went to which batch.
- Step 6: Download and feed each batch to your pipeline. Each chunk has its own Download button (no zip). Files are named settlement.part-N-of-M.csv, header included. Process them in order for sequential reconciliation, or distribute across workers for parallel runs.
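The split described in the steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's own code (the tool runs PapaParse in the browser): the function name `split_csv_text` and the `stem` parameter are hypothetical, and the whole file is assumed to fit in memory.

```python
import csv
import io


def split_csv_text(text, rows_per_chunk, stem="settlement"):
    """Split CSV text into consecutive chunks, repeating the header in each.

    Returns a dict mapping a part-N-of-M file name to that chunk's CSV text.
    Hypothetical helper for illustration only.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")

    # csv.reader is quote-aware, so a quoted newline stays inside one record.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return {}

    header, data = rows[0], rows[1:]
    total_chunks = -(-len(data) // rows_per_chunk)  # ceiling division

    chunks = {}
    for i in range(total_chunks):
        block = data[i * rows_per_chunk:(i + 1) * rows_per_chunk]
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)   # header on every chunk
        writer.writerows(block)   # rows copied in source order
        chunks[f"{stem}.part-{i + 1}-of-{total_chunks}.csv"] = out.getvalue()
    return chunks
```

A header-only input yields an empty dict here, which mirrors the zero-chunk behaviour described for header-only files.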
Split behaviour for transaction data
One option; the rest is fixed split logic. Confirm it matches your pipeline's batching contract before a production run.
| Behaviour | How it works | Why it matters for transactions |
|---|---|---|
| Rows per chunk (only option) | Number input, min 1, default 1000; consecutive blocks | Match your pipeline's per-pass or per-API-call record limit |
| Header in every batch | Source row 1 copied to each file's top | Guarantees amount/currency/id columns map correctly downstream |
| Order preserved | Records keep source order; no re-sort | Chronological export stays in time order — needed for running balances |
| Deterministic | Same input + chunk size → identical chunks | An audit re-run reproduces the exact same batches |
| Remainder in last file | Final chunk holds leftover records | 37,500 at 10,000/chunk → three full files plus a 7,500-row remainder |
| No edits / no dedup | Amounts and rows untouched | Reconciliation totals across all chunks equal the source total |
Batch-size maths for settlement files
Worked examples (data rows excluding the header). The row splitter is a Pro tool. Free caps at 500 rows / 2 MB; Pro allows up to 100,000 rows / 100 MB per file.
| Transactions | Rows per chunk | Batches | Last batch | Note |
|---|---|---|---|---|
| 100,000 | 50,000 | 2 | 50,000 | Even, at the Pro per-file ceiling |
| 100,000 | 10,000 | 10 | 10,000 | Even, ten parallel workers |
| 37,500 | 10,000 | 4 | 7,500 | Remainder in batch 4 |
| 100,000 | 25,000 | 4 | 25,000 | Even |
| 480 | 120 | 4 | 120 | Within Free row cap, but tool is Pro-gated |
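The arithmetic behind the table can be checked with a short helper. This is an illustrative sketch; `batch_plan` is a hypothetical name, not part of the tool.

```python
def batch_plan(total_rows, rows_per_chunk):
    """Return (batch_count, last_batch_size) for a given chunk size."""
    if total_rows < 0 or rows_per_chunk < 1:
        raise ValueError("total_rows must be >= 0 and rows_per_chunk >= 1")

    full, remainder = divmod(total_rows, rows_per_chunk)
    batches = full + (1 if remainder else 0)
    if remainder:
        last = remainder          # uneven split: leftovers in the final batch
    elif full:
        last = rows_per_chunk     # even split: final batch is a full one
    else:
        last = 0                  # no data rows at all
    return batches, last


for total, size in [(100_000, 50_000), (100_000, 10_000), (37_500, 10_000)]:
    print(total, size, batch_plan(total, size))
```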
Cookbook
Before/after examples from transaction-batching pipelines. Amounts and ids illustrative; the splitter never alters values.
100k settlement into 10 worker batches
Example: A reconciliation job runs ten parallel workers. Split the day's settlement into ten 10,000-record files, one per worker, each totalling independently.

    Input:          settlement-2026-06-09.csv (100,000 rows + header)
    Rows per chunk: 10000

    Result panel:
      Total rows: 100000
      Chunks:     10
      Rows/chunk: 10000
      Part 1   rows 1–10000
      …
      Part 10  rows 90001–100000

    Sum of amounts across all 10 batches == source total
    (the splitter never drops or edits a row).
Header on every batch keeps columns aligned
Example: If a batch lost its header, a pipeline reading by position would map amount to the date column. The splitter copies the header into each chunk.

    Source:
      txn_id,date,amount,currency
      T1,2026-06-09,12.40,USD
      T2,2026-06-09,99.00,USD
      T3,2026-06-09,4.10,USD

    Rows per chunk: 2 →

    part-1-of-2.csv                  part-2-of-2.csv
    txn_id,date,amount,currency      txn_id,date,amount,currency
    T1,2026-06-09,12.40,USD          T3,2026-06-09,4.10,USD
    T2,2026-06-09,99.00,USD
Chronological order preserved
Example: Running-balance reconciliation depends on time order. The splitter never re-sorts, so a timestamp-sorted export stays ordered across batches.

    Source sorted by timestamp ascending.
    Rows per chunk: 50000

    Part 1 = earliest 50,000 txns (rows 1–50000)
    Part 2 = next 50,000 (rows 50001–100000)
    → feed Part 1 then Part 2 to keep the running balance correct.
Uneven split — remainder batch
Example: When records do not divide evenly, the final batch holds the leftover transactions.

    Input:          txns.csv (37,500 rows)
    Rows per chunk: 10000
    Chunks:         4

    Part 1  rows 1–10000
    Part 2  rows 10001–20000
    Part 3  rows 20001–30000
    Part 4  rows 30001–37500 (7,500 records + header)
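For logging which records went to which batch, the row ranges can be computed ahead of time. The sketch below assumes 1-based, inclusive ranges as shown in the result panel; `chunk_ranges` is a hypothetical helper name.

```python
def chunk_ranges(total_rows, rows_per_chunk):
    """Return 1-based inclusive (first_row, last_row) pairs, one per chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    return [
        (start, min(start + rows_per_chunk - 1, total_rows))
        for start in range(1, total_rows + 1, rows_per_chunk)
    ]


for part, (first, last) in enumerate(chunk_ranges(37_500, 10_000), start=1):
    print(f"Part {part}: rows {first}-{last}")
```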
Validate before batching
Example: The splitter copies malformed rows. Validate first so a ragged row does not break a downstream batch parse.

    Raw export: 100,000 rows, 2 with an extra unquoted comma

    Step 1  csv-validator → flags 2 row_width errors
    Step 2  fix at source, re-export
    Step 3  csv-row-splitter, 50000/chunk → 2 clean batches

    Skip step 1 and a worker may reject its whole batch on the bad row.
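A row-width check of the kind described above can be approximated offline. This is a rough sketch, not the csv-validator's implementation; `find_ragged_rows` is a hypothetical name, and it only compares each record's field count to the header's.

```python
import csv
import io


def find_ragged_rows(text):
    """Return (record_number, field_count) for records whose width differs
    from the header. Record numbers are 1-based and exclude the header."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []

    expected = len(header)
    problems = []
    for record_number, row in enumerate(reader, start=1):
        if len(row) != expected:
            problems.append((record_number, len(row)))
    return problems


sample = (
    "txn_id,date,amount,currency\n"
    "T1,2026-06-09,12.40,USD\n"
    "T2,2026-06-09,99,00,USD\n"  # extra unquoted comma inside the amount
)
print(find_ragged_rows(sample))
```

Note that a completely blank line parses as a zero-field record, so this check would flag it as well.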
Errors and edge cases
Real errors and silent failures sourced from each platform's own documentation. Match the wording to the entry, then apply the fix that entry describes.
Amount totals must reconcile across chunks
By design: The splitter neither drops nor edits rows, so the sum of any numeric column across all chunks equals the source total exactly. If a downstream reconciliation comes up short, the cause is in the pipeline or a dropped chunk, not the split. Confirm you processed every part-N-of-M file.
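One way to confirm that no chunk was dropped is to compare column totals before and after the split. The sketch below assumes an `amount` column holding plain decimal strings (as in the examples on this page) and uses `Decimal` to avoid floating-point drift; the function names are hypothetical.

```python
import csv
import io
from decimal import Decimal


def column_total(csv_text, column="amount"):
    """Sum one numeric column of a CSV text exactly, using Decimal."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return sum((Decimal(row[column]) for row in reader), Decimal("0"))


def reconciles(source_text, chunk_texts, column="amount"):
    """True when the chunk totals add up to the source total."""
    chunk_sum = sum(
        (column_total(text, column) for text in chunk_texts), Decimal("0")
    )
    return column_total(source_text, column) == chunk_sum
```

If `reconciles` returns False, a missing or duplicated part file is the first thing to check.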
Record split mid-transaction
Cannot happen: The split is row-based and quote-aware, so a single transaction (one CSV record, even with quoted commas or newlines in the memo) is never cut across two chunks. Every record lands wholly in exactly one batch.
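The difference between counting physical lines and counting CSV records is easy to see with a quoted memo field. This sketch uses Python's `csv` module as a stand-in for a quote-aware parser; the sample data is made up.

```python
import csv
import io

# One transaction whose memo contains both a comma and a newline.
memo_sample = (
    "txn_id,memo,amount\n"
    'T1,"Refund, partial\nsee ticket 88",12.40\n'
    "T2,Coffee,4.10\n"
)


def count_physical_lines(text):
    """Naive count: every line break starts a new 'row'."""
    return len(text.splitlines())


def count_records(text):
    """Quote-aware count: a quoted newline stays inside its record."""
    return len(list(csv.reader(io.StringIO(text, newline=""))))


print(count_physical_lines(memo_sample), count_records(memo_sample))
```

A line-based split could cut T1 in half, while a record-based split keeps it whole.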
Order is not changed
Preserved: The splitter does not sort. Chunk 1 is the first N rows in source order. If you need chronological batches, sort the export by timestamp before splitting; sorting is a separate step via the csv-sorter.
Duplicate transactions copied verbatim
By design: If the export contains duplicate txn_id rows, they are copied into chunks unchanged and may land in different batches, risking double-posting. Dedup with the csv-deduplicator or surface them with the csv-duplicate-finder before splitting.
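A quick pre-split check for repeated ids might look like the following. It is an illustrative sketch rather than the csv-duplicate-finder's logic, and it assumes the id column is named `txn_id` as in the examples on this page.

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text, id_column="txn_id"):
    """Return {id: occurrences} for every id that appears more than once."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    counts = Counter(row[id_column] for row in reader)
    return {txn_id: n for txn_id, n in counts.items() if n > 1}


dup_sample = (
    "txn_id,date,amount,currency\n"
    "T1,2026-06-09,12.40,USD\n"
    "T2,2026-06-09,99.00,USD\n"
    "T1,2026-06-09,12.40,USD\n"
)
print(duplicate_ids(dup_sample))
```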
Blank rows counted, not skipped
Preserved: Empty lines are kept as rows, so a blank separator inflates the total and lands in a chunk as an empty record, which a strict ledger parser may reject. Strip them with the csv-empty-row-remover first.
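The effect of a blank line on the row count can be reproduced with a small sketch. The definition of "blank" used here (no fields, or only empty or whitespace fields) is an assumption and may differ from what the csv-empty-row-remover treats as empty; `strip_blank_rows` is a hypothetical name.

```python
import csv
import io


def parse_rows(text):
    """Parse CSV text into a list of records (blank lines become [])."""
    return list(csv.reader(io.StringIO(text, newline="")))


def strip_blank_rows(rows):
    """Drop records with no fields or only empty/whitespace fields."""
    return [row for row in rows if any(cell.strip() for cell in row)]


blank_sample = "txn_id,amount\nT1,12.40\n\nT2,4.10\n"
rows = parse_rows(blank_sample)
print(len(rows) - 1, len(strip_blank_rows(rows)) - 1)  # data rows before/after
```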
Header-only file
Expected: A file with only the header and no transactions produces zero chunks (total rows 0). If you expected batches, confirm the export period actually contained transactions.
Settlement over the 100,000-row Pro ceiling
Plan limit: Pro processes up to 100,000 rows / 100 MB per file. A 500,000-row monthly settlement exceeds it, so split the source into sub-100k files at export time first, then batch each for processing.
Re-run produces identical chunks
Expected: The split is deterministic: the same input file and chunk size always yield the same chunks with the same row ranges. An auditor re-running the split gets byte-identical batches, which is what reproducibility requires.
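Byte-identical output can be verified by recording a digest per chunk at split time and comparing it on the audit re-run. The following is a sketch of that idea; `chunk_digests` is a hypothetical helper, and chunks are assumed to be held as UTF-8 text keyed by file name.

```python
import hashlib


def chunk_digests(chunks):
    """Map each chunk file name to the SHA-256 hex digest of its UTF-8 bytes."""
    return {
        name: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for name, body in sorted(chunks.items())
    }


first_run = {"txns.part-1-of-1.csv": "txn_id,amount\nT1,12.40\n"}
audit_run = {"txns.part-1-of-1.csv": "txn_id,amount\nT1,12.40\n"}
print(chunk_digests(first_run) == chunk_digests(audit_run))
```

Storing the digests alongside the batch log gives the auditor something concrete to compare against.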
Output without BOM
By design: Chunks are plain UTF-8 with no byte-order mark. Pipeline parsers handle this fine; only a human opening a batch in Excel-on-Windows might see accented merchant names render oddly, which is a display quirk, not a data error.
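If someone needs to review a batch in Excel, one option is to make a separate viewing copy with a BOM prepended and leave the pipeline's copy untouched. This is a sketch of that workaround, not a feature of the splitter; `with_bom` is a hypothetical name.

```python
import codecs


def with_bom(chunk_text):
    """Encode chunk text as UTF-8 with a leading byte-order mark."""
    return chunk_text.encode("utf-8-sig")


viewing_copy = with_bom("merchant,amount\nCafé Zoë,4.10\n")
print(viewing_copy.startswith(codecs.BOM_UTF8))
```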
Frequently asked questions
Do the chunk totals add up to the source total?
Yes. The splitter never drops or edits a row, so summing any amount column across all chunks equals the source total exactly. Any shortfall points to the pipeline, not the split.
Can a single transaction be split across two batches?
No. The split is row-based and quote-aware. Each CSV record — even with commas or newlines inside quoted fields — lands wholly in exactly one chunk.
Does it keep transactions in chronological order?
It preserves source order. If your export is sorted by timestamp, the batches stay chronological. The splitter does not sort, so sort first via the csv-sorter if needed.
Can I set the number of batches instead of rows per batch?
No. The only option is Rows per chunk. Batch count is total data rows ÷ chunk size, rounded up. To target N batches, set the chunk size to total rows ÷ N, rounded up; when the rows do not divide evenly, the resulting count can be lower than N.
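The chunk size for a target batch count can be worked out as below. This is an illustrative sketch; `rows_per_chunk_for` is a hypothetical name, and it returns the batch count you would actually get so that a mismatch is visible.

```python
def rows_per_chunk_for(total_rows, target_batches):
    """Return (rows_per_chunk, actual_batches) for a desired batch count."""
    if total_rows < 1 or target_batches < 1:
        raise ValueError("total_rows and target_batches must be at least 1")

    size = -(-total_rows // target_batches)   # ceiling division
    actual = -(-total_rows // size)           # batches that size produces
    return size, actual


print(rows_per_chunk_for(100_000, 10))
print(rows_per_chunk_for(37_500, 4))
print(rows_per_chunk_for(10, 6))   # cannot hit 6 exactly with equal-size chunks
```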
Does each batch include the header?
Yes — the source header is copied to the top of every chunk, so amount, currency, and id columns map correctly in every downstream parse.
Is the split reproducible for audits?
Yes. The same file and chunk size always produce identical chunks with the same row ranges, so an audit re-run reproduces the exact batches.
Does the splitter remove duplicate transactions?
No. Duplicates are copied verbatim and may land in different batches. Dedup with the csv-deduplicator before splitting to avoid double-posting.
How are the batch files named?
Each chunk is the source filename plus a part suffix, like settlement.part-1-of-10.csv, sorting in processing order.
Can I download all batches at once?
No. Each chunk has its own Download button and saves individually — there is no zip bundle.
Is transaction data sent to a server?
No. Parsing and splitting run in your browser; amounts, account references, and card fragments never leave the page.
What is the largest file I can split?
The row splitter is a Pro tool: up to 100,000 rows / 100 MB per file. Larger settlement files should be pre-split at export.
What happens to blank lines in the export?
They are kept as rows, not skipped, and count toward the total. Strip them with the csv-empty-row-remover before splitting if your ledger parser rejects empty records.
Privacy first
Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.