Split a Transaction CSV Into Processing Batches

How to split a transaction CSV into processing batches

Step 1
Export the transaction or settlement file as CSV — Download the period's transactions from your processor, gateway, or ledger — Stripe (Payments / Balance export), bank statement CSV, or an accounting-system export. Keep it sorted as your pipeline expects (usually by timestamp).
Step 2
Validate amounts and structure first (recommended) — The splitter copies rows verbatim — it does not check for ragged rows or malformed amounts. Run the csv-validator to catch row-width and encoding problems before splitting, so a bad row does not poison a downstream batch.
Step 3
Drop the file onto the splitter — PapaParse reads it in your browser and auto-detects the delimiter (comma, or semicolon for some EU exports). A preview of the first 10 rows lets you confirm the header and ordering before splitting.
Step 4
Set Rows per chunk to your batch size — Type the records-per-pass count into the single Rows per chunk field (default 1000, minimum 1). Match it to your pipeline's batch size or your downstream API's per-call record limit.
Step 5
Click Split into chunks — The result panel reports total data rows, chunk count, and rows per chunk, then lists each chunk's rows X–Y range — useful for logging exactly which records went to which batch.
Step 6
Download and feed each batch to your pipeline — Each chunk has its own Download button (no zip). Files are settlement.part-N-of-M.csv, header included. Process them in order for sequential reconciliation, or distribute across workers for parallel runs.
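
The steps above amount to a row-based split that repeats the header in every file. The sketch below illustrates that logic in Python with the standard csv module. It is not the splitter's own code (the tool runs PapaParse in the browser), and the names split_csv and stem are hypothetical.

```python
import csv
import io
from typing import Dict


def split_csv(text: str, rows_per_chunk: int, stem: str = "settlement") -> Dict[str, str]:
    """Split CSV text into chunks of at most rows_per_chunk data rows.

    Returns {filename: csv_text}. Every chunk starts with the source header,
    and rows keep their source order. Illustrative only.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    total = -(-len(data) // rows_per_chunk)  # ceiling division
    chunks: Dict[str, str] = {}
    for i in range(total):
        block = data[i * rows_per_chunk:(i + 1) * rows_per_chunk]
        out = io.StringIO(newline="")
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)   # header repeated in every chunk
        writer.writerows(block)   # consecutive rows, no re-sort
        chunks[f"{stem}.part-{i + 1}-of-{total}.csv"] = out.getvalue()
    return chunks
```

A header-only input yields an empty dict here, mirroring the zero-chunk behaviour described under "Header-only file".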

Split behaviour for transaction data

One option; the rest is fixed split logic. Confirm it matches your pipeline's batching contract before a production run.

Behaviour	How it works	Why it matters for transactions
Rows per chunk (only option)	Number input, min 1, default 1000; consecutive blocks	Match your pipeline's per-pass or per-API-call record limit
Header in every batch	Source row 1 copied to each file's top	Guarantees amount/currency/id columns map correctly downstream
Order preserved	Records keep source order; no re-sort	Chronological export stays in time order — needed for running balances
Deterministic	Same input + chunk size → identical chunks	An audit re-run reproduces the exact same batches
Remainder in last file	Final chunk holds leftover records	37,500 at 10,000/chunk → three full files plus one of 7,500
No edits / no dedup	Amounts and rows untouched	Reconciliation totals across all chunks equal the source total

Batch-size maths for settlement files

Worked examples (data rows excluding the header). The row splitter is a Pro tool: Free caps at 500 rows / 2 MB, and Pro processes up to 100,000 rows / 100 MB per file.

Transactions	Rows per chunk	Batches	Last batch	Note
100,000	50,000	2	50,000	Even, at the Pro per-file ceiling
100,000	10,000	10	10,000	Even, ten parallel workers
37,500	10,000	4	7,500	Remainder in batch 4
100,000	25,000	4	25,000	Even
480	120	4	120	Within Free row cap, but tool is Pro-gated
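
The figures in the table follow from ceiling division. As a small sketch (batch_plan is a hypothetical helper name, not part of the tool), the batch count and last-batch size can be computed like this:

```python
from typing import Tuple


def batch_plan(data_rows: int, rows_per_chunk: int) -> Tuple[int, int]:
    """Return (batch_count, last_batch_size) for a row-based split."""
    if data_rows < 0 or rows_per_chunk < 1:
        raise ValueError("data_rows must be >= 0 and rows_per_chunk >= 1")
    if data_rows == 0:
        return 0, 0
    batches = -(-data_rows // rows_per_chunk)          # ceiling division
    last = data_rows - (batches - 1) * rows_per_chunk  # leftover in final batch
    return batches, last


for rows, size in [(100_000, 50_000), (100_000, 10_000), (37_500, 10_000), (480, 120)]:
    print(rows, size, batch_plan(rows, size))
```

Integer arithmetic is used instead of math.ceil on a float so that large row counts cannot be affected by floating-point rounding.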

Cookbook

Before/after examples from transaction-batching pipelines. Amounts and IDs are illustrative; the splitter never alters values.

100k settlement into 10 worker batches

Example

A reconciliation job runs ten parallel workers. Split the day's settlement into ten 10,000-record files, one per worker, so each worker can total its batch independently.

Input: settlement-2026-06-09.csv (100,000 rows + header)
Rows per chunk: 10000

Result panel:
  Total rows: 100000   Chunks: 10   Rows/chunk: 10000
  Part 1  rows 1–10000  …  Part 10  rows 90001–100000

Sum of amounts across all 10 batches == source total
(the splitter never drops or edits a row).

Header on every batch keeps columns aligned

Example

If a batch lost its header, a pipeline reading by column name would treat the first transaction as the header, mis-mapping the columns and silently dropping that record. The splitter copies the header into each chunk.

Source:
txn_id,date,amount,currency
T1,2026-06-09,12.40,USD
T2,2026-06-09,99.00,USD
T3,2026-06-09,4.10,USD

Rows per chunk: 2  →

part-1-of-2.csv               part-2-of-2.csv
txn_id,date,amount,currency   txn_id,date,amount,currency
T1,2026-06-09,12.40,USD       T3,2026-06-09,4.10,USD
T2,2026-06-09,99.00,USD

Chronological order preserved

Example

Running-balance reconciliation depends on time order. The splitter never re-sorts, so a timestamp-sorted export stays ordered across batches.

Source sorted by timestamp ascending.
Rows per chunk: 50000

Part 1 = earliest 50,000 txns (rows 1–50000)
Part 2 = next 50,000        (rows 50001–100000)
→ feed Part 1 then Part 2 to keep the running balance correct.
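
Because the splitter does not sort, it can be worth confirming that the export really is in time order before relying on it. The check below is a sketch under one stated assumption: timestamps are ISO 8601 strings in a single consistent format and timezone, so plain string comparison matches time order. The helper name first_out_of_order is hypothetical.

```python
from typing import List, Optional


def first_out_of_order(timestamps: List[str]) -> Optional[int]:
    """Return the 0-based index of the first timestamp earlier than its
    predecessor, or None if the sequence is non-decreasing.

    Assumes uniformly formatted ISO 8601 strings (same timezone, same
    precision), which compare correctly as plain strings.
    """
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            return i
    return None
```

Running it over the timestamp column of the source, or over the chunks concatenated in part order, gives the same answer, since the split keeps source order.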

Uneven split — remainder batch

Example

When records do not divide evenly, the final batch holds the leftover transactions.

Input: txns.csv (37,500 rows)
Rows per chunk: 10000

Chunks: 4
  Part 1  rows 1–10000
  Part 2  rows 10001–20000
  Part 3  rows 20001–30000
  Part 4  rows 30001–37500   (7,500 records + header)
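
The row ranges shown in the result panel can be reproduced with a short calculation, which is handy when logging which records went to which batch. This is an illustrative sketch; chunk_ranges is a hypothetical name.

```python
from typing import List, Tuple


def chunk_ranges(data_rows: int, rows_per_chunk: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first_row, last_row) for each chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    return [
        (start, min(start + rows_per_chunk - 1, data_rows))
        for start in range(1, data_rows + 1, rows_per_chunk)
    ]


for part, (first, last) in enumerate(chunk_ranges(37_500, 10_000), start=1):
    print(f"Part {part}  rows {first}-{last}")
```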

Validate before batching

Example

The splitter copies malformed rows. Validate first so a ragged row does not break a downstream batch parse.

Raw export: 100,000 rows, 2 with an extra unquoted comma

Step 1  csv-validator → flags 2 row_width errors
Step 2  fix at source, re-export
Step 3  csv-row-splitter, 50000/chunk → 2 clean batches

Skip step 1 and a worker may reject its whole batch on the
bad row.
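
For pipelines that want the same guard in code, a row-width check can be sketched with Python's csv module. This is not the csv-validator's implementation, only an approximation of the row_width idea; ragged_rows is a hypothetical name.

```python
import csv
import io
from typing import List, Tuple


def ragged_rows(text: str) -> List[Tuple[int, int]]:
    """Return (data_row_number, field_count) for every record whose field
    count differs from the header's. Row numbers are 1-based and exclude
    the header. Blank lines parse as zero fields and are reported too.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []
    width = len(header)
    return [
        (number, len(row))
        for number, row in enumerate(reader, start=1)
        if len(row) != width
    ]
```

An amount exported as 1,099.00 without quotes shows up as a five-field row against a four-field header, which is the "extra unquoted comma" case above.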

Errors and edge cases

Real errors and silent failures you may hit when batching transaction files. Find the entry that matches what you see, then apply the fix it describes.

Amount totals must reconcile across chunks

By design

The splitter neither drops nor edits rows, so the sum of any numeric column across all chunks equals the source total exactly. If a downstream reconciliation comes up short, the cause is in the pipeline or a dropped chunk — not the split. Confirm you processed every part-N-of-M file.
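
A reconciliation check along these lines can confirm that every part was processed. The sketch assumes an amount column holding plain decimal strings, and uses Decimal rather than float so the totals compare exactly. The function names are hypothetical.

```python
import csv
import io
from decimal import Decimal
from typing import Iterable


def column_total(text: str, column: str = "amount") -> Decimal:
    """Sum one numeric column of a CSV (header row required)."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return sum((Decimal(row[column]) for row in reader), Decimal("0"))


def reconciles(source: str, chunks: Iterable[str], column: str = "amount") -> bool:
    """True when the chunk totals add up to the source total."""
    chunk_sum = sum((column_total(c, column) for c in chunks), Decimal("0"))
    return column_total(source, column) == chunk_sum
```

If reconciles returns False, the first thing to check is whether a part-N-of-M file is missing from the set passed in.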

Record split mid-transaction

Cannot happen

The split is row-based and quote-aware, so a single transaction (one CSV record, even with quoted commas or newlines in the memo) is never cut across two chunks. Every record lands wholly in exactly one batch.
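
The difference between physical lines and CSV records is easy to see in a small example. The sketch below uses Python's csv module (not PapaParse) to show that a quoted memo containing a comma and a newline is still one record; the sample data and the parse_records name are illustrative.

```python
import csv
import io
from typing import List

# One header plus two transactions; T1's memo spans two physical lines.
SAMPLE = 'txn_id,memo,amount\nT1,"Refund, partial\nsee ticket",12.40\nT2,plain,4.10\n'


def parse_records(text: str) -> List[List[str]]:
    """Parse CSV text into records, honouring quoted commas and newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


records = parse_records(SAMPLE)
print(len(SAMPLE.splitlines()), "physical lines")  # counts raw line breaks
print(len(records), "records including header")    # counts CSV records
```

A splitter that counted physical lines could cut T1 in half; counting parsed records cannot.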

Order is not changed

Preserved

The splitter does not sort. Chunk 1 is the first N rows in source order. If you need chronological batches, sort the export by timestamp before splitting — sorting is a separate step via the csv-sorter.

Duplicate transactions copied verbatim

By design

If the export contains duplicate txn_id rows, they are copied into chunks unchanged and may land in different batches, risking double-posting. Dedup with the csv-deduplicator or surface them with the csv-duplicate-finder before splitting.
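
To surface duplicates in a scripted pipeline before splitting, a count per id is usually enough. The sketch assumes the id column is named txn_id as in the examples here; duplicate_ids is a hypothetical helper, not the csv-duplicate-finder itself.

```python
import csv
import io
from collections import Counter
from typing import Dict


def duplicate_ids(text: str, key: str = "txn_id") -> Dict[str, int]:
    """Return {id: occurrences} for every id that appears more than once."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    counts = Counter(row[key] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}
```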

Blank rows counted, not skipped

Preserved

Empty lines are kept as rows, so a blank separator inflates the total and lands in a chunk as an empty record — which a strict ledger parser may reject. Strip them with the csv-empty-row-remover first.
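
Where a scripted clean-up is preferred, blank records can be dropped by parsing and re-writing the file. The sketch below works at record level, so a blank line inside a quoted field is left alone. It is not the csv-empty-row-remover's code, and re-writing through csv.writer may normalise quoting and line endings. The function name is hypothetical.

```python
import csv
import io


def drop_blank_records(text: str) -> str:
    """Remove records that contain no fields at all (empty lines).

    Rows such as ",,," still have fields and are kept; only truly
    empty lines are removed.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # csv.reader yields [] for an empty line
            writer.writerow(row)
    return out.getvalue()
```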

Header-only file

Expected

A file with only the header and no transactions produces zero chunks (total rows 0). If you expected batches, confirm the export period actually contained transactions.

Settlement over the 100,000-row Pro ceiling

Plan limit

Pro processes up to 100,000 rows / 100 MB per file. A 500,000-row monthly settlement exceeds it — split the source into sub-100k files at export time first, then batch each for processing.

Re-run produces identical chunks

Expected

The split is deterministic: the same input file and chunk size always yield the same chunks with the same row ranges. An auditor re-running the split gets byte-identical batches, which is what reproducibility requires.
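
One way to evidence reproducibility in an audit trail is to record a digest of each chunk and compare digests between runs. This is a sketch of that idea using SHA-256 from the standard library; chunk_digests is a hypothetical helper and the tool does not produce these hashes itself.

```python
import hashlib
from typing import Dict


def chunk_digests(chunks: Dict[str, bytes]) -> Dict[str, str]:
    """Map each chunk filename to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(body).hexdigest() for name, body in chunks.items()}
```

Two runs over the same source and chunk size should give equal dictionaries; any differing entry names the batch to investigate.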

Output without BOM

By design

Chunks are plain UTF-8 with no byte-order mark. Pipeline parsers handle this fine; only a human opening a batch in Excel-on-Windows might see accented merchant names render oddly — a display quirk, not a data error.
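
If a batch does need to be opened in Excel on Windows, a copy with a UTF-8 byte-order mark prepended usually displays accented names correctly. The sketch below is a workaround applied to a copy, not a feature of the splitter; with_bom is a hypothetical name, and the BOM-prefixed copy should not replace the file fed to the pipeline.

```python
import codecs


def with_bom(chunk_bytes: bytes) -> bytes:
    """Return the chunk with a UTF-8 BOM prepended (no-op if already present)."""
    if chunk_bytes.startswith(codecs.BOM_UTF8):
        return chunk_bytes
    return codecs.BOM_UTF8 + chunk_bytes
```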

Frequently asked questions

Do the chunk totals add up to the source total?

Yes. The splitter never drops or edits a row, so summing any amount column across all chunks equals the source total exactly. Any shortfall points to the pipeline, not the split.

Can a single transaction be split across two batches?

No. The split is row-based and quote-aware. Each CSV record — even with commas or newlines inside quoted fields — lands wholly in exactly one chunk.

Does it keep transactions in chronological order?

It preserves source order. If your export is sorted by timestamp, the batches stay chronological. The splitter does not sort, so sort first via the csv-sorter if needed.

Can I set the number of batches instead of rows per batch?

No. The only option is Rows per chunk. Batch count is total data rows ÷ chunk size, rounded up. To get N batches, set the chunk size to total rows ÷ N, rounded up; rounding down leaves a small extra remainder batch.
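
The chunk size for a target batch count needs rounding up whenever the division is not exact. A small sketch of the arithmetic (rows_per_chunk_for is a hypothetical helper name):

```python
def rows_per_chunk_for(total_rows: int, batches: int) -> int:
    """Smallest chunk size that yields at most `batches` chunks."""
    if total_rows < 1 or batches < 1:
        raise ValueError("total_rows and batches must be at least 1")
    return -(-total_rows // batches)  # ceiling division


def batch_count(total_rows: int, rows_per_chunk: int) -> int:
    """Number of chunks a row-based split produces."""
    return -(-total_rows // rows_per_chunk)


size = rows_per_chunk_for(37_501, 4)
print(size, batch_count(37_501, size))                   # rounded up
print(37_501 // 4, batch_count(37_501, 37_501 // 4))     # rounded down
```

For very small files the rounded-up size can give fewer than N batches (10 rows for 7 workers gives chunks of 2 and only 5 batches), so it is worth checking the count the result panel reports.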

Does each batch include the header?

Yes — the source header is copied to the top of every chunk, so amount, currency, and id columns map correctly in every downstream parse.

Is the split reproducible for audits?

Yes. The same file and chunk size always produce identical chunks with the same row ranges, so an audit re-run reproduces the exact batches.

Does the splitter remove duplicate transactions?

No. Duplicates are copied verbatim and may land in different batches. Dedup with the csv-deduplicator before splitting to avoid double-posting.

How are the batch files named?

Each chunk is the source filename plus a part suffix, like settlement.part-1-of-10.csv, sorting in processing order.

Can I download all batches at once?

No. Each chunk has its own Download button and saves individually — there is no zip bundle.

Is transaction data sent to a server?

No. Parsing and splitting run in your browser; amounts, account references, and card fragments never leave the page.

What is the largest file I can split?

The row splitter is a Pro tool: up to 100,000 rows / 100 MB per file. Larger settlement files should be pre-split at export.

What happens to blank lines in the export?

They are kept as rows, not skipped, and count toward the total. Strip them with the csv-empty-row-remover before splitting if your ledger parser rejects empty records.

Privacy first

Processing runs locally in your browser with PapaParse. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.
