How to use the Batch Extractor for DevOps & incident response
- Step 1: Gather every bundle from the incident. Pull the per-node log ZIPs, the profiling tarballs, and any vendor support bundles into one folder. Mixed formats are fine: ZIP, 7z, RAR, TAR, and GZ can be dropped together in a single batch.
- Step 2: Open the Batch Extractor and drop them all. Drag everything onto batch-extraction-manager. It detects each format by magic bytes and runs in-browser, so nothing about the incident leaves the responder's machine.
- Step 3: Account for the one-level rule on tarballs. A .tar.gz (e.g. a profiler or kernel dump) comes out as a single .tar because GZIP wraps one stream. Plan to run those through nested-archive-extractor as a second pass before grepping.
- Step 4: Handle encrypted vendor bundles separately. If a vendor ships an AES-protected ZIP, this tool will reject it (no password field). Decrypt it with your local CLI first, then add the cleartext bundle to the batch.
- Step 5: Download the consolidated ZIP and load it into your viewer. You get batch-extracted-N-archives.zip with each source under its stem. Unpack it locally and point ripgrep / lnav / Splunk at the tree. The result panel's Total entries is a quick sanity check that nothing silently dropped.
- Step 6: Build the cross-node timeline. With every node's logs under a distinct prefix, a single rg -n 'request-id-abc' node-*/ walks all hosts at once. If a node's folder is missing entries, check for a stem collision (two bundles named the same).
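Where ripgrep is not installed on the responder machine, the cross-node search from the last step can be approximated in Python. This is a minimal sketch, assuming the consolidated ZIP has already been unpacked into a folder with one sub-folder per bundle stem. The function name `search_nodes` and the shape of the returned tuples are illustrative and not part of any JAD tool.

```python
from pathlib import Path


def search_nodes(root, needle):
    """Return (node, relative_path, line_number, line) for every matching line.

    Assumes `root` holds one sub-folder per bundle stem (e.g. node-1-logs/).
    """
    hits = []
    for node_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(f for f in node_dir.rglob("*") if f.is_file()):
            # errors="replace" keeps binary or oddly encoded files from aborting the scan
            with path.open("r", encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        rel = path.relative_to(node_dir).as_posix()
                        hits.append((node_dir.name, rel, number, line.rstrip("\n")))
    return hits
```

Because the node name is the first element of each hit, sorting or grouping by host is a one-liner, which mirrors what the distinct stem prefixes give you in the rg output.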
Incident artifacts and how the Batch Extractor handles them
Typical bundle types that show up in an incident and what to expect from the one-level, browser-side extractor.
| Artifact | Format | Result in this tool | Follow-up needed? |
|---|---|---|---|
| Per-node log bundle | ZIP | Fully unpacked under node-N/ | None |
| Rotated log | GZIP (app.log.gz) | Single decompressed app.log | None |
| Profiler / kernel dump | tar.gz | Emitted as one .tar | Yes — run nested-archive-extractor |
| Vendor support bundle | 7z | Fully unpacked (libarchive WASM) | None |
| Legacy bundle | RAR | Fully unpacked (libarchive WASM) | None |
| Encrypted bundle | AES ZIP | Rejected — no password field | Yes — decrypt with CLI first |
Tier sizing for incident batches
Forensic batches get big fast. The per-archive entry limit matters as much as size when bundles contain thousands of rotated files.
| Tier | Max file size | Max entries / archive | Max bundles / batch |
|---|---|---|---|
| Free | 50 MB | 500 | 1 (cannot batch) |
| Pro | 500 MB | 50,000 | 20 |
| Pro-media | 2 GB | 500,000 | 100 |
| Developer | 2 GB | 500,000 | unlimited |
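A pre-flight check against these limits can save a failed drop mid-incident. The sketch below copies the numbers from the tier table into a dictionary and reports which bundles would be refused. It is an assumption that "MB" and "GB" are decimal units; the tool may count in binary units instead. The names `TIERS` and `check_batch` are hypothetical helpers, not an API of the Batch Extractor.

```python
MB = 1000 * 1000  # assumption: decimal megabytes; the tool may count in MiB
GB = 1000 * MB

# Limits copied from the tier table; None means "unlimited".
TIERS = {
    "Free": {"max_bytes": 50 * MB, "max_entries": 500, "max_bundles": 1},
    "Pro": {"max_bytes": 500 * MB, "max_entries": 50_000, "max_bundles": 20},
    "Pro-media": {"max_bytes": 2 * GB, "max_entries": 500_000, "max_bundles": 100},
    "Developer": {"max_bytes": 2 * GB, "max_entries": 500_000, "max_bundles": None},
}


def check_batch(tier, bundles):
    """bundles: list of (name, size_in_bytes, entry_count). Returns a list of problems."""
    limits = TIERS[tier]
    problems = []
    if limits["max_bundles"] is not None and len(bundles) > limits["max_bundles"]:
        problems.append(
            f"{len(bundles)} bundles exceeds the {tier} batch limit of {limits['max_bundles']}"
        )
    for name, size, entries in bundles:
        if size > limits["max_bytes"]:
            problems.append(f"{name}: size over the {tier} cap")
        if entries > limits["max_entries"]:
            problems.append(f"{name}: entry count over the {tier} cap")
    return problems
```

An empty list means the batch fits the tier on paper; it says nothing about whether the uncompressed bytes fit in browser memory.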
Cookbook
Triage recipes drawn from real incident shapes. Hostnames and IDs are illustrative.
Five node log bundles into one grep-able tree
The core triage move: every node's logs unpacked under its own prefix so a cross-node search is unambiguous.
Drop: node-1-logs.zip ... node-5-logs.zip
Download batch-extracted-5-archives.zip:
    node-1-logs/app.log
    node-1-logs/gc.log
    node-2-logs/app.log
    ...
Then: rg -n 'OOMKilled' node-*-logs/
Mixed vendor bundle batch
A ZIP from your stack plus a 7z support bundle from a vendor, read together with no CLI installed on the responder laptop.
Drop:
    our-stack-logs.zip
    vendor-support-bundle.7z (read via libarchive WASM)
Download batch-extracted-2-archives.zip:
    our-stack-logs/...
    vendor-support-bundle/diagnostics.txt, metrics.json
Profiler tarball needs a second pass
A pprof or heap dump shipped as tar.gz arrives packed. Chain the output through the nested extractor before analysis.
Drop: pprof-capture.tar.gz
Download batch-extracted-1-archives.zip:
    pprof-capture/pprof-capture.tar (still packed)
Next: /archive-tools/nested-archive-extractor -> profile001.pb.gz ...
Encrypted bundle blocks the batch
An AES-protected ZIP halts extraction because there's no password field. Decrypt locally, then batch the cleartext.
Drop: incident-evidence.zip (AES entries)
    -> Error: "Archive contains encrypted entries ... Provide a password to extract."
Fix (local CLI):
    7z x incident-evidence.zip -pPASS -oclear/
    then re-zip clear/ and add it to the batch
Rotated .gz logs flattened for one grep
A handful of rotated gzip logs each decompress to a single file, all consolidated so the timeline is one search away.
Drop: app.log.gz, app.log.1.gz, app.log.2.gz
Download batch-extracted-3-archives.zip:
    app.log/app.log
    app.log.1/app.log.1
    app.log.2/app.log.2
Then: rg -n 'panic' app.log*/
Edge cases and what actually happens
Encrypted vendor support bundle
Rejected: AES / ZipCrypto entries are rejected because the Batch Extractor has no password input. Decrypt with your local CLI first, then batch the cleartext. To verify a candidate password without extracting, use archive-password-tester.
tar.gz profiler / kernel dump
Not recursed: emitted as a single .tar (gz wraps one stream, one level only). Run the output through nested-archive-extractor before analysis. Budget the extra step into your runbook.
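The one-level behaviour is easy to reproduce with Python's standard library, which may help when explaining the extra step in a runbook. In this sketch `first_pass` strips only the gzip layer, as a one-level extractor would, and `second_pass` stands in for what a nested extractor does next. All three function names are illustrative, and `make_tar_gz` exists only to build a fixture in memory.

```python
import gzip
import io
import tarfile


def make_tar_gz(files):
    """Build a small .tar.gz in memory from {name: bytes} (fixture only)."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return gzip.compress(raw.getvalue())


def first_pass(blob):
    """One level only: strip the gzip layer and return the inner bytes."""
    return gzip.decompress(blob)


def second_pass(tar_bytes):
    """Unpack the inner tar into {member_name: bytes}."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out
```

The output of `first_pass` is a complete tar archive, not the profiles themselves, which is why grepping it directly is rarely useful.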
Two nodes' bundles share a filename
Overwrite: If two bundles are both named logs.zip, they share the logs/ prefix and the second overwrites the first, so you'd silently lose a node's data mid-incident. Rename to node-1-logs.zip, node-2-logs.zip before batching.
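Collisions are cheap to catch before the drop. The following sketch groups bundle filenames by stem and returns any stem claimed more than once. The stem rule here is an assumption inferred from the cookbook examples (app.log.gz lands under app.log/, pprof-capture.tar.gz under pprof-capture/); the tool's real rule may differ, and `stem_of` and `find_collisions` are hypothetical names.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumption: compound tarball suffixes are stripped as a unit, matching the
# cookbook example where pprof-capture.tar.gz lands under pprof-capture/.
COMPOUND_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz")


def stem_of(filename):
    """Approximate the output folder name for a bundle (directories are ignored)."""
    name = PurePath(filename).name
    lower = name.lower()
    for suffix in COMPOUND_SUFFIXES:
        if lower.endswith(suffix):
            return name[: -len(suffix)]
    return PurePath(name).stem  # drops only the final extension


def find_collisions(filenames):
    """Map each stem claimed by more than one bundle to the bundles claiming it."""
    by_stem = defaultdict(list)
    for filename in filenames:
        by_stem[stem_of(filename)].append(filename)
    return {stem: names for stem, names in by_stem.items() if len(names) > 1}
```

Under this rule logs.zip and logs.7z would also collide, so including the node ID in every filename is the safer habit.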
Bundle exceeds the tier size cap
Rejected: A full debug capture can blow past 500 MB (Pro) or even 2 GB (Pro-media / Developer). Over the cap, the bundle is refused. Pre-trim it with your CLI, or split it before batching.
Archive with tens of thousands of rotated files
Rejected: The per-archive entryLimit (Pro 50,000, top tiers 500,000) can trip on a bundle stuffed with rotated logs even if its bytes are modest. Split such bundles or move to a higher tier.
WebAssembly blocked by browser policy
Failed: 7z/RAR/bz2/xz/ISO need the libarchive WASM bridge. A hardened responder environment that blocks WebAssembly will fail those formats (ZIP/GZIP/TAR via fflate still work). Use a browser profile that permits WASM, or extract those formats with a CLI.
Huge consolidated output on a low-RAM laptop
Out of memory: All uncompressed bytes are held in memory before zipping. A big forensic batch can exhaust a responder laptop's tab. Process the bundles in smaller groups, or run on a higher-RAM machine.
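To decide how to split a batch, it can help to estimate each ZIP's uncompressed footprint first. This sketch reads the sizes declared in a ZIP's central directory and then packs bundles greedily into groups under a byte budget. The budget value is yours to choose, and both function names are illustrative. Declared sizes come from the archive's own headers, so treat them as an estimate; the approach covers ZIP only.

```python
import io
import zipfile


def uncompressed_size(zip_bytes):
    """Sum the uncompressed sizes declared in a ZIP's central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sum(info.file_size for info in archive.infolist())


def group_by_budget(sizes, budget):
    """Greedily pack (name, size) pairs into groups whose total stays within budget.

    A bundle larger than the budget gets a group of its own.
    """
    groups, current, total = [], [], 0
    for name, size in sizes:
        if current and total + size > budget:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        groups.append(current)
    return groups
```

Each resulting group can then be dropped as its own batch, keeping the peak memory of any single run closer to the chosen budget.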
Bundle saved with the wrong extension
Supported: Detection is by magic bytes, so a 7z saved as bundle.log or a ZIP saved as data.bin is still read correctly. Only a file that matches no known signature falls back to a ZIP attempt and then errors.
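Magic-byte detection in general looks something like the sketch below, which ignores the filename and inspects only the leading bytes. The signatures listed are the widely documented ones for each format; this is not the Batch Extractor's actual detection table, and `sniff_format` is a hypothetical name.

```python
# Well-known leading signatures; not the tool's actual detection table.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),  # empty ZIP (end-of-central-directory record only)
    (b"\x1f\x8b", "gzip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),  # shared prefix of RAR 4 and RAR 5
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(data):
    """Guess an archive format from its first bytes, ignoring the filename."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # POSIX tar has no leading magic; "ustar" sits at offset 257 of the first header.
    if data[257:262] == b"ustar":
        return "tar"
    return None
```

Reading the first 512 bytes of a file is enough for every check here, so a misnamed bundle can be identified without loading it fully.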
Chain-of-custody / evidence integrity
By design: Extraction recompresses into a new ZIP, so byte-level hashes of the output differ from the source. For evidence integrity, hash the original bundles first with checksum-generator and record those, not the extractor's output.
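If you prefer to record hashes locally before anything touches the bundles, a few lines of Python will do. This is a sketch of one way to build a SHA-256 manifest for a folder of original bundles; it is an alternative to checksum-generator, not a description of how that tool works, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large bundles never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_manifest(folder):
    """Return {filename: sha256 hex digest} for every regular file directly in `folder`."""
    return {p.name: sha256_of(p) for p in sorted(Path(folder).iterdir()) if p.is_file()}
```

Store the manifest alongside the incident record before extraction, so the recorded digests always refer to the untouched source bundles.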
Frequently asked questions
Does incident data stay on my machine?
Yes. Archive tools are browser-only with no upload endpoint (execution.browserOnly is true). The bundles are read and re-zipped in your browser; nothing about the incident reaches a server. This is the main reason responders use it over a cloud extractor.
Can it open a vendor's password-protected support bundle?
No — there is no password field, so encrypted entries are rejected. Decrypt with your local CLI first, then add the cleartext bundle to the batch. Use archive-password-tester to check a candidate password.
Why did my pprof.tar.gz come out as a .tar?
GZIP wraps a single stream and this tool extracts one level, so it gunzips to the inner .tar and stops. Run the output through nested-archive-extractor to get the individual profiles.
How many node bundles can I unpack at once?
Free 1 (so batching needs Pro), Pro 20, Pro-media 100, Developer unlimited — subject to fitting all uncompressed bytes in browser memory.
Will it read a 7z support bundle without me installing anything?
Yes. 7z, RAR, bz2, xz, and ISO are read through the libarchive WASM bridge that ships with the page — handy on a locked-down responder laptop. It requires the browser to allow WebAssembly.
Does it preserve timestamps for forensics?
The extractor reads file contents and paths and writes a fresh ZIP; do not rely on the output ZIP's timestamps for forensic ordering. To read original timestamps and metadata, use archive-metadata-extractor on the source bundle, and to normalise them deliberately use timestamp-normalizer.
Can I automate this in our incident-response pipeline?
Not as an HTTP API — archive tools have no public endpoint. For automated pipelines use a CLI. On Pro+ the @jadapps/runner can execute the tool in headless Chromium, but it is not a file-accepting server.
What if a node's logs go missing from the output?
Almost always a stem collision: two bundles shared a filename, so one overwrote the other. The result panel's Total entries will be lower than expected. Rename bundles uniquely (include the node ID) and re-run.
Does it recurse into nested archives automatically?
No — strictly one level. For deep bundles, chain the output into nested-archive-extractor, which is built for recursion up to a configurable depth.
What about chain of custody on the extracted files?
Recompression changes byte-level hashes, so hash the original bundles before extraction with checksum-generator and keep those records. The extractor is for triage, not for producing evidentiary copies.
Can I just preview a bundle's contents before unpacking everything?
Yes — for a quick look at one bundle's file list without extracting, use archive-previewer or file-listing-generator. The Batch Extractor is for when you've decided to unpack the whole set.
Does grepping across nodes work after extraction?
Yes — that's the design. Each node's logs sit under a distinct stem prefix, so rg -n 'pattern' node-*/ searches every host at once with the host identifiable from the path.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.