How to use Selective Extractor for DevOps & incident response
- Step 1: Stage the bundle locally. Pull the archive from your evidence store, S3 bucket, pipeline artifact, or the customer's upload to local disk. The tool reads it through the browser File API; nothing transits the network.
- Step 2: Hash it first for chain of custody. Before extracting, run the original bundle through Checksum Generator to record a SHA-256. Note it in the incident timeline so the evidence is attestable.
- Step 3: Open Selective Extractor and drop the bundle. Go to /archive-tools/selective-extractor and drop the single archive. For a 300 MB diagnostic package, use the Pro tier (500 MB cap) so it isn't rejected at the size check.
- Step 4: Filter to exactly what the investigation needs. Type a precise glob: *.log for all logs, kube-system/** for one namespace's tree, var/log/**/*.gz for rotated logs. A slash-free pattern matches basenames at any depth; add ** to span folders.
- Step 5: Process and review the counts. Click Process. The result panel shows Pattern, Matched entries, and Total entries; sanity-check that the matched count is plausible before you act on the extract.
- Step 6: Hand off with a fresh hash. Download <bundle>-filtered.zip, run it through the Checksum Generator again, and attach both the file and its hash to the ticket/SIEM so the extracted subset is independently verifiable.
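For responders who also want to record the hashes from a terminal, steps 2 and 6 can be mirrored locally. The following is a minimal Python sketch, not the Checksum Generator's implementation; the function name `sha256_file` and the chunk size are illustrative choices.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks.

    Chunked reads keep memory flat even for multi-hundred-MB bundles.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it once on the original bundle and once on the filtered ZIP yields the two values to paste into the incident timeline.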
Common incident-bundle formats and how the tool handles them
Bundles arrive in many shapes; detection is by magic bytes. All are read in-browser and the matches come back as ZIP.
| Bundle type | Typical format | Read engine | Useful glob |
|---|---|---|---|
| Linux sosreport / diagnostics | .tar.gz / .tar.xz | fflate / libarchive | var/log/** |
| Kubernetes must-gather | .tar.gz | fflate | **/*.yaml |
| Vendor support packet | .zip (sometimes encrypted) | fflate / zip.js | *.log |
| Heap / core dump bundle | .7z | libarchive WASM | *.hprof |
| Single rotated log | .gz | fflate | * (one inner file) |
| Windows diagnostic | .zip / .cab-in-zip | fflate | Windows/**/*.evtx |
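Detection by magic bytes means the leading bytes of the file decide the format, not its extension. A rough Python sketch of that idea follows. The signatures are the published magic numbers for the formats named on this page, but the table layout and the `sniff_format` name are assumptions, not the tool's actual code.

```python
# (signature, format) pairs, checked against the start of the file.
MAGIC_SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),          # empty ZIP: end-of-central-directory only
    (b"\x1f\x8b", "gzip"),           # also the outer layer of .tar.gz
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),         # also the outer layer of .tar.xz
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    for magic, name in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

Because only the header is inspected, a bundle renamed from `.tar.gz` to `.bin` would still be reported as gzip in this sketch.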
Tier sizing for incident work
Size is checked before processing. Pick the tier that covers your typical bundle so it isn't rejected up front. Entry counts in the right column are the documented plan ceilings.
| Tier | Max bundle size | Entry ceiling (documented) | Good for |
|---|---|---|---|
| Free | 50 MB | 500 | Single-service log packets |
| Pro | 500 MB | 50,000 | Most vendor/diagnostic bundles |
| Pro-media | 2 GB | 500,000 | Full must-gather / dump bundles |
| Developer | 2 GB | 500,000 | Large bundles + automation roadmap |
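To decide on a tier before dropping a bundle, the size gate can be approximated locally. This is a hedged sketch using the caps from the table above; it assumes binary units (1 MB = 1024 × 1024 bytes, 2 GB = 2048 MB), which the page does not specify, and `smallest_tier_for` is a hypothetical helper name.

```python
MB = 1024 * 1024

# Caps from the tier table, smallest first. Developer shares the
# Pro-media cap, so it is not listed separately here.
TIER_CAPS = [
    ("Free", 50 * MB),
    ("Pro", 500 * MB),
    ("Pro-media", 2048 * MB),
]


def smallest_tier_for(size_bytes: int):
    """Return the smallest tier whose size cap covers the bundle, or None."""
    for name, cap in TIER_CAPS:
        if size_bytes <= cap:
            return name
    return None  # larger than every documented cap
```

A 300 MB package maps to Pro in this sketch, and a 1.5 GB must-gather maps to Pro-media, matching the guidance above.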
Chain-of-custody pairing
Selective Extractor is read-only on the source. Combine with these siblings for an attestable workflow.
| Step | Tool | Output |
|---|---|---|
| Hash the source bundle | Checksum Generator | SHA-256 of original |
| See contents before extracting | Archive Previewer | Tree + table, no extract |
| Pull the relevant subset | Selective Extractor | <bundle>-filtered.zip |
| Hash the extracted subset | Checksum Generator | SHA-256 of subset |
| Test the bundle isn't corrupt | Archive Integrity Tester | Per-entry CRC report |
Cookbook
Triage recipes from real incident shapes. Each shows the glob to type and what comes back.
Pull every log from a must-gather
A Kubernetes must-gather is a deep .tar.gz. A slash-free *.log matches log basenames at any depth.
Input: must-gather.tar.gz (8,412 entries)
Pattern: *.log
Result panel:
  Matched entries: 612
  Total entries: 8412
Download: must-gather-filtered.zip (612 logs, paths kept)
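The same recipe can be approximated offline with Python's standard library. This is a sketch under stated assumptions, not the tool's code: it handles only slash-free patterns, counts only regular files toward the total, and `filter_tar_to_zip` is a hypothetical name.

```python
import fnmatch
import io
import tarfile
import zipfile


def filter_tar_to_zip(tar_bytes: bytes, pattern: str):
    """Return (zip_bytes, matched, total) for a slash-free glob pattern."""
    matched = total = 0
    out = io.BytesIO()
    # mode "r:*" lets tarfile detect gzip/xz/bz2 compression on its own.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, devices
            total += 1
            basename = member.name.rsplit("/", 1)[-1]
            # fnmatchcase is case-sensitive, so *.LOG does not match app.log.
            if fnmatch.fnmatchcase(basename, pattern):
                # Keep the full internal path in the output ZIP.
                zf.writestr(member.name, tar.extractfile(member).read())
                matched += 1
    return out.getvalue(), matched, total
```

Comparing `matched` against `total` gives the same plausibility check as the result panel.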
Isolate one namespace's tree
Use ** to capture an entire subtree for the affected namespace, keeping the directory structure for the ticket.
Input: must-gather.tar.gz
Pattern: namespaces/payments/**
Download entries:
  namespaces/payments/pods/api-0/error.log
  namespaces/payments/events.yaml
  namespaces/payments/pods/api-0/manifest.yaml
Extract one dump from a heap-dump 7z
libarchive reads the 7z in-browser; the matched dump comes back as a ZIP entry, hashable for custody.
Input: jvm-incident.7z
Pattern: *.hprof
Download: jvm-incident-filtered.zip
  threaddump-2026-06-13.hprof
Open an encrypted vendor packet
Vendor ZIPs are sometimes password-protected. Type the password to decrypt via zip.js, then filter.
Input: vendor-support.zip (AES-encrypted)
Pattern: diagnostics/*.log
Password: ••••••••
Download: vendor-support-filtered.zip (decrypted logs)
Custody-safe extraction with hashes
Hash before and after so the extracted subset is provably derived from the original.
1. Checksum Generator on incident.zip → SHA-256 a1b2…
2. Selective Extractor, Pattern *.log → incident-filtered.zip
3. Checksum Generator on the subset → SHA-256 c3d4…
4. Record both hashes in the incident timeline
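As a complement to the two recorded hashes, a responder may want to confirm that every entry in the subset is byte-identical to its counterpart in the source ZIP. The sketch below is one way to do that with Python's standard library; it is not a feature of the tools on this page, and `entries_not_in_source` is a hypothetical helper.

```python
import hashlib
import io
import zipfile


def entries_not_in_source(source_zip: bytes, subset_zip: bytes) -> list:
    """List subset entries that are missing from, or differ from, the source."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(source_zip)) as src, \
            zipfile.ZipFile(io.BytesIO(subset_zip)) as sub:
        source_names = set(src.namelist())
        for name in sub.namelist():
            if name not in source_names:
                problems.append(name)  # entry has no counterpart
                continue
            same = (hashlib.sha256(sub.read(name)).digest()
                    == hashlib.sha256(src.read(name)).digest())
            if not same:
                problems.append(name)  # same path, different bytes
    return problems
```

An empty list means each extracted file matches the original content at the same internal path.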
Edge cases and what actually happens
Encrypted 7Z/RAR evidence
Not supported. The password field only decrypts ZIPs. An encrypted 7Z/RAR cannot be opened in the browser tool, because the libarchive reader is invoked without a password. Decrypt with an approved desktop tool inside your evidence environment first.
Bundle exceeds the tier cap
Error: tier limit exceeded. A 1.5 GB must-gather on Pro (500 MB) is rejected before processing with a clear tier-limit message. Use Pro-media/Developer (2 GB) or, for genuinely huge bundles, a CLI on a forensic workstation.
Glob matches nothing
Error: no match. The message No entries matched pattern "…" means the filter was too narrow. Often the cause is depth: pods/*.log only hits files directly in pods/. Use pods/**/*.log to reach nested folders. Preview first with Archive Previewer to learn the real paths.
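The depth behaviour can be modelled with a small glob-to-regex translation. This sketch encodes the rules stated on this page (slash-free patterns match basenames, * stays within one path segment, ** spans folders, matching is case-sensitive). It additionally assumes that patterns containing a slash are matched against the full path from the archive root, which the page does not state. The names `glob_to_regex` and `matches` are illustrative.

```python
import re


def glob_to_regex(pattern: str):
    """Translate a glob into a compiled regex (assumed semantics, see above)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, entry_path: str) -> bool:
    # Slash-free patterns are tested against the basename only.
    target = entry_path if "/" in pattern else entry_path.rsplit("/", 1)[-1]
    return glob_to_regex(pattern).fullmatch(target) is not None
```

Under these assumptions, `matches("pods/*.log", "pods/api-0/error.log")` is false while `matches("pods/**/*.log", "pods/api-0/error.log")` is true, which is the depth pitfall described above.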
Output format is ZIP, not the source
By design. Even a .tar.gz input yields <bundle>-filtered.zip. If your downstream tooling expects TAR.GZ, convert the result with Archive Format Converter.
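Teams that script the handoff could do the same conversion locally. The following is a minimal sketch with Python's `zipfile` and `tarfile` modules, offered as an alternative under the assumption that a local script is acceptable in your environment; it is not how Archive Format Converter works, and `zip_to_tar_gz` is a hypothetical name.

```python
import io
import tarfile
import zipfile


def zip_to_tar_gz(zip_bytes: bytes) -> bytes:
    """Repack every file entry of a ZIP into a gzip-compressed tar."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, \
            tarfile.open(fileobj=out, mode="w:gz") as tar:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            # Timestamps and ownership are NOT carried over in this sketch;
            # TarInfo defaults (mtime 0, uid/gid 0) are used.
            member = tarfile.TarInfo(info.filename)
            member.size = len(data)
            tar.addfile(member, io.BytesIO(data))
    # Read the buffer only after the tar is closed so the gzip stream is complete.
    return out.getvalue()
```

Since the repacked archive is a new file, it needs its own hash if it is attached as evidence.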
Glob is case-sensitive
By design. *.LOG will not match app.log. Windows diagnostic bundles often mix cases; run the tool once per case or use * and filter afterward.
File-permission / ownership metadata
Dropped. ZIP output does not carry the Unix ownership/permission bits that a tar bundle held. If permission context is part of the evidence, note it from the source listing rather than relying on the extracted ZIP.
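One way to capture that source listing is to read the tar headers directly before extracting. This is a small sketch using Python's `tarfile` module; the `ownership_listing` name and the tuple layout are illustrative choices.

```python
import io
import tarfile


def ownership_listing(tar_bytes: bytes) -> list:
    """Return (octal mode, uid, gid, path) for every member of a tar bundle."""
    rows = []
    # mode "r:*" handles plain, gzip, bzip2, and xz compressed tars.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            mode = format(member.mode & 0o7777, "04o")  # e.g. "0640"
            rows.append((mode, member.uid, member.gid, member.name))
    return rows
```

Saving this listing next to the filtered ZIP keeps the permission context on record even though the ZIP itself does not hold it.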
Corrupt or truncated bundle
Error / partial. A truncated download can fail extraction outright or surface a CRC mismatch on some entries. Verify the source with Archive Integrity Tester; re-pull if the bundle is incomplete.
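Both failure modes can be reproduced for ZIP bundles with Python's `zipfile.ZipFile.testzip()`, which reads every entry and returns the name of the first one whose CRC fails. The sketch below is only an illustration of the two outcomes, not the Archive Integrity Tester's implementation; `check_zip` and its return strings are made up for this example.

```python
import io
import zipfile


def check_zip(zip_bytes: bytes) -> str:
    """Classify a ZIP as 'ok', 'unreadable', or 'crc-mismatch: <entry>'."""
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            bad = zf.testzip()  # None when every entry passes its CRC check
    except zipfile.BadZipFile:
        # Typical for truncated downloads: the central directory is missing.
        return "unreadable"
    return "ok" if bad is None else f"crc-mismatch: {bad}"
```

An "unreadable" result points to an incomplete transfer, so re-pulling the bundle is the first thing to try.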
Entry-count ceiling on the plan
Plan ceiling. Plans document an entry ceiling (Free 500, Pro 50,000, Pro-media/Developer 500,000). The size check is the gate enforced before processing; for must-gathers with very large entry counts, a Pro-media or Developer plan is the safe choice.
Frequently asked questions
Is this safe for sensitive evidence?
The architecture is browser-only: the bundle is read and filtered in WebAssembly in your tab, with no upload of the file bytes and no temp files written to disk. From a data-flow standpoint this is comparable to running a local CLI, but confirm with your compliance team that local browser processing matches your data-handling policy.
Can I maintain chain of custody?
Yes. Hash the source bundle with Checksum Generator, extract the subset, then hash the subset. Record both SHA-256 values in the incident timeline so the extract is provably derived from the original.
What formats do incident bundles come in, and are they supported?
ZIP, TAR.GZ, TAR.XZ, 7Z, RAR, GZ and BZ2 are all readable. Linux diagnostics are usually TAR.GZ/XZ; vendor packets are ZIP; dumps are often 7Z. Detection is by magic bytes, so a renamed bundle still reads.
How do I extract just the logs out of thousands of files?
Use a glob like *.log (matches log basenames at any depth) or var/log/** for a subtree. The result panel reports matched vs total so you can confirm the filter caught the right set.
Can it open password-protected support packets?
Encrypted ZIPs, yes — type the password to decrypt via @zip.js/zip.js. Encrypted 7Z/RAR are not supported; decrypt those with an approved desktop tool first.
How big a bundle can I process?
Free 50 MB, Pro 500 MB, Pro-media and Developer 2 GB. The size is checked before processing, so an oversized bundle is rejected immediately rather than failing midway.
Can several analysts use it at once?
Each browser tab is an independent instance with no shared server state, so multiple responders can run it concurrently. Tier limits apply per session.
Does it work on a locked-down responder laptop?
Yes — no install, no admin rights, and nothing written to disk except the download you choose to save. That makes it usable where you cannot install p7zip or WinRAR.
What about HIPAA / PCI / FedRAMP environments?
Because nothing transits a network, the regulated boundary does not move when you use the tool. Treat it like a local utility and validate against your specific control set; most teams accept local browser processing.
What if I need to see the bundle's contents first?
Use Archive Previewer for a tree/table view without extracting, or File Listing Generator to export the full entry list, then come back with a precise glob.
Can I confirm the bundle isn't corrupt before relying on it?
Run Archive Integrity Tester — it checks each entry's CRC and reports any failures, so you know whether a truncated download is the cause of a problem.
Will the extracted ZIP keep the original folder structure?
Yes. Each matched entry keeps its full internal path, so namespaces/payments/pods/api-0/error.log stays addressable when attached to a ticket.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.