How to use Selective Extractor in developer workflows
- Step 1: Grab the artifact during review. Download the PR/CI artifact ZIP (or the dependency tarball) locally. There is no need to check out the branch, because you only want to look inside the produced archive.
- Step 2: Drop it into Selective Extractor. Open /archive-tools/selective-extractor and drop the single archive. ZIP, TAR.GZ, GZ, 7Z, RAR, BZ2 and XZ are all accepted; the format is detected by magic bytes.
- Step 3: Glob to the files under review. Type dist/**/*.js for build output, **/*.d.ts for type declarations, coverage/** for the coverage tree, or *.json for all manifests. Add ** whenever you need to cross directory boundaries.
- Step 4: Check the matched/total counts. After Process, compare the panel's Matched entries against Total entries. If a build forgot to emit a file, the matched count will be lower than expected, which makes this a quick smoke test on the artifact.
- Step 5: Make the subset reproducible. Pass <artifact>-filtered.zip through Timestamp Normalizer with a fixed date (1980-01-01 is the conventional reproducible-build epoch) so the bytes are stable across machines.
- Step 6: Attest it for the supply chain. Run Checksum Generator over the normalised ZIP to emit a SHA-256 manifest your CI can compare against future builds.
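
Step 2 mentions detection by magic bytes. As a rough illustration of that idea (not the tool's own code), the sketch below checks the leading bytes of a file against the well-known signatures of the listed formats. The function name `sniff_format` and the returned labels are hypothetical.

```python
# Well-known leading-byte signatures for the formats listed in Step 2.
# Illustrative only: the tool's real detection table is not documented here.
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),            # empty ZIP: end-of-central-directory only
    (b"\x1f\x8b", "gzip"),             # also the outer layer of .tar.gz / .tgz
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),          # shared prefix of RAR 4 and RAR 5
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(head: bytes) -> str:
    """Return a format label for the first bytes of a file, or 'unknown'."""
    for signature, label in MAGIC:
        if head.startswith(signature):
            return label
    return "unknown"
```

Reading the first 8 bytes of a file is enough for every signature above. Note that a .tar.gz reports as gzip here; telling it apart from a plain .gz would require decompressing the first block and looking for the tar header.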
Glob patterns developers reach for
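How the tool's glob rules map to common developer intents. In a pattern that contains a slash, ** is required to cross directory boundaries; a slash-free pattern such as *.d.ts is matched against basenames at any depth.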
| Intent | Pattern | Matches |
|---|---|---|
| All built JS, any depth | dist/**/*.js | dist/index.js, dist/api/db.js |
| Type declarations anywhere | **/*.d.ts or *.d.ts | types/index.d.ts, lib/x.d.ts |
| Files directly in src only | src/*.ts | src/index.ts (NOT src/api/db.ts) |
| Whole coverage tree | coverage/** | every entry under coverage/ |
| Single specific file | coverage/lcov.info | exactly that path |
| One-char variants | v?.json | v1.json, v2.json (not v10.json) |
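
The rules in the table can be approximated with a small glob-to-regex translation. This is a minimal sketch of one reading of those rules, not the tool's matcher: it assumes `**/` may match zero directories (implied by dist/**/*.js matching dist/index.js), that slash-free patterns are tested against the basename, and that matching is case-sensitive. The names `glob_to_regex` and `matches` are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex: * and ? stay inside one path segment, ** crosses /."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")      # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")            # anything, including /
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")         # any run of characters within one segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")          # exactly one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))     # no IGNORECASE: matching is case-sensitive


def matches(pattern: str, path: str) -> bool:
    """Slash-free patterns are tested against the basename, others against the full path."""
    target = path if "/" in pattern else path.rsplit("/", 1)[-1]
    return glob_to_regex(pattern).fullmatch(target) is not None
```

With this reading, `matches("src/*.ts", "src/api/db.ts")` is false while `matches("src/**/*.ts", "src/api/db.ts")` is true, which mirrors the third row of the table.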
Reproducible-output chain
Selective Extractor's re-zip does not guarantee stable timestamps. Chain these siblings to get deterministic, attestable bytes.
| Step | Tool | Why |
|---|---|---|
| 1. Extract the subset | Selective Extractor | Glob the files under review into one ZIP |
| 2. Normalise timestamps | Timestamp Normalizer | Fix mtimes (e.g. 1980-01-01) for byte-stable output |
| 3. Emit a manifest | Checksum Generator | SHA-256 per entry for CI verification |
| 4. Compare versions | Archive Diff | Entry-level diff of before/after artifacts |
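
For CI jobs that cannot drive a browser, step 3 of the chain can be approximated with the Python standard library. The sketch below emits one SHA-256 line per entry in sha256sum style; that line format and the function name `sha256_manifest` are assumptions, since Checksum Generator's actual manifest layout is not described here.

```python
import hashlib
import io
import zipfile


def sha256_manifest(zip_bytes: bytes) -> str:
    """Return one '<hex digest>  <entry path>' line per file entry, sorted by path."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):        # skip directory entries
                continue
            digest = hashlib.sha256(zf.read(name)).hexdigest()
            lines.append(f"{digest}  {name}")
    return "\n".join(lines) + "\n"
```

Because the digests cover the decompressed entry contents, this manifest is insensitive to timestamps and compression settings; hashing the whole ZIP file instead is what makes timestamp normalisation necessary.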
Batch reality for developers
Selective Extractor reads one archive per run. For volume, route to the right sibling.
| Need | This tool? | Use instead |
|---|---|---|
| One artifact, one pattern | Yes | — |
| One artifact, many patterns | No (one glob) | Run twice + Archive Merger |
| Many artifacts at once | No (single file) | Batch Extraction Manager |
| Build a ZIP from local files | No (this reads) | Selective Zipper |
Cookbook
Developer recipes: the exact glob, and what the result panel and download contain.
Inspect build output without a checkout
Glob the dist tree out of a CI artifact to eyeball what the build actually produced.
    Input: ci-artifact.zip (3,180 entries)
    Pattern: dist/**/*.js
    Result panel:
      Matched entries: 214
      Total entries: 3180
    Download: ci-artifact-filtered.zip (dist/**/*.js, paths kept)
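
The same read-and-filter step can be sketched for a script, assuming a ZIP input and the Python standard library. This is not the tool's implementation: `extract_subset` is a hypothetical name, the `keep` callable stands in for the glob, and counting only file entries (not directories) towards the total is an assumption.

```python
import io
import zipfile
from typing import Callable, Tuple


def extract_subset(src: bytes, keep: Callable[[str], bool]) -> Tuple[bytes, int, int]:
    """Copy entries whose path satisfies keep() into a new ZIP, full paths preserved.

    Returns (filtered_zip_bytes, matched_count, total_count).
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin:
        files = [info for info in zin.infolist() if not info.is_dir()]
        matched = [info for info in files if keep(info.filename)]
        if not matched:
            # Mirror the documented behaviour: no match means no download.
            raise ValueError("no entries matched")
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
            for info in matched:
                zout.writestr(info.filename, zin.read(info))
    return out.getvalue(), len(matched), len(files)


def is_dist_js(path: str) -> bool:
    """Prefix/suffix check equivalent to dist/**/*.js when **/ may match zero directories."""
    return path.startswith("dist/") and path.endswith(".js")
```

A CI step could call `extract_subset(data, is_dist_js)` and fail the job when the matched count drops below an expected minimum, which is the scripted version of the smoke test described in the steps above.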
Pull type declarations from a dependency tarball
npm tarballs are .tar.gz. A slash-free *.d.ts matches declaration files at any depth.
    Input: some-pkg-4.2.0.tgz
    Pattern: *.d.ts
    Download entries:
      package/index.d.ts
      package/lib/client.d.ts
      package/lib/types.d.ts
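
For a scripted equivalent of this recipe, Python's `tarfile` can list the matching members of a .tgz without unpacking it. The sketch assumes a gzip-compressed tarball and uses `fnmatch.fnmatchcase` on each basename as a stand-in for the slash-free rule; `fnmatch` also accepts `[...]` character sets, which the tool's glob may not. The name `tgz_members` is hypothetical.

```python
import fnmatch
import io
import posixpath
import tarfile
from typing import List


def tgz_members(data: bytes, basename_glob: str) -> List[str]:
    """List file members of a .tar.gz whose basename matches the glob, case-sensitively."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
        return sorted(
            member.name
            for member in tf.getmembers()
            if member.isfile()
            and fnmatch.fnmatchcase(posixpath.basename(member.name), basename_glob)
        )
```

Only member headers are inspected, so nothing is written to disk while listing.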
Grab one report file
A fully-qualified path matches exactly one entry — handy for fishing a single report out of a big artifact.
    Input: ci-artifact.zip
    Pattern: coverage/lcov.info
    Result panel:
      Matched entries: 1
      Total entries: 3180
    Download: ci-artifact-filtered.zip (lcov.info)
Make the extract reproducible
Chain timestamp normalisation and checksums so the same artifact always yields the same bytes and a verifiable manifest.
    1. Selective Extractor: dist/** → artifact-filtered.zip
    2. Timestamp Normalizer @ 1980-01-01
    3. Checksum Generator → manifest.sha256
    CI later: re-run steps 1-3, compare manifest.sha256 → identical
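
To see why pinning timestamps matters, the normalisation step can be sketched with Python's `zipfile`. This is an illustration under stated assumptions, not how Timestamp Normalizer works internally: entries are rewritten in sorted order with a fixed 1980-01-01 date, fixed permissions, and no compression (deflate output can vary between zlib builds, so stored entries are the conservative choice for byte stability). The name `normalise_zip` is hypothetical.

```python
import io
import zipfile

EPOCH = (1980, 1, 1, 0, 0, 0)  # earliest date the ZIP format can represent


def normalise_zip(src: bytes) -> bytes:
    """Rewrite a ZIP so identical contents always produce identical bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src)) as zin, zipfile.ZipFile(out, "w") as zout:
        names = sorted(n for n in zin.namelist() if not n.endswith("/"))
        for name in names:
            info = zipfile.ZipInfo(name, date_time=EPOCH)
            info.compress_type = zipfile.ZIP_STORED   # avoid zlib-version differences
            info.create_system = 3                    # pin "Unix" regardless of host OS
            info.external_attr = 0o644 << 16          # fixed file permissions
            zout.writestr(info, zin.read(name))
    return out.getvalue()
```

Two archives with the same entry contents but different mtimes or entry order then normalise to the same bytes, so a whole-file SHA-256 becomes a meaningful comparison.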
Diff what a PR changed inside the artifact
Extract the same subtree from the before/after artifacts, then diff to see exactly which entries moved.
    before.zip → Selective Extractor dist/** → before-filtered.zip
    after.zip → Selective Extractor dist/** → after-filtered.zip
    Archive Diff(before-filtered, after-filtered):
    + dist/api/v2.js
    ~ dist/index.js (changed)
    - dist/api/legacy.js
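
An entry-level diff of this kind can be approximated by hashing each entry's contents in both archives and comparing the resulting maps. The sketch below is a stand-in for Archive Diff, not its implementation; the function names are hypothetical and the `+`, `~`, `-` prefixes simply follow the notation used in the recipe above.

```python
import hashlib
import io
import zipfile
from typing import Dict, List


def entry_digests(zip_bytes: bytes) -> Dict[str, str]:
    """Map each file entry's path to the SHA-256 of its decompressed contents."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if not name.endswith("/")
        }


def diff_archives(before: bytes, after: bytes) -> List[str]:
    """Return '+ path' (added), '~ path' (changed) and '- path' (removed) lines."""
    old, new = entry_digests(before), entry_digests(after)
    lines = [f"+ {n}" for n in sorted(new.keys() - old.keys())]
    lines += [f"~ {n}" for n in sorted(new.keys() & old.keys()) if new[n] != old[n]]
    lines += [f"- {n}" for n in sorted(old.keys() - new.keys())]
    return lines
```

Comparing content digests rather than raw bytes means the diff ignores timestamp and compression differences between the two artifacts.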
Edge cases and what actually happens
`src/*.ts` misses nested files
Expected: A single * does not cross /, so src/*.ts matches only files directly in src/, not src/api/db.ts. Use src/**/*.ts to recurse. This is the most common developer surprise.
Glob matched nothing
Error (no match): The tool throws No entries matched pattern "…" and produces no download. Either the build didn't emit those files (a real signal) or the pattern needs **. Preview paths with Archive Previewer.
Output is ZIP regardless of input
By design: A .tgz dependency tarball comes back as <name>-filtered.zip. If your pipeline expects TAR.GZ, convert with Archive Format Converter after extraction.
Timestamps are not reproducible by default
Use Timestamp Normalizer: The re-zip does not pin mtimes, so two runs can produce different bytes. For deterministic artifacts, normalise timestamps (e.g. to 1980-01-01) after extraction and before hashing.
One pattern per run
Limit: You can't pass dist/**/*.js and dist/**/*.css together. Widen to dist/** and accept extra files, or run twice and merge with Archive Merger.
Single file per run (not batch)
Limit: Selective Extractor reads one archive. To process many artifacts, use Batch Extraction Manager; to build a ZIP from local source files, use Selective Zipper.
Glob is case-sensitive
By design: *.JS won't match app.js. Built artifacts are usually lowercase, but generated vendor files can vary, so keep the pattern case in sync with the entries.
Encrypted dependency archive
ZIP only: The password field decrypts ZIPs via zip.js; encrypted 7Z/RAR aren't supported. Most build artifacts are unencrypted, so this rarely bites, but a password-locked vendor 7Z needs a desktop tool.
Frequently asked questions
Is there a CLI or API I can call from CI?
Not yet — a JAD programmatic API is on the roadmap. For CI today, use fflate (Node) for ZIP/GZIP/TAR, libarchive bindings for 7Z/RAR, and the Web Crypto / Node crypto API for SHA-256. This tool wraps the read+filter step behind one browser page.
How do I make the extracted subset reproducible?
Run the extract, then pass it through Timestamp Normalizer with a fixed date (1980-01-01 is conventional), then Checksum Generator to emit a manifest CI can verify against future builds.
Can I extract from a dependency tarball without npm?
Yes. Drop the .tgz/.tar.gz and glob *.d.ts, package/package.json, or whatever you need. It's read in-browser via fflate, no npm or tar install required.
Why does `src/*.ts` miss my nested files?
Because * doesn't cross /. Use src/**/*.ts to recurse into subfolders. A bare *.ts (no slash) matches .ts basenames at any depth.
Can I extract several patterns in one run?
No — one glob per run. Widen the pattern (e.g. dist/**) to catch everything you need at once, or run twice and combine with Archive Merger.
Can I script bulk runs over many artifacts?
This tool reads one archive at a time. For many archives, use Batch Extraction Manager. For building ZIPs from local files, use Selective Zipper.
Will the output work with my CI's format expectations?
The output is a standard ZIP with no JAD wrapper — drop it into any CI step. If a step needs TAR.GZ specifically, convert with Archive Format Converter.
How do I see exactly what a PR changed in an artifact?
Extract the same subtree from the before and after artifacts, then run Archive Diff on the two filtered ZIPs to get an entry-level changelist.
Does it preserve the artifact's folder layout?
Yes — every matched entry keeps its full internal path, so dist/api/db.js stays at that path in the output ZIP. Nothing is flattened.
What's the size limit for a big artifact?
The limits are 50 MB on Free, 500 MB on Pro, and 2 GB on Pro-media and Developer. The size is checked before processing, so an oversized artifact is rejected immediately.
Can I review an untrusted PR artifact safely?
Yes — it's read in-browser with no upload and no temp files on disk. You can inspect a suspicious artifact's contents without unpacking it into your working tree.
What if I want the whole artifact, not a subset?
Set the pattern to the default * to repackage everything as a ZIP, or use Multi-Format Extractor to expand it fully.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.