How to use File Type Breakdown in developer workflows
- Step 1: Grab the artifact. Download the PR's build artifact, the npm pack tarball, or the release ZIP. No need to check out the branch or extract anything.
- Step 2: Drop it into the tool. Open /archive-tools/file-type-breakdown and drop the archive. .zip, .tgz/.tar.gz, .7z, .xz, and the rest are all read directly.
- Step 3: Read the dominant types. Rows are sorted by uncompressed size, so the biggest type group is first. For a ZIP, check the ratio column: a large group with a near-0% ratio (e.g. already-compressed PNGs) is your bloat target.
- Step 4: Look for files that should not ship. Scan for map, ts, test, spec, env, and log rows in a production artifact. Their presence is a packaging-config bug worth a PR comment.
- Step 5: Compare before/after a change. Run the breakdown on the artifact before and after a packaging or dependency change and diff the two CSVs to quantify the impact on each type.
- Step 6: Automate via the shell, not this tool. This interactive tool has no API, so it is not the right fit for a CI gate. Use unzip -l or tar -tzf plus awk in your pipeline, and reserve the browser tool for ad-hoc review.
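For the CI gate mentioned in Step 6, a short script is one alternative to unzip -l plus awk. The sketch below uses Python rather than shell, purely for readability. FORBIDDEN, extension_of and forbidden_counts are hypothetical names, the deny-list is simply the extensions named in Step 4, and taking the extension from the base name of each entry is an assumption about how the grouping should behave.

```python
import zipfile
from collections import Counter

# Hypothetical deny-list: the extensions named in Step 4.
FORBIDDEN = {"map", "ts", "test", "spec", "env", "log"}


def extension_of(name: str) -> str:
    """Lower-cased text after the final dot of the base name."""
    base = name.rsplit("/", 1)[-1]
    if "." not in base:
        return "(no extension)"
    return base.rsplit(".", 1)[-1].lower()


def forbidden_counts(zip_source) -> Counter:
    """Count file entries in a ZIP whose extension is on the deny-list."""
    hits = Counter()
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries are not files
            ext = extension_of(info.filename)
            if ext in FORBIDDEN:
                hits[ext] += 1
    return hits
```

In a pipeline, a non-empty result from forbidden_counts could be printed and turned into a failing exit code; which extensions count as forbidden is a project decision.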
Developer artifacts and what the breakdown reveals
Common artifacts you can drop in and what the type breakdown surfaces. ZIP gives real compression ratios; tarballs and others report uncompressed sizes only.
| Artifact | Format | What to look for |
|---|---|---|
| Frontend release bundle | .zip | Large low-ratio png/jpg groups (already compressed); stray map files shipping to prod |
| npm package | .tgz (tar.gz) | test/spec/fixture files that should be in .npmignore; oversized (no extension) config |
| Docker layer export | .tar / .tar.gz | Unexpected log, cache, or tmp content baked into the image |
| CI build output | .zip / .7z | Database dumps (sql), large bin/dll groups, leaked .env |
| Source snapshot | .zip | High (no extension) count (Dockerfile, LICENSE, Makefile) confirms it is a source archive |
What the CSV report contains
File Type Breakdown emits one CSV per run with these five columns. There are no options to change — the layout is fixed and the rows are sorted largest-uncompressed-first.
| Column | Meaning | Source |
|---|---|---|
| extension | Lower-cased text after the final dot in each entry name; entries with no dot are grouped as (no extension) | name.split('.').pop().toLowerCase() |
| count | Number of file entries that share that extension (directory entries are skipped) | Per-extension tally |
| uncompressedSize | Sum of the original (decompressed) bytes for that extension, in bytes | ZIP central directory usize, or decompressed byte length for non-ZIP inputs |
| compressedSize | Sum of the stored (compressed) bytes for that extension, in bytes | ZIP central directory csize; equals uncompressedSize for non-ZIP inputs |
| ratio | Space saved for that extension, (1 - compressed/uncompressed) × 100, formatted like 64.2%; shows — when the uncompressed total is zero | Computed per extension |
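The five columns can be approximated outside the tool for ZIP inputs. The following is a minimal sketch using Python's zipfile module, not the tool's own implementation. The function name breakdown is hypothetical, and applying the extension rule to the base name of each entry is an assumption.

```python
import zipfile
from collections import defaultdict


def breakdown(zip_source):
    """Return (extension, count, uncompressedSize, compressedSize, ratio) rows."""
    totals = defaultdict(lambda: [0, 0, 0])  # count, uncompressed, compressed
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries are skipped
            base = info.filename.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[-1].lower() if "." in base else "(no extension)"
            row = totals[ext]
            row[0] += 1
            row[1] += info.file_size      # uncompressed bytes from the central directory
            row[2] += info.compress_size  # stored bytes from the central directory
    rows = []
    for ext, (count, usize, csize) in totals.items():
        # Space saved per extension; a dash placeholder when there are no bytes.
        ratio = f"{(1 - csize / usize) * 100:.1f}%" if usize else "\u2014"
        rows.append((ext, count, usize, csize, ratio))
    rows.sort(key=lambda r: r[2], reverse=True)  # largest uncompressed first
    return rows
```

Calling breakdown("web-release-3.4.0.zip") on a real archive returns rows in the same order as the report, which makes it straightforward to write them out with csv.writer.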
Tier limits for the archive family
File Type Breakdown is an analysis tool, so the binding limits are usually the per-archive entry count and the file-size cap — both checked before processing starts. Limits are shared across every archive tool.
| Tier | Max archive size | Max entries per archive | Files per run |
|---|---|---|---|
| Free | 50 MB | 500 entries | 1 |
| Pro | 500 MB | 50,000 entries | 20 |
| Pro-media | 2 GB | 500,000 entries | 100 |
| Developer | 2 GB | 500,000 entries | unlimited |
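Because both limits are checked before processing, it can be handy to pre-check an artifact against the table. The sketch below encodes the table as data. TIER_LIMITS, fits_tier and smallest_tier are hypothetical names, and treating MB and GB as decimal units is an assumption, since the page does not say which convention applies.

```python
MB = 1000 * 1000  # assumption: decimal megabytes
GB = 1000 * MB

# (max archive bytes, max entries) per tier, copied from the table above.
TIER_LIMITS = {
    "Free": (50 * MB, 500),
    "Pro": (500 * MB, 50_000),
    "Pro-media": (2 * GB, 500_000),
    "Developer": (2 * GB, 500_000),
}


def fits_tier(size_bytes: int, entry_count: int, tier: str) -> bool:
    """True when the archive is within both limits of the given tier."""
    max_bytes, max_entries = TIER_LIMITS[tier]
    return size_bytes <= max_bytes and entry_count <= max_entries


def smallest_tier(size_bytes: int, entry_count: int):
    """First tier in table order that accepts the archive, or None."""
    for tier in TIER_LIMITS:  # dicts preserve insertion order
        if fits_tier(size_bytes, entry_count, tier):
            return tier
    return None
```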
Cookbook
Review-time scenarios. Each shows the CSV you would paste into a PR comment to make the point concrete.
Source maps shipping to production
A release ZIP is bigger than expected. The breakdown shows a fat map group that should never reach prod — a one-line build-config fix.
Input: web-release-3.4.0.zip
Output: web-release-3.4.0-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    map,210,40221884,7044110,82.5%   <-- source maps in prod
    png,142,24180736,23905111,1.1%
    js,210,18044110,3810448,78.9%
    css,18,2204118,410229,81.4%

PR comment: 40 MB of .map files are shipping. Set devtool: false or strip them on build.
Bundle dominated by un-compressible images
The biggest group is PNGs with a ~1% ratio — they are already compressed, so gzip on the server will not help. The fix is image optimisation, not bundler config.
Input: app-bundle.zip
Output: app-bundle-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    png,142,24180736,23905111,1.1%   <-- 24 MB, barely compresses
    js,318,9842110,2110874,78.6%
    json,57,2204418,401221,81.8%

The near-0% ratio means these PNGs are pre-compressed; optimise the source images.
npm pack tarball including test fixtures
npm pack produced a tarball far larger than the source. The breakdown shows a big group of fixture types that an .npmignore would have excluded.
Input: my-lib-2.1.0.tgz (from npm pack)
Output: my-lib-2.1.0-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    snap,118,8804110,8804110,0.0%   <-- jest snapshots
    fixture,84,4402110,4402110,0.0%
    json,22,2204118,2204118,0.0%
    js,40,401221,401221,0.0%

The ratio is 0.0% because the tarball is decompressed first. Add the test fixtures to .npmignore.
Diffing type totals before and after a dependency bump
Run the breakdown on the artifact pre- and post-upgrade and compare the CSVs to quantify which types grew.
before.csv:

    js,318,9842110,...
    png,142,24180736,...

after.csv:

    js,402,14044110,...
    png,142,24180736,...

Diff: js count 318 -> 402, js uncompressed +4.2 MB after the bump; png unchanged. The new dependency pulled in 84 extra JS modules.
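The comparison can be scripted once both CSVs are saved. This is a minimal sketch that assumes the five-column header described earlier and plain CSV text without inline annotations. load_totals and diff_totals are hypothetical helper names.

```python
import csv
import io


def load_totals(csv_text: str) -> dict:
    """Map extension -> (count, uncompressedSize) from one report."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["extension"]: (int(row["count"]), int(row["uncompressedSize"]))
        for row in reader
    }


def diff_totals(before: dict, after: dict) -> dict:
    """Map extension -> (count delta, byte delta) for types that changed."""
    changes = {}
    for ext in sorted(set(before) | set(after)):
        b_count, b_size = before.get(ext, (0, 0))  # a new type starts from zero
        a_count, a_size = after.get(ext, (0, 0))   # a removed type ends at zero
        if (a_count, a_size) != (b_count, b_size):
            changes[ext] = (a_count - b_count, a_size - b_size)
    return changes
```

With the js figures from the example, diff_totals reports a count delta of 84 and a byte delta of 4,202,000, and png is omitted because it did not change.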
Confirming a snapshot is a source archive, not a build
A high (no extension) count (Dockerfile, LICENSE, Makefile, README) together with ts/md confirms you are looking at a source tree, not a built artifact.
Input: snapshot.zip
Output: snapshot-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    ts,410,8204110,1920448,76.6%
    md,22,180221,72104,60.0%
    (no extension),9,40221,40221,0.0%   <-- Dockerfile, LICENSE, Makefile...

No dist/ artifacts here: this is the source, not the build output.
Edge cases and what actually happens
Hoping to wire it into CI
No API: There is no public REST API or batch mode for File Type Breakdown; it is an interactive, one-archive browser tool. For a CI gate, script unzip -l or tar -tzf plus awk (or zipinfo for compressed sizes) in your pipeline instead.
Ratio is 0.0% for a .tgz from npm pack
By design: Tarballs are decompressed before counting, so the compressed column equals the uncompressed column and the ratio is 0.0%. To judge real compressibility, profile the equivalent ZIP, where per-entry compressed sizes are available.
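The same behaviour shows up when a tarball is read with Python's tarfile module: each member exposes only its uncompressed size, so a script has nothing else to put in the compressed column. The sketch below mirrors the report layout under that assumption. tar_breakdown is a hypothetical name, and the extension is taken from the base name of each member.

```python
import tarfile
from collections import defaultdict


def tar_breakdown(fileobj):
    """Per-extension rows for a .tar or .tar.gz opened in binary mode."""
    totals = defaultdict(lambda: [0, 0])  # count, uncompressed bytes
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        for member in tf:
            if not member.isfile():
                continue  # skip directories, links and other non-file members
            base = member.name.rsplit("/", 1)[-1]
            ext = base.rsplit(".", 1)[-1].lower() if "." in base else "(no extension)"
            totals[ext][0] += 1
            totals[ext][1] += member.size  # only the uncompressed size is known
    rows = [
        # compressed is reported equal to uncompressed, hence a 0.0% ratio
        (ext, count, size, size, "0.0%" if size else "\u2014")
        for ext, (count, size) in totals.items()
    ]
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows
```

A typical call would be `with open("my-lib-2.1.0.tgz", "rb") as fh: rows = tar_breakdown(fh)`.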
Nested artifact (a zip inside the artifact)
Not expanded: An inner .zip, .tgz, or layer tarball is counted as a single entry of that extension; its contents are not recursed into. Extract the inner archive first and run the breakdown on it for a true picture.
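To see which entries would need extracting first, a script can list the members of a ZIP that look like archives themselves. This is a sketch only. NESTED_EXTENSIONS is a hypothetical list chosen for illustration, and the check relies on the file name alone, not on the file's contents.

```python
import zipfile

# Hypothetical set of extensions treated as nested archives.
NESTED_EXTENSIONS = {"zip", "tgz", "gz", "tar", "7z", "xz"}


def nested_archives(zip_source) -> list:
    """Names of file entries whose final extension suggests an inner archive."""
    with zipfile.ZipFile(zip_source) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and info.filename.rsplit(".", 1)[-1].lower() in NESTED_EXTENSIONS
        ]
```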
Artifact over the tier cap
Rejected (tier limit): Big Docker exports or monorepo bundles can exceed 50 MB (Free) or 500 MB (Pro). The size limit and the entry limit (500 on Free, 50,000 on Pro) are checked before processing. Upgrade the tier or profile a smaller artifact.
Minified bundle has tiny extension diversity
Expected: A heavily bundled app may show only a handful of extensions (js, css, map, png). A low Distinct types value is normal for a built artifact; high diversity usually means you are looking at a source tree.
Multi-dot names like app.min.js.map
Expected: Only the text after the final dot is used, so app.min.js.map is grouped under map. That is intentional, but watch for it when reasoning about counts of .js vs .js.map.
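A few lines make the final-dot rule concrete. The sketch applies the rule to the base name of each path, which is an assumption. extension_of and the sample names are hypothetical.

```python
from collections import Counter


def extension_of(name: str) -> str:
    """Lower-cased text after the final dot, or '(no extension)'."""
    base = name.rsplit("/", 1)[-1]
    if "." not in base:
        return "(no extension)"
    return base.rsplit(".", 1)[-1].lower()


names = ["dist/app.min.js", "dist/app.min.js.map", "dist/vendor.JS", "Dockerfile"]
groups = Counter(extension_of(n) for n in names)
# groups == {"js": 2, "map": 1, "(no extension)": 1}
```

The source map lands in the map group and does not add to the js count, while vendor.JS joins js because the extension is lower-cased.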
Wanted per-file breakdown, not per-type
Use a sibling: This tool aggregates by extension. For a per-file listing with sizes and paths, use file-listing-generator; for folder rollups, archive-size-analyzer.
Reproducible-build comparison expected
Out of scope: File Type Breakdown reads an archive and outputs a CSV. It does not rewrite or normalise the archive, so it cannot make a build reproducible. For byte-stable archives, address timestamps/ordering in your packaging step and verify with checksums.
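For the checksum step, comparing SHA-256 digests of two builds is one common approach. The sketch below uses Python's hashlib and reads in chunks so large artifacts do not need to fit in memory. sha256_of and same_bytes are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a: str, path_b: str) -> bool:
    """True when two artifacts are byte-identical according to SHA-256."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Two builds of the same commit that fail same_bytes usually differ in timestamps or entry order, which is the packaging-step issue described above.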
Frequently asked questions
Is there an API so I can run this in CI?
No. File Type Breakdown is an interactive browser tool with no public API or batch endpoint — it processes one archive at a time. For automated CI checks, use shell equivalents like unzip -l/zipinfo/tar -tzf piped through awk, which give the same counts.
Can it read a tarball from npm pack or docker save?
Yes. .tgz/.tar.gz and plain .tar are supported, as are .zip, .7z, .xz, and more. Tarballs are decompressed first, so their compressed column equals the uncompressed column (ratio 0.0%).
How do I find bundle bloat with it?
Profile the release ZIP and read the rows top-down (sorted by uncompressed size). A large group with a near-0% ratio is already-compressed content (optimise the source); a large group with a high ratio is text you might split or lazy-load. The compressed sizes are real for ZIP.
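The top-down reading can also be applied to a saved CSV. This sketch flags groups that are both large and barely compressible. The thresholds min_bytes and max_ratio are hypothetical defaults, not values from the tool, and the check is only meaningful for ZIP reports, since other formats always show 0.0%.

```python
import csv
import io


def bloat_candidates(csv_text: str, min_bytes: int = 10_000_000, max_ratio: float = 5.0):
    """Return (extension, uncompressedSize, ratio) for large, low-ratio groups."""
    candidates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            ratio = float(row["ratio"].rstrip("%"))
        except ValueError:
            continue  # placeholder ratio for groups with zero bytes
        size = int(row["uncompressedSize"])
        if size >= min_bytes and ratio <= max_ratio:
            candidates.append((row["extension"], size, ratio))
    return candidates
```

Run against the app-bundle report from the cookbook, only the png group is returned, matching the manual reading.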
Why does my npm tarball show a 0.0% ratio?
Tarballs are gunzipped before counting, so the tool only sees uncompressed bytes — compressed equals uncompressed, ratio 0.0%. This does not mean the package is uncompressible; it means the format does not expose per-entry compressed sizes the way ZIP does.
Can it tell me which individual files are biggest?
Not directly — it aggregates by extension. For a per-file size listing use file-listing-generator; for folder-level rollups use archive-size-analyzer.
Does it expand nested archives in the artifact?
No. A nested .zip or layer tarball counts as one entry. Extract it with multi-format-extractor and run the breakdown on the inner archive to see its types.
How are files like app.min.js.map grouped?
By the text after the final dot, so app.min.js.map lands under map. Keep that in mind when comparing .js and .js.map counts — the source map does not add to the js group.
Can it help make my build reproducible?
No — it only reads an archive and reports types; it does not rewrite the archive. Reproducibility comes from controlling timestamps and entry order in your packaging step; verify the result with a checksum/hash tool.
How big an artifact can I profile?
Up to the tier cap: 50 MB / 500 entries free, 500 MB / 50,000 on Pro, 2 GB / 500,000 on Pro-media and Developer. The limits are enforced before processing begins.
Will it catch a .env or secret in the artifact?
It will show an env (or pem, key) row if such a file is present, which is a useful packaging-hygiene flag. It does not read file contents, so it cannot detect a secret embedded inside, say, a .js file.
Is the reviewer's machine sending the artifact anywhere?
No. The artifact is read locally in the browser; nothing is uploaded. That makes it safe to profile an internal build from a colleague's PR on any machine with a browser.
What is the difference from the size analyser for dev work?
archive-size-analyzer breaks down by both folder and extension with rollups — better for tracing bloat to a directory. File Type Breakdown is the fast type-only view for a quick what-dominates-this check.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.