How to: Size Analyser vs unzip -l / 7z l / du
- Step 1: Reproduce unzip -l aggregation in the browser. unzip -l a.zip | awk '{print $4}' | ... is the manual route to a per-type total. The Size Analyser does it directly: drop a.zip, read byExtension, and the biggest type is the first array element.
- Step 2: Reproduce 7z l for 7z/RAR. 7z l a.7z needs p7zip installed. The Size Analyser routes 7z and RAR through libarchive WASM, so you get the same per-entry sizes summed by type with no local install. Note that it fully decompresses 7z/RAR to measure, where 7z l reads only the header.
- Step 3: Skip the du step entirely. du -sh extracted/* only works after unzip. The analyser reports per-top-folder bytes straight from the archive, so you never write the extracted tree to disk.
- Step 4: Compare on size ceiling. Files over 2 GB belong on the CLI: browser memory becomes the bottleneck, especially for 7z/RAR where libarchive expands entries in RAM. Under the tier caps the browser is faster to results because there is nothing to install or pipe.
- Step 5: Compare on privacy. Both the browser tool and the CLI run locally, so both are private. The browser path adds: no install, no admin rights, nothing written to disk. Useful when auditing untrusted input you do not want to extract.
- Step 6: Compare on automation. For one PR or one ad-hoc triage, the browser is faster end to end. For nightly jobs over thousands of archives, the CLI with find ... -exec wins, because the analyser reads one archive per run.
Command-by-command equivalents
What each CLI command gives you versus what the Size Analyser returns. The analyser's headline advantage is that the aggregation step is already done.
| Goal | CLI | Size Analyser |
|---|---|---|
| List entries + sizes | unzip -l a.zip / 7z l a.7z / tar -tvf a.tar.gz | Reads them internally; not printed flat — returned as grouped totals |
| Total bytes per file type | unzip -l a.zip piped into awk + sort | byExtension, pre-summed and sorted descending |
| Total bytes per top folder | extract, then du -sh extracted/* | byTopFolder, no extraction needed |
| Which group is biggest | manual sort -rn | First element of each array (already sorted) |
| Machine-readable output | parse whitespace-aligned text | Structured JSON: { ext, count, totalSize } |
Browser tool vs CLI: trade-offs
Honest comparison. Neither is strictly better — the right choice depends on file size, install constraints, and whether you are scripting.
| Dimension | JAD Size Analyser | unzip -l / 7z l / du |
|---|---|---|
| Install | None (browser) | Needs unzip/p7zip/tar |
| Upload | Never (local WASM) | Never (local) |
| Max archive | 50 MB Free / 500 MB Pro / 2 GB Pro-Media | No ceiling (disk/RAM bound) |
| Entry cap | 500 Free / 50,000 Pro / 500,000 Pro-Media | None |
| Aggregation | Built in (by ext + by folder) | Manual (awk/sort/du) |
| Output | JSON report | Plain text |
| Best for | Ad-hoc triage, locked-down machines | Huge files, scripted batch jobs |
Cookbook
Side-by-side: the shell pipeline you would otherwise write, versus the analyser report. Sizes are uncompressed bytes in both.
Per-type total: unzip -l + awk vs one drop
The classic 'which extension dominates' question. The CLI needs a pipe; the analyser returns it sorted.
CLI:
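# Match entry rows only (numeric size, 4+ fields) so the footer total is not counted as a type
unzip -l app.zip | awk 'NF>=4 && $1 ~ /^[0-9]+$/ {n=split($NF,a,"."); s[a[n]]+=$1}
END {for (k in s) print s[k], k}' | sort -rn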
Analyser:
drop app.zip -> byExtension[0]
{ "ext": "map", "count": 19, "totalSize": 73400320 }
Same answer, no pipeline to debug.
Per-folder total: du vs byTopFolder
du needs the archive extracted first. The analyser reads top-level folder totals directly.
CLI:
unzip -q bundle.zip -d /tmp/b && du -sh /tmp/b/* | sort -rh
Analyser:
drop bundle.zip -> byTopFolder
[ { "folder": "vendor", "totalSize": 188743680, "count": 4012 },
{ "folder": "src", "totalSize": 8388608, "count": 230 } ]
No /tmp extraction, no cleanup.
7z without p7zip installed
On a machine with no p7zip, 7z l fails. The analyser reads 7z through libarchive WASM in the browser.
CLI (no p7zip):
7z l backup.7z -> command not found: 7z
Analyser:
drop backup.7z (engine: libarchive WASM)
byExtension[0] = { "ext": "sql", "count": 1, "totalSize": 524288000 }
Caveat: libarchive fully decompresses 7z to measure, so this
is RAM-heavier than `7z l` reading only headers.
Where the CLI wins: a 6 GB archive
Above the 2 GB tier ceiling the browser cannot help. Stay on the CLI.
Archive: nightly-dump.tar.gz (6 GB)
Analyser: rejected (over 2 GB Pro-Media cap)
CLI: tar -tzvf nightly-dump.tar.gz | awk '{s[...]} ...'
streams the listing without loading the archive into RAM
Use the CLI for anything past the tier caps.
Where the CLI wins: batch over 5,000 archives
The analyser reads one archive per run. A nightly sweep belongs in a shell loop.
CLI:
find /backups -name '*.zip' -print0 |
xargs -0 -P4 -n1 sh -c 'unzip -l "$1" | tail -1' _
Analyser:
one archive per run; for a multi-archive size summary use
/archive-tools/batch-compression-report on a paid tier.
Edge cases and what actually happens
Output is JSON, not text columns
By design: Unlike unzip -l, the analyser does not print a fixed-width entry table. It returns grouped JSON. If your downstream parser expects unzip -l columns, you must adapt it, but JSON is far more robust than scraping whitespace-aligned text.
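As a rough illustration of what adapting a downstream script looks like, the Python sketch below parses a report and computes each type's share of the total. The top-level keys `byExtension` and `byTopFolder` wrapping the arrays are an assumption about the report layout, the "js" row is made-up sample data, and `share_by_extension` is a hypothetical helper; only the per-row fields come from this page.

```python
import json

# Sample report text. The wrapper object is an assumed layout, and the
# "js" row is illustrative data, not real analyser output.
REPORT = """
{
  "byExtension": [
    {"ext": "map", "count": 19, "totalSize": 73400320},
    {"ext": "js", "count": 42, "totalSize": 10485760}
  ],
  "byTopFolder": [
    {"folder": "vendor", "totalSize": 188743680, "count": 4012},
    {"folder": "src", "totalSize": 8388608, "count": 230}
  ]
}
"""


def share_by_extension(report_text):
    """Return (ext, bytes, percent of total) per row, keeping report order."""
    rows = json.loads(report_text)["byExtension"]
    grand_total = sum(row["totalSize"] for row in rows)
    return [
        (row["ext"], row["totalSize"],
         round(100 * row["totalSize"] / grand_total, 1))
        for row in rows
    ]


if __name__ == "__main__":
    for ext, size, pct in share_by_extension(REPORT):
        print(f"{ext:>6} {size:>12} {pct:5.1f}%")
```

There is no column counting and no dependence on how many spaces separate fields, which is the robustness argument made above.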
7z/RAR measured by full decompression
RAM cost: 7z l reads only the archive header to print sizes; the analyser routes 7z/RAR through libarchive, which decompresses each entry to measure its expanded length. For large 7z archives this uses more memory than the CLI listing, which is a reason to prefer the CLI on multi-GB 7z files.
Archive larger than the tier cap
Tier limit (rejected): Above 50 MB Free / 500 MB Pro / 2 GB Pro-Media the analyser rejects the file outright, where unzip -l has no ceiling. Split with /archive-tools/archive-splitter or use the CLI for oversized archives.
Entry count over the cap
Tier limit (rejected): The analyser also enforces an entry cap (500 Free / 50,000 Pro / 500,000 Pro-Media). unzip -l has none. A ZIP with a million tiny files is a CLI job.
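The two caps combine into a simple pre-flight check, sketched below in Python. The tier names and limits are the ones listed on this page; treating "MB" as 1024 × 1024 bytes, treating the limits as inclusive, and the helper name `smallest_tier` are assumptions made for the sketch.

```python
from dataclasses import dataclass

MIB = 1024 * 1024  # assumption: the page's "MB" means mebibytes


@dataclass(frozen=True)
class Tier:
    name: str
    max_bytes: int
    max_entries: int


# Limits copied from the trade-offs table on this page.
TIERS = [
    Tier("Free", 50 * MIB, 500),
    Tier("Pro", 500 * MIB, 50_000),
    Tier("Pro-Media", 2048 * MIB, 500_000),
]


def smallest_tier(archive_bytes, entry_count):
    """Return the first tier whose caps both fit, or None for a CLI job."""
    for tier in TIERS:
        if archive_bytes <= tier.max_bytes and entry_count <= tier.max_entries:
            return tier.name
    return None


if __name__ == "__main__":
    print(smallest_tier(10 * MIB, 100))        # small archive, few entries
    print(smallest_tier(10 * MIB, 1_000_000))  # a million tiny files
    print(smallest_tier(6 * 1024 * MIB, 10))   # 6 GB archive
```

Either cap alone is enough to push an archive to a higher tier or off the browser entirely, so a small ZIP with a very large entry count still ends up on the CLI.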
du counts disk blocks, analyser counts bytes
Difference to expect: du reports allocated disk blocks (rounded up to the filesystem block size) of the extracted tree, so its numbers run slightly higher than the analyser's raw uncompressed byte sums. They will not match to the byte; that is expected, not a bug.
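A small worked example shows where the gap comes from. The Python sketch below assumes a 4096-byte block size and a simple round-up model; real filesystems can differ (inline storage of tiny files, sparse files, transparent compression), and du also counts the directories themselves, so treat this as an approximation. The function name and the file sizes are illustrative.

```python
def allocated_bytes(file_size, block_size=4096):
    """Bytes occupied on disk when a file is rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


# Hypothetical uncompressed sizes of four extracted files.
sizes = [1, 4096, 4097, 10_000]

raw_total = sum(sizes)                                # what a byte sum reports
disk_total = sum(allocated_bytes(s) for s in sizes)   # closer to what du sees

if __name__ == "__main__":
    print(raw_total, disk_total)
```

Here the raw sum is 18,194 bytes while the block-rounded sum is 28,672 bytes. The relative gap is largest for trees made of many small files and shrinks toward zero for a few large ones.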
Compressed size not reported
Use a sibling tool: unzip -l shows both compressed and uncompressed columns; the analyser groups by uncompressed bytes only. For compression ratio per file or overall, use /archive-tools/compression-ratio-calculator.
Nested archives
Counted as one entry: Both unzip -l and the analyser list an inner data.zip as a single entry; neither recurses by default. Extract first with /archive-tools/nested-archive-extractor to look inside.
Mis-named archive
Handled by magic bytes: Like unzip, the analyser identifies the format from file content (magic-byte detection) rather than from the extension. A report.zip that is really a 7z is read correctly by the analyser via libarchive, whereas plain unzip rejects it because it can only read ZIP.
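Magic-byte detection in general can be sketched in a few lines of Python. The signatures below are the published leading bytes of each format; the table layout and the names `sniff_format` and `sniff_file` are hypothetical, and nothing here is claimed to match how the analyser performs its own detection.

```python
# Leading-byte signatures for the formats mentioned on this page.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),          # empty ZIP: end-of-central-directory only
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(header):
    """Return a format name based on the first bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read just enough of a file to classify it, ignoring its extension."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))


if __name__ == "__main__":
    # A file named report.zip whose content starts with the 7z signature.
    print(sniff_format(b"7z\xbc\xaf\x27\x1c\x00\x04"))
```

The file name never enters the decision, which is why a mis-named archive is routed to the right reader.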
Frequently asked questions
Does the Size Analyser just wrap unzip?
No. It uses fflate (a pure-JS ZIP/GZIP library) to read entry metadata from ZIP and TAR.GZ archives (the central directory, in the ZIP case), and libarchive compiled to WebAssembly for 7z/RAR/bz2/xz. There is no shell-out and no server; everything runs in your browser tab.
When should I prefer the CLI?
Archives over 2 GB, scripted batch runs across thousands of files, CI pipelines, and any case where the archive already lives on a server you control. The CLI has no size or entry ceiling and slots into find -exec loops.
When should I prefer the browser tool?
One-off triage, machines where you cannot install p7zip or run sudo, untrusted input you do not want to extract to disk, and any time you want the by-type and by-folder rollups without writing an awk pipeline.
Is the output interchangeable with CLI tooling?
The analyser emits JSON, not unzip -l text. If a script expects the CLI's column format you will need to adapt it. The upside is that JSON parsing is far more reliable than scraping aligned columns.
Why might du give bigger numbers?
du measures allocated disk blocks of the extracted files, rounded up to the filesystem block size. The analyser sums raw uncompressed bytes from the archive. The two are close but rarely identical — block rounding accounts for the gap.
Does it read 7z and RAR like 7z l does?
Yes, via libarchive WASM. One difference: 7z l reads only the archive header to print sizes, while the analyser decompresses each 7z/RAR entry to measure it. That makes the analyser heavier on RAM for large 7z files.
Can it handle a 10 GB archive like the CLI?
No. The hard ceiling is 2 GB (Pro-Media / Developer). Past that, browser memory is the constraint and you should use the CLI, which streams the index.
Does it show compressed sizes like unzip -l?
No. The analyser groups by UNCOMPRESSED bytes only. For compressed vs uncompressed comparison and ratios, use /archive-tools/compression-ratio-calculator.
What about counts per type, like a quick wc?
Each byExtension row includes a count alongside totalSize. If you want a count-first view of types use /archive-tools/file-type-breakdown, which is the count-oriented companion.
Can I batch many archives like a shell loop?
Not in this tool — it reads one archive per run. For a multi-archive summary use /archive-tools/batch-compression-report on a paid tier; for true scripted batches the CLI remains the right tool.
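For scripted batches, a Python loop is one alternative to find plus xargs. The sketch below walks a directory, reads each ZIP's central directory, and reports entry count and total uncompressed bytes; `summarize_zips` is a hypothetical helper, and unreadable files are flagged with None values rather than aborting the run.

```python
import tempfile
import zipfile
from pathlib import Path


def summarize_zips(root):
    """Yield (file name, entry count, total uncompressed bytes) per ZIP."""
    for path in sorted(Path(root).rglob("*.zip")):
        try:
            with zipfile.ZipFile(path) as zf:
                infos = zf.infolist()
                yield path.name, len(infos), sum(i.file_size for i in infos)
        except zipfile.BadZipFile:
            # Mis-named or corrupt file: flag it and keep going.
            yield path.name, None, None


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(Path(tmp) / "a.zip", "w") as zf:
            zf.writestr("x.txt", b"1" * 10)
            zf.writestr("y.txt", b"2" * 20)
        (Path(tmp) / "b.zip").write_bytes(b"not a zip")
        for row in summarize_zips(tmp):
            print(row)
```

Unlike the `tail -1` pipeline, each result row carries the archive name, which makes the output easier to join against a backup inventory.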
Is anything uploaded when I use the browser tool?
No. Like the CLI, it runs locally — just inside your browser instead of your shell. The only network touch is a one-time WASM module fetch for 7z/RAR/bz2/xz support.
Does the analyser find duplicates the way fdupes does?
No. It groups by name and folder, not content hashes. For byte-identical duplicate detection use /archive-tools/redundancy-analyzer, which hashes entries with SHA-256.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.