How to break down an archive by file extension, free and online
- Step 1: Open the tool. Go to /archive-tools/file-type-breakdown. No account is needed to run it on the free tier.
- Step 2: Drop your archive. Drag a single archive onto the dropzone, or click to browse. Supported formats: .zip, .gz, .tar, .tar.gz, .tar.bz2, .tar.xz, .bz2, .xz, .7z, .rar, .iso. The size and entry-count limits are checked before any work starts.
- Step 3: Let it parse the directory. For a ZIP, the tool reads only the central directory, which is fast and does not decompress the payload. For other formats it decompresses in a Web Worker so the page stays responsive.
- Step 4: Read the metrics. The result panel shows two metrics: Distinct types (how many unique extensions) and Total entries (file count, directories excluded). Use these as a quick sanity check against what you expected.
- Step 5: Download the CSV. Click Download to save <archive-name>-types.csv. Rows are already sorted by uncompressed size, descending, so the biggest type group is at the top.
- Step 6: Chart or filter it. Open the CSV in a spreadsheet and sort or chart on count, uncompressedSize, or ratio. To go deeper per folder, hand the same archive to archive-size-analyzer.
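To make Step 3 concrete, here is a minimal sketch of how entry names and sizes can be read from a ZIP's central directory without touching the compressed payload. It is not the tool's actual parser: the function name `readCentralDirectory` is hypothetical, and the sketch assumes a single-disk archive with no ZIP64 records and UTF-8 or ASCII entry names.

```javascript
// Sketch only: lists name and sizes for each entry using the central directory.
// Assumptions: no ZIP64, single-disk archive, UTF-8/ASCII names.
function readCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The End of Central Directory (EOCD) record is at least 22 bytes long and
  // sits at the end of the file, so scan backwards for its signature.
  let eocd = -1;
  for (let i = bytes.length - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error("End of central directory not found");

  const totalEntries = view.getUint16(eocd + 10, true);
  let pos = view.getUint32(eocd + 16, true); // offset of the first central header
  const decoder = new TextDecoder();
  const entries = [];

  for (let n = 0; n < totalEntries; n++) {
    if (view.getUint32(pos, true) !== 0x02014b50) {
      throw new Error("Bad central directory header");
    }
    const compressedSize = view.getUint32(pos + 20, true);
    const uncompressedSize = view.getUint32(pos + 24, true);
    const nameLength = view.getUint16(pos + 28, true);
    const extraLength = view.getUint16(pos + 30, true);
    const commentLength = view.getUint16(pos + 32, true);
    const name = decoder.decode(bytes.subarray(pos + 46, pos + 46 + nameLength));
    entries.push({ name, compressedSize, uncompressedSize });
    // Each header is 46 fixed bytes followed by three variable-length fields.
    pos += 46 + nameLength + extraLength + commentLength;
  }
  return entries;
}
```

Because only these headers are read, the same approach works on an encrypted ZIP: the sizes and names are in the directory, not in the encrypted payload.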
What the CSV report contains
File Type Breakdown emits one CSV per run with these five columns. There are no options to change — the layout is fixed and the rows are sorted largest-uncompressed-first.
| Column | Meaning | Source |
|---|---|---|
| extension | Lower-cased text after the final dot in each entry name; entries with no dot are grouped as (no extension) | name.split('.').pop().toLowerCase() |
| count | Number of file entries that share that extension (directory entries are skipped) | Per-extension tally |
| uncompressedSize | Sum of the original (decompressed) bytes for that extension, in bytes | ZIP central directory usize, or decompressed byte length for non-ZIP inputs |
| compressedSize | Sum of the stored (compressed) bytes for that extension, in bytes | ZIP central directory csize; equals uncompressedSize for non-ZIP inputs |
| ratio | Space saved for that extension, (1 - compressed/uncompressed) × 100, formatted like 64.2%; shows — when the uncompressed total is zero | Computed per extension |
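The column definitions above can be sketched in a few lines. This is an illustration rather than the tool's source: the helper names (`extensionOf`, `breakdown`, `formatRatio`, `toCsv`) are hypothetical, and the sketch assumes directory entries are recognisable by a trailing slash and that dots in folder names should be ignored when picking the extension.

```javascript
// Sketch only: group entries by extension and emit the five CSV columns.
function extensionOf(name) {
  const base = name.split("/").pop();      // assumption: ignore dots in folder names
  const dot = base.lastIndexOf(".");
  return dot === -1 ? "(no extension)" : base.slice(dot + 1).toLowerCase();
}

function breakdown(entries) {
  const groups = new Map();
  for (const entry of entries) {
    if (entry.name.endsWith("/")) continue; // directory entries are skipped
    const extension = extensionOf(entry.name);
    const group = groups.get(extension) ||
      { extension, count: 0, uncompressedSize: 0, compressedSize: 0 };
    group.count += 1;
    group.uncompressedSize += entry.uncompressedSize;
    group.compressedSize += entry.compressedSize;
    groups.set(extension, group);
  }
  // Largest uncompressed total first, matching the documented row order.
  return [...groups.values()].sort((a, b) => b.uncompressedSize - a.uncompressedSize);
}

function formatRatio(group) {
  if (group.uncompressedSize === 0) return "—"; // ratio is undefined for empty groups
  const saved = (1 - group.compressedSize / group.uncompressedSize) * 100;
  return saved.toFixed(1) + "%";
}

function toCsv(rows) {
  const header = "extension,count,uncompressedSize,compressedSize,ratio";
  const lines = rows.map((g) =>
    [g.extension, g.count, g.uncompressedSize, g.compressedSize, formatRatio(g)].join(","));
  return [header, ...lines].join("\n");
}
```

For a non-ZIP input, each entry would be passed in with compressedSize set equal to uncompressedSize, which is why every ratio then comes out as 0.0%.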
Input formats and which size columns are meaningful
ZIP is read straight from the central directory, so both size columns are real and the ratio is accurate. Every other format is decompressed first, after which the compressed total equals the uncompressed total and the ratio reads 0.0%.
| Input format | Read engine | compressedSize accurate? | ratio meaningful? |
|---|---|---|---|
| .zip (incl. AES / ZipCrypto encrypted) | Central-directory parser (no fflate needed for metadata) | Yes: real per-entry compressed size | Yes |
| .gz (single member) | fflate gunzip | No: one row only, compressed = uncompressed | No (reads 0.0%) |
| .tar | fflate tar parser | No: tar stores files uncompressed | No (0.0%) |
| .tar.gz / .tgz | fflate | No: sizes are post-decompression | No (0.0%) |
| .tar.bz2, .tar.xz, .bz2, .xz, .7z, .rar, .iso | libarchive WASM (read-only) | No: sizes are post-decompression | No (0.0%) |
Tier limits for the archive family
File Type Breakdown is an analysis tool, so the binding limits are usually the per-archive entry count and the file-size cap — both checked before processing starts. Limits are shared across every archive tool.
| Tier | Max archive size | Max entries per archive | Files per run |
|---|---|---|---|
| Free | 50 MB | 500 entries | 1 |
| Pro | 500 MB | 50,000 entries | 20 |
| Pro-media | 2 GB | 500,000 entries | 100 |
| Developer | 2 GB | 500,000 entries | unlimited |
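The up-front check can be pictured as a simple lookup against the table above. The sketch below is illustrative only: `TIER_LIMITS` and `checkLimits` are hypothetical names, and it assumes binary megabytes (1 MB = 1,048,576 bytes), which this page does not specify.

```javascript
// Sketch only: reject an archive before any parsing starts.
// Assumption: MB means 1024 * 1024 bytes; the page does not say which unit is used.
const MB = 1024 * 1024;

const TIER_LIMITS = {
  "free": { maxBytes: 50 * MB, maxEntries: 500 },
  "pro": { maxBytes: 500 * MB, maxEntries: 50000 },
  "pro-media": { maxBytes: 2048 * MB, maxEntries: 500000 },
  "developer": { maxBytes: 2048 * MB, maxEntries: 500000 },
};

function checkLimits(tier, sizeBytes, entryCount) {
  const limits = TIER_LIMITS[tier];
  if (!limits) throw new Error("Unknown tier: " + tier);
  if (sizeBytes > limits.maxBytes) return { ok: false, reason: "size" };
  if (entryCount > limits.maxEntries) return { ok: false, reason: "entries" };
  return { ok: true };
}
```

A 10 MB ZIP with 501 entries would fail on the free tier with reason "entries" even though it is well under the size cap.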
Cookbook
Real archives, real output. Every example shows the exact five-column CSV the tool produces — note that the ratio column is only meaningful for ZIP inputs.
Profiling a bloated project ZIP
A teammate sends a project.zip that unpacks to about 38 MB and you want to know where the weight is before unzipping. Drop it in; the CSV sorts the largest type to the top, which here is the PNG assets, not the source.
Output: project-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    png,142,24180736,23905111,1.1%
    js,318,9842110,2110874,78.6%
    json,57,2204418,401221,81.8%
    map,40,1980221,388110,80.4%
    (no extension),6,40221,40221,0.0%

Distinct types: 5 | Total entries: 563

Read at a glance: PNGs are 24 MB and barely compress (1.1%); that is the dead weight.
A TAR.GZ where the ratio column reads 0.0%
Non-ZIP formats are decompressed before counting, so the tool only sees the uncompressed bytes — compressedSize equals uncompressedSize and ratio is 0.0% for every row. This is expected, not a bug.
Input: logs-2026-06.tar.gz

Output: logs-2026-06-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    log,1204,1875432110,1875432110,0.0%
    gz,18,40221884,40221884,0.0%
    json,4,221044,221044,0.0%

Distinct types: 3 | Total entries: 1226

For real compression ratios, convert to ZIP first or use compression-ratio-calculator.
Auditing an encrypted ZIP without the password
The payload is AES-encrypted but the central directory is not, so the breakdown still lists every extension, count, and size. Useful for screening an archive you have received but cannot yet open.
Input: confidential-export.zip (AES-256 encrypted)

Output: confidential-export-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    csv,12,88204110,21044887,76.1%
    pdf,3,9920114,9810022,1.1%
    xlsx,2,4420118,4310229,2.5%

Distinct types: 3 | Total entries: 17

No password was required; only the encrypted file bytes need the key.
Counting dotfiles and extension-less files
Source-code archives are full of files such as Dockerfile, LICENSE, and .gitignore. The tool groups names with no dot under (no extension), and for a dotfile it treats the text after the leading dot as the extension, so .gitignore is counted as gitignore.
Input: repo-snapshot.zip

Output: repo-snapshot-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    ts,410,8204110,1920448,76.6%
    md,22,180221,72104,60.0%
    (no extension),9,40221,40221,0.0%
    gitignore,1,412,412,0.0%

Distinct types: 4 | Total entries: 442

Dockerfile, LICENSE, and Makefile all land in (no extension).
Spotting case-folded duplicates merging
Mixed-case extensions from a Windows-built archive collapse together because the tool lower-cases before grouping. .JPG, .Jpg, and .jpg become a single jpg row.
Entries in archive: IMG_001.JPG, IMG_002.jpg, IMG_003.Jpg, banner.PNG, banner2.png

Output rows (sizes omitted):

    jpg,3,...
    png,2,...

If you need the original casing kept separate, use archive-metadata-extractor for the raw, un-folded entry list.
Edge cases and what actually happens
Archive over the tier size cap
Rejected (tier limit): The free tier rejects any archive over 50 MB before processing. Pro raises this to 500 MB, Pro-media and Developer to 2 GB. The check happens up front, so you see the limit message immediately rather than after a long parse.
More than 500 entries on the free tier
Rejected (entry limit): Archives are also capped by entry count: 500 on Free, 50,000 on Pro, 500,000 on Pro-media and Developer. A ZIP with thousands of tiny files can hit the entry cap well before the size cap. Upgrade or split the archive first.
Ratio column shows 0.0% on a TAR.GZ / 7z
By design: Only ZIP exposes per-entry compressed sizes (via the central directory). Every other format is decompressed first, so compressedSize equals uncompressedSize and the ratio reads 0.0%. This is expected behaviour, not corruption.
Ratio shows an em dash (—)
Expected: When an extension's uncompressed total is zero (e.g. a group of empty placeholder files), the ratio is undefined and is rendered as — rather than a misleading percentage.
Folders counted as a type
Preserved: Directory entries are skipped entirely; they never appear as a row and never inflate count. Total entries reflects files only, so it can differ from a raw unzip -l listing that includes directory rows.
A renamed file (e.g. RAR saved as .zip)
Detected by signature: The tool detects format from the file's magic bytes, not the name. A RAR renamed to .zip is read as RAR via libarchive. If detection fails, run auto-format-detector to confirm the true format.
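Signature-based detection can be sketched as a series of byte comparisons against well-known magic numbers. This is not the tool's detector: `detectFormat`, `startsWith`, and `ascii` are hypothetical names, and the sketch only identifies the outermost container (a .tar.gz shows up as gz until it is decompressed).

```javascript
// Sketch only: identify an archive from its leading bytes, ignoring the file name.
function startsWith(bytes, signature, offset = 0) {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((b, i) => bytes[offset + i] === b);
}

function ascii(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

function detectFormat(bytes) {
  // "PK\x03\x04" is a local file header; "PK\x05\x06" is an empty ZIP.
  if (startsWith(bytes, [0x50, 0x4b, 0x03, 0x04]) ||
      startsWith(bytes, [0x50, 0x4b, 0x05, 0x06])) return "zip";
  if (startsWith(bytes, [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07])) return "rar"; // "Rar!\x1A\x07"
  if (startsWith(bytes, [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c])) return "7z";
  if (startsWith(bytes, [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00])) return "xz";
  if (startsWith(bytes, [0x1f, 0x8b])) return "gz";
  if (startsWith(bytes, ascii("BZh"))) return "bz2";
  if (startsWith(bytes, ascii("ustar"), 257)) return "tar";   // POSIX tar magic
  if (startsWith(bytes, ascii("CD001"), 32769)) return "iso"; // ISO 9660 volume descriptor
  return "unknown";
}
```

With this approach a RAR file saved as backup.zip still returns "rar", because the name is never consulted.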
A single .gz file
Supported (one row): A bare .gz holds one member, so the breakdown returns a single extension row (the inner file's extension). If you expected many rows, the file is probably a .tar.gz mislabelled as .gz, or it genuinely contains one file.
Encrypted ZIP with no password supplied
Supported: Filenames and sizes live in the unencrypted central directory, so the breakdown succeeds without a password. Only the file payloads would need a key, and this tool never reads them.
Truly corrupt archive
Error: If the ZIP central directory is unreadable and the libarchive fallback also fails, the run errors out. Try corrupted-zip-repair on a ZIP, or confirm the format with auto-format-detector before retrying.
Browser extension blocks WebAssembly
Error: libarchive-backed formats (7z, rar, xz, iso) need WASM. A privacy extension that blocks WebAssembly can stall the worker. Retry in a private/incognito window with extensions disabled.
Frequently asked questions
Is File Type Breakdown really free?
Yes. The free tier processes archives up to 50 MB with up to 500 entries, with no signup. Larger archives need a paid tier (Pro: 500 MB / 50,000 entries; Pro-media and Developer: 2 GB / 500,000 entries), but the tool itself is free to use within those limits.
Do my files get uploaded?
No. The archive is read entirely in your browser via the File API. ZIPs are parsed from the central directory and other formats are decompressed in a Web Worker — all client-side. Nothing about the archive's contents is sent to a server.
What formats can it read?
ZIP (including AES/ZipCrypto-encrypted), GZIP, TAR, TAR.GZ, TAR.BZ2, TAR.XZ, BZ2, XZ, 7z, RAR, and ISO. ZIP/GZIP/TAR use fflate; the rest use a libarchive WASM bridge. All reads are local.
Why is the ratio 0.0% for my .tar.gz?
Only ZIP stores per-entry compressed sizes that the tool can read without decompressing. For TAR.GZ, 7z, RAR, and the others, the tool decompresses first and then only knows the uncompressed size, so compressed equals uncompressed and the ratio is 0.0%. That is expected.
Can it open password-protected ZIPs?
For the breakdown, yes — and without the password. Extension names and sizes are stored in the unencrypted central directory. You only need a password to read the actual file contents, which this tool does not do.
Does it look inside the files to detect type?
No. Grouping is purely by the text after the last dot in each entry name, lower-cased. A .txt that is really a renamed PDF is counted as txt. For content-based identification you would need a different approach.
What happens to files with no extension?
They are grouped under (no extension). Common members there are Dockerfile, LICENSE, README, and Makefile. A large (no extension) count usually signals a source-code or build archive.
Are uppercase and lowercase extensions separated?
No. Extensions are lower-cased before grouping, so .JPG and .jpg merge into one jpg row. If you need case preserved, use the archive-metadata-extractor for the raw entry list.
Can I run it offline?
Once the page (and, for non-ZIP formats, the WASM module) has loaded, the parsing itself is local and does not need the network. A fresh visit needs to download the page assets first.
What is the difference from the Size Analyser?
archive-size-analyzer groups by both extension and folder with rollup totals — richer but heavier. File Type Breakdown is just the extension grouping: simpler, faster, ideal for a quick what-is-in-here check.
Can it merge several archives into one report?
No — it processes one archive per run. To combine, run each archive separately and concatenate the CSVs in a spreadsheet, or first combine the archives with a sibling tool and then run the breakdown on the result.
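If you would rather script the concatenation than use a spreadsheet, the sketch below merges several downloaded reports by extension and recomputes the ratio from the summed sizes. The function names `parseReport` and `mergeReports` are hypothetical, and the parser assumes no extension contains a comma, which would need a proper CSV parser.

```javascript
// Sketch only: combine several <archive-name>-types.csv reports into one table.
// Assumption: extensions never contain commas, so a plain split(",") is enough.
function parseReport(csvText) {
  return csvText.trim().split(/\r?\n/).slice(1).map((line) => {
    const [extension, count, uncompressedSize, compressedSize] = line.split(",");
    return {
      extension,
      count: Number(count),
      uncompressedSize: Number(uncompressedSize),
      compressedSize: Number(compressedSize),
    };
  });
}

function mergeReports(csvTexts) {
  const merged = new Map();
  for (const csvText of csvTexts) {
    for (const row of parseReport(csvText)) {
      const group = merged.get(row.extension) ||
        { extension: row.extension, count: 0, uncompressedSize: 0, compressedSize: 0 };
      group.count += row.count;
      group.uncompressedSize += row.uncompressedSize;
      group.compressedSize += row.compressedSize;
      merged.set(row.extension, group);
    }
  }
  return [...merged.values()]
    .sort((a, b) => b.uncompressedSize - a.uncompressedSize)
    .map((g) => ({
      ...g,
      // Recompute from the sums; averaging the per-report percentages would be wrong.
      ratio: g.uncompressedSize === 0
        ? "—"
        : ((1 - g.compressedSize / g.uncompressedSize) * 100).toFixed(1) + "%",
    }));
}
```

Keep in mind that a merged ratio is only meaningful when every input report came from a ZIP; mixing in a TAR.GZ or 7z report pulls the combined figure toward 0.0%.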
How do I see real compression savings across the whole archive?
For a single overall ratio, use compression-ratio-calculator. For per-extension savings, File Type Breakdown's ratio column gives that directly — but only for ZIP inputs, where compressed sizes are known.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.