How to use File Type Breakdown for security and compliance audits
- Step 1: Stage the archive locally. Copy the archive to the analyst machine from your evidence store, backup vault, or pipeline artifact bucket. The tool reads from disk via the File API; nothing transits a network.
- Step 2: Open the tool and check tier. Go to /archive-tools/file-type-breakdown. Free covers 50 MB / 500 entries; for larger deliverables or backups, the Pro tier raises the cap to 500 MB / 50,000 entries (2 GB / 500,000 on higher tiers).
- Step 3: Drop and parse without decrypting. Drop the ZIP. The breakdown reads the central directory only and never touches encrypted payloads, so you get the type profile of a password-protected archive without the password.
- Step 4: Scan for unexpected extensions. Read the CSV top-down. Anything executable or credential-like (exe, dll, bat, sh, ps1, pem, key, p12) in an archive that should be documents or data is a finding to escalate.
- Step 5: Sanity-check proportions. Use the size columns to catch outliers: a single extension dominating the uncompressed total, or a type that should not be present consuming megabytes. Both warrant a closer look.
- Step 6: Record the evidence. Save <archive-name>-types.csv and attach it to the ticket or audit record. For folder-level rollups or per-file detail, follow up with archive-size-analyzer or archive-metadata-extractor.
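Once the CSV is saved, the scan in Step 4 can be partly scripted. The following is a minimal sketch, not part of the tool: it assumes the column layout shown in the sample outputs on this page (extension,count,uncompressedSize,compressedSize,ratio), and the `EXPECTED` allowlist and `unexpected_rows` function are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical allowlist for a documents-only deliverable; tune it per engagement.
EXPECTED = {"pdf", "docx", "xlsx", "csv", "txt"}

def unexpected_rows(csv_text, expected=EXPECTED):
    """Return the CSV rows whose extension is not on the allowlist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["extension"].strip().lower() not in expected]

# Rows taken from the vendor deliverable scenario on this page.
sample = (
    "extension,count,uncompressedSize,compressedSize,ratio\n"
    "pdf,42,88204110,86110044,2.4%\n"
    "exe,1,2204110,2150448,2.4%\n"
    "bat,1,418,210,49.8%\n"
)

for row in unexpected_rows(sample):
    print(f"escalate: {row['extension']} x{row['count']}, {row['uncompressedSize']} bytes")
```

An allowlist suits deliverables with a known statement of work, because anything outside the agreed types is surfaced even if it is not on a known red-flag list.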
Compliance questions the breakdown answers (and the ones it does not)
File Type Breakdown classifies by filename extension only. Use it for the screening questions below; pair it with content-aware tooling for the rest.
| Question | Can the breakdown answer it? | How / what to use instead |
|---|---|---|
| Are there executables in this doc-only deliverable? | Yes (by extension) | Scan for exe, dll, bat, sh, ps1 rows in the CSV |
| Are credential files present (.pem, .key, .p12)? | Yes (by extension) | Look for those extensions in the type list |
| What proportion of the archive is each type? | Yes | Sort the CSV by uncompressedSize |
| Can I profile it without the password? | Yes, for ZIP | Central directory is unencrypted; no key needed |
| Is this .txt actually a renamed executable? | No | Extension-based only; needs content/magic-byte inspection of each member |
| Does any file contain PII / secrets? | No | Content scanning is out of scope — extract and scan with a DLP tool |
| Has any file been tampered with? | No | Use a checksum/hash workflow to compare against a known-good manifest |
Extension red flags to scan for
Quick reference for triage. Presence is a signal to investigate, not proof of compromise — the tool reports extensions, not behaviour.
| Category | Extensions | Why it is worth a second look in a deliverable/backup |
|---|---|---|
| Executables / scripts | exe, dll, bat, cmd, sh, ps1, vbs, jar | Unexpected runnable code in a documents-only or data-only package |
| Credentials / keys | pem, key, p12, pfx, keystore, env | Secrets that should never ship inside a deliverable or backup |
| Config / infra | tf, yaml, yml, conf, ini | Infrastructure config that may embed endpoints or secrets |
| Databases / dumps | sql, db, sqlite, bak | Bulk data exports — often the unexpectedly-large outlier |
| Archives within archives | zip, 7z, rar, gz | Nested archives the breakdown counts as one entry — extract to recurse |
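The table above can be carried into a script as a simple lookup. This is an illustrative sketch only: the category names and extensions are copied from the table, while `RED_FLAGS` and `categorize` are hypothetical names and not something the tool exposes.

```python
# Categories and extensions copied from the red-flag table on this page.
RED_FLAGS = {
    "Executables / scripts": {"exe", "dll", "bat", "cmd", "sh", "ps1", "vbs", "jar"},
    "Credentials / keys": {"pem", "key", "p12", "pfx", "keystore", "env"},
    "Config / infra": {"tf", "yaml", "yml", "conf", "ini"},
    "Databases / dumps": {"sql", "db", "sqlite", "bak"},
    "Archives within archives": {"zip", "7z", "rar", "gz"},
}

def categorize(extension):
    """Return the red-flag category for an extension, or None if it is not listed."""
    ext = extension.lower().lstrip(".")
    for category, extensions in RED_FLAGS.items():
        if ext in extensions:
            return category
    return None

for ext in ["pdf", "pem", "ps1", "sql"]:
    print(ext, "->", categorize(ext) or "not on the red-flag list")
```

A hit is still only a signal to investigate: the lookup knows nothing about what the file contains.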
Tier limits for the archive family
File Type Breakdown is an analysis tool, so the binding limits are usually the per-archive entry count and the file-size cap — both checked before processing starts. Limits are shared across every archive tool.
| Tier | Max archive size | Max entries per archive | Files per run |
|---|---|---|---|
| Free | 50 MB | 500 entries | 1 |
| Pro | 500 MB | 50,000 entries | 20 |
| Pro-media | 2 GB | 500,000 entries | 100 |
| Developer | 2 GB | 500,000 entries | unlimited |
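To decide in advance which tier an archive needs, the size and entry limits in the table can be checked in a few lines. This sketch assumes binary units (1 MB = 1024 × 1024 bytes), which the page does not state, and it ignores the files-per-run column; `TIERS` and `smallest_tier` are hypothetical names.

```python
MB = 1024 ** 2  # assumption: binary units; the page does not say which it uses
GB = 1024 ** 3

# Limits copied from the tier table, ordered from lowest to highest tier.
TIERS = {
    "Free": {"max_bytes": 50 * MB, "max_entries": 500},
    "Pro": {"max_bytes": 500 * MB, "max_entries": 50_000},
    "Pro-media": {"max_bytes": 2 * GB, "max_entries": 500_000},
    "Developer": {"max_bytes": 2 * GB, "max_entries": 500_000},
}

def smallest_tier(size_bytes, entries):
    """Return the first tier whose size and entry limits both fit, or None."""
    for name, limits in TIERS.items():  # dicts preserve insertion order
        if size_bytes <= limits["max_bytes"] and entries <= limits["max_entries"]:
            return name
    return None

print(smallest_tier(10 * MB, 501))    # small archive, but too many entries for Free
print(smallest_tier(600 * MB, 1200))  # too large for Pro
```

Both limits bind independently, so a small archive with many entries can still need a higher tier.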
Cookbook
Triage scenarios from real audit work. The breakdown narrows where to look; content-aware tools take it from there.
Executable hiding in a docs-only deliverable
A vendor sends a ZIP described as 'PDF reports and spreadsheets'. The breakdown shows an exe and a bat row — an immediate escalation before anyone double-clicks anything.
Input: vendor-q2-deliverable.zip
Output: vendor-q2-deliverable-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    pdf,42,88204110,86110044,2.4%
    xlsx,11,12044118,8810229,26.9%
    exe,1,2204110,2150448,2.4%    <-- unexpected
    bat,1,418,210,49.8%    <-- unexpected

Finding: runnable code in a documents-only package. Quarantine and review.
Credential files in a backup
A backup archive should hold app data only. A .pem and .env row reveal secrets were swept into the backup — a compliance issue regardless of intent.
Input: app-backup-2026-06.tar.gz
Output: app-backup-2026-06-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    json,1840,402211044,402211044,0.0%
    log,512,188044110,188044110,0.0%
    pem,2,3221,3221,0.0%    <-- private keys in a backup
    env,1,884,884,0.0%    <-- secrets in a backup

Finding: rotate the exposed keys and fix the backup exclusion rules.
Profiling an encrypted deliverable before getting the key
The deliverable is AES-encrypted and the password is still in transit from the vendor. The breakdown reads the unencrypted central directory so you can verify the type makeup matches the statement of work first.
Input: secure-handoff.zip (AES-256, no password yet)
Output: secure-handoff-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    docx,18,40221110,9810448,75.6%
    pdf,9,22044118,21810229,1.1%
    csv,4,8804110,2210448,74.9%

No password needed; only the contents would require the key. Profile matches the SOW; proceed to request the key.
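The same idea can be tried outside the browser. The sketch below uses Python's standard `zipfile` module, whose `infolist()` is populated from the archive's central directory when the file is opened, so no member data is read or decrypted. It is not the tool's implementation: `extension_of` and `breakdown` are hypothetical names, and the rule for dotfiles such as `.env` (treated here as extension `env`) is an assumption about how the tool classifies them.

```python
import io
import zipfile
from collections import defaultdict

def extension_of(filename):
    """Lowercased text after the last dot of the base name, or "(none)"."""
    base = filename.rsplit("/", 1)[-1]
    return base.rsplit(".", 1)[-1].lower() if "." in base else "(none)"

def breakdown(zip_source):
    """Sum count and sizes per extension using entry metadata only."""
    totals = defaultdict(lambda: {"count": 0, "uncompressed": 0, "compressed": 0})
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():  # metadata parsed from the central directory
            if info.is_dir():
                continue
            row = totals[extension_of(info.filename)]
            row["count"] += 1
            row["uncompressed"] += info.file_size
            row["compressed"] += info.compress_size
    return dict(totals)

# Demo on a small in-memory archive. It is unencrypted because the standard
# library cannot write encrypted ZIPs; the metadata path is the same.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("reports/q2.pdf", b"x" * 100)
    zf.writestr("reports/q3.pdf", b"y" * 50)
    zf.writestr(".env", b"KEY=value")
buf.seek(0)
profile = breakdown(buf)
print(profile)
```

Because only `ZipInfo` metadata is touched, the function never calls `read()` on a member, which is the step that would require a password.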
Catching an oversized data dump
A pipeline artifact is unexpectedly huge. The breakdown shows a single .sql row accounting for nearly all of it — a database dump that should not be in the build output.
Input: ci-artifact-build-4412.zip
Output: ci-artifact-build-4412-types.csv

    extension,count,uncompressedSize,compressedSize,ratio
    sql,1,1980221884,402211044,79.7%    <-- 1.98 GB dump
    js,318,9842110,2110874,78.6%
    map,40,1980221,388110,80.4%

Finding: a prod DB dump leaked into the build artifact. Remove and audit access.
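The "nearly all of it" judgement can be made explicit by computing each extension's share of the uncompressed total. A small sketch using the figures from this scenario follows; the 80% threshold and the `shares` and `dominant` names are hypothetical choices, not tool settings.

```python
# Uncompressed sizes per extension, taken from the scenario above.
sizes = {"sql": 1980221884, "js": 9842110, "map": 1980221}

def shares(sizes):
    """Return each extension's fraction of the total uncompressed size."""
    total = sum(sizes.values())
    if total == 0:
        return {ext: 0.0 for ext in sizes}
    return {ext: size / total for ext, size in sizes.items()}

def dominant(sizes, threshold=0.8):
    """List extensions whose share meets the (hypothetical) threshold."""
    return [ext for ext, share in shares(sizes).items() if share >= threshold]

for ext, share in shares(sizes).items():
    print(f"{ext}: {share:.1%}")
print("dominant:", dominant(sizes))
```

Shares are computed on uncompressed sizes because highly compressible data, such as an SQL dump, can look modest in the compressed column.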
Why this is screening, not proof
A .txt row looks harmless, but extension classification cannot confirm a renamed payload. The breakdown narrows scope; a content/magic-byte check on the suspicious member is the next step.
Output row (looks benign):

    txt,200,4402110,1810448,58.9%

Reality: one 'notes.txt' is a renamed executable. File Type Breakdown classifies by extension only, so it will not catch this.
Next step: extract with selective-extractor and run a content/magic-byte check on members flagged by size or path anomalies.
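A magic-byte check compares the first bytes of an extracted member against well-known file signatures. The sketch below shows the idea with a handful of widely documented signatures; the list is far from complete, the `sniff` and `mismatch` helpers are hypothetical, and a dedicated file-identification tool would be the sturdier choice in practice.

```python
# A few widely documented leading-byte signatures; deliberately incomplete.
SIGNATURES = [
    (b"MZ", "Windows executable (PE/DOS)"),
    (b"\x7fELF", "ELF executable"),
    (b"#!", "script with a shebang line"),
    (b"PK\x03\x04", "ZIP-based container"),
    (b"%PDF-", "PDF document"),
]

def sniff(head):
    """Return a label if the leading bytes match a known signature, else None."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None

def mismatch(filename, head):
    """True when a .txt member starts with one of the signatures above."""
    return filename.lower().endswith(".txt") and sniff(head) is not None

print(sniff(b"MZ\x90\x00\x03"))            # bytes typical of a PE header
print(mismatch("notes.txt", b"MZ\x90\x00"))
print(mismatch("notes.txt", b"meeting notes"))
```

Short signatures such as `MZ` can also match ordinary text that happens to start with those letters, so a hit is a reason to inspect the file, not a verdict.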
Edge cases and what actually happens
Encrypted ZIP, no password
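Supported: The breakdown reads the central directory, which standard ZIP encryption leaves unencrypted, so extension counts and sizes are available for a password-protected ZIP without the key. This is what makes pre-key screening possible. Only file contents need the password, and this tool never reads them. The exception is an archive written with the ZIP specification's optional central directory encryption, which hides filenames as well.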
Renamed file (e.g. .exe as .txt)
Not detected: Classification is by filename extension only. A malicious executable renamed to .txt is counted as txt. Treat the breakdown as a triage filter and follow up with content/magic-byte inspection on suspicious members via selective-extractor.
Nested archive hiding files
Not expanded: A .zip or .7z inside the archive is counted as one entry of that extension; its contents are not expanded. For defence-in-depth, extract nested archives with multi-format-extractor and run the breakdown on each.
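For scripted follow-up, recursion into nested archives might look like the sketch below. It handles nested .zip members only, reads each nested archive into memory (so it needs unencrypted, reasonably small members), and caps the depth as a basic guard against archive bombs. The `walk` and `make_zip` helpers and the `!/` path separator are hypothetical conventions for this example.

```python
import io
import zipfile

def walk(zip_source, prefix="", max_depth=3):
    """Yield (path, uncompressed_size) for every file, descending into nested .zip members."""
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            yield path, info.file_size
            if max_depth > 0 and info.filename.lower().endswith(".zip"):
                nested = io.BytesIO(zf.read(info))  # decompresses the member in memory
                yield from walk(nested, prefix=path + "!/", max_depth=max_depth - 1)

def make_zip(members):
    """Build a small in-memory ZIP for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()

inner = make_zip({"payload.exe": b"MZ"})
outer = make_zip({"docs/a.pdf": b"%PDF-", "inner.zip": inner})
listing = list(walk(io.BytesIO(outer)))
for path, size in listing:
    print(path, size)
```

Unlike the central-directory profile, this does read member contents, so it belongs after the password has arrived and on a machine cleared to open the archive.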
Archive exceeds tier limit
Rejected (tier limit): Large backups can exceed the 50 MB free cap or the 500-entry free limit. Pro raises these to 500 MB / 50,000 entries and higher tiers to 2 GB / 500,000. The check runs before processing, so nothing partial is produced.
Content scanning expected
Out of scope: The tool reports extensions, not file contents, so it cannot find PII, secrets, or malware signatures inside files. For that, extract and run a DLP or AV pipeline. The breakdown only tells you where to point those tools.
Tamper detection expected
Out of scope: File Type Breakdown does not verify integrity or compare against a baseline. For tamper detection, hash the entries and diff against a known-good manifest, or compare two archives with archive-diff.
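A hash-and-diff workflow can be sketched with the standard library alone. The example assumes unencrypted ZIP archives and uses SHA-256; the `manifest`, `diff`, and `build` helpers are hypothetical names, and a real workflow would load the known-good manifest from a trusted record instead of building it in the same script.

```python
import hashlib
import io
import zipfile

def manifest(zip_source):
    """Map each member name to the SHA-256 hex digest of its contents."""
    with zipfile.ZipFile(zip_source) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }

def diff(baseline, current):
    """Report members added, removed, or changed relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(name for name in shared if baseline[name] != current[name]),
    }

def build(members):
    """Build a small in-memory ZIP for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf

known_good = manifest(build({"a.txt": b"one", "b.txt": b"two"}))
received = manifest(build({"a.txt": b"one", "b.txt": b"TWO", "c.exe": b"MZ"}))
result = diff(known_good, received)
print(result)
```

Hashing member contents, not the archive file as a whole, keeps the comparison stable when the same files are re-zipped with different timestamps or compression settings.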
Ratio reads 0.0% on a TAR.GZ backup
By design: Non-ZIP formats are decompressed before counting, so the compressed column equals the uncompressed column and the ratio is 0.0%. The counts and uncompressed totals are still fully accurate for audit purposes.
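The sample rows on this page are consistent with the ratio being the space saved, that is 1 − compressed / uncompressed, shown to one decimal place. That formula is inferred from the samples and is not documented tool behaviour; the `ratio` function below is a hypothetical sketch of it.

```python
def ratio(uncompressed, compressed):
    """Space saved as a percentage string; inferred formula, not a documented one."""
    if uncompressed == 0:
        return "0.0%"  # avoid dividing by zero for empty entries
    return f"{(1 - compressed / uncompressed) * 100:.1f}%"

print(ratio(88204110, 86110044))  # pdf row from the vendor deliverable sample
print(ratio(418, 210))            # bat row from the same sample
print(ratio(3221, 3221))          # pem row from the TAR.GZ backup sample
```

When the two size columns are equal, as they are for every TAR.GZ row, the expression reduces to zero, which is why those backups always read 0.0%.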
Air-gapped analyst machine
Supported (after load): Once the page and any required WASM module are loaded, parsing is local and needs no network. On a fully air-gapped host you would need the assets cached first; the analysis itself does not phone home.
Frequently asked questions
Does the archive leave our network?
No. It is read entirely in the analyst's browser via the File API — ZIPs from the central directory, other formats decompressed in a local Web Worker. No archive bytes are uploaded, which is the point for sensitive deliverables and evidence.
Can it profile an encrypted ZIP we do not have the password for?
Yes. Filenames, extensions, and sizes live in the unencrypted central directory, so the breakdown works without the password. Only the file contents are encrypted, and this tool does not read them.
Will it tell me if a file contains secrets or PII?
No. It classifies by filename extension, not content. It can flag that a .pem or .env file is present, but to inspect what is inside any file you must extract it and run a content scanner or DLP tool.
Can it detect a malicious file renamed to look harmless?
Not on its own — a .exe renamed to .txt is counted as txt. Use the breakdown to spot anomalies (unexpected types, outsized files), then magic-byte/content-check the suspicious members with a dedicated tool.
Does it count files inside nested archives?
No. A nested .zip or .7z is one entry of that extension; its contents are not expanded. Extract nested archives separately and run the breakdown on each for full coverage.
Is the output suitable as audit evidence?
Yes — it is a plain, deterministic CSV with per-extension counts and byte totals plus two metrics (Distinct types, Total entries). Attach it to a ticket or evidence record; it reproduces exactly for anyone who runs the same archive.
Can analysts use it on a locked-down corporate laptop?
Yes. There is nothing to install and no admin rights needed — it runs in the browser. That is often easier than getting p7zip or unrar approved on a managed endpoint.
What extensions should raise a flag in a deliverable?
Executables and scripts (exe, dll, bat, sh, ps1), credentials (pem, key, p12, env), and unexpected bulk data (sql, bak) are common red flags in a package that should contain only documents or data. Presence is a signal to investigate, not proof.
How large an archive can we screen?
Up to the tier cap and entry limit: 50 MB / 500 entries free, 500 MB / 50,000 on Pro, 2 GB / 500,000 on Pro-media and Developer. Both the size and entry count are checked before processing.
Can it compare a deliverable against a known-good baseline?
Not directly — it profiles one archive. To detect changes or tampering, hash the contents and compare to a manifest, or diff two archives with archive-diff.
Why does the ratio column read 0.0% on our .tar.gz backups?
Only ZIP exposes per-entry compressed sizes without decompressing. TAR.GZ and other formats are decompressed first, so the tool only knows the uncompressed size and the ratio is 0.0%. Counts and uncompressed totals remain accurate.
Does using it create any server-side record of our data?
No archive content is stored server-side. The only server interaction is an optional usage counter for signed-in dashboard stats — it records that a file was processed, never its contents or filenames.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.