File Type Breakdown for Security & Compliance — Audit Archive Contents Offline

How to run a file type breakdown for security and compliance audits

Step 1
Stage the archive locally — Copy the archive to the analyst machine from your evidence store, backup vault, or pipeline artifact bucket. The tool reads from disk via the File API; nothing transits a network.
Step 2
Open the tool and check tier — Go to /archive-tools/file-type-breakdown. Free covers 50 MB / 500 entries; for larger deliverables or backups, a Pro tier raises the cap to 500 MB / 50,000 entries (2 GB / 500,000 on higher tiers).
Step 3
Drop and parse without decrypting — Drop the ZIP. The breakdown reads the central directory only — it never touches encrypted payloads — so you get the type profile even on a password-protected archive without the password.
Step 4
Scan for unexpected extensions — Read the CSV top-down. Anything executable or credential-like (exe, dll, bat, sh, ps1, pem, key, p12) in an archive that should be documents or data is a finding to escalate.
Step 5
Sanity-check proportions — Use the size columns to catch outliers: a single extension dominating the uncompressed total, or a type that should not be present consuming megabytes. Either one warrants a closer look.
Step 6
Record the evidence — Save <archive-name>-types.csv and attach it to the ticket or audit record. For folder-level rollups or per-file detail, follow up with archive-size-analyzer or archive-metadata-extractor.

Compliance questions the breakdown answers (and the ones it does not)

File Type Breakdown classifies by filename extension only. Use it for the screening questions below; pair it with content-aware tooling for the rest.

Question	Can the breakdown answer it?	How / what to use instead
Are there executables in this doc-only deliverable?	Yes (by extension)	Scan for `exe`, `dll`, `bat`, `sh`, `ps1` rows in the CSV
Are credential files present (.pem, .key, .p12)?	Yes (by extension)	Look for those extensions in the type list
What proportion of the archive is each type?	Yes	Sort the CSV by uncompressedSize
Can I profile it without the password?	Yes, for ZIP	Central directory is unencrypted; no key needed
Is this .txt actually a renamed executable?	No	Extension-based only; needs content/magic-byte inspection of each member
Does any file contain PII / secrets?	No	Content scanning is out of scope — extract and scan with a DLP tool
Has any file been tampered with?	No	Use a checksum/hash workflow to compare against a known-good manifest
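
The extension-only classification described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the `ArchiveEntry` and `TypeRow` shapes, the function names, and the `(none)` label for extension-less files are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the tool's internal types are not documented here.
interface ArchiveEntry {
  path: string; // full path inside the archive
  uncompressedSize: number;
  compressedSize: number;
}

interface TypeRow {
  extension: string;
  count: number;
  uncompressedSize: number;
  compressedSize: number;
}

// Lowercased text after the last dot of the final path segment.
// ".env" yields "env"; names with no dot or a trailing dot yield "(none)" (an assumed label).
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot >= 0 ? base.slice(dot + 1).toLowerCase() : "";
  return ext === "" ? "(none)" : ext;
}

// Group entries by extension and sum counts and sizes, largest uncompressed total first.
function breakdownByExtension(entries: ArchiveEntry[]): TypeRow[] {
  const rows = new Map<string, TypeRow>();
  for (const entry of entries) {
    if (entry.path.endsWith("/")) continue; // skip directory entries
    const ext = extensionOf(entry.path);
    let row = rows.get(ext);
    if (row === undefined) {
      row = { extension: ext, count: 0, uncompressedSize: 0, compressedSize: 0 };
      rows.set(ext, row);
    }
    row.count += 1;
    row.uncompressedSize += entry.uncompressedSize;
    row.compressedSize += entry.compressedSize;
  }
  return Array.from(rows.values()).sort((a, b) => b.uncompressedSize - a.uncompressedSize);
}
```

Because only the name is inspected, a renamed executable called notes.txt lands in the txt row, which is why the "No" rows in the table need content-aware tooling.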

Extension red flags to scan for

Quick reference for triage. Presence is a signal to investigate, not proof of compromise — the tool reports extensions, not behaviour.

Category	Extensions	Why it is worth a second look in a deliverable/backup
Executables / scripts	`exe`, `dll`, `bat`, `cmd`, `sh`, `ps1`, `vbs`, `jar`	Unexpected runnable code in a documents-only or data-only package
Credentials / keys	`pem`, `key`, `p12`, `pfx`, `keystore`, `env`	Secrets that should never ship inside a deliverable or backup
Config / infra	`tf`, `yaml`, `yml`, `conf`, `ini`	Infrastructure config that may embed endpoints or secrets
Databases / dumps	`sql`, `db`, `sqlite`, `bak`	Bulk data exports — often the unexpectedly-large outlier
Archives within archives	`zip`, `7z`, `rar`, `gz`	Nested archives the breakdown counts as one entry — extract to recurse
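
The table above translates directly into a lookup. The sketch below assumes the CSV header shown in the Cookbook samples (`extension,count,uncompressedSize,compressedSize,ratio`); `flagExtensions` and the `RedFlag` shape are hypothetical names introduced for illustration, not part of the tool.

```typescript
// Categories and extensions copied from the red-flag table above.
const RED_FLAGS: Record<string, string[]> = {
  "Executables / scripts": ["exe", "dll", "bat", "cmd", "sh", "ps1", "vbs", "jar"],
  "Credentials / keys": ["pem", "key", "p12", "pfx", "keystore", "env"],
  "Config / infra": ["tf", "yaml", "yml", "conf", "ini"],
  "Databases / dumps": ["sql", "db", "sqlite", "bak"],
  "Archives within archives": ["zip", "7z", "rar", "gz"],
};

// Reverse lookup: extension -> category.
const CATEGORY_BY_EXT = new Map<string, string>();
for (const category of Object.keys(RED_FLAGS)) {
  for (const ext of RED_FLAGS[category]) {
    CATEGORY_BY_EXT.set(ext, category);
  }
}

interface RedFlag {
  extension: string;
  category: string;
  count: number;
}

// Hypothetical helper: expects the breakdown CSV text, header row first.
function flagExtensions(csv: string): RedFlag[] {
  const flagged: RedFlag[] = [];
  const lines = csv.trim().split(/\r?\n/).slice(1); // drop the header row
  for (const line of lines) {
    const cells = line.split(",");
    const extension = (cells[0] ?? "").trim().toLowerCase();
    const category = CATEGORY_BY_EXT.get(extension);
    if (category !== undefined) {
      flagged.push({ extension, category, count: Number(cells[1]) });
    }
  }
  return flagged;
}
```

A non-empty result is a prompt to investigate, in line with the note above: the lookup reports which extensions are present, not what the files do.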

Tier limits for the archive family

File Type Breakdown is an analysis tool, so the binding limits are usually the per-archive entry count and the file-size cap — both checked before processing starts. Limits are shared across every archive tool.

Tier	Max archive size	Max entries per archive	Files per run
Free	50 MB	500 entries	1
Pro	500 MB	50,000 entries	20
Pro-media	2 GB	500,000 entries	100
Developer	2 GB	500,000 entries	unlimited
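
For teams that want to check an archive against these limits before opening the tool, the table can be encoded as a small pre-flight check. This is a sketch under stated assumptions: the names `TIER_LIMITS` and `checkTierLimits` are hypothetical, and MB/GB are treated as binary units because the page does not say which convention applies.

```typescript
type Tier = "Free" | "Pro" | "Pro-media" | "Developer";

interface TierLimit {
  maxArchiveBytes: number;
  maxEntries: number;
  filesPerRun: number; // Infinity stands in for "unlimited"
}

// Assumption: MB and GB are binary units (1 MB = 1024 * 1024 bytes).
const MB = 1024 * 1024;
const GB = 1024 * MB;

// Values copied from the tier table above.
const TIER_LIMITS: Record<Tier, TierLimit> = {
  "Free": { maxArchiveBytes: 50 * MB, maxEntries: 500, filesPerRun: 1 },
  "Pro": { maxArchiveBytes: 500 * MB, maxEntries: 50000, filesPerRun: 20 },
  "Pro-media": { maxArchiveBytes: 2 * GB, maxEntries: 500000, filesPerRun: 100 },
  "Developer": { maxArchiveBytes: 2 * GB, maxEntries: 500000, filesPerRun: Infinity },
};

// Returns the violated limits; an empty list means the archive fits the tier.
function checkTierLimits(tier: Tier, archiveBytes: number, entryCount: number): string[] {
  const limit = TIER_LIMITS[tier];
  const problems: string[] = [];
  if (archiveBytes > limit.maxArchiveBytes) {
    problems.push(`archive size exceeds the ${tier} cap`);
  }
  if (entryCount > limit.maxEntries) {
    problems.push(`entry count exceeds the ${tier} limit`);
  }
  return problems;
}
```

Returning every violated limit, rather than stopping at the first, tells the analyst in one pass whether a tier upgrade alone would be enough.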

Cookbook

Triage scenarios from real audit work. The breakdown narrows where to look; content-aware tools take it from there.

Executable hiding in a docs-only deliverable

A vendor sends a ZIP described as 'PDF reports and spreadsheets'. The breakdown shows an exe and a bat row — an immediate escalation before anyone double-clicks anything.

Input: vendor-q2-deliverable.zip
Output: vendor-q2-deliverable-types.csv

extension,count,uncompressedSize,compressedSize,ratio
pdf,42,88204110,86110044,2.4%
xlsx,11,12044118,8810229,26.8%
exe,1,2204110,2150448,2.4%   <-- unexpected
bat,1,418,210,49.8%          <-- unexpected

Finding: runnable code in a documents-only package. Quarantine and review.

Credential files in a backup

A backup archive should hold app data only. Rows for .pem and .env reveal that secrets were swept into the backup, a compliance issue regardless of intent.

Input: app-backup-2026-06.tar.gz
Output: app-backup-2026-06-types.csv

extension,count,uncompressedSize,compressedSize,ratio
json,1840,402211044,402211044,0.0%
log,512,188044110,188044110,0.0%
pem,2,3221,3221,0.0%   <-- private keys in a backup
env,1,884,884,0.0%     <-- secrets in a backup

Finding: rotate the exposed keys and fix the backup exclusion rules.

Profiling an encrypted deliverable before getting the key

The deliverable is AES-encrypted and the password is still in transit from the vendor. The breakdown reads the unencrypted central directory so you can verify the type makeup matches the statement of work first.

Input: secure-handoff.zip (AES-256, no password yet)
Output: secure-handoff-types.csv

extension,count,uncompressedSize,compressedSize,ratio
docx,18,40221110,9810448,75.6%
pdf,9,22044118,21810229,1.1%
csv,4,8804110,2210448,74.9%

No password needed — only the contents would require the key.
Profile matches the SOW; proceed to request the key.

Catching an oversized data dump

A pipeline artifact is unexpectedly huge. The breakdown shows a single .sql row accounting for nearly all of it — a database dump that should not be in the build output.

Input: ci-artifact-build-4412.zip
Output: ci-artifact-build-4412-types.csv

extension,count,uncompressedSize,compressedSize,ratio
sql,1,1980221884,402211044,79.7%   <-- 1.98 GB dump
js,318,9842110,2110874,78.6%
map,40,1980221,388110,80.4%

Finding: a prod DB dump leaked into the build artifact. Remove and audit access.
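
The "single row accounting for nearly all of it" check can be automated by computing each type's share of the uncompressed total. A minimal sketch follows; `dominantTypes` is a hypothetical helper and the 0.8 default threshold is an arbitrary illustration, not a documented rule.

```typescript
interface SizeRow {
  extension: string;
  uncompressedSize: number;
}

interface TypeShare {
  extension: string;
  share: number; // fraction of the uncompressed total, between 0 and 1
}

// Returns the rows whose share of the uncompressed total meets the threshold.
function dominantTypes(rows: SizeRow[], threshold = 0.8): TypeShare[] {
  const total = rows.reduce((sum, r) => sum + r.uncompressedSize, 0);
  if (total === 0) return []; // nothing to compare in an empty breakdown
  return rows
    .map((r) => ({ extension: r.extension, share: r.uncompressedSize / total }))
    .filter((s) => s.share >= threshold);
}
```

Run against the three rows above, only the sql row crosses the threshold, since it holds more than 99% of the uncompressed bytes.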

Why this is screening, not proof

A .txt row looks harmless, but extension classification cannot confirm a renamed payload. The breakdown narrows scope; a content/magic-byte check on the suspicious member is the next step.

Output row (looks benign):
  txt,200,4402110,1810448,58.9%

Reality: one 'notes.txt' is a renamed executable.
File Type Breakdown classifies by extension only — it will not catch this.
Next step: extract with selective-extractor and run a content/magic-byte check
on members flagged by size or path anomalies.
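
The follow-up check happens outside File Type Breakdown, on bytes extracted by another tool. As a rough sketch of what a magic-byte test looks like, the hypothetical `sniffExecutable` helper below compares the first bytes of a member against three well-known signatures. The list is deliberately short and a match is only a signal: a text file that happens to start with "MZ" would also match.

```typescript
interface Signature {
  label: string;
  bytes: number[];
}

// Well-known leading byte sequences for runnable content.
const EXECUTABLE_SIGNATURES: Signature[] = [
  { label: "Windows PE/DOS executable (MZ)", bytes: [0x4d, 0x5a] },
  { label: "ELF binary", bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { label: "script with shebang (#!)", bytes: [0x23, 0x21] },
];

// Hypothetical helper: takes the first bytes of an already-extracted member.
// Returns the matching label, or null when no listed signature matches.
function sniffExecutable(head: Uint8Array): string | null {
  for (const sig of EXECUTABLE_SIGNATURES) {
    const longEnough = head.length >= sig.bytes.length;
    if (longEnough && sig.bytes.every((b, i) => head[i] === b)) {
      return sig.label;
    }
  }
  return null;
}
```

A member named notes.txt whose first two bytes are 0x4D 0x5A would be reported as a Windows executable here, even though the breakdown counts it as txt.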

Edge cases and what actually happens

Encrypted ZIP, no password

Supported

The breakdown reads the unencrypted central directory, so extension counts and sizes are available for a password-protected ZIP without the key. This is exactly what makes pre-key screening possible. Only file contents need the password, which this tool never reads.
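
To make the mechanism concrete, here is a minimal sketch of reading names, sizes, and the encryption flag from a ZIP central directory without touching any file data. It is not the tool's code (the page says it uses fflate and @zip.js/zip.js); `readCentralDirectory` is a hypothetical function, and the sketch assumes a single-disk archive, no ZIP64 records, UTF-8 names, and no central-directory encryption.

```typescript
interface CentralDirEntry {
  path: string;
  compressedSize: number;
  uncompressedSize: number;
  encrypted: boolean; // bit 0 of the general purpose flag
}

const EOCD_SIGNATURE = 0x06054b50; // end of central directory record
const CDFH_SIGNATURE = 0x02014b50; // central directory file header

function readCentralDirectory(buf: Uint8Array): CentralDirEntry[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);

  // The EOCD record is at least 22 bytes and sits at the end, so scan backwards.
  let eocd = -1;
  for (let i = buf.length - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("End of central directory record not found");

  const totalEntries = view.getUint16(eocd + 10, true);
  let pos = view.getUint32(eocd + 16, true); // offset of the first header

  const decoder = new TextDecoder(); // assumes UTF-8 names
  const entries: CentralDirEntry[] = [];
  for (let n = 0; n < totalEntries; n++) {
    if (view.getUint32(pos, true) !== CDFH_SIGNATURE) {
      throw new Error("Unexpected central directory header signature");
    }
    const flags = view.getUint16(pos + 8, true);
    const nameLen = view.getUint16(pos + 28, true);
    const extraLen = view.getUint16(pos + 30, true);
    const commentLen = view.getUint16(pos + 32, true);
    entries.push({
      path: decoder.decode(buf.subarray(pos + 46, pos + 46 + nameLen)),
      compressedSize: view.getUint32(pos + 20, true),
      uncompressedSize: view.getUint32(pos + 24, true),
      encrypted: (flags & 1) === 1,
    });
    pos += 46 + nameLen + extraLen + commentLen; // fixed part plus variable fields
  }
  return entries;
}
```

Nothing in this loop reads the compressed payloads, which is why no password is required to list names and sizes.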

Renamed file (e.g. .exe as .txt)

Not detected

Classification is by filename extension only. A malicious executable renamed to .txt is counted as txt. Treat the breakdown as a triage filter and follow up with content/magic-byte inspection on suspicious members via selective-extractor.

Nested archive hiding files

Not expanded

A .zip or .7z inside the archive is counted as one entry of that extension — its contents are not expanded. For defence-in-depth, extract nested archives with multi-format-extractor and run the breakdown on each.

Archive exceeds tier limit

Rejected (tier limit)

Large backups can exceed the 50 MB free cap or the 500-entry free limit. Pro raises these to 500 MB / 50,000 entries, and Pro-media and Developer raise them to 2 GB / 500,000. The check runs before processing, so nothing partial is produced.

Content scanning expected

Out of scope

The tool reports extensions, not file contents — it cannot find PII, secrets, or malware signatures inside files. For that, extract and run a DLP or AV pipeline. The breakdown only tells you where to point those tools.

Tamper detection expected

Out of scope

File Type Breakdown does not verify integrity or compare against a baseline. For tamper detection, hash the entries and diff against a known-good manifest, or compare two archives with archive-diff.
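
The manifest comparison itself is simple once digests exist. The sketch below assumes each manifest is a map from member path to a hex digest (for example SHA-256) produced by whatever hashing step you already trust; `diffManifests` is a hypothetical helper, not a feature of File Type Breakdown or archive-diff.

```typescript
// Manifest: member path -> hex digest, computed outside this sketch.
type Manifest = Map<string, string>;

interface ManifestDiff {
  added: string[];   // present now, absent from the baseline
  removed: string[]; // in the baseline, missing now
  changed: string[]; // present in both with different digests
}

function diffManifests(baseline: Manifest, current: Manifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], removed: [], changed: [] };
  current.forEach((digest, path) => {
    const known = baseline.get(path);
    if (known === undefined) {
      diff.added.push(path);
    } else if (known.toLowerCase() !== digest.toLowerCase()) {
      diff.changed.push(path); // hex digests compared case-insensitively
    }
  });
  baseline.forEach((_digest, path) => {
    if (!current.has(path)) diff.removed.push(path);
  });
  diff.added.sort();
  diff.removed.sort();
  diff.changed.sort();
  return diff;
}
```

Three empty lists mean the archive matches the baseline for every listed path; anything else is a change to explain in the audit record.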

Ratio reads 0.0% on a TAR.GZ backup

By design

Non-ZIP formats are decompressed before counting, so the compressed column equals the uncompressed column and the ratio is 0.0%. The counts and uncompressed totals are still fully accurate for audit purposes.
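
The page does not state the ratio formula, but the sample rows in the Cookbook are consistent with "space saved": for example, the bat row (418 bytes uncompressed, 210 compressed) shows 49.8%. Under that assumption, a hypothetical `formatRatio` helper shows why equal columns always produce 0.0%.

```typescript
// Assumed formula, inferred from the sample rows on this page:
//   ratio = (1 - compressedSize / uncompressedSize) * 100, shown with one decimal.
function formatRatio(uncompressedSize: number, compressedSize: number): string {
  if (uncompressedSize <= 0) return "0.0%"; // avoid dividing by zero
  const saved = (1 - compressedSize / uncompressedSize) * 100;
  return `${saved.toFixed(1)}%`;
}
```

When a TAR.GZ row reports the same value in both size columns, the fraction is 1 and the saving is zero, so the 0.0% says nothing about how well the data actually compressed.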

Air-gapped analyst machine

Supported (after load)

Once the page and any required WASM module are loaded, parsing is local and needs no network. On a fully air-gapped host you would need the assets cached first; the analysis itself does not phone home.

Frequently asked questions

Does the archive leave our network?

No. It is read entirely in the analyst's browser via the File API — ZIPs from the central directory, other formats decompressed in a local Web Worker. No archive bytes are uploaded, which is the point for sensitive deliverables and evidence.

Can it profile an encrypted ZIP we do not have the password for?

Yes. Filenames, extensions, and sizes live in the unencrypted central directory, so the breakdown works without the password. Only the file contents are encrypted, and this tool does not read them.

Will it tell me if a file contains secrets or PII?

No. It classifies by filename extension, not content. It can flag that a .pem or .env file is present, but to inspect what is inside any file you must extract it and run a content scanner or DLP tool.

Can it detect a malicious file renamed to look harmless?

Not on its own — a .exe renamed to .txt is counted as txt. Use the breakdown to spot anomalies (unexpected types, outsized files), then magic-byte/content-check the suspicious members with a dedicated tool.

Does it count files inside nested archives?

No. A nested .zip or .7z is one entry of that extension; its contents are not expanded. Extract nested archives separately and run the breakdown on each for full coverage.

Is the output suitable as audit evidence?

Yes — it is a plain, deterministic CSV with per-extension counts and byte totals plus two metrics (Distinct types, Total entries). Attach it to a ticket or evidence record; it reproduces exactly for anyone who runs the same archive.

Can analysts use it on a locked-down corporate laptop?

Yes. There is nothing to install and no admin rights needed — it runs in the browser. That is often easier than getting p7zip or unrar approved on a managed endpoint.

What extensions should raise a flag in a deliverable?

Executables and scripts (exe, dll, bat, sh, ps1), credentials (pem, key, p12, env), and unexpected bulk data (sql, bak) are common red flags in a package that should contain only documents or data. Presence is a signal to investigate, not proof.

How large an archive can we screen?

Up to the tier cap and entry limit: 50 MB / 500 entries free, 500 MB / 50,000 on Pro, 2 GB / 500,000 on Pro-media and Developer. Both the size and entry count are checked before processing.

Can it compare a deliverable against a known-good baseline?

Not directly — it profiles one archive. To detect changes or tampering, hash the contents and compare to a manifest, or diff two archives with archive-diff.

Why does the ratio column read 0.0% on our .tar.gz backups?

Only ZIP exposes per-entry compressed sizes without decompressing. TAR.GZ and other formats are decompressed first, so the tool only knows the uncompressed size and the ratio is 0.0%. Counts and uncompressed totals remain accurate.

Does using it create any server-side record of our data?

No archive content is stored server-side. The only server interaction is an optional usage counter for signed-in dashboard stats — it records that a file was processed, never its contents or filenames.

Privacy first

Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.
