How to use the Size Analyser for security & compliance
- Step 1: Stage the artifact locally. Pull the archive from your evidence store, artifact registry, or build pipeline onto the analyst machine. The tool reads from disk via the File API, and the artifact's bytes never transit a network, so the artifact never leaves your custody.
- Step 2: Hash it first for chain-of-custody. Before inspecting, run /archive-tools/checksum-generator to record the artifact's SHA-256. This pins the exact bytes you analysed so a later 'is this the file we reviewed?' question has a definitive answer.
- Step 3: Run the Size Analyser. Open /archive-tools/archive-size-analyzer and drop the artifact. For artifacts over 50 MB use Pro (500 MB) so you can analyse the whole build without splitting. The format is detected by magic bytes, so a deliberately mis-named file is still read correctly.
- Step 4: Scan byExtension for risk signals. Read the byExtension array for extensions that should never ship in a release: pem, key, pfx, env, bak, map (source maps), or a surprise exe. The count tells you how many; the totalSize tells you how much.
- Step 5: Scan byTopFolder for scope creep. Read byTopFolder for directories that do not belong in a clean release: a vendored node_modules, a .git checkout, a test/fixtures tree, or a large (root) of loose files. Each is a finding to raise with the build owner.
- Step 6: Attach the report to the ticket. Download <artifact>-size-analysis.json and attach it to the audit ticket alongside the SHA-256 you recorded in step 2. The JSON is plain and tool-agnostic, so your SIEM, GRC tool, or ticket can ingest it as-is.
Risk signals visible in the breakdown
What an auditor reads out of the byExtension / byTopFolder report. The analyser does not classify risk for you — it surfaces the groups; the judgement is yours.
| Signal in the report | What it may indicate | Where it shows |
|---|---|---|
| pem / key / pfx extension group | Private keys or certificates bundled into a shipped artifact | byExtension |
| env / bak / config group | Environment files or backups with credentials shipped by accident | byExtension |
| map group with high totalSize | Source maps that expose original code in production | byExtension |
| node_modules / vendor folder dominating | Dependency tree shipped where it should be excluded, giving a larger attack surface | byTopFolder |
| .git or test/fixtures folder present | Repo metadata or test data leaking into a release | byTopFolder |
| Large (root) group | Loose files dumped at the archive root, often unreviewed | byTopFolder |
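The table can be turned into a first-pass filter over the downloaded report. The sketch below is a minimal illustration, not part of the tool: it assumes the JSON has top-level byExtension and byTopFolder arrays whose items carry the ext / folder, count and totalSize fields shown in the Cookbook, and the function name risk_signals and both watch-lists are hypothetical. It only surfaces groups; the judgement stays with the auditor.

```python
# Watch-lists taken from the risk-signal table; adjust to your own policy.
RISKY_EXTENSIONS = {"pem", "key", "pfx", "env", "bak", "map", "exe"}
RISKY_FOLDERS = {"node_modules", "vendor", ".git", "test"}


def risk_signals(report: dict) -> list:
    """Return human-readable findings for groups that match the watch-lists.

    Assumes the report layout described in the lead-in (an assumption,
    not a documented schema).
    """
    findings = []
    for group in report.get("byExtension", []):
        ext = group["ext"].lower().lstrip(".")
        if ext in RISKY_EXTENSIONS:
            findings.append(
                f"extension {ext}: {group['count']} file(s), {group['totalSize']} bytes"
            )
    for group in report.get("byTopFolder", []):
        if group["folder"] in RISKY_FOLDERS:
            findings.append(
                f"folder {group['folder']}: {group['count']} file(s), {group['totalSize']} bytes"
            )
    return findings
```

Load the report with json.load and pass the resulting dict to risk_signals; an empty list means no watch-listed group was present, not that the artifact is clean.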
Privacy and tier facts for auditors
The facts a compliance reviewer needs before approving the tool for handling sensitive artifacts.
| Concern | Behaviour |
|---|---|
| Where does processing happen? | Entirely in the browser tab via fflate + libarchive WASM — no server path |
| Is the artifact uploaded? | No. Verify in DevTools, Network: zero outbound requests for the file data |
| Is the artifact modified? | No. This is a read-only analysis tool |
| Max artifact (Free / Pro / Pro-Media) | 50 MB / 500 MB / 2 GB |
| Max entries (Free / Pro / Pro-Media) | 500 / 50,000 / 500,000 |
| Chain-of-custody pairing | SHA-256 via /archive-tools/checksum-generator |
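Before staging a large artifact it can help to check which tier it needs. The following sketch encodes the limits from the table above; the names Tier and smallest_tier are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption, since the table does not say whether decimal or binary megabytes are meant.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1_000_000  # assumption: decimal megabytes; the source does not specify


@dataclass(frozen=True)
class Tier:
    name: str
    max_bytes: int
    max_entries: int


# Limits copied from the tier table, smallest first.
TIERS = [
    Tier("Free", 50 * MB, 500),
    Tier("Pro", 500 * MB, 50_000),
    Tier("Pro-Media", 2_000 * MB, 500_000),
]


def smallest_tier(size_bytes: int, entry_count: int) -> Optional[str]:
    """Return the first tier whose size and entry caps both fit, else None."""
    for tier in TIERS:
        if size_bytes <= tier.max_bytes and entry_count <= tier.max_entries:
            return tier.name
    return None
```

Note that a small archive can still need a higher tier because of its entry count: a 10 MB artifact with 2,014 entries exceeds the Free entry cap.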
Cookbook
Audit-shaped runs on real-world artifacts. Filenames anonymised; sizes are uncompressed bytes.
Catch a private key shipped in a release
A release ZIP should contain only built assets. A pem group in byExtension is an immediate finding.
Artifact: release-1.8.0.zip
byExtension (filtered to risk signals):
{ "ext": "js", "count": 410, "totalSize": 31457280 }
{ "ext": "pem", "count": 1, "totalSize": 3243 }
Finding: one .pem (private key) bundled into the release.
Raise with build owner; rotate the key; rebuild excluding it.
Source maps leaking into production
A .map group with real bytes means original source is exposed to anyone who downloads the artifact.
Artifact: web-dist.zip
byExtension:
{ "ext": "js", "count": 88, "totalSize": 12582912 }
{ "ext": "map", "count": 88, "totalSize": 41943040 }
Finding: 88 source maps, 40 MB, exposing original code.
Production builds should strip .map; flag the pipeline.
Repo metadata leaking via .git
A .git top-level folder in a release means full history (and possibly secrets in old commits) shipped.
Artifact: deploy.tar.gz
byTopFolder:
{ "folder": "app", "count": 120, "totalSize": 8388608 }
{ "folder": ".git", "count": 2014, "totalSize": 67108864 }
Finding: full .git directory (64 MB, 2014 entries) in the
artifact. Exclude .git from the build.
Tamper triage with a before/after manifest
Pair the analyser with checksums to make the size breakdown part of a chain-of-custody record.
Step 1 checksum-generator -> artifact SHA-256:
9f2c...e41 release-1.8.0.zip
Step 2 archive-size-analyzer -> release-1.8.0-size-analysis.json
byTopFolder, byExtension recorded
Step 3 attach both to the audit ticket.
A later rebuild with a different SHA-256 but an identical
breakdown points to a repack; a different breakdown points to a content change.
Mis-named 7z masquerading as a ZIP
An artifact named .zip that is actually 7z is read correctly by magic-byte detection — the disguise itself can be a finding.
Artifact: bundle.zip (magic bytes: 37 7A BC AF 27 1C)
Detected format: 7z (NOT zip, despite the name)
Engine: libarchive WASM
byExtension still produced normally.
Finding to note: extension does not match content;
confirm whether the rename was intentional.
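To document the mismatch independently of the tool, the leading bytes can be compared against well-known format signatures. This is a minimal sketch of magic-byte detection in general, not the analyser's own implementation; the function names and the short signature list are illustrative, and only the 7z signature appears in the example above.

```python
from pathlib import Path
from typing import Optional

# Widely documented leading-byte signatures for common archive formats.
SIGNATURES = [
    (bytes.fromhex("377ABCAF271C"), "7z"),
    (bytes.fromhex("504B0304"), "zip"),
    (bytes.fromhex("504B0506"), "zip"),  # empty ZIP archive
    (bytes.fromhex("526172211A07"), "rar"),
    (bytes.fromhex("FD377A585A00"), "xz"),
    (bytes.fromhex("425A68"), "bz2"),
    (bytes.fromhex("1F8B"), "gz"),
]


def detect_format(header: bytes) -> Optional[str]:
    """Return the format whose signature prefixes the header, or None."""
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            return fmt
    return None


def name_matches_content(filename: str, header: bytes) -> bool:
    """True when the final file extension agrees with the detected format."""
    detected = detect_format(header)
    suffix = Path(filename).suffix.lower().lstrip(".")
    return detected is not None and suffix == detected
```

Reading the first eight bytes of the staged file is enough for these signatures; a False result from name_matches_content is the "extension does not match content" finding.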
Edge cases and what actually happens
Sensitive artifact never uploaded
By design: All processing is in-browser via fflate and libarchive WASM. The artifact bytes never reach a JAD server, so using the tool does not move the regulated boundary; for data-handling purposes it is equivalent to a local CLI. Confirm with your compliance team, but most treat local browser processing the same as a local tool.
Artifact is read-only — nothing modified
Preserved: The Size Analyser inspects; it never rewrites the archive. The evidence stays bit-for-bit identical, which is what chain-of-custody requires. To prove it, hash before and after with /archive-tools/checksum-generator; the SHA-256 will match.
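The before/after check can also be reproduced locally as a cross-check on the recorded value. The sketch below uses Python's standard hashlib and is a stand-in for /archive-tools/checksum-generator, not a description of how that tool works; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_since(path: Path, recorded_sha256: str) -> bool:
    """Compare the file's current hash with the one recorded before analysis."""
    return sha256_of_file(path) == recorded_sha256.strip().lower()
```

Record sha256_of_file(artifact) before opening the analyser, then call unchanged_since(artifact, recorded) afterwards; True confirms the bytes on disk were not altered.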
Encrypted ZIP
Supported (sizes only): The central directory exposes entry names and uncompressed sizes even when payloads are encrypted, so the size breakdown works without the password. To explicitly confirm encryption status as part of an audit, use /archive-tools/encrypted-archive-detector.
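The same property can be observed with Python's standard zipfile module, which reads the central directory without touching the payloads. This is an illustrative sketch, not the analyser's code path; list_entries is a hypothetical name, and bit 0 of the general-purpose flags is the ZIP format's marker for an encrypted entry.

```python
import zipfile


def list_entries(source) -> list:
    """List name, uncompressed size and encryption flag for each file entry.

    `source` is a path or a binary file object. Nothing is decompressed or
    decrypted; all values come from the central directory.
    """
    with zipfile.ZipFile(source) as zf:
        return [
            {
                "name": info.filename,
                "size": info.file_size,
                "encrypted": bool(info.flag_bits & 0x1),
            }
            for info in zf.infolist()
            if not info.is_dir()
        ]
```

No password is supplied anywhere in the sketch, which mirrors why the size breakdown does not need one.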
Sizes are uncompressed, not on-disk
By design: totalSize sums uncompressed bytes. A small compressed artifact can show a large uncompressed footprint, which matters when assessing decompression-bomb risk (a tiny archive expanding to gigabytes). The total entry count and per-group totals make an over-expanding archive obvious.
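One way to make "over-expanding" concrete is an expansion ratio: declared uncompressed bytes divided by compressed bytes. The sketch below computes it for a ZIP with the standard zipfile module. The function names and the threshold of 100 are assumptions chosen for illustration, not values from the tool, and the sizes are those declared in the archive headers, not measured by decompressing.

```python
import zipfile


def expansion_ratio(source) -> float:
    """Declared uncompressed bytes divided by compressed bytes (0.0 if empty)."""
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
    compressed = sum(info.compress_size for info in infos)
    uncompressed = sum(info.file_size for info in infos)
    if compressed == 0:
        return 0.0
    return uncompressed / compressed


def looks_implausible(source, max_ratio: float = 100.0) -> bool:
    """Flag archives whose declared expansion exceeds an arbitrary threshold."""
    return expansion_ratio(source) > max_ratio
```

Highly compressible but legitimate content (logs, zero-filled images) can also exceed the threshold, so treat a True result as a prompt to look closer rather than a verdict.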
Nested archive hides its contents
Counted as one entry: An inner archive is counted as a single entry at its own size; the analyser does not recurse. For a forensic audit, extract nested archives with /archive-tools/nested-archive-extractor and analyse each layer, because secrets are often hidden one layer down.
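For ZIP-inside-ZIP cases, the layer-by-layer walk can be sketched locally. This is a simplified stand-in for /archive-tools/nested-archive-extractor, not its implementation: walk_nested is a hypothetical name, only inner files named .zip are followed, and the depth cap is an assumption added to limit decompression-bomb exposure, since each inner archive is decompressed into memory.

```python
import io
import zipfile


def walk_nested(data: bytes, prefix: str = "", max_depth: int = 3):
    """Yield (path, uncompressed_size) for every file, descending into inner ZIPs.

    Inner archive contents are reported as "outer.zip!/inner/path".
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            yield path, info.file_size
            if max_depth > 0 and info.filename.lower().endswith(".zip"):
                inner = zf.read(info)  # decompresses the inner archive into memory
                if zipfile.is_zipfile(io.BytesIO(inner)):
                    yield from walk_nested(inner, path + "!/", max_depth - 1)
```

Run it only on artifacts you are prepared to decompress; for untrusted input the same caution applies as for any extraction.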
Artifact exceeds the Free tier
Tier limit (rejected): Free rejects artifacts over 50 MB or 500 entries. Many release artifacts exceed this, so auditors should run Pro (500 MB / 50,000 entries) or Pro-Media (2 GB / 500,000 entries) so the whole artifact is analysed in one pass rather than split.
Decompression-bomb 7z
RAM risk: Because libarchive fully decompresses 7z/RAR/bz2/xz to measure entry sizes, a maliciously crafted archive that expands enormously can exhaust browser memory. Analyse untrusted 7z/RAR in a disposable browser profile, and watch the total / per-group sizes for implausible expansion.
Extension does not match content
Finding to note: Format detection is by magic bytes, so a .zip that is really a 7z still analyses, and the mismatch itself is worth recording. Confirm with /archive-tools/auto-format-detector to document the true format in the audit trail.
Loose files at the root go unreviewed
Watch the (root) group: Files with no folder prefix roll into (root). A large (root) group often means files were dumped in without review. Treat a heavy (root) as a prompt to enumerate those specific files.
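The grouping rule, and the follow-up of enumerating the loose files, can be sketched as follows. This illustrates the rule as described here (first path segment, or (root) when there is none) and is not the analyser's source; the function names are hypothetical, and entries are assumed to be (name, uncompressed size) pairs for files only.

```python
from collections import defaultdict


def top_folder(name: str) -> str:
    """First path segment, or "(root)" for a file with no folder prefix."""
    parts = name.lstrip("/").split("/")
    return parts[0] if len(parts) > 1 else "(root)"


def group_by_top_folder(entries) -> list:
    """Aggregate (name, size) pairs into byTopFolder-style groups, largest first."""
    groups = defaultdict(lambda: {"count": 0, "totalSize": 0})
    for name, size in entries:
        group = groups[top_folder(name)]
        group["count"] += 1
        group["totalSize"] += size
    ordered = sorted(groups.items(), key=lambda kv: kv[1]["totalSize"], reverse=True)
    return [{"folder": folder, **totals} for folder, totals in ordered]


def root_files(entries) -> list:
    """Names of the loose files that would roll into the (root) group."""
    return [name for name, _ in entries if top_folder(name) == "(root)"]
```

root_files gives the specific list to review when the (root) group is heavy.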
No CSV or chart for the report
JSON only: The output is a JSON report plus on-screen metric pills; there is no CSV export and no rendered chart. For SIEM/GRC ingestion, the JSON is the deliverable; transform it to your tool's format downstream.
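A downstream transform to CSV can be a few lines. The sketch below assumes the report has byExtension and byTopFolder arrays with the ext / folder, count and totalSize fields shown in the Cookbook; the column names and the function name report_to_csv are hypothetical, so adapt them to whatever your SIEM or GRC tool expects.

```python
import csv
import io


def report_to_csv(report: dict) -> str:
    """Flatten both breakdowns into one CSV table, one row per group."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["group_type", "group", "count", "totalSize"])
    for group in report.get("byExtension", []):
        writer.writerow(["extension", group["ext"], group["count"], group["totalSize"]])
    for group in report.get("byTopFolder", []):
        writer.writerow(["topFolder", group["folder"], group["count"], group["totalSize"]])
    return buf.getvalue()
```

Keep the original JSON attached to the ticket as the evidence; the CSV is a convenience view derived from it.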
Frequently asked questions
Is this suitable for supply-chain audits?
Yes. The browser-only architecture means evidence is never uploaded, and the read-only design means the artifact is never modified. For chain-of-custody, pair it with /archive-tools/checksum-generator to record a SHA-256 before and after — the hash proves the bytes were untouched.
Does the artifact ever leave my machine?
No. fflate and libarchive WASM run inside your browser tab. Verify in DevTools, Network: there are zero outbound requests carrying the file. The only network touch is a one-time fetch of the WASM module for 7z/RAR/bz2/xz support.
Can it detect tampering directly?
Not by itself — it shows composition (which types and folders, and how big), not integrity. Combine it with a SHA-256 manifest from /archive-tools/checksum-generator: a changed hash plus an unchanged breakdown points to a repack; a changed breakdown points to a content change.
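The hash-plus-breakdown comparison can be written down as a small decision rule. The sketch below is one possible encoding of the logic in this answer, with hypothetical record fields (sha256 and report) and the assumed report layout from the Cookbook. Because the breakdown records only counts and byte totals, an edit that keeps every file the same size would also land in the "possible repack" bucket, so the result is a triage hint, not proof.

```python
def breakdown_key(report: dict) -> tuple:
    """Order-independent view of both breakdowns for comparison."""
    by_ext = {
        g["ext"]: (g["count"], g["totalSize"]) for g in report.get("byExtension", [])
    }
    by_folder = {
        g["folder"]: (g["count"], g["totalSize"]) for g in report.get("byTopFolder", [])
    }
    return by_ext, by_folder


def triage(before: dict, after: dict) -> str:
    """Classify two records, each {"sha256": str, "report": dict}."""
    if before["sha256"].lower() == after["sha256"].lower():
        return "identical"
    if breakdown_key(before["report"]) == breakdown_key(after["report"]):
        return "possible repack"
    return "content change"
```

A "content change" result tells you which record pair to diff group by group; it does not say which files changed.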
What risk signals should I look for?
In byExtension: pem, key, pfx, env, bak, and map groups. In byTopFolder: a dominating node_modules/vendor, a present .git, test/fixture trees, or a heavy (root). The tool surfaces the groups; you apply the judgement.
Does analysing an encrypted archive need the password?
No, for the size breakdown — entry names and uncompressed sizes live in the central directory and are readable without decrypting payloads. To confirm an archive is encrypted, use /archive-tools/encrypted-archive-detector.
How does this fit a HIPAA / PCI / FedRAMP environment?
Because the artifact's bytes never transit a network, the regulated boundary does not move when you use the tool; it behaves like a local CLI. Confirm with your compliance team that local browser processing matches your data-handling policy; most accept it on that basis.
Can it open nested archives for a deep audit?
It counts an inner archive as one entry without recursing. For layered audits, extract the inner archive with /archive-tools/nested-archive-extractor and run the analyser on each layer — bundled secrets often hide one level down.
What about decompression bombs?
For 7z/RAR/bz2/xz the tool fully decompresses to measure, so a malicious archive could exhaust memory. Analyse untrusted archives in a disposable browser profile and watch the per-group totals for implausible expansion versus the on-disk size.
Can several analysts use it at once?
Yes. Each browser tab is an independent instance with no shared state. Free-tier limits apply per session; Pro raises them to 500 MB / 50,000 entries. There is no contention because nothing is centralised.
What is the largest artifact I can audit?
50 MB on Free, 500 MB on Pro, 2 GB on Pro-Media / Developer, with matching entry caps (500 / 50,000 / 500,000). Above 2 GB, use a CLI listing as described in the CLI-comparison guide.
Does it modify timestamps or repackage the artifact?
No. It is strictly read-only. If you need to normalise timestamps for reproducible-build verification, that is a separate tool, /archive-tools/timestamp-normalizer — and that one DOES rewrite the archive.
What output do I attach to the audit ticket?
Download <artifact>-size-analysis.json (byExtension, byTopFolder, total) and attach it with the SHA-256 from the checksum tool. The JSON is plain and tool-agnostic, so your GRC platform or SIEM can ingest it directly.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.