Archive Size Analyser for Security & Compliance — Supply-Chain Audits

How to use the Size Analyser for security & compliance

Step 1
Stage the artifact locally: Pull the archive from your evidence store, artifact registry, or build pipeline onto the analyst machine. The tool reads the file from disk via the File API, and the artifact bytes never transit a network, so the artifact never leaves your custody.
Step 2
Hash it first for chain-of-custody: Before inspecting, run /archive-tools/checksum-generator to record the artifact's SHA-256. This pins the exact bytes you analysed so a later 'is this the file we reviewed?' question has a definitive answer.
Step 3
Run the Size Analyser: Open /archive-tools/archive-size-analyzer and drop the artifact. For artifacts over 50 MB use Pro (500 MB) so you can analyse the whole build without splitting. The format is detected by magic bytes, so a deliberately mis-named file is still read correctly.
Step 4
Scan byExtension for risk signals: Read the byExtension array for extensions that should never ship in a release, such as pem, key, pfx, env, bak, map (source maps), or a surprise exe. The count tells you how many files; the totalSize tells you how many uncompressed bytes.
Step 5
Scan byTopFolder for scope creep: Read byTopFolder for directories that do not belong in a clean release, such as a vendored node_modules, a .git checkout, a test/fixtures tree, or a large (root) of loose files. Each is a finding to raise with the build owner.
Step 6
Attach the report to the ticket: Download <artifact>-size-analysis.json and attach it to the audit ticket alongside the SHA-256 you recorded in step 2. The JSON is plain and tool-agnostic, so your SIEM, GRC tool, or ticketing system can ingest it as-is.
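
For analysts who want an independent cross-check of the digest recorded in step 2, the following is a minimal Python sketch of a streamed SHA-256. It is not part of the tool; the function name sha256_of_file and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Reading in chunks keeps memory use flat even for very large artifacts.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the value matches the digest recorded from the checksum tool, the file on disk is the file that was hashed in step 2.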

Risk signals visible in the breakdown

What an auditor reads out of the byExtension / byTopFolder report. The analyser does not classify risk for you — it surfaces the groups; the judgement is yours.

Signal in the report	What it may indicate	Where it shows
`pem` / `key` / `pfx` extension group	Private keys or certificates bundled into a shipped artifact	byExtension
`env` / `bak` / `config` group	Environment files or backups with credentials shipped by accident	byExtension
`map` group with high totalSize	Source maps that expose original code in production	byExtension
`node_modules` / `vendor` folder dominating	Dependency tree shipped where it should be excluded — larger attack surface	byTopFolder
`.git` or `test`/`fixtures` folder present	Repo metadata or test data leaking into a release	byTopFolder
Large `(root)` group	Loose files dumped at the archive root, often unreviewed	byTopFolder
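
The table above can be turned into a first-pass filter over the downloaded report. This is a hedged sketch: it assumes the report has top-level byExtension and byTopFolder arrays with the ext / folder / count / totalSize fields shown in the Cookbook, and the names RISKY_EXTENSIONS, RISKY_FOLDERS, and find_risk_signals are hypothetical. A heavy (root) group is left out because "large" needs a threshold only the auditor can set.

```python
from typing import Any, Dict, List

# Groups taken from the risk-signal table; extend to match your own policy.
RISKY_EXTENSIONS = {"pem", "key", "pfx", "env", "bak", "config", "map"}
RISKY_FOLDERS = {"node_modules", "vendor", ".git", "test", "fixtures"}


def find_risk_signals(report: Dict[str, Any]) -> List[str]:
    """List report groups that match a known risk signal."""
    findings = []
    for group in report.get("byExtension", []):
        if group["ext"].lower() in RISKY_EXTENSIONS:
            findings.append(
                f"byExtension {group['ext']}: "
                f"{group['count']} file(s), {group['totalSize']} bytes"
            )
    for group in report.get("byTopFolder", []):
        if group["folder"] in RISKY_FOLDERS:
            findings.append(
                f"byTopFolder {group['folder']}: "
                f"{group['count']} file(s), {group['totalSize']} bytes"
            )
    return findings
```

The output is a list of candidate findings only; whether each one is a real issue remains the auditor's call.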

Privacy and tier facts for auditors

The facts a compliance reviewer needs before approving the tool for handling sensitive artifacts.

Concern	Behaviour
Where does processing happen?	Entirely in the browser tab via fflate + libarchive WASM — no server path
Is the artifact uploaded?	No. Verify in the DevTools Network panel: zero outbound requests carry the file data
Is the artifact modified?	No. This is a read-only analysis tool
Max artifact (Free / Pro / Pro-Media)	50 MB / 500 MB / 2 GB
Max entries (Free / Pro / Pro-Media)	500 / 50,000 / 500,000
Chain-of-custody pairing	SHA-256 via /archive-tools/checksum-generator
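
The tier limits above can be checked before opening the tool. A minimal sketch, with two assumptions that the page does not confirm: that "MB" and "GB" are binary units, and that the size limit applies to the artifact's on-disk size. The names TIERS and smallest_tier are hypothetical.

```python
from typing import Optional

MIB = 1024 * 1024  # assumption: tier limits are binary megabytes

# (tier name, max artifact bytes, max entries), from the table above.
TIERS = [
    ("Free", 50 * MIB, 500),
    ("Pro", 500 * MIB, 50_000),
    ("Pro-Media", 2048 * MIB, 500_000),
]


def smallest_tier(size_bytes: int, entry_count: int) -> Optional[str]:
    """Return the lowest tier that accepts the artifact, or None if none does."""
    for name, max_bytes, max_entries in TIERS:
        if size_bytes <= max_bytes and entry_count <= max_entries:
            return name
    return None
```

Note that either limit can force a higher tier: a 10 MB artifact with 2,000 entries still needs Pro.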

Cookbook

Audit-shaped runs on real-world artifacts. Filenames anonymised; sizes are uncompressed bytes.

Catch a private key shipped in a release

A release ZIP should contain only built assets. A pem group in byExtension is an immediate finding.

Artifact: release-1.8.0.zip

byExtension (filtered to risk signals):
  { "ext": "js",  "count": 410, "totalSize": 31457280 }
  { "ext": "pem", "count": 1,   "totalSize": 3243 }

Finding: one .pem bundled into the release. Confirm whether it
holds a private key; if so, raise with the build owner, rotate
the key, and rebuild excluding it.

Source maps leaking into production

A .map group with real bytes means original source is exposed to anyone who downloads the artifact.

Artifact: web-dist.zip

byExtension:
  { "ext": "js",  "count": 88, "totalSize": 12582912 }
  { "ext": "map", "count": 88, "totalSize": 41943040 }

Finding: 88 source maps, 40 MB, exposing original code.
Production builds should strip .map — flag the pipeline.

Repo metadata leaking via .git

A .git top-level folder in a release means full history (and possibly secrets in old commits) shipped.

Artifact: deploy.tar.gz

byTopFolder:
  { "folder": "app",  "count": 120,  "totalSize": 8388608 }
  { "folder": ".git", "count": 2014, "totalSize": 67108864 }

Finding: full .git directory (64 MB, 2014 entries) in the
artifact. Exclude .git from the build.

Tamper triage with a before/after manifest

Pair the analyser with checksums to make the size breakdown part of a chain-of-custody record.

Step 1  checksum-generator -> artifact SHA-256:
        9f2c...e41  release-1.8.0.zip

Step 2  archive-size-analyzer -> release-1.8.0-size-analysis.json
        byTopFolder, byExtension recorded

Step 3  attach both to the audit ticket.
        A later rebuild with a different SHA-256 but identical
        breakdown suggests a repack; a different breakdown
        indicates a content change.
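
The triage rule in step 3 can be sketched in Python as below. It assumes two saved reports with the byExtension / byTopFolder shape shown in this Cookbook plus the two recorded SHA-256 strings; the function names are hypothetical. An identical breakdown is only a hint, since content can change while counts and sizes stay the same.

```python
from typing import Any, Dict, List, Tuple


def _groups(report: Dict[str, Any], key: str, name: str) -> List[Tuple]:
    """Normalise one grouping so that ordering differences do not matter."""
    return sorted(
        (g[name], g["count"], g["totalSize"]) for g in report.get(key, [])
    )


def breakdown(report: Dict[str, Any]) -> Tuple[List[Tuple], List[Tuple]]:
    """Comparable view of a report: both groupings, sorted."""
    return (
        _groups(report, "byExtension", "ext"),
        _groups(report, "byTopFolder", "folder"),
    )


def triage(old_sha: str, new_sha: str,
           old_report: Dict[str, Any], new_report: Dict[str, Any]) -> str:
    """Classify a rebuild against the recorded hash and breakdown."""
    if old_sha.lower() == new_sha.lower():
        return "identical bytes"
    if breakdown(old_report) == breakdown(new_report):
        return "possible repack"
    return "content change"
```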

Mis-named 7z masquerading as a ZIP

An artifact named .zip that is actually 7z is read correctly by magic-byte detection — the disguise itself can be a finding.

Artifact: bundle.zip  (magic bytes: 37 7A BC AF 27 1C)

Detected format: 7z (NOT zip, despite the name)
Engine: libarchive WASM

byExtension still produced normally.
Finding to note: extension does not match content —
confirm whether the rename was intentional.
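
To document a mismatch like this outside the tool, the check can be approximated with a few well-known file signatures. This is an illustrative sketch, not the tool's detection code: the signature list is deliberately short, and detect_format and name_matches_content are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Leading bytes of common archive formats (a partial list).
SIGNATURES = [
    (bytes.fromhex("377ABCAF271C"), "7z"),
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),          # empty ZIP archive
    (b"Rar!\x1a\x07", "rar"),
    (bytes.fromhex("FD377A585A00"), "xz"),
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
]

EXPECTED_BY_SUFFIX = {
    ".7z": "7z", ".zip": "zip", ".rar": "rar", ".xz": "xz",
    ".bz2": "bzip2", ".gz": "gzip", ".tgz": "gzip",
}


def detect_format(header: bytes) -> Optional[str]:
    """Return the format whose signature starts the header, if any."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def name_matches_content(filename: str, header: bytes) -> Optional[bool]:
    """True/False when both sides are known; None when either is not."""
    expected = EXPECTED_BY_SUFFIX.get(Path(filename).suffix.lower())
    actual = detect_format(header)
    if expected is None or actual is None:
        return None
    return expected == actual
```

In practice, header would be the first few bytes of the artifact, for example the result of reading 8 bytes from the file opened in binary mode.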

Edge cases and what actually happens

Sensitive artifact never uploaded

By design

All processing is in-browser via fflate and libarchive WASM. The artifact bytes never reach a JAD server, so using the tool does not move the regulated boundary — it is equivalent to a local CLI for data-handling purposes. Confirm with your compliance team, but most treat local browser processing the same as a local tool.

Artifact is read-only — nothing modified

Preserved

The Size Analyser inspects; it never re-writes the archive. The evidence stays bit-for-bit identical, which is what chain-of-custody requires. To prove it, hash before and after with /archive-tools/checksum-generator — the SHA-256 will match.

Encrypted ZIP

Supported (sizes only)

The central directory exposes entry names and uncompressed sizes even when payloads are encrypted, so the size breakdown works without the password. To explicitly confirm encryption status as part of an audit, use /archive-tools/encrypted-archive-detector.
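
The same property can be observed with Python's standard zipfile module, which reads names and sizes from the central directory without needing a password. A minimal sketch, independent of the tool; list_entries is a hypothetical name, and bit 0 of the general-purpose flags is the ZIP format's per-entry encryption marker.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def list_entries(
    archive: Union[str, BinaryIO]
) -> List[Tuple[str, int, bool]]:
    """Return (name, uncompressed size, is_encrypted) for each ZIP entry."""
    with zipfile.ZipFile(archive) as zf:
        return [
            # flag_bits bit 0 is set when the entry's payload is encrypted.
            (info.filename, info.file_size, bool(info.flag_bits & 0x1))
            for info in zf.infolist()
        ]
```

No payload is decompressed or decrypted here; only the directory metadata is read.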

Sizes are uncompressed, not on-disk

By design

totalSize sums uncompressed bytes. A small compressed artifact can show a large uncompressed footprint — relevant when assessing decompression-bomb risk (a tiny archive expanding to gigabytes). The total entry count and per-group totals make an over-expanding archive obvious.
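
One way to make "implausible expansion" concrete is to compare the report's uncompressed total with the artifact's on-disk size. The sketch below assumes every entry falls into exactly one byExtension group, so that summing those groups gives the full uncompressed footprint; the default threshold of 100 is an arbitrary illustration, not a recommended value.

```python
from typing import Any, Dict


def uncompressed_total(report: Dict[str, Any]) -> int:
    """Sum totalSize across byExtension (assumed to cover every entry)."""
    return sum(group["totalSize"] for group in report.get("byExtension", []))


def expansion_ratio(report: Dict[str, Any], on_disk_bytes: int) -> float:
    """Uncompressed bytes divided by the artifact's on-disk size."""
    if on_disk_bytes <= 0:
        raise ValueError("on_disk_bytes must be positive")
    return uncompressed_total(report) / on_disk_bytes


def looks_over_expanded(report: Dict[str, Any], on_disk_bytes: int,
                        threshold: float = 100.0) -> bool:
    """Flag archives whose expansion ratio exceeds the chosen threshold."""
    return expansion_ratio(report, on_disk_bytes) > threshold
```

Ordinary release archives usually sit at low single- or double-digit ratios, so a ratio in the hundreds or thousands is worth a closer look before extracting anything.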

Nested archive hides its contents

Counted as one entry

An inner archive is counted as a single entry at its own size; the analyser does not recurse. For a forensic audit, extract nested archives with /archive-tools/nested-archive-extractor and analyse each layer — secrets are often hidden one layer down.

Artifact exceeds the Free tier

Tier limit (rejected)

Free rejects artifacts over 50 MB or 500 entries. Most release artifacts exceed this — auditors should run Pro (500 MB / 50,000 entries) or Pro-Media (2 GB / 500,000) so the whole artifact is analysed in one pass rather than split.

Decompression-bomb 7z

RAM risk

Because libarchive fully decompresses 7z/RAR/bz2/xz to measure entry sizes, a maliciously crafted archive that expands enormously can exhaust browser memory. Analyse untrusted 7z/RAR in a disposable browser profile, and watch the total / per-group sizes for implausible expansion.

Extension does not match content

Finding to note

Format detection is by magic bytes, so a .zip that is really a 7z still analyses — and the mismatch itself is worth recording. Confirm with /archive-tools/auto-format-detector to document the true format in the audit trail.

Loose files at the root go unreviewed

Watch the (root) group

Files with no folder prefix roll into (root). A large (root) group often means files were dumped in without review. Treat a heavy (root) as a prompt to enumerate those specific files.
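
Since the report gives only the (root) group's count and size, enumerating the files themselves needs a separate listing. For ZIP artifacts, a minimal sketch with Python's standard zipfile module follows; root_files is a hypothetical name, and other formats would need a different reader.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def root_files(archive: Union[str, BinaryIO]) -> List[Tuple[str, int]]:
    """Return (name, uncompressed size) for entries with no folder prefix."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            # Directory entries end in "/", so this also skips them.
            if "/" not in info.filename
        ]
```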

No CSV or chart for the report

JSON only

The output is a JSON report plus on-screen metric pills — there is no CSV export and no rendered chart. For SIEM/GRC ingestion, the JSON is the deliverable; transform it to your tool's format downstream.
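
If a downstream tool only accepts CSV, the transform is small. The sketch below assumes the byExtension / byTopFolder shape shown in the Cookbook; the column names and the report_to_csv function are illustrative choices rather than a format the tool defines.

```python
import csv
import io
from typing import Any, Dict


def report_to_csv(report: Dict[str, Any]) -> str:
    """Flatten both groupings of a size-analysis report into one CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["grouping", "name", "count", "totalSize"])
    for group in report.get("byExtension", []):
        writer.writerow(
            ["byExtension", group["ext"], group["count"], group["totalSize"]]
        )
    for group in report.get("byTopFolder", []):
        writer.writerow(
            ["byTopFolder", group["folder"], group["count"], group["totalSize"]]
        )
    return buffer.getvalue()
```

Load the downloaded file with json.load and pass the resulting dictionary to report_to_csv.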

Frequently asked questions

Is this suitable for supply-chain audits?

Yes. The browser-only architecture means evidence is never uploaded, and the read-only design means the artifact is never modified. For chain-of-custody, pair it with /archive-tools/checksum-generator to record a SHA-256 before and after — the hash proves the bytes were untouched.

Does the artifact ever leave my machine?

No. fflate and libarchive WASM run inside your browser tab. Verify in the DevTools Network panel: there are zero outbound requests carrying the file. The only network touch is a one-time fetch of the WASM module for 7z/RAR/bz2/xz support.

Can it detect tampering directly?

Not by itself — it shows composition (which types and folders, and how big), not integrity. Combine it with a SHA-256 manifest from /archive-tools/checksum-generator: a changed hash plus an unchanged breakdown points to a repack; a changed breakdown points to a content change.

What risk signals should I look for?

In byExtension: pem, key, pfx, env, bak, and map groups. In byTopFolder: a dominating node_modules/vendor, a present .git, test/fixture trees, or a heavy (root). The tool surfaces the groups; you apply the judgement.

Does analysing an encrypted archive need the password?

No, for the size breakdown — entry names and uncompressed sizes live in the central directory and are readable without decrypting payloads. To confirm an archive is encrypted, use /archive-tools/encrypted-archive-detector.

How does this fit a HIPAA / PCI / FedRAMP environment?

Because the artifact bytes never transit a network, the regulated boundary does not move when you use the tool; it behaves like a local CLI. Confirm with your compliance team that local browser processing matches your data-handling policy; most accept it on that basis.

Can it open nested archives for a deep audit?

It counts an inner archive as one entry without recursing. For layered audits, extract the inner archive with /archive-tools/nested-archive-extractor and run the analyser on each layer — bundled secrets often hide one level down.

What about decompression bombs?

For 7z/RAR/bz2/xz the tool fully decompresses to measure, so a malicious archive could exhaust memory. Analyse untrusted archives in a disposable browser profile and watch the per-group totals for implausible expansion versus the on-disk size.

Can several analysts use it at once?

Yes. Each browser tab is an independent instance with no shared state. Free-tier limits apply per session; Pro raises them to 500 MB and 50,000 entries. There is no contention because nothing is centralised.

What is the largest artifact I can audit?

50 MB on Free, 500 MB on Pro, 2 GB on Pro-Media / Developer, with matching entry caps (500 / 50,000 / 500,000). Above 2 GB, use a CLI listing as described in the CLI-comparison guide.

Does it modify timestamps or repackage the artifact?

No. It is strictly read-only. If you need to normalise timestamps for reproducible-build verification, that is a separate tool, /archive-tools/timestamp-normalizer — and that one DOES rewrite the archive.

What output do I attach to the audit ticket?

Download <artifact>-size-analysis.json (byExtension, byTopFolder, total) and attach it with the SHA-256 from the checksum tool. The JSON is plain and tool-agnostic, so your GRC platform or SIEM can ingest it directly.

Privacy first

Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.
