ZIP Metadata Extractor for Security & Compliance

How to use the ZIP Metadata Extractor for security & compliance

Step 1
Stage the evidence locally: Pull the archive from your evidence store, artifact registry, or quarantine into a local folder. Nothing leaves the machine; the tool reads from disk via the File API.
Step 2
Confirm it is actually a ZIP: Run /archive-tools/auto-format-detector first. A renamed .7z/.rar will throw 'Not a valid ZIP archive…' here, and the format mismatch itself can be a finding worth recording.
Step 3
Extract the metadata: Open /archive-tools/archive-metadata-extractor, drop the ZIP, and click Process. For evidence above 50 MB or 500 entries, use the Pro tier (up to 500 MB / 50,000 entries) so you do not have to split the artifact.
Step 4
Scan for the red-flag fields: In the JSON, search for "encrypted": true, "utf8": false, unexpected hostOS values, and crc32 of 00000000 on non-encrypted entries. These are the fields that most often distinguish a benign archive from a crafted one.
Step 5
Anchor chain-of-custody with a hash: Run /archive-tools/checksum-generator to capture a SHA-256 of the whole archive before and after analysis, showing that the bytes were not altered during inspection. Store both hashes alongside the metadata JSON.
Step 6
Hand off to your tooling: Download the <name>-metadata.json and attach it to the case, or pipe it into your SIEM. The schema is stable, so successive reports across an investigation diff cleanly.
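
The step 4 scan can be scripted once the report is downloaded. The following is a minimal Python sketch, not the tool's own code. It assumes the report has a top-level `entries` array whose items carry `name`, `flags.encrypted`, `flags.utf8`, `hostOS`, and `crc32`, as used in the jq recipes in this guide; the helper names `red_flags` and `triage` are hypothetical.

```python
from typing import Any, Optional


def red_flags(entry: dict) -> list:
    """Return the step-4 red flags that apply to one report entry."""
    flags = entry.get("flags", {})
    name = entry.get("name", "")
    found = []
    if flags.get("encrypted"):
        found.append("encrypted")
    if flags.get("utf8") is False:
        found.append("non-utf8-name")
    # A zero CRC is expected on encrypted (AES) entries and on directory
    # entries (names ending in "/"), so only flag plaintext file entries.
    is_directory = name.endswith("/")
    if entry.get("crc32") == "00000000" and not flags.get("encrypted") and not is_directory:
        found.append("zero-crc-plaintext")
    return found


def triage(report: dict, expected_host_os: Optional[int] = None) -> dict:
    """Map entry name -> list of red flags, omitting clean entries."""
    result = {}
    for entry in report.get("entries", []):
        found = red_flags(entry)
        if expected_host_os is not None and entry.get("hostOS") != expected_host_os:
            found.append("unexpected-host-os")
        if found:
            result[entry.get("name", "<unnamed>")] = found
    return result
```

Load the report with `json.load` and pass the resulting dict to `triage`; supply `expected_host_os` only when the claimed build environment is known.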

Header fields and what they reveal in an investigation

How each reported field maps to a security/compliance question. All fields are read from the plaintext central directory, so no decryption is required.

Field	Security signal	Red flag to watch for
`flags.encrypted`	Entry payload is encrypted (bit 0)	Encrypted entry in an artifact that should be plaintext — scanners skip it
`flags.utf8`	Filename encoding (bit 11)	`false` on a name with non-ASCII or path characters — possible code-page spoofing
`hostOS`	Creating operating system	Unix `hostOS: 3` on an artifact claimed to be Windows-built (or vice versa)
`compressionMethod`	Per-entry method	`AES` (method 99) confirms encryption; an exotic method may evade naive parsers
`crc32`	Stored checksum	`00000000` on a non-encrypted entry, or identical CRCs across distinct files
`lastModified`	DOS timestamp	Future dates, epoch (1980-01-01), or timestamps that contradict the build log
`hasExtraField`	Extra-field present	Unexpected extra fields can carry Unix mode bits, signatures, or padding used to hide data
`versionNeeded`	Min reader version	Surprisingly high values hint at ZIP64 or methods a downstream scanner cannot read
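
To make the bit positions in the table concrete, here is a small Python sketch of how the raw central-directory values decode. It is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, and the raw integers (general-purpose flags, version-made-by, DOS date and time words) are assumed to have been read from a header already.

```python
from datetime import datetime
from typing import Optional


def decode_flags(gp_flags: int) -> dict:
    """Decode the two general-purpose flag bits the report exposes."""
    return {
        "encrypted": bool(gp_flags & 0x0001),  # bit 0
        "utf8": bool(gp_flags & 0x0800),       # bit 11
    }


def host_os(version_made_by: int) -> int:
    """The host OS is the high byte of the 16-bit version-made-by field."""
    return (version_made_by >> 8) & 0xFF


def decode_dos_datetime(dos_date: int, dos_time: int) -> Optional[datetime]:
    """Decode a DOS date/time pair; return None if the fields are out of range."""
    year = 1980 + ((dos_date >> 9) & 0x7F)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # stored in 2-second units
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        # e.g. month 0 or hour 25: not a real timestamp, worth a closer look
        return None
```

A `None` result or a decoded value in the future corresponds to the `lastModified` red flags in the table.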

Tier limits for evidence sizing

Per-job limits from the archive family in tier-limits.ts. Choose the tier that holds the artifact without splitting.

Tier	Max archive size	Max entries
Free	50 MB	500
Pro	500 MB	50,000
Pro-media	2 GB	500,000
Developer	2 GB	500,000
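
Choosing a tier from the table can be expressed as a short lookup. This Python sketch is illustrative only: it assumes "MB" means 2**20 bytes and that both limits are inclusive, neither of which the table states, and `smallest_tier` is a hypothetical helper rather than anything exported by tier-limits.ts.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: the table's "MB" is 2**20 bytes

# (tier name, max archive size in bytes, max entries), in the table's order
TIERS = [
    ("Free", 50 * MB, 500),
    ("Pro", 500 * MB, 50_000),
    ("Pro-media", 2048 * MB, 500_000),
    ("Developer", 2048 * MB, 500_000),
]


def smallest_tier(size_bytes: int, entry_count: int) -> Optional[str]:
    """Return the first tier whose size and entry caps both hold the artifact."""
    for name, max_size, max_entries in TIERS:
        # assumption: limits are inclusive
        if size_bytes <= max_size and entry_count <= max_entries:
            return name
    return None  # the artifact exceeds every tier and would need splitting
```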

Cookbook

Forensic recipes built from the report fields. Each shows what to grep for in the JSON and which sibling tool to chain next.

Find every encrypted entry without a password

General-purpose bit 0 lives in the plaintext central directory, so encrypted entries are visible even though their payloads are not. This is how you confirm that an upload contains an encrypted blob a content scanner would have skipped.

$ jq '.entries[] | select(.flags.encrypted) | {name, compressionMethod}' \
     suspicious-metadata.json
{
  "name": "payload.bin",
  "compressionMethod": "AES"
}

→ One encrypted entry. To test a suspected password:
  /archive-tools/archive-password-tester
→ To classify ZipCrypto vs AES across the archive:
  /archive-tools/encrypted-archive-detector

Catch non-UTF-8 filenames hiding traversal characters

flags.utf8: false means the name was stored in a legacy code page. The tool decodes leniently, so suspicious bytes may surface as U+FFFD. Combined with path characters, this is a classic obfuscation for Zip Slip-style entry names.

$ jq '.entries[] | select(.flags.utf8 == false) | .name' meta.json
"..\\..\\windows\\system32\\evil.dll"
"\uFFFD\uFFFD config.json"

→ The first is a path-traversal attempt; the second has
  non-decodable bytes. Sanitise names on extraction with
  /archive-tools/filename-sanitizer.
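
The name checks in this recipe can be automated over the `name` strings in the report. The following Python sketch is one possible set of heuristics, not the behaviour of /archive-tools/filename-sanitizer; the function name and the finding labels are hypothetical.

```python
def name_findings(name: str) -> list:
    """Return heuristic findings for one entry name taken from the report."""
    findings = []
    # ZIP names should use forward slashes; normalise so "..\\" is caught too.
    normalized = name.replace("\\", "/")
    if ".." in normalized.split("/"):
        findings.append("parent-traversal")
    has_drive_letter = (
        len(normalized) >= 2 and normalized[0].isalpha() and normalized[1] == ":"
    )
    if normalized.startswith("/") or has_drive_letter:
        findings.append("absolute-path")
    if "\\" in name:
        findings.append("backslash-separator")
    # U+FFFD is what lenient decoding leaves behind for undecodable bytes.
    if "\ufffd" in name:
        findings.append("undecodable-bytes")
    return findings
```

Applied to the two names in the output above, the first yields a parent-traversal and backslash finding and the second an undecodable-bytes finding.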

Cross-check provenance via hostOS

An artifact a vendor claims was built on Windows but whose entries all report hostOS 3 (Unix) is worth a question. The host-OS byte is written by the creating tool and is a cheap provenance signal.

$ jq '[.entries[].hostOS] | unique' build-from-vendor-meta.json
[ 3 ]

→ Every entry hostOS 3 = Unix-built. If the SBOM claims a
  Windows build farm, that contradiction is a finding.
  (0 = DOS/FAT, 3 = Unix, 11 = NTFS/Windows. The PKWARE
  APPNOTE lists 10 for Windows NTFS; Info-ZIP-style tools
  write 11.)

Tamper smell test on CRC-32 values

A zero CRC on a non-encrypted entry, or the same CRC across files that should differ, suggests the directory was edited without recomputing checksums. Directory entries and zero-length files legitimately store a zero CRC, so rule those out first. The report gives you the stored CRCs to compare; verification of the payload is a separate step.

$ jq '.entries[] | {name, crc32, encrypted: .flags.encrypted}' meta.json
{ "name": "a.txt", "crc32": "00000000", "encrypted": false }
{ "name": "b.txt", "crc32": "00000000", "encrypted": false }

→ Two plaintext entries with a zero CRC are suspicious.
  Recompute and verify with
  /archive-tools/archive-integrity-tester.
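
For larger reports, grouping by stored CRC is easier than reading the jq output by eye. This Python sketch assumes the same `entries[].name`, `crc32`, and `flags.encrypted` fields shown above; `crc_findings` is a hypothetical helper. It only compares stored values and does not verify any payload.

```python
from collections import defaultdict


def crc_findings(entries: list) -> dict:
    """Collect zero CRCs on plaintext files and CRCs shared by several names."""
    by_crc = defaultdict(list)
    zero_plain = []
    for entry in entries:
        name = entry.get("name", "")
        if name.endswith("/"):
            continue  # directory entries carry no data; a zero CRC is normal
        crc = str(entry.get("crc32", "")).lower()
        encrypted = bool(entry.get("flags", {}).get("encrypted"))
        if crc == "00000000":
            if not encrypted:
                zero_plain.append(name)
            continue  # do not group zero CRCs as "shared"
        by_crc[crc].append(name)
    shared = {crc: names for crc, names in by_crc.items() if len(names) > 1}
    return {"zero_crc_plaintext": zero_plain, "shared_crc": shared}
```

A shared CRC can also mean the files are genuinely identical, so treat it as a prompt to compare sizes and contents rather than as proof of tampering.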

Lock chain-of-custody with before/after hashes

Because everything is read-only and in-tab, the archive bytes are never modified. Prove it by hashing before and after analysis and storing both hashes with the metadata JSON.

Workflow:
  1. /archive-tools/checksum-generator → SHA-256 of evidence.zip
     e3b0c44298fc1c149afbf4c8996fb924...
  2. /archive-tools/archive-metadata-extractor → metadata.json
  3. /archive-tools/checksum-generator again → same SHA-256

Identical hashes prove the inspection did not alter the file.
Archive the two hashes + metadata.json in the case record.
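
If you want an independent check of the hashes outside the browser, the same before/after comparison can be done locally. This is a minimal Python sketch using the standard `hashlib` module; the helper names and the shape of the custody record are hypothetical, not a format the tools emit.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large evidence does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: str, before: str, after: str) -> dict:
    """Bundle both hashes with a flag saying whether they match."""
    return {
        "file": path,
        "sha256_before": before,
        "sha256_after": after,
        "unchanged": before == after,
    }
```

Call `sha256_file` once before opening the archive in the extractor and once afterwards, then store the `custody_record` result next to the metadata JSON. As a sanity check, the SHA-256 of a zero-byte file is e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, so seeing that digest on real evidence means the file is empty.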

Edge cases and what actually happens

Encrypted entries in the archive

Supported

Encryption never blocks this tool — the central directory is plaintext, so names, sizes, methods, timestamps and flags.encrypted all read normally. Only the payloads are encrypted, and the tool never reads payloads. This is precisely why it is useful for spotting encrypted blobs a scanner would skip.

Artifact is actually a renamed 7z/RAR

Unsupported format

The tool reads ZIP only and throws 'Not a valid ZIP archive (or unsupported format for metadata extraction)' on non-ZIP input. In a forensic context the mismatch (a .zip extension on a 7z payload) is itself a finding. Confirm with /archive-tools/auto-format-detector and record it.

Filename carries invalid UTF-8 bytes

Decoded leniently

Names are decoded with non-fatal UTF-8, so undecodable bytes become U+FFFD rather than throwing. With flags.utf8: false, a U+FFFD name signals a legacy code page — a common obfuscation vector. The original bytes are not preserved in the report; sanitise on extraction with /archive-tools/filename-sanitizer.

Zero CRC on an AES entry

Expected

AES-encrypted entries (compressionMethod: 'AES', flags.encrypted: true) commonly store crc32: '00000000' because the WinZip AE-2 format zeroes the CRC field and relies on an authentication code stored with the encrypted data instead. A zero CRC here is normal; only a zero CRC on a non-empty plaintext entry is suspicious.

Comment or extra field used to smuggle data

Presence only

The report flags hasComment/hasExtraField: true but does not dump their bytes — so it tells you data is present without showing what. To inspect comment contents use /archive-tools/comment-extractor; for signing-related extra fields use /archive-tools/archive-signing-info.

Truncated or tampered central directory

Partial / rejected

Parsing stops at the first record missing the 0x02014b50 signature, yielding a partial report; damage before the first entry yields zero entries and the standard error. A directory that parses fewer entries than totalEntries (the EOCD count) is itself a tamper indicator. For recovery, use /archive-tools/corrupted-zip-repair.
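
The declared-versus-parsed comparison can be reproduced outside the tool. The Python sketch below walks the central directory the way this section describes (stop at the first record missing the 0x02014b50 signature) and returns both counts. It is an approximation written for illustration, not the tool's parser: it ignores ZIP64, assumes a single-disk archive, and locates the EOCD with a simple backwards search that an archive comment containing the signature bytes could mislead.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature 0x06054b50
CDH_SIG = b"PK\x01\x02"   # central-directory header signature 0x02014b50


def count_directory_records(data: bytes) -> tuple:
    """Return (declared, parsed): the EOCD entry count and the records found."""
    eocd_at = data.rfind(EOCD_SIG)
    if eocd_at < 0 or eocd_at + 22 > len(data):
        raise ValueError("no end-of-central-directory record found")
    # EOCD layout: sig, disk, cd disk, entries on disk, total entries,
    # cd size, cd offset, comment length
    fields = struct.unpack_from("<IHHHHIIH", data, eocd_at)
    declared, cd_offset = fields[4], fields[6]
    pos, parsed = cd_offset, 0
    while parsed < declared and pos + 46 <= len(data):
        if data[pos:pos + 4] != CDH_SIG:
            break  # truncated or tampered record: stop with a partial count
        # name, extra-field and comment lengths sit at offset 28 of the header
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        pos += 46 + name_len + extra_len + comment_len
        parsed += 1
    return declared, parsed
```

A result where `parsed` is lower than `declared` is the tamper indicator described above.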

Multiple analysts inspecting concurrently

Supported

Each browser tab is an independent, stateless instance; there is no shared server session to contend over. Free-tier limits apply per session; a Pro seat raises them to 500 MB / 50,000 entries. Nothing about one analyst's run affects another's.

Regulated environment (HIPAA / PCI / FedRAMP)

Boundary preserved

Because the file is read in-tab via the File API and nothing transits the network for archive tools (browserOnly: true), the regulated boundary does not move, which is the same posture as a local CLI. Whether in-browser local processing counts as on-machine tooling is a policy decision, so confirm with your compliance team before relying on it.

Over 65,535 entries (ZIP64)

Limitation

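The entry count is read from the 16-bit EOCD field, which cannot represent more than 65,535 entries; ZIP64 writers typically store 0xFFFF there and keep the true count in the ZIP64 end-of-central-directory record, so the loop may stop early on a true ZIP64 directory. For very large evidence archives, corroborate the entry count with a ZIP64-aware tool. The Free and Pro entry caps (500 and 50,000) sit below this threshold; the 500,000-entry cap on Pro-media and Developer does not.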
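
One way to corroborate the count is to read the ZIP64 record directly. The Python sketch below is a hypothetical helper, not part of the tool: it looks for the ZIP64 end-of-central-directory locator (signature 0x07064b50) immediately before the EOCD and, when present, returns the 64-bit entry count from the ZIP64 record it points to. It assumes a single-disk archive with no data prepended to the file.

```python
import struct


def true_entry_count(data: bytes) -> int:
    """Return the ZIP64 entry count when available, else the 16-bit EOCD count."""
    eocd_at = data.rfind(b"PK\x05\x06")
    if eocd_at < 0 or eocd_at + 22 > len(data):
        raise ValueError("no end-of-central-directory record found")
    # total entries is the 16-bit field at offset 10 of the EOCD
    count16 = struct.unpack_from("<H", data, eocd_at + 10)[0]
    locator_at = eocd_at - 20  # the ZIP64 locator is a fixed 20 bytes
    if locator_at >= 0 and data[locator_at:locator_at + 4] == b"PK\x06\x07":
        # offset of the ZIP64 EOCD record is the 64-bit field at locator + 8
        zip64_at = struct.unpack_from("<Q", data, locator_at + 8)[0]
        if zip64_at + 56 <= len(data) and data[zip64_at:zip64_at + 4] == b"PK\x06\x06":
            # total entries is the 64-bit field at offset 32 of the record
            return struct.unpack_from("<Q", data, zip64_at + 32)[0]
    return count16
```

If the returned value differs from the `totalEntries` figure in the report, the archive is ZIP64 and the report's count should not be relied on.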

Frequently asked questions

Is evidence uploaded anywhere?

No. Archive tools are browser-only (browserOnly: true): the ZIP is read in-tab with the File API and never sent to a server. For chain-of-custody this matters because using the tool does not move the file off your machine, which is equivalent to running a local CLI.

Can I detect encrypted entries without the password?

Yes. flags.encrypted comes from general-purpose bit 0, which is plaintext in the central directory. You see exactly which entries are encrypted (and AES entries report compressionMethod: 'AES') without decrypting anything. To test a candidate password, chain /archive-tools/archive-password-tester.

How do I spot a spoofed or path-traversal filename?

Filter for flags.utf8: false and inspect those names — legacy code-page storage is where traversal payloads (..\..\) and look-alike names hide. The tool decodes names leniently, so undecodable bytes appear as U+FFFD. Sanitise on extraction with /archive-tools/filename-sanitizer.

Does it verify CRCs or just read them?

It reports the stored crc32 from the directory; it does not recompute against the payload. A zero CRC on a plaintext entry, or duplicated CRCs across distinct files, is a tamper smell; confirm by recomputing with /archive-tools/archive-integrity-tester.

What does hostOS tell me about provenance?

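It is the high byte of versionMadeBy: 0 = DOS/FAT, 3 = Unix, 11 = NTFS/Windows. The PKWARE APPNOTE assigns 10 to Windows NTFS while Info-ZIP-style tools write 11, so either value can indicate a Windows origin. If the host OS contradicts the artifact's claimed build environment, that is a provenance discrepancy worth recording. The value reflects only what the creating tool wrote, so treat it as a hint rather than proof.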

How large an evidence archive can I process?

Free: 50 MB / 500 entries. Pro: 500 MB / 50,000 entries. Pro-media and Developer: 2 GB / 500,000 entries. The size cap is checked before processing; the entry cap is enforced by the archive tier schema.

Can I attach the output to a SIEM or ticket?

Yes — the report is plain JSON (<name>-metadata.json) with no JAD wrapper, ready for SIEM ingestion, ticket attachment, or jq filtering. The stable schema means reports across an investigation timeline diff cleanly.
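
Diffing two reports from the same investigation can be done per entry rather than line by line. The Python sketch below is a hypothetical helper that assumes each report has an `entries` array keyed by `name`, as in the jq recipes in this guide. Because it indexes by name, duplicate entry names within one archive collapse to the last occurrence, which is worth checking separately.

```python
def diff_reports(old: dict, new: dict) -> dict:
    """Compare two metadata reports entry by entry, keyed on entry name."""
    old_by_name = {e["name"]: e for e in old.get("entries", [])}
    new_by_name = {e["name"]: e for e in new.get("entries", [])}
    added = sorted(new_by_name.keys() - old_by_name.keys())
    removed = sorted(old_by_name.keys() - new_by_name.keys())
    changed = {}
    for name in sorted(old_by_name.keys() & new_by_name.keys()):
        before, after = old_by_name[name], new_by_name[name]
        # list the top-level fields whose values differ between the reports
        fields = sorted(
            key for key in before.keys() | after.keys()
            if before.get(key) != after.get(key)
        )
        if fields:
            changed[name] = fields
    return {"added": added, "removed": removed, "changed": changed}
```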

Does it read the comment or extra-field contents?

No — only hasComment/hasExtraField booleans (presence). To read comment text use /archive-tools/comment-extractor; for signature extra fields use /archive-tools/archive-signing-info. Reporting presence is itself useful: it flags entries carrying hidden metadata for deeper inspection.

Is this suitable for supply-chain artifact audits?

Yes for the ZIP case — it surfaces method, encryption, host OS, timestamps and CRCs per entry, which are exactly the fields a supply-chain review checks against an SBOM or build log. Pair it with /archive-tools/checksum-generator to bind the report to a whole-archive hash.

What if the artifact is a .7z or .tar.gz?

This tool is ZIP-only and will throw 'Not a valid ZIP archive…'. Identify the true format with /archive-tools/auto-format-detector; for listing non-ZIP formats use /archive-tools/archive-previewer. Record any extension/format mismatch as a finding.

Can it prove the archive wasn't altered during analysis?

Indirectly. The tool is read-only and never writes the input, so hashing the file with /archive-tools/checksum-generator before and after analysis yields identical SHA-256 values — store both with the metadata JSON as chain-of-custody evidence.

Is there an API for automated scanning?

No public REST API — archive tools are browser-only (apiAvailable: false). On Pro+ tiers the @jadapps/runner can drive the tool via a headless browser. For unattended scanning at scale, a Node ZIP-directory parser mirroring this tool's JSON schema is the practical path.

Privacy first

Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.
