Archive Format Detector for Security & Compliance

How to use the format detector for security & compliance

Step 1
Quarantine the suspicious file first — Keep the file in a controlled location and do not open it with its claimed application. Identifying the true container is a read-only step that does not execute or extract anything.
Step 2
Open the detector and drop the file — Use the auto-format-detector. The file is read locally via the browser File API; nothing is uploaded, which is essential for evidence and confidential artefacts.
Step 3
Compare the true format against the claimed extension — Note the Format and Magic bytes in the result. If a file claiming to be a PDF or document reports zip, 7z, or rar, you have an extension mismatch worth escalating.
Step 4
Record the evidence — Download the JSON report (<filename>-format.json) — it contains the filename, detected format, and the magic-byte hex. Attach it to the incident or audit record as the documented basis for your finding.
Step 5
Check for encryption before extraction — If the file is a real archive, run the encrypted-archive-detector to see whether it's password-protected before any extraction is attempted in a sandbox.
Step 6
Route the file to the right handler — Use the detector's recommended tools — a ZIP result points to the multi-format extractor or previewer — and perform any extraction in an isolated environment, never on a production workstation.
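
The identification in Steps 2 and 3 can be cross-checked offline. What follows is a minimal Python sketch, not the detector's own code: the signature table, the `detect_format` name, and the format labels are assumptions chosen to match the labels used in this guide. TAR is left out because its marker sits at offset 257 rather than at the start of the file.

```python
# Hypothetical local cross-check. The signature list is an assumption,
# not the detector's real table.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),          # ZIP local file header
    (b"7z\xbc\xaf\x27\x1c", "7z"),   # 7-Zip signature
    (b"Rar!\x1a\x07", "rar"),        # RAR 4.x and 5.x share this prefix
    (b"\x1f\x8b", "gz"),             # gzip
    (b"BZh", "bz2"),                 # bzip2
]


def detect_format(header: bytes) -> str:
    """Return a format label for the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


# Leading bytes taken from the cookbook examples in this guide.
print(detect_format(bytes.fromhex("504B030414000000")))  # zip
print(detect_format(bytes.fromhex("4D5A900003000000")))  # unknown
```

Comparing the returned label with the file's claimed extension gives the same mismatch signal described in Step 3.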

Extension-mismatch triage matrix

What a true-format result tells you when it disagrees with the claimed extension. The detector reports the container only; it does not scan the payload.

Claimed extension	Detector result	Interpretation
.pdf / .docx	`zip`	`.docx` is expected, since Office files are ZIP containers; a `.pdf` reporting ZIP is suspicious
.zip	`7z` or `rar`	Mismatch: a 7z or RAR archive carrying a .zip name; route to multi-format extractor, not a ZIP tool
.jpg / .png	`zip` / `rar`	Leading bytes are an archive signature, not an image: a renamed archive or archive-first polyglot; escalate
.txt / .log	`gz` or `bz2`	Likely a compressed log mislabelled; decompress in a sandbox
any	`unknown`	Not a recognised archive — could be benign, encrypted, an EXE, or a format the detector doesn't know
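
The matrix can be encoded as a small lookup so that triage notes stay consistent between analysts. This is an illustrative sketch only: the `interpret` function, the `ZIP_BASED_EXTENSIONS` set, and the returned wording are hypothetical and simply mirror the rows above.

```python
# Hypothetical triage helper mirroring the matrix rows. Names and
# wording are illustrative assumptions.
ZIP_BASED_EXTENSIONS = {"docx", "xlsx", "pptx"}  # Office Open XML containers


def interpret(claimed_ext: str, detected: str) -> str:
    """Map a claimed extension and a detected format to a triage note."""
    ext = claimed_ext.lower().lstrip(".")
    if detected == "unknown":
        return "not a recognised archive: do not assume safe"
    if detected == ext:
        return "extension matches container"
    if detected == "zip" and ext in ZIP_BASED_EXTENSIONS:
        return "expected: Office files are ZIP containers"
    if ext == "zip" and detected in {"7z", "rar"}:
        return "mismatch: route to a multi-format extractor"
    if ext in {"txt", "log"} and detected in {"gz", "bz2"}:
        return "likely a mislabelled compressed log: decompress in a sandbox"
    return "extension mismatch: escalate"


print(interpret(".pdf", "zip"))   # extension mismatch: escalate
print(interpret(".docx", "zip"))  # expected: Office files are ZIP containers
```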

Privacy properties for compliance review

Why this tool fits regulated and sensitive-data workflows.

Property	Behaviour
File transmission	None — read locally via browser File API
Server-side processing	None — archive tools have no server path
Data residency impact	File stays on the analyst's device
Bytes inspected	First 8 bytes (plus offset 257 for TAR) — header only
Output	JSON with filename, format, magic hex — downloadable for records
Determinism	Same file → same result, reproducible by a reviewer
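
The "Bytes inspected" row describes a bounded, header-only read. A local equivalent might look like the sketch below, which assumes the TAR check is the `ustar` marker that POSIX tar headers carry at offset 257; `read_header` and `is_tar` are hypothetical names, and the demo file is synthetic.

```python
import os
import tempfile
from typing import Tuple


def read_header(path: str) -> Tuple[bytes, bytes]:
    """Read only the first 8 bytes and the 5-byte field at offset 257."""
    with open(path, "rb") as fh:
        first8 = fh.read(8)
        fh.seek(257)
        tar_field = fh.read(5)
    return first8, tar_field


def is_tar(tar_field: bytes) -> bool:
    """POSIX tar headers carry the ASCII marker 'ustar' at offset 257."""
    return tar_field == b"ustar"


# Synthetic demo: a 257-byte name area followed by the ustar marker.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.tar")
    with open(sample, "wb") as fh:
        fh.write(b"note.txt" + b"\x00" * 249 + b"ustar\x0000")
    first8, tar_field = read_header(sample)
    print(first8, is_tar(tar_field))
```

Nothing beyond those two small reads touches the file, which is what keeps the step read-only.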

What detection does and does NOT tell you

Scope is identification, not malware analysis. Pair it with the right follow-up tool.

Question	Answered here?	Use instead
What is the true container format?	Yes	—
Is the extension spoofed?	Yes (by comparison)	—
Is the archive password-protected?	No	encrypted-archive-detector
What files are inside?	No	archive-previewer
Is the content malware?	No	A dedicated AV / sandbox
Has the archive been tampered with?	No (use a checksum)	checksum-generator

Cookbook

Triage scenarios a security analyst or compliance reviewer meets, and what the detector reveals about each.

A 'PDF' attachment that is really a ZIP

A phishing report includes invoice_2026.pdf. A real PDF starts with %PDF. The detector reports zip — the file is a ZIP wearing a PDF name, a classic delivery trick. Escalate; do not open in a PDF reader.

Input file: invoice_2026.pdf

Output:
{
  "filename": "invoice_2026.pdf",
  "format": "zip",
  "magicBytes": "50 4B 03 04 14 00 00 00",
  "description": "ZIP archive — DEFLATE-compressed entries with central directory at end.",
  "recommendedTools": ["multi-format-extractor", "archive-previewer", "archive-integrity-tester"]
}
→ Finding: extension says PDF, bytes say ZIP. Escalate.

A .zip from a vendor that is actually 7z

A supply-chain artefact, dependency-bundle.zip, won't open in standard ZIP tooling. The detector shows 7z: the file is a 7-Zip archive with a .zip name. Route it to the libarchive-backed multi-format extractor in a sandbox, not a ZIP-only tool.

Input file: dependency-bundle.zip

Output:
{
  "filename": "dependency-bundle.zip",
  "format": "7z",
  "magicBytes": "37 7A BC AF 27 1C 00 04",
  "description": "7-Zip archive — LZMA/LZMA2 compression with optional encryption.",
  "recommendedTools": ["multi-format-extractor", "archive-previewer"]
}

Documenting evidence for an audit

During a compliance review you must record the true type of a flagged file without uploading it anywhere. Download the detector's JSON and attach it to the case — it is the deterministic, reproducible basis for the finding.

Saved as: flagged-artifact-format.json
{
  "filename": "q2-export.bin",
  "format": "rar",
  "magicBytes": "52 61 72 21 1A 07 01 00",
  "description": "RAR archive — proprietary WinRAR format. JAD Apps reads via libarchive but cannot write.",
  "recommendedTools": ["multi-format-extractor"]
}
→ Attached to case #4412 as the documented true-format evidence.
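
For reviewers who want to reproduce the record independently, the sketch below builds a report with the three fields this guide lists (filename, format, magic-byte hex). It is an assumption-laden illustration: `build_report` is a hypothetical helper, and the `description` and `recommendedTools` fields are deliberately left out because their wording belongs to the tool.

```python
import json


def build_report(filename: str, fmt: str, header: bytes) -> str:
    """Serialise the evidence fields in a stable, reproducible order."""
    report = {
        "filename": filename,
        "format": fmt,
        # First 8 bytes as upper-case, space-separated hex pairs.
        "magicBytes": header[:8].hex(" ").upper(),
    }
    return json.dumps(report, indent=2)


print(build_report("q2-export.bin", "rar", bytes.fromhex("526172211a070100")))
```

The same input bytes always produce the same text, so a second reviewer can compare their output with the attached file character by character.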

An unknown result that warrants caution

A file update.zip reports unknown. It is not a recognised archive — it could be encrypted, an EXE with an appended payload, or a format outside the supported set. Treat unknown as 'do not assume safe' and analyse further in a sandbox.

Input file: update.zip

Output:
{
  "filename": "update.zip",
  "format": "unknown",
  "magicBytes": "4D 5A 90 00 03 00 00 00",
  "description": "Unrecognised — magic bytes do not match any supported archive format.",
  "recommendedTools": []
}
→ 4D 5A is 'MZ' — a Windows executable masquerading as a .zip. High priority.
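
Reading the magic bytes of an unknown result by hand, as done above for MZ, can be scripted. The following is a rough sketch under stated assumptions: the hint table and `hint_for_unknown` are hypothetical, cover only a few well-known non-archive signatures, and are not part of the detector's output.

```python
# Hypothetical hint table for leading bytes that are not archive signatures.
NON_ARCHIVE_HINTS = [
    (b"MZ", "Windows/DOS executable (MZ)"),
    (b"%PDF", "PDF document"),
    (b"\x7fELF", "ELF executable"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def hint_for_unknown(header: bytes) -> str:
    """Suggest what an 'unknown' file might be from its leading bytes."""
    for magic, label in NON_ARCHIVE_HINTS:
        if header.startswith(magic):
            return label
    return "no hint: treat as unidentified"


print(hint_for_unknown(bytes.fromhex("4D5A900003000000")))  # Windows/DOS executable (MZ)
```

A hint is only a lead for further analysis in a sandbox; the absence of a hint says nothing about safety.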

Confirming an Office file is a legitimate ZIP

Not every mismatch is malicious. A .docx reporting zip is normal — Office Open XML files are ZIP containers. The detector helps you distinguish expected ZIP-based formats from genuinely spoofed ones.

Input file: contract.docx

Output:
{
  "filename": "contract.docx",
  "format": "zip",
  "magicBytes": "50 4B 03 04 14 00 06 00"
}
→ Expected: .docx/.xlsx/.pptx are ZIP containers. Not a spoof on its own.

Edge cases and what actually happens

Office document reports zip

Expected

DOCX, XLSX, and PPTX are ZIP containers, so reporting zip for them is correct, not a spoof. Treat a bare .pdf or .txt reporting ZIP as suspicious, but expect Office formats to be ZIPs.

File reports unknown but is clearly hostile

Caution

unknown means the bytes match no supported archive signature — it does not mean safe. An EXE (MZ), an encrypted blob, or an unsupported format all report unknown. Never treat unknown as a clean result.

Polyglot file with appended archive

Limited

Polyglots place valid data of one type at offset 0 and append an archive later. The detector reads the leading bytes, so it reports the front format and misses the appended archive. Deep payload analysis needs a dedicated tool.
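
One way a follow-up tool might look for an appended ZIP is to search the tail of the file for the ZIP end-of-central-directory record, which the format places at the end of the archive. This is a sketch of that idea, not a feature of the detector; `has_trailing_zip` is a hypothetical name and the demo polyglot is synthetic.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
MAX_EOCD_SPAN = 22 + 65535  # fixed record size plus maximum comment length


def has_trailing_zip(data: bytes) -> bool:
    """True when a ZIP end record sits near the end of data that does
    not itself start with a ZIP signature."""
    if data.startswith(b"PK\x03\x04"):
        return False  # ordinary ZIP, not an appended archive
    return EOCD_SIGNATURE in data[-MAX_EOCD_SPAN:]


# Synthetic polyglot: JPEG-like leading bytes followed by a real ZIP.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("payload.txt", "hello")
image_like = b"\xff\xd8\xff\xe0" + b"\x00" * 64
polyglot = image_like + buf.getvalue()

print(has_trailing_zip(polyglot))    # True
print(has_trailing_zip(image_like))  # False
```

A match is a reason to escalate rather than proof: the four signature bytes can also occur by chance inside unrelated data.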

Encrypted archive looks like a normal ZIP

Expected

Many encrypted archives keep a normal ZIP/7z header, so the detector still reports the container format. Encryption isn't visible from the magic bytes — run the encrypted-archive-detector to confirm protection before extraction.
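
For ZIP specifically, the four-byte signature is identical for protected and unprotected archives; the traditional encryption indicator is bit 0 of each entry's general-purpose flag field, which follows the signature and version fields. The sketch below reads that flag for the first entry only. It is an assumption-heavy illustration: `first_entry_encrypted` is a hypothetical helper, it says nothing about later entries, 7z header encryption, or other schemes, and it is not a description of how the encrypted-archive-detector works.

```python
import struct


def first_entry_encrypted(header: bytes) -> bool:
    """Check bit 0 of the general-purpose flags in a ZIP local file header."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    # Layout: 4-byte signature, 2-byte version needed, 2-byte flags (little-endian).
    signature, _version, flags = struct.unpack_from("<4sHH", header)
    if signature != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    return bool(flags & 0x0001)


print(first_entry_encrypted(bytes.fromhex("504B030414000000")))  # False
print(first_entry_encrypted(bytes.fromhex("504B030414000100")))  # True
```

Because protection can differ per entry and per format, a dedicated check remains the reliable route before extraction.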

Tampered content, intact header

Not detected here

If only the payload was altered but the signature is intact, the detector still reports the same format — it doesn't verify integrity. Use the checksum-generator and compare against a known-good hash for tamper detection.
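
The hash comparison can also be done locally. Below is a minimal sketch using SHA-256 from Python's standard library; the function names are hypothetical, and the choice of SHA-256 is an assumption, since this guide does not say which algorithms the checksum-generator offers.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large evidence files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path: str, expected_hex: str) -> bool:
    """Compare the file's hash with a trusted value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demo with a throwaway file: changing one byte changes the hash.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as fh:
        fh.write(b"PK\x03\x04original payload")
    trusted = sha256_of(sample)
    with open(sample, "wb") as fh:
        fh.write(b"PK\x03\x04tampered payload")
    print(matches_known_good(sample, trusted))  # False
```

The header bytes are identical in both versions of the demo file, which is exactly the case format detection cannot distinguish.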

File over the tier size cap

Rejected (tier limit)

Free caps files at 50 MB, Pro at 500 MB, Developer at 2 GB. A large evidence file beyond your tier is rejected before detection. Use a higher tier or identify it with a local CLI.

Self-extracting archive (SFX)

unknown

An SFX begins with an executable stub, so the detector reports unknown (the leading bytes are MZ, not an archive signature). The real archive is appended after the stub and is invisible to header-only detection.
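
A follow-up check outside the detector could scan past the executable stub for an embedded archive signature. The sketch below shows the idea under loud assumptions: `find_embedded_archive` and its signature table are hypothetical, the whole file is held in memory, and a match can be a coincidence inside executable code, so treat a hit as a lead rather than a finding.

```python
from typing import Optional, Tuple

# Hypothetical table of signatures to look for after the stub.
EMBEDDED_SIGNATURES = {
    "zip": b"PK\x03\x04",
    "7z": b"7z\xbc\xaf\x27\x1c",
    "rar": b"Rar!\x1a\x07",
}


def find_embedded_archive(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (format, offset) of the earliest archive signature that
    follows an MZ stub, or None when there is no stub or no signature."""
    if not data.startswith(b"MZ"):
        return None
    hits = []
    for name, signature in EMBEDDED_SIGNATURES.items():
        offset = data.find(signature, 2)  # skip the two MZ bytes
        if offset != -1:
            hits.append((offset, name))
    if not hits:
        return None
    offset, name = min(hits)
    return name, offset


# Synthetic SFX: a 102-byte fake stub followed by a 7z signature.
sfx = b"MZ" + b"\x90" * 100 + b"7z\xbc\xaf\x27\x1c\x00\x04"
print(find_embedded_archive(sfx))  # ('7z', 102)
```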

ISO image submitted as evidence

unknown

ISO 9660's identifying descriptor (the CD001 marker at byte offset 32,769) sits far beyond the header bytes the detector inspects, so ISOs report unknown. This is a known scope limit, not a failure; use a CLI identifier for disc images.
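
For a quick local check on a suspected disc image, the ISO 9660 marker can be read directly: volume descriptors start at sector 16 of 2,048-byte sectors, and the identifier `CD001` follows the one-byte descriptor type, which puts it at byte offset 32,769. The sketch below is a hypothetical helper (`looks_like_iso`), not part of the detector, and the demo file is synthetic.

```python
import os
import tempfile

ISO_MARKER_OFFSET = 16 * 2048 + 1  # sector 16, after the 1-byte descriptor type


def looks_like_iso(path: str) -> bool:
    """True when the ISO 9660 identifier 'CD001' is at its standard offset."""
    with open(path, "rb") as fh:
        fh.seek(ISO_MARKER_OFFSET)
        return fh.read(5) == b"CD001"


# Synthetic demo: zero padding up to the descriptor, then type byte and marker.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "evidence.iso")
    with open(image, "wb") as fh:
        fh.write(b"\x00" * (16 * 2048) + b"\x01CD001\x01")
    print(looks_like_iso(image))  # True
```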

Frequently asked questions

Does the detector upload the suspicious file anywhere?

No. The file is read locally in your browser and never transmitted. This is the core reason it suits security triage — potentially malicious or confidential evidence stays on the analyst's device.

Can it catch a file with a spoofed extension?

Yes — it reads the magic bytes, so a file claiming one type while its signature says another is exposed by comparing the detected format to the claimed extension.

Is this a malware scanner?

No. It identifies the true container format only. It does not scan, sandbox, or judge whether content is malicious — pair it with a dedicated AV or sandbox for that.

What does an `unknown` result mean for triage?

That the bytes match no supported archive signature. It is not a clean bill of health — it can indicate an EXE, an encrypted blob, an ISO, a polyglot, or an unsupported format. Treat it with caution.

Can I use the output as audit evidence?

Yes. The downloadable JSON records the filename, true format, and magic-byte hex deterministically, so a reviewer can reproduce the finding from the same file.

Will it detect whether an archive is encrypted?

No — encryption isn't visible in the magic bytes, so an encrypted archive still reports its container format. Run the encrypted-archive-detector to check for password protection.

Why does a .docx report as zip?

Office Open XML files (DOCX, XLSX, PPTX) are genuinely ZIP containers, so this is expected and not a spoof. A non-Office file claiming a document extension but reporting ZIP is the suspicious case.

Can it detect tampering?

Not directly — if the header is intact it reports the same format regardless of payload changes. For tamper detection, hash the file with the checksum-generator and compare against a trusted value.

Does it execute or extract the file?

No. It only reads the file header to identify the format. Nothing is executed or extracted, which keeps the triage step safe.

What about a self-extracting EXE archive?

It reports unknown because the file starts with an executable stub (MZ), not an archive signature. The archive payload is appended after the stub and isn't seen by header-only detection.

Is there a file-size limit for evidence files?

Yes — 50 MB on Free, 500 MB on Pro, and 2 GB on Developer. Larger files are rejected before detection; identify those with a local CLI.

What should I do after identifying a suspicious archive?

Check for encryption with the encrypted-archive-detector, then extract only in an isolated sandbox using the multi-format-extractor — never on a production workstation.

Privacy first

Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.
