How to use the format detector for security & compliance
- Step 1: Quarantine the suspicious file first. Keep the file in a controlled location and do not open it with its claimed application. Identifying the true container is a read-only step that does not execute or extract anything.
- Step 2: Open the detector and drop the file. Use the auto-format-detector. The file is read locally via the browser File API; nothing is uploaded, which is essential for evidence and confidential artefacts.
- Step 3: Compare the true format against the claimed extension. Note the Format and Magic bytes in the result. If a file claiming to be a PDF or document reports zip, 7z, or rar, you have an extension mismatch worth escalating. (Office Open XML files such as .docx are the exception: they are ZIP containers and legitimately report zip.)
- Step 4: Record the evidence. Download the JSON report (<filename>-format.json); it contains the filename, detected format, and the magic-byte hex. Attach it to the incident or audit record as the documented basis for your finding.
- Step 5: Check for encryption before extraction. If the file is a real archive, run the encrypted-archive-detector to see whether it's password-protected before any extraction is attempted in a sandbox.
- Step 6: Route the file to the right handler. Use the detector's recommended tools (a ZIP result points to the multi-format extractor or previewer) and perform any extraction in an isolated environment, never on a production workstation.
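For files you would rather identify offline, the comparison in Step 3 can be approximated with a short local script. This is a minimal sketch, not the detector's actual implementation: the signature list covers only a few well-known leading-byte values, and the names `SIGNATURES`, `detect_format`, `magic_hex`, and `check_file` are hypothetical.

```python
from pathlib import Path

# Well-known leading-byte signatures; this list is illustrative, not exhaustive.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
]


def detect_format(header: bytes) -> str:
    """Return a format name for the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def magic_hex(header: bytes) -> str:
    """Format the first 8 bytes as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in header[:8])


def check_file(path: str) -> dict:
    """Read only the first 8 bytes; nothing is executed or extracted."""
    p = Path(path)
    with p.open("rb") as fh:
        header = fh.read(8)
    return {
        "filename": p.name,
        "claimedExtension": p.suffix.lower(),
        "format": detect_format(header),
        "magicBytes": magic_hex(header),
    }
```

Calling `check_file("invoice_2026.pdf")` on a renamed ZIP would return a claimed extension of `.pdf` next to a format of `zip`, which is the mismatch to escalate.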
Extension-mismatch triage matrix
What a true-format result tells you when it disagrees with the claimed extension. The detector reports the container only; it does not scan the payload.
| Claimed extension | Detector result | Interpretation |
|---|---|---|
| .pdf / .docx | zip | Office files ARE ZIPs — expected; bare .pdf reporting ZIP is suspicious |
| .zip | 7z or rar | Mismatch — built by 7-Zip/WinRAR; route to multi-format extractor, not a ZIP tool |
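| .jpg / .png | zip / rar | Archive carrying an image extension; the leading bytes are an archive signature, not image data (an image with an appended archive reports its front bytes instead). Escalate |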
| .txt / .log | gz or bz2 | Likely a compressed log mislabelled; decompress in a sandbox |
| any | unknown | Not a recognised archive — could be benign, encrypted, an EXE, or a format the detector doesn't know |
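The matrix above can be encoded as a small lookup so that triage notes use consistent wording. This is a sketch under stated assumptions: the function name `triage`, the returned strings, and the `OFFICE_ZIP_EXTENSIONS` set are hypothetical and simply mirror the rows of the table.

```python
# Extensions that are legitimately ZIP containers (Office Open XML).
OFFICE_ZIP_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def triage(claimed_ext: str, detected: str) -> str:
    """Map a (claimed extension, detected format) pair to a triage note."""
    ext = claimed_ext.lower()
    if detected == "unknown":
        return "caution: not a recognised archive; do not assume safe"
    if detected == "zip" and ext in OFFICE_ZIP_EXTENSIONS:
        return "expected: Office Open XML files are ZIP containers"
    if ext == "." + detected:
        return "consistent: extension matches container"
    if ext == ".zip" and detected in {"7z", "rar"}:
        return "mismatch: route to a multi-format extractor"
    if ext in {".txt", ".log"} and detected in {"gz", "bz2"}:
        return "likely mislabelled compressed log: decompress in a sandbox"
    return "escalate: extension does not match container"
```

The fall-through case is deliberately "escalate", so that any combination not listed in the matrix is treated as suspicious rather than silently accepted.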
Privacy properties for compliance review
Why this tool fits regulated and sensitive-data workflows.
| Property | Behaviour |
|---|---|
| File transmission | None — read locally via browser File API |
| Server-side processing | None — archive tools have no server path |
| Data residency impact | File stays on the analyst's device |
| Bytes inspected | First 8 bytes (plus offset 257 for TAR) — header only |
| Output | JSON with filename, format, magic hex — downloadable for records |
| Determinism | Same file → same result, reproducible by a reviewer |
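The "offset 257 for TAR" entry reflects that a TAR file has no signature at byte 0; the `ustar` marker sits inside the first 512-byte header block. A minimal sketch of that header-only check, assuming the hypothetical helper names `looks_like_tar` and `read_probe`:

```python
USTAR_OFFSET = 257  # position of the "ustar" magic in the first TAR header block


def looks_like_tar(data: bytes) -> bool:
    """True if the bytes at offset 257 spell 'ustar' (POSIX and GNU variants)."""
    return data[USTAR_OFFSET:USTAR_OFFSET + 5] == b"ustar"


def read_probe(path: str) -> bytes:
    """Read just enough of the file to cover the TAR magic, and no more."""
    with open(path, "rb") as fh:
        return fh.read(USTAR_OFFSET + 5)
```

Only 262 bytes are read in total, so the check stays header-only and gives the same answer on every run for the same file.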
What detection does and does NOT tell you
Scope is identification, not malware analysis. Pair it with the right follow-up tool.
| Question | Answered here? | Use instead |
|---|---|---|
| What is the true container format? | Yes | — |
| Is the extension spoofed? | Yes (by comparison) | — |
| Is the archive password-protected? | No | encrypted-archive-detector |
| What files are inside? | No | archive-previewer |
| Is the content malware? | No | A dedicated AV / sandbox |
| Has the archive been tampered with? | No (use a checksum) | checksum-generator |
Cookbook
Triage scenarios a security analyst or compliance reviewer meets, and what the detector reveals about each.
A 'PDF' attachment that is really a ZIP
A phishing report includes invoice_2026.pdf. A real PDF starts with %PDF. The detector reports zip — the file is a ZIP wearing a PDF name, a classic delivery trick. Escalate; do not open in a PDF reader.
Input file: invoice_2026.pdf
Output:
{
"filename": "invoice_2026.pdf",
"format": "zip",
"magicBytes": "50 4B 03 04 14 00 00 00",
"description": "ZIP archive — DEFLATE-compressed entries with central directory at end.",
"recommendedTools": ["multi-format-extractor", "archive-previewer", "archive-integrity-tester"]
}
→ Finding: extension says PDF, bytes say ZIP. Escalate.
A .zip from a vendor that is actually 7z
A supply-chain artefact dependency-bundle.zip won't open in the standard ZIP tooling. The detector shows 7z — built by 7-Zip. Route it to the libarchive-backed multi-format extractor in a sandbox, not a ZIP-only tool.
Input file: dependency-bundle.zip
Output:
{
"filename": "dependency-bundle.zip",
"format": "7z",
"magicBytes": "37 7A BC AF 27 1C 00 04",
"description": "7-Zip archive — LZMA/LZMA2 compression with optional encryption.",
"recommendedTools": ["multi-format-extractor", "archive-previewer"]
}
Documenting evidence for an audit
During a compliance review you must record the true type of a flagged file without uploading it anywhere. Download the detector's JSON and attach it to the case — it is the deterministic, reproducible basis for the finding.
Saved as: flagged-artifact-format.json
{
"filename": "q2-export.bin",
"format": "rar",
"magicBytes": "52 61 72 21 1A 07 01 00",
"description": "RAR archive — proprietary WinRAR format. JAD Apps reads via libarchive but cannot write.",
"recommendedTools": ["multi-format-extractor"]
}
→ Attached to case #4412 as the documented true-format evidence.
An unknown result that warrants caution
A file update.zip reports unknown. It is not a recognised archive — it could be encrypted, an EXE with an appended payload, or a format outside the supported set. Treat unknown as 'do not assume safe' and analyse further in a sandbox.
Input file: update.zip
Output:
{
"filename": "update.zip",
"format": "unknown",
"magicBytes": "4D 5A 90 00 03 00 00 00",
"description": "Unrecognised — magic bytes do not match any supported archive format.",
"recommendedTools": []
}
→ 4D 5A is 'MZ': a Windows executable masquerading as a .zip. High priority.
Confirming an Office file is a legitimate ZIP
Not every mismatch is malicious. A .docx reporting zip is normal — Office Open XML files are ZIP containers. The detector helps you distinguish expected ZIP-based formats from genuinely spoofed ones.
Input file: contract.docx
Output:
{
"filename": "contract.docx",
"format": "zip",
"magicBytes": "50 4B 03 04 14 00 06 00"
}
→ Expected: .docx/.xlsx/.pptx are ZIP containers. Not a spoof on its own.
Edge cases and what actually happens
Office document reports zip
Expected: DOCX, XLSX, and PPTX are ZIP containers, so reporting zip for them is correct, not a spoof. Treat a bare .pdf or .txt reporting ZIP as suspicious, but expect Office formats to be ZIPs.
File reports unknown but is clearly hostile
Caution: unknown means the bytes match no supported archive signature; it does not mean safe. An EXE (MZ), an encrypted blob, or an unsupported format all report unknown. Never treat unknown as a clean result.
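When the result is unknown, the magic-byte hex in the report can still be checked by hand against a few well-known non-archive signatures. The sketch below is a follow-up aid, not part of the detector; the `NON_ARCHIVE_HINTS` table and `hint_for_unknown` function are hypothetical names, and the list is far from complete.

```python
# A few widely documented leading-byte signatures for non-archive files.
NON_ARCHIVE_HINTS = [
    (b"MZ", "Windows executable (MZ)"),
    (b"%PDF", "PDF document"),
    (b"\x7fELF", "ELF executable"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG", "PNG image"),
]


def hint_for_unknown(header: bytes) -> str:
    """Suggest what an 'unknown' file might be, based on its leading bytes."""
    for magic, label in NON_ARCHIVE_HINTS:
        if header.startswith(magic):
            return label
    return "no hint: unrecognised leading bytes"
```

A hint is only a starting point for sandbox analysis; a missing hint says nothing about whether the file is safe.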
Polyglot file with appended archive
Limited: Polyglots place valid data of one type at offset 0 and append an archive later. The detector reads the leading bytes, so it reports the front format and misses the appended archive. Deep payload analysis needs a dedicated tool.
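As a rough follow-up outside the detector, an appended ZIP can often be spotted by looking for the end-of-central-directory signature (`PK\x05\x06`) near the end of the file, because ZIP readers locate the archive from the tail. This is a heuristic sketch with hypothetical function names; a signature match can be a coincidence, so treat a positive as a reason to investigate rather than as proof.

```python
EOCD_SIGNATURE = b"PK\x05\x06"   # ZIP end-of-central-directory record
MAX_EOCD_SPAN = 22 + 65535       # fixed record size plus maximum comment length


def has_trailing_zip_directory(data: bytes) -> bool:
    """True if a ZIP end-of-central-directory signature appears near the end."""
    return EOCD_SIGNATURE in data[-MAX_EOCD_SPAN:]


def appended_zip_suspected(data: bytes) -> bool:
    """Flag files that do not start as a ZIP but appear to end as one."""
    starts_as_zip = data.startswith(b"PK\x03\x04")
    return not starts_as_zip and has_trailing_zip_directory(data)
```

The same check also flags a self-extracting ZIP, whose archive sits after an executable stub.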
Encrypted archive looks like a normal ZIP
Expected: Many encrypted archives keep a normal ZIP/7z header, so the detector still reports the container format. Encryption isn't visible from the magic bytes; run the encrypted-archive-detector to confirm protection before extraction.
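For ZIP specifically, the two bytes that follow the signature and version fields are the general-purpose bit flag of the first entry, and bit 0 of that field marks the entry as encrypted. The sketch below reads only that flag. It is an illustration of the ZIP header layout, not a description of how the encrypted-archive-detector works; it covers only the first entry and says nothing about 7z or RAR. The function name is hypothetical.

```python
import struct


def first_entry_flagged_encrypted(header: bytes) -> bool:
    """Read bit 0 of the general-purpose flag in a ZIP local file header."""
    if len(header) < 8 or not header.startswith(b"PK\x03\x04"):
        raise ValueError("not a ZIP local file header")
    # Layout: signature (4 bytes), version needed (2), general-purpose flag (2).
    (flags,) = struct.unpack_from("<H", header, 6)
    return bool(flags & 0x0001)
```

A ZIP can mix encrypted and unencrypted entries, so a clear flag on the first entry does not rule out protected entries further in.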
Tampered content, intact header
Not detected here: If only the payload was altered but the signature is intact, the detector still reports the same format; it doesn't verify integrity. Use the checksum-generator and compare against a known-good hash for tamper detection.
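The hash comparison can also be done locally. The following is a minimal sketch that assumes SHA-256 is the agreed algorithm and that a trusted reference digest is available; the function names are hypothetical, and the checksum-generator may offer other algorithms.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large evidence files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a trusted value, ignoring case and spaces."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Unlike header detection, the digest covers every byte, so a payload change that leaves the signature intact still produces a different value.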
File over the tier size cap
Rejected (tier limit): Free caps files at 50 MB, Pro at 500 MB, Developer at 2 GB. A large evidence file beyond your tier is rejected before detection. Use a higher tier or identify it with a local CLI.
Self-extracting archive (SFX)
unknown: An SFX begins with an executable stub, so the detector reports unknown (the leading bytes are MZ, not an archive signature). The real archive is appended after the stub and is invisible to header-only detection.
ISO image submitted as evidence
unknown: ISO 9660's identifying descriptor sits far inside the file, beyond the bytes the detector inspects, so ISOs report unknown. This is a known scope limit, not a failure; use a CLI identifier for disc images.
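For a local check, the ISO 9660 identifier `CD001` sits one byte into the first volume descriptor, which begins at sector 16 of 2048-byte sectors (byte offset 32769). A minimal sketch that works on any seekable binary stream; `looks_like_iso9660` is a hypothetical name:

```python
import io

# Sector 16 of 2048-byte sectors, plus one byte for the descriptor type code.
ISO_IDENTIFIER_OFFSET = 16 * 2048 + 1


def looks_like_iso9660(stream: io.BufferedIOBase) -> bool:
    """True if the standard identifier 'CD001' is present at offset 32769."""
    stream.seek(ISO_IDENTIFIER_OFFSET)
    return stream.read(5) == b"CD001"
```

With a file on disk, open it in binary mode and pass the handle, for example `looks_like_iso9660(open("evidence.iso", "rb"))`, where `evidence.iso` is a placeholder path.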
Frequently asked questions
Does the detector upload the suspicious file anywhere?
No. The file is read locally in your browser and never transmitted. This is the core reason it suits security triage — potentially malicious or confidential evidence stays on the analyst's device.
Can it catch a file with a spoofed extension?
Yes — it reads the magic bytes, so a file claiming one type while its signature says another is exposed by comparing the detected format to the claimed extension.
Is this a malware scanner?
No. It identifies the true container format only. It does not scan, sandbox, or judge whether content is malicious — pair it with a dedicated AV or sandbox for that.
What does an `unknown` result mean for triage?
That the bytes match no supported archive signature. It is not a clean bill of health — it can indicate an EXE, an encrypted blob, an ISO, a polyglot, or an unsupported format. Treat it with caution.
Can I use the output as audit evidence?
Yes. The downloadable JSON records the filename, true format, and magic-byte hex deterministically, so a reviewer can reproduce the finding from the same file.
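If you need to reproduce an equivalent record locally, the three documented fields can be serialised with a fixed key order so that the same input always yields byte-identical output. This is a sketch of the idea, not the detector's own serialiser; `build_report` is a hypothetical helper and only the fields named above are included.

```python
import json


def build_report(filename: str, detected_format: str, header: bytes) -> str:
    """Serialise filename, format, and magic-byte hex as stable, sorted JSON."""
    report = {
        "filename": filename,
        "format": detected_format,
        "magicBytes": " ".join(f"{b:02X}" for b in header[:8]),
    }
    # sort_keys keeps the output identical across runs and Python versions.
    return json.dumps(report, indent=2, sort_keys=True)
```

Storing the report's own hash alongside the case record lets a reviewer confirm that the attached evidence has not changed since it was filed.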
Will it detect whether an archive is encrypted?
No — encryption isn't visible in the magic bytes, so an encrypted archive still reports its container format. Run the encrypted-archive-detector to check for password protection.
Why does a .docx report as zip?
Office Open XML files (DOCX, XLSX, PPTX) are genuinely ZIP containers, so this is expected and not a spoof. A non-Office file claiming a document extension but reporting ZIP is the suspicious case.
Can it detect tampering?
Not directly — if the header is intact it reports the same format regardless of payload changes. For tamper detection, hash the file with the checksum-generator and compare against a trusted value.
Does it execute or extract the file?
No. It only reads the file header to identify the format. Nothing is executed or extracted, which keeps the triage step safe.
What about a self-extracting EXE archive?
It reports unknown because the file starts with an executable stub (MZ), not an archive signature. The archive payload is appended after the stub and isn't seen by header-only detection.
Is there a file-size limit for evidence files?
Yes — 50 MB on Free, 500 MB on Pro, and 2 GB on Developer. Larger files are rejected before detection; identify those with a local CLI.
What should I do after identifying a suspicious archive?
Check for encryption with the encrypted-archive-detector, then extract only in an isolated sandbox using the multi-format-extractor — never on a production workstation.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.