How to use the format detector in developer workflows
- Step 1: Grab the confusing artefact. Download the build output, registry tarball, or proxied file that your extractor rejected. The name doesn't matter: the detector reads bytes, so an extensionless file works fine.
- Step 2: Drop it on the detector. Open the auto-format-detector and drop the single file. It reads locally in the browser; nothing is uploaded, so internal artefacts stay private. There are no options to set.
- Step 3: Read the true format and hex. Check the Format metric against what your build was supposed to emit. Use the Magic bytes hex to verify the exact signature, for example to confirm that a packer wrote real GZIP (1F 8B) and not something else.
- Step 4: Confirm wrapping layers. Remember that a .tar.gz reports as gz because the outer bytes are GZIP. If you expected a plain TAR but get gz, your pipeline is wrapping the TAR in GZIP; if you expected tar.gz but get tar, it isn't compressing.
- Step 5: Jump to the recommended converter. Follow the recommendedTools in the output (a GZIP result points at gzip-to-zip and tar-gz-to-zip, a ZIP result at the multi-format extractor) to actually open or normalise the artefact.
- Step 6: Attach the JSON to the issue. Copy the JSON or download <filename>-format.json and paste it into the build ticket so reviewers see the true format, hex, and filename without re-running anything.
Common artefact mislabels developers hit
What the detector reveals for typical CI/registry/download confusion. Detection is on the outermost bytes, so wrapped formats report their outer layer.
| What you got | Detector result | What it really means |
|---|---|---|
| dist.zip that won't unzip | gz | CI emitted a tar.gz but named it .zip |
| pkg.tgz | tar | Registry served an uncompressed TAR (not gzipped) |
| extensionless download | zip | Proxy stripped the suffix; it's a plain ZIP |
| bundle.zip | 7z | Built with 7-Zip; needs the libarchive extractor |
| backup.gz | unknown | Renamed text/binary that isn't really GZIP |
| release.tar.gz | gz | Correct: outer layer is GZIP as expected |
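To reproduce this kind of triage in a script, the sketch below matches leading bytes against a small signature table and compares the result with what the filename suffix claims. The signature values come from the public format specifications, not from the detector's source, and the names `SIGNATURES`, `SUFFIX_TO_FORMAT`, `detect_format`, and `is_mislabelled` are hypothetical. Treat it as an approximation of the behaviour described here, not as the tool's actual implementation.

```python
from typing import Optional

# (format, offset, signature) triples. Assumed from public format specs,
# not copied from the detector itself.
SIGNATURES = [
    ("zip", 0, bytes.fromhex("50 4B 03 04")),
    ("gz", 0, bytes.fromhex("1F 8B")),
    ("bz2", 0, bytes.fromhex("42 5A 68")),
    ("xz", 0, bytes.fromhex("FD 37 7A 58 5A 00")),
    ("7z", 0, bytes.fromhex("37 7A BC AF 27 1C")),
    ("rar", 0, bytes.fromhex("52 61 72 21 1A 07")),
    ("tar", 257, b"ustar"),
]

# What a filename suffix claims the outer format is (illustrative mapping).
SUFFIX_TO_FORMAT = {
    ".zip": "zip", ".tgz": "gz", ".gz": "gz", ".tar": "tar",
    ".bz2": "bz2", ".xz": "xz", ".7z": "7z", ".rar": "rar",
}


def detect_format(data: bytes) -> str:
    """Return the outermost format name, or "unknown" if nothing matches."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return "unknown"


def expected_format(filename: str) -> Optional[str]:
    """Return the format implied by the suffix, or None for no known suffix."""
    lower = filename.lower()
    for suffix, fmt in SUFFIX_TO_FORMAT.items():
        if lower.endswith(suffix):
            return fmt
    return None


def is_mislabelled(filename: str, data: bytes) -> bool:
    """True when the suffix makes a claim that the bytes contradict."""
    expected = expected_format(filename)
    return expected is not None and expected != detect_format(data)


gzip_head = bytes.fromhex("1F 8B 08 00 64 7A 0A 00")
bare_tar = b"./src/" + b"\x00" * 251 + b"ustar" + b"\x00" * 250

print(detect_format(gzip_head), is_mislabelled("dist.zip", gzip_head))
print(detect_format(bare_tar), is_mislabelled("pkg.tgz", bare_tar))
print(is_mislabelled("release.tar.gz", gzip_head))
```

An extensionless name makes no claim, so `is_mislabelled` returns False for it regardless of the bytes.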
Detector result to next tool
The recommendedTools the output suggests per format (ZIP, GZIP, 7z, RAR are populated; other formats return an empty list).
| Detected format | Recommended next tool(s) | Why |
|---|---|---|
| zip | multi-format-extractor, archive-previewer, archive-integrity-tester | Open, inspect, or verify the ZIP |
| gz | gzip-to-zip, tar-gz-to-zip | Decompress or unwrap to a usable form |
| 7z | multi-format-extractor, archive-previewer | libarchive WASM can read 7z |
| rar | multi-format-extractor | RAR is read-only; extract via libarchive |
| tar / bz2 / xz | (none returned) | Open with the multi-format-extractor directly |
| unknown | (none returned) | Not a recognised archive; investigate the source |
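If you paste the detector's JSON into a ticket or a script, the table above can be mirrored as a plain lookup. This is a minimal sketch: the dictionary contents are copied from the table, while the names `RECOMMENDED_TOOLS` and `recommended_tools` are hypothetical and not part of the tool.

```python
# Format-to-tools mapping copied from the table above. Formats that are
# absent from the dictionary yield an empty list, as the table describes.
RECOMMENDED_TOOLS = {
    "zip": ["multi-format-extractor", "archive-previewer", "archive-integrity-tester"],
    "gz": ["gzip-to-zip", "tar-gz-to-zip"],
    "7z": ["multi-format-extractor", "archive-previewer"],
    "rar": ["multi-format-extractor"],
}


def recommended_tools(fmt: str) -> list:
    """Return a copy of the suggested tools for a detected format."""
    return list(RECOMMENDED_TOOLS.get(fmt, []))


for fmt in ("zip", "gz", "tar", "unknown"):
    print(fmt, recommended_tools(fmt))
```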
What this tool is not
Read-only identifier. For the actual operation, use the matching sibling.
| You want to... | This tool? | Use instead |
|---|---|---|
| Know the true format | Yes | — |
| Convert tar.gz to zip | No | tar-gz-to-zip |
| List entries / preview | No | archive-previewer |
| Verify the archive opens | No | archive-integrity-tester |
| Identify many files in CI | No (one per run) | file in a shell loop |
| Read archive metadata | No | archive-metadata-extractor |
Cookbook
Real developer triage moments and the JSON the detector returns. Filenames are deliberately wrong to mirror what pipelines actually produce.
CI named it .zip but it's a tar.gz
Your deploy step unzips dist.zip and fails. The detector shows gz — the build's tar step ran but the zip step didn't, so you got a tar.gz with a .zip name. Fix the pipeline or convert with tar-gz-to-zip.
Input: dist.zip
Output:
{
"filename": "dist.zip",
"format": "gz",
"magicBytes": "1F 8B 08 00 64 7A 0A 00",
"description": "GZIP single-stream — typically used for one-file compression (.log.gz, .sql.gz).",
"recommendedTools": ["gzip-to-zip", "tar-gz-to-zip"]
}
A .tgz that isn't actually gzipped
A registry artefact pkg.tgz won't decompress. The detector reports tar, not gz — the file is a bare TAR with a .tgz name, so any gunzip step will fail. Open it as a plain TAR with the multi-format extractor.
Input: pkg.tgz
Output:
{
"filename": "pkg.tgz",
"format": "tar",
"magicBytes": "2E 2F 73 72 63 2F 00 00",
"description": "TAR archive — uncompressed concatenation of files with metadata headers.",
"recommendedTools": []
}
→ It's an uncompressed TAR; don't gunzip it.
Extensionless proxy download
A download proxy handed you a file simply named download. Drop it on the detector — the PK signature shows it's a ZIP, so rename to .zip (or feed straight to the extractor, which ignores the name).
Input: download
Output:
{
"filename": "download",
"format": "zip",
"magicBytes": "50 4B 03 04 14 00 08 00",
"description": "ZIP archive — DEFLATE-compressed entries with central directory at end.",
"recommendedTools": ["multi-format-extractor", "archive-previewer", "archive-integrity-tester"]
}
Verifying a packer wrote real XZ
You configured a build to emit XZ and want to confirm it before publishing. The detector's xz result and the FD 37 7A 58 5A 00 hex prove the packer did the right thing.
Input: kernel-modules.xz
Output:
{
"filename": "kernel-modules.xz",
"format": "xz",
"magicBytes": "FD 37 7A 58 5A 00 00 04",
"description": "XZ stream — LZMA2-compressed, common for kernel and Linux package distribution.",
"recommendedTools": []
}
→ Confirmed: real XZ stream, safe to publish.
An artefact that turns out not to be an archive
A cache.gz from a flaky CI step reports unknown. The hex shows ASCII text — the compression step silently no-op'd and wrote a plain file with a .gz name. The detector pinpoints the broken step.
Input: cache.gz
Output:
{
"filename": "cache.gz",
"format": "unknown",
"magicBytes": "7B 22 76 65 72 73 69 6F",
"description": "Unrecognised — magic bytes do not match any supported archive format.",
"recommendedTools": []
}
→ 7B is '{', so it's raw JSON that was never compressed. Fix the CI gzip step.
Edge cases and what actually happens
No CLI / programmatic API
Browser-only: This is an interactive browser tool with no server endpoint or HTTP API, so you can't curl it in a pipeline. For automated CI format checks, call the file command in your shell step instead; use the browser detector for interactive triage.
tar.gz reports as gz
By design: The detector reads the outermost bytes, which for a gzipped TAR are GZIP. It reports gz, never tar.gz. If you expected a plain TAR and see gz, your pipeline is compressing; convert with tar-gz-to-zip if you need a ZIP.
.tgz reports as tar
Expected: A .tgz that wasn't actually gzipped is a bare TAR and reports tar. Any gunzip step will fail on it. Open it as an uncompressed TAR with the multi-format-extractor.
Empty or truncated build artefact
unknown: A zero-byte or sub-4-byte artefact (a failed build step) returns unknown immediately because there are no signature bytes to read. This itself is a useful signal that the producing step wrote nothing.
TAR artefact under 262 bytes
unknown: TAR is matched by ustar at offset 257, so a TAR smaller than 262 bytes can't be confirmed and reports unknown. Tiny test fixtures may hit this; real build TARs are well above the threshold.
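The 262-byte threshold follows from the arithmetic: the five-byte ustar marker starts at offset 257, so it ends at byte 262. A minimal sketch of that check is shown here. The constant and function names are hypothetical, and the detector's own code may differ.

```python
USTAR_OFFSET = 257
USTAR_MAGIC = b"ustar"
MIN_TAR_BYTES = USTAR_OFFSET + len(USTAR_MAGIC)  # 257 + 5 = 262


def looks_like_tar(data: bytes) -> bool:
    """True only if the input is long enough and carries the ustar marker."""
    if len(data) < MIN_TAR_BYTES:
        return False  # too short to contain the marker at all
    return data[USTAR_OFFSET:MIN_TAR_BYTES] == USTAR_MAGIC


header = b"\x00" * USTAR_OFFSET + USTAR_MAGIC
print(MIN_TAR_BYTES, looks_like_tar(header), looks_like_tar(header[:-1]))
```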
Format outside the supported set
unknown: Zstandard, LZ4, and similar modern artefact formats have no signature here and report unknown even though they're valid. The detector covers ZIP, GZIP, TAR, BZIP2, XZ, 7z, RAR; use file for the rest.
Recommended-tools list is empty
Expected: For tar, bz2, xz, and unknown the output's recommendedTools is empty by design; only ZIP, GZIP, 7z, and RAR carry suggestions. Open TAR/BZ2/XZ directly with the multi-format-extractor.
Artefact over the tier size cap
Rejected (tier limit): An artefact larger than your tier cap (Free 50 MB, Pro 500 MB, Developer 2 GB) is rejected before detection. For very large CI artefacts, identify them with a local file call instead.
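To decide up front whether an artefact is worth dropping on the detector, you can compare its size with the caps listed above. This sketch makes two assumptions the page does not confirm: that MB and GB are binary units (1 MB = 1024 × 1024 bytes), and that a file exactly at the cap is accepted. The names `TIER_CAP_BYTES` and `within_tier_cap` are hypothetical.

```python
MIB = 1024 * 1024  # assumption: the caps use binary units

# Caps copied from the text above: Free 50 MB, Pro 500 MB, Developer 2 GB.
TIER_CAP_BYTES = {
    "Free": 50 * MIB,
    "Pro": 500 * MIB,
    "Developer": 2 * 1024 * MIB,
}


def within_tier_cap(size_bytes: int, tier: str) -> bool:
    """True if a file of this size fits the tier (boundary assumed inclusive)."""
    return size_bytes <= TIER_CAP_BYTES[tier]


artefact_size = 120 * MIB
for tier in TIER_CAP_BYTES:
    print(tier, within_tier_cap(artefact_size, tier))
```

For a file on disk, `os.path.getsize(path)` supplies the size to pass in.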
Frequently asked questions
Can I call this from CI as an API?
No. It's an interactive browser tool with no server endpoint, so it can't be scripted into a pipeline. For automated format checks in CI, use the file command in a shell step; use this detector for hands-on triage.
Why does my CI's dist.zip detect as gz?
Your build produced a tar.gz (outer bytes 1F 8B) but named it .zip. The detector reports the true outer format. Fix the packaging step or convert with tar-gz-to-zip.
Why does a .tgz detect as tar, not gz?
Because it isn't actually gzipped — it's a bare TAR with a .tgz name. Any gunzip step will fail; open it as an uncompressed TAR with the multi-format-extractor.
Does it convert the file too?
No — it only identifies the format. It then recommends the right converter (for example gzip-to-zip), which performs the actual conversion.
Is detection deterministic?
Yes. The same file always yields the same format and the same magic-byte hex, because detection is a pure signature lookup with no heuristics — so a teammate can reproduce your finding.
Can I check the raw signature bytes?
Yes — the output includes the first 8 bytes as uppercase hex, so you can confirm a packer wrote exactly the signature you expected.
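To compare a local file against the hex in a teammate's report, you can produce the same representation yourself. This is a minimal sketch assuming the format described above (first 8 bytes, uppercase, space-separated); the function names `magic_hex` and `read_magic_hex` are hypothetical.

```python
def magic_hex(data: bytes, count: int = 8) -> str:
    """Return the first `count` bytes as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in data[:count])


def read_magic_hex(path: str, count: int = 8) -> str:
    """Read only the leading bytes of a file and format them as hex."""
    with open(path, "rb") as handle:
        return magic_hex(handle.read(count), count)


sample = bytes([0x1F, 0x8B, 0x08, 0x00, 0x64, 0x7A, 0x0A, 0x00, 0xFF])
print(magic_hex(sample))  # 1F 8B 08 00 64 7A 0A 00
```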
Does it upload my internal artefact?
No. The file is read locally in the browser and never transmitted, so internal or proprietary build outputs stay on your machine.
Can it identify several artefacts at once?
No — one file per run. For bulk identification across a directory, loop file in your shell. The detector targets the single-file interactive case.
What if the result is unknown?
The artefact matches no supported archive signature — it may be a non-archive (raw JSON, text), a truncated output, an EXE, or a format outside the supported set. Check the hex to see what it really is.
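Reading the hex is easier once it is rendered as text. The sketch below converts a magicBytes string from the JSON output into printable ASCII and replaces non-printable bytes with a dot; the function name `hex_to_ascii` is hypothetical.

```python
def hex_to_ascii(magic: str) -> str:
    """Render a spaced hex string as ASCII, with '.' for non-printable bytes."""
    raw = bytes.fromhex(magic)
    return "".join(chr(byte) if 32 <= byte < 127 else "." for byte in raw)


print(hex_to_ascii("7B 22 76 65 72 73 69 6F"))  # {"versio
print(hex_to_ascii("4D 5A 90 00"))              # MZ..
```

An opening brace and quote point at JSON, while a leading MZ is the marker that Windows executables start with.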
Why is recommendedTools empty for my tar/xz file?
Suggestions are only attached for ZIP, GZIP, 7z, and RAR. For TAR, BZIP2, and XZ the list is empty by design — open them directly with the multi-format-extractor.
What's the file-size limit?
50 MB on Free, 500 MB on Pro, and 2 GB on Developer. Files larger than your tier are rejected before detection.
After detecting, how do I read what's inside?
Use the archive-previewer to list entries or the archive-metadata-extractor for metadata — the detector only names the container.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.