How to validate file magic bytes: a technical guide for developers
- Step 1: Read enough header bytes. Read the file's header into a buffer and hand it to fileTypeFromBuffer. Most signatures resolve within the first 12 bytes; reading a few hundred bytes covers virtually every format file-type knows. This validator reads the buffer (subject to the tier limit) and inspects the header from it.
- Step 2: Detect by signature, not name. Let file-type return { ext, mime } from the byte signature. If no signature matches it returns undefined, which is the 'unknown' case. Never derive the type from the extension; that's the value the byte check exists to override.
- Step 3: Parse the claimed extension. Split the filename on . and lowercase the last segment; that's claimedExt. report.final.pdf → pdf; a name with no dot yields the whole string (treated as an empty/odd extension). This mirrors name.split('.').pop().toLowerCase().
- Step 4: Normalise through the alias table. Before comparing, map both sides through the equivalence table (jpeg→jpg, htm→html, tif→tiff, yml→yaml, mid→midi, mpg/mpe→mpeg, m4v→mp4, qt→mov). This prevents legitimate naming variants from being reported as conflicts.
- Step 5: Compute matches / threat. matches is true when the normalised detected and claimed extensions are equal. threatDetected is true only when a type was detected AND it doesn't match. A null detection yields matches:false but threatDetected:false, because unknown is not a threat.
- Step 6: Decide your upload policy. Reject on a real mismatch (e.g. claimed image, detected exe) with HTTP 415; allow the known benign mismatch (OOXML as zip); log the detected/claimed/size/uploader on every mismatch for audit. For your own pipeline, mirror this logic or call the server-safe API.
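Steps 3 to 5 can be sketched in a few lines. This is a minimal illustration, not the validator's actual source: the names ALIASES, claimedExtension, normalizeExt, and compare are hypothetical, and the detected argument is assumed to be whatever fileTypeFromBuffer returned ({ ext, mime } or undefined).

```javascript
// Alias table as listed in Step 4: variant -> canonical extension.
const ALIASES = new Map([
  ['jpeg', 'jpg'], ['htm', 'html'], ['tif', 'tiff'], ['yml', 'yaml'],
  ['mid', 'midi'], ['mpg', 'mpeg'], ['mpe', 'mpeg'], ['m4v', 'mp4'], ['qt', 'mov'],
]);

// Step 3: last dot-segment of the filename, lowercased.
// A name with no dot yields the whole (lowercased) string.
function claimedExtension(filename) {
  return filename.split('.').pop().toLowerCase();
}

// Step 4: map a variant to its canonical form; other values pass through.
function normalizeExt(ext) {
  return ALIASES.get(ext) ?? ext;
}

// Step 5: compute the findings from a detection result and a filename.
// `detected` is { ext, mime } or undefined (the 'unknown' case).
function compare(detected, filename) {
  const detectedExt = detected ? detected.ext : null;
  const claimedExt = claimedExtension(filename);
  // An empty side never matches.
  const matches =
    detectedExt !== null &&
    claimedExt !== '' &&
    normalizeExt(detectedExt) === normalizeExt(claimedExt);
  return {
    detectedExt,
    detectedMime: detected ? detected.mime : null,
    claimedExt,
    matches,
    // A threat needs a positive detection that disagrees with the name.
    threatDetected: detectedExt !== null && !matches,
  };
}
```

Calling compare(undefined, 'data.csv') yields matches:false and threatDetected:false, which is the "unknown is not a threat" rule from Step 5.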
Common signatures and detection offsets
Representative magic numbers from the formats file-type detects. Offset is where the signature begins. ASCII shown where printable.
| Format | Hex signature | ASCII | Offset | Detected ext |
|---|---|---|---|---|
| PNG | 89 50 4E 47 0D 0A 1A 0A | .PNG.... | 0 | png |
| JPEG | FF D8 FF | — | 0 | jpg |
| GIF | 47 49 46 38 (37 or 39) 61 | GIF87a / GIF89a | 0 | gif |
| PDF | 25 50 44 46 | %PDF | 0 | pdf |
| ZIP / OOXML / JAR | 50 4B 03 04 | PK.. | 0 | zip |
| Windows PE | 4D 5A | MZ | 0 | exe |
| ELF | 7F 45 4C 46 | .ELF | 0 | elf |
| gzip | 1F 8B | — | 0 | gz |
| 7-Zip | 37 7A BC AF 27 1C | 7z.... | 0 | 7z |
| TIFF (LE/BE) | 49 49 2A 00 / 4D 4D 00 2A | II*. / MM.* | 0 | tiff |
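To make the table concrete, here is a hand-rolled matcher over exactly these offset-0 signatures. It is a teaching sketch only: SIGNATURES and sniffExtension are hypothetical names, and the file-type library covers far more formats and offsets than this.

```javascript
// Signatures copied from the table above; all start at offset 0.
const SIGNATURES = [
  { ext: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { ext: 'jpg', bytes: [0xff, 0xd8, 0xff] },
  { ext: 'gif', bytes: [0x47, 0x49, 0x46, 0x38, 0x37, 0x61] }, // GIF87a
  { ext: 'gif', bytes: [0x47, 0x49, 0x46, 0x38, 0x39, 0x61] }, // GIF89a
  { ext: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46] },
  { ext: 'zip', bytes: [0x50, 0x4b, 0x03, 0x04] },
  { ext: 'exe', bytes: [0x4d, 0x5a] },
  { ext: 'elf', bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { ext: 'gz', bytes: [0x1f, 0x8b] },
  { ext: '7z', bytes: [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c] },
  { ext: 'tiff', bytes: [0x49, 0x49, 0x2a, 0x00] }, // little-endian
  { ext: 'tiff', bytes: [0x4d, 0x4d, 0x00, 0x2a] }, // big-endian
];

// Returns the extension of the first signature that prefixes `header`
// (a Uint8Array), or null when nothing matches.
function sniffExtension(header) {
  for (const { ext, bytes } of SIGNATURES) {
    if (header.length >= bytes.length && bytes.every((b, i) => header[i] === b)) {
      return ext;
    }
  }
  return null;
}
```

A prefix match like this only proves how the file begins; it says nothing about the bytes that follow.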
The validator's option contract
The complete option surface for magic-byte-validator (lib/security/security-tool-schemas.ts). There is exactly one option, and it only matters on the server/API path.
| Option | Type | Default | Purpose |
|---|---|---|---|
| filename | string | (none) | Original filename used to derive the claimed extension. In the browser the dropped file's name is used automatically; the server/API has no File object, so pass it here. |
Output / findings fields
The JSON shape returned by both the browser processor and the server engine for this tool.
| Field | Type | Meaning |
|---|---|---|
| detectedExt | string or null | Extension from the byte signature; null = 'unknown' |
| detectedMime | string or null | MIME type from the byte signature; null = 'unknown' |
| claimedExt | string | Last dot-segment of the filename, lowercased |
| matches | boolean | True if normalised detected == claimed |
| threatDetected | boolean | True if a type was detected AND it doesn't match |
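One way to turn these fields into an upload decision is sketched below. The action names, the OFFICE_ZIP_EXTS set, and decideUpload itself are assumptions made for illustration; the policy (415 on a real mismatch, tolerate OOXML detected as zip, treat unknown as inconclusive) follows the guidance in this guide.

```javascript
// Claimed extensions that are legitimately ZIP containers (OOXML).
const OFFICE_ZIP_EXTS = new Set(['docx', 'xlsx', 'pptx']);

// Maps a findings object (fields from the table above) to a decision.
function decideUpload(findings) {
  const { detectedExt, claimedExt, matches, threatDetected } = findings;

  if (matches) {
    return { action: 'accept' };
  }
  if (!threatDetected) {
    // No signature matched: neither safe nor malicious by itself.
    return { action: 'inconclusive' };
  }
  if (detectedExt === 'zip' && OFFICE_ZIP_EXTS.has(claimedExt)) {
    // Known benign mismatch: verify the archive contents before accepting.
    return { action: 'inspect-archive' };
  }
  // Real mismatch: reject and record it for audit.
  return { action: 'reject', httpStatus: 415 };
}
```

Keeping 'inconclusive' separate from 'accept' lets the caller apply its own fallback (for example a text allowlist) instead of silently trusting unknown content.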
Cookbook
Implementation-shaped examples: signature reads, the alias logic, and how the validator's JSON maps to an upload decision.
Reject a renamed executable on upload
A client POSTs avatar.png with Content-Type: image/png, but the bytes are MZ. Mirror the validator: detect by bytes, ignore the declared type, reject with 415.
// pseudo-code mirroring the validator
const detected = await fileTypeFromBuffer(headerBuf); // { ext:'exe', ... }
const claimedExt = name.split('.').pop().toLowerCase(); // 'png'
// Guard first: detected is undefined when no signature matches
const matches = !!detected && normalize(detected.ext) === normalize(claimedExt);
if (detected && !matches && isExecutable(detected.ext)) {
  return res.status(415).send('Unsupported Media Type');
}
// validator JSON: { detectedExt:'exe', claimedExt:'png',
//   matches:false, threatDetected:true }
Don't reject OOXML for detecting as zip
A real .docx detects as zip. A naive reject-on-mismatch rule would block every Word doc. Treat zip-under-Office-extension as benign and verify the inner archive instead.
const isOfficeZip = detected?.ext === 'zip' &&
  ['docx', 'xlsx', 'pptx'].includes(claimedExt);
if (isOfficeZip) {
  // benign container mismatch: ACCEPT
  // optionally open the zip and assert it contains
  // '[Content_Types].xml' to confirm genuine OOXML
}
// validator JSON for budget.xlsx:
// { detectedExt:'zip', claimedExt:'xlsx',
//   matches:false, threatDetected:true } <-- expected
Alias table prevents a false rejection
An upload named scan.tif whose bytes are TIFF detects as tiff. Without normalisation, tif != tiff would reject a valid file. The alias map fixes it.
const EXT_ALIASES = { jpeg:'jpg', htm:'html', tif:'tiff',
yml:'yaml', mid:'midi', mpg:'mpeg', mpe:'mpeg',
m4v:'mp4', qt:'mov' };
const normalize = e => EXT_ALIASES[e] ?? e;
normalize('tiff') === normalize('tif') // 'tiff' === 'tiff' -> true
// validator JSON: { detectedExt:'tiff', claimedExt:'tif',
// matches:true, threatDetected:false }
Handle the 'unknown' result for text
A .csv upload returns undefined from file-type. Don't treat unknown as a rejection by default — text formats legitimately have no signature.
const detected = await fileTypeFromBuffer(buf); // undefined
if (!detected) {
// 'unknown' — NOT a threat. Fall back to content/MIME
// sniffing or an allowlist for text-like uploads.
}
// validator JSON: { detectedExt:null, detectedMime:null,
// claimedExt:'csv', matches:false, threatDetected:false }
Call the server-safe API path
magic-byte-validator is server-safe, so you can run it via the public API / @jadapps/runner. Because there's no File object server-side, pass the filename in the filename option so it can compute claimedExt.
POST /v1/tools/magic-byte-validator/run
{
"input": "<base64 of file header/bytes>",
"options": { "filename": "invoice.pdf" }
}
Response (application/json):
{ "detected": { "ext":"exe", "mime":"application/x-msdownload" },
"claimedExt":"pdf", "matches":false, "threatDetected":true }
Edge cases and what actually happens
ZIP-based formats all detect as zip
By design: OOXML (.docx/.xlsx/.pptx), JAR, EPUB, and APK are ZIP containers, so file-type reports zip for all of them. Don't reject on the zip mismatch alone; open the archive and inspect the manifest ([Content_Types].xml for OOXML, META-INF/MANIFEST.MF for JAR) to disambiguate.
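A cheap first pass at disambiguation is to look for the marker entry names in the raw archive bytes, since ZIP stores entry names uncompressed. The sketch below is a heuristic under that assumption, not a real archive parse: indexOfAscii and classifyZipContainer are hypothetical helpers, the scan needs the bytes where the names live (not just a header slice), and an attacker can plant the same strings, so a proper ZIP reader is still the stronger check.

```javascript
// Returns the first index where the ASCII `text` occurs in `bytes`, or -1.
function indexOfAscii(bytes, text) {
  for (let i = 0; i + text.length <= bytes.length; i++) {
    let j = 0;
    while (j < text.length && bytes[i + j] === text.charCodeAt(j)) j++;
    if (j === text.length) return i;
  }
  return -1;
}

// Heuristic classification of a buffer that may be a ZIP container.
function classifyZipContainer(bytes) {
  const isZip =
    bytes.length >= 4 &&
    bytes[0] === 0x50 && bytes[1] === 0x4b &&
    bytes[2] === 0x03 && bytes[3] === 0x04;
  if (!isZip) return 'not-zip';
  if (indexOfAscii(bytes, '[Content_Types].xml') !== -1) return 'ooxml';
  if (indexOfAscii(bytes, 'META-INF/MANIFEST.MF') !== -1) return 'jar';
  return 'zip';
}
```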
Signature lives at a non-zero offset
Supported: Some formats key on bytes past offset 0 (e.g. certain container atoms). file-type handles these internally, but a hand-rolled validator that only reads byte 0 will miss them, so read a few hundred header bytes and let the library do the offset logic.
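Two well-known examples of offset signatures are the ftyp box type at offset 4 in ISO base media files (the MP4/MOV family) and the ustar magic at offset 257 in POSIX tar archives. The helper names below are hypothetical; the sketch only shows why the size of the header slice matters.

```javascript
// True when the ASCII `text` appears in `bytes` starting at `offset`.
function matchesAt(bytes, offset, text) {
  if (bytes.length < offset + text.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// ISO base media files carry the box type 'ftyp' at bytes 4..7.
const hasFtypBox = (bytes) => matchesAt(bytes, 4, 'ftyp');

// POSIX tar archives carry 'ustar' at offset 257 of the first header block.
const hasUstarMagic = (bytes) => matchesAt(bytes, 257, 'ustar');
```

A 12-byte slice is enough for the ftyp check but can never see the tar magic, which is one reason to read a few hundred bytes rather than a dozen.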
No signature matched
Unknown: fileTypeFromBuffer returns undefined for text (CSV/TXT/JSON/HTML), corrupted headers, encrypted blobs, and formats outside the 155 supported extensions. The validator reports null/'unknown' and does not flag it. Don't equate 'unknown' with 'malicious' or 'safe'.
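For the unknown case, one possible fallback is a plain-text check before applying a text allowlist. The sketch below is an assumption about how such a check could look, not something the validator does: looksLikeUtf8Text is a hypothetical helper that treats a buffer as text when it has no NUL bytes and decodes as strict UTF-8.

```javascript
// Heuristic: does this buffer look like UTF-8 text?
function looksLikeUtf8Text(bytes) {
  if (bytes.length === 0) return false;
  // NUL bytes are a strong hint of binary content.
  if (bytes.includes(0)) return false;
  try {
    // fatal:true makes decode() throw on malformed UTF-8.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}
```

Note that a header slice cut in the middle of a multi-byte character fails the strict decode, so run this on the full content or on a slice trimmed to a character boundary.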
Polyglot / appended data
Partial: Detection is header-based, so a polyglot reports the first format whose signature matches, and data appended after a valid header is invisible to type detection. For untrusted inputs, also check size/structure and decompress-then-revalidate archives.
Forged magic bytes
Limitation: Prepending a valid signature to a malicious payload makes it detect as that type. Magic-byte validation is a strong filter against renaming, not a guarantee of integrity. Combine with AV, sandboxing, and size/structure checks for untrusted uploads.
Filename without an extension (server path)
Expected: On the server path, if you pass a filename with no dot, claimedExt becomes the whole string and extensionsMatch returns false against any detected type. The detected type is still correct; make decisions off detectedExt, not matches, for extension-less inputs.
Detected vs claimed both empty
Expected: extensionsMatch returns false when either side is empty. So 'unknown' detection plus an empty claimed extension is a (non-threat) matches:false. Treat as inconclusive, not as a disguise.
Very large file on a low tier
413 over limit: The buffer read respects the tier file limit (Free 10 MB). For large uploads where you only need the header, read and validate a header slice yourself, or use a tier with a higher cap. The signature is in the first few hundred bytes regardless of total size.
Frequently asked questions
Which library does this validator use, and how many formats does it cover?
The file-type npm package, version 19.6.0. It covers 155 file extensions and 150 MIME types — comprehensive across images, archives, executables, audio, video, fonts, and document containers. That's the real number; '300+' is a common but inaccurate figure for this library at this version.
Which library should I use server-side to replicate this?
The same file-type package works in Node. For Python, python-magic wraps libmagic; for Go, h2non/filetype covers common formats. To match this validator exactly, use file-type and apply the same alias table when comparing detected vs claimed extension.
How many header bytes do I need to read?
Most signatures resolve in the first 12 bytes; reading a few hundred header bytes covers virtually every format file-type detects (including ZIP/OOXML keyed on the offset-0 PK header). You don't need the whole file — let the library inspect a header buffer.
Why do .docx and .jar both detect as zip?
Because they ARE ZIP archives — they begin with PK\x03\x04. file-type reports the outer container. To disambiguate, open the archive and check its manifest: [Content_Types].xml for OOXML, META-INF/MANIFEST.MF for JAR.
What exactly is the 'claimed extension'?
The lowercased last segment after the final . in the filename — name.split('.').pop().toLowerCase(). So archive.tar.gz claims gz; report.pdf claims pdf. On the server/API path you supply the name via the filename option.
How does the alias table work?
Before comparing, both detected and claimed extensions pass through a map: jpeg→jpg, htm→html, tif→tiff, yml→yaml, mid→midi, mpg/mpe→mpeg, m4v→mp4, qt→mov. This stops legitimate naming variants from being reported as mismatches.
What's the difference between 'unknown' and a threat?
'Unknown' (detected = null) means no signature matched — common for text and encrypted blobs — and is NOT flagged (threatDetected:false). A threat (threatDetected:true) means a type WAS detected but it disagrees with the claimed extension. Don't conflate the two.
How do I handle compressed/nested formats?
ZIP, gzip, bzip2, 7z, and others each have distinct magic bytes and detect correctly. For double-extension attacks nested in an archive, decompress to a temp buffer and re-run detection on the inner file — the validator inspects only the outer container's header.
Can I call this as an API instead of the UI?
Yes — magic-byte-validator is on the server-safe path, so the public API and the local @jadapps/runner can run it. Pass the file bytes (base64 accepted) plus the filename option; the JSON response gives detected, claimedExt, matches, and threatDetected.
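The request body can be assembled as in the sketch below. The body shape (input plus options.filename) follows the cookbook request in this guide; buildRunRequest is a hypothetical helper, and the API host and any authentication are not specified here, so sending the request is left out.

```javascript
// Builds the JSON body for POST /v1/tools/magic-byte-validator/run.
// `headerBytes` is a Uint8Array holding the file header (or whole file).
function buildRunRequest(headerBytes, filename) {
  // btoa works on binary strings, so convert byte by byte first.
  let binary = '';
  for (const b of headerBytes) binary += String.fromCharCode(b);
  return {
    input: btoa(binary),
    options: { filename },
  };
}
```

For example, the four bytes of %PDF with the name invoice.pdf produce an input of "JVBERg==" and options.filename of "invoice.pdf".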
What HTTP status should I return on a mismatch?
415 Unsupported Media Type is the conventional choice for a content/extension or content/declared-type mismatch. Log the detected type, claimed type, file size, and uploader on every mismatch for audit — a mismatch is at least an integrity error and at worst an attack attempt.
Does reading the header risk executing the file?
No. Reading bytes into a buffer and matching signatures never executes code — that's true in the browser and on the server. Execution only happens if YOUR pipeline hands the file to an interpreter or the OS launches it. Validate before any such hand-off.
What's the maximum file size I can validate?
Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. Detection only needs the header, but the buffer read respects your tier cap. In your own server pipeline you can read just a header slice to validate arbitrarily large uploads cheaply.