How to compare magic bytes vs file extension: which detection method wins?
- Step 1: Pick a file to compare both methods on. Choose something whose extension you trust and, optionally, something you've deliberately renamed. The validator will report what the extension claims and what the bytes actually say, so you can watch the two methods agree or diverge.
- Step 2: Drop it into the validator. Single file at a time. The bytes are read into browser memory and passed to file-type's fileTypeFromBuffer; nothing leaves your device on the Free tier.
- Step 3: Read the extension method's answer. claimedExt is the last dot-segment of the filename, which is all an extension-only check ever sees. It's instant and requires no read of the file body, which is exactly why it's spoofable.
- Step 4: Read the magic-byte method's answer. detectedExt/detectedMime come from the header signature. This is content-derived: renaming the file changes claimedExt but never detectedExt. If detection returns null, the format has no magic bytes (text-like), a case where byte detection legitimately can't help.
- Step 5: Compare the two. matches reflects whether the normalised extensions agree. A divergence is where extension-only checks would have been fooled. The alias table prevents jpeg vs jpg-style false conflicts, so a divergence is meaningful.
- Step 6: Apply the right method to the right decision. Use the extension for cosmetic routing (icon, viewer). Use magic bytes, never the extension, for any trust or execution decision. For deeper confirmation, chain to hex-header-inspector or entropy-analyzer.
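The steps above can be sketched as a small comparison function. This is a minimal illustration, not the validator's source: `claimedExtension`, `compareMethods`, and the `Detector` type are hypothetical names, the detector is injected so the sketch carries no dependency, and alias normalisation is deliberately left out.

```typescript
// Result shape, borrowing the field names used in the steps above.
interface Comparison {
  claimedExt: string;          // "" when the filename has no dot
  detectedExt: string | null;  // null when no signature was recognised
  detectedMime: string | null;
  matches: boolean;
}

// Same call shape as file-type's fileTypeFromBuffer (assumption: the
// detector resolves to undefined when it recognises nothing).
type Detector = (
  bytes: Uint8Array
) => Promise<{ ext: string; mime: string } | undefined>;

// Extension method: the last dot-segment of the name, lower-cased.
function claimedExtension(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

// Magic-byte method plus comparison. Only the bytes decide detectedExt.
async function compareMethods(
  filename: string,
  bytes: Uint8Array,
  detect: Detector
): Promise<Comparison> {
  const claimedExt = claimedExtension(filename);
  const detected = await detect(bytes);
  const detectedExt = detected ? detected.ext : null;
  return {
    claimedExt,
    detectedExt,
    detectedMime: detected ? detected.mime : null,
    // Plain string compare; a real check would normalise aliases first.
    matches: detectedExt !== null && claimedExt === detectedExt,
  };
}
```

To try it against the real library, pass `fileTypeFromBuffer` imported from `file-type` as the `detect` argument.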
Extension matching vs magic-byte detection
Head-to-head on the properties that matter for security. Magic-byte detection here means this validator's file-type-based check.
| Property | File-extension matching | Magic-byte detection |
|---|---|---|
| What it inspects | The text after the last . in the name | The binary signature in the file's header |
| Defeated by renaming? | Yes — instantly | No — bytes are unchanged by a rename |
| Cost | Zero (no body read) | Read header into memory, match signature |
| Defeated by forged header? | N/A | Yes — a prepended valid signature fools it |
| Handles signature-less text? | Yes (it's all it has) | No — returns 'unknown' for CSV/TXT/JSON |
| False conflicts on variants? | N/A | No — aliases (jpeg/jpg, tif/tiff) normalised |
| Right job | Display routing, UX hints | Trust / execution / upload-accept decisions |
Formats by signature reliability
Which formats this validator can detect by magic bytes, and which carry none. Detection is offset-0 unless noted.
| Format | Magic bytes (hex) | Detected as | Notes |
|---|---|---|---|
| PNG | 89 50 4E 47 | png | Mandatory, stable signature |
| JPEG | FF D8 FF | jpg | .jpeg accepted via alias |
| PDF | 25 50 44 46 (%PDF) | pdf | Stable header |
| ZIP / OOXML | 50 4B 03 04 (PK) | zip | .docx/.xlsx/.pptx all report zip |
| Windows PE | 4D 5A (MZ) | exe | Renamed executables exposed here |
| ELF | 7F 45 4C 46 | elf | Linux/Unix binaries |
| GIF | 47 49 46 38 | gif | GIF8 header |
| CSV / TXT / JSON | none | 'unknown' | No magic bytes — byte detection can't help |
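As a rough illustration of offset-0 matching, the table above can be turned into a lookup. This sketch is not how file-type works internally; `SIGNATURES` and `detectByMagic` are hypothetical names, and only the formats listed in the table are covered.

```typescript
// Offset-0 signatures copied from the table above.
const SIGNATURES: ReadonlyArray<{ ext: string; magic: number[] }> = [
  { ext: "png", magic: [0x89, 0x50, 0x4e, 0x47] },
  { ext: "jpg", magic: [0xff, 0xd8, 0xff] },
  { ext: "pdf", magic: [0x25, 0x50, 0x44, 0x46] },
  { ext: "zip", magic: [0x50, 0x4b, 0x03, 0x04] },
  { ext: "elf", magic: [0x7f, 0x45, 0x4c, 0x46] },
  { ext: "gif", magic: [0x47, 0x49, 0x46, 0x38] },
  { ext: "exe", magic: [0x4d, 0x5a] },
];

// Returns the detected extension, or null when no listed signature
// matches (the 'unknown' case for CSV/TXT/JSON).
function detectByMagic(bytes: Uint8Array): string | null {
  for (const { ext, magic } of SIGNATURES) {
    const longEnough = bytes.length >= magic.length;
    if (longEnough && magic.every((b, i) => bytes[i] === b)) {
      return ext;
    }
  }
  return null;
}
```

The filename never enters the function, which is the whole point: a rename cannot change the result.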
Cookbook
Side-by-side outcomes showing exactly where the two methods agree and where extension-only checks would have failed.
Renamed executable: extension says safe, bytes say no
An attacker renames tool.exe to image.jpg. An extension filter sees jpg and accepts it. The validator reads MZ and reports exe — the two methods disagree, and the byte method is the correct one.
Filename: image.jpg
Header: 4D 5A 90 00 ...
Extension method -> jpg (ACCEPTS — wrong)
Magic-byte method -> exe (FLAGS — correct)
Validator: { detectedExt:"exe", claimedExt:"jpg",
matches:false, threatDetected:true }
Genuine JPEG: both methods agree
A real photo named sunset.jpg. Extension says jpg; bytes (FF D8 FF) say jpg. Agreement — the easy case where extension matching happens to be right.
Filename: sunset.jpg
Header: FF D8 FF E0 ...
Extension method -> jpg
Magic-byte method -> jpg
Validator: { detectedExt:"jpg", claimedExt:"jpg",
matches:true, threatDetected:false }
Variant extension: byte method avoids a false conflict
logo.tif is a TIFF image that a camera saved with the short .tif extension. A naive check that compared raw strings would flag tif != tiff. The validator's alias table maps both to tiff, so there is no false conflict.
Filename: logo.tif
Header: 49 49 2A 00 ... (little-endian TIFF)
Naive string compare -> tif != tiff (FALSE conflict)
Validator (aliased) -> tiff == tiff (matches:true)
Validator: { detectedExt:"tiff", claimedExt:"tif",
matches:true, threatDetected:false }
CSV: the case extension matching wins
A data.csv. Byte detection returns 'unknown' because CSV has no signature. Here the extension is the only useful signal — and the validator does not flag the unknown as a threat.
Filename: data.csv
Header: 6E 61 6D 65 2C ... ("name,")
Extension method -> csv (useful)
Magic-byte method -> unknown (no signature)
Validator: { detectedExt:null, claimedExt:"csv",
matches:false, threatDetected:false }
Office doc: a mismatch that is not a threat
budget.xlsx. Bytes are PK (ZIP), so the byte method reports zip != xlsx. Extension matching would have 'agreed' with itself. This is the case where a raw byte-vs-extension mismatch is benign by design.
Filename: budget.xlsx
Header: 50 4B 03 04 ... (PK)
Extension method -> xlsx
Magic-byte method -> zip (OOXML container)
Validator: { detectedExt:"zip", claimedExt:"xlsx",
matches:false, threatDetected:true }
# Expected for Office files: verify inner [Content_Types].xml
Edge cases and what actually happens
Extension renamed, header untouched
Detected: The core win of byte detection: renaming payload.exe to anything changes only claimedExt. detectedExt still reads exe from MZ, matches is false, threatDetected is true. Extension-only checks miss this entirely.
Both header and extension forged
Limitation: If an attacker prepends a real %PDF header to malware AND names it .pdf, both methods agree and the validator reports a clean match. Magic-byte detection beats renaming, not deliberate header forgery, so layer it with AV and sandboxing.
Signature-less format
Unknown: CSV, TXT, JSON, HTML have no magic bytes, so byte detection returns null and only the extension carries information. The validator reports 'unknown' and does NOT flag it. This is the one category where extension matching is the more useful signal.
OOXML reports as zip
By design: .docx/.xlsx/.pptx are ZIP containers, so the byte method reports zip and produces a mismatch with the Office extension. This is expected, not an attack. Confirm by inspecting the inner archive.
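One way to do that confirmation without a ZIP library is a byte search for the [Content_Types].xml entry name, which ZIP stores uncompressed in its headers. This is a heuristic sketch under that assumption, not a ZIP parser: `looksLikeOoxml` and its helpers are hypothetical names, and the string could in principle also appear inside stored file data.

```typescript
// Convert an ASCII string to bytes without relying on TextEncoder.
function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i);
  return out;
}

// Index of the first occurrence of needle in haystack, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Heuristic: starts with PK\x03\x04 and mentions [Content_Types].xml.
function looksLikeOoxml(bytes: Uint8Array): boolean {
  const isZip =
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04;
  if (!isZip) return false;
  return indexOfBytes(bytes, asciiBytes("[Content_Types].xml")) !== -1;
}
```

A plain ZIP renamed to .xlsx fails this check, which separates the benign Office mismatch from an arbitrary archive wearing an Office extension.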
Variant extensions
By design: jpeg/jpg, htm/html, tif/tiff, yml/yaml, mpg/mpeg, m4v/mp4, qt/mov are normalised before comparison, so legitimate naming variants never raise a false conflict, a case that naive string-compare implementations get wrong.
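A minimal sketch of alias normalisation, assuming the pairs listed above. The names `ALIASES`, `normaliseExt`, and `extensionsAgree` are hypothetical, and which member of each pair counts as canonical is an arbitrary choice here; only consistency matters for the comparison.

```typescript
// Variant -> canonical form. The pairs come from the list above; the
// direction of each mapping is an assumption of this sketch.
const ALIASES = new Map<string, string>([
  ["jpeg", "jpg"],
  ["htm", "html"],
  ["tif", "tiff"],
  ["yml", "yaml"],
  ["mpg", "mpeg"],
  ["m4v", "mp4"],
  ["qt", "mov"],
]);

// Lower-case, drop a leading dot, then collapse known variants.
function normaliseExt(ext: string): string {
  const lower = ext.trim().replace(/^\./, "").toLowerCase();
  const canonical = ALIASES.get(lower);
  return canonical === undefined ? lower : canonical;
}

// True only when both sides are present and agree after normalisation.
function extensionsAgree(claimed: string, detected: string | null): boolean {
  if (claimed === "" || detected === null) return false;
  return normaliseExt(claimed) === normaliseExt(detected);
}
```

Because both sides pass through the same table, photo.jpeg with jpg bytes and logo.tif with tiff bytes compare equal, while an empty or undetected side never does.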
Polyglot file
Partial: A file valid as two formats is detected by its offset-0 signature only; the byte method reports the first format. Neither method reveals the second payload. Note the anomaly and sandbox the file.
Double extension
Detected: invoice.pdf.exe claims exe (last segment) and detects exe, so the validator agrees on type. The red flag is the double-extension pattern combined with the executable header. A file manager that hides known extensions would have displayed the file as invoice.pdf.
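Since the type check alone reports a match here, the double-extension pattern needs its own test. The sketch below is one hedged way to flag it; `hasDecoyDoubleExtension` and both extension lists are hypothetical and illustrative, not the validator's rules.

```typescript
// Illustrative lists only; a real deployment would tune both.
const EXECUTABLE_EXTS = new Set(["exe", "scr", "com", "bat", "cmd", "msi"]);
const DECOY_EXTS = new Set(["pdf", "doc", "docx", "xls", "xlsx", "jpg", "png", "txt"]);

// True when the name ends in <decoy>.<executable>, e.g. invoice.pdf.exe.
function hasDecoyDoubleExtension(filename: string): boolean {
  const parts = filename.toLowerCase().split(".");
  if (parts.length < 3) return false; // need a base name plus two extensions
  const last = parts[parts.length - 1] ?? "";
  const previous = parts[parts.length - 2] ?? "";
  return EXECUTABLE_EXTS.has(last) && DECOY_EXTS.has(previous);
}
```

Names such as archive.tar.gz or setup.exe pass untouched; only a document-style segment directly followed by an executable one is flagged.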
Empty extension
Expected: A file with no . in its name yields an empty claimed extension; extensionsMatch returns false against any detected type. The byte method still reports the real type accurately; judge by detectedExt.
Frequently asked questions
Why are file extensions unreliable for security?
An extension is just text on the filename. Anyone can rename malware.exe to report.pdf in a second, and a check that only reads the extension accepts it. The file's actual content — its magic bytes — is unchanged by renaming, which is why security decisions must read the bytes, not the name.
Can magic bytes also be faked?
Yes, but it's much harder than renaming. Forging a header means prepending a valid signature to a still-functional payload — deliberate binary editing and format knowledge. Renaming takes a click. Byte detection raises the bar; combine it with antivirus and sandboxing to also defeat forgery.
Which formats have the most reliable magic bytes?
PNG (89 50 4E 47), JPEG (FF D8 FF), GIF (GIF8), PDF (%PDF), ZIP/OOXML (PK\x03\x04), ELF (7F ELF), and Windows PE (MZ) all carry mandatory, stable offset-0 signatures. These are exactly the ones the validator detects with high confidence.
Why does an Office file report as a ZIP, not as docx?
Because OOXML files (.docx, .xlsx, .pptx) ARE ZIP archives — they start with PK\x03\x04. file-type reports the outer container as zip. This is a benign mismatch, expected for every Office document; verify by checking the archive holds [Content_Types].xml.
When is extension matching actually good enough?
For cosmetic decisions where being wrong is harmless: which icon to show, which viewer to suggest, sorting a file list. It's also the only signal for signature-less text formats (CSV/TXT/JSON). Never use it alone to decide whether to trust, store, or execute a file.
Why does this validator trust bytes over the filename?
Because the filename is attacker-controlled and the bytes are not (short of forgery). It reads the header via file-type, computes detectedExt, then compares to the claimed extension. The detected type is the security-relevant answer; the claimed extension is only there to spot the disagreement.
What does 'unknown' tell me in this comparison?
That the byte method couldn't find a signature — the format is text-like (CSV/TXT/JSON) or outside the 155-format set. In that case the extension is the only signal you have. 'Unknown' is not a threat flag and not proof the extension is wrong.
Do equivalent extensions cause false conflicts?
No. The validator normalises real equivalents before comparing: jpeg/jpg, htm/html, tif/tiff, yml/yaml, mid/midi, mpg/mpe/mpeg, m4v/mp4, qt/mov. So photo.jpeg with JPEG bytes is a clean match, not a conflict.
Why do CDNs and servers still trust extensions?
Performance and convenience — reading the body costs more than reading the name, and many systems validate once on upload then trust stored metadata at serve time. The fix is to validate by magic bytes at the trust boundary (upload), not to re-check on every request.
Should I run both checks in my upload pipeline?
Yes. Use the extension for UX hints and the magic-byte result for the accept/reject decision. Reject when the bytes describe an executable regardless of extension; allow the known benign mismatches (OOXML as zip). Dual-layer validation is the practical standard.
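A dual-layer decision along those lines might look like the sketch below. It assumes both extensions have already been alias-normalised; `uploadVerdict`, the type lists, and the "review" outcome for unexplained mismatches are hypothetical choices, not the validator's behaviour.

```typescript
type Verdict = "accept" | "reject" | "review";

// Illustrative lists drawn from the formats discussed in this guide.
const EXECUTABLE_TYPES = new Set(["exe", "elf"]);
const OOXML_EXTS = new Set(["docx", "xlsx", "pptx"]);
const TEXT_EXTS = new Set(["csv", "txt", "json"]);

function uploadVerdict(claimedExt: string, detectedExt: string | null): Verdict {
  // Executable bytes are rejected whatever the filename claims.
  if (detectedExt !== null && EXECUTABLE_TYPES.has(detectedExt)) {
    return "reject";
  }
  // No signature: the extension is the only signal, so accept only the
  // signature-less text formats and hold everything else for review.
  if (detectedExt === null) {
    return TEXT_EXTS.has(claimedExt) ? "accept" : "review";
  }
  // Known benign mismatch: Office documents are ZIP containers.
  if (detectedExt === "zip" && OOXML_EXTS.has(claimedExt)) {
    return "accept";
  }
  // Otherwise the two methods must agree.
  return claimedExt === detectedExt ? "accept" : "review";
}
```

The byte result drives every branch that rejects or accepts; the claimed extension is consulted only to explain a null or a mismatch.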
Does the comparison handle polyglots?
Only to the offset-0 signature. A polyglot or appended-payload file reports its first format; neither extension nor byte detection reveals the second. Treat polyglot anomalies as sandbox candidates rather than relying on either method alone.
How big a file can I run the comparison on?
Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. Only the header is needed for detection, but the buffer is read subject to the tier cap. Most files you'd compare in a triage context are far under the Free limit.