Magic Bytes vs File Extension Detection — Which Is Reliable?

How to compare magic-byte and file-extension detection

Step 1
Pick a file to compare both methods on — Choose something whose extension you trust and, optionally, something you've deliberately renamed. The validator will report what the extension claims and what the bytes actually say, so you can watch the two methods agree or diverge.
Step 2
Drop it into the validator — Single file at a time. The bytes are read into browser memory and passed to file-type's fileTypeFromBuffer; nothing leaves your device on the Free tier.
Step 3
Read the extension method's answer — claimedExt is the last dot-segment of the filename — that's all an extension-only check ever sees. It's instant and requires no read of the file body, which is exactly why it's spoofable.
Step 4
Read the magic-byte method's answer — detectedExt / detectedMime come from the header signature. This is content-derived: renaming the file changes claimedExt but never detectedExt. If detection returns null, the format has no magic bytes (text-like) — a case where byte detection legitimately can't help.
Step 5
Compare the two — matches reflects whether the normalised extensions agree. A divergence is where extension-only checks would have been fooled. The alias table prevents jpeg vs jpg-style false conflicts so a divergence is meaningful.
Step 6
Apply the right method to the right decision — Use the extension for cosmetic routing (icon, viewer). Use magic bytes — never the extension — for any trust or execution decision. For deeper confirmation chain to hex-header-inspector or entropy-analyzer.
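Step 3 can be sketched in a few lines. The function name claimedExtension and the lowercasing are assumptions made for illustration; only the "last dot-segment" rule comes from the steps above, and this is not the validator's actual code.

```typescript
// Hypothetical helper: returns what an extension-only check sees.
// It reads the filename only and never touches the file body.
function claimedExtension(filename: string): string {
  const dot = filename.lastIndexOf(".");
  // No dot, or a trailing dot, yields an empty claimed extension.
  if (dot === -1 || dot === filename.length - 1) {
    return "";
  }
  // Lowercasing is an assumption so that "JPG" and "jpg" compare equal.
  return filename.slice(dot + 1).toLowerCase();
}
```

Because the function never reads content, renaming tool.exe to image.jpg changes its answer from exe to jpg, which is the spoofing path described in Step 3.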

Extension matching vs magic-byte detection

Head-to-head on the properties that matter for security. Magic-byte detection here means this validator's file-type-based check.

Property	File-extension matching	Magic-byte detection
What it inspects	The text after the last `.` in the name	The binary signature in the file's header
Defeated by renaming?	Yes — instantly	No — bytes are unchanged by a rename
Cost	Zero (no body read)	Read header into memory, match signature
Defeated by forged header?	N/A	Yes — a prepended valid signature fools it
Handles signature-less text?	Yes (it's all it has)	No — returns 'unknown' for CSV/TXT/JSON
False conflicts on variants?	N/A	No: aliases (jpeg/jpg, tif/tiff) are normalised
Right job	Display routing, UX hints	Trust / execution / upload-accept decisions

Formats by signature reliability

Which formats this validator can detect by magic bytes, and which carry none. Detection is offset-0 unless noted.

Format	Magic bytes (hex)	Detected as	Notes
PNG	`89 50 4E 47`	`png`	Mandatory, stable signature
JPEG	`FF D8 FF`	`jpg`	`.jpeg` accepted via alias
PDF	`25 50 44 46` (`%PDF`)	`pdf`	Stable header
ZIP / OOXML	`50 4B 03 04` (`PK`)	`zip`	.docx/.xlsx/.pptx all report `zip`
Windows PE	`4D 5A` (`MZ`)	`exe`	Renamed executables exposed here
ELF	`7F 45 4C 46`	`elf`	Linux/Unix binaries
GIF	`47 49 46 38`	`gif`	`GIF8` header
CSV / TXT / JSON	none	'unknown'	No magic bytes — byte detection can't help
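The table above can be read as a lookup: compare the start of the file against each signature and report the first hit. Below is a minimal sketch of that idea, assuming a hand-written table limited to the seven signatures listed here. The names SIGNATURES and detectBySignature are hypothetical; the validator itself relies on file-type, which covers far more formats and handles cases this sketch ignores.

```typescript
interface Signature {
  ext: string;
  bytes: number[];
}

// Signatures copied from the table above; all are matched at offset 0.
const SIGNATURES: Signature[] = [
  { ext: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { ext: "jpg", bytes: [0xff, 0xd8, 0xff] },
  { ext: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] },
  { ext: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] },
  { ext: "exe", bytes: [0x4d, 0x5a] },
  { ext: "elf", bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { ext: "gif", bytes: [0x47, 0x49, 0x46, 0x38] },
];

// Hypothetical stand-in for a real detector: returns the extension of the
// first signature whose bytes all match the start of the header, or null
// when nothing matches (for example, plain text such as CSV).
function detectBySignature(header: number[]): string | null {
  for (const sig of SIGNATURES) {
    // A header shorter than the signature fails because header[i] is undefined.
    if (sig.bytes.every((b, i) => header[i] === b)) {
      return sig.ext;
    }
  }
  return null;
}
```

Passing the first bytes of a renamed executable (4D 5A 90 00) returns exe whatever the filename says, while the bytes of "name," return null.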

Cookbook

Side-by-side outcomes showing exactly where the two methods agree and where extension-only checks would have failed.

Renamed executable: extension says safe, bytes say no

An attacker renames tool.exe to image.jpg. An extension filter sees jpg and accepts it. The validator reads MZ and reports exe — the two methods disagree, and the byte method is the correct one.

Filename: image.jpg
Header: 4D 5A 90 00 ...

Extension method  -> jpg   (ACCEPTS — wrong)
Magic-byte method -> exe   (FLAGS  — correct)

Validator: { detectedExt:"exe", claimedExt:"jpg",
             matches:false, threatDetected:true }

Genuine JPEG: both methods agree

A real photo named sunset.jpg. Extension says jpg; bytes (FF D8 FF) say jpg. Agreement — the easy case where extension matching happens to be right.

Filename: sunset.jpg
Header: FF D8 FF E0 ...

Extension method  -> jpg
Magic-byte method -> jpg

Validator: { detectedExt:"jpg", claimedExt:"jpg",
             matches:true, threatDetected:false }

Variant extension: byte method avoids a false conflict

A TIFF image that a camera saved as logo.tif. A naive check that compares raw strings would flag tif != tiff. The validator's alias table maps both to tiff, so there is no false conflict.

Filename: logo.tif
Header: 49 49 2A 00 ...   (little-endian TIFF)

Naive string compare -> tif != tiff  (FALSE conflict)
Validator (aliased)  -> tiff == tiff (matches:true)

Validator: { detectedExt:"tiff", claimedExt:"tif",
             matches:true, threatDetected:false }
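The alias step can be sketched as a small lookup applied to both sides before comparing. The pairs below are the ones this guide names; which side of each pair is canonical is an assumption, and normaliseExt is a hypothetical helper. The name extensionsMatch appears in this guide, but the body here is only an illustration of the described behaviour, including the rule that an empty claimed extension never matches.

```typescript
// [alias, canonical] pairs. The canonical side is an assumption.
const ALIAS_PAIRS: Array<[string, string]> = [
  ["jpeg", "jpg"],
  ["htm", "html"],
  ["tif", "tiff"],
  ["yml", "yaml"],
  ["mid", "midi"],
  ["mpg", "mpeg"],
  ["mpe", "mpeg"],
  ["m4v", "mp4"],
  ["qt", "mov"],
];

// Hypothetical helper: lowercases, then maps a known alias to its canonical form.
function normaliseExt(ext: string): string {
  const lower = ext.toLowerCase();
  for (const pair of ALIAS_PAIRS) {
    if (pair[0] === lower) {
      return pair[1];
    }
  }
  return lower;
}

// Illustrative comparison: an empty claim or a null detection never matches.
function extensionsMatch(claimed: string, detected: string | null): boolean {
  if (claimed === "" || detected === null) {
    return false;
  }
  return normaliseExt(claimed) === normaliseExt(detected);
}
```

With this in place, tif against tiff and jpeg against jpg both compare equal, while jpg against exe still diverges.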

CSV: the case extension matching wins

A plain-text data.csv. Byte detection returns 'unknown' because CSV has no signature. Here the extension is the only useful signal, and the validator does not flag the unknown result as a threat.

Filename: data.csv
Header: 6E 61 6D 65 2C ...   ("name,")

Extension method  -> csv   (useful)
Magic-byte method -> unknown (no signature)

Validator: { detectedExt:null, claimedExt:"csv",
             matches:false, threatDetected:false }

Office doc: a mismatch that is not a threat

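A genuine budget.xlsx. The bytes are PK (ZIP), so the byte method reports zip, which does not equal xlsx. An extension-only check has nothing to compare against and simply accepts xlsx. The raw comparison still yields matches:false and threatDetected:true, but for Office files this mismatch is benign by design.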

Filename: budget.xlsx
Header: 50 4B 03 04 ...   (PK)

Extension method  -> xlsx
Magic-byte method -> zip   (OOXML container)

Validator: { detectedExt:"zip", claimedExt:"xlsx",
             matches:false, threatDetected:true }
# Expected for Office files — verify inner [Content_Types].xml
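One rough way to follow up on that note is to check that the archive mentions [Content_Types].xml. ZIP stores entry names uncompressed, so a byte search can find them. The sketch below is a naive heuristic under that assumption, not a ZIP parser; asciiBytes and looksLikeOoxml are hypothetical names, and a crafted file could contain the same string without being a valid Office document.

```typescript
// Convert an ASCII string to byte values without relying on TextEncoder.
function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) {
    out.push(text.charCodeAt(i));
  }
  return out;
}

// Naive heuristic: requires the PK\x03\x04 header at offset 0, then searches
// the raw bytes for the entry name "[Content_Types].xml".
function looksLikeOoxml(bytes: number[]): boolean {
  const pk = [0x50, 0x4b, 0x03, 0x04];
  if (!pk.every((b, i) => bytes[i] === b)) {
    return false;
  }
  const needle = asciiBytes("[Content_Types].xml");
  for (let start = 0; start + needle.length <= bytes.length; start++) {
    if (needle.every((b, i) => bytes[start + i] === b)) {
      return true;
    }
  }
  return false;
}
```

A positive result only supports the "expected Office mismatch" reading; parsing the archive properly is the stronger confirmation.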

Edge cases and what actually happens

Extension renamed, header untouched

Detected

The core win of byte detection: renaming payload.exe to anything changes only claimedExt. detectedExt still reads exe from MZ, matches is false, threatDetected is true. Extension-only checks miss this entirely.
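The relationship between these fields can be written down as a rule inferred from the cookbook outputs in this guide: a mismatch is flagged only when the bytes positively identified a type. The sketch below encodes that inference. The Report shape mirrors the printed output, but buildReport is a hypothetical name and the rule is a reading of the examples, not the validator's source.

```typescript
interface Report {
  detectedExt: string | null;
  claimedExt: string;
  matches: boolean;
  threatDetected: boolean;
}

// matches is supplied by the caller (after alias normalisation) so this
// sketch stays focused on how the threat flag relates to the other fields.
function buildReport(
  claimedExt: string,
  detectedExt: string | null,
  matches: boolean
): Report {
  return {
    detectedExt: detectedExt,
    claimedExt: claimedExt,
    matches: matches,
    // Inferred rule: an unknown (null) detection is never flagged;
    // a known type that disagrees with the claim is.
    threatDetected: detectedExt !== null && !matches,
  };
}
```

Under this rule the renamed executable and the Office document are both flagged, while the CSV with a null detection is not, which agrees with the outputs shown in the cookbook.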

Both header and extension forged

Limitation

If an attacker prepends a real %PDF header to malware AND names it .pdf, both methods agree and the validator reports a clean match. Magic-byte detection beats renaming, not deliberate header forgery — layer it with AV and sandboxing.

Signature-less format

Unknown

CSV, TXT, JSON, HTML have no magic bytes, so byte detection returns null and only the extension carries information. The validator reports 'unknown' and does NOT flag it — the one category where extension matching is the more useful signal.

OOXML reports as zip

By design

.docx/.xlsx/.pptx are ZIP containers, so the byte method reports zip and produces a mismatch with the Office extension. This is expected, not an attack. Confirm by inspecting the inner archive.

Variant extensions

By design

jpeg/jpg, htm/html, tif/tiff, yml/yaml, mid/midi, mpg/mpe/mpeg, m4v/mp4 and qt/mov are normalised before comparison, so legitimate naming variants never raise a false conflict, a problem that naive string-compare implementations get wrong.

Polyglot file

Partial

A file valid as two formats is detected by its offset-0 signature only; the byte method reports the first format. Neither method reveals the second payload — note the anomaly and sandbox it.

Double extension

Detected

invoice.pdf.exe claims exe (last segment) and detects exe — the validator agrees on type, but the double-extension pattern plus the executable header is the red flag. Extension-only display routing would have shown a PDF icon.
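Since the type check alone agrees here, the filename pattern needs its own test. The sketch below is one hypothetical way to flag it: the last segment is an executable extension and the segment before it looks like a document or image extension. Both lists are short, assumed examples for illustration, not the validator's rules.

```typescript
// Assumed, non-exhaustive lists for illustration only.
const EXECUTABLE_EXTS: string[] = ["exe", "scr", "bat", "cmd", "com"];
const DECOY_EXTS: string[] = ["pdf", "doc", "docx", "xls", "xlsx", "jpg", "png", "txt"];

// Hypothetical check: true when a harmless-looking extension sits directly
// before an executable one, as in invoice.pdf.exe.
function hasDeceptiveDoubleExtension(filename: string): boolean {
  const parts = filename.toLowerCase().split(".");
  // Need at least a base name plus two extension segments.
  if (parts.length < 3) {
    return false;
  }
  const last = parts[parts.length - 1] || "";
  const previous = parts[parts.length - 2] || "";
  return EXECUTABLE_EXTS.indexOf(last) !== -1 && DECOY_EXTS.indexOf(previous) !== -1;
}
```

The check is deliberately narrow: archive.tar.gz and setup.exe both pass untouched, because only the decoy-then-executable pattern is the red flag described above.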

Empty extension

Expected

A file with no . in its name yields an empty claimed extension; extensionsMatch returns false against any detected type. The byte method still reports the real type accurately — judge by detectedExt.

Frequently asked questions

Why are file extensions unreliable for security?

An extension is just text on the filename. Anyone can rename malware.exe to report.pdf in a second, and a check that only reads the extension accepts it. The file's actual content — its magic bytes — is unchanged by renaming, which is why security decisions must read the bytes, not the name.

Can magic bytes also be faked?

Yes, but it's much harder than renaming. Forging a header means prepending a valid signature to a still-functional payload — deliberate binary editing and format knowledge. Renaming takes a click. Byte detection raises the bar; combine it with antivirus and sandboxing to also defeat forgery.

Which formats have the most reliable magic bytes?

PNG (89 50 4E 47), JPEG (FF D8 FF), GIF (GIF8), PDF (%PDF), ZIP/OOXML (PK\x03\x04), ELF (\x7FELF), and Windows PE (MZ) all carry mandatory, stable offset-0 signatures. These are exactly the ones the validator detects with high confidence.

Why does an Office file report as a ZIP, not as docx?

Because OOXML files (.docx, .xlsx, .pptx) ARE ZIP archives — they start with PK\x03\x04. file-type reports the outer container as zip. This is a benign mismatch, expected for every Office document; verify by checking the archive holds [Content_Types].xml.

When is extension matching actually good enough?

For cosmetic decisions where being wrong is harmless: which icon to show, which viewer to suggest, sorting a file list. It's also the only signal for signature-less text formats (CSV/TXT/JSON). Never use it alone to decide whether to trust, store, or execute a file.

Why does this validator trust bytes over the filename?

Because the filename is attacker-controlled and the bytes are not (short of forgery). It reads the header via file-type, computes detectedExt, then compares to the claimed extension. The detected type is the security-relevant answer; the claimed extension is only there to spot the disagreement.

What does 'unknown' tell me in this comparison?

That the byte method couldn't find a signature — the format is text-like (CSV/TXT/JSON) or outside the 155-format set. In that case the extension is the only signal you have. 'Unknown' is not a threat flag and not proof the extension is wrong.

Do equivalent extensions cause false conflicts?

No. The validator normalises real equivalents before comparing: jpeg/jpg, htm/html, tif/tiff, yml/yaml, mid/midi, mpg/mpe/mpeg, m4v/mp4, qt/mov. So photo.jpeg with JPEG bytes is a clean match, not a conflict.

Why do CDNs and servers still trust extensions?

Performance and convenience — reading the body costs more than reading the name, and many systems validate once on upload then trust stored metadata at serve time. The fix is to validate by magic bytes at the trust boundary (upload), not to re-check on every request.

Should I run both checks in my upload pipeline?

Yes. Use the extension for UX hints and the magic-byte result for the accept/reject decision. Reject when the bytes describe an executable regardless of extension; allow the known benign mismatches (OOXML as zip). Dual-layer validation is the practical standard.
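A minimal sketch of that accept/reject logic follows. The reject rule for executables and the OOXML allowance come from the answer above; the third outcome, "review", and the decision to send unknown or otherwise mismatched files there are assumptions added for illustration. decideUpload is a hypothetical name, and both inputs are assumed to be lowercased and alias-normalised already.

```typescript
type Decision = "accept" | "reject" | "review";

const EXECUTABLE_TYPES: string[] = ["exe", "elf"];
const OOXML_EXTS: string[] = ["docx", "xlsx", "pptx"];

function decideUpload(claimedExt: string, detectedExt: string | null): Decision {
  // Bytes describe an executable: reject regardless of the extension.
  if (detectedExt !== null && EXECUTABLE_TYPES.indexOf(detectedExt) !== -1) {
    return "reject";
  }
  // Known benign mismatch: Office documents are ZIP containers.
  if (detectedExt === "zip" && OOXML_EXTS.indexOf(claimedExt) !== -1) {
    return "accept";
  }
  // Bytes and name agree on a non-executable type.
  if (detectedExt !== null && detectedExt === claimedExt) {
    return "accept";
  }
  // Assumption: unknown types and all other mismatches go to manual review.
  return "review";
}
```

The executable check comes first on purpose, so a file whose bytes and name both say exe is still rejected rather than accepted as a clean match.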

Does the comparison handle polyglots?

Only to the offset-0 signature. A polyglot or appended-payload file reports its first format; neither extension nor byte detection reveals the second. Treat polyglot anomalies as sandbox candidates rather than relying on either method alone.

How big a file can I run the comparison on?

Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. Only the header is needed for detection, but the buffer is read subject to the tier cap. Most files you'd compare in a triage context are far under the Free limit.

Privacy first

Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.
