How to detect files that lie about their extension
- Step 1: Stage the suspicious file without opening it. Save the email attachment or quarantined object to a local folder. Do not double-click it: the validator inspects bytes, but your OS would still execute a disguised .exe if you launched it. Keep the original filename intact; the extension on the name is half of what the tool compares.
- Step 2: Drop the file onto the validator. Drag the single file into the drop zone above. The validator is single-file (acceptsMultiple: false), so drop one at a time. The file's bytes are read into browser memory via fileTypeFromBuffer; nothing is sent to a server on the Free tier.
- Step 3: Let it read the magic bytes. The file-type library scans the header for a matching signature. It checks fixed offsets (most signatures live in the first 12 bytes, ZIP/OOXML at offset 0 with PK\x03\x04, some formats deeper) and returns the detected ext and mime, or nothing if no signature matches.
- Step 4: Compare detected against claimed. The tool splits your filename on ., takes the last segment as claimedExt, normalises both sides through the alias table, and sets matches. If matches is false and a type was detected, threatDetected becomes true. If no type was detected, the result is 'unknown', which is explicitly not flagged as a threat.
- Step 5: Read the JSON report. The result card shows detectedExt, detectedMime, claimedExt, and the booleans matches and threatDetected. A green match means the bytes agree with the label; a mismatch tells you the real type the bytes describe (e.g. claimedExt: pdf, detectedExt: exe).
- Step 6: Escalate, don't conclude. A mismatch is a strong triage signal, not malware confirmation. Copy the report into your ticket, then send genuine mismatches (especially exe/elf/zip hiding behind a document extension) to a sandbox. For deeper byte inspection, follow with hex-header-inspector or entropy-analyzer.
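To make Step 3 concrete, here is a minimal sketch of a fixed-offset signature check. It is not the file-type library's implementation: the SIGNATURES table and the detectExt function are hypothetical names invented for illustration, and the table holds only a handful of well-known signatures rather than the full set the validator relies on.

```typescript
// Hypothetical, hand-rolled signature table for illustration only.
// The real validator delegates this work to the file-type library.
interface Signature {
  ext: string;
  offset: number;
  bytes: number[];
}

const SIGNATURES: Signature[] = [
  { ext: "exe", offset: 0, bytes: [0x4d, 0x5a] },             // "MZ"
  { ext: "zip", offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04"
  { ext: "pdf", offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { ext: "jpg", offset: 0, bytes: [0xff, 0xd8, 0xff] },
  // "ustar" sits at offset 257, an example of a signature deeper in the file.
  { ext: "tar", offset: 257, bytes: [0x75, 0x73, 0x74, 0x61, 0x72] },
];

// Returns the first extension whose signature matches, or null if none does.
function detectExt(buf: Uint8Array): string | null {
  for (const sig of SIGNATURES) {
    if (buf.length < sig.offset + sig.bytes.length) continue;
    if (sig.bytes.every((b, i) => buf[sig.offset + i] === b)) return sig.ext;
  }
  return null;
}
```

A null return corresponds to the 'unknown' state described in Step 4: no signature matched, which is not the same as a mismatch.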
What each result state means for triage
The validator returns five fields. The table below maps each combination of detected type, matches, and threatDetected to a triage action. threatDetected is true only when a type was detected AND it disagrees with the claimed extension.
| detected | matches | threatDetected | What it means | Triage action |
|---|---|---|---|---|
| a known type (e.g. jpg) | true | false | Bytes agree with the extension. The file is the type it claims (no behavioural guarantee) | Proceed; type is consistent. Not a malware verdict |
| a known type (e.g. exe) | false | true | The bytes describe a different format than the label, e.g. claimed .pdf, detected exe | Treat as untrusted. Sandbox before any further handling |
| null (no signature matched) | false | false | 'Unknown': plain text, CSV, an encrypted/packed blob, or a format outside the 155-signature set | Not auto-flagged. Inspect bytes manually if the source is suspect |
| zip, file is .docx/.xlsx/.pptx | false | true | OOXML documents are ZIP archives (PK\x03\x04), so file-type reports zip for the outer container | Expected for Office files and not a real threat; verify the inner [Content_Types].xml |
| jpg, file is .jpeg | true | false | Alias-normalised: jpeg→jpg, so these never false-flag | No action; equivalent extensions |
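The table above can be read as a small decision function. The sketch below is one way to encode it, assuming a result object with the field names shown in the report; the TriageAction labels, the OOXML_EXTS set, and the triage function are hypothetical and not part of the validator.

```typescript
// Assumed subset of the report fields; detectedMime is omitted for brevity.
interface ValidationResult {
  detectedExt: string | null;
  claimedExt: string;
  matches: boolean;
  threatDetected: boolean;
}

// Hypothetical action labels mirroring the "Triage action" column.
type TriageAction = "proceed" | "sandbox" | "inspect-manually" | "verify-ooxml";

const OOXML_EXTS = new Set(["docx", "xlsx", "pptx"]);

function triage(r: ValidationResult): TriageAction {
  // No signature matched: 'unknown', never auto-flagged.
  if (r.detectedExt === null) return "inspect-manually";
  // Bytes agree with the label (after alias normalisation).
  if (r.matches) return "proceed";
  // Office documents legitimately report as zip for the outer container.
  if (r.detectedExt === "zip" && OOXML_EXTS.has(r.claimedExt)) return "verify-ooxml";
  // Any other detected-but-different type is treated as untrusted.
  return "sandbox";
}
```

Note that "proceed" only means the type is consistent; as the table says, it is not a malware verdict.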
Extension aliases that are NOT treated as a mismatch
Real-world equivalent extensions the validator normalises before comparing, so legitimate naming variants do not raise a false threat flag. Source: EXT_ALIASES in the processor.
| Detected ext | Accepted claimed ext | Why it's equivalent |
|---|---|---|
| jpg | jpeg | Two spellings of the same JFIF/JPEG format |
| html | htm | 8.3-era short form of the same HTML |
| tiff | tif | Long/short spelling of the same TIFF |
| yaml | yml | Two spellings of the same YAML |
| midi | mid | Long/short spelling of Standard MIDI |
| mpeg | mpg, mpe | Variant extensions for MPEG video |
| mp4 | m4v | m4v is an MP4 container variant |
| mov | qt | QuickTime container, two extensions |
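A sketch of how such an alias table could drive the comparison. EXT_ALIASES and extensionsMatch are names mentioned on this page, but their shape and body here are assumptions; normaliseExt is a hypothetical helper. The mapping direction (variant to canonical) follows the table above.

```typescript
// Assumed shape: variant spelling -> canonical extension.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const EXT_ALIASES = new Map<string, string>([
  ["jpeg", "jpg"],
  ["htm", "html"],
  ["tif", "tiff"],
  ["yml", "yaml"],
  ["mid", "midi"],
  ["mpg", "mpeg"],
  ["mpe", "mpeg"],
  ["m4v", "mp4"],
  ["qt", "mov"],
]);

// Hypothetical helper: lowercase, trim, then map variants to the canonical form.
function normaliseExt(ext: string): string {
  const lower = ext.trim().toLowerCase();
  return EXT_ALIASES.get(lower) ?? lower;
}

// Both sides are normalised; an empty value on either side never matches.
function extensionsMatch(detected: string, claimed: string): boolean {
  const a = normaliseExt(detected);
  const b = normaliseExt(claimed);
  return a !== "" && b !== "" && a === b;
}
```

Normalising both sides means the comparison works regardless of which spelling the detector or the filename happens to use.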
Free-tier limits for file triage
Security-family file size and batch limits per plan (lib/tier-limits.ts). The validator is single-file regardless of plan batch caps.
| Plan | Max file size | Files per run | Notes |
|---|---|---|---|
| Free | 10 MB | 1 | Enough for nearly every attachment-triage case |
| Pro | 100 MB | 5 (batch) | Validator itself stays single-file; larger samples |
| Pro-media | 500 MB | 50 (batch) | For very large containers / disk images |
| Developer | 2 GB | Unlimited | Plus public-API access on the server-safe path |
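The size caps in the table can be sketched as a lookup plus a pre-read check. Everything here is illustrative: the Plan type, MAX_FILE_BYTES, and assertWithinLimit are hypothetical names, the error wording is not the tool's exact message, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```typescript
type Plan = "free" | "pro" | "pro-media" | "developer";

// Assumption: binary megabytes. The real limits file may count differently.
const MB = 1024 * 1024;

const MAX_FILE_BYTES: Record<Plan, number> = {
  free: 10 * MB,
  pro: 100 * MB,
  "pro-media": 500 * MB,
  developer: 2048 * MB, // 2 GB
};

// Throws before any bytes are read if the file is over the plan's cap.
function assertWithinLimit(fileName: string, sizeBytes: number, plan: Plan): void {
  const cap = MAX_FILE_BYTES[plan];
  if (sizeBytes > cap) {
    const sizeMb = (sizeBytes / MB).toFixed(1);
    throw new Error(
      `File "${fileName}" is ${sizeMb} MB, over the ${cap / MB} MB limit for the ${plan} plan.`
    );
  }
}
```

Checking the size first keeps an oversized sample from being pulled into memory at all.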
Cookbook
Real disguise patterns and exactly what the validator reports for each. The JSON shown is the shape returned in findings / the result card.
The classic photo.jpg.exe lure
An attachment named vacation_photo.jpg.exe. With Windows hiding known extensions, the victim sees vacation_photo.jpg. The validator splits on ., takes exe as the claimed extension, and reads MZ from the header — they agree, so matches is true, but the real story is the executable header itself.
Filename: vacation_photo.jpg.exe
Header bytes: 4D 5A 90 00 03 00 ... (MZ — Windows PE)
Validator output:
{
"detectedExt": "exe",
"detectedMime": "application/x-msdownload",
"claimedExt": "exe",
"matches": true,
"threatDetected": false
}
Note: claimedExt is the LAST segment (exe). The 'jpg' is
decoration. The executable header is the red flag — and the
double-extension pattern itself is the giveaway.
A renamed executable pretending to be a PDF
An attacker renames setup.exe to Q3_invoice.pdf so it slips past an extension-only filter. The bytes still start with MZ. The validator detects exe, the label says pdf — mismatch, and threatDetected fires.
Filename: Q3_invoice.pdf
Header bytes: 4D 5A 90 00 ... (MZ)
Validator output:
{
"detectedExt": "exe",
"detectedMime": "application/x-msdownload",
"claimedExt": "pdf",
"matches": false,
"threatDetected": true
}
Action: do not open. Send to sandbox.
A .docx that correctly reports as zip
A legitimate Word document. OOXML files are ZIP archives, so file-type sees PK\x03\x04 and reports zip. This is a benign mismatch — expected for every .docx, .xlsx, .pptx.
Filename: contract.docx
Header bytes: 50 4B 03 04 ... (PK — ZIP/OOXML)
Validator output:
{
"detectedExt": "zip",
"detectedMime": "application/zip",
"claimedExt": "docx",
"matches": false,
"threatDetected": true
}
This is NORMAL for Office files. The outer container is a ZIP.
Verify the inner [Content_Types].xml to confirm it's real OOXML;
do not panic on the zip/docx mismatch alone.
A text file the tool can't fingerprint
A plain .csv export. CSV and most text formats have no magic bytes, so file-type returns nothing. The result is 'unknown' — crucially NOT flagged as a threat, because absence of a signature is not evidence of disguise.
Filename: subscribers.csv
Header bytes: 65 6D 61 69 6C 2C ... ("email," — plain text)
Validator output:
{
"detectedExt": null,
"detectedMime": null,
"claimedExt": "csv",
"matches": false,
"threatDetected": false
}
'Unknown' is expected for text. To inspect the bytes anyway,
use hex-header-inspector.
Equivalent extension: no false alarm
A photo saved as holiday.jpeg. The bytes are JPEG (FF D8 FF), detected as jpg. The alias table maps jpeg→jpg, so the comparison passes and nothing is flagged.
Filename: holiday.jpeg
Header bytes: FF D8 FF E0 ... (JFIF/JPEG)
Validator output:
{
"detectedExt": "jpg",
"detectedMime": "image/jpeg",
"claimedExt": "jpeg",
"matches": true,
"threatDetected": false
}
Alias-normalised: jpeg == jpg. No false threat.
Edge cases and what actually happens
No file dropped
Error: The processor throws "No file provided." when run with no input. The validator needs exactly one file's bytes; there is no demo or paste-text mode for this tool.
File exceeds your plan's size limit
413 over limit: fileToBuffer checks size before reading. A file larger than your cap throws File "x" is N MB — exceeds the 10 MB limit for your plan. (Free). Upgrade or trim the sample. The validator only needs the header, but the buffer is read subject to the tier limit.
Extension genuinely matches a disguised binary
By design: photo.jpg.exe has claimedExt exe (the last segment) and detects as exe, so matches is true and threatDetected is false. The tool is honest: the bytes and the final extension agree. The threat is the double-extension pattern itself plus the executable header, not a content/label disagreement.
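Because the validator does not flag this case, a separate filename check can catch it. The sketch below is a hypothetical helper, not part of the tool: the function name and both extension lists are illustrative assumptions and far from exhaustive.

```typescript
// Illustrative lists only; a real deployment would maintain its own.
const EXECUTABLE_EXTS = new Set(["exe", "scr", "com", "bat", "cmd", "js", "vbs"]);
const DECOY_EXTS = new Set(["jpg", "jpeg", "png", "gif", "pdf", "doc", "docx", "xls", "xlsx", "txt"]);

// True when the final extension is executable and the one before it
// looks like a harmless document or image type (e.g. photo.jpg.exe).
function hasDoubleExtensionLure(fileName: string): boolean {
  const parts = fileName.toLowerCase().split(".");
  if (parts.length < 3) return false;
  const last = parts[parts.length - 1] ?? "";
  const secondLast = parts[parts.length - 2] ?? "";
  return EXECUTABLE_EXTS.has(last) && DECOY_EXTS.has(secondLast);
}
```

Requiring a decoy extension in the second-to-last position avoids flagging ordinary dotted names such as report.final.pdf.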
OOXML document reports as zip
By design: Every .docx/.xlsx/.pptx starts with PK\x03\x04, so file-type reports zip and threatDetected is true. This is expected, not malicious. Confirm by checking the inner archive for [Content_Types].xml. Treat zip-vs-Office mismatches as benign unless other signals stack up.
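One cheap way to approximate the [Content_Types].xml check without a ZIP parser is to search the raw bytes for that entry name, since ZIP stores entry names uncompressed. This is a heuristic sketch under that assumption; looksLikeOoxml is a hypothetical function, and because the string can be planted deliberately, a positive result is a hint rather than proof.

```typescript
const ZIP_LOCAL_HEADER = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04"
const CONTENT_TYPES_NAME = "[Content_Types].xml";

// Heuristic: the buffer starts with a ZIP local-file header and contains the
// OOXML content-types entry name somewhere in its bytes.
function looksLikeOoxml(buf: Uint8Array): boolean {
  if (!ZIP_LOCAL_HEADER.every((b, i) => buf[i] === b)) return false;
  const needle = Array.from(CONTENT_TYPES_NAME, (c) => c.charCodeAt(0));
  outer: for (let i = 0; i + needle.length <= buf.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (buf[i + j] !== needle[j]) continue outer;
    }
    return true;
  }
  return false;
}
```

For a firm answer, open the archive with a real ZIP reader and confirm the entry exists and parses as XML.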
Text / CSV / JSON returns unknown
Expected: Plain-text formats carry no magic bytes, so detection returns null. The result is 'unknown' and is NOT flagged as a threat, because absence of a signature is not disguise. For these, switch to hex-header-inspector to eyeball the bytes.
Polyglot / appended payload
Partial: A polyglot valid as two formats (e.g. a real GIF with a ZIP appended) is detected by its offset-0 signature only, so file-type reports the first format. A trailing payload after a valid header will not change the detected type. Note the anomaly and run entropy-analyzer / a full sandbox.
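As a rough supplement for the GIF-plus-ZIP case, one can scan past offset 0 for a second signature. The sketch below looks only for ZIP local-file headers; findEmbeddedZipOffsets is a hypothetical helper, not something the validator provides, and the four-byte pattern can appear by chance in compressed data, so hits are leads to investigate rather than findings.

```typescript
// Returns every offset greater than 0 where "PK\x03\x04" appears.
// Offset 0 is skipped on purpose: that is the signature the validator already reports.
function findEmbeddedZipOffsets(buf: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 1; i + 4 <= buf.length; i++) {
    if (
      buf[i] === 0x50 &&
      buf[i + 1] === 0x4b &&
      buf[i + 2] === 0x03 &&
      buf[i + 3] === 0x04
    ) {
      offsets.push(i);
    }
  }
  return offsets;
}
```

A non-empty result on a file that detects as an image or document is a reasonable trigger for the entropy and sandbox follow-up described above.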
Forged magic bytes on a malicious file
Limitation: An attacker can prepend a valid header (e.g. %PDF) to a malicious blob so it detects as pdf and matches passes. Magic-byte validation defeats lazy renaming, not deliberate signature forgery. It is one layer; pair it with antivirus and sandbox detonation.
Format outside the 155-signature set
Unknown: file-type 19.6.0 covers 155 extensions / 150 MIME types, which is broad but does not include every proprietary or niche format. An unrecognised-but-legitimate format also returns 'unknown'. Don't read 'unknown' as 'safe' or 'fake'; it just means no signature matched.
Filename with no extension
Expected: If the dropped file has no dot in its name, claimedExt becomes the whole name (or empty). extensionsMatch returns false when either side is empty, so a detected type will read as a mismatch. Detected type is still reported accurately, so judge by detectedExt.
Frequently asked questions
How many file formats can the validator actually detect?
It uses the file-type library version 19.6.0, which ships 155 file extensions and 150 MIME types — images (jpg, png, gif, webp, tiff, plus many camera RAW formats), archives (zip, rar, 7z, gz, tar), executables (exe/PE, elf), audio/video (mp3, mp4, mkv, webm, mov), fonts, and document containers (PDF, OOXML). Plain text, CSV, JSON, HTML and similar have no magic bytes and return 'unknown'.
Is dropping a suspected-malware file into this tool safe?
Yes. On the Free tier it runs 100% in your browser: the bytes are read into memory and inspected by file-type. The file is never executed and never uploaded. The risk in malware triage is launching the file with your OS — the validator never does that. Save it without opening, then drop it in.
Does a mismatch mean the file is malware?
No. A mismatch (threatDetected: true) means the bytes describe a different format than the extension claims — a strong triage signal, not a verdict. Some mismatches are benign (every .docx reports as zip). Confirm genuine document-extension-hiding-an-executable cases in a sandbox before acting.
Why does my .docx show as a ZIP?
Because it IS one. Modern Office files (.docx, .xlsx, .pptx) are ZIP archives that begin with PK\x03\x04. file-type reports the outer container as zip. This is normal — verify it's genuine OOXML by checking the archive contains [Content_Types].xml.
What does an 'unknown' result mean?
The header matched no signature in the 155-format set. Causes: plain text/CSV/JSON (no magic bytes), an encrypted or packed blob (random-looking bytes), corruption, or a format outside the library. 'Unknown' is explicitly NOT flagged as a threat — it just means no signature matched.
Can the validator be fooled by faked magic bytes?
Yes. Prepending a valid header (e.g. %PDF or PK\x03\x04) makes a malicious file detect as that type. Magic-byte validation defeats lazy renaming, not deliberate forgery. Use it as one layer alongside antivirus, entropy analysis, and sandbox detonation.
Are jpg/jpeg and similar variants flagged as mismatches?
No. The validator normalises real equivalents before comparing: jpeg→jpg, htm→html, tif→tiff, yml→yaml, mid→midi, mpg/mpe→mpeg, m4v→mp4, qt→mov. A holiday.jpeg with JPEG bytes passes cleanly.
Can I check several files at once?
The validator is single-file (acceptsMultiple: false) — drop one file per run and read its report. For a deeper byte-level look at the same file, follow up with hex-header-inspector or entropy-analyzer.
What's the largest file I can validate?
Free tier: 10 MB. Pro: 100 MB. Pro-media: 500 MB. Developer: 2 GB. The validator only needs the header, but the buffer is read subject to your plan's limit, so very large files are capped. Almost all attachment triage fits well under 10 MB.
Does it detect polyglot files?
Only partially. It detects by the offset-0 signature, so a polyglot or a file with a payload appended after a valid header reports the FIRST format. The disguise won't change the detected type. Note the anomaly and send it to a full sandbox.
How does it decide which extension I 'claimed'?
It splits your filename on . and takes the last segment, lowercased, as the claimed extension. So report.final.pdf claims pdf; setup.exe claims exe. Both the claimed and detected extensions are normalised through the alias table before comparison.
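The rule described above fits in a few lines. This is a sketch of the stated behaviour, not the tool's source; claimedExtension is a hypothetical name.

```typescript
// Split on "." and take the last segment, lowercased.
// A name with no dot yields the whole name; a trailing dot yields "".
function claimedExtension(fileName: string): string {
  const segments = fileName.split(".");
  return (segments[segments.length - 1] ?? "").toLowerCase();
}
```

This is also why vacation_photo.jpg.exe claims exe rather than jpg: only the final segment counts.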
Is there an API or automation path?
Yes — magic-byte-validator is on the server-safe path, so the public API and the local @jadapps/runner can run it (Developer tier for API). The server build needs the filename passed explicitly (it has no File object), via the filename option, to compute the claimed extension; the JSON result returns detected, claimedExt, matches, and threatDetected.
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.