How to use the Shannon entropy analyzer for files
- Step 1: Drop the suspicious file. The analyzer accepts any file type: EXE, DLL, document, archive, firmware blob. The drop area accepts multiple files, but the entropy case analyzes only the first file per run, so drop one sample at a time for a clean chart. The file is read into a single in-browser buffer; nothing is uploaded.
- Step 2: Let the 256-byte chunker run. The file is sliced into consecutive 256-byte windows (the last window may be shorter). For each window the tool builds a 256-bucket byte-frequency table and applies $H = -\sum_{b=0}^{255} p(b)\,\log_2 p(b)$, where p(b) is the fraction of bytes in the window equal to b; byte values that never occur contribute nothing. Each result is rounded to 3 decimal places and pushed into the chunks array.
- Step 3: Read the cyan entropy curve. The X-axis is the chunk index (its labels are hidden because there are too many chunks to label); the Y-axis is fixed at [0, 8] bits/byte. The amber dashed line marks 7.5, the high-entropy boundary. Hover over any point to see its exact entropy value in the tooltip.
- Step 4: Check the high-entropy readout. Above the chart, the readout {N} high-entropy chunks ({P}%) counts windows at or above 7.5. The percentage text turns amber once it passes 50%; that is the file-level threatDetected signal. The footer shows the single highest chunk value as Max: X.XX bits/byte.
- Step 5: Locate sustained plateaus, not single spikes. A lone chunk near 7.9 inside text is usually a small embedded binary object such as a thumbnail. (Base64 text is not a candidate: its 64-symbol alphabet caps it at about 6 bits/byte, well below the line.) A flat plateau sitting on or above the amber line across a whole region is the packing/encryption signature. Note where the plateau starts and ends along the curve.
- Step 6: Confirm with header and hash tools. Entropy is a triage signal, not a verdict. Cross-check the header with magic-byte-validator and hex-header-inspector, and fingerprint the sample with multi-hash-fingerprinter before escalating to a sandbox.
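Step 2 is plain byte arithmetic. Here is a minimal TypeScript sketch of it, assuming only what the step describes (256-byte windows, a 256-bucket frequency table, rounding to 3 decimals). The names shannonEntropy and chunkEntropies are illustrative, not the analyzer's source.

```typescript
// Minimal sketch of the per-window math from Step 2 (illustrative, not the tool's source).
const CHUNK_SIZE = 256;

// Shannon entropy of one window, in bits per byte (0 to 8).
function shannonEntropy(data: Uint8Array): number {
  if (data.length === 0) return 0;
  const counts = new Uint32Array(256); // 256-bucket byte-frequency table
  for (let i = 0; i < data.length; i++) counts[data[i]]++;
  let h = 0;
  for (let b = 0; b < 256; b++) {
    if (counts[b] === 0) continue; // byte values that never occur contribute nothing
    const p = counts[b] / data.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Slice the buffer into consecutive 256-byte windows and score each one.
function chunkEntropies(bytes: Uint8Array): number[] {
  const chunks: number[] = [];
  for (let off = 0; off < bytes.length; off += CHUNK_SIZE) {
    // subarray stops at the end of the buffer, so the last window may be shorter.
    const win = bytes.subarray(off, off + CHUNK_SIZE);
    chunks.push(Math.round(shannonEntropy(win) * 1000) / 1000); // 3 decimals
  }
  return chunks;
}

// 600 zero bytes -> windows of 256, 256 and 88 bytes, each scoring 0.
console.log(chunkEntropies(new Uint8Array(600))); // [ 0, 0, 0 ]
```

A window containing each of the 256 byte values exactly once scores 8.0; a window split evenly between two values scores 1.0.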
Entropy bands by content type
Approximate Shannon entropy ranges (bits/byte) for common content. The analyzer's amber line sits at 7.5; the engine counts any chunk >=7.5 as high-entropy. Compression and encryption are NOT distinguishable by entropy alone.
| Content | Typical entropy | On the chart | Triage read |
|---|---|---|---|
| Plain ASCII / source code | 4.0 - 5.5 | Well below the amber line | Benign-looking; a high spike here is the interesting part |
| Unpacked x86/x64 code section | 5.5 - 6.5 | Below the line, gently varying | Normal executable code — readable in a disassembler |
| Structured data / string tables | 2.0 - 4.5 | Low valleys | Headers, padding, resource tables — expected low regions |
| DEFLATE / zlib / gzip | 7.5 - 7.9 | On or just above the amber line | High but legit if the file is a known archive/asset |
| AES / strong encryption | 7.95 - 8.0 | Flat plateau hugging the top | Indistinguishable from compression by entropy; check context |
| UPX / runtime-packed binary | 7.7 - 8.0 across most of the file | Uniform plateau over >50% of chunks | Triggers the amber threatDetected flag — escalate |
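The bands above are the figures usually quoted for whole files or large samples. A window of n bytes can hold at most n distinct values, so a 256-byte window reaches 8.0 only if every byte value appears exactly once. If the engine scores each window with the plain frequency formula from Step 2 (an assumption about its internals), random-looking data should measure somewhat lower per chunk than per file. This sketch checks that on seeded pseudo-random bytes; mulberry32 is a small, widely used PRNG included only to make the run repeatable, not something the analyzer uses.

```typescript
// Sketch: whole-buffer entropy vs. average 256-byte-window entropy for the same
// pseudo-random bytes. mulberry32 is only here to make the run reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Plain frequency-table Shannon entropy, bits per byte.
function entropy(data: Uint8Array): number {
  const counts = new Uint32Array(256);
  for (let i = 0; i < data.length; i++) counts[data[i]]++;
  let h = 0;
  for (let b = 0; b < 256; b++) {
    if (counts[b] === 0) continue;
    const p = counts[b] / data.length;
    h -= p * Math.log2(p);
  }
  return h;
}

const rand = mulberry32(42);
const sample = new Uint8Array(256 * 1024); // 256 KiB = 1024 windows
for (let i = 0; i < sample.length; i++) sample[i] = Math.floor(rand() * 256);

let windowSum = 0;
for (let off = 0; off < sample.length; off += 256) {
  windowSum += entropy(sample.subarray(off, off + 256));
}
const wholeFile = entropy(sample);
const windowMean = windowSum / (sample.length / 256);

// Whole buffer lands very close to 8; the 256-byte windows average roughly 7.2.
console.log(wholeFile.toFixed(3), windowMean.toFixed(2));
```

With 256 samples spread over 256 possible values, repeats are unavoidable, which pulls each window below the whole-file figure. That is one more reason to read the curve's shape and relative levels rather than the third decimal.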
What the analyzer returns
The exact fields the entropy case produces. There are no options to configure — the tool exposes an empty option schema and a fixed 256-byte chunk size.
| Field / control | Value | Notes |
|---|---|---|
| Chunk size | 256 bytes (fixed) | Not configurable; the last chunk may be shorter than 256 |
| chunks | Array of per-chunk entropy floats | Each value rounded to 3 decimals, range 0.000 - 8.000 |
| highEntropyChunks | Count of chunks with entropy >= 7.5 | Drives the readout count and percentage |
| total | Number of chunks | Equals ceil(fileBytes / 256) |
| threatDetected | highEntropyChunks > total * 0.5 | File-level boolean; turns the readout percentage amber |
| Options | None | Empty option schema — no presets, no threshold slider, no window-size control |
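The three derived fields take only a few lines. This sketch follows the table (the >= 7.5 count and the > total * 0.5 rule); the EntropyFindings interface and the summarize function are hypothetical names, not the tool's source.

```typescript
// Sketch of how the findings fields relate (EntropyFindings and summarize are hypothetical).
interface EntropyFindings {
  chunks: number[];
  highEntropyChunks: number;
  total: number;
  threatDetected: boolean;
}

const HIGH_ENTROPY_THRESHOLD = 7.5; // the amber line; ">= 7.5" counts as high

function summarize(chunks: number[]): EntropyFindings {
  const highEntropyChunks = chunks.filter((h) => h >= HIGH_ENTROPY_THRESHOLD).length;
  const total = chunks.length;
  return {
    chunks,
    highEntropyChunks,
    total,
    // Strictly more than half: a file at exactly 50% is not flagged.
    threatDetected: highEntropyChunks > total * 0.5,
  };
}

console.log(summarize([7.6, 7.5, 4.2]).threatDetected); // true  (2 > 1.5)
console.log(summarize([7.9, 4.2]).threatDetected); // false (1 > 1 is false)
```

Note the asymmetry: a chunk at exactly 7.5 counts as high-entropy, but a file at exactly 50% does not trip threatDetected.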
Tier and size limits (security family)
File-size and batch ceilings for the Security family. Entropy-analyzer is a free tool; you only need a higher tier for larger samples.
| Tier | Max file size | Files per run | Practical use |
|---|---|---|---|
| Free | 10 MB | 1 | Droppers, loaders, document payloads |
| Pro | 100 MB | 5 | Full installers, mid-size firmware |
| Pro-media | 500 MB | 50 | Large disk images, media containers |
| Developer | 2 GB | Unlimited | Memory dumps, full firmware images |
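If you script around these ceilings, a pre-flight check can save a rejected run. This is a hypothetical sketch: the tier keys, the assumption that "MB" means 2^20 bytes (so 2 GB is taken as 2048 MB), and the error wording are illustrative, not the reader's actual code.

```typescript
// Hypothetical pre-flight size check using the Security-family ceilings above.
type Tier = "free" | "pro" | "pro-media" | "developer";

const BYTES_PER_MB = 1024 * 1024; // assumption: "MB" means 2^20 bytes
const TIER_LIMIT_MB: Record<Tier, number> = {
  free: 10,
  pro: 100,
  "pro-media": 500,
  developer: 2048, // 2 GB, under the same 2^20 assumption
};

function checkSize(name: string, sizeBytes: number, tier: Tier): void {
  const limitMb = TIER_LIMIT_MB[tier];
  if (sizeBytes > limitMb * BYTES_PER_MB) {
    const sizeMb = (sizeBytes / BYTES_PER_MB).toFixed(1);
    // Illustrative wording, not the reader's exact message.
    throw new Error(`File "${name}" is ${sizeMb} MB, over the ${limitMb} MB limit for the ${tier} tier`);
  }
}

checkSize("dropper.exe", 480000, "free"); // passes: well under 10 MB
```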
Cookbook
Real triage patterns. Entropy values shown are representative; the analyzer rounds to 3 decimals and counts chunks at or above 7.5.
UPX-packed dropper trips the amber flag
A 480 KB sample that AV rated clean. The entropy curve is a flat plateau above 7.5 across almost the whole file — the classic runtime-packer signature. Because more than 50% of chunks are high-entropy, the readout percentage turns amber and threatDetected is true.
Chart readout:
Shannon Entropy · 1875 chunks · 256 bytes each
1796 high-entropy chunks (96%) <- amber
Max: 7.99 bits/byte
JSON findings:
{ "total": 1875, "highEntropyChunks": 1796, "threatDetected": true }
Read: 96% > 50% -> packed/encrypted body. Confirm UPX with
magic-byte-validator (check for the UPX! marker in the header).
Encrypted blob hidden inside a plaintext config
A 40 KB .conf file that should be all text. One region of the curve jumps onto the amber line while the rest stays near 4.5 — an embedded encrypted payload appended to a legitimate config. The overall percentage stays low, so threatDetected is false, but the local plateau is the tell.
Baseline chunks: ~4.3 - 4.8 bits/byte (text)
Suspicious region: one contiguous plateau at ~7.92 bits/byte
Readout: while the blob is a minority of the file, the percentage stays under 50% and threatDetected is false, but the plateau is still visible on the curve.
Counter-case: if the blob fills most of the file (~120 chunks, ~30 KB of the 40 KB, out of roughly 160 chunks in total), the readout becomes 122 high-entropy chunks (76%) and threatDetected fires.
Trust the SHAPE, not just the flag, for embedded payloads.
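"Trust the shape" can be made mechanical by scanning the chunks array for contiguous runs at or above the amber line. A sketch under stated assumptions: findPlateaus is a hypothetical helper, and minRun (default 8 chunks, about 2 KB) is an arbitrary cutoff for "sustained", not an analyzer setting.

```typescript
// Sketch: find sustained runs ("plateaus") of high-entropy chunks.
// minRun is an illustrative cutoff, not an analyzer setting.
interface Plateau {
  startChunk: number; // first chunk index in the run
  endChunk: number; // last chunk index in the run (inclusive)
  startByte: number; // byte offset where the run begins
}

function findPlateaus(chunks: number[], threshold = 7.5, minRun = 8): Plateau[] {
  const plateaus: Plateau[] = [];
  let start = -1; // -1 = not inside a run
  // Loop one past the end so a run that reaches the last chunk is closed too.
  for (let i = 0; i <= chunks.length; i++) {
    const high = i < chunks.length && chunks[i] >= threshold;
    if (high && start < 0) start = i;
    if (!high && start >= 0) {
      if (i - start >= minRun) {
        plateaus.push({ startChunk: start, endChunk: i - 1, startByte: start * 256 });
      }
      start = -1;
    }
  }
  return plateaus;
}

// Text baseline, a four-chunk plateau, then a lone spike that is ignored.
const curve = [4.5, 4.4, 7.9, 7.9, 7.8, 7.92, 4.6, 7.95, 4.5];
console.log(JSON.stringify(findPlateaus(curve, 7.5, 3)));
// [{"startChunk":2,"endChunk":5,"startByte":512}]
```

The byte offsets give you a starting point for carving the region out and inspecting it with a hex viewer.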
Legitimate compressed installer — high but benign
A 9 MB NSIS installer. The whole curve sits on the amber line because the payload is DEFLATE-compressed. threatDetected fires, but context (signed, known vendor, expected size) says benign. This is why entropy is a triage signal, not a verdict.
Readout: 33000 high-entropy chunks (94%) <- amber
threatDetected: true
Context check:
- magic-byte-validator: detects NSIS / PE installer header
- signed by a known vendor cert -> benign compression
- same hash as the official download (multi-hash-fingerprinter)
=> High entropy explained by compression, not malware.
Comparing a clean EXE to its packed twin
Run the same binary before and after packing. The unpacked version shows distinct entropy zones (low data, mid code, high resources); the packed version is a uniform plateau. Drop them one at a time (the tool charts only the first file per run).
Unpacked notepad-like EXE:
.text (code) ~6.1, .rdata (data) ~4.0, .rsrc ~7.3
high-entropy chunks: 14%, threatDetected: false
Same EXE after UPX:
uniform ~7.9 across the body
high-entropy chunks: 91%, threatDetected: true
The loss of distinct zones is the packing fingerprint.
Document with an OLE-embedded encrypted object
A .docx (ZIP container) where most chunks are ~7.6 because DOCX is itself a compressed ZIP. Entropy alone cannot separate the benign ZIP compression from a maliciously embedded encrypted object — so pair the curve with a header inspection rather than trusting the flag.
DOCX is a ZIP -> baseline already ~7.5-7.8 everywhere.
The amber flag will fire on almost any modern Office file.
Do NOT treat 'DOCX shows high entropy' as a finding.
Instead: unzip and run entropy on the embedded parts, or use hex-header-inspector on the embedded OLE stream headers.
Edge cases and what actually happens
More than one file dropped — only the first is charted
[By design] The drop area accepts multiple files (the registry marks the tool acceptsMultiple), but the entropy case reads files[0] and analyzes only that buffer. There is no per-file batch chart. Drop samples one at a time; to compare two files, run them in separate passes.
Compression and encryption look identical
[Expected] Shannon entropy measures byte randomness, not meaning. AES ciphertext (~7.99) and DEFLATE output (~7.6-7.9) both land in the high-entropy band. The analyzer cannot tell them apart: threatDetected only counts how many chunks cross 7.5; it does not classify them. Use header context to disambiguate.
High entropy on a known-good media or archive file
[Expected] MP3/AAC/FLAC audio, JPEG, MP4, ZIP, and DOCX/XLSX (ZIP containers) all sit near or above 7.5 by design because they are compressed. The amber flag will fire on benign media. High entropy is normal for compressed content; it is only a finding when it is unexpected for the file's claimed type.
Tiny file — fewer chunks, noisier numbers
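[Low confidence] A 256-byte file is a single chunk; a 200-byte file is one short chunk. A window of n bytes holds at most n distinct values, so it can never score above log2(n): about 7.64 bits/byte for 200 bytes, and 6.64 for 100 bytes, which can never reach the amber line. With so few samples the entropy of any one chunk can also swing widely, and the high-entropy percentage becomes coarse (a one-chunk file reads 0% or 100%; a two-chunk file reads 0%, 50%, or 100%). Treat sub-kilobyte results as indicative only.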
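Both limits follow directly from the arithmetic. A small sketch, assuming Math.round for the displayed percentage (the tool's exact rounding is not documented here); maxWindowEntropy and possiblePercentages are hypothetical helper names.

```typescript
// Upper bound on what one window of n bytes can score: at most min(n, 256)
// distinct byte values, each equally frequent.
function maxWindowEntropy(n: number): number {
  if (n <= 1) return 0;
  return Math.log2(Math.min(n, 256));
}

// Every readout percentage a file with `total` chunks can show
// (assumes Math.round, which is not confirmed for the tool).
function possiblePercentages(total: number): number[] {
  if (total === 0) return [];
  const out: number[] = [];
  for (let k = 0; k <= total; k++) out.push(Math.round((100 * k) / total));
  return out;
}

console.log(maxWindowEntropy(200).toFixed(2)); // 7.64
console.log(maxWindowEntropy(100).toFixed(2)); // 6.64, below the 7.5 line
console.log(possiblePercentages(2).join(", ")); // 0, 50, 100
```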
File exceeds your tier's size limit
[Rejected] The buffer reader enforces the per-tier ceiling and, before any entropy is computed, throws 'File "name" is N MB — exceeds the M MB limit for your plan.' Free is 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. Split a large dump or upgrade.
Single high spike inside low-entropy text
[Investigate] One or two chunks near 8.0 surrounded by ~4.5 text usually mean an embedded binary object such as a thumbnail or a key, not whole-file packing. (A base64 blob shows up as a smaller bump, since base64 text stays at about 6 bits/byte.) threatDetected stays false (well under 50%), so trust the curve shape, not the flag, for localized payloads.
Padding or null runs drag entropy to zero
[Preserved] Long runs of 0x00 (PE section alignment, sparse regions) produce chunks at ~0.0 bits/byte, which show up as deep valleys on the curve. These are real and useful: a valley between two high plateaus often marks a section boundary. The tool reports them faithfully; it does not smooth them out.
Steganography in a lossless image
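[Hint only] LSB steganography slightly elevates entropy in the carrier but rarely crosses 7.5 cleanly, so the flag usually will not fire. A PNG's pixel data is already DEFLATE-compressed, so a PNG sits in the high band whether or not anything is hidden in it; an uncompressed BMP is the cleaner case. Entropy is a weak hint here; for actual extraction use steganography-decoder on the PNG/BMP carrier.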
No file dropped
[Error] Running with an empty drop throws "No file provided." The tool needs a binary buffer; there is no text-paste mode for entropy analysis (unlike the password auditor).
Entropy near 8.0 but file is not malicious
[Expected] A truly random key file, a one-time pad, or a high-quality random seed legitimately reaches ~8.0. Maximum entropy is not a malware indicator on its own; it only means the bytes are uniformly distributed. Always anchor the verdict to file type, source, and signature.
Frequently asked questions
What counts as 'high entropy' in this tool?
The engine counts any 256-byte chunk with entropy at or above 7.5 bits/byte as high-entropy, and the chart draws the amber reference line at exactly 7.5. Random data scores near 8.0, plain text 4-5. The file-level threatDetected boolean is true when high-entropy chunks make up more than 50% of the file.
Can this confirm a file is malware?
No — it is a triage signal, not a verdict. Packed and encrypted malware shows sustained high entropy, but so do legitimate compressed installers, media files, and ZIP-based documents. Use entropy to decide what to look at next, then confirm with a header check, a hash lookup, and a sandbox.
Why does my MP3 / JPEG / DOCX show high entropy?
Because they are compressed. MP3/AAC/FLAC, JPEG, and ZIP-container formats (DOCX, XLSX, PPTX) all use entropy coding that produces near-maximal byte randomness — typically 7.5-7.9. The amber flag firing on these is expected and benign; high entropy is only interesting when it is unexpected for the file's claimed type.
What chunk size does the analyzer use? Can I change it?
A fixed 256 bytes per chunk. There is no window-size control — the option schema for entropy-analyzer is empty. 256 bytes balances locating multi-kilobyte encrypted regions against keeping the chart readable on large files.
Does the file get uploaded anywhere?
No. The browser tool reads the file into local memory and computes entropy in your tab — bytes never leave your machine. This is exactly what you want for first-look triage on a live sample. A separate server-safe engine exists only for the paired runner/API path, and even that runs locally on your machine.
Can it tell encryption from compression?
No tool can, from entropy alone. AES ciphertext (~7.99) and DEFLATE output (~7.6-7.9) both sit in the high band. Disambiguate with context: check the header with magic-byte-validator, and look at the first bytes with hex-header-inspector for format markers like PK, MZ, or UPX!.
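magic-byte-validator and hex-header-inspector do this properly. To show what "format markers" means at the byte level, here is a rough sketch (not either tool's implementation) that checks for MZ and PK\x03\x04 at offset 0 and searches the first 4 KB for UPX!. The 4 KB window is an arbitrary assumption, and because the UPX! string can be stripped or altered, a miss proves nothing.

```typescript
// Rough marker check (illustrative; not magic-byte-validator's implementation).
function startsWithBytes(bytes: Uint8Array, sig: number[]): boolean {
  return sig.every((b, i) => bytes[i] === b);
}

// Offset of an ASCII string within the first `limit` bytes, or -1.
function findAscii(bytes: Uint8Array, text: string, limit: number): number {
  const last = Math.min(bytes.length, limit) - text.length;
  outer: for (let i = 0; i <= last; i++) {
    for (let j = 0; j < text.length; j++) {
      if (bytes[i + j] !== text.charCodeAt(j)) continue outer;
    }
    return i;
  }
  return -1;
}

function formatHints(bytes: Uint8Array): string[] {
  const hints: string[] = [];
  if (startsWithBytes(bytes, [0x4d, 0x5a])) hints.push("MZ: DOS/PE executable");
  if (startsWithBytes(bytes, [0x50, 0x4b, 0x03, 0x04])) hints.push("PK: ZIP container (ZIP, DOCX, XLSX, ...)");
  // 4 KB search window is an assumption; samples may have UPX! stripped or altered.
  if (findAscii(bytes, "UPX!", 4096) >= 0) hints.push("UPX!: likely UPX-packed");
  return hints;
}

const fakePe = new Uint8Array(1024);
fakePe.set([0x4d, 0x5a], 0); // "MZ"
fakePe.set([0x55, 0x50, 0x58, 0x21], 0x3e0); // "UPX!" at an arbitrary offset
console.log(formatHints(fakePe).join("; ")); // MZ: DOS/PE executable; UPX!: likely UPX-packed
```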
How big a file can I analyze?
Free tier caps at 10 MB and 1 file. Pro raises it to 100 MB / 5 files, Pro-media to 500 MB / 50 files, and Developer to 2 GB / unlimited. If you exceed your cap, the reader throws an "exceeds the M MB limit for your plan" error before computing anything.
I dropped several files — why did only one chart appear?
The entropy case processes the first file only (files[0]), even though the drop area accepts multiple. There is no batch entropy view. Drop one sample per run; to compare files, run them separately and read the two charts side by side.
What does the 'Max' number under the chart mean?
It is the single highest chunk entropy in the file, shown as Max: X.XX bits/byte. It is not an average — there is no average displayed. The other readout, {count} high-entropy chunks ({percent}%), tells you how widespread the high entropy is.
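Both readouts can be reproduced from the chunks array. In this sketch, formatReadouts is a hypothetical name and rounding the percentage with Math.round is an assumption (it does match the cookbook's 1796 of 1875 chunks shown as 96%).

```typescript
// Sketch of the two readouts: "{count} high-entropy chunks ({percent}%)" and
// "Max: X.XX bits/byte". Rounding the percent with Math.round is an assumption.
function formatReadouts(chunks: number[]): { readout: string; max: string } {
  const count = chunks.filter((h) => h >= 7.5).length;
  const percent = chunks.length === 0 ? 0 : Math.round((100 * count) / chunks.length);
  // reduce instead of Math.max(...chunks): large files produce millions of chunks.
  const max = chunks.reduce((m, h) => Math.max(m, h), 0);
  return {
    readout: `${count} high-entropy chunks (${percent}%)`,
    max: `Max: ${max.toFixed(2)} bits/byte`,
  };
}

const r = formatReadouts([7.991, 4.0, 7.6, 7.5]);
console.log(r.readout); // 3 high-entropy chunks (75%)
console.log(r.max); // Max: 7.99 bits/byte
```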
Will entropy reveal hidden steganographic data?
Only weakly. LSB steganography nudges carrier entropy up but usually stays under 7.5, so the flag often will not fire. Entropy is a hint, not an extractor — to actually pull hidden data from a PNG or BMP, use steganography-decoder.
Why is part of my binary at entropy 0?
Long runs of identical bytes — typically 0x00 padding for PE section alignment or sparse/zeroed regions — have zero entropy because there is no uncertainty. These deep valleys are real and often mark section boundaries, which helps when you are carving.
Can I run entropy analysis in an automated pipeline?
Yes. GET /api/v1/tools/entropy-analyzer returns the schema (no options), and the paired @jadapps/runner executes the same byte math locally, returning identical chunks, highEntropyChunks, total, and threatDetected fields. The sample never reaches JAD's servers — the runner processes it on your machine.
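This page does not document how the runner is invoked, so this sketch starts from the findings JSON instead (the same shape as the UPX cookbook example) and turns it into a pass/fail step. gateFindings and the exit-code convention are hypothetical; adapt them to however your pipeline receives the runner's output.

```typescript
// Hypothetical CI gate over the findings JSON; the exit-code convention is an assumption.
interface Findings {
  total: number;
  highEntropyChunks: number;
  threatDetected: boolean;
}

function gateFindings(json: string): { exitCode: number; summary: string } {
  const f = JSON.parse(json) as Findings;
  if (typeof f.total !== "number" || typeof f.highEntropyChunks !== "number" || typeof f.threatDetected !== "boolean") {
    throw new Error("unexpected findings shape");
  }
  // Re-derive the flag from the counts as a consistency check.
  if (f.threatDetected !== (f.highEntropyChunks > f.total * 0.5)) {
    throw new Error("threatDetected disagrees with highEntropyChunks / total");
  }
  const percent = f.total === 0 ? 0 : Math.round((100 * f.highEntropyChunks) / f.total);
  return {
    exitCode: f.threatDetected ? 1 : 0, // non-zero: route the sample to manual review
    summary: `${f.highEntropyChunks}/${f.total} high-entropy chunks (${percent}%)`,
  };
}

// Findings from the UPX cookbook example above.
const result = gateFindings('{ "total": 1875, "highEntropyChunks": 1796, "threatDetected": true }');
console.log(result.exitCode, result.summary); // 1 1796/1875 high-entropy chunks (96%)
```

Failing the step only routes the sample to a human; as the FAQ above stresses, a high-entropy result is a reason to look closer, not a verdict.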
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.