How to use entropy analysis in CTF and reverse engineering
- Step 1: Drop the challenge file. Drop the binary, firmware image, or memory dump, one file at a time (the analyzer reads only the first file). It is read into a local buffer and chunked into 256-byte windows; nothing is uploaded, which matters for challenge files you must not leak.
- Step 2: Read the baseline, then hunt anomalies. Establish the file's baseline entropy (text ~4.5, code ~6, padding ~0). Then look for departures: a spike or plateau above the amber 7.5 line signals compressed or encrypted data; a valley inside a high-entropy archive can mark a cleartext header or a section boundary.
- Step 3: Find the anomaly's starting chunk. The X-axis is the chunk index (labels are hidden for density, but chunks are ordered left to right). Hover over the curve to read the entropy at each point in the tooltip, and identify the chunks where the plateau begins and ends.
- Step 4: Convert chunk index to byte offset. Multiply the starting chunk index by 256 to get the raw file offset: `offset = index * 256`. A plateau from chunk 1500 to 1700 corresponds to bytes 384000 to 435200 (~50 KB).
- Step 5: Carve and verify the header. Carve that byte range with `dd if=file of=carved.bin bs=1 skip=OFFSET count=LENGTH` or a hex editor. Then drop the carved chunk into hex-header-inspector or magic-byte-validator; a clean magic number (PK, PNG, 7z) confirms you hit the embedded file.
- Step 6: Decrypt, decompress, or extract. Once carved, push the region through the right tool: a ZIP or PNG you decode directly; an AES blob goes to aes-256-encryptor in decrypt mode if you have the passphrase; a suspected LSB-stego carrier goes to steganography-decoder.
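The steps above can be reproduced offline in a few lines. This is a minimal sketch, not the tool's own code: `chunk_entropies` and `high_entropy_runs` are hypothetical helper names, and it assumes a plain Shannon estimate over non-overlapping 256-byte windows with the 7.5 amber threshold from Step 2. One caveat: a plain per-window estimate on real random bytes tends to read nearer 7.2 than 8.0 at this window size, so the threshold is left as a parameter you may need to lower in your own scripts.

```python
import math
from collections import Counter

CHUNK_SIZE = 256  # fixed window size stated in the text


def chunk_entropies(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[float]:
    """Shannon entropy in bits per byte for each consecutive chunk."""
    entropies = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total = len(chunk)
        # Sum of p * log2(1/p) over the byte values present in the chunk.
        h = sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())
        entropies.append(round(h, 3))
    return entropies


def high_entropy_runs(entropies: list[float], threshold: float = 7.5) -> list[tuple[int, int]]:
    """Return (start_chunk, end_chunk) pairs, end exclusive, for runs above threshold."""
    runs, start = [], None
    for index, h in enumerate(entropies):
        if h > threshold and start is None:
            start = index
        elif h <= threshold and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(entropies)))
    return runs


# Synthetic demo: 4 chunks of null padding, 3 chunks in which every byte
# value appears exactly once (entropy 8.0), then 2 more chunks of padding.
demo = bytes(4 * CHUNK_SIZE) + bytes(range(256)) * 3 + bytes(2 * CHUNK_SIZE)
for start, end in high_entropy_runs(chunk_entropies(demo)):
    offset, length = start * CHUNK_SIZE, (end - start) * CHUNK_SIZE
    print(f"plateau: chunks {start}-{end}, offset {offset} (0x{offset:X}), length {length}")
```

The printed offset and length are the values you would pass to `dd` as `skip` and `count` in Step 5.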
Reading the entropy curve for carving
Curve shapes and what they mean during a CTF or RE session. Offset of any feature = its chunk index x 256.
| Curve feature | Likely content | Next move |
|---|---|---|
| Spike above amber line in low-entropy file | Embedded archive / encrypted payload / flag | Carve at index x 256; check header |
| Long flat plateau ~8.0 | Encryption or strong compression | Need a key or a decompressor; identify format first |
| Plateau ~7.6-7.9 | DEFLATE/zlib (PNG, ZIP, gzip) | Carve and decompress with the matching tool |
| Deep valley to ~0 | Null padding / section boundary | Marks where one region ends and the next begins |
| Gentle rise ~5.5-6.5 | Unpacked machine code | Open in a disassembler — not hidden data |
| Sawtooth between high and low | Interleaved structures (TLV records, sprite tables) | Map the period to the record size |
Chunk-index to byte-offset reference
The fixed 256-byte window makes the conversion exact. Use it to jump straight to the carve point.
| Chunk index | Byte offset (index x 256) | Hex offset |
|---|---|---|
| 0 | 0 | 0x00000000 |
| 1 | 256 | 0x00000100 |
| 16 | 4096 | 0x00001000 |
| 256 | 65536 | 0x00010000 |
| 1024 | 262144 | 0x00040000 |
| 4096 | 1048576 (1 MB) | 0x00100000 |
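The table rows are all instances of one multiplication, which is easy to script when you have many offsets to compute. A small sketch; the helper names `chunk_to_offset`, `offset_to_chunk`, and `carve_range` are hypothetical.

```python
CHUNK_SIZE = 256  # fixed window size stated in the text


def chunk_to_offset(index: int) -> int:
    """Byte offset at which the given chunk starts."""
    return index * CHUNK_SIZE


def offset_to_chunk(offset: int) -> int:
    """Index of the chunk that contains the given byte offset."""
    return offset // CHUNK_SIZE


def carve_range(start_chunk: int, end_chunk: int) -> tuple[int, int]:
    """(offset, length) covering start_chunk up to, but not including, end_chunk."""
    return chunk_to_offset(start_chunk), (end_chunk - start_chunk) * CHUNK_SIZE


# Reproduce the reference rows: index, decimal offset, hex offset.
for index in (0, 1, 16, 256, 1024, 4096):
    offset = chunk_to_offset(index)
    print(f"{index:>5} | {offset:>8} | 0x{offset:08X}")
```

`carve_range(1500, 1700)` gives the `skip` and `count` values for a plateau read off the curve between those two chunks.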
Entropy of common embedded artifacts
What a hidden artifact looks like against a binary baseline. These bands help you guess the format before carving.
| Embedded artifact | Entropy | Stands out against |
|---|---|---|
| Raw machine code | 5.5 - 6.5 (x86); 6.5 - 7.0 (ARM Thumb) | Lower-entropy text/data |
| PNG IDAT (DEFLATE) | 7.5 - 7.9 | JPEG or text baseline |
| ZIP / 7z body | 7.6 - 8.0 | Any non-compressed host |
| AES / RC4 ciphertext | 7.95 - 8.0 | Everything except other crypto/compression |
| Base64-encoded blob | ~6.0 | Plain text (~4.5) baseline |
| Null / 0xFF padding | 0.0 | Everything — a clear valley |
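The bands above can be turned into a quick lookup that lists every artifact type consistent with a reading. This is only a sketch: the band edges are copied from the table, the width given to the base64 row is an assumption, and `candidate_artifacts` is a hypothetical name. Because the bands overlap, the function returns a list of candidates, not a single answer.

```python
# (label, low, high) in bits per byte, taken from the table above.
BANDS = [
    ("Null / 0xFF padding", 0.0, 0.0),
    ("Base64-encoded blob", 5.9, 6.1),  # table says ~6.0; the +/-0.1 width is an assumption
    ("Raw machine code (x86)", 5.5, 6.5),
    ("Raw machine code (ARM Thumb)", 6.5, 7.0),
    ("PNG IDAT (DEFLATE)", 7.5, 7.9),
    ("ZIP / 7z body", 7.6, 8.0),
    ("AES / RC4 ciphertext", 7.95, 8.0),
]


def candidate_artifacts(entropy: float) -> list[str]:
    """Every artifact type whose band contains the given entropy value."""
    return [label for label, low, high in BANDS if low <= entropy <= high]


for reading in (0.0, 6.0, 7.7, 7.97):
    print(reading, candidate_artifacts(reading))
```

An empty list means the reading falls between bands, which is itself a hint that the region is ordinary text or mixed data.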
Cookbook
CTF and RE workflows. Offsets use the index x 256 rule; entropy values are representative (the tool rounds to 3 decimals).
Flag hidden in a ZIP appended to a JPEG
Classic 'extra data after the end-of-image marker' challenge (EOI for JPEG, IEND for PNG). The JPEG baseline sits ~7.0; a sharp jump to ~7.9 near the end of the curve is the appended ZIP. Read the start chunk, multiply by 256, and carve from there to EOF.
    Baseline (JPEG): ~6.9 - 7.1
    Jump at chunk 980: ~7.94 to end of file
    Offset = 980 * 256 = 250880 (0x3D400)
    dd if=chal.jpg of=hidden.zip bs=1 skip=250880
    # hex-header-inspector on hidden.zip -> 'PK\x03\x04' confirms ZIP
    unzip hidden.zip   # -> flag.txt
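The same carve can be scripted. An appended archive rarely starts exactly on a 256-byte boundary, so this sketch searches for the ZIP local file header starting one chunk before the chunk you read off the curve, then carves from the header to EOF. The function names and the one-chunk margin are assumptions for illustration.

```python
import io
import zipfile

CHUNK_SIZE = 256               # fixed window size stated in the text
ZIP_MAGIC = b"PK\x03\x04"      # ZIP local file header signature


def carve_appended_zip(data: bytes, start_chunk: int) -> bytes:
    """Carve from the first ZIP header found near start_chunk to end of file."""
    search_from = max(start_chunk - 1, 0) * CHUNK_SIZE
    position = data.find(ZIP_MAGIC, search_from)
    if position == -1:
        raise ValueError("no ZIP local file header found from that chunk onward")
    return data[position:]


def list_zip_members(blob: bytes) -> list[str]:
    """Open the carved bytes as a ZIP archive and return the member names."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.namelist()


# Usage with the numbers from this recipe (file name as in the example above):
# with open("chal.jpg", "rb") as fh:
#     hidden = carve_appended_zip(fh.read(), start_chunk=980)
# print(list_zip_members(hidden))
```

If `zipfile` raises `BadZipFile`, the jump on the curve was probably not a ZIP; check the first bytes of the carved region in a hex viewer before trying other formats.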
Encrypted second stage inside firmware
A router firmware image: low-entropy bootloader and config, then a long ~8.0 plateau — the encrypted application partition. The plateau's start chunk gives the partition offset for extraction.
    Chunks 0-300: ~3-5 (bootloader, env, tables)
    Chunks 300-9000: ~7.99 (encrypted app partition)
    Partition offset = 300 * 256 = 76800 (0x12C00)
    Length = (9000-300) * 256 = 2227200 bytes
    Carve, then attempt the vendor key / known XOR.
    The flat ~8.0 says encrypted, not just compressed.
Memory dump: locate the injected shellcode
A process dump that is mostly low-entropy heap and strings, with one ~7.8 region: a payload that was RC4-decrypted and then re-encrypted, or packed shellcode. Entropy narrows a 200 MB dump (Developer tier) to one carve target.
    Most of dump: 2.5 - 5.5 (strings, heap, stack)
    Anomaly at chunk 410000: ~7.8 for ~80 chunks
    Offset = 410000 * 256 = 104,960,000 (~100 MB in)
    Length ~ 80 * 256 = 20480 bytes
    Carve those 20 KB; disassemble just that, not the whole dump.
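For dumps this large, a local scan does not need the whole file in memory. The sketch below reads one 256-byte chunk at a time from any binary stream and yields only the chunks above a threshold. It assumes a plain Shannon estimate per chunk; `scan_stream` is a hypothetical name, and the threshold is a parameter because a plain estimate on 256 bytes can read lower than the values quoted here.

```python
import io
import math
from collections import Counter

CHUNK_SIZE = 256  # fixed window size stated in the text


def scan_stream(stream, threshold: float = 7.5, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_index, byte_offset, entropy) for each chunk above threshold."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total = len(chunk)
        h = sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())
        if h > threshold:
            yield index, index * chunk_size, round(h, 3)
        index += 1


# Synthetic stand-in for a dump: padding with two maximal-entropy chunks inside.
fake_dump = io.BytesIO(bytes(5 * CHUNK_SIZE) + bytes(range(256)) * 2 + bytes(CHUNK_SIZE))
for index, offset, h in scan_stream(fake_dump):
    print(f"chunk {index} at offset {offset}: {h}")

# On a real file: with open("process.dmp", "rb") as fh: hits = list(scan_stream(fh))
```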
Spot base64 in a 'plain text' file
A text challenge where the flag is base64-buried. Plain English sits ~4.3-4.8; base64 raises a region to ~6.0 — not high enough to trip the amber flag, but clearly above the text baseline on the curve.
    Prose baseline: ~4.5
    Suspicious block: ~6.0 for ~12 chunks
    base64 entropy (~6.0) < 7.5, so threatDetected stays false.
    But the BUMP is visible. Carve those chunks, base64 -d them.
    Lesson: not every clue crosses the amber line.
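Because the bump is relative, a script has to compare each chunk against the file's own baseline instead of a fixed threshold. A minimal sketch: the median is used as the baseline and 1.0 bit as the margin, both assumptions chosen for illustration, and the helper names are hypothetical. The second function pulls the longest base64-looking run out of a carved region, since a chunk-aligned carve usually includes some surrounding prose.

```python
import base64
import re
import statistics

BASE64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def relative_bumps(entropies: list[float], margin: float = 1.0) -> list[int]:
    """Indices of chunks whose entropy exceeds the median by more than margin."""
    baseline = statistics.median(entropies)
    return [i for i, h in enumerate(entropies) if h - baseline > margin]


def decode_longest_base64_run(blob: bytes) -> bytes:
    """Decode the longest run of base64-alphabet characters found in blob."""
    candidates = BASE64_RUN.findall(blob)
    if not candidates:
        raise ValueError("no base64-looking run of 16+ characters found")
    return base64.b64decode(max(candidates, key=len))


# Synthetic curve shaped like the example above: prose, a 12-chunk bump, prose.
curve = [4.5] * 20 + [6.0] * 12 + [4.5] * 20
bumps = relative_bumps(curve)
print(f"bump spans chunks {bumps[0]}-{bumps[-1]}, offset {bumps[0] * 256}")
```

`base64.b64decode` raises an error if the run is cut mid-group, so carve a chunk or two wider than the bump when in doubt.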
Find the section boundary by the entropy valley
Reversing a custom container: two high-entropy blobs separated by a deep valley of null padding. The valley is the boundary — its chunk index x 256 tells you exactly where blob A ends and blob B begins.
    Blob A: chunks 0-500 (~7.8)
    Valley: chunks 500-505 (~0.1, null pad)
    Blob B: chunks 505-1100 (~7.9)
    Blob B offset = 505 * 256 = 129280 (0x1F900)
    The valley gave you the split point for free.
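Splitting at valleys is mechanical once you have the entropy list: every run of chunks above a low floor is one region, and the valleys between them are discarded. A sketch under the assumption that anything at or below 0.5 bits counts as padding; the floor value and the name `split_at_valleys` are illustrative.

```python
CHUNK_SIZE = 256  # fixed window size stated in the text


def split_at_valleys(entropies: list[float], floor: float = 0.5) -> list[tuple[int, int]]:
    """Return (offset, length) for each region separated by chunks at or below floor."""
    regions, start = [], None
    for index, h in enumerate(entropies):
        if h > floor and start is None:
            start = index
        elif h <= floor and start is not None:
            regions.append((start * CHUNK_SIZE, (index - start) * CHUNK_SIZE))
            start = None
    if start is not None:
        regions.append((start * CHUNK_SIZE, (len(entropies) - start) * CHUNK_SIZE))
    return regions


# Synthetic curve shaped like the container example above.
curve = [7.8] * 500 + [0.1] * 5 + [7.9] * 595
for offset, length in split_at_valleys(curve):
    print(f"region at offset {offset} (0x{offset:X}), {length} bytes")
```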
Edge cases and what actually happens
Chunk size is fixed at 256 — no finer granularity
By design: You cannot shrink the window to pinpoint a sub-256-byte payload. A 64-byte XOR key embedded mid-chunk is averaged with the rest of its window and may not spike sharply. Use entropy to find the right 256-byte neighbourhood, then switch to a hex editor for byte-precise work.
Small carved payload averaged away
Low confidence: A high-entropy region smaller than one chunk shares its window with low-entropy host bytes, so the chunk's entropy lands somewhere in the middle and the spike is muted. If you suspect a tiny payload, carve generously around the bump and inspect the bytes directly.
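The muting effect is easy to quantify with a worked case. Take the best case for the payload: 64 bytes that are all distinct, sitting in a window otherwise filled with null padding. The sketch below assumes a plain Shannon estimate per window; `shannon_entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the chunk in bits per byte."""
    total = len(chunk)
    return sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())


# 64 distinct non-zero bytes: the highest entropy a 64-byte payload can have.
payload = bytes(range(1, 65))
# The same payload placed mid-chunk inside 192 bytes of null padding.
window = bytes(96) + payload + bytes(96)

print(round(shannon_entropy(payload), 3))  # payload measured on its own
print(round(shannon_entropy(window), 3))   # payload measured inside its 256-byte window
```

On its own the payload measures 6.0 bits per byte. Inside the window the padding takes 192 of 256 slots, so the reading is 0.75 * log2(4/3) + 64 * (1/256) * 8, about 2.311, which looks like ordinary structured data and nothing like a spike.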
Offset math off-by-one if you misread the start chunk
Investigate: The conversion index * 256 is exact, but reading the wrong starting chunk from the curve shifts your carve. Verify by checking that the bytes at your computed offset form a plausible magic number; if you landed mid-stream, step the chunk index by +/-1 and re-carve.
PNG inside JPEG appears as a high plateau
Supported: A PNG embedded in a JPEG shows its IDAT (DEFLATE, ~7.5-7.9) as a plateau above the JPEG baseline. The entropy locates it; correlate the carve offset with the PNG magic via hex-header-inspector to confirm the embedded signature before extracting.
Whole-file high entropy hides the spike
Investigate: If the host is itself compressed (a ZIP-based document, a packed binary), the baseline is already ~7.6, so an embedded encrypted payload barely stands out. Decompress or unpack the outer layer first, then re-run entropy on the inner data to regain contrast.
Memory dump exceeds the free tier
Rejected: A full dump is often hundreds of MB or several GB. Free caps at 10 MB, Pro at 100 MB, Pro-media at 500 MB, and Developer at 2 GB. The reader throws an "exceeds the limit for your plan" error before charting. Use the Developer tier or split the dump into tier-sized slices.
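If you split a dump into slices, keep each slice boundary on a multiple of 256 so that chunk indices in a slice still map cleanly back to offsets in the original file. A sketch of that arithmetic; the helper names are hypothetical, and whether a tier's "MB" means 10^6 or 2^20 bytes is not stated here, so the cap is passed in as a plain byte count.

```python
CHUNK_SIZE = 256  # fixed window size stated in the text


def slice_ranges(total_size: int, cap_bytes: int) -> list[tuple[int, int]]:
    """(offset, length) slices no larger than cap_bytes, offsets aligned to CHUNK_SIZE."""
    step = (cap_bytes // CHUNK_SIZE) * CHUNK_SIZE  # round the cap down to a chunk multiple
    if step <= 0:
        raise ValueError("cap_bytes must be at least one chunk")
    return [(offset, min(step, total_size - offset)) for offset in range(0, total_size, step)]


def global_chunk(slice_offset: int, local_chunk: int) -> int:
    """Chunk index in the original file for a chunk index read from one slice."""
    return slice_offset // CHUNK_SIZE + local_chunk


# Example: a 25,000,000-byte dump under a cap of 10,000,000 bytes.
for offset, length in slice_ranges(25_000_000, 10_000_000):
    print(f"slice at offset {offset}, {length} bytes")
```

Multiply the result of `global_chunk` by 256 to get the carve offset in the full dump.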
Only the first dropped file is analyzed
By design: Drop a folder of challenge files and only files[0] gets a curve. Process each artifact in its own pass; there is no multi-file overlay, so per-file reading is the intended workflow.
Base64 / hex-encoded data sits below the amber line
Expected: Encoded (not compressed or encrypted) data raises entropy only to ~6.0 (base64) or ~4.0 (hex). It will not trip threatDetected, but it is clearly above a prose baseline on the curve. Do not rely on the amber flag for encoded clues; read the relative bump instead.
No file dropped
Error: An empty drop throws "No file provided". The analyzer needs a binary buffer; there is no inline-paste mode for challenge text. Save the artifact to a file and drop it.
Tooltip values rounded to 3 decimals
Preserved: Each chunk's entropy is rounded to 3 decimal places (e.g. 7.994). For carving this is plenty; for exact statistical work, recompute from the raw bytes. The rounding never affects the offset math.
Frequently asked questions
What chunk size does the analyzer use?
A fixed 256 bytes per chunk. It is not configurable. 256 bytes gives enough granularity to locate multi-kilobyte payloads while keeping the chart readable on large files — and it makes the offset conversion clean: byte offset = chunk index x 256.
How do I convert a chunk index on the chart into a file offset?
Multiply by 256. A plateau starting at chunk 1500 begins at byte 384000 (0x5DC00). Read the start and end chunks of your region of interest, multiply each by 256, and you have the exact carve range for dd or a hex editor.
Can it find a PNG embedded inside a JPEG?
Yes — the embedded PNG's IDAT (DEFLATE-compressed, ~7.5-7.9) appears as a high plateau above the JPEG baseline. The curve locates it; carve at start_chunk x 256 and confirm the PNG magic with hex-header-inspector before extracting.
What entropy value indicates uncompressed code?
x86/x64 machine code typically sits at 5.5-6.5 bits/byte; ARM Thumb code is denser at ~6.5-7.0. Pure assembly text or bytecode is lower, ~4.5-5.5. None of these trip the 7.5 amber line. A code region reading above 7.5 usually means it is packed or you have carved into data.
Why doesn't my base64 flag trip the high-entropy flag?
Base64 encoding raises entropy to only ~6.0, below the 7.5 threshold, so threatDetected stays false. It still shows as a visible bump above a ~4.5 prose baseline on the curve. For CTF, read the relative rise — not every clue crosses the amber line.
Can I analyze a multi-hundred-MB memory dump?
Only on a high enough tier. Free caps at 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. Above your cap the reader rejects the file before charting. For big dumps use Developer tier, or carve the dump into tier-sized slices first and analyze each.
The whole binary is high entropy — how do I find the inner payload?
Peel the outer layer first. If the host is packed or a compressed container, its baseline is already ~7.6 and any embedded payload barely stands out. Unpack/decompress the outer file, then re-run entropy on the inner bytes to regain contrast on the curve.
Does the challenge file leave my machine?
No. The analyzer runs in your browser tab — the file is read into local memory and never uploaded. That is important for CTF artifacts and client RE engagements where leaking the sample would be a problem.
Can I pinpoint a payload smaller than 256 bytes?
Not from entropy alone — a sub-chunk payload averages into its 256-byte window and the spike flattens. Use entropy to find the right neighbourhood, then drop to byte-level work in a hex editor or hex-header-inspector for precision.
How do I decrypt or extract once I've carved a region?
Match the tool to the format: a ZIP/PNG you decode directly; an AES blob (if you have the passphrase) goes to aes-256-encryptor in decrypt mode; a suspected LSB-stego carrier goes to steganography-decoder. The header you confirmed tells you which path to take.
What does a deep valley in the curve tell me?
A drop toward 0.0 is a run of identical bytes — usually null or 0xFF padding. In a custom container, a valley between two high plateaus is often the boundary between two payloads, which gives you a free split point at valley_chunk x 256.
Can I script entropy analysis across many challenge files?
Yes. GET /api/v1/tools/entropy-analyzer returns the schema and the paired @jadapps/runner runs the same math locally, returning the chunks array, highEntropyChunks, and total. Loop over your files in a script and post-process the curves — all locally, nothing uploaded.
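If you only need a rough batch triage and do not want to depend on the runner, a plain local loop gives a comparable summary. The sketch below does not call the API or `@jadapps/runner`, whose interfaces are not shown here; it recomputes a plain Shannon estimate per 256-byte chunk, and its function names and output keys (`chunk_count`, `high_chunks`, `max_entropy`) are made up for illustration, not the tool's schema.

```python
import math
from collections import Counter
from pathlib import Path

CHUNK_SIZE = 256  # fixed window size stated in the text


def summarize(data: bytes, threshold: float = 7.5) -> dict:
    """Per-file summary: chunk count, indices above threshold, and peak entropy."""
    entropies = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        total = len(chunk)
        h = sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())
        entropies.append(round(h, 3))
    return {
        "chunk_count": len(entropies),
        "high_chunks": [i for i, h in enumerate(entropies) if h > threshold],
        "max_entropy": max(entropies, default=0.0),
    }


def summarize_files(paths) -> dict:
    """Map each path to its summary; files are read locally, one at a time."""
    return {str(path): summarize(Path(path).read_bytes()) for path in paths}


# Usage: summarize_files(Path("challenges").glob("*.bin"))
print(summarize(bytes(512) + bytes(range(256))))
```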
Privacy first
Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.