Find Duplicate Files by Content Hash (SHA-256) — Free, In-Browser

How to detect duplicate files by SHA-256 content fingerprint

Step 1
Decide which hash you'll dedup on — Use SHA-256 unless you're matching an existing index. It's the sha256 field in the report — a 64-hex-char string. SHA-256 is the right default: long enough that accidental collisions don't happen, fast enough for everyday files.
Step 2
Fingerprint the first candidate — Drop a file onto the dropzone and run. The bytes are read into memory and digested locally. Copy the sha256 value, or download the <filename>.hashes.json report, and record it next to the filename in a spreadsheet or note.
Step 3
Fingerprint each remaining candidate — Each run hashes the first dropped file only, so repeat per file — drop, run, record. Build up a list of filename -> sha256 pairs as you go. (For a guided two-file comparison instead of a list, see the cookbook.)
Step 4
Sort by hash to surface duplicate groups — Sort your filename -> sha256 list by the hash column. Any two rows with the same SHA-256 are byte-for-byte identical files — a duplicate group. Rows with a unique hash are one-of-a-kind.
Step 5
Keep one, delete the rest of each group — Within a duplicate group, keep whichever copy has the name/location you want and delete the others — they are exact byte copies, so nothing is lost. Files with a unique hash are not duplicates; leave them alone.
Step 6
Re-fingerprint after cleanup to confirm — After deleting, fingerprint the survivor again and confirm its SHA-256 is unchanged. A matching digest proves you kept an intact copy and didn't accidentally truncate or alter it during the cleanup.
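
The six steps above can also be sketched as a short local script. This is a minimal illustration using Python's standard hashlib, not the tool's own code; the function names sha256_of and duplicate_groups are hypothetical, and unlike the in-browser tool (which reads the whole file into memory) this sketch reads each file in chunks.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_groups(folder: Path) -> dict[str, list[Path]]:
    """Map each SHA-256 that occurs more than once to the files sharing it."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Hashes that appear once belong to one-of-a-kind files; drop them.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Each value in the returned mapping is one duplicate group: keep one path from it and the rest are exact byte copies.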

Why filename, size, and date can't prove a duplicate — but the hash can

Common signals people rely on for dedup, and why only a content hash is decisive.

Signal	Same value means…	Reliable for dedup?
Filename	Nothing — copies get renamed; different files share names	No — `copy (1).jpg` could be identical or unrelated
File size	Possibly the same bytes — but two different files can match by size	No — a same-size pair is a candidate, not a confirmed duplicate
Modified date	Nothing — copying or syncing rewrites timestamps	No — a true copy often has a brand-new date
SHA-256 digest	Byte-for-byte identical contents	Yes — a match is a true duplicate; a mismatch is genuinely different
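
One way to read the table: size can nominate candidates, but only the digest confirms them. The sketch below assumes a local Python script rather than the browser tool, and the name confirmed_duplicates is hypothetical. It hashes only files whose size collides with another file's, then groups by SHA-256.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def confirmed_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Use size to pick candidates, then SHA-256 to confirm them."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in paths:
        by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
        # Same size but different digest: a candidate that was not a duplicate.
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```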

Reading the report for dedup

Every run returns all four digests. For deduplication, the SHA-256 line is the one to key on; the others are there for matching legacy indexes.

Report field	Length	Use for dedup
`sha256`	64 hex chars	Primary key — sort/group on this to find duplicates
`md5`	32 hex chars	Only if reconciling against an older asset manager that indexed on MD5
`sha1`	40 hex chars	Only if matching a system that stored SHA-1 content keys
`sha512`	128 hex chars	Higher-assurance dedup of critical archives; same conclusion as SHA-256
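
To spot-check the field lengths in the table, the same four digests can be computed locally. This is a sketch with Python's hashlib; the function name fingerprint is hypothetical and the dictionary only mirrors the report's field names, not the tool's implementation.

```python
import hashlib


def fingerprint(data: bytes) -> dict[str, str]:
    """Return the four digests as lowercase hex, keyed like the report."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),      # 40 hex chars
        "sha256": hashlib.sha256(data).hexdigest(),  # 64 hex chars
        "sha512": hashlib.sha512(data).hexdigest(),  # 128 hex chars
        "md5": hashlib.md5(data).hexdigest(),        # 32 hex chars
    }
```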

Limits and scope

Per-file, in-memory hashing. The tool fingerprints one file per run; the dedup logic is your comparison of the digests.

Property	Value	Notes
File-size limit (Free / Pro / Pro-media / Developer)	10 MB / 100 MB / 500 MB / 2 GB	Whole file is read into memory; a file over your cap is rejected before hashing
Files per run	1 (the first dropped file)	Fingerprint one at a time and collect the digests yourself
Output	JSON `{ sha1, sha256, sha512, md5 }`, lowercase hex	Copy, or download as `<filename>.hashes.json`
Options	None	No normalization — the raw bytes are hashed exactly as they are

Cookbook

Practical deduplication workflows. The tool produces one SHA-256 per file; deciding what's a duplicate is comparing those digests. CLI equivalents are shown so you can spot-check from a terminal.

Confirm two same-size photos are actually the same shot

Your camera roll has two 4.2 MB JPEGs that look identical. Same size isn't proof. Fingerprint each: matching SHA-256 means delete one with confidence; different SHA-256 means they're distinct files (maybe a burst or a re-edit) — keep both.

IMG_0421.jpg      -> sha256: 7d865e959b2466918c9863afca942d0f...
IMG_0421 (1).jpg  -> sha256: 7d865e959b2466918c9863afca942d0f...

Identical -> true duplicate, safe to delete one.

Terminal cross-check:
  sha256sum IMG_0421.jpg "IMG_0421 (1).jpg"

Build a hash index of a document folder in a spreadsheet

To dedup a dump of client documents, fingerprint each file and record filename + SHA-256 in two columns. Sorting on the hash column groups every duplicate together. One-of-a-kind files have a hash that appears exactly once.

filename             sha256
-------------------- ------------------------------------------
contract.pdf         9b74c9897bac770ffc029102a200c5de...
contract-copy.pdf    9b74c9897bac770ffc029102a200c5de...  <- dup
nda.pdf              0a0a9f2a6772942557ab5355d76af442...

Sort by sha256 -> the two matching rows are the duplicate.
Keep contract.pdf, delete contract-copy.pdf.
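
If the index lives in a script rather than a spreadsheet, the sort-and-group step might look like the sketch below. The rows reuse the truncated digests from the listing above as placeholders, and group_by_hash is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# (filename, sha256) pairs as recorded; digests are truncated placeholders.
rows = [
    ("contract.pdf", "9b74c9897bac770ffc029102a200c5de..."),
    ("nda.pdf", "0a0a9f2a6772942557ab5355d76af442..."),
    ("contract-copy.pdf", "9b74c9897bac770ffc029102a200c5de..."),
]


def group_by_hash(pairs):
    """Sort by the hash column and return only hashes shared by 2+ files."""
    ordered = sorted(pairs, key=itemgetter(1))  # stable: keeps input order per hash
    groups = []
    for digest, members in groupby(ordered, key=itemgetter(1)):
        names = [name for name, _ in members]
        if len(names) > 1:
            groups.append((digest, names))
    return groups
```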

Catch a near-duplicate that is NOT a byte duplicate

A re-exported or re-compressed copy of an image looks the same to your eye but has different bytes, so its SHA-256 differs. The tool correctly reports them as distinct — content hashing finds exact duplicates, not visually-similar ones.

original.png   -> sha256: 2c26b46b68ffc68ff99b453c1d304134...
resized.png    -> sha256: fcde2b2edba56bf408601fb721fe9b5c...

Different -> NOT a byte duplicate (re-encoded/edited).
Keep both; a hash only matches exact copies.
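
A small sketch of why this happens: flipping a single bit in otherwise identical data produces a different SHA-256. The byte strings here are made-up stand-ins for image data, and is_byte_duplicate is a hypothetical helper.

```python
import hashlib


def is_byte_duplicate(a: bytes, b: bytes) -> bool:
    """True only when both inputs hash to the same SHA-256."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


original = bytes(range(256)) * 4
exact_copy = bytes(original)
# Same length, one bit flipped at offset 100: stands in for a re-encoded file.
re_encoded = original[:100] + bytes([original[100] ^ 0x01]) + original[101:]
```

Here is_byte_duplicate(original, exact_copy) is true, while is_byte_duplicate(original, re_encoded) is false even though the two inputs have the same length.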

Reconcile against a legacy asset manager keyed on MD5

An older DAM exported a manifest of MD5 content keys. Your job is to find which local files are already in it. Fingerprint each local file and match its md5 line against the manifest — the matching digest is already in the report.

Local file -> md5: e2fc714c4727ee9395f324cd2e7f331f

DAM manifest contains:
  e2fc714c4727ee9395f324cd2e7f331f  asset_88213

Match -> this file is already catalogued (a duplicate of asset_88213).
(Use SHA-256 for new dedup work; MD5 only to honor the old index.)
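
The lookup can be sketched in a few lines. This assumes the manifest is plain text with one "md5 asset-id" pair per line, as in the listing above; the real export format of a given DAM may differ, and both function names are hypothetical.

```python
import hashlib
from typing import Optional


def parse_manifest(text: str) -> dict[str, str]:
    """Build an md5 -> asset id index from 'digest  asset_id' lines."""
    index: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, asset_id = parts
            index[digest.lower()] = asset_id
    return index


def catalogued_as(data: bytes, index: dict[str, str]) -> Optional[str]:
    """Return the asset id if this content's MD5 is in the manifest."""
    return index.get(hashlib.md5(data).hexdigest())
```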

Verify the survivor is intact after deleting duplicates

After purging duplicate copies, re-fingerprint the file you kept and confirm its SHA-256 still equals what you recorded before cleanup. A match proves the survivor is whole; a mismatch means it was altered or truncated during the operation.

Before cleanup -> sha256: 7d865e959b2466918c9863afca942d0f...
After cleanup  -> sha256: 7d865e959b2466918c9863afca942d0f...

Unchanged -> survivor is intact.
Differ -> the kept copy was modified; restore from backup.

Edge cases and what actually happens

Two photos look identical but hash differently

By design

The tool hashes raw bytes, not pixels. A re-saved, re-compressed, resized, or re-encoded image has different bytes — and therefore a different SHA-256 — even if it looks the same on screen. Content hashing finds exact duplicates only. It will not group a JPEG and a PNG of the same scene, or two different JPEG quality settings.

Same content, but one file has extra metadata

Not a duplicate

If one copy carries EXIF, an ID3 tag, or an XMP block the other lacks, the bytes differ and so does the hash — they are not byte-identical. To compare the media content while ignoring metadata you'd first strip it (for images via gps-geotag-remover, for MP3 tags via audio-id3-ghoster) and then re-hash both.

Text files that look the same won't match

Expected

A CRLF (Windows) vs LF (Unix) line ending, a UTF-8 BOM, or a trailing newline added by an editor changes the bytes and flips the digest. "Looks the same" is not "is the same." If duplicate text files refuse to match, suspect line endings or encoding before assuming the hash is wrong.
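
The effect is easy to reproduce. The sketch below hashes the same two lines of text with LF endings, CRLF endings, and a UTF-8 BOM. The normalize_text helper is a hypothetical diagnostic for finding out why two text files differ; the tool itself applies no normalization.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def normalize_text(data: bytes) -> bytes:
    """Diagnostic only: strip a UTF-8 BOM and convert CRLF to LF."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    return data.replace(b"\r\n", b"\n")


lf = "line one\nline two\n".encode("utf-8")
crlf = "line one\r\nline two\r\n".encode("utf-8")
with_bom = b"\xef\xbb\xbf" + lf
```

All three raw inputs hash differently; after normalize_text, the CRLF and BOM variants hash the same as the LF version, which points at line endings or encoding as the cause.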

Multiple files dropped at once

First file only

The dropzone is multi-select, but each run fingerprints files[0] only and returns one report. Drop and run per file to build your index, or use the server-safe runner path to script batch fingerprinting of a whole folder.

A file is larger than your tier's limit

Rejected: too large

The whole file is read into memory before hashing, so the cap is enforced up front: Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. A file over your cap is rejected with an "exceeds the … limit for your plan" error before any digest is computed.
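
The check-then-hash order can be sketched as follows. The tier names come from the text above, but treating MB and GB as binary multiples is an assumption, and the error wording here is hypothetical rather than the tool's actual message.

```python
import hashlib

MIB = 1024 * 1024
# Assumption: caps are binary megabytes; the page does not say which unit is used.
TIER_CAP_BYTES = {
    "free": 10 * MIB,
    "pro": 100 * MIB,
    "pro-media": 500 * MIB,
    "developer": 2048 * MIB,
}


def fingerprint_within_cap(data: bytes, tier: str) -> str:
    """Reject oversized input first; hash only if it fits the tier cap."""
    if len(data) > TIER_CAP_BYTES[tier]:
        raise ValueError(f"file is over the {tier} cap; nothing was hashed")
    return hashlib.sha256(data).hexdigest()
```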

You only need to compare two specific files

Use the paired tool

Manually collecting two hashes works, but file-integrity-monitor takes both files at once, tells you directly whether they're byte-identical, and reports the first differing byte offset when they aren't — faster than eyeballing two SHA-256 strings.
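
To illustrate what a "first differing byte offset" means, here is a minimal sketch. It shows the idea only and is not how file-integrity-monitor is implemented; first_difference is a hypothetical name.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a prefix of the other
    return None
```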

Empty files all share one hash

Expected

Every zero-byte file produces the same well-known digest (e3b0c4… for SHA-256, d41d8c… for MD5). That's correct: they really are byte-identical, because each one contains no bytes at all. Don't treat a folder of empty placeholders as a meaningful duplicate group.
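
When grouping digests in a script, the empty-file group can be filtered out explicitly. A sketch, assuming groups are held as a mapping from SHA-256 to filenames; drop_empty_group is a hypothetical helper.

```python
import hashlib

# Digests of zero bytes of input; these begin e3b0c4 and d41d8c respectively.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def drop_empty_group(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Remove the group made up of zero-byte files, if present."""
    return {h: names for h, names in groups.items() if h != EMPTY_SHA256}
```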

No file dropped before running

Error: no file

There is no text-paste mode — the tool needs a file. Running with an empty dropzone throws "No file provided." Drop a file first, then run.

You want to detect tampering over time, not duplicates

Different goal

Dedup answers "are these two files the same now?" To answer "did this one file change since last week?" you need a saved baseline hash to compare against later — capture the SHA-256 now and re-fingerprint later, or use file-integrity-monitor to diff two copies.
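
A baseline can be as simple as a JSON file of path-to-digest pairs. The sketch below is one possible local approach, not a feature of the tool; the file layout and the names save_baseline and changed_since are assumptions.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_baseline(paths: list[Path], baseline_file: Path) -> None:
    """Record today's SHA-256 for each file."""
    baseline = {str(p): sha256_of(p) for p in paths}
    baseline_file.write_text(json.dumps(baseline, indent=2), encoding="utf-8")


def changed_since(baseline_file: Path) -> list[str]:
    """Return recorded paths that are missing or whose digest now differs."""
    baseline = json.loads(baseline_file.read_text(encoding="utf-8"))
    changed = []
    for name, recorded in baseline.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != recorded:
            changed.append(name)
    return changed
```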

Frequently asked questions

Why hash files instead of comparing names or sizes to find duplicates?

Names and dates lie: a true copy gets renamed and re-dated when you move or sync it, and an unrelated file can happen to share a name. File size is a slightly better hint but still not proof, because two different files can be exactly the same size. A SHA-256 digest is derived from every byte, so different digests guarantee different contents, and identical digests mean identical contents for all practical purposes (an accidental collision is effectively impossible). Of these signals, it's the only one that actually proves duplication.

Which hash should I use for deduplication?

SHA-256. It's long enough (64 hex chars) that two different real-world files matching by accident is effectively impossible, and it's the modern default. Use SHA-512 if you want extra assurance on critical archives — it reaches the same conclusion. Only key on MD5 or SHA-1 if you're reconciling against an existing index that already used those; all four digests are in every report.

Will it find photos that look the same but were edited or resized?

No — and that's correct behavior. Content hashing finds files that are byte-for-byte identical. A resized, re-compressed, cropped, or re-exported image has different bytes and therefore a different hash, even though it looks the same. This tool catches exact duplicates (the same file copied twice), not visually similar or perceptually-near images. For that you'd need a perceptual-hash tool, which this is not.

Two copies of my document won't match — what happened?

Some byte differs. Common culprits: one copy was opened and re-saved (rewriting metadata or compression), one has a different line ending or a BOM if it's text, or one carries embedded metadata the other doesn't. The contents may look identical to you while the bytes aren't. If you want to compare only the visible content of, say, two images, strip metadata first and re-hash.

Can I dedup a whole folder in one go?

Not in a single browser run — each run fingerprints the first dropped file only. Fingerprint each file and collect the digests into a spreadsheet, then sort on the hash column to surface duplicate groups. For true batch fingerprinting of many files at once, use the server-safe runner path, which scripts the same hashing without files leaving your machine.

Are my files uploaded when I hash them?

No. Each file is read into memory in your browser and digested locally with the Web Crypto API — nothing is uploaded. That's why you can safely dedup a private photo library, a client's confidential documents, or an unpublished dataset. The only server-side record for signed-in users is a usage counter, never file content.

Is a SHA-256 match ever a false positive?

For practical purposes, no. SHA-256 has 2^256 possible outputs; the chance of two different files you actually own colliding by accident is astronomically smaller than a hardware failure silently corrupting your comparison. A SHA-256 match means the files are the same bytes. (Deliberate adversarial collisions are a separate concern and aren't relevant to deduplicating your own files.)
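
The "astronomically small" claim can be put into rough numbers with the standard birthday-style upper bound: among n files there are n(n-1)/2 pairs, and each pair collides with probability about 1/2^256. This assumes SHA-256 outputs behave like uniformly random values for non-adversarial files, and the function name is hypothetical.

```python
def accidental_collision_bound(file_count: int, digest_bits: int = 256) -> float:
    """Upper bound on the chance that any two of file_count files collide."""
    pairs = file_count * (file_count - 1) // 2
    return pairs / 2 ** digest_bits
```

For a library of a billion files, accidental_collision_bound(10**9) comes out on the order of 10^-60.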

Do empty files all count as duplicates of each other?

Yes, because they genuinely are byte-identical — every zero-byte file produces the same digest (e3b0c4… for SHA-256). That's not a bug. Just be aware that a pile of empty placeholder files will all collapse into one "duplicate group," which usually isn't what you care about.

Can I compare files in different folders or on different drives?

Yes — the hash doesn't depend on where a file lives. Fingerprint a file from one folder and a file from another (or from a USB drive, a download, a backup), and if the SHA-256 matches, they're identical regardless of path. Location, drive, and filesystem are irrelevant; only the bytes matter.

What's the difference between this and the file integrity monitor?

This tool gives you one file's fingerprint, which you then compare yourself against other fingerprints — ideal for building an index across many files. file-integrity-monitor is purpose-built for exactly two files: drop both and it tells you immediately whether they're byte-identical, plus the first byte offset where they differ. Use this for many-file dedup, the integrity monitor for a focused two-file comparison.

How big a file can I fingerprint for dedup?

Up to your tier's file-size limit: Free 10 MB, Pro 100 MB, Pro-media 500 MB, Developer 2 GB. The whole file is read into memory before hashing, with no streaming, so very large media files are bounded by both the tier cap and your browser's available memory. A file over the cap is rejected before any digest runs.

Can I automate dedup hashing in a script?

Yes. The fingerprinter is server-safe, so it runs through a paired @jadapps/runner without files leaving your machine. GET /api/v1/tools/multi-hash-fingerprinter returns the schema; POST each file to the local runner at http://127.0.0.1:9789/v1/tools/multi-hash-fingerprinter/run and collect the sha256 fields. The server-safe response also includes sizeBytes. Install the runner from /docs/runner.

Privacy first

Every JAD Security operation runs entirely in your browser. Files, passwords, and PGP private keys never leave your device — verified by zero outbound network requests during processing.
