Duplicate File Detector Online Free — SHA-256 Inside a ZIP

How to find duplicate files inside an archive

Step 1
Open the Duplicate File Detector — Go to redundancy-analyzer. It is a Pro-tier tool, so sign in on a Pro plan or higher — Free accounts cannot run it (the archive family Free cap is 50 MB / 500 entries / 1 file, and this tool's minimum tier is Pro).
Step 2
Drop in a single archive — Drag one archive onto the drop zone. This tool reads ONE archive at a time — it is not a batch tool and does not accept folders. Supported inputs are detected by magic bytes: ZIP, GZIP, TAR, tar.gz, 7z, RAR, bz2, xz, tar.bz2, tar.xz and ISO.
Step 3
Set the Top-N groups slider — The only control is a range slider labelled Top-N groups (pairLimit), from 10 to 500 in steps of 10, default 100. It caps how many duplicate groups the report returns — the groups with the most wasted bytes are kept. Leave it at 100 for most archives; raise it if you suspect many small duplicate sets.
Step 4
Run the analysis — The tool extracts every entry, computes a SHA-256 digest of each, and builds a map from digest to file list. Groups with two or more members are duplicates. Directory entries (paths ending in /) are skipped and never hashed.
Step 5
Read the JSON report — Output is JSON with totalEntries, duplicateGroups, totalWastedBytes, totalWastedHuman, and a groups array. Each group has its hash, count, perFileSize, wastedBytes, and a files list of {name, size}. The summary metrics panel shows Duplicate groups and Wasted.
Step 6
Act on the findings with a sibling tool — This tool only reports — it never edits your archive. To actually drop the redundant copies, use selective-extractor to pull only the files you want and re-zip with folder-to-zip, or compare two builds with archive-diff.
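The analysis in Step 4 can be sketched in TypeScript as below. This is a minimal illustration, not the tool's source. The `ArchiveEntry` shape and the `findDuplicateGroups` and `sha256Hex` names are hypothetical, and the sketch assumes a runtime with the Web Crypto API (modern browsers, Deno, Node 19+). It skips directory markers, hashes entries one at a time, and keeps only digests shared by two or more files.

```typescript
// Hypothetical entry shape; the tool's internal types are not documented.
interface ArchiveEntry {
  name: string;     // path inside the archive, e.g. "themes/aurora/logo.png"
  data: Uint8Array; // decompressed bytes of the entry
}

interface DuplicateGroup {
  hash: string;
  count: number;
  perFileSize: number;
  wastedBytes: number;
  files: { name: string; size: number }[];
}

// Hex-encoded SHA-256 via the Web Crypto API.
async function sha256Hex(bytes: Uint8Array): Promise<string> {
  // Copying into a fresh Uint8Array guarantees a plain ArrayBuffer backing store.
  const digest = await crypto.subtle.digest("SHA-256", new Uint8Array(bytes));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

async function findDuplicateGroups(entries: ArchiveEntry[]): Promise<DuplicateGroup[]> {
  const byDigest = new Map<string, ArchiveEntry[]>();
  for (const entry of entries) {
    if (entry.name.endsWith("/")) continue;   // directory marker: never hashed
    const hash = await sha256Hex(entry.data); // strictly one entry at a time
    const bucket = byDigest.get(hash);
    if (bucket) bucket.push(entry);
    else byDigest.set(hash, [entry]);
  }

  const groups: DuplicateGroup[] = [];
  byDigest.forEach((members, hash) => {
    if (members.length < 2) return; // unique content is not a duplicate
    const perFileSize = members[0].data.length;
    groups.push({
      hash,
      count: members.length,
      perFileSize,
      wastedBytes: perFileSize * (members.length - 1), // keep one copy
      files: members.map((m) => ({ name: m.name, size: m.data.length })),
    });
  });
  return groups;
}
```

Ranking is left out here: the tool additionally keeps only the highest-waste groups, up to pairLimit.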

What you can drop in

Formats the analyzer can read, the engine that handles each, and how it is detected. All reading is browser-side; libarchive formats are read-only.

Input format	Engine	Detected by	Notes
ZIP (`.zip`)	fflate (zip.js for encrypted ZIPs, which this tool rejects because it supplies no password)	Magic `50 4B` (PK)	Directory entries (trailing `/`) are skipped and never hashed
GZIP (`.gz`)	fflate	Magic `1F 8B`	Single-member stream — yields exactly one inner file, so duplicates need a multi-file container
TAR (`.tar`)	fflate (tar parser)	Header offset check	Plain uncompressed tar; many entries, ideal for dedup analysis
tar.gz / tar.bz2 / tar.xz	fflate (gz) / libarchive (bz2, xz)	Outer compression magic	Decompresses then walks the inner tar's entries
7z (`.7z`)	libarchive WASM	Magic `37 7A BC AF 27 1C`	Read-only: the analyzer never writes 7z, it only inspects
RAR (`.rar`)	libarchive WASM	RAR signature `52 61 72 21 1A 07` ("Rar!")	Read-only inspection; encrypted RAR cannot be analyzed (no password input)
bz2 / xz / ISO	libarchive WASM	Magic bytes: bz2 `42 5A 68` ("BZh"), xz `FD 37 7A 58 5A 00`, ISO `CD001` at offset 32769	ISO walks the disc image's files; bz2/xz are single-stream like gz
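To illustrate magic-byte detection, here is a hypothetical `detectFormat` that checks the published signature of each container. The byte values come from the format specifications. The check order, the `ustar` test for tar, and the missing last-resort ZIP read are simplifications, not the tool's actual logic.

```typescript
// Hypothetical detector built from well-known format signatures.
type Format = "zip" | "gzip" | "7z" | "rar" | "bz2" | "xz" | "iso" | "tar" | "unknown";

// True if `sig` appears in `bytes` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((b, i) => bytes[offset + i] === b);
}

function detectFormat(bytes: Uint8Array): Format {
  if (hasSignature(bytes, [0x50, 0x4b])) return "zip";                         // "PK"
  if (hasSignature(bytes, [0x1f, 0x8b])) return "gzip";                        // also tar.gz
  if (hasSignature(bytes, [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c])) return "7z";
  if (hasSignature(bytes, [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07])) return "rar"; // "Rar!"
  if (hasSignature(bytes, [0x42, 0x5a, 0x68])) return "bz2";                   // "BZh"
  if (hasSignature(bytes, [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00])) return "xz";
  // ISO 9660: "CD001" in the first volume descriptor, 1 byte into sector 16.
  if (hasSignature(bytes, [0x43, 0x44, 0x30, 0x30, 0x31], 32769)) return "iso";
  // POSIX tar: "ustar" at offset 257 of the first 512-byte header.
  if (hasSignature(bytes, [0x75, 0x73, 0x74, 0x61, 0x72], 257)) return "tar";
  return "unknown";
}
```

A tar.gz is reported as gzip here; its inner tar would only be recognized after decompression. Old pre-POSIX (V7) tar files carry no `ustar` marker, so a fuller detector would need another fallback, such as validating the header checksum.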

The Top-N groups slider (the only option)

The analyzer exposes exactly one control. Values shown are the real min/max/step/default from the option schema.

Property	Value	Effect
UI control	Range slider, label "Top-N groups"	Drag to set how many groups are returned; a number badge shows the current value
Schema name	`pairLimit`	Maps to `opts.pairLimit` in the processor
Minimum	10	Smallest report — only the 10 biggest-waste groups
Maximum	500	Largest report this tool will return in one pass
Step	10	Slider snaps in increments of 10
Default	100	Used when you do not touch the slider
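The schema values above can be expressed as a small normalization helper, for example to validate a pairLimit you intend to reuse. This is a sketch: `normalizePairLimit` is a hypothetical name, and the clamp-and-snap policy for out-of-range values is an assumption, since the page does not say how the processor treats them.

```typescript
// Documented schema for pairLimit: min 10, max 500, step 10, default 100.
const PAIR_LIMIT = { min: 10, max: 500, step: 10, default: 100 } as const;

// Assumed policy: missing/non-numeric -> default, otherwise snap to the
// nearest step and clamp into [min, max].
function normalizePairLimit(value?: number): number {
  if (value === undefined || !Number.isFinite(value)) return PAIR_LIMIT.default;
  const snapped = Math.round(value / PAIR_LIMIT.step) * PAIR_LIMIT.step;
  return Math.min(PAIR_LIMIT.max, Math.max(PAIR_LIMIT.min, snapped));
}
```

For instance, 137 snaps to 140, 3 rises to the minimum of 10, and 9999 is clamped to 500.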

Tier limits for the archive family

Per-archive caps from lib/tier-limits.ts. Note the entry-count cap, not just file size. This tool's minimum tier is Pro, so Free cannot run it.

Tier	Max archive size	Max entries	Files per run	Can run this tool?
Free	50 MB	500	1	No — tool requires Pro
Pro	500 MB	50,000	20	Yes (1 archive at a time here)
Pro-media	2 GB	500,000	100	Yes
Developer	2 GB	500,000	Unlimited	Yes
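The table can be read as a simple admission check. In this sketch the caps are copied from the table, while the `Tier` names, the `checkArchive` function, and the reading of "MB" and "GB" as powers of 1024 are assumptions. The tool's real caps come from lib/tier-limits.ts; how it compares an archive against them is not documented here.

```typescript
type Tier = "free" | "pro" | "pro-media" | "developer";

const MB = 1024 * 1024; // assumption: the table's MB/GB are binary units

// Caps from the tier table; Free is listed for completeness even though
// this tool refuses Free accounts outright.
const TIER_CAPS: Record<Tier, { maxBytes: number; maxEntries: number }> = {
  free: { maxBytes: 50 * MB, maxEntries: 500 },
  pro: { maxBytes: 500 * MB, maxEntries: 50000 },
  "pro-media": { maxBytes: 2048 * MB, maxEntries: 500000 },
  developer: { maxBytes: 2048 * MB, maxEntries: 500000 },
};

// Returns null if the archive may be analyzed, otherwise the reason it is refused.
function checkArchive(tier: Tier, sizeBytes: number, entryCount: number): string | null {
  if (tier === "free") return "tool requires Pro";
  const caps = TIER_CAPS[tier];
  if (sizeBytes > caps.maxBytes) return "archive exceeds size cap";
  if (entryCount > caps.maxEntries) return "archive exceeds entry cap";
  return null;
}
```

Both caps are checked: a small archive with 50,001 entries is refused on Pro just as an oversized one is.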

Cookbook

Representative reports from typical archives. Output is trimmed JSON; sizes are illustrative, but the shape and fields are exactly what the tool returns.

Vendored logo duplicated across theme folders

A site export ZIP shipped the same 240 KB logo into three theme directories. Names differ, bytes are identical, so all three share one SHA-256 and form a single group with two wasted copies.

Input: site-export.zip (3,140 entries)

Report (excerpt):
{
  "totalEntries": 3140,
  "duplicateGroups": 1,
  "totalWastedBytes": 491520,
  "totalWastedHuman": "480.0 KB",
  "groups": [
    {
      "hash": "9f2c...",
      "count": 3,
      "perFileSize": 245760,
      "wastedBytes": 491520,
      "files": [
        {"name":"themes/aurora/logo.png","size":245760},
        {"name":"themes/dusk/assets/logo.png","size":245760},
        {"name":"themes/noir/img/logo.png","size":245760}
      ]
    }
  ]
}
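If you post-process the report, typing it makes the fields explicit. This sketch assumes the field names shown in these examples; the interfaces and the `redundantPaths` helper are illustrative, and treating the first listed file as the copy to keep is an arbitrary choice.

```typescript
// Report shape as shown in the cookbook; the TypeScript types are an assumption.
interface DuplicateFile { name: string; size: number }
interface DuplicateGroup {
  hash: string;        // SHA-256 hex digest shared by every file in the group
  count: number;       // number of byte-identical copies
  perFileSize: number; // bytes per copy
  wastedBytes: number; // perFileSize * (count - 1)
  files: DuplicateFile[];
}
interface DuplicateReport {
  totalEntries: number;
  duplicateGroups: number;
  totalWastedBytes: number;
  totalWastedHuman: string;
  groups: DuplicateGroup[];
}

// Every copy except the first in each group: candidates to leave out when re-packing.
function redundantPaths(report: DuplicateReport): string[] {
  return report.groups.flatMap((g) => g.files.slice(1).map((f) => f.name));
}
```

For the logo report above, `redundantPaths` returns the dusk and noir copies and keeps themes/aurora/logo.png.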

.DS_Store noise across a Mac-zipped project

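macOS Finder scatters a .DS_Store file into many folders. Their contents usually differ from folder to folder, so they rarely group. Zero-byte files in the same archive, such as empty placeholders, do match: they collapse into one group with zero wasted bytes, which is worth confirming before pruning.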

Input: project-mac.zip

Report (excerpt):
{
  "duplicateGroups": 1,
  "totalWastedBytes": 0,
  "totalWastedHuman": "0 B",
  "groups": [
    {
      "hash": "e3b0c442...",  // SHA-256 of zero bytes
      "count": 12,
      "perFileSize": 0,
      "wastedBytes": 0,
      "files": [ /* 12 empty placeholder files */ ]
    }
  ]
}

Note: empty files all share the SHA-256 of the empty string,
so they group together but cost no space (wastedBytes 0).

Tightening a noisy report with the slider

An archive with thousands of tiny duplicate sets fills all 100 slots at the default setting, leaving a long report to scan. Drop the slider to 10 to focus on the biggest-waste groups for a quick cleanup decision.

Same archive, two runs:

pairLimit = 100 (default):
  duplicateGroups: 100  (capped)  totalWastedHuman: "31.4 MB"

pairLimit = 10:
  duplicateGroups: 10   (capped)  totalWastedHuman: "27.9 MB"

The top 10 groups already account for ~89% of the waste reported
at 100 groups (27.9 MB of 31.4 MB), so the slider lets you ignore
the long tail of trivial duplicates.
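The Top-N rule (keep the highest-waste groups) and the share calculation above can be sketched as follows. `topGroups` and `topShare` are hypothetical helpers, and tie-breaking between groups with equal waste is an assumption, since the page does not specify it.

```typescript
interface GroupSummary { hash: string; wastedBytes: number }

// Largest wastedBytes first, then keep pairLimit groups. Array.prototype.sort is
// stable in modern engines, so equal-waste groups keep their input order.
function topGroups<T extends GroupSummary>(groups: T[], pairLimit: number): T[] {
  return groups.slice().sort((a, b) => b.wastedBytes - a.wastedBytes).slice(0, pairLimit);
}

// Fraction of the waste in `groups` that the top n groups account for.
function topShare(groups: GroupSummary[], n: number): number {
  const total = groups.reduce((sum, g) => sum + g.wastedBytes, 0);
  if (total === 0) return 0;
  const top = topGroups(groups, n).reduce((sum, g) => sum + g.wastedBytes, 0);
  return top / total;
}
```

The ~89% figure in the example corresponds to 27.9 / 31.4, assuming each run's total covers only the groups that run returns.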

Backup snapshot overlap

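A tar.gz holding two daily snapshots of the same tree is mostly redundant. Each unchanged file appears once per snapshot, so it forms a two-member duplicate group; with thousands of such pairs, the report fills up to the pairLimit cap, and the wasted total shows how little actually changed between snapshots.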

Input: backups-2026-06.tar.gz (two snapshot trees)

Report summary:
{
  "totalEntries": 18402,
  "duplicateGroups": 100,        // capped at pairLimit
  "totalWastedBytes": 612843776,
  "totalWastedHuman": "584.5 MB"
}

Most entries are byte-identical across the two days, which makes
a strong case for incremental backups instead of full snapshots.

Confirming a clean release archive

Before publishing, run the analyzer on the release ZIP. A clean build should report zero duplicate groups — if it does not, something got vendored twice.

Input: release-v2.4.0.zip

Report:
{
  "totalEntries": 842,
  "duplicateGroups": 0,
  "totalWastedBytes": 0,
  "totalWastedHuman": "0 B",
  "groups": []
}

Empty groups array = no byte-identical files. Ship it.
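A release pipeline could turn this check into a gate, assuming you save the report JSON alongside the build. In this hypothetical sketch, `releaseProblems` lists every group that wastes space and, by choice, ignores zero-byte groups (all empty files share one digest, so they always match); remove the filter if even empty duplicates should fail the build.

```typescript
// Only the fields the gate needs; extra report fields are ignored.
interface ReleaseGroup { hash: string; wastedBytes: number; files: { name: string }[] }
interface ReleaseReport { groups: ReleaseGroup[] }

// One human-readable line per space-wasting group; an empty array means "clean".
function releaseProblems(report: ReleaseReport): string[] {
  return report.groups
    .filter((g) => g.wastedBytes > 0) // skip the free empty-file group
    .map((g) => `${g.files.map((f) => f.name).join(", ")} (${g.wastedBytes} bytes wasted)`);
}
```

A build step would fail whenever `releaseProblems(report).length > 0`.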

Edge cases and what actually happens

Encrypted ZIP entries

Rejected

The analyzer extracts with no password (it calls the extractor without one), so an archive with any encrypted entry throws "Archive contains encrypted entries... Provide a password to extract." There is no password input on this tool. Decrypt first with multi-format-extractor (which accepts a password) and analyze the plain output.
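To see ahead of time whether a ZIP will hit this error, you can inspect the encryption flag yourself: a ZIP's central directory marks each encrypted entry with bit 0 of its general-purpose flag. The sketch below is a hypothetical check, not part of the tool, and assumes a non-ZIP64 archive.

```typescript
// Scan the central directory and report whether any entry has flag bit 0 set.
function zipHasEncryptedEntry(bytes: Uint8Array): boolean {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // End of Central Directory (EOCD): 22 bytes plus an optional comment of up
  // to 65535 bytes, so search backwards from the end for its signature.
  let eocd = -1;
  for (let i = bytes.length - 22; i >= Math.max(0, bytes.length - 22 - 65535); i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error("no End of Central Directory record found");

  const entryCount = view.getUint16(eocd + 10, true); // total entries
  let p = view.getUint32(eocd + 16, true);            // central directory offset
  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(p, true) !== 0x02014b50) throw new Error("bad central directory header");
    const flags = view.getUint16(p + 8, true);
    if (flags & 0x0001) return true; // bit 0: entry is encrypted
    const nameLen = view.getUint16(p + 28, true);
    const extraLen = view.getUint16(p + 30, true);
    const commentLen = view.getUint16(p + 32, true);
    p += 46 + nameLen + extraLen + commentLen; // advance to the next header
  }
  return false;
}
```

If this returns true, decrypt with multi-format-extractor before running the analyzer.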

Two files, same name, different bytes

By design

These are NOT grouped. Grouping is purely by SHA-256 of content, so same-name-different-content files have different digests and stay separate. Only byte-identical files ever share a group.

Empty files all match

Expected

Every zero-byte file produces the same SHA-256 (the digest of the empty string), so all empty files in the archive collapse into one group with perFileSize: 0 and wastedBytes: 0. They are duplicates by definition but cost no space.

Single-stream archive (gz/bz2/xz)

By design

A bare .gz, .bz2 or .xz decompresses to exactly one inner file, so there is nothing to compare and the report shows zero duplicate groups. Use a multi-file container (zip, tar, 7z) to find duplicates.

Directory entries in the archive

Skipped

Entries whose paths end in / (folder markers) are not hashed and never appear in any group. Only real files are compared.

Archive over the tier cap

Rejected

Pro allows 500 MB and 50,000 entries per archive; Pro-media and Developer allow 2 GB and 500,000 entries. An archive past your tier's size or entry cap is rejected before analysis. Split it with archive-splitter or upgrade.

Corrupt or unrecognized archive

Failed

If the bytes do not match any known signature and a last-resort ZIP read fails, you get "Could not detect or extract archive format." Verify the file with archive-integrity-tester first; repair a damaged ZIP with corrupted-zip-repair.

Free-tier account

Blocked

This tool's minimum tier is Pro. Free accounts cannot run it at all, regardless of archive size. Upgrade to Pro to use the Duplicate File Detector.

Very large entry counts

Supported

Hashing is done one entry at a time in the browser; large archives are CPU-bound, so a 50,000-entry archive takes noticeably longer than a small one. It still completes — there is no per-entry timeout — just give the tab time and avoid backgrounding it.
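One-entry-at-a-time processing can be sketched as a loop that awaits each item and periodically yields to the event loop so the page can repaint. This is a hypothetical helper, not the tool's code; the yield interval and the use of `setTimeout` are assumptions. It also suggests why backgrounding hurts: browsers throttle timers in hidden tabs, so each yield can take much longer.

```typescript
// Process items strictly in order, reporting progress and yielding every `yieldEvery` items.
async function processSequentially<T, R>(
  items: T[],
  work: (item: T) => Promise<R>,
  onProgress: (done: number, total: number) => void,
  yieldEvery = 200,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await work(items[i])); // no parallelism: one item at a time
    if ((i + 1) % yieldEvery === 0) {
      onProgress(i + 1, items.length);
      // Hand control back to the event loop; throttled when the tab is hidden.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
    }
  }
  onProgress(items.length, items.length); // final progress report
  return results;
}
```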

Frequently asked questions

Does the tool match by filename or by content?

By content only. It computes a SHA-256 digest of each entry's exact bytes and groups files whose digests are equal. Filenames, paths, and timestamps are ignored when deciding duplicates.

What hash does it use?

SHA-256, via the browser's built-in crypto.subtle.digest. No SHA-256 collision has ever been found, and finding one is computationally infeasible, so identical digests can safely be treated as identical bytes.

Is anything uploaded?

No. Extraction and hashing run entirely in your browser using fflate, zip.js, and libarchive WASM. Your archive never leaves your machine.

What formats can I analyze?

ZIP, GZIP, TAR, tar.gz, tar.bz2, tar.xz, 7z, RAR, bz2, xz, and ISO. ZIP/GZIP/TAR use fflate; 7z/RAR/bz2/xz/ISO use libarchive WASM (read-only).

Can it remove the duplicates for me?

No — it only reports. To delete redundant copies, extract just the files you want with selective-extractor and re-zip, or compare builds with archive-diff.

What does wastedBytes mean?

For a group, wastedBytes = perFileSize x (count - 1) — the space you would recover by keeping one copy and removing the rest. The report sums these into totalWastedBytes / totalWastedHuman.
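The formula, plus a byte formatter for the human-readable totals, in TypeScript. `wastedBytes` follows the definition above. `formatBytes` is an assumption inferred from the cookbook (491,520 bytes appear as "480.0 KB", which implies 1024-based units with one decimal), not the tool's confirmed formatter.

```typescript
// Keep one copy; every additional copy counts as waste.
function wastedBytes(perFileSize: number, count: number): number {
  return count < 2 ? 0 : perFileSize * (count - 1);
}

// Assumed formatting: whole bytes below 1 KB, otherwise 1024-based units with one decimal.
function formatBytes(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`;
  const units = ["KB", "MB", "GB", "TB"];
  let value = bytes / 1024;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}
```

For the logo example, `formatBytes(wastedBytes(245760, 3))` gives "480.0 KB", and twelve empty files give `wastedBytes(0, 12) === 0`.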

Why are my empty files all in one group?

Every zero-byte file has the same SHA-256, so they all match. The group shows perFileSize: 0 and wastedBytes: 0 — they are duplicates but free.

What is the Top-N groups slider?

It caps how many duplicate groups the report returns (10 to 500, default 100, in steps of 10). The highest-waste groups are kept; the rest are dropped.

Can I analyze an encrypted ZIP?

No. This tool runs the extractor without a password, so encrypted entries cause an error. Decrypt first with multi-format-extractor (which has a password field), then analyze the result.

How big an archive can I use?

Pro: up to 500 MB and 50,000 entries. Pro-media and Developer: up to 2 GB and 500,000 entries. The tool requires at least Pro.

Does it process folders or multiple archives at once?

No. It reads one archive per run and does not accept folders or batches. For batch extraction see batch-extraction-manager.

Where do I act on the results?

Pair it with selective-extractor (to keep only wanted files), folder-to-zip (to re-pack), archive-diff (to compare two archives), or archive-size-analyzer (to see size by path).

Privacy first

Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.
