How to choose: Duplicate Finder vs CLI tools (fdupes, rdfind, unzip -l)
- Step 1. Decide: one-off audit or repeatable automation? For a one-off look at a single archive, use the browser tool. For nightly dedup across a whole filesystem with automatic deletion or hardlinking, use rdfind/fdupes in a script; the analyzer is report-only and runs interactively.
- Step 2. For the browser path, open the analyzer. Go to redundancy-analyzer (Pro tier) and drop one archive. No extraction, no temp files.
- Step 3. Pick the result size. Set the Top-N groups slider (10–500, default 100). CLI dedupers report everything; the analyzer caps the report at the highest-waste groups, which is usually what you want for a decision.
- Step 4. Read the hash-grouped JSON. Each group lists the SHA-256, the count, per-file size, wasted bytes, and the file paths. This is the same content identity fdupes computes, but taken directly from the archive entries with no extraction step.
- Step 5. Cross-check sizes if you were using unzip -l / 7z l. If you previously eyeballed 7z l output, remember that equal sizes do not mean equal content. The analyzer's hash grouping is the authoritative answer the listing never gave you.
- Step 6. Act on the result. The analyzer reports only. To delete duplicates in the browser flow, extract the keepers with selective-extractor and re-pack; the CLI tools can delete or hardlink in place instead.
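
The grouping described in Steps 3 to 5 can be sketched in a few lines of Python. This is an illustration of the idea, not the analyzer's actual implementation: it handles ZIP only, and while `hash`, `count`, and `wastedBytes` follow the report fields named above, the `size` and `paths` keys and the `duplicate_groups` function name are assumptions made for this example.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def duplicate_groups(zip_bytes: bytes, top_n: int = 100) -> list[dict]:
    """Group ZIP entries by SHA-256 and keep the top_n highest-waste groups."""
    by_hash = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Entries are decompressed in memory; nothing is written to disk.
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            by_hash[digest].append((info.filename, info.file_size))

    groups = []
    for digest, entries in by_hash.items():
        if len(entries) < 2:
            continue  # unique content is not a duplicate group
        size = entries[0][1]
        groups.append({
            "hash": digest,
            "count": len(entries),
            "size": size,  # hypothetical key name
            # Every copy beyond the first one is counted as waste.
            "wastedBytes": size * (len(entries) - 1),
            "paths": sorted(name for name, _ in entries),  # hypothetical key name
        })
    groups.sort(key=lambda g: g["wastedBytes"], reverse=True)
    return groups[:top_n]
```

Calling `duplicate_groups(data, top_n=10)` mirrors setting the slider to its minimum: groups beyond the tenth are dropped from the report even though they exist.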
Duplicate Finder vs the common CLI options
What each tool compares and where it runs. fdupes/rdfind require extracting the archive first; listing tools do not compare content at all.
| Tool | Compares by | Reads archive directly? | Removes dupes? | Runs where |
|---|---|---|---|---|
| JAD Duplicate Finder | SHA-256 of entry bytes | Yes — no extract step | No (report only) | Your browser, no upload |
| fdupes -r | Size then byte/MD5 compare | No — extract first | Yes (delete/prompt) | Local shell |
| rdfind | Size then SHA-1/SHA-256 | No — extract first | Yes (delete/hardlink) | Local shell |
| jdupes | Size then byte compare | No — extract first | Yes (delete/link) | Local shell |
| unzip -l / 7z l / tar -tvf | Nothing (lists names + sizes) | Yes (lists only) | No | Local shell |
When to use which
Pick by privacy need, automation need, and whether you want content-true grouping.
| Need | Best choice | Why |
|---|---|---|
| Quick one-off audit of a single archive | JAD Duplicate Finder | No extract, no install, ranked by savings |
| Sensitive archive, must not touch disk | JAD Duplicate Finder | Browser-only, no temp files, no upload |
| Nightly dedup across a filesystem | rdfind / fdupes in cron | Scriptable, in-place delete/hardlink |
| Just want the entry list and sizes | unzip -l / 7z l | Fast listing, but does NOT compare content |
| Archive over 2 GB or 500k entries | CLI dedupers | Browser tool is capped by tier limits |
Browser-tool limits vs CLI
Real tier caps for the analyzer versus the effectively-unbounded CLI tools.
| Dimension | JAD Duplicate Finder | fdupes / rdfind |
|---|---|---|
| Max archive size | 500 MB (Pro) / 2 GB (Pro-media, Developer) | Disk-bound, effectively unlimited |
| Max entries | 50,000 (Pro) / 500,000 (higher tiers) | Unlimited |
| Groups returned | 10–500 (slider, default 100) | All |
| Hash algorithm | SHA-256 (fixed) | Varies by tool: rdfind lets you choose the checksum (MD5/SHA-1/SHA-256); fdupes uses MD5 signatures followed by a byte compare |
| Encrypted archives | Not supported (no password input) | Decrypt then run |
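
A pre-flight check against these caps might look like the sketch below. The cap values come from the table; everything else is an assumption for illustration: the tier keys `"pro"` and `"higher"`, the function names, the use of decimal megabytes, and treating the caps as inclusive.

```python
from dataclasses import dataclass

MB = 1000 * 1000  # assumption: the caps are stated in decimal megabytes


@dataclass(frozen=True)
class TierCap:
    max_bytes: int
    max_entries: int


# Values from the limits table; the tier keys are hypothetical labels.
CAPS = {
    "pro": TierCap(max_bytes=500 * MB, max_entries=50_000),
    "higher": TierCap(max_bytes=2000 * MB, max_entries=500_000),
}


def fits_tier(archive_bytes: int, entry_count: int, tier: str) -> bool:
    """True if the archive is within both the size and entry caps of a tier."""
    cap = CAPS[tier]
    return archive_bytes <= cap.max_bytes and entry_count <= cap.max_entries


def recommend(archive_bytes: int, entry_count: int) -> str:
    """Return the smallest tier that fits, or "cli" if none does."""
    for tier in ("pro", "higher"):
        if fits_tier(archive_bytes, entry_count, tier):
            return tier
    return "cli"
```

Note that either limit alone is enough to push an archive up a tier: a 100 MB archive with 60,000 entries does not fit Pro.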
Cookbook
Side-by-side of what each approach actually tells you about the same archive.
unzip -l shows two same-size files; are they identical?
Listing tools cannot answer this. Two 12,288-byte entries might be identical or completely different. The analyzer settles it with SHA-256.

    $ unzip -l bundle.zip | grep icon
        12288  2026-06-01 09:14   a/icon.png
        12288  2026-06-01 09:14   b/icon.png

    Same size != same content. Drop bundle.zip into the analyzer:

    duplicateGroups: 1
    groups[0].hash: "7d1e..."
    groups[0].count: 2
    -> they ARE byte-identical.

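
To see why a size match proves nothing, here is a small self-contained sketch. The three byte strings are made up for the example; they stand in for archive entries and have nothing to do with the real contents of bundle.zip.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Three hypothetical 12,288-byte "files".
a = bytes(12288)             # all zero bytes
b = bytes(12287) + b"\x01"   # same length, but the last byte differs
c = bytes(12288)             # a true copy of a

same_size = len(a) == len(b) == len(c)
a_equals_b = sha256_hex(a) == sha256_hex(b)
a_equals_c = sha256_hex(a) == sha256_hex(c)

print(f"same size: {same_size}")
print(f"a and b identical: {a_equals_b}")
print(f"a and c identical: {a_equals_c}")
```

A listing would show all three as 12288 bytes; only the digest comparison separates the real duplicate from the look-alike.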
The classic extract-then-fdupes pipeline
The traditional CLI flow writes every file to disk first, which is exactly what the browser tool avoids.

    # CLI: leaves a decompressed copy on disk
    $ mkdir tmp && unzip -q bundle.zip -d tmp
    $ fdupes -r tmp
    tmp/a/icon.png
    tmp/b/icon.png

    # Browser: same answer, no files written to disk
    drop bundle.zip -> analyzer -> 1 group, count 2

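
If you do take the extract-first route, one way to avoid leaving the copy behind is to extract into a temporary directory that is removed when the work is done. The following is a minimal sketch under stated assumptions: it is ZIP-only, the function name is hypothetical, and it hashes with SHA-256 rather than calling fdupes. The cleanup is an ordinary delete, so the decompressed data still touches the disk briefly.

```python
import hashlib
import tempfile
import zipfile
from collections import defaultdict
from pathlib import Path


def duplicates_via_temp_extract(archive_path: str) -> list[list[str]]:
    """Extract to a temp dir, group files by SHA-256, then clean up."""
    by_hash = defaultdict(list)
    # The directory and everything in it is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(tmp)
        for path in sorted(Path(tmp).rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append(path.relative_to(tmp).as_posix())
    # Keep only sets with at least two identical files.
    return sorted(paths for paths in by_hash.values() if len(paths) > 1)
```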
rdfind ranks nothing by savings; the analyzer does
rdfind prints an overall figure for how much space can be reclaimed, but its results.txt is a flat list: it does not rank duplicate sets by how much each one wastes, so you work that out yourself. The analyzer sorts groups by wasted bytes and gives a grand total.

    # rdfind: per-set savings are left to you to compute from results.txt
    $ rdfind -dryrun true tmp
    ... (flat list of duplicate sets)

    # Analyzer: ranked + totalled
    groups sorted by wastedBytes desc
    totalWastedHuman: "480.0 KB"

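
Ranking rdfind's duplicate sets takes only a little post-processing of results.txt. The sketch below assumes the space-separated column order that the file's own header comment describes (`duptype id depth size device inode priority name`) and that duplicates carry the negated id of their set's first occurrence. Check both against the output of your rdfind version before relying on it; the function name and returned keys are hypothetical.

```python
from collections import defaultdict


def rank_rdfind_sets(results_text: str) -> list[dict]:
    """Group results.txt rows into sets and sort them by wasted bytes."""
    sets = defaultdict(lambda: {"paths": [], "wastedBytes": 0})
    for line in results_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # Assumed layout; maxsplit keeps spaces inside the file name intact.
        duptype, set_id, _depth, size, _dev, _inode, _prio, name = line.split(" ", 7)
        group = sets[abs(int(set_id))]
        group["paths"].append(name)
        # The first occurrence is the keeper; every other row is waste.
        if duptype != "DUPTYPE_FIRST_OCCURRENCE":
            group["wastedBytes"] += int(size)
    return sorted(sets.values(), key=lambda g: g["wastedBytes"], reverse=True)
```

Summing `wastedBytes` over the returned list gives the grand total for the whole tree.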
Multi-format archive without chaining binaries
A .7z would need 7-Zip; a .rar would need unrar; tar.xz needs xz + tar. The analyzer reads them all via libarchive WASM in one drop.

    # CLI: different binary per format
    $ 7z l data.7z          # needs p7zip
    $ unrar l data.rar      # needs unrar
    $ tar -tJf data.tar.xz

    # Analyzer: one drop, any of these formats, hashes inside
    data.7z / data.rar / data.tar.xz -> duplicate report

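
The "one entry point, several formats" idea can be sketched with Python's standard library, which covers ZIP and compressed tar archives. This is not how the analyzer works internally (the page says it uses libarchive WASM), and 7z and RAR are out of scope here because the standard library has no reader for them. The function name is hypothetical.

```python
import hashlib
import tarfile
import zipfile
from typing import Iterator


def iter_entry_hashes(path: str) -> Iterator[tuple[str, str]]:
    """Yield (entry name, SHA-256 hex) for ZIP and tar.* archives."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, hashlib.sha256(zf.read(info)).hexdigest()
    elif tarfile.is_tarfile(path):
        # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
        with tarfile.open(path, "r:*") as tf:
            for member in tf:
                if member.isfile():
                    data = tf.extractfile(member).read()
                    yield member.name, hashlib.sha256(data).hexdigest()
    else:
        raise ValueError(f"unsupported archive format: {path}")
```

Because the digests are computed over entry contents, the same files packed as data.zip and data.tar.xz produce identical hashes.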
When the CLI is the right answer
For a 4 GB archive or a whole-disk dedup with automatic hardlinking on a cron schedule, the browser tool's tier caps and report-only nature make the CLI the better fit.

    # Beyond browser tier caps + needs in-place action:
    $ rdfind -makehardlinks true /data

Use the analyzer for interactive, sensitive, single-archive audits; use rdfind/fdupes for big, automated, in-place jobs.
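
For a scheduled job, it can help to wrap the rdfind call so that a dry run is the default and hardlinking must be requested explicitly. The sketch below only uses the `-dryrun`, `-makehardlinks`, and `-checksum` options; the wrapper functions and their parameters are hypothetical, and you should confirm the option names against the man page of your installed rdfind.

```python
import shlex
import subprocess

ALLOWED_CHECKSUMS = {"md5", "sha1", "sha256"}


def rdfind_command(root: str, *, apply: bool = False, checksum: str = "sha256") -> list[str]:
    """Build an rdfind argument list; dry run unless apply=True."""
    if checksum not in ALLOWED_CHECKSUMS:
        raise ValueError(f"unsupported checksum: {checksum}")
    cmd = ["rdfind", "-checksum", checksum]
    if apply:
        cmd += ["-makehardlinks", "true"]  # replaces duplicates with hardlinks
    else:
        cmd += ["-dryrun", "true"]  # report only, change nothing
    return cmd + [root]


def run_rdfind(root: str, *, apply: bool = False) -> None:
    """Log and run the command; raises CalledProcessError on failure."""
    cmd = rdfind_command(root, apply=apply)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

A cron entry would then call `run_rdfind("/data", apply=True)` only once the dry-run output has been reviewed.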
Edge cases and what actually happens
unzip -l / 7z l used as a dedup check
Misleading: Listing tools show names and sizes only. Two entries with the same size are not necessarily identical, and they will not group them. Only a content hash (what the analyzer does) is authoritative.
fdupes on an unextracted archive
Failed: fdupes/rdfind/jdupes compare files on disk; they cannot read inside an archive. You must extract first, which writes a decompressed copy. The analyzer reads the archive directly.
Encrypted archive
Rejected: The analyzer has no password input and returns an error on encrypted entries. A CLI flow can decrypt during extraction (with the password) and then run fdupes. For the browser path, decrypt first with multi-format-extractor.
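
You can tell in advance whether a ZIP will be rejected for this reason by checking each entry's encryption flag. This sketch is ZIP-only and relies on bit 0 of the general-purpose bit flag, which Python exposes as `ZipInfo.flag_bits`; other formats mark encryption differently. The helper names are hypothetical.

```python
import zipfile

ENCRYPTED_FLAG = 0x1  # bit 0 of the ZIP general-purpose bit flag


def is_encrypted(info: zipfile.ZipInfo) -> bool:
    """True if the entry's header marks it as encrypted."""
    return bool(info.flag_bits & ENCRYPTED_FLAG)


def encrypted_entries(zf: zipfile.ZipFile) -> list[str]:
    """Names of all entries in an open ZipFile that are flagged as encrypted."""
    return [info.filename for info in zf.infolist() if is_encrypted(info)]
```

An empty list from `encrypted_entries` means no entry is password-protected, so the archive should not fail on that account.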
Archive larger than the tier cap
Rejected: Pro tops out at 500 MB / 50,000 entries; higher tiers at 2 GB / 500,000. A multi-gigabyte archive belongs in a CLI pipeline, or split it first with archive-splitter.
You need automatic deletion
By design: The analyzer is report-only. fdupes/rdfind can delete or hardlink in place; the browser flow instead keeps wanted files via selective-extractor and re-packs.
Different hash algorithm required
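Fixed: The analyzer always uses SHA-256. CLI tools vary: rdfind lets you choose the checksum (MD5, SHA-1, or SHA-256), and fdupes confirms matches with a byte-by-byte compare. For content identity, SHA-256 is the strongest of these hash options, so this is rarely a real limitation.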
More duplicate groups than the slider cap
Truncated: The analyzer returns at most the slider value (max 500) of the highest-waste groups; a CLI tool returns every set. For a complete enumeration on a huge archive, the CLI is better.
Same-size, different-content files
Handled correctly: Because grouping is by SHA-256, the analyzer never falsely groups two same-size files that differ in content. This is the same correctness fdupes/rdfind give, and it avoids the trap that comparing sizes in unzip -l output falls into.
Frequently asked questions
Is the analyzer as accurate as fdupes?
For content identity, yes — both compare actual bytes. The analyzer uses SHA-256; fdupes does size-then-byte/MD5. Both will only group truly identical files.
Why not just use unzip -l and compare sizes?
Equal size does not mean equal content. Two different 12 KB files have the same size but different bytes. Only a hash, like the analyzer's SHA-256, tells you they match.
Does the analyzer write files to disk like extract-then-fdupes?
No. It hashes entries in browser memory. Nothing is decompressed to disk and nothing is uploaded.
Can the analyzer delete duplicates like rdfind?
No, it only reports. Use selective-extractor to keep just the files you want, or use rdfind/fdupes for in-place deletion/hardlinking.
Which reads more formats?
The analyzer reads ZIP, 7z, RAR, tar.*, bz2, xz, and ISO in one tool via libarchive WASM. The CLI usually needs a separate binary per format before you can run fdupes.
What about very large archives?
The analyzer caps at 500 MB / 50,000 entries on Pro (2 GB / 500,000 on higher tiers). For larger jobs, a CLI deduper running on disk is the better fit.
Can I run it in CI?
The analyzer is an interactive browser tool, not a CLI binary, so for CI you would script fdupes/rdfind on extracted files instead.
Is it slower than fdupes?
It depends. The analyzer hashes every entry (no size pre-filter), while fdupes skips unique sizes early. For small/medium archives the difference is negligible; for huge ones the CLI's size pre-pass is faster.
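
The size pre-pass mentioned above can be sketched as follows: entries whose size is unique cannot have a duplicate, so they are never read or hashed. This illustrates the general technique on a ZIP using sizes from the archive's directory; it is not fdupes's or the analyzer's actual code, and the function name is hypothetical.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def hash_only_size_collisions(zip_bytes: bytes) -> tuple[dict[str, list[str]], int]:
    """Return (hash -> duplicate paths, number of entries actually hashed)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Pass 1: bucket entries by uncompressed size, without reading data.
        by_size = defaultdict(list)
        for info in zf.infolist():
            if not info.is_dir():
                by_size[info.file_size].append(info)

        # Pass 2: hash only entries that share a size with another entry.
        by_hash = defaultdict(list)
        hashed = 0
        for infos in by_size.values():
            if len(infos) < 2:
                continue  # a unique size cannot be a duplicate
            for info in infos:
                digest = hashlib.sha256(zf.read(info)).hexdigest()
                by_hash[digest].append(info.filename)
                hashed += 1

    dupes = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return dupes, hashed
```

The saving grows with the share of entries that have a unique size, which is why the gap matters most on very large archives.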
Does it support encrypted archives like CLI tools?
No — there is no password input. Decrypt first (multi-format-extractor), then analyze.
Can I get every duplicate group, not just the top N?
Raise the Top-N slider to 500. If you genuinely have more than 500 groups, a CLI deduper that returns all sets is the better tool.
Which is more private?
The browser analyzer — it never uploads and never writes to disk. A CLI on a shared build server leaves extracted files behind unless you clean up.
Can it compare two separate archives?
Not this tool — it analyzes one archive. To diff two archives, use archive-diff.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.