How to use the Batch Compression Report for automation & SRE teams
- Step 1: Stage the archives locally. Pull the backup or artefact ZIPs to your machine (from the backup vault, the artefact store, or a host you have access to). The tool reads from disk via the File API, so nothing transits the network during analysis.
- Step 2: Confirm you're on a Pro plan. Batch analysis is a Pro-minimum tool. SRE workstations on Free will see an upgrade prompt; Pro raises the per-file cap to 500 MB and allows 20 files per batch, Pro-media allows 2 GB and 100 files, and Developer removes the batch-count limit.
- Step 3: Drop the whole stack. Open /archive-tools/batch-compression-report and drop every archive at once. There are no options to tune: the report runs with fixed behaviour, which keeps audits reproducible.
- Step 4: Read the bloat signals. Sort the CSV by sizeBytes to find the largest backups, by ratioPercent to find the ones that barely compressed (often already-compressed payloads), and check largestEntry to see which single file dominates each archive.
- Step 5: Flag the exceptions. Filter for encrypted=true to verify encryption policy, and for format=error to find archives that failed to read. Those need a closer look (corruption, truncation, or a wrong format).
- Step 6: Attach the CSV to the audit record. Download batch-compression-report.csv and attach it to the ticket, runbook, or evidence store. It's plain CSV with a header and no JAD wrapper, so it ingests into Splunk, a sheet, or a pipeline as-is.
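The sorting and filtering in steps 4 and 5 can also be done in a few lines of script once the CSV is downloaded. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and it assumes the header and value formats match the sample rows on this page (a ratio such as 4.5%, a — placeholder for non-ZIP rows, and lowercase true/false).

```python
import csv
import io

def load_report(csv_text):
    """Parse the report CSV text into a list of dict rows keyed by the header."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def ratio_value(row):
    """Turn '4.5%' into 4.5; rows without a numeric ratio sort last."""
    text = row["ratioPercent"].strip().rstrip("%")
    try:
        return float(text)
    except ValueError:
        return float("inf")

def triage(rows):
    """Build the views described in steps 4 and 5."""
    return {
        # Largest backups first.
        "largest": sorted(rows, key=lambda r: int(r["sizeBytes"] or 0), reverse=True),
        # Lowest ratio first: the archives that barely compressed.
        "worst_ratio": sorted(rows, key=ratio_value),
        # Candidates for an encryption-policy exception.
        "unencrypted": [r for r in rows if r["encrypted"].strip().lower() != "true"],
        # Archives that failed to read.
        "errors": [r for r in rows if r["format"] == "error"],
    }
```

Typical use would be `triage(load_report(open("batch-compression-report.csv", encoding="utf-8").read()))`, then printing the first few rows of each view.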
What the report tells an SRE per archive
Mapping each CSV column to the operational question it answers during a backup or artefact audit.
| Column | Operational question it answers | Note for non-ZIP backups |
|---|---|---|
| format | Is this really a ZIP, or a renamed 7z/tar.gz? | Detected by magic bytes for all types |
| sizeBytes | How much storage does this backup consume? | Always populated |
| compressedSize | How much of the file is compressed payload? | 0 for non-ZIP containers |
| entryCount | How many files are inside? | 0 for non-ZIP containers |
| ratioPercent | Did this actually compress, or is it dead weight? | — for non-ZIP containers |
| largestEntry / largestEntrySize | Which file dominates the backup? | Empty for non-ZIP containers |
| encrypted | Is this backup password-protected? | false for non-ZIP containers |
| error | Did this archive fail to read? | Holds the failure reason |
Plan limits for a fleet-scale audit
Per-file caps and batch counts. For backups beyond 2 GB, use a host-side CLI loop; the browser report handles everything up to the cap.
| Plan | Per-file cap | Entries per archive | Files per batch |
|---|---|---|---|
| Free | Tool not available (Pro required) | — | — |
| Pro | 500 MB | 50,000 | 20 |
| Pro-media | 2 GB | 500,000 | 100 |
| Developer | 2 GB | 500,000 | Unlimited |
| Enterprise | Unlimited | Unlimited | Unlimited |
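Before dropping a large selection, it can help to check it against the table above. The sketch below is an illustration only: the `plan_batches` helper is hypothetical, and treating MB and GB as powers of 1024 is an assumption, since the page does not say which unit convention the caps use.

```python
import os

# Caps copied from the plan table. Treating MB/GB as powers of 1024 is an assumption.
MB = 1024 * 1024
PLAN_LIMITS = {
    "Pro": {"max_file_bytes": 500 * MB, "max_files": 20},
    "Pro-media": {"max_file_bytes": 2048 * MB, "max_files": 100},
    "Developer": {"max_file_bytes": 2048 * MB, "max_files": None},  # None = unlimited
}

def plan_batches(paths, plan, size_of=os.path.getsize):
    """Split paths into batches that fit the plan; return (batches, oversized)."""
    limits = PLAN_LIMITS[plan]
    fits, oversized = [], []
    for path in paths:
        if size_of(path) <= limits["max_file_bytes"]:
            fits.append(path)
        else:
            oversized.append(path)  # handle these host-side instead
    step = limits["max_files"] or max(len(fits), 1)
    batches = [fits[i:i + step] for i in range(0, len(fits), step)]
    return batches, oversized
```

Each returned batch is one drop into the browser report; anything in `oversized` is a candidate for the host-side CLI loop.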
SRE workflow pairings
Tools to chain before or after the batch report for a complete audit. All run in-browser, no upload.
| Need | Tool | When in the workflow |
|---|---|---|
| Tamper-evidence manifest | /archive-tools/checksum-generator | After — SHA over the audited files |
| Per-entry ratios in one archive | /archive-tools/compression-ratio-calculator | Drill into a flagged backup |
| By-extension bloat breakdown | /archive-tools/file-type-breakdown | Explain a large archive |
| Cipher type of an encrypted backup | /archive-tools/encrypted-archive-detector | When encrypted=true |
| Confirm a misnamed file's type | /archive-tools/auto-format-detector | When format surprises you |
Cookbook
Audit scenarios from real SRE and automation work, with the CSV rows they produce.
A week of nightly backup ZIPs
The bread-and-butter audit: pull seven nightly ZIPs, drop them, and read the trend. Stable sizes and ratios mean the backup job is healthy.
Dropped: backup-mon.zip … backup-sun.zip (7 files)
filename,format,sizeBytes,compressedSize,entryCount,ratioPercent,largestEntry,largestEntrySize,encrypted,error
backup-mon.zip,zip,512000000,489000000,1240,4.5%,db.dump,498073600,false,
backup-tue.zip,zip,513200000,490100000,1241,4.5%,db.dump,499122176,false,
...
Trend: ~512 MB nightly, ~4.5% ratio (db.dump is already compressed). A sudden jump in sizeBytes or drop in entryCount is your signal to investigate.
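The "sudden jump or drop" check can be made mechanical by comparing each night with the previous one. This is a rough sketch under stated assumptions: the rows are dicts keyed by the CSV header and already in chronological order, the `flag_anomalies` name is hypothetical, and the 20% thresholds are illustrative values, not anything the tool defines.

```python
def flag_anomalies(rows, size_jump=0.20, entry_drop=0.20):
    """Compare consecutive nights and return (filename, reason) findings."""
    findings = []
    for prev, cur in zip(rows, rows[1:]):
        prev_size, cur_size = int(prev["sizeBytes"]), int(cur["sizeBytes"])
        prev_n, cur_n = int(prev["entryCount"]), int(cur["entryCount"])
        # Relative growth in archive size versus the previous night.
        if prev_size and (cur_size - prev_size) / prev_size > size_jump:
            findings.append((cur["filename"], "sizeBytes jumped"))
        # Relative loss of entries versus the previous night.
        if prev_n and (prev_n - cur_n) / prev_n > entry_drop:
            findings.append((cur["filename"], "entryCount dropped"))
    return findings
```

With the Monday and Tuesday rows above the function returns no findings, which matches the "healthy job" reading.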
Build-artefact bloat hunt
CI keeps producing fat artefacts. The report shows which ones barely compress and what's inside them, so you know where to trim.
Dropped: artifact-1234.zip … artifact-1240.zip
artifact-1238.zip,zip,318767104,261000000,920,18.1%,node_modules.tar,201326592,false,
Reading it: largestEntry is a bundled node_modules tar eating 192 MB. That's the bloat. Follow up with /archive-tools/file-type-breakdown to confirm the extension mix before changing the build.
Encryption-policy verification across a fleet
Policy says all off-host backups must be encrypted. The encrypted column makes the audit a one-glance filter.
Filter batch-compression-report.csv for encrypted:
host-a-backup.zip,zip,...,true, ← compliant
host-b-backup.zip,zip,...,false, ← VIOLATION: unencrypted backup
host-c-backup.zip,zip,...,true,
host-b is the exception. Confirm the cipher on the compliant ones with /archive-tools/encrypted-archive-detector (ZipCrypto is not acceptable).
Mixed backup formats from different hosts
Some hosts emit ZIP, some emit tar.gz. The report identifies both; only the ZIPs get entry-level numbers.
host-a.zip,zip,209715200,180000000,640,14.2%,app.log,52428800,false,
host-b.tar.gz,gz,157286400,0,0,—,,0,false,
host-c.tar.gz,gz,167772160,0,0,—,,0,false,
Reading it: ZIP host gives full stats; tar.gz hosts give format+size only. For tar.gz internals, list them host-side with tar -tzvf in a loop.
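Where a scripted equivalent of the tar -tzvf loop is more convenient, Python's standard tarfile module can fill in the numbers the report leaves empty for tar.gz rows. This is a host-side sketch only; the `summarize_tar_gz` name is hypothetical, and counting regular files only is an assumption about what an entry count should mean.

```python
import tarfile

def summarize_tar_gz(path):
    """Return (entry_count, largest_name, largest_size), counting regular files only."""
    count, largest_name, largest_size = 0, "", 0
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:  # streams member headers without extracting anything
            if not member.isfile():
                continue
            count += 1
            if member.size > largest_size:
                largest_name, largest_size = member.name, member.size
    return count, largest_name, largest_size
```

Note that this decompresses the whole stream to walk the headers, so it is slower on large backups than reading a ZIP central directory.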
Audit with a tamper-evident manifest
Two-part audit artefact: the size/ratio report plus a checksum manifest, so the evidence is both descriptive and verifiable.
Step 1: Batch Compression Report → batch-compression-report.csv
Step 2: /archive-tools/checksum-generator over the same files →
SHA-256 manifest
Attach both to the audit ticket. The CSV says what each archive is;
the manifest proves the bytes didn't change between audit and storage.
Edge cases and what actually happens
tar.gz / 7z backups show no entry stats
By design: Per-entry numbers are read from the ZIP central directory. A .tar.gz or .7z backup is detected and named in the format column but shows 0 entries and a — ratio. The size and format are still useful for the audit; for internal detail of those formats, list them host-side with the matching CLI.
Corrupt backup pulled from a flaky host
error: A truncated or partially-transferred backup throws while reading and lands as an error row with the message; the rest of the audit still completes. An error row is itself a finding: it usually means the backup transfer failed or the archive is damaged. Try /archive-tools/corrupted-zip-repair to recover what's salvageable.
Backup larger than 2 GB
400 too large: Per-file caps are 500 MB (Pro) and 2 GB (Pro-media/Developer). A monolithic multi-GB backup exceeds the cap and can't be processed in-browser. For those, run zipinfo/tar -t on the host that holds the file; the browser tool is for the under-2 GB majority.
Free-tier SRE workstation
Pro required: Batch analysis requires a Pro plan; Free shows an upgrade prompt. Provision analyst accounts at Pro or above. Pro-media is worth it if your backups routinely exceed 500 MB, since it raises the per-file cap to 2 GB and the batch count to 100.
Encrypted backup, no password to hand
Supported: The report reads the central directory, not the encrypted payload, so it flags encrypted=true and still reports sizes and counts without any password. It never decrypts. This is exactly what you want for a policy audit: confirm encryption without holding the secret.
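The same idea can be reproduced host-side. In the ZIP format, bit 0 of each entry's general-purpose flag field marks the entry as encrypted, and that field is stored in the central directory. The sketch below uses Python's standard zipfile module to read it without a password; it is an illustration of the technique, not the tool's own code, and the `is_encrypted` name is hypothetical.

```python
import zipfile

def is_encrypted(path):
    """True if any entry has general-purpose flag bit 0 set in the central directory."""
    with zipfile.ZipFile(path) as archive:
        # infolist() comes from the central directory; no entry data is read or decrypted.
        return any(info.flag_bits & 0x1 for info in archive.infolist())
```

This reports only that encryption is in use. Telling AES from legacy ZipCrypto needs a closer look at the entry metadata, which is what the cipher follow-up step is for.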
Backups spread across many subfolders
Manual: The tool analyses the files you drop, not a directory tree. Gather the archives into one selection first, or use a host-side find to collect them. There is no recursive folder discovery in the browser report.
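A scripted stand-in for the host-side find could look like the sketch below. The suffix list and the `collect_archives` name are assumptions for illustration; because it matches on file names, a misnamed archive (for example a .bak that is really a ZIP) would be missed unless its suffix is added.

```python
from pathlib import Path

# Suffixes to collect; extend this tuple to match your backup naming (assumption).
ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".7z")

def collect_archives(root):
    """Walk root recursively and return archive paths, sorted for a stable audit order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.name.lower().endswith(ARCHIVE_SUFFIXES)
    )
```

The resulting list can be copied into one staging folder so the whole set can be selected and dropped at once.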
Thousands of archives in one audit
Partial: Batch counts are 20 (Pro) / 100 (Pro-media) / unlimited (Developer). For fleet-wide audits of thousands of archives, a server-side loop is the practical path; use the browser report for per-host spot checks and exception triage.
No programmatic API for scheduling
Not available: The batch report is an interactive page with no API, so it can't be wired into a nightly cron. For scheduled audits, script it host-side with fflate or zipinfo. The browser tool fits the human-in-the-loop triage moment, not the automated pipeline.
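As one possible shape for such a host-side script, here is a sketch in Python using the standard zipfile module in place of fflate or zipinfo. It is not the tool's implementation: the function names are hypothetical, the columns only approximate the report's, ratioPercent is left out because this page does not state how the tool computes it, and counting every central-directory entry (including directory entries) is an assumption.

```python
import csv
import os
import zipfile

COLUMNS = ["filename", "format", "sizeBytes", "compressedSize", "entryCount",
           "largestEntry", "largestEntrySize", "encrypted", "error"]

def report_row(path):
    """Build one CSV row for a ZIP; unreadable files become format=error rows."""
    row = dict.fromkeys(COLUMNS, "")
    row["filename"] = os.path.basename(path)
    row["sizeBytes"] = os.path.getsize(path)
    try:
        with zipfile.ZipFile(path) as archive:
            entries = archive.infolist()  # central directory only
    except (zipfile.BadZipFile, OSError) as exc:
        row["format"], row["error"] = "error", str(exc)
        return row
    largest = max(entries, key=lambda i: i.file_size, default=None)
    row.update(
        format="zip",
        compressedSize=sum(i.compress_size for i in entries),
        entryCount=len(entries),
        largestEntry=largest.filename if largest else "",
        largestEntrySize=largest.file_size if largest else 0,
        encrypted=str(any(i.flag_bits & 0x1 for i in entries)).lower(),
    )
    return row

def write_report(paths, out):
    """Write a header plus one row per path to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for path in paths:
        writer.writerow(report_row(path))
```

A cron job could call `write_report(paths, sys.stdout)` after importing `sys`, redirecting the output to a dated file. One bad archive produces an error row rather than stopping the run, mirroring the behaviour described for the browser report.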
ZIP64 backup (very large or 65,535+ entries)
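Partial: The parser reads the standard (non-ZIP64) EOCD fields, in which the entry count is 16 bits wide and the central directory size and offset are 32 bits wide. Backups whose entry count or sizes depend on the ZIP64 extension may report short counts or sizes. For backups in that range, verify host-side.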
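To decide which backups fall "in that range", one option is to inspect the end-of-central-directory (EOCD) record directly. In the ZIP format, an archive that relies on ZIP64 stores saturated values (0xFFFF or 0xFFFFFFFF) in the classic EOCD fields and places a ZIP64 locator record immediately before it. The sketch below checks for both; the `needs_zip64` name is hypothetical, and it takes the last EOCD signature in the file, which could misfire if an archive comment happens to contain the same four bytes.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator

def needs_zip64(path):
    """True if the archive shows saturated EOCD fields or a ZIP64 locator."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)
        size = fh.tell()
        # EOCD (22 bytes) + maximum comment (65535) + ZIP64 locator (20 bytes).
        tail_len = min(size, 22 + 65535 + 20)
        fh.seek(size - tail_len)
        tail = fh.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < 22:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + 22])
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    saturated = (total_entries == 0xFFFF
                 or cd_size == 0xFFFFFFFF
                 or cd_offset == 0xFFFFFFFF)
    has_locator = pos >= 20 and tail[pos - 20:pos - 16] == ZIP64_LOCATOR_SIG
    return saturated or has_locator
```

Archives for which this returns True are the ones worth cross-checking with a host-side listing before trusting the reported counts and sizes.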
Misnamed backup (e.g. .bak that's a ZIP)
Supported: Format detection ignores the extension and reads magic bytes, so a .bak that is really a ZIP is reported with full ZIP stats, and a .zip that's really a 7z is labelled 7z. The audit reflects what the file actually is, not what someone named it.
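Magic-byte detection is simple to reproduce for a quick host-side cross-check. The sketch below covers only the three formats this page mentions, using their well-known leading signatures; the `sniff_format` name and the "unknown" fallback label are assumptions, and the tool's own signature table may be broader.

```python
# (signature, label) pairs checked against the first bytes of the file.
MAGIC = [
    (b"PK\x03\x04", "zip"),           # ZIP local file header
    (b"PK\x05\x06", "zip"),           # empty ZIP: only an end-of-central-directory record
    (b"\x1f\x8b", "gz"),              # gzip, which includes .tar.gz
    (b"7z\xbc\xaf\x27\x1c", "7z"),    # 7-Zip
]

def sniff_format(path):
    """Return a format label based on the file's leading bytes, ignoring its name."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return "unknown"
```

Run against a .bak that is really a ZIP, this returns "zip" regardless of the extension.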
Frequently asked questions
Is this suitable for auditing backups across a server fleet?
Yes, for the per-host and exception-triage parts. Pull a host's backups locally and drop them — they're never uploaded, so the audited boundary doesn't move. For fleet-wide automation across thousands of archives, pair it with a host-side script; the browser report covers spot checks and one-off audits.
Will the backup contents be uploaded anywhere?
No. Analysis runs entirely in your browser via WebAssembly. Backup bytes never reach a server — verify with DevTools → Network. This keeps the workflow inside the same data-handling boundary as running a local CLI.
How do I find which backups aren't compressing well?
Sort the CSV by ratioPercent ascending. Archives near 0% are usually full of already-compressed data (gzipped dumps, media, nested archives), so re-zipping won't help. Those are candidates for a different storage strategy rather than more compression.
Can I verify encryption policy with this?
Yes. Filter the CSV on the encrypted column: rows with encrypted=true are compliant, and any false among archives that should be protected is a policy violation. To confirm the cipher is acceptable (AES, not legacy ZipCrypto), follow up with /archive-tools/encrypted-archive-detector.
What happens if a backup is corrupt?
It gets a row with format=error and the reason in the error column, then the audit continues. An error row is a finding in itself — often a failed transfer. Send the file to /archive-tools/corrupted-zip-repair to see what can be recovered.
Does it work on tar.gz backups?
It detects and names them (format=gz) and reports sizeBytes, but entry stats come only from ZIP central directories, so tar.gz rows show 0 entries and a — ratio. For internal detail, list them host-side with tar -tzvf.
Is there a size limit per backup?
Yes — 500 MB on Pro, 2 GB on Pro-media and Developer, per file. Beyond 2 GB, analyse on the host. The total across a batch is bounded by browser memory; counts are 20 / 100 / unlimited by plan.
Can multiple analysts run audits at once?
Yes — each browser tab is an independent instance with no shared state. Plan limits apply per account/session. There's nothing to contend over.
Does it meet regulated-environment requirements (HIPAA/PCI/etc.)?
Because nothing transits the network, using the tool doesn't move your regulated boundary — it's equivalent to running a local CLI. Confirm with your compliance team that local browser processing matches your policy; most treat it the same as a local utility.
Can I schedule this as a nightly audit?
Not via the tool itself — there's no API, so it can't be cron-driven. For scheduled audits, script it host-side (fflate in Node, or zipinfo/7z l loops). The browser report is for interactive triage and ad-hoc audits.
What output do I attach to a ticket?
The downloaded batch-compression-report.csv — plain CSV with a header row, no JAD-specific wrapper. It opens in Excel/Sheets and ingests into Splunk or any pipeline. For tamper-evidence, also attach a SHA manifest from /archive-tools/checksum-generator.
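If the manifest also needs to be reproducible host-side, for example to re-verify the stored archives later, a SHA-256 manifest is straightforward to generate with Python's standard hashlib. This is a sketch under assumptions: the helper names are hypothetical, and the "digest, two spaces, filename" layout is the one the sha256sum utility uses, which may differ from what /archive-tools/checksum-generator produces.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-hundred-MB backups don't load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest_lines(paths):
    """One '<hex digest>  <filename>' line per file, in sorted path order."""
    return [f"{sha256_of(p)}  {os.path.basename(p)}" for p in sorted(paths)]
```

Writing `"\n".join(manifest_lines(paths))` to a file next to the CSV gives a second artefact that can be re-checked against the stored archives.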
How is this different from an enterprise backup tool?
Enterprise platforms add SSO, retention policy, and centralised storage. This is a focused, privacy-first audit utility for the analyst's moment: drop archives, get a CSV, no upload. Many teams use both — the platform for storage, this for quick audits and exception checks.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.