How to identify the biggest contributors to Excel file bloat
- Step 1: Open the Workbook Weight Analyzer. It accepts .xlsx (and .xlsm) files only, not .csv, since a CSV has no internal ZIP structure to weigh. Free tier cannot run it; the analyzer is gated to Pro and above.
- Step 2: Drop in the oversized workbook. JSZip unzips the file in your browser. No data leaves your machine. Larger files take a moment because every internal entry is decompressed to measure its true byte length.
- Step 3: Read the header line first. It shows Total workbook size: X KB (compressed), Y KB (decompressed). If Y is many times X, the file is text-heavy XML that ZIP squeezes well; if Y is close to X, it is full of already-compressed media.
- Step 4: Scan the By category block. Categories are ranked largest-first with a KB figure and a percent of the decompressed total. The top line is your prime suspect, usually embedded_media, worksheet_data, or shared_strings.
- Step 5: Drill into the Top 10 largest files. This names the exact offending entry: a single xl/media/image1.png at 12 MB, or xl/worksheets/sheet3.xml at 30 MB (a sign of whole-column conditional formatting, not data).
- Step 6: Apply the matching fix, then re-run. Compress or delete images in Excel; strip macros with the VBA Macro Stripper; purge notes with the Comment & Note Purger. Save and re-upload to confirm the drop.
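The two header numbers from Step 3 can be reproduced outside the tool. What follows is a minimal Python sketch using the standard-library zipfile module, not the analyzer's own JSZip code. The names workbook_totals and make_demo_workbook are hypothetical, and the demo archive only imitates XLSX paths; it is not a valid workbook.

```python
import io
import zipfile


def workbook_totals(data: bytes) -> tuple[int, int]:
    """Return (compressed_bytes, decompressed_bytes) for a ZIP held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Decompress every entry and count its real byte length.
        decompressed = sum(len(zf.read(info)) for info in zf.infolist())
    return len(data), decompressed


def make_demo_workbook() -> bytes:
    """Build a small stand-in ZIP with XLSX-like paths (assumption: demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/worksheets/sheet1.xml", "<c><v>0</v></c>" * 20000)
        zf.writestr("xl/sharedStrings.xml", "<si><t>x</t></si>" * 100)
    return buf.getvalue()


compressed, decompressed = workbook_totals(make_demo_workbook())
print(f"Total workbook size: {compressed / 1024:.1f} KB (compressed), "
      f"{decompressed / 1024:.1f} KB (decompressed)")
```

Because the demo content is highly repetitive XML, the decompressed figure comes out many times larger than the compressed one, which is the text-heavy pattern Step 3 describes.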
The eleven categories the analyzer buckets your XLSX into
An .xlsx is a ZIP. The analyzer assigns every internal file to one bucket by matching a substring in its path. These are the exact category labels you see in the report, plus the path each matches.
| Category label | Matches path containing | What lives there |
|---|---|---|
| embedded_media | media/ (i.e. xl/media/) | Pasted/inserted images: PNG, JPEG, GIF, EMF stored at full resolution |
| worksheet_data | worksheets/ (xl/worksheets/sheet*.xml) | Cell values, formulas, AND conditional-formatting rules per sheet |
| shared_strings | sharedStrings | xl/sharedStrings.xml: the dedup table of every unique text string |
| styles | styles | xl/styles.xml: cell formats, fonts, fills, number formats, cellXfs |
| vba_macro | vbaProject | xl/vbaProject.bin: compiled VBA macro project (.xlsm only) |
| drawings | drawings | xl/drawings/*.xml: shape/image anchors and drawing geometry |
| charts | charts | xl/charts/*.xml: embedded chart definitions and cached series |
| external_links | externalLinks | xl/externalLinks/*: links to other workbooks and their cached values |
| data_connections | connections | xl/connections.xml: Power Query / external data connection defs |
| document_properties | docProps | docProps/core.xml, app.xml, custom.xml: title, author, named-range counts |
| other | anything else (fallback) | [Content_Types].xml, workbook.xml (named ranges live here), _rels/, theme |
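The substring matching in the table can be sketched in a few lines of Python. This is an illustration under assumptions, not the analyzer's source: the rule list copies the table's substrings and labels, while the first-match-wins order and the names CATEGORY_RULES and categorize are hypothetical.

```python
# (substring, label) pairs in table order. First match wins; that order is an assumption.
CATEGORY_RULES = [
    ("media/", "embedded_media"),
    ("worksheets/", "worksheet_data"),
    ("sharedStrings", "shared_strings"),
    ("styles", "styles"),
    ("vbaProject", "vba_macro"),
    ("drawings", "drawings"),
    ("charts", "charts"),
    ("externalLinks", "external_links"),
    ("connections", "data_connections"),
    ("docProps", "document_properties"),
]


def categorize(path: str) -> str:
    """Return the category label for one internal ZIP path."""
    for needle, label in CATEGORY_RULES:
        if needle in path:
            return label
    return "other"  # fallback bucket: workbook.xml, theme, _rels/, [Content_Types].xml


for sample in ("xl/media/image1.png", "xl/worksheets/sheet3.xml", "xl/workbook.xml"):
    print(sample, "->", categorize(sample))
```

Note that xl/workbook.xml falls through to other, which is why named ranges never get a bucket of their own.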
What the report actually prints (and what it does not)
The output is a plain-text diagnostic report — there is no download, no options panel, and the file is never modified.
| Report section | Shows | Note |
|---|---|---|
| Header line | Total workbook size: A KB (compressed), B KB (decompressed) | A is the on-disk ZIP size; B is the sum of every entry uncompressed |
| By category | Each non-empty category, its decompressed KB, and a % | The percentage is of the decompressed total, not the compressed file |
| Top 10 largest files | Exact internal path + decompressed KB for the 10 biggest entries | By individual file — e.g. xl/media/image1.png or xl/worksheets/sheet1.xml |
| findings (JSON) | Compressed + decompressed totals, per-category map, top 20 files | Powers any signed-in dashboard view; same numbers, machine-readable |
| Options | None — needsOptions: false, the option schema is empty | There is nothing to configure; upload is the only input |
| Output file | None — outputType: report | Diagnostic only; use a cleanup tool for the actual fix |
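To make the three printed sections concrete, here is a rough Python sketch that assembles a report of the same shape from a list of entries. It is not the tool's code: build_report is a hypothetical name, the indentation and spacing of the output are assumptions, and the demo sizes are invented.

```python
import math
from collections import defaultdict


def build_report(compressed_bytes: int, entries: list[tuple[str, str, int]]) -> str:
    """entries holds (internal_path, category_label, decompressed_bytes)."""
    total = sum(size for _, _, size in entries)
    by_category: dict[str, int] = defaultdict(int)
    for _, category, size in entries:
        by_category[category] += size

    lines = [
        f"Total workbook size: {compressed_bytes / 1024:.1f} KB (compressed), "
        f"{total / 1024:.1f} KB (decompressed)",
        "By category:",
    ]
    # Largest category first; percent is of the decompressed total.
    for category, size in sorted(by_category.items(), key=lambda kv: kv[1], reverse=True):
        pct = math.floor(size / total * 100 + 0.5) if total else 0  # half-up, like Math.round
        lines.append(f"  {category} {size / 1024:.1f} KB ({pct}%)")
    lines.append("Top 10 largest files:")
    for path, _, size in sorted(entries, key=lambda e: e[2], reverse=True)[:10]:
        lines.append(f"  {path} {size / 1024:.1f} KB")
    return "\n".join(lines)


demo = [
    ("xl/media/image1.png", "embedded_media", 8 * 1024 * 1024),
    ("xl/worksheets/sheet1.xml", "worksheet_data", 1536 * 1024),
    ("xl/styles.xml", "styles", 512 * 1024),
]
report = build_report(9 * 1024 * 1024, demo)
print(report)
```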
Tier limits for the Excel family (real numbers)
The analyzer runs entirely in your browser, but the upload size and row caps are enforced by tier. Weight Analyzer itself requires at least Pro.
| Tier | Max file size | Row cap | Files at once |
|---|---|---|---|
| Free | 5 MB | 10,000 rows | 1 (and Weight Analyzer is gated to Pro+) |
| Pro | 50 MB | 100,000 rows | 5 |
| Pro-media | 200 MB | 500,000 rows | 20 |
| Developer | 500 MB | unlimited rows | unlimited |
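The gating logic implied by the table can be expressed as a small lookup. The Python below is only a sketch: the limits are copied from the table, but TierLimit, check_upload, the tier keys, and the order of the checks are hypothetical, and the table does not say whether a MB is 10^6 or 2^20 bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimit:
    max_file_mb: int
    row_cap: Optional[int]        # None means unlimited
    files_at_once: Optional[int]  # None means unlimited


TIERS = {
    "free": TierLimit(5, 10_000, 1),
    "pro": TierLimit(50, 100_000, 5),
    "pro-media": TierLimit(200, 500_000, 20),
    "developer": TierLimit(500, None, None),
}
ANALYZER_TIERS = {"pro", "pro-media", "developer"}  # Weight Analyzer needs Pro or above


def check_upload(tier: str, file_mb: float) -> str:
    """Return 'ok' or the reason the Weight Analyzer would refuse the file."""
    if tier not in ANALYZER_TIERS:
        return "blocked: Pro required"
    if file_mb > TIERS[tier].max_file_mb:
        return "rejected: size limit"
    return "ok"


print(check_upload("free", 1))         # blocked: Pro required
print(check_upload("pro", 120))        # rejected: size limit
print(check_upload("pro-media", 120))  # ok
```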
Cookbook
Real Weight Analyzer reports (numbers rounded). The header line, category block, and top-10 list are exactly what the tool prints; sizes are decompressed KB unless the line says compressed.
The classic: one pasted screenshot is 80% of the file
A status-report workbook ballooned to 47 MB on disk. Three rows of summary numbers, but someone pasted a full-resolution dashboard screenshot into a header cell. The analyzer makes it obvious in two lines.
    Total workbook size: 47104.0 KB (compressed), 49210.4 KB (decompressed)
    By category:
      embedded_media 39680.2 KB (81%)
      worksheet_data 5120.6 KB (10%)
      styles 2210.1 KB (4%)
      shared_strings 980.4 KB (2%)
      other 1219.1 KB (2%)
    Top 10 largest files:
      xl/media/image1.png 38912.0 KB
      xl/worksheets/sheet1.xml 4980.2 KB
      xl/styles.xml 2210.1 KB

Fix: open in Excel -> select image -> Picture Format -> Compress Pictures -> 'E-mail (96 ppi)'. Re-run: ~7 MB.
Compressed vs decompressed: a 'small' file that is huge inside
A 6 MB workbook that crawled when opened. On disk it looks fine; decompressed it is 41 MB, almost all of it worksheet XML, because ZIP compresses repetitive XML extremely well. The gap between the two header numbers is the tell.

    Total workbook size: 6144.0 KB (compressed), 41984.7 KB (decompressed)
    By category:
      worksheet_data 38010.3 KB (91%)
      shared_strings 2560.8 KB (6%)
      styles 820.4 KB (2%)
      other 593.2 KB (1%)
    Top 10 largest files:
      xl/worksheets/sheet1.xml 37800.1 KB

Reading: 6 MB on disk, 41 MB once decompressed. The single sheet XML is the bloat: almost always whole-column conditional formatting or formatting on millions of empty cells, NOT the visible data.
Macro project hiding in an .xlsm
An .xlsm that should be tiny carried an inherited macro library copied from an old template. vba_macro only appears when vbaProject.bin exists, so seeing it at all is the signal.
    Total workbook size: 3328.0 KB (compressed), 9216.5 KB (decompressed)
    By category:
      vba_macro 6144.0 KB (67%)
      worksheet_data 2048.2 KB (22%)
      shared_strings 700.1 KB (8%)
      styles 324.2 KB (4%)
    Top 10 largest files:
      xl/vbaProject.bin 6144.0 KB

Fix: if you don't use the macros, run the VBA Macro Stripper. It also lets you save as .xlsx and drop vbaProject.bin entirely.
Death by a thousand cuts — no single dominant category
Sometimes nothing is 70%. A 22 MB merged-region-heavy workbook spread its weight across styles, shared strings, and many sheet XMLs. The breakdown tells you it is structural, not one bad object.
    Total workbook size: 22528.0 KB (compressed), 58400.9 KB (decompressed)
    By category:
      worksheet_data 24010.4 KB (41%)
      styles 15120.7 KB (26%)
      shared_strings 12880.3 KB (22%)
      other 6389.5 KB (11%)

Reading: huge styles.xml = thousands of distinct cell formats (cellXfs explosion). Standardise formatting, clear unused styles, and consolidate sheets. No single image or macro to delete here.
Before / after a cleanup pass
Run the analyzer, fix the top offender, run it again. The header line is your scoreboard. Here a 53 MB file dropped to 4 MB after image compression and a macro strip.
    BEFORE
    Total workbook size: 53248.0 KB (compressed), 55100.2 KB (decompressed)
      embedded_media 44032.0 KB (80%)
      vba_macro 6144.0 KB (11%)

    AFTER (compress images + strip VBA, save as .xlsx)
    Total workbook size: 4096.0 KB (compressed), 11200.4 KB (decompressed)
      worksheet_data 7168.3 KB (64%)
      shared_strings 2560.1 KB (23%)
      styles 1024.0 KB (9%)

Net: 53 MB -> 4 MB on disk.
Edge cases and what actually happens
CSV uploaded instead of XLSX
Rejected (wrong format): The analyzer accepts .xlsx/.xlsm only. A CSV is plain text with no internal ZIP, so there is nothing to weigh; categories like shared_strings and embedded_media simply do not exist in a CSV. The tool's acceptsFormats is ["xlsx"].
Free tier tries to run it
Blocked (Pro required): Weight Analyzer is gated to Pro and above (minTier: pro). On Free you'll be prompted to upgrade. The Pro Excel limits are 50 MB / 100,000 rows / 5 files; the row cap is irrelevant here since the tool reads file structure, not rows.
File over the tier size cap
Rejected (size limit): Upload is capped per tier: 50 MB on Pro, 200 MB on Pro-media, 500 MB on Developer. A 120 MB workbook on Pro is refused before analysis, and ironically those are exactly the files you want analysed. Upgrade or split the workbook first.
Decompressed total far exceeds the on-disk size
Expected: By design. The header shows both numbers. Repetitive worksheet XML compresses 5-10x, so a 6 MB file routinely decompresses to 40 MB+. That gap is normal and is itself diagnostic: a large gap means XML/text bloat, a small gap means already-compressed media.
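The gap can be turned into a single number by dividing the decompressed total by the compressed one. The Python sketch below applies that to two header lines from the Cookbook; expansion_ratio, read_gap, and the 2.0 cut-off are illustrative assumptions, not values the analyzer uses.

```python
def expansion_ratio(compressed_kb: float, decompressed_kb: float) -> float:
    """How many times larger the workbook is once unzipped."""
    return decompressed_kb / compressed_kb


def read_gap(ratio: float, threshold: float = 2.0) -> str:
    # threshold is a hypothetical cut-off chosen for this example only
    if ratio >= threshold:
        return "large gap: XML/text bloat"
    return "small gap: already-compressed media"


xml_heavy = expansion_ratio(6144.0, 41984.7)      # header of the 'small' 6 MB file
media_heavy = expansion_ratio(47104.0, 49210.4)   # header of the pasted-screenshot file
print(f"{xml_heavy:.2f}x -> {read_gap(xml_heavy)}")
print(f"{media_heavy:.2f}x -> {read_gap(media_heavy)}")
```

The first file expands roughly 6.8 times, inside the 5-10x range quoted above, while the screenshot-heavy file barely expands at all.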
Percentages add to 99% or 101%
Expected (rounding): Each category percent is Math.round(bytes / decompressedTotal * 100). Independent rounding can make the column sum to 99 or 101. The KB figures are the precise values; treat the percentages as a quick relative guide.
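A short Python sketch shows how independent rounding produces both outcomes. The byte counts are made up for the demonstration, and js_round is a hypothetical helper that imitates JavaScript's Math.round for non-negative values (Python's built-in round uses round-half-to-even, so it is not a drop-in match).

```python
import math


def js_round(x: float) -> int:
    """Round half up, matching JavaScript Math.round for non-negative input."""
    return math.floor(x + 0.5)


def category_percents(sizes: dict[str, int]) -> dict[str, int]:
    """Round each category's share of the total independently."""
    total = sum(sizes.values())
    return {name: js_round(size / total * 100) for name, size in sizes.items()}


# Three equal thirds: 33 + 33 + 33
thirds = category_percents({"worksheet_data": 100, "styles": 100, "shared_strings": 100})
# 37.5% + 37.5% + 25%: 38 + 38 + 25
skewed = category_percents({"worksheet_data": 3, "embedded_media": 3, "other": 2})

print(sum(thirds.values()), sum(skewed.values()))  # prints: 99 101
```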
Named ranges suspected of bloat
Counted under other: Named ranges live in xl/workbook.xml, which has no dedicated bucket, so they fall into other. The tool will not break out named ranges individually; workbook.xml is almost always tiny anyway, so this is rarely the real cause despite the myth.
The fix is needed, not just the diagnosis
By design (read-only): The analyzer never modifies or returns a file; it only reports. For the actual cleanup use the matching sibling: VBA Macro Stripper, Comment & Note Purger, or Hidden Sheet Destroyer.
Password-protected / encrypted workbook
Cannot read (encrypted): An encrypted .xlsx is wrapped in a compound (OLE/CFB) container, not a plain ZIP, so JSZip cannot open it and no breakdown is produced. Remove the open-password in Excel (File -> Info -> Protect Workbook -> Encrypt with Password, clear the password), save, then re-upload.
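The two container types can be told apart from the first few bytes of the file. The Python sketch below is an illustration, not something the analyzer is documented to do: sniff_container is a hypothetical name, while the signatures themselves are the standard ZIP local-file header and the CFB header magic.

```python
ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a normal ZIP / .xlsx
CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")    # OLE compound file header


def sniff_container(head: bytes) -> str:
    """Classify a file from its leading bytes."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(CFB_MAGIC):
        # Also matches legacy .xls, which uses the same container.
        return "cfb"
    return "unknown"


print(sniff_container(b"PK\x03\x04" + b"\x14\x00"))  # zip
print(sniff_container(CFB_MAGIC + b"\x00" * 8))      # cfb
print(sniff_container(b"name,amount\n"))             # unknown
```

A file with an .xlsx extension that reports cfb here is most likely password-encrypted and will need the password removed before it can be weighed.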
Strict OOXML / .xlsb binary workbook
Limited / unsupported: A .xlsb (binary) workbook stores sheets as .bin, not .xml, so worksheet_data matches nothing and most data lands in other. The category breakdown is far less useful for .xlsb. Re-save as .xlsx for a meaningful report.
Two sheets share one big sharedStrings table
Expected: sharedStrings is workbook-wide, not per sheet; there is one xl/sharedStrings.xml. So if two sheets both lean on long text, the analyzer attributes the whole table to shared_strings, not to a sheet. To find which sheet drives unique strings, inspect column content per sheet.
Frequently asked questions
What is the single most common cause of Excel bloat?
Embedded images. Excel stores pasted/inserted pictures at full resolution inside xl/media/, so a 12 MP screenshot adds ~8-12 MB. When embedded_media tops the breakdown, compress in Excel (Picture Format -> Compress Pictures) and re-run.
Does the Weight Analyzer change my file?
No. It is purely diagnostic — outputType is report. It unzips the workbook in memory, measures byte lengths, and prints a text report. There is no download and the original is untouched. For changes, use a dedicated cleanup tool.
Why does it show two different total sizes?
Compressed is the on-disk ZIP size (file.size); decompressed is the sum of every internal file's real byte length. XLSX is a ZIP, so the decompressed number is larger — often much larger for XML-heavy files. The gap tells you whether bloat is text or media.
Can it tell me which sheet is biggest?
Indirectly. The Top 10 list names files like xl/worksheets/sheet2.xml, and each sheet has its own XML, so a large sheet XML pinpoints that sheet. But shared strings and styles are workbook-wide, so a sheet's true footprint is split across buckets.
Does it work on CSV files?
No. CSV is flat text with no ZIP container, images, or shared-string table — there is nothing to break down. The analyzer accepts .xlsx/.xlsm only. For CSV size issues you're usually just looking at row count, which any text editor shows.
What tier do I need?
Pro or above. On Pro the Excel upload cap is 50 MB; Pro-media raises it to 200 MB and Developer to 500 MB. Choose the tier that fits the file you need to diagnose — large bloated files need the higher caps.
Is my data safe — does the file get uploaded?
Nothing is uploaded. JSZip runs in your browser and reads the ZIP locally. A confidential financial model or HR workbook never leaves your machine; the only thing recorded server-side (if you're signed in) is a processed-count for dashboard stats.
The breakdown shows everything in 'other' — what does that mean?
other is the fallback for anything that doesn't match a known path ([Content_Types].xml, workbook.xml, theme, _rels/). A large other usually means a .xlsb binary workbook or an unusual structure. Re-save as standard .xlsx for a sharper breakdown.
How accurate are the percentages?
They are rounded to whole numbers and computed against the decompressed total, so they may sum to 99 or 101. The KB figures beside them are exact — use those when you need precision; use the percentages to spot the dominant category at a glance.
It says embedded_media is huge but I deleted my images — why?
Deleting a picture in Excel removes the drawing anchor, but the image bytes in xl/media/ can linger until you fully save and, in stubborn cases, re-save. If embedded_media is still high after deletion, save, close, reopen, save again, then re-run the analyzer.
Can it open password-protected workbooks?
No. Encrypted .xlsx files are CFB/OLE containers, not plain ZIPs, so JSZip can't read them. Remove the encryption password in Excel first (File -> Info -> Protect Workbook -> Encrypt with Password, clear it), save, and re-upload.
After I find the bloat, what tool removes it?
Match the category to the tool: images -> Excel's Compress Pictures; vba_macro -> VBA Macro Stripper; comments/notes -> Comment & Note Purger; hidden sheets -> Hidden Sheet Destroyer; stale links -> External Link Auditor.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.