Excel sharedStrings.xml too large: how to find and fix it
- Step 1: Run the Weight Analyzer on the workbook. Upload the .xlsx. The analyzer reads every ZIP entry, including xl/sharedStrings.xml, and reports its decompressed size both in the shared_strings category line and in the Top 10 file list.
- Step 2: Confirm that shared_strings is the dominant category. If shared_strings is at or near the top with a high percentage, the string table is your problem. If it's small, your bloat is elsewhere: check worksheet_data, styles, or embedded_media instead.
- Step 3: Find the high-cardinality text columns. The table grows with UNIQUE strings. Look for columns of long descriptions, URLs, GUIDs/UUIDs, free-text comments, or IDs stored as text: anything where almost every value differs.
- Step 4: Convert numeric-looking text IDs to real numbers. If a column holds digit-only IDs stored as text (left-aligned, green-triangle warning), converting them to numbers moves them out of the string table entirely, because numbers are stored inline. Only do this for true integers, since leading zeros are lost. Use the Format Inspector to find them.
- Step 5: Move verbose reference text to a lookup file. Long descriptions or URLs needed only for reference can live in a separate workbook, leaving the working file with short codes. Fewer unique strings mean a smaller table.
- Step 6: Save and re-run to confirm the shrink. Excel only rewrites sharedStrings.xml on save. After converting or trimming, save, then re-upload to the analyzer; the shared_strings line should drop.
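Step 3 can also be approximated outside Excel once the column values are in hand. The sketch below is a hypothetical helper (not part of the Weight Analyzer) that reports, per column, the ratio of distinct text values to text cells; only str values count, because numbers, dates and booleans never reach the string table. The 0.9 cut-off for "high cardinality" is an arbitrary assumption.

```python
def text_cardinality(columns: dict[str, list[object]]) -> dict[str, float]:
    """Distinct/total ratio of text values in each column (hypothetical helper)."""
    ratios = {}
    for name, values in columns.items():
        # Only strings would be written to xl/sharedStrings.xml.
        texts = [v for v in values if isinstance(v, str)]
        ratios[name] = len(set(texts)) / len(texts) if texts else 0.0
    return ratios


def high_cardinality(columns: dict[str, list[object]], threshold: float = 0.9) -> list[str]:
    """Column names whose text is almost entirely unique."""
    return sorted(n for n, r in text_cardinality(columns).items() if r >= threshold)


if __name__ == "__main__":
    sample = {
        "status": ["Open", "Closed", "Open", "Open"],
        "ticket_guid": ["a1", "b2", "c3", "d4"],
        "amount": [10, 20, 30, 40],
    }
    print(text_cardinality(sample))  # {'status': 0.5, 'ticket_guid': 1.0, 'amount': 0.0}
    print(high_cardinality(sample))  # ['ticket_guid']
```

A column near 1.0 (like ticket_guid) is a candidate for Steps 4 and 5; a low ratio (like status) is already cheap.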
What goes into sharedStrings.xml — and what never does
The shared string table holds only text. A value's storage type, not how it looks on screen, decides whether it adds to shared_strings. This is why converting text IDs to numbers shrinks the table.
| Cell value type | Stored where | Adds to shared_strings? |
|---|---|---|
| Text ("Approved", descriptions, URLs) | xl/sharedStrings.xml (one entry per UNIQUE string) | Yes |
| Repeated text (same value in many cells) | One table entry; each cell stores an index | Once per unique value (efficient) |
| Unique text (GUIDs, free-text, long notes) | Near one entry per cell | Yes, heavily (worst case) |
| Numbers (42, 3.14) | Inline in the sheet XML | No |
| Dates / times (serial numbers) | Inline in the sheet XML (as numbers) | No |
| Booleans (TRUE/FALSE) | Inline in the sheet XML | No |
| Formula results | Cached inline in the sheet XML | No (the formula text isn't in sharedStrings) |
| Errors (#N/A, #REF!) | Inline in the sheet XML | No |
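In the raw sheet XML, the storage decision is visible in each cell's t attribute (SpreadsheetML cell types: s = shared string, b = boolean, e = error, str = formula string, inlineStr = inline string; no t means a number). The sketch below parses a worksheet part and reports where each value lives. The sample XML is hand-written for illustration, not taken from a real workbook.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Where a cell's value lives, keyed by the <c t="..."> attribute.
STORAGE = {
    "s": "shared string index (xl/sharedStrings.xml)",
    "n": "inline number",
    "b": "inline boolean",
    "e": "inline error",
    "str": "inline cached formula string",
    "inlineStr": "inline string (no shared table)",
}


def classify_cells(sheet_xml: str) -> dict[str, str]:
    """Map each cell reference (A1, B1, ...) to where its value is stored."""
    result = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{{{MAIN}}}c"):
        # A missing t attribute means "n": numbers, including date serials.
        result[cell.get("r")] = STORAGE.get(cell.get("t", "n"), "unknown")
    return result


# C1 carries style 1 (assumed here to be a date format): it is still just a number.
SAMPLE = f"""<worksheet xmlns="{MAIN}"><sheetData><row r="1">
<c r="A1" t="s"><v>0</v></c>
<c r="B1"><v>42</v></c>
<c r="C1" s="1"><v>45292</v></c>
<c r="D1" t="b"><v>1</v></c>
<c r="E1" t="e"><v>#N/A</v></c>
<c r="F1" t="str"><f>UPPER("x")</f><v>X</v></c>
</row></sheetData></worksheet>"""

if __name__ == "__main__":
    for ref, where in classify_cells(SAMPLE).items():
        print(ref, "->", where)
```

Only A1 points into the shared table; every other row of the table above corresponds to an inline value.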
The eleven categories the analyzer buckets your XLSX into
An .xlsx is a ZIP. The analyzer assigns every internal file to one bucket by matching a substring in its path. These are the exact category labels you see in the report, plus the path each matches.
| Category label | Matches path containing | What lives there |
|---|---|---|
| embedded_media | media/ (i.e. xl/media/) | Pasted/inserted images: PNG, JPEG, GIF, EMF stored at full resolution |
| worksheet_data | worksheets/ (xl/worksheets/sheet*.xml) | Cell values, formulas, AND conditional-formatting rules per sheet |
| shared_strings | sharedStrings | xl/sharedStrings.xml, the dedup table of every unique text string |
| styles | styles | xl/styles.xml: cell formats, fonts, fills, number formats, cellXfs |
| vba_macro | vbaProject | xl/vbaProject.bin: compiled VBA macro project (.xlsm only) |
| drawings | drawings | xl/drawings/*.xml: shape/image anchors and drawing geometry |
| charts | charts | xl/charts/*.xml: embedded chart definitions and cached series |
| external_links | externalLinks | xl/externalLinks/*: links to other workbooks and their cached values |
| data_connections | connections | xl/connections.xml: Power Query / external data connection definitions |
| document_properties | docProps | docProps/core.xml, app.xml, custom.xml: title, author, named-range counts |
| other | anything else (fallback) | [Content_Types].xml, workbook.xml (named ranges live here), _rels/, theme |
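The same bucketing can be reproduced with Python's standard zipfile module. This is a minimal sketch, not the analyzer's code (which runs JSZip in the browser): the order in which substrings are tested is an assumption, and sizes are each entry's decompressed file_size in bytes.

```python
import io
import zipfile

# (label, substring) pairs, tested in this order; the order is an assumption.
CATEGORIES = [
    ("embedded_media", "media/"),
    ("worksheet_data", "worksheets/"),
    ("shared_strings", "sharedStrings"),
    ("styles", "styles"),
    ("vba_macro", "vbaProject"),
    ("drawings", "drawings"),
    ("charts", "charts"),
    ("external_links", "externalLinks"),
    ("data_connections", "connections"),
    ("document_properties", "docProps"),
]


def bucket_sizes(xlsx_bytes: bytes) -> dict[str, int]:
    """Sum decompressed sizes per category; unmatched paths go to 'other'."""
    totals: dict[str, int] = {}
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        for info in zf.infolist():
            label = next((lab for lab, sub in CATEGORIES if sub in info.filename), "other")
            totals[label] = totals.get(label, 0) + info.file_size
    return totals


def demo() -> dict[str, int]:
    """Build a tiny fake package in memory and bucket it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/sharedStrings.xml", "x" * 1000)
        zf.writestr("xl/worksheets/sheet1.xml", "y" * 300)
        zf.writestr("xl/styles.xml", "z" * 50)
        zf.writestr("[Content_Types].xml", "c" * 10)
    return bucket_sizes(buf.getvalue())


if __name__ == "__main__":
    print(demo())
```

For a real workbook, pass the file's bytes, e.g. bucket_sizes(open(path, "rb").read()) with your own path.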
Tier limits for the Excel family (real numbers)
The analyzer runs entirely in your browser, but the upload size and row caps are enforced by tier. Weight Analyzer itself requires at least Pro.
| Tier | Max file size | Row cap | Files at once |
|---|---|---|---|
| Free | 5 MB | 10,000 rows | 1 (and Weight Analyzer is gated to Pro+) |
| Pro | 50 MB | 100,000 rows | 5 |
| Pro-media | 200 MB | 500,000 rows | 20 |
| Developer | 500 MB | unlimited rows | unlimited |
Cookbook
Each example below shows Weight Analyzer output focused on the shared_strings line and the xl/sharedStrings.xml entry. Sizes are decompressed KB unless labelled compressed. The fix notes show the levers that actually move that number.
GUIDs stored as text dominate the table
An export with a UUID per row turned every value in the ID column into a unique string: 38,000 rows meant 38,000 unique GUIDs, and shared_strings ate the file.
Total workbook size: 9216.0 KB (compressed), 31744.6 KB (decompressed)
By category:
shared_strings 27136.2 KB (85%)
worksheet_data 3072.1 KB (10%)
styles 980.4 KB (3%)
other 555.9 KB (2%)
Top 10 largest files:
xl/sharedStrings.xml 27136.2 KB
Reading: ~38k unique GUID strings. If the GUID is only a join key you don't display, drop it or keep it in a lookup file -> table collapses.
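One way to apply the lookup-file idea to a GUID join key is to replace each GUID with a small integer and keep the GUID-to-integer mapping in a separate file. Because the working column then holds real numbers, those values are stored inline instead of in the string table. Integer surrogate keys are an assumption of this hypothetical sketch, not something the analyzer does, and they only help if nothing downstream needs the GUID itself.

```python
import uuid


def to_surrogate_keys(guids: list[str]) -> tuple[list[int], dict[str, int]]:
    """Replace each GUID with a small integer key.

    Returns (keys, lookup): keys go in the working workbook as real numbers,
    lookup (GUID -> key) goes in a separate reference file.
    """
    lookup: dict[str, int] = {}
    keys: list[int] = []
    for guid in guids:
        # First sighting gets the next integer; repeats reuse the existing key.
        keys.append(lookup.setdefault(guid, len(lookup) + 1))
    return keys, lookup


if __name__ == "__main__":
    ids = [str(uuid.uuid4()) for _ in range(3)]
    keys, lookup = to_surrogate_keys(ids + ids[:1])
    print(keys)  # [1, 2, 3, 1]
```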
Numeric IDs stored as text — convert them out
A 12-digit order-number column was stored as text (to preserve leading zeros). Stored as text it lives in the string table; converted to numbers it moves inline. The trade-off: you lose leading zeros, so only convert true integers.
BEFORE (IDs as text)
shared_strings 14336.0 KB (61%)
xl/sharedStrings.xml 14336.0 KB
AFTER (column converted to real numbers via paste-special
multiply-by-1, then re-save)
shared_strings 3584.0 KB (28%)
worksheet_data 6144.0 KB (49%) <- numbers now inline
Caveat: '00451' becomes 451. If leading zeros matter,
keep as text and accept the table cost.
Free-text notes column is the whole problem
A support log had a multi-sentence Notes column. Every note differs, so each is a unique string. The analyzer pins it on shared_strings, ruling out images and macros.
Total workbook size: 5120.0 KB (compressed), 18432.7 KB (decompressed)
By category:
shared_strings 15360.3 KB (83%)
worksheet_data 2048.1 KB (11%)
styles 1024.3 KB (6%)
Top 10 largest files:
xl/sharedStrings.xml 15360.3 KB
Fix: archive resolved tickets to a separate file; the active workbook keeps only open notes.
Repeated text is cheap — proving the table works
A 200,000-row status sheet with only six distinct status words. Despite the row count, shared_strings is tiny — because the table stores each unique word once. This is the analyzer confirming you DON'T have a string problem.
Total workbook size: 7168.0 KB (compressed), 44032.0 KB (decompressed)
By category:
worksheet_data 42000.5 KB (95%)
shared_strings 780.2 KB (2%) <- only 6 unique strings
styles 820.1 KB (2%)
other 431.2 KB (1%)
Reading: bloat is the sheet XML (200k rows of indices), not the string table. Don't touch sharedStrings here.
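The mechanism behind this result fits in a few lines: every distinct string is stored once, and each text cell keeps only a zero-based index into the table (the number a t="s" cell holds in its v element). This is a simplified model of the table, not Excel's writer, and the six status words are made up.

```python
def build_shared_strings(cells: list[str]) -> tuple[list[str], list[int]]:
    """Model the shared string table: unique strings plus one index per cell."""
    table: list[str] = []
    position: dict[str, int] = {}
    indices: list[int] = []
    for text in cells:
        if text not in position:
            # First occurrence: append to the table and remember its index.
            position[text] = len(table)
            table.append(text)
        indices.append(position[text])
    return table, indices


if __name__ == "__main__":
    statuses = ["Open", "Closed", "Pending", "Escalated", "On hold", "Cancelled"]
    cells = [statuses[i % 6] for i in range(200_000)]
    table, indices = build_shared_strings(cells)
    print(len(table), len(indices))  # 6 200000
```

The table stays at six entries; the 200,000 indices are what end up in the sheet XML, which is why worksheet_data carries the weight here.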
Orphaned strings linger after deleting rows
Deleting rows doesn't immediately purge their entries from the table. The analyzer can show a sharedStrings.xml larger than the live data implies until a clean re-save compacts it.
After deleting 50k rows but BEFORE a full save+reopen:
shared_strings 11264.0 KB (still high)
xl/sharedStrings.xml 11264.0 KB
After Save -> Close -> Reopen -> Save -> re-run:
shared_strings 3072.0 KB
Lesson: Excel compacts the table on a clean re-save, not on delete. Always re-save before judging the result.
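To check whether a sharedStrings.xml still carries orphaned entries, one can compare the number of si entries with the indices that t="s" cells actually reference. A minimal sketch, assuming the XML parts have already been read out of the ZIP (for example with zipfile); it only inspects worksheet cells, so treat the result as an estimate.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def orphaned_indices(sst_xml: str, sheet_xmls: list[str]) -> list[int]:
    """Indices of <si> entries that no shared-string cell in the sheets points to."""
    total = len(ET.fromstring(sst_xml).findall(f"{{{MAIN}}}si"))
    used = set()
    for sheet_xml in sheet_xmls:
        for cell in ET.fromstring(sheet_xml).iter(f"{{{MAIN}}}c"):
            v = cell.find(f"{{{MAIN}}}v")
            if cell.get("t") == "s" and v is not None and v.text:
                used.add(int(v.text))
    return sorted(set(range(total)) - used)


# Entry 1 ("deleted row") is no longer referenced by any cell.
SST = (f'<sst xmlns="{MAIN}"><si><t>kept</t></si>'
       '<si><t>deleted row</t></si><si><t>also kept</t></si></sst>')
SHEET = (f'<worksheet xmlns="{MAIN}"><sheetData><row r="1">'
         '<c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>2</v></c>'
         '</row></sheetData></worksheet>')

if __name__ == "__main__":
    print(orphaned_indices(SST, [SHEET]))  # [1]
```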
Edge cases and what actually happens
shared_strings is small despite huge row count
Expected: If text repeats (status codes, categories), the table stores each unique value once, so a 200k-row sheet can have a tiny shared_strings figure. The bloat is then in worksheet_data (the per-cell indices), not in the table, and the analyzer attributes it there correctly.
Numeric-looking IDs counted as strings
Expected: Digits stored as text (to keep leading zeros) live in sharedStrings.xml, not inline, which is why they inflate shared_strings. Converting them to real numbers removes them from the table but destroys leading zeros, and Excel keeps only 15 significant digits, so only do it for genuine integers of 15 digits or fewer.
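Before converting a column, each value can be checked against the two ways conversion loses information: leading zeros, and Excel's limit of 15 significant digits for numbers. A hypothetical checker, assuming the IDs are plain ASCII digit strings:

```python
def safe_to_convert(text_id: str) -> bool:
    """True if turning this text ID into a number would lose nothing."""
    return (
        text_id.isascii()
        and text_id.isdigit()  # digits only: no spaces, signs, dashes or decimal points
        and not (len(text_id) > 1 and text_id.startswith("0"))  # leading zeros would vanish
        and len(text_id) <= 15  # Excel keeps at most 15 significant digits
    )


def column_safe_to_convert(values: list[str]) -> bool:
    """A column is safe only if every value is."""
    return all(safe_to_convert(v) for v in values)


if __name__ == "__main__":
    print(column_safe_to_convert(["451", "123456789012"]))  # True
    print(column_safe_to_convert(["451", "00451"]))         # False
```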
Deleted rows still in the string table
Lingers until re-save: Excel doesn't purge a string entry the instant you delete its last referencing cell. The analyzer may show a stale, oversized sharedStrings.xml until you Save -> Close -> Reopen -> Save, which compacts the table.
Dates appear to be in the table but aren't
Counted as numbers: Dates and times are serial numbers stored inline in the sheet XML, never in sharedStrings.xml. If a 'date' column inflates shared_strings, it's stored as text, not as real dates. Fix the cell type and the weight moves to worksheet_data.
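A real date in the sheet XML is just a number; the cell's number format makes it display as a date. Under the default 1900 date system (not the 1904 system some Mac workbooks use), a serial maps to a calendar date as in this sketch, which can help confirm whether a "date" column holds real serials or text:

```python
from datetime import date, timedelta

# 1900 date system: from 1900-03-01 onward, serial N is N days after 1899-12-30.
# Excel counts a non-existent 1900-02-29 (serial 60), so serials below 61 don't fit.
EPOCH_1900 = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    if serial < 61:
        raise ValueError("serials below 61 are affected by Excel's 1900 leap-year quirk")
    return EPOCH_1900 + timedelta(days=serial)


def date_to_serial(d: date) -> int:
    if d < date(1900, 3, 1):
        raise ValueError("dates before 1900-03-01 are affected by the leap-year quirk")
    return (d - EPOCH_1900).days


if __name__ == "__main__":
    print(date_to_serial(date(2024, 1, 1)))  # 45292
    print(serial_to_date(45292))             # 2024-01-01
```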
Two sheets, one shared table
Workbook-wide: There is exactly one xl/sharedStrings.xml for the whole workbook. The analyzer reports it under shared_strings as a single number and cannot split the table by sheet. To find which sheet drives unique text, inspect each sheet's text columns manually.
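If you'd rather script the manual inspection, a rough per-sheet attribution is possible outside Excel: read xl/sharedStrings.xml and each xl/worksheets/sheetN.xml (for example with zipfile), collect the shared-string indices each sheet references, and sum the characters of those strings. This hypothetical helper makes simplifying assumptions: a string used on several sheets counts toward each of them, characters stand in for bytes, and results are keyed by part name rather than tab name.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sst_texts(sst_xml: str) -> list[str]:
    """Plain text of each <si> entry; rich-text runs are joined together."""
    root = ET.fromstring(sst_xml)
    return ["".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
            for si in root.findall(f"{{{MAIN}}}si")]


def chars_by_sheet(sst_xml: str, sheets: dict[str, str]) -> dict[str, int]:
    """Characters of the distinct shared strings each sheet references."""
    texts = sst_texts(sst_xml)
    result = {}
    for name, sheet_xml in sheets.items():
        used = set()
        for cell in ET.fromstring(sheet_xml).iter(f"{{{MAIN}}}c"):
            v = cell.find(f"{{{MAIN}}}v")
            if cell.get("t") == "s" and v is not None and v.text:
                used.add(int(v.text))
        result[name] = sum(len(texts[i]) for i in used)
    return result


SST = (f'<sst xmlns="{MAIN}"><si><t>Open</t></si>'
       '<si><t>A long free-text note</t></si></sst>')
SHEETS = {
    "xl/worksheets/sheet1.xml": f'<worksheet xmlns="{MAIN}"><sheetData><row r="1">'
    '<c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>0</v></c></row></sheetData></worksheet>',
    "xl/worksheets/sheet2.xml": f'<worksheet xmlns="{MAIN}"><sheetData><row r="1">'
    '<c r="A1" t="s"><v>1</v></c><c r="B1" t="s"><v>0</v></c></row></sheetData></worksheet>',
}

if __name__ == "__main__":
    print(chars_by_sheet(SST, SHEETS))
```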
shared_strings small but file still huge
Look elsewhere: The string table isn't always the villain. If shared_strings is a low percentage, the analyzer is telling you the bloat is in embedded_media, styles, or vba_macro. Follow the top category, not the assumption.
inlineStr cells (rare exports)
Counted under worksheet_data: Some non-Excel generators write text as inline strings inside the sheet XML instead of using the shared table. In that case sharedStrings.xml is small or absent and the text weight shows up in worksheet_data. The analyzer reports where the bytes physically are.
Whitespace/encoding inflates unique-string count
Avoidable: Trailing spaces or non-breaking-space (NBSP) differences make otherwise-identical text count as distinct strings, bloating the table. Normalising whitespace first reduces unique entries. (For exported data, whitespace trimming is handled by the CSV Whitespace Trimmer.)
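Normalisation has to happen before the values are written back to the workbook. A minimal sketch, assuming the text is already available as Python strings: it applies Unicode NFC normalisation (so composed and decomposed accents compare equal), turns NBSP into an ordinary space, collapses whitespace runs and trims the ends. Collapsing internal runs is a judgment call; skip it if double spaces are meaningful in your data.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """NFC-normalise, turn NBSP into a space, collapse whitespace runs, trim."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\u00a0", " ")
    return re.sub(r"\s+", " ", text).strip()


def unique_counts(values: list[str]) -> tuple[int, int]:
    """Distinct values before and after normalising."""
    return len(set(values)), len({normalise(v) for v in values})


if __name__ == "__main__":
    statuses = ["Approved", "Approved ", "Approved\u00a0", " Approved", "Rejected"]
    print(unique_counts(statuses))  # (5, 2)
```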
Encrypted workbook
Cannot read (encrypted): An encrypted .xlsx is an OLE/CFB container, not a plain ZIP, so JSZip can't open it and there's no sharedStrings measurement. Remove the open password in Excel and re-upload.
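The two containers can be told apart by their first bytes: a ZIP package starts with the local-file-header signature PK\x03\x04, while an OLE/CFB compound file starts with D0 CF 11 E0 A1 B1 1A E1. A minimal sketch; note that a legacy .xls also uses the CFB container, so "cfb" alone doesn't prove encryption.

```python
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header


def container_kind(header: bytes) -> str:
    """Classify a workbook from its first 8 bytes."""
    if header.startswith(CFB_MAGIC):
        # Encrypted .xlsx/.xlsm, or a legacy .xls (same container).
        return "cfb"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


if __name__ == "__main__":
    print(container_kind(b"PK\x03\x04\x14\x00\x06\x00"))  # zip
```

For a file on disk, read just the header: container_kind(open(path, "rb").read(8)), with your own path.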
.xlsb binary workbook
Limited / unsupported: Binary .xlsb stores strings in .bin parts, not xl/sharedStrings.xml, so the shared_strings line won't reflect them. Re-save as .xlsx for a meaningful shared-string measurement.
Frequently asked questions
What exactly is the shared string table?
It's xl/sharedStrings.xml — a single workbook-wide table holding every unique text string. Each text cell stores a numeric index into it instead of the text itself. It saves space when text repeats and costs space when text is mostly unique.
Why is mine so large?
Because your text is high-cardinality: lots of unique values such as GUIDs, URLs, long descriptions, free-text notes, or numeric IDs stored as text. Each unique string is a separate table entry, so near-unique columns make the table explode.
Do numbers and dates go into the shared string table?
No. Numbers, dates (serial numbers), times, booleans, formula results, and errors are all stored inline in the sheet XML. Only text goes into sharedStrings.xml. That's why converting text-IDs to numbers shrinks the table.
How do I shrink it?
Reduce unique strings: convert numeric text-IDs to real numbers, archive verbose reference text to a lookup file, normalise whitespace so near-duplicates collapse to one entry, and remove unused text columns. Then save and re-run the analyzer.
Does deleting rows reduce sharedStrings.xml?
Not immediately. Excel compacts the table on a clean re-save, not on delete. After deleting, Save -> Close -> Reopen -> Save, then re-check — otherwise you'll see orphaned entries still counted.
Why does converting text IDs to numbers help?
Text lives in the shared string table; numbers live inline in the sheet XML. Converting a text-ID column to real numbers physically removes those values from sharedStrings.xml. The catch: leading zeros are lost, so only convert true integers.
Can the analyzer tell me which sheet's text is biggest?
Not directly — there's one workbook-wide table, reported as a single shared_strings number. To attribute unique text to a sheet, inspect each sheet's text columns yourself; the analyzer reports the table as a whole.
Is shared_strings always the cause of a big file?
No. It's one of eleven categories. If the analyzer shows shared_strings at a low percent, your bloat is elsewhere — typically embedded_media, styles, or vba_macro. Follow the top line of the breakdown.
My dates are inflating shared_strings — why?
Because they're stored as text, not as real dates. Real dates are inline numbers and never touch the table. Convert the column to a proper date type and that weight moves from shared_strings to worksheet_data.
Does the analyzer modify my workbook to fix the table?
No — it only measures and reports (outputType: report). The shrinking is done by you in Excel (convert types, archive text) and confirmed by re-running the analyzer. It's a diagnostic, not an editor.
What tier and formats do I need?
Pro or above, and .xlsx/.xlsm files only (not CSV — CSV has no shared-string table). Pro allows 50 MB uploads; Pro-media 200 MB; Developer 500 MB.
Is my text uploaded when I analyse?
No. JSZip unzips and measures the file in your browser. The contents of sharedStrings.xml — your descriptions, notes, identifiers — never leave your machine.
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.