How to remove all cell comments from Excel before publishing as open data
- Step 1: Finish all data QA and disclosure-control checks. Complete cleaning, suppression, and review first. Any comments added during these steps, which are exactly the sensitive ones, will be removed by the purge.
- Step 2: Copy the dataset before purging. Keep your annotated working copy for the audit trail; publish a purged copy. The purge is permanent.
- Step 3: Open the Comment & Note Purger (Pro tier). Sign in to Pro or higher; Free is blocked. There are no options; the purge is deterministic and whole-workbook.
- Step 4: Drop the .xlsx / .xlsm dataset onto the tool. Use the OOXML format. The engine deletes xl/comments*, xl/threadedComments/*, and xl/persons/* and strips their references and content-type entries.
- Step 5: Record the parts-removed count for your release log. The change(s) figure documents how many comment-related parts were removed, a useful line in a disclosure-control checklist.
- Step 6: Verify, then upload to the portal. Unzip dataset-no-comments.xlsx and confirm there are no xl/comments or xl/persons entries, or check Excel's Review pane. Then publish.
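The verification in Step 6 can also be scripted. The following is a minimal sketch, not the purger's own code: it assumes Python's standard `zipfile` module is available, and the function name `residual_comment_parts` is hypothetical. It simply lists any package entries whose names start with the comment-related prefixes named in Step 4.

```python
import zipfile

# Prefixes taken from the part names listed in Step 4 (compared case-insensitively).
COMMENT_PREFIXES = ("xl/comments", "xl/threadedcomments", "xl/persons")


def residual_comment_parts(workbook):
    """Return comment-related part names still present in an OOXML package.

    `workbook` may be a filesystem path or a binary file-like object.
    An empty list means no comment, threaded-comment, or person parts remain.
    """
    with zipfile.ZipFile(workbook) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.lower().startswith(COMMENT_PREFIXES)
        )
```

Calling `residual_comment_parts("dataset-no-comments.xlsx")` on a purged file should return an empty list; anything else names the parts that still need attention.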
Pre-publication residue check
What a public re-publisher could extract from a workbook, and whether the purger removes it. Purging comments is necessary, but it is not the whole picture.
| Residue | Where it lives | Public risk | Purger removes? |
|---|---|---|---|
| Analyst / QA notes | xl/comments*.xml | Leaks internal judgements | Yes |
| Threaded discussion of data quality | xl/threadedComments/*.xml | Leaks decisions / disagreements | Yes |
| Commenter name + account email | xl/persons/person.xml | Names staff, exposes emails | Yes |
| Hidden / very-hidden sheets of raw data | extra worksheet parts | May expose unsuppressed microdata | No — use a hidden-sheet remover |
| Document author / agency metadata | docProps/core.xml, app.xml | Names individuals / systems | No — use a metadata wiper |
| Cell values, formulas | worksheet XML | (the published content) | Preserved (untouched) |
Open-data release sequence
Purging comments is one control. Combine with the related tools for a fully sanitised public file.
| Step | Tool | Removes | Tier |
|---|---|---|---|
| 1 | Comment & Note Purger (this tool) | QA notes, comments, author records | Pro |
| 2 | excel-hidden-sheet-destroyer | Hidden/very-hidden raw-data sheets | Developer |
| 3 | excel-app-metadata-wiper | Application + author metadata | Pro |
| 4 | office-doc-property-wiper | Core document properties | Pro |
| 5 | email-phone-scrubber | Emails/phones left in cells | varies |
Cookbook
Open-data and FOI scenarios where comment residue is the disclosure risk, shown at the file-part level.
A disclosure-control note that must not be public
During suppression review an analyst noted which cells were withheld and why. That reasoning would help an attacker reverse the suppression — it cannot ship.
Before: xl/comments1.xml -> 'suppressed B7:B9 (cell count < 5, re-id risk)'
After purge: xl/comments1.xml deleted; the suppression logic is gone
Published values still show the suppressed cells as blank/'-'.
Threaded discussion names the data provider's contact
A threaded comment recorded a back-and-forth with the data provider, including their staff member's email via the person record.
xl/threadedComments/threadedComment1.xml -> provider Q&A
xl/persons/person.xml -> 'M. Okafor', m.okafor@provider.gov
After purge: both parts deleted.
No provider contact name or email in the public file.
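To see what a person record would expose before purging, the part can be read directly. This is a rough sketch under stated assumptions: that person records are `person` elements carrying `displayName` and `userId` attributes (the layout Excel's threaded comments are generally understood to use), and that the hypothetical helper `list_person_records` is run on the annotated working copy. It matches on the local tag name so no namespace URI is hard-coded.

```python
import xml.etree.ElementTree as ET
import zipfile


def list_person_records(workbook):
    """Return (displayName, userId) pairs found in parts under xl/persons/.

    Assumption: each author is a <person> element with displayName and
    userId attributes; attributes that are absent come back as None.
    """
    people = []
    with zipfile.ZipFile(workbook) as package:
        for name in package.namelist():
            if not name.lower().startswith("xl/persons/"):
                continue
            root = ET.fromstring(package.read(name))
            for element in root.iter():
                # Compare the local tag name only, ignoring the XML namespace.
                if element.tag.rsplit("}", 1)[-1] == "person":
                    people.append(
                        (element.get("displayName"), element.get("userId"))
                    )
    return people
```

On a purged file the function returns an empty list, because no part under xl/persons/ remains to be read.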
Documenting a clean state for sign-off
Disclosure-control sign-offs benefit from a reproducible check. The ZIP inspection is your evidence the file shipped clean.
Release checklist evidence:
$ unzip -l dataset-no-comments.xlsx | grep -Ei 'comments|persons'
(no output)
Purger result panel: "5 change(s)" -> attach to release log
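If the release log is kept in a structured form, the same evidence can be captured as a record. The sketch below is illustrative only: the function name `release_log_entry` and the field names are hypothetical, and adding a SHA-256 digest of the file is an assumption about what a reviewer might want, not something the purger produces.

```python
import hashlib
import zipfile

# Prefixes for the comment-related parts (compared case-insensitively).
COMMENT_PREFIXES = ("xl/comments", "xl/threadedcomments", "xl/persons")


def release_log_entry(path, changes_reported):
    """Build a JSON-serialisable record describing a purged workbook.

    `changes_reported` is the "change(s)" figure copied by hand from the
    purger's result panel; this function does not compute it.
    """
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    with zipfile.ZipFile(path) as package:
        remaining = sorted(
            name
            for name in package.namelist()
            if name.lower().startswith(COMMENT_PREFIXES)
        )
    return {
        "file": str(path),
        "sha256": digest,
        "purger_changes_reported": changes_reported,
        "comment_parts_remaining": remaining,
        "clean": not remaining,
    }
```

The returned dictionary can be passed to `json.dumps` and attached to the sign-off; the digest ties the evidence to the exact file that was uploaded.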
Comments cleared, but hidden raw-data sheet still present
Purging comments doesn't remove a hidden tab of unsuppressed microdata — a separate, serious open-data risk. Chain the hidden-sheet remover.
After comment purge:
xl/comments* / persons removed (good)
but xl/worksheets/sheet4.xml ('RAW_microdata', state=hidden) remains
Next: /excel-tools/excel-hidden-sheet-destroyer to remove it
before the file goes on the portal.
Author metadata still names the agency analyst
A clean comment purge can still leave 'Last Modified By' in the document properties. For public release, wipe that too.
After comment purge, docProps/core.xml still has:
<dc:creator>a.analyst</dc:creator>
<cp:lastModifiedBy>a.analyst</cp:lastModifiedBy>
Next: /security-tools/office-doc-property-wiper to clear it.
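A quick way to check for this residue is to read the core properties part yourself. This is a minimal sketch, assuming the standard OPC core-properties and Dublin Core namespaces; the function name `author_fields` is hypothetical and the check covers only the two fields shown above.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def author_fields(workbook):
    """Return the non-empty creator / lastModifiedBy values from docProps/core.xml.

    Returns an empty dict when the part is missing or both fields are blank.
    """
    with zipfile.ZipFile(workbook) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    fields = {}
    for label, path in (
        ("creator", "dc:creator"),
        ("lastModifiedBy", "cp:lastModifiedBy"),
    ):
        element = root.find(path, NS)
        if element is not None and element.text:
            fields[label] = element.text
    return fields
```

An empty result after running the property wiper suggests neither field still names a person; a non-empty result shows exactly which value would ship.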
Edge cases and what actually happens
QA note explains a suppression an attacker could reverse
Critical to remove: Disclosure-control notes that describe why and where cells were suppressed are a re-identification risk if published. The purger removes all comment parts, eliminating this residue, but you should still confirm the actual suppressed values are blanked in the cells, which is a separate data step.
Provider contact's email left in a person record
PII leak closed: Threaded comments referencing a data provider store the author's name and email in xl/persons/. Deleting the comment in Excel may leave that record. The purger always removes xl/persons/, so no provider or staff email ships in the public file.
Comments purged but a hidden sheet of raw data remains
Separate risk: A common open-data mistake is a hidden or very-hidden tab containing unsuppressed microdata. The comment purger does not remove sheets. Run excel-hidden-sheet-destroyer as well before publishing.
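Hidden and very-hidden tabs can be detected from the workbook part before you decide whether the extra tool is needed. The sketch below assumes the transitional SpreadsheetML namespace and the `state` attribute on each `sheet` element (`hidden` or `veryHidden`; absent means visible). The function name `hidden_sheets` is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Transitional SpreadsheetML namespace; Strict-format workbooks use a different
# URI and would need this constant changed.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def hidden_sheets(workbook):
    """Return (sheet name, state) for every sheet that is not visible."""
    with zipfile.ZipFile(workbook) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    found = []
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        state = sheet.get("state", "visible")
        if state != "visible":
            found.append((sheet.get("name"), state))
    return found
```

Any entry in the result is a sheet a re-publisher could unhide, so it is worth reviewing each one before the file goes on the portal.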
Document metadata still names the agency or analyst
Use a metadata wiper: The purger leaves docProps/core.xml (creator, last-modified-by) and app.xml (application/company) intact. For a public file, add excel-app-metadata-wiper and the office-doc-property-wiper to the release chain.
Emails or phone numbers sit in actual data cells
Out of scope: If PII is in cell values (not comments), purging comments won't touch it. That's the correct behaviour, because published data cells are preserved. Use the email-phone-scrubber for PII inside cells.
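A crude pre-check for email addresses in cell content is sketched below. It is not how the email-phone-scrubber works; it only assumes that cell text lives in xl/sharedStrings.xml and the parts under xl/worksheets/, that those parts are UTF-8, and that a simple regular expression is good enough for a first pass. The name `emails_in_cells` is hypothetical.

```python
import re
import zipfile

# Deliberately simple pattern: good enough to flag candidates, not to validate.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_cells(workbook):
    """Return the distinct email-like strings found in cell-bearing parts."""
    hits = set()
    with zipfile.ZipFile(workbook) as package:
        for name in package.namelist():
            lowered = name.lower()
            if lowered == "xl/sharedstrings.xml" or lowered.startswith(
                "xl/worksheets/"
            ):
                text = package.read(name).decode("utf-8", errors="replace")
                hits.update(EMAIL.findall(text))
    return sorted(hits)
```

A non-empty result is a prompt for manual review, since some addresses (a published contact point, for example) may be intentional.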
Free tier account
Rejected (Pro required): The purger is Pro-gated. Free accounts are rejected before processing. Government/research teams should run it under a Pro plan or higher.
Dataset is published as a legacy .xls
Rejected (wrong format): Binary .xls doesn't store comments as the OOXML parts this tool reads. Re-save as .xlsx first, purge, then publish the modern format (which is also better for open-data accessibility).
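File extensions are not always reliable, so it can help to check the container type from the first bytes before uploading. This sketch relies on two well-known signatures: ZIP local-file headers begin with `PK\x03\x04`, and legacy OLE2 compound files (which binary .xls uses) begin with `D0 CF 11 E0 A1 B1 1A E1`. The function name and return labels are hypothetical, and the tool's own format detection may differ.

```python
# Well-known container signatures.
ZIP_MAGIC = b"PK\x03\x04"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_workbook_format(header: bytes) -> str:
    """Classify a file from its leading bytes.

    Returns "ooxml-zip" for ZIP-based packages (.xlsx / .xlsm),
    "legacy-ole2" for OLE2 compound files (binary .xls), else "unknown".
    """
    if header.startswith(ZIP_MAGIC):
        return "ooxml-zip"
    if header.startswith(OLE2_MAGIC):
        return "legacy-ole2"
    return "unknown"
```

Reading the first eight bytes of the file and passing them in is enough. Note that a password-protected .xlsx is also wrapped in an OLE2 container, so a "legacy-ole2" result does not always mean the file is a true .xls.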
Very large national dataset over the tier ceiling
Rejected (size limit): Large datasets can exceed Pro's 50 MB limit and are rejected before processing. Use Pro-media (200 MB) or Developer (500 MB / unlimited rows). There is no need to split the file just to purge it.
Frequently asked questions
Are Excel comments part of open-data metadata standards?
No. Cell comments and notes are internal annotations, not dataset metadata (which lives in a separate catalogue record or data dictionary). They should always be removed before publication — they only carry internal commentary, never something a data consumer needs.
What sensitive content do comments typically leak in open data?
Disclosure-control reasoning ('suppressed because cell count < 5'), data-quality caveats, provider correspondence, and — via threaded-comment author records — the names and account emails of the analysts and contacts involved. The purger removes all of it.
Does it remove the names of people who commented?
Yes. Threaded-comment authors are stored in xl/persons/person.xml, which the purger always deletes. So no analyst or provider name or email survives in the published file via comment authorship.
Will purging comments change any published figures?
No. Only comment/note/person parts are removed. Every cell value, formula, and format that you intend to publish is preserved exactly.
How do I prove to a reviewer that the file shipped clean?
Unzip the output and show there are no xl/comments, xl/threadedComments, or xl/persons entries, and record the purger's parts-removed count in your release log. That's a reproducible disclosure-control check.
Does the purger also remove hidden sheets of raw data?
No — it removes comments, not sheets. A hidden tab of unsuppressed microdata is a separate (and serious) open-data risk. Run excel-hidden-sheet-destroyer as well before publishing.
What about author/agency metadata in document properties?
The purger leaves document properties intact. For public release, also run excel-app-metadata-wiper and the office-doc-property-wiper to clear creator, last-modified-by, and company fields.
Is the dataset uploaded to a server when I purge it?
No. The purge runs entirely in your browser via JSZip, so the pre-publication file and its sensitive annotations never leave your machine.
Does it handle both old Notes and new threaded comments?
Yes. It removes legacy Notes, Excel 365 threaded Comments, and the person records for threaded-comment authors in a single pass across all sheets.
What file formats can I publish/purge?
Use .xlsx or .xlsm. These OOXML packages contain the comment parts the tool targets. Convert any legacy .xls to .xlsx first — modern formats are also better for open-data reuse.
Is there an undo if I purge before finishing review?
No — the comment parts are deleted from the output. Keep your annotated working copy (the tool writes a separate -no-comments.xlsx rather than overwriting it) so your QA trail and reasoning survive internally.
What's the largest dataset I can purge?
Up to your tier's limit: Pro 50 MB / 100,000 rows, Pro-media 200 MB / 500,000 rows, Developer 500 MB / unlimited rows. Purging scales with file size, so choose the tier that fits the dataset.
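The tier choice can be written down as a small lookup. The limits below are the ones quoted in the answer above; treating them as inclusive upper bounds is an assumption, as is the helper name `smallest_tier`.

```python
from typing import Optional

# (tier name, max size in MB, max rows or None for unlimited), smallest first.
TIERS = [
    ("Pro", 50, 100_000),
    ("Pro-media", 200, 500_000),
    ("Developer", 500, None),
]


def smallest_tier(size_mb: float, rows: int) -> Optional[str]:
    """Return the first tier whose size and row limits both fit, else None."""
    for name, max_mb, max_rows in TIERS:
        fits_size = size_mb <= max_mb
        fits_rows = max_rows is None or rows <= max_rows
        if fits_size and fits_rows:
            return name
    return None
```

For example, a 40 MB file with 200,000 rows fits Pro on size but not on rows, so the function returns "Pro-media"; a 600 MB file returns `None` because it exceeds every listed limit.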
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.