How to remove author and organisation metadata from Excel before open data release
- Step 1: Finalise the dataset. Complete QA, anonymise any personal data in the cells themselves, and save as .xlsx. Wipe metadata as the final step, since saving re-stamps cp:lastModifiedBy with your name.
- Step 2: Open the wiper. The Excel Core Metadata Wiper opens the Office Doc Property Wiper (Pro tier). If the release includes supporting .docx documentation or a .pptx summary, drop those in the same batch; all are OOXML.
- Step 3: Strip author and organisation properties. JSZip deletes docProps/core.xml, app.xml, and custom.xml, plus the thumbnail and any comment/person XML, then repacks. removedEntries confirms how many property files came out.
- Step 4: Set a Publisher value if your standard requires it. This tool clears properties; it does not write a Publisher field. If your open-data standard (e.g. DCAT, the data.gov.uk schema) wants a Publisher, set the organisation name in Excel's File → Info after wiping. A second wipe is not needed, but saving in Excel can re-stamp your name, so check that Author and Last Modified By are still blank before publishing.
- Step 5: Verify no staff names remain. Open File → Info (or unzip the file) and confirm Author, Last Modified By, and Company are blank. Also skim the cells, headers, and any hidden tabs for staff names: the wiper removes properties, not content.
- Step 6: Publish to the portal. Upload the cleaned file. The public sees only the dataset, the portal's descriptive record, and any Publisher value you deliberately set, never an individual officer's name or the internal review history.
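The "unzip the file" check in Step 5 can be scripted. The following is a minimal sketch using only Python's standard library, not the wiper's own code; the function name remaining_identity_fields is an illustrative choice. It reports any non-blank Author, Last Modified By, or Company value still present in the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml and docProps/app.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Label -> (archive entry, element to look up) for the three fields in Step 5.
FIELDS = {
    "Author": ("docProps/core.xml", "dc:creator"),
    "Last Modified By": ("docProps/core.xml", "cp:lastModifiedBy"),
    "Company": ("docProps/app.xml", "ep:Company"),
}


def remaining_identity_fields(xlsx):
    """Return {label: value} for every non-blank identity field still in the file.

    `xlsx` may be a path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(xlsx) as zf:
        names = set(zf.namelist())
        for label, (entry, element) in FIELDS.items():
            if entry not in names:
                continue  # the part was deleted, so there is nothing to read
            root = ET.fromstring(zf.read(entry))
            node = root.find(element, NS)
            if node is not None and (node.text or "").strip():
                found[label] = node.text.strip()
    return found
```

An empty dictionary means none of the three fields carries a value. It says nothing about names typed into cells, which still need a manual review.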
What leaks into a public dataset, and where
Each item the public could extract from a published .xlsx, the OOXML file it lives in, and whether the wiper removes it. Field names are the literal OOXML element names.
| What the public can extract | Field / element | OOXML path | Wiped? |
|---|---|---|---|
| Name of the officer who built it | dc:creator | docProps/core.xml | Yes — file deleted |
| Sign-off reviewer | cp:lastModifiedBy | docProps/core.xml | Yes |
| Internal draft count | cp:revision | docProps/core.xml | Yes |
| Department name | Company | docProps/app.xml | Yes |
| Classification label / system ID | custom property nodes | docProps/custom.xml | Yes |
| Snapshot of active sheet at last save | thumbnail image | docProps/thumbnail.jpeg or docProps/thumbnail.png | Yes |
| Reviewer comment threads | comment / person refs | xl/comments*.xml, xl/persons/ | Yes |
| Staff names typed into cells / headers | cell text | xl/worksheets/ | No — content, not property |
| Data in hidden rows/columns/sheets | worksheet data | xl/worksheets/, xl/workbook.xml | No — see Hidden Sheet Destroyer |
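To see which of the "Wiped?" rows apply to a particular file, the archive's entry names can be matched against the paths in the table. This is a rough sketch with Python's zipfile module; the pattern list is transcribed from the table (plus xl/threadedComments/, which the reviewer-comment edge case mentions) and the function name leaking_entries is hypothetical.

```python
import re
import zipfile

# Path patterns transcribed from the OOXML path column of the table.
PROPERTY_PATTERNS = [
    r"^docProps/core\.xml$",
    r"^docProps/app\.xml$",
    r"^docProps/custom\.xml$",
    r"^docProps/thumbnail\.(jpeg|png)$",
    r"^xl/comments[^/]*\.xml$",
    r"^xl/threadedComments/",
    r"^xl/persons/",
]


def leaking_entries(xlsx):
    """List archive entries that match a property or comment path, sorted by name."""
    with zipfile.ZipFile(xlsx) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if any(re.search(pattern, name) for pattern in PROPERTY_PATTERNS)
        )
```

Running it before and after a wipe gives a simple before/after inventory. Worksheet content under xl/worksheets/ is deliberately not matched, since the wiper does not remove it either.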
Do portals strip embedded metadata for you?
General behaviour of major open-data portals. They index the descriptive metadata you enter; they do not parse and scrub the bytes inside the uploaded file. Pre-wipe before upload.
| Portal | Strips embedded file metadata on upload? | Implication |
|---|---|---|
| data.gov.uk | No | Embedded Author/Company ship as-is — pre-wipe |
| data.gov (US) | No | File is served verbatim — pre-wipe |
| CKAN-based regional portals | No | CKAN stores the file as an unmodified resource |
| Socrata / Tyler portals | Varies for converted views | Original download usually retains metadata — pre-wipe |
Cookbook
Real open-data cases with the raw OOXML before and after wiping. Officer names anonymised.
Civil servant's name in a published dataset
A statistics release went live on a portal; a journalist unzipped the .xlsx and read the building officer's name. Pre-wiping core.xml prevents this.
docProps/core.xml (before publication):
    <dc:creator>a.officer@dept.gov.uk</dc:creator>
    <cp:lastModifiedBy>Head of Statistics</cp:lastModifiedBy>
    <cp:revision>21</cp:revision>
After wiping:
    docProps/core.xml → removed
    removedEntries: 3
Department name in app.xml
The dataset should be attributed to the publishing body via the portal record, not via an embedded Company string from a sub-team. The wiper deletes app.xml.
docProps/app.xml (before):
    <Company>Regional Analytics Unit, Dept of Transport</Company>
After wiping:
    docProps/app.xml → removed
Attribution now comes only from the portal's Publisher field.
Internal classification label in custom.xml
An internal handling label that was never meant for the public can hide in custom.xml. It is invisible in the standard Properties panel but readable by anyone who unzips the file.
docProps/custom.xml (before):
    <property name="HandlingLabel">OFFICIAL — internal use</property>
After wiping:
    docProps/custom.xml → removed
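Because custom properties do not show in the standard Properties panel, it can help to list them before deciding whether a wipe is needed. The sketch below reads docProps/custom.xml with Python's standard library. It assumes the usual OOXML layout, in which each property element wraps its value in a typed child element, and the function name custom_properties is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the elements in docProps/custom.xml.
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def custom_properties(xlsx):
    """Return {property name: value text} from docProps/custom.xml, or {} if absent."""
    with zipfile.ZipFile(xlsx) as zf:
        if "docProps/custom.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/custom.xml"))
    return {
        prop.get("name"): "".join(prop.itertext()).strip()
        for prop in root.iter(CUSTOM_NS + "property")
    }
```

An empty result after wiping is the expected outcome, since the whole part is deleted.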
Set Publisher after wiping, per DCAT
Some standards want a Publisher value in the file. The wiper clears properties; set the organisation name in Excel afterward — and do not re-add your personal name.
1. wipe → core.xml, app.xml removed (Author blank, Company blank)
2. Excel File → Info → Properties:
Company / Publisher → "Department of Transport"
leave Author blank
3. publish → public sees org, never an individual
Properties clean, but a name sits in a cell
The wiper removed all properties, yet an analyst's name was typed into a 'prepared by' cell. Always scan content too.
Properties: clean ✓
Cell A1 of 'Notes' sheet: "Prepared by A. Officer, 2026-05" ← still public!
Fix: clear that cell before publishing. The wiper does not touch worksheet text.
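A content scan of this kind can be partly automated. The sketch below searches the text elements in xl/sharedStrings.xml and the worksheet parts for attribution phrases. The phrase list is an assumption to adapt to local practice and the function name suspect_cell_text is hypothetical. A phrase split across rich-text runs would be missed, so this supplements a manual review and does not replace it.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML elements such as <t> (text).
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Assumed attribution phrases; extend with whatever your team tends to type.
SUSPECT = re.compile(r"\b(prepared|reviewed|checked|approved)\s+by\b", re.IGNORECASE)


def suspect_cell_text(xlsx, pattern=SUSPECT):
    """Return (archive entry, text) pairs whose text matches the pattern."""
    hits = []
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            in_scope = name == "xl/sharedStrings.xml" or name.startswith("xl/worksheets/")
            if not in_scope or not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            for text_node in root.iter(MAIN_NS + "t"):
                if text_node.text and pattern.search(text_node.text):
                    hits.append((name, text_node.text))
    return hits
```

Most cell text is stored once in the shared string table, so a hit there does not say which cell uses it; open the workbook and search for the reported text to find and clear it.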
Edge cases and what actually happens
Portal does not strip embedded metadata
By design: data.gov, data.gov.uk, and most CKAN-based portals store the uploaded file as an unmodified resource and serve it verbatim. They index the descriptive metadata you type, not the bytes inside the spreadsheet, so any embedded Author/Company ships to the public unless you wipe it first. Pre-wiping before upload is the only reliable control.
Staff name typed into a cell or header
Out of scope: If an officer's name appears in a 'prepared by' cell, a column header, or a footnote, it is worksheet content (xl/worksheets/) and survives the wipe. Review and clear such cells before publishing; the wiper removes document properties only, not data.
Sensitive data hidden in rows, columns, or sheets
Not removed: The wiper does not remove hidden worksheet content. Hidden rows, columns, and sheets still contain their data after a wipe, and a determined member of the public can unhide them. Use the Hidden Sheet Destroyer for hidden sheets and manually clear hidden rows/columns before release.
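Hidden content can be detected before release by reading the workbook XML directly. The sketch below is a standard-library illustration, not part of the Hidden Sheet Destroyer. It reports sheets whose state attribute is hidden or veryHidden and counts row and column elements flagged hidden in each worksheet; the function name hidden_content is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML elements in xl/workbook.xml and the worksheet parts.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def hidden_content(xlsx):
    """Report hidden sheets and the number of hidden rows/columns per worksheet."""
    report = {"sheets": [], "rows_and_columns": {}}
    with zipfile.ZipFile(xlsx) as zf:
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
        for sheet in workbook.iter(MAIN_NS + "sheet"):
            if sheet.get("state") in ("hidden", "veryHidden"):
                report["sheets"].append((sheet.get("name"), sheet.get("state")))
        for name in zf.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            count = sum(
                1
                for tag in ("row", "col")
                for element in root.iter(MAIN_NS + tag)
                if element.get("hidden") in ("1", "true")
            )
            if count:
                report["rows_and_columns"][name] = count
    return report
```

A single col element can span several columns, so the count is of flagged elements, not of individual columns. Any non-empty result is a prompt to unhide and inspect the workbook before publishing.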
Need a Publisher value, not just blank
Clear only: This tool clears properties and cannot write a Publisher field. If your open-data standard requires one, wipe first, then set the organisation name in Excel's File → Info. Be careful not to re-introduce a personal name in the process; set the org, leave Author blank.
Reviewer comment threads
Removed: Internal review comments and their author names live in xl/comments*.xml, xl/threadedComments/, and xl/persons/, all of which the wiper removes, so a public reader cannot reconstruct the internal review discussion. To remove only comments while keeping properties, use the Comment & Note Purger.
Legacy .xls release file
Unsupported format: Old .xls files use binary BIFF8 with no docProps/ folder, so the ZIP-based wiper cannot act on them. Save As .xlsx (or convert with LibreOffice on a non-Excel machine) first, then wipe. Publishing .xlsx is also more accessible for open-data reuse.
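A file's extension is not always a reliable guide to its container format, so a release script may want to check the leading bytes instead. The sketch below distinguishes a ZIP container (which .xlsx uses) from the OLE2 compound-file container (which legacy .xls uses) by their well-known signatures. The function name container_kind is illustrative, and an OLE2 result can also indicate a password-protected OOXML file, not only a legacy workbook.

```python
# Signature of an OLE2 compound file (legacy .xls, .doc, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local-file-header signature at the start of a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def container_kind(data: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'ole2', or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Only the first eight bytes are needed, for example container_kind(open(path, "rb").read(8)).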
Dataset file over the Pro size cap
413-style reject: Pro tier caps each file at 50 MB. A very large statistical workbook past that is rejected before processing; Pro-media raises it to 200 MB and Developer to 500 MB. For genuinely huge open datasets, CSV is usually the preferred publication format anyway.
Re-save after wiping re-adds your name
Expected: Opening the wiped file in Excel and saving re-creates docProps/core.xml with your account name in cp:lastModifiedBy. Make the wipe the final action before upload, or, if you must set a Publisher in Excel afterward, verify Author is still blank before publishing.
Frequently asked questions
Will the open-data portal strip the metadata for me?
Generally no. data.gov.uk, data.gov, and CKAN-based portals store and serve the uploaded file verbatim — they index the descriptive metadata you type in, not the bytes inside the spreadsheet. So any embedded Author, Company, or custom property ships to the public unless you remove it before upload. Pre-wiping is the only reliable control.
What gets removed before I publish?
The wiper deletes docProps/core.xml (Author, Last Modified By, dates, revision count), docProps/app.xml (Company/department, Manager, Excel build), and docProps/custom.xml (classification labels, system IDs), plus the thumbnail preview and any reviewer comment/person XML. Cell data, formulas, and formatting are untouched.
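For readers who want to see what deleting the property parts involves at the archive level, here is a rough Python sketch. It is not the wiper's JSZip code and covers only the docProps/ entries, not the comment and person XML. It assumes that references to the removed parts in _rels/.rels and [Content_Types].xml should be dropped as well so that the package stays consistent; the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def _drop_children(xml_bytes, namespace, tag, attr, prefix):
    """Remove child elements whose attribute value points into `prefix`."""
    ET.register_namespace("", namespace)  # keep the default namespace on output
    root = ET.fromstring(xml_bytes)
    for child in list(root):
        target = child.get(attr, "").lstrip("/")
        if child.tag == f"{{{namespace}}}{tag}" and target.startswith(prefix):
            root.remove(child)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def strip_doc_props(src_bytes):
    """Return (cleaned archive bytes, list of removed entry names)."""
    removed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as zin, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.startswith("docProps/"):
                removed.append(info.filename)  # delete the part, do not blank it
                continue
            if info.filename == "_rels/.rels":
                data = _drop_children(data, REL_NS, "Relationship", "Target", "docProps/")
            elif info.filename == "[Content_Types].xml":
                data = _drop_children(data, CT_NS, "Override", "PartName", "docProps/")
            zout.writestr(info.filename, data)
    return out.getvalue(), removed
```

The length of the returned list plays the same role as the removedEntries count the tool reports. Treat the output as illustrative and open a cleaned copy in Excel before relying on it.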
Should I set the author to the organisation name instead of leaving it blank?
Some standards (DCAT, the data.gov.uk schema) recommend a Publisher value. Blank is acceptable for most portals. This tool clears fields — to set a Publisher, wipe first, then set the organisation name in Excel's File → Info, taking care not to re-add a personal name in the process.
Does it remove a civil servant's name that's typed into a cell?
No. The wiper removes document properties only. A name in a 'prepared by' cell, a header, or a footnote is worksheet content and survives, so review and clear those cells before publishing. The metadata wipe and the content review are two separate steps, and both are necessary.
What about data hidden in rows, columns, or sheets?
The wiper does not touch worksheet content, so hidden rows, columns, and sheets keep their data and a member of the public can unhide them. Use the Hidden Sheet Destroyer for hidden sheets, and unhide and clear hidden rows/columns manually before release.
Is the pre-release dataset uploaded anywhere to be cleaned?
No. JSZip unpacks, deletes the property files, and repacks the file in your browser. Pre-publication datasets — which may still contain sensitive working data you haven't finished anonymising — never leave the publishing officer's machine. A local audit-log entry (no content) is recorded for governance.
Does data.gov.uk automatically strip embedded Excel metadata?
No. data.gov.uk does not parse and scrub embedded metadata from uploaded spreadsheets — it serves the file as you uploaded it. Wipe the file before uploading; do not assume the portal does it for you.
Does it also remove macros from the file?
No. VBA macros live in xl/vbaProject.bin, which the wiper leaves intact. Open data is rarely published as a macro workbook, but if yours is, remove the macros with the VBA Macro Stripper and consider saving as plain .xlsx.
Can a member of the public recover the officer's name after the wipe?
Not from the published file. The property XML is deleted from the archive, not blanked, so there is no residual creator data. The name only resurfaces if an older, un-wiped copy of the file was also published — so wipe the exact version you upload.
What tier do I need?
Pro. The Core Metadata Wiper runs through the Office Doc Property Wiper, which requires Pro tier (files up to 50 MB, 5 per batch). Pro-media (200 MB / 20) and Developer (500 MB / unlimited) cover larger or higher-volume publication workflows.
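The tier limits quoted here can be checked against a planned batch before uploading. The sketch below encodes the stated caps in a small table. It assumes a megabyte means 1024 × 1024 bytes, which the text does not specify, and the function name smallest_tier is illustrative.

```python
MB = 1024 * 1024  # assumption: caps are binary megabytes

# (tier name, per-file cap in bytes, files per batch; None means unlimited)
TIERS = [
    ("Pro", 50 * MB, 5),
    ("Pro-media", 200 * MB, 20),
    ("Developer", 500 * MB, None),
]


def smallest_tier(file_sizes):
    """Return the first tier that accepts every file in one batch, or None."""
    for name, cap, batch_limit in TIERS:
        fits_size = all(size <= cap for size in file_sizes)
        fits_batch = batch_limit is None or len(file_sizes) <= batch_limit
        if fits_size and fits_batch:
            return name
    return None
```

A result of None means at least one file exceeds the largest cap, in which case splitting the workbook or publishing as CSV may be the practical route.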
Can I clean the whole release package at once?
Yes. The wiper batches multiple files and handles .xlsx, .docx, and .pptx together, so the dataset, its .docx documentation, and any .pptx summary can be cleaned in a single drop — Pro allows 5 files, Pro-media 20, Developer unlimited.
Can I automate metadata stripping in our publication pipeline?
Yes. Pair the @jadapps local runner and dispatch the office-doc-property-wiper job from your publishing script — the file is processed locally, so sensitive pre-release data stays on your network. A common pattern is a release-gate step that wipes every file before it reaches the portal's upload API.
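The runner's own interface is not described here, so the sketch below does not call it. Instead it illustrates the release-gate idea independently: a check, in plain Python, that refuses a release folder if any .xlsx, .docx, or .pptx file in it still contains docProps/ entries. The function name unwiped_files and the folder layout are assumptions.

```python
import zipfile
from pathlib import Path

# Extensions treated as OOXML packages by this gate.
OOXML_SUFFIXES = {".xlsx", ".docx", ".pptx"}


def unwiped_files(release_dir):
    """Map each OOXML file that still has docProps/ entries to those entry names."""
    offenders = {}
    for path in sorted(Path(release_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in OOXML_SUFFIXES:
            continue
        with zipfile.ZipFile(path) as zf:
            props = [name for name in zf.namelist() if name.startswith("docProps/")]
        if props:
            offenders[str(path)] = props
    return offenders
```

A publishing script could call this immediately before the upload step and abort with a non-zero exit code whenever the returned dictionary is not empty.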
Privacy first
Every JAD Excel tool runs entirely in your browser using SheetJS and ExcelJS. Your spreadsheets, formulas, and data never leave your device — verified by zero outbound network requests during processing.