How to use Filename Sanitiser in developer workflows
- Step 1: Grab the archive into your workspace. Pull the ZIP/TAR/7z/etc. from wherever it lands: webhook drop, customer upload, scraped bundle. The Sanitiser takes one file per run.
- Step 2: Drop it on the tool. Open Filename Sanitiser and drop the file. There are no options to configure; the rule set is fixed, so the dev experience is identical every time.
- Step 3: Eyeball the rename count. Check Renames. For a trusted internal source this is usually 0; for an external feed a non-zero count tells you the upstream is emitting unsafe names worth handling at the source.
- Step 4: Download and feed downstream. Take <stem>-sanitized.zip and hand it to whatever consumes it (extraction, indexing, CI). Because it's a plain ZIP, every tool understands it.
- Step 5: Automate with the runner if it's recurring. For a nightly or per-event job, pair the JAD runner once; archive tools run in a short-lived headless-browser session (there is no REST endpoint). Treat it as a browser-automation step, not an API call.
- Step 6: Or reimplement the fixed rule set locally. Because the transform is small and deterministic, a pure-Node/Python port is viable when you want zero-dependency local automation without the runner.
The rule set as a developer reference
Every transform the tool applies, expressed as the regex it uses, in execution order. Match this in any local reimplementation.
| Step | Regex / op | Action |
|---|---|---|
| 1. Separators | /\\/g | Replace \ with / |
| 2. Traversal | /\.\.+/g | Replace any 2+ dot run with _ |
| 3. NUL | /\0/g | Delete NUL bytes |
| 4. Forbidden + control | /[<>:"\|?*\x00-\x1f]/g (per segment) | Replace each match with _ |
| 5. Reserved names | stem ∈ {CON,PRN,AUX,NUL,COM1–4,LPT1–2} | Prefix _ |
| 6. Slash cleanup | /^\/+/ then /\/+/g | Strip leading, collapse repeated |
| 7. Empty guard | name === '' | Replace with _ |
Automation surface
What is and isn't available for scripting archive tools.
| Capability | Available? | Notes |
|---|---|---|
| REST API endpoint | No | apiAvailable: false for all archive tools |
| JAD runner (headless browser) | Yes | Short-lived Chromium session executes the same browser code |
| Per-run options | No | Fixed rule set, nothing to parameterise |
| Batch / multiple files | No | Single file per run (acceptsMultiple: false) |
| Local reimplementation | Yes | Rule set is a small deterministic regex pipeline |
Cookbook
Developer-flavoured before/after names from the kinds of archives that land in a backlog: webhook payloads, scraped bundles, and customer uploads.
Webhook payload with a traversal entry
An external system POSTed a ZIP whose entry tries to climb out. Sanitise before your worker extracts it.
    incoming.zip:
      data/../../var/www/shell.php
      data/ok.json
    Sanitised:
      data/_/_/var/www/shell.php
      data/ok.json
    Renames: 1 (flag the source: it is emitting traversal)
Scraped dataset with URL-derived names
Files named after URLs carry :, ?, and *. The fixed rule set makes them filesystem-safe in one pass.
    scrape.zip:
      https:__site.com_page?id=1.html
      img/*thumb.png
    Sanitised:
      https___site.com_page_id=1.html
      img/_thumb.png
Customer upload with a reserved name
A user zipped a folder containing a file called nul. On Windows that's unwritable; the prefix fixes it.
    upload.zip:
      exports/nul
      exports/report.csv
    Sanitised:
      exports/_nul
      exports/report.csv
7z artefact normalised for a Node consumer
Your indexer only knows ZIP, but a partner ships 7z. Read the 7z and get a safe ZIP without shelling out to 7-Zip.
    partner-feed.7z (entries with | chars)
      -> partner-feed-sanitized.zip
      every | replaced with _, single ZIP output
Local reimplementation sketch
The deterministic rule set is small enough to port when you want runner-free automation. This mirrors the tool's pipeline.
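    // Reserved Windows device names the tool checks (step 5 of the rule table).
    const RESERVED = new Set(['CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 'COM3', 'COM4', 'LPT1', 'LPT2']);

    function safe(name) {
      // Steps 1-3: normalise separators, collapse dot runs, delete NUL bytes.
      let s = name.replace(/\\/g, '/')
                  .replace(/\.\.+/g, '_')
                  .replace(/\0/g, '');
      // Steps 4-5: per segment, replace forbidden characters and prefix reserved stems.
      s = s.split('/').map(seg => {
        let x = seg.replace(/[<>:"|?*\x00-\x1f]/g, '_');
        const stem = x.toUpperCase().split('.')[0];
        if (RESERVED.has(stem)) x = '_' + x;
        return x;
      }).join('/').replace(/^\/+/, '').replace(/\/+/g, '/'); // Step 6: slash cleanup.
      // Step 7: never return an empty name.
      return s || '_';
    }
Edge cases and what actually happens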
Expecting a REST API to POST to
Not available: Archive tools set apiAvailable: false. There is no HTTP endpoint to call. Automate via the JAD runner (headless browser) or reimplement the rule set locally. Do not build a pipeline assuming a curl-able API.
Need to process many archives at once
Single file only: The tool accepts one file per run (acceptsMultiple: false). For batches, loop in your automation, one file at a time. There is no multi-file or folder input on this tool.
Input is password-protected
Read error: No password is passed to the reader, so encrypted ZIPs fail. Decrypt in your pipeline first. To create encrypted output later, use encrypted-zip-creator.
Collision after sanitising
Last write wins: Distinct names mapping to the same safe form overwrite each other. In a data pipeline this can silently drop a file, so compare input vs output entry counts and alert if they differ.
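Collisions can also be caught before a run, rather than by counting entries afterwards. The following is a minimal sketch, assuming you already have the list of entry names and some function that mirrors the tool's rules. `findCollisions` and the stand-in sanitiser `forbiddenOnly` are hypothetical names for illustration, and the stand-in applies only the forbidden-character step.

```javascript
// Group original entry names by their sanitised form and report every
// sanitised name that two or more distinct originals map to.
// `sanitise` is whatever function you use to mirror the tool's rules.
function findCollisions(names, sanitise) {
  const groups = new Map();
  for (const name of names) {
    const safeName = sanitise(name);
    if (!groups.has(safeName)) groups.set(safeName, []);
    groups.get(safeName).push(name);
  }
  // Keep only sanitised names produced by more than one distinct input.
  return [...groups].filter(([, originals]) => new Set(originals).size > 1);
}

// Stand-in sanitiser for the demo: only the forbidden-character step.
const forbiddenOnly = (name) => name.replace(/[<>:"|?*\x00-\x1f]/g, '_');

const collisions = findCollisions(
  ['logs/a?b.txt', 'logs/a*b.txt', 'logs/ok.txt'],
  forbiddenOnly
);
console.log(collisions);
```

Here both `logs/a?b.txt` and `logs/a*b.txt` map to `logs/a_b.txt`, so one of them would be overwritten in the output.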
Free-tier 50 MB / 500-entry cap in automation
413 rejected: Automated feeds often exceed the free limits. Use a Pro+ tier (500 MB / 50,000 entries and up) for unattended jobs, and check size before submitting.
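A pre-flight check along these lines could reject oversized inputs before the job is submitted. This is a sketch under stated assumptions: it handles ZIP inputs only, it reads the entry count from the ZIP End of Central Directory record, and it treats "50 MB" as 50 × 1024 × 1024 bytes, which may not match how the tool counts. `FREE_LIMITS`, `zipEntryCount`, and `withinLimits` are hypothetical names.

```javascript
// Free-tier limits quoted in this guide. Treating "MB" as 1024 * 1024 bytes
// is an assumption; the tool may count differently.
const FREE_LIMITS = { maxBytes: 50 * 1024 * 1024, maxEntries: 500 };

// Read the total entry count from a ZIP's End of Central Directory record.
// Returns null when no record is found (not a ZIP) or when the count is the
// 0xFFFF sentinel that signals a ZIP64 archive.
function zipEntryCount(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The record is 22 bytes plus an optional comment of up to 65535 bytes,
  // so scan backwards from the end of the file for its signature.
  const lowest = Math.max(0, bytes.byteLength - 22 - 0xffff);
  for (let i = bytes.byteLength - 22; i >= lowest; i--) {
    if (view.getUint32(i, true) === 0x06054b50) {
      const total = view.getUint16(i + 10, true);
      return total === 0xffff ? null : total;
    }
  }
  return null;
}

// Unknown entry counts are reported as not OK so that they get a manual look.
function withinLimits(bytes, limits = FREE_LIMITS) {
  const entries = zipEntryCount(bytes);
  return {
    entries,
    sizeOk: bytes.byteLength <= limits.maxBytes,
    entriesOk: entries !== null && entries <= limits.maxEntries,
  };
}
```

In Node, a `Buffer` returned by `fs.readFileSync` is a `Uint8Array`, so it can be passed to `withinLimits` directly.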
Single dot vs dot-run
Preserved: Only 2+ dot runs collapse. module.test.ts is untouched; module..ts becomes module_ts. Account for this if your local port must match exactly.
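The dot-run behaviour is easy to confirm in isolation. The snippet below applies only the traversal regex from the rule table; `collapseDotRuns` is a hypothetical helper name.

```javascript
// Step 2 of the rule set: any run of two or more dots becomes one underscore.
const collapseDotRuns = (name) => name.replace(/\.\.+/g, '_');

for (const name of ['module.test.ts', 'module..ts', 'a/../b', 'archive...tar']) {
  console.log(name, '->', collapseDotRuns(name));
}
// module.test.ts -> module.test.ts
// module..ts -> module_ts
// a/../b -> a/_/b
// archive...tar -> archive_tar
```

Note that a run of three or more dots still yields a single underscore, because the whole run is one match.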
Output format is fixed to ZIP
By design: Downstream code that expects the original tar.gz must convert: sanitise here, then archive-format-converter. The writer only emits ZIP.
WASM disabled in headless browser
WASM error: 7z/RAR/bz2/xz need libarchive WASM. If your automation browser disables WebAssembly, restrict inputs to ZIP/GZIP/TAR (pure-JS fflate) or pre-extract the exotic formats.
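One way to restrict inputs is to sniff the container format from its leading bytes before handing the file to the automation browser. The sketch below uses the commonly documented magic numbers for each format; `SIGNATURES`, `sniff`, and `canProcess` are hypothetical names, and the WASM flags simply restate the split described above. It will not recognise an empty ZIP or a pre-POSIX tar without the "ustar" marker.

```javascript
// Magic-byte signatures for the formats named above. `wasm` marks formats
// that, per this guide, need the libarchive WASM module.
const SIGNATURES = [
  { format: 'zip',   wasm: false, offset: 0,   bytes: [0x50, 0x4b, 0x03, 0x04] },
  { format: 'gzip',  wasm: false, offset: 0,   bytes: [0x1f, 0x8b] },
  { format: 'tar',   wasm: false, offset: 257, bytes: [0x75, 0x73, 0x74, 0x61, 0x72] }, // "ustar"
  { format: '7z',    wasm: true,  offset: 0,   bytes: [0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c] },
  { format: 'rar',   wasm: true,  offset: 0,   bytes: [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07] },
  { format: 'bzip2', wasm: true,  offset: 0,   bytes: [0x42, 0x5a, 0x68] },
  { format: 'xz',    wasm: true,  offset: 0,   bytes: [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00] },
];

// Return the first signature whose bytes match at its offset, or null.
function sniff(bytes) {
  return SIGNATURES.find((sig) =>
    sig.bytes.every((b, i) => bytes[sig.offset + i] === b)
  ) ?? null;
}

// True when the format is recognised and does not need WASM that is unavailable.
function canProcess(bytes, wasmAvailable = typeof WebAssembly === 'object') {
  const hit = sniff(bytes);
  return hit !== null && (!hit.wasm || wasmAvailable);
}
```

Passing `wasmAvailable` explicitly lets a pipeline route 7z/RAR/bz2/xz inputs to a pre-extraction step when it knows the headless browser runs with WebAssembly disabled.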
Reserved-name coverage gap
Known gap: Only COM1–4 and LPT1–2 (plus CON/PRN/AUX/NUL) are prefixed. A local port aiming for full Windows safety should extend the set to COM5–9 and LPT3–9.
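An extended set for a local port might look like the sketch below. It deliberately diverges from the tool's output for COM5–9 and LPT3–9, so use it only when exact parity with the tool is not required. `RESERVED_FULL` and `guardReserved` are hypothetical names.

```javascript
// Base device names plus COM1-9 and LPT1-9. The tool itself stops at
// COM4 and LPT2, so results differ from the tool for the extra names.
const RESERVED_FULL = new Set(['CON', 'PRN', 'AUX', 'NUL']);
for (let n = 1; n <= 9; n++) {
  RESERVED_FULL.add(`COM${n}`);
  RESERVED_FULL.add(`LPT${n}`);
}

// Prefix a single path segment with "_" when its upper-cased stem
// (the text before the first dot) is a reserved device name.
function guardReserved(segment) {
  const stem = segment.toUpperCase().split('.')[0];
  return RESERVED_FULL.has(stem) ? '_' + segment : segment;
}

console.log(guardReserved('com7.log'));    // _com7.log
console.log(guardReserved('console.txt')); // console.txt
```

The stem comparison is exact, so names that merely start with a device name, such as console.txt, are left alone.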
Reference inside a file points at the old name
Manual fixup: Renaming an entry never edits file contents. A manifest, lockfile, or config that names the renamed entry will be stale after sanitising and needs a manual or scripted update.
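A scripted update can be driven by an old-to-new rename map. The sketch below assumes a hypothetical manifest shape of `{ files: string[] }`; real manifests and lockfiles will differ, so treat this as a pattern and not a drop-in fix. `buildRenameMap`, `rewriteManifest`, and the stand-in sanitiser `prefixNul` are illustrative names, and the stand-in covers only the reserved name from the cookbook example.

```javascript
// Map each original entry name to its sanitised form, keeping only real renames.
function buildRenameMap(names, sanitise) {
  const renames = new Map();
  for (const name of names) {
    const safeName = sanitise(name);
    if (safeName !== name) renames.set(name, safeName);
  }
  return renames;
}

// Rewrite a manifest of the hypothetical shape { files: string[] },
// leaving every other field and every unrenamed path as it was.
function rewriteManifest(manifest, renames) {
  return { ...manifest, files: manifest.files.map((p) => renames.get(p) ?? p) };
}

// Stand-in sanitiser for the demo: only prefixes a final segment named "nul".
const prefixNul = (name) => name.replace(/(^|\/)nul$/i, '$1_nul');

const manifest = { version: 1, files: ['exports/nul', 'exports/report.csv'] };
const updated = rewriteManifest(manifest, buildRenameMap(manifest.files, prefixNul));
console.log(updated.files);
```

Keeping the rename map as a separate value also gives you something to log, which helps when tracing why a downstream reference changed.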
Frequently asked questions
Is there a REST API for the Filename Sanitiser?
No. All archive tools have apiAvailable: false. Programmatic execution goes through the JAD runner, which spins up a short-lived headless Chromium and runs the same browser code. There is no HTTP endpoint to POST an archive to.
How do I automate it then?
Pair the JAD runner and trigger the tool as a headless-browser job, or — because the rule set is a small deterministic regex pipeline — reimplement it locally in Node/Python for a dependency-free pipeline step.
Can I sanitise a whole batch at once?
Not in one run — the tool takes a single file (acceptsMultiple: false). In automation, iterate over your archives one at a time. There is no folder or multi-file drop for this tool.
Will it touch my file contents?
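No. Only entry names are rewritten, and the decompressed bytes of every file are identical, so per-file content checksums and signatures stay valid. A checksum of the archive file itself will differ, because the output is a newly written ZIP with renamed entries. A file that references another entry by its old name will also be stale after the rename.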
What's the exact rename logic so I can match it locally?
Normalise \→/; collapse \.\.+→_; delete NUL; per segment replace [<>:"|?*\x00-\x1f]→_; if the upper-cased stem is a reserved device name prefix _; strip leading slashes; collapse repeated slashes; empty → _. The cookbook shows a faithful JS port.
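To check that a local port matches, the before/after names in this guide can serve as test vectors. The following is a minimal sketch: `sanitiseName` stands in for whatever port you are testing and follows the rule order described above, and `TOOL_RESERVED` and `VECTORS` are hypothetical names. The expected values are the ones shown in the cookbook and edge cases, not output captured from the tool.

```javascript
// Device names the tool prefixes, per the rule table in this guide.
const TOOL_RESERVED = new Set([
  'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 'COM3', 'COM4', 'LPT1', 'LPT2',
]);

// Port under test; swap in your own implementation here.
function sanitiseName(name) {
  const normalised = name
    .replace(/\\/g, '/')     // separators
    .replace(/\.\.+/g, '_')  // dot runs
    .replace(/\0/g, '');     // NUL bytes
  const segments = normalised.split('/').map((segment) => {
    const cleaned = segment.replace(/[<>:"|?*\x00-\x1f]/g, '_');
    const stem = cleaned.toUpperCase().split('.')[0];
    return TOOL_RESERVED.has(stem) ? '_' + cleaned : cleaned;
  });
  const joined = segments.join('/').replace(/^\/+/, '').replace(/\/+/g, '/');
  return joined === '' ? '_' : joined;
}

// Input/expected pairs taken from the cookbook and edge cases in this guide.
const VECTORS = [
  ['data/../../var/www/shell.php', 'data/_/_/var/www/shell.php'],
  ['https:__site.com_page?id=1.html', 'https___site.com_page_id=1.html'],
  ['img/*thumb.png', 'img/_thumb.png'],
  ['exports/nul', 'exports/_nul'],
  ['module.test.ts', 'module.test.ts'],
  ['module..ts', 'module_ts'],
  ['', '_'],
];

const failures = VECTORS.filter(([input, expected]) => sanitiseName(input) !== expected);
console.log(failures.length === 0 ? 'all vectors pass' : failures);
```

Running the same vectors against a Python or other port gives a quick parity check without needing the runner.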
Does it read 7z and RAR?
Yes, via a libarchive WASM module (it also reads bzip2 and xz). ZIP, GZIP and TAR use pure-JS fflate. Output is always a plain ZIP regardless of input.
What happens to two names that collapse to one?
The later entry overwrites the earlier — they share the same output key and the tool doesn't auto-suffix. In a pipeline, compare entry counts in vs out to detect silent drops.
Can it accept a password for encrypted inputs?
No. The reader is called without a password, so encrypted ZIPs error. Decrypt upstream. For producing encrypted archives, see encrypted-zip-creator.
Why output ZIP instead of the original format?
The writer only produces ZIP (fflate), the most portable container. If your consumer needs the source format, chain archive-format-converter after sanitising.
What are the size limits for unattended runs?
Free 50 MB / 500 entries; Pro 500 MB / 50,000; Pro-media and Developer 2 GB / 500,000. Use a higher tier for automated feeds and validate size before submitting to avoid a rejection.
Is it safe to run on untrusted customer uploads?
Yes — that's a core use case. It runs locally in the browser, so the upload never reaches a third-party server, and it specifically defangs traversal and unsafe characters before any extraction step touches the file.
Which sibling tools fit a dev pipeline?
archive-integrity-tester to validate, file-listing-generator to inventory entries, and path-prefix-remover to normalise structure — sanitise as the safety step in the chain.
Privacy first
Every JAD Archive tool runs entirely in your browser using fflate, @zip.js/zip.js, and the libarchive WASM bridge. Your archives never leave your device — verified by zero outbound network requests during processing.