How to clean emoji out of corporate & legal markdown
- Step 1: Load the draft. Paste the memo or upload the .md / .docx-exported-to-.md source. One document per run.
- Step 2: Start with Replace for review. Set Mode to Replace with [name] so every emoji becomes a visible [...] token. Send that to your reviewer to confirm none of the glyphs were load-bearing.
- Step 3: Keep fenced code and config content safe. Leave Process code blocks too unchecked so any fenced config samples or command lines in the appendix are untouched.
- Step 4: Run De-Emoji and review. Click Run De-Emoji. Scan for double spaces where glyphs were removed and for any :clause-label: style tokens the shortcode matcher may have caught.
- Step 5: Switch to Strip for the final. Once review confirms nothing meaningful is lost, set Mode to Strip (remove) and run again for the clean, emoji-free final.
- Step 6: Export to the publishing format. Copy or download the cleaned Markdown, then convert with md-to-docx for Word review or md-to-pdf-modern for a polished PDF.
Every option this tool exposes (and nothing it does not)
The De-Emoji panel has exactly two controls. There is no allow-list, no per-emoji picker, no 'keep meaningful emoji' toggle, and no name dictionary — these are the real defaults from lib/markdown/markdown-tool-schemas.ts.
| Control (UI label) | Field | Values | Default | Effect |
|---|---|---|---|---|
| Mode → Strip (remove) | mode | strip | Yes (default) | Deletes each matched emoji and each :shortcode: outright. Surrounding spaces are left in place — they are not collapsed. |
| Mode → Replace with [name] | mode | replace | — | Wraps each matched emoji in square brackets (the literal glyph stays inside, e.g. the rocket becomes [ rocket-glyph ]) and turns :rocket: into [rocket]. |
| Process code blocks too | includeCodeBlocks | true / false | false (off) | Off: fenced code blocks are skipped so command-line emoji or test fixtures stay intact. On: emoji inside fences are processed identically to prose. |
Before / after by mode (verified output)
Exact transformations produced by the strip and replace passes. Note the leftover space after removal and the bracket-the-glyph behaviour in Replace mode.
| Input fragment | Strip output | Replace output |
|---|---|---|
| Ship it (rocket-emoji) today | `Ship it  today` (double space remains) | Ship it [rocket-glyph] today |
| Done :white_check_mark: | Done (shortcode removed) | Done [white_check_mark] |
| Status: :) | Status: :) (ASCII emoticon untouched) | Status: :) (ASCII emoticon untouched) |
| (c) 2026 ACME (tm) | (c) 2026 ACME (tm) (marks survive) | (c) 2026 ACME (tm) (marks survive) |
| thumb-emoji + skin-tone | (empty; both code points removed) | [thumb-glyph][skin-glyph] (two tokens) |
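The rows above can be reproduced with a small stand-in for the two passes. This is a sketch under assumptions, not the code in lib/markdown/markdown-processor.ts: the names `deEmoji`, `EMOJI`, and `SHORTCODE` are hypothetical, the emoji class is built from the ranges this page lists, and the shortcode shape (lowercase letters, digits, underscores, and hyphens between colons) is a guess.

```typescript
type Mode = "strip" | "replace";

// Code points in the documented blocks, each optionally followed by U+FE0F.
// U+1F300-1FAFF already contains the Emoticons, Transport, and Supplemental blocks.
const EMOJI = /[\u{1F300}-\u{1FAFF}\u{2600}-\u{27BF}\u{1F1E6}-\u{1F1FF}]\uFE0F?/gu;

// Assumed shortcode shape; the tool's real pattern may differ.
const SHORTCODE = /:([a-z][a-z0-9_-]*):/g;

function deEmoji(text: string, mode: Mode): string {
  if (mode === "strip") {
    // Delete matches outright; surrounding spaces are deliberately left alone.
    return text.replace(EMOJI, "").replace(SHORTCODE, "");
  }
  // Replace: keep the literal glyph inside brackets, turn :name: into [name].
  return text
    .replace(EMOJI, (glyph) => `[${glyph}]`)
    .replace(SHORTCODE, (_match, name: string) => `[${name}]`);
}

console.log(JSON.stringify(deEmoji("Ship it \u{1F680} today", "strip")));
// prints "Ship it  today" (two spaces)
console.log(deEmoji("Done :white_check_mark:", "replace"));
// prints Done [white_check_mark]
```

Because each code point is matched on its own, a thumb plus skin-tone modifier comes out as two bracket tokens in this sketch, which matches the last row of the table.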
What counts as an emoji (matched Unicode ranges)
The de-emoji pass matches these Unicode blocks, each optionally followed by the U+FE0F variation selector. Anything outside these blocks is left untouched — verified directly against the matcher in lib/markdown/markdown-processor.ts.
| Block / range | Examples it matches | What it does NOT catch | Why it matters |
|---|---|---|---|
| U+1F600–1F64F (Emoticons) | grinning, wink, crying, neutral faces | ASCII emoticons like :) :-( ^_^ (plain text, never matched) | Face emoji are the most common decorative noise in AI-drafted prose |
| U+1F300–1FAFF (Misc Symbols & Pictographs, Supplemental, Extended-A) | weather, food, animals, hands, objects, newer 2020+ emoji | Letterlike symbols (tm) U+2122, (c) U+00A9, (R) U+00AE | Trademark and copyright marks survive — they are not emoji |
| U+2600–27BF (Misc Symbols + Dingbats) | check mark, star, heart, warning, arrows-as-dingbats | Box-drawing, geometric shapes below U+2600, math operators | A literal check (check) in a feature list is treated as decorative and removed |
| U+1F680–1F6FF (Transport & Map) | rocket, car, plane, warning-triangle transport icons | Map pin variants outside the block | Common in README hero lines and roadmap bullets |
| U+1F900–1F9FF (Supplemental Symbols) | additional hands, gestures, body parts, fantasy | Does not merge a skin-tone modifier with its base emoji; the modifier is matched as its own token | Skin-tone + base render as two bracket tokens in Replace mode |
| U+1F1E6–1F1FF (Regional Indicators) | the two-letter pieces that combine into flags | Does not treat a flag as a single grapheme; it is matched as two code points | A flag becomes two bracket tokens in Replace mode, not one |
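To check whether a particular character falls inside the listed blocks, the table can be written out as data. The sketch below is illustrative only: `MATCHED_RANGES`, `matchedBlock`, and `countMatches` are hypothetical names, and the real matcher may be organised quite differently. Note that the Emoticons, Transport & Map, and Supplemental Symbols rows all sit inside U+1F300–1FAFF, so the narrower rows are listed first to get the more specific label.

```typescript
interface Range { name: string; from: number; to: number }

// Narrower blocks first, so lookups report the most specific name.
const MATCHED_RANGES: Range[] = [
  { name: "Emoticons", from: 0x1f600, to: 0x1f64f },
  { name: "Transport & Map", from: 0x1f680, to: 0x1f6ff },
  { name: "Supplemental Symbols", from: 0x1f900, to: 0x1f9ff },
  { name: "Misc Symbols & Pictographs through Extended-A", from: 0x1f300, to: 0x1faff },
  { name: "Misc Symbols + Dingbats", from: 0x2600, to: 0x27bf },
  { name: "Regional Indicators", from: 0x1f1e6, to: 0x1f1ff },
];

// Returns the name of the first listed block containing the code point, or null.
function matchedBlock(codePoint: number): string | null {
  const hit = MATCHED_RANGES.find((r) => codePoint >= r.from && codePoint <= r.to);
  return hit ? hit.name : null;
}

// Counts how many code points in the text would be matched individually.
function countMatches(text: string): number {
  let count = 0;
  for (const ch of text) {
    // for...of walks the string by code point, not by UTF-16 unit.
    if (matchedBlock(ch.codePointAt(0)!) !== null) count++;
  }
  return count;
}

console.log(matchedBlock(0x1f680)); // prints Transport & Map
console.log(matchedBlock(0x2122)); // prints null (trademark sign is outside every block)
console.log(countMatches("\u{1F1FA}\u{1F1F8}")); // prints 2 (a flag is two regional indicators)
```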
Cookbook
Formal-document fragments before and after. (rocket), (check), (warning) stand in for literal emoji glyphs.
De-emoji a policy heading
An internal handbook heading carries a decorative emoji that has no place in a compliance-reviewed document. Strip removes it; mind the leftover space.
- Mode: Strip
- Input: `## (warning) Acceptable Use Policy`
- Output: `##  Acceptable Use Policy`
- Note: the heading marker now has a double space before the text.
Preserve trademark and copyright in a footer
Legal footers must keep their marks. The matcher leaves them alone — this is the expected no-op.
- Mode: Strip
- Input: (c) 2026 ACME Corp. ACME(tm) and Widget(R) are trademarks.
- Output: (c) 2026 ACME Corp. ACME(tm) and Widget(R) are trademarks.
- Note: all marks preserved; nothing matched.
Review pass with bracketed placeholders
Before deleting anything, show counsel where emoji were so they can confirm none signalled approval status or severity.
- Mode: Replace with [name]
- Input: Q3 results :chart_with_upwards_trend: exceeded plan (check)
- Output: Q3 results [chart_with_upwards_trend] exceeded plan [(check)]
- Note: the shortcode becomes a word; the Unicode glyph is kept inside brackets.
Watch for clause labels caught as shortcodes
A contract that uses :term: style labels will be affected by the shortcode matcher. This shows the false positive to watch for.
- Mode: Strip
- Input: See clause :indemnity: and exhibit (check)
- Output: `See clause  and exhibit` (followed by a trailing space where the glyph was)
- Note: the :indemnity: label was removed as if it were a shortcode, so review before trusting.
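Before running Strip on a contract, it can help to list every colon-wrapped token so a reviewer sees which labels are at risk. The following is a minimal sketch, not part of the tool: `listShortcodeLikeTokens` and `Candidate` are hypothetical names, and the pattern assumes shortcodes are lowercase letters, digits, underscores, and hyphens between colons, which may not match the tool's real matcher exactly.

```typescript
// Assumed shortcode shape; adjust if the real matcher is stricter or looser.
const SHORTCODE_LIKE = /:([a-z][a-z0-9_-]*):/;

interface Candidate { token: string; line: number }

function listShortcodeLikeTokens(markdown: string): Candidate[] {
  const found: Candidate[] = [];
  markdown.split("\n").forEach((text, index) => {
    // A fresh global regex per line keeps lastIndex from leaking between lines.
    const pattern = new RegExp(SHORTCODE_LIKE.source, "g");
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      found.push({ token: m[0], line: index + 1 });
    }
  });
  return found;
}

const draft = "See clause :indemnity: and exhibit\nDone :white_check_mark:\nStatus: :)";
console.log(listShortcodeLikeTokens(draft).map((c) => `${c.line}: ${c.token}`));
// prints [ '1: :indemnity:', '2: :white_check_mark:' ]
```

The list cannot tell a legal label from an emoji shortcode any better than the tool can; it only makes the candidates visible before anything is deleted.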
Hand off to Word with marks intact
Clean, then convert. The de-emoji output is plain Markdown ready for the docx converter.
- Step 1: Mode: Strip, run De-Emoji on the memo.
- Step 2: Copy the cleaned Markdown.
- Step 3: Paste into md-to-docx (/markdown-tools/md-to-docx) for tracked-change review in Word.
- Result: an emoji-free .docx with (c)/(tm)/(R) preserved.
Edge cases and what actually happens
Contract `:clause:` labels removed as shortcodes
Expected: The shortcode matcher cannot distinguish an emoji shortcode from a colon-wrapped legal label like :indemnity:. It will strip or bracket those too. For contracts that use colon-delimited term labels, do a Replace pass first and verify, or rename labels before cleaning.
Trademark / copyright marks preserved
Preserved: (tm), (c), and (R) are not in the matched emoji ranges, so trademark and copyright notices in legal footers survive unchanged. This is correct and intentional.
Double space left where an emoji was
By design: Removal does not collapse surrounding whitespace, so a heading like `## (warning) Policy` becomes `##  Policy`. Tidy with md-prettifier before final publication if your renderer shows the gap.
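If a quick manual tidy is preferred, the leftover gap can be closed with a one-line replacement. This is a hypothetical helper (`collapseInnerSpaces` is an illustrative name) and says nothing about how md-prettifier works internally. It only collapses runs of spaces that sit between two non-space characters, so leading indentation and trailing hard-break spaces are not touched.

```typescript
function collapseInnerSpaces(line: string): string {
  // (\S) keeps the character before the gap; the lookahead requires text after it.
  return line.replace(/(\S) {2,}(?=\S)/g, "$1 ");
}

console.log(collapseInnerSpaces("##  Acceptable Use Policy"));
// prints ## Acceptable Use Policy
```

Apply it to prose and heading lines only; inline code spans and hand-aligned tables can contain intentional double spaces.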
Accented executive names kept
Preserved: Names like José, Renée, or Müller use accented Latin letters, which are outside the emoji ranges and are never modified. Only true emoji code points are targeted.
Fenced config blocks untouched by default
By design: With 'Process code blocks too' off, an appendix of YAML or shell config keeps any emoji it contains. Enable the option only if policy requires code samples to be emoji-free too.
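One way to picture the skip is a line-by-line pass that toggles an "inside fence" flag and leaves those lines alone. The sketch below assumes simple, unnested backtick or tilde fences and is not the tool's actual implementation; `transformOutsideFences` is a hypothetical name, and the `includeCodeBlocks` parameter merely mirrors the field from the options table.

```typescript
function transformOutsideFences(
  markdown: string,
  transform: (line: string) => string,
  includeCodeBlocks = false,
): string {
  const lines = markdown.split("\n");
  if (includeCodeBlocks) {
    // Option on: every line is processed exactly like prose.
    return lines.map((line) => transform(line)).join("\n");
  }
  let inFence = false;
  return lines
    .map((line) => {
      // A line opening with three backticks or tildes starts or ends a fence.
      if (/^\s*(`{3}|~{3})/.test(line)) {
        inFence = !inFence;
        return line;
      }
      return inFence ? line : transform(line);
    })
    .join("\n");
}
```

Passing any strip function as `transform` leaves emoji inside the fenced appendix intact while cleaning the prose around it. Nested fences or fences of differing lengths would need a fuller Markdown parser.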
ASCII smiley in a quoted email survives
Preserved: If the memo quotes an email containing :), that ASCII emoticon is not matched and stays. Remove it manually if the formal version must drop it.
Document over the character limit
Limit: Free caps pasted text at 500,000 characters; Pro 5,000,000; Pro-media 20,000,000; Developer unlimited. A large compiled handbook may exceed Free; upload the file or split sections with md-splitter.
Replace output still contains the literal glyph
Expected: For Unicode emoji, Replace keeps the original glyph inside the brackets rather than naming it. If a reviewer needs English names, that is not what this mode produces; use the bracketed token only as a positional marker.
Keycap-style numbered badges not removed
Preserved: Numbered keycap emoji (a digit plus a combining keycap) are built on an ASCII digit and the U+20E3 mark, which is outside the matched ranges. The digit stays; a stray combining mark may remain. Remove manually if it appears.
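For the manual removal, a keycap sequence is a base character (0-9, # or *), an optional U+FE0F variation selector, and U+20E3 COMBINING ENCLOSING KEYCAP. The helper below is a hypothetical clean-up step that is not part of De-Emoji; `flattenKeycaps` is an illustrative name.

```typescript
function flattenKeycaps(text: string): string {
  // Keep the base character, drop the variation selector and the keycap mark.
  return text.replace(/([0-9#*])\uFE0F?\u20E3/g, "$1");
}

console.log(flattenKeycaps("Step 1\uFE0F\u20E3 Review"));
// prints Step 1 Review
```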
Frequently asked questions
Will this delete our trademark and copyright symbols?
No. (tm) (U+2122), (c) (U+00A9), and (R) (U+00AE) are outside the matched emoji ranges and are preserved exactly. Only emoji code points are targeted, so legal footers are safe.
We use `:term:` labels in contracts — will those be affected?
Yes, that is the one thing to watch. The shortcode matcher catches any lowercase :word:, so a clause label like :indemnity: is treated like an emoji shortcode. Run a Replace pass first to see exactly what would change, then decide.
Is the document uploaded anywhere?
No. Processing is entirely in-browser, which matters for pre-announcement, M&A, and HR-sensitive drafts. Nothing about the content leaves your machine.
How do I keep an audit trail for legal review?
Use Replace mode for the review copy. Every former emoji shows as a bracketed token so counsel can confirm none carried meaning (approval, severity, status). Then switch to Strip for the final, emoji-free document.
Why is there a double space in my heading after stripping?
Removal leaves the surrounding spaces in place, so `## (warning) Policy` becomes `##  Policy` (two spaces after the marker). Clean the whitespace with md-prettifier before publishing.
Does it handle accented names like José or Müller?
Yes — by leaving them alone. Accented Latin letters are not emoji, so they are untouched. The tool never alters ordinary text.
Can I produce a Word document from the result?
Clean the Markdown here, then run it through md-to-docx for a .docx, or md-to-pdf-modern for a PDF. This tool itself outputs Markdown text only.
What about emoji inside config or code samples in an appendix?
They are preserved by default because 'Process code blocks too' is off. Enable that checkbox only if your standard requires code samples to be emoji-free as well.
Does Replace translate a Unicode rocket to the word [rocket]?
No. Only :rocket: shortcodes become [rocket]. A literal Unicode rocket is wrapped as [ glyph ] with the original character kept inside. Treat the bracket as a position marker, not a name.
Can our comms team standardise this step?
Yes. The pass is deterministic and idempotent — running Strip twice yields the same result — so it is safe as a fixed pre-publish step in a documented workflow.
Will it remove ASCII emoticons from a quoted email?
No. :) and similar are plain ASCII, not Unicode emoji, so they are kept. Remove them by hand if the formal version must not contain them.
How large a document can it handle?
Pasted text caps at 500,000 characters (Free), 5,000,000 (Pro), 20,000,000 (Pro-media), or unlimited (Developer). File uploads follow the 1/10/50/500 MB tiers. The character cap is independent of file size.
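A pre-flight length check against these caps might look like the sketch below. The tier keys and the names `PASTE_CHAR_CAP` and `fitsPasteCap` are hypothetical, and counting with `String.length` (UTF-16 code units) is an assumption; the tool may count characters differently.

```typescript
type Tier = "free" | "pro" | "proMedia" | "developer";

// Paste caps as quoted above; Developer is modelled as unlimited.
const PASTE_CHAR_CAP: Record<Tier, number> = {
  free: 500000,
  pro: 5000000,
  proMedia: 20000000,
  developer: Infinity,
};

function fitsPasteCap(text: string, tier: Tier): boolean {
  return text.length <= PASTE_CHAR_CAP[tier];
}
```

If the check fails for a compiled handbook, uploading the file or splitting it first avoids hitting the paste cap.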
Privacy first
All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.