How to remove emojis from academic markdown before submission
- Step 1: Load the manuscript. Paste the Markdown or upload the .md file. The tool processes one manuscript at a time; merge chapters first with md-merger if needed.
- Step 2: Audit with Replace first. Set Mode to Replace with [name] so every emoji is bracketed. Skim for any glyph that carried meaning (a reviewer's marginal note that became prose, a status flag) before you remove anything.
- Step 3: Protect listings. Keep 'Process code blocks too' unchecked so fenced algorithm or code listings keep any characters they need.
- Step 4: Run De-Emoji. Click Run De-Emoji. Check the output for double spaces left where an emoji was removed and for any :eq: / :fig: cross-reference keys the shortcode matcher may have caught.
- Step 5: Switch to Strip for the clean copy. Once the audit is clear, set Mode to Strip (remove) and run again to delete the emoji and shortcodes outright.
- Step 6: Convert to the submission format. Copy or download the cleaned Markdown, then render with md-to-pdf-academic for a paper-style PDF or md-to-docx for a Word template.
Every option this tool exposes (and nothing it does not)
The De-Emoji panel has exactly two controls. There is no allow-list, no per-emoji picker, no 'keep meaningful emoji' toggle, and no name dictionary. The defaults listed below are the ones defined in lib/markdown/markdown-tool-schemas.ts.
| Control (UI label) | Field | Values | Default | Effect |
|---|---|---|---|---|
| Mode → Strip (remove) | mode | strip | Yes (default) | Deletes each matched emoji and each :shortcode: outright. Surrounding spaces are left in place — they are not collapsed. |
| Mode → Replace with [name] | mode | replace | — | Wraps each matched emoji in square brackets (the literal glyph stays inside, e.g. the rocket becomes [ rocket-glyph ]) and turns :rocket: into [rocket]. |
| Process code blocks too | includeCodeBlocks | true / false | false (off) | Off: fenced code blocks are skipped so command-line emoji or test fixtures stay intact. On: emoji inside fences are processed identically to prose. |
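The two controls map onto a small options object. The sketch below is an assumption about shape only: the field names `mode` and `includeCodeBlocks` and their defaults come from the table, while the names `DeEmojiMode`, `DeEmojiOptions`, `DEFAULT_OPTIONS`, and `withDefaults` are hypothetical and are not taken from lib/markdown/markdown-tool-schemas.ts.

```typescript
// Hypothetical shape for the two De-Emoji controls. Only `mode` and
// `includeCodeBlocks` (and their defaults) come from the table; every
// other name here is an illustrative assumption.
type DeEmojiMode = "strip" | "replace";

interface DeEmojiOptions {
  mode: DeEmojiMode;
  includeCodeBlocks: boolean;
}

// Defaults as listed in the table: Strip mode, fenced code skipped.
const DEFAULT_OPTIONS: DeEmojiOptions = {
  mode: "strip",
  includeCodeBlocks: false,
};

// Fill in any option the caller left out.
function withDefaults(partial: Partial<DeEmojiOptions> = {}): DeEmojiOptions {
  return {
    mode: partial.mode ?? DEFAULT_OPTIONS.mode,
    includeCodeBlocks:
      partial.includeCodeBlocks ?? DEFAULT_OPTIONS.includeCodeBlocks,
  };
}

console.log(withDefaults({ mode: "replace" }));
```

Because there are only two fields, an audit pass and a clean pass differ by a single value: `withDefaults({ mode: "replace" })` versus `withDefaults()`.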
What counts as an emoji (matched Unicode ranges)
The de-emoji pass matches these Unicode blocks, each optionally followed by the U+FE0F variation selector. Anything outside these blocks is left untouched — verified directly against the matcher in lib/markdown/markdown-processor.ts.
| Block / range | Examples it matches | What it does NOT catch | Why it matters |
|---|---|---|---|
| U+1F600–1F64F (Emoticons) | grinning, wink, crying, neutral faces | ASCII emoticons like :) :-( ^_^ (plain text, never matched) | Face emoji are the most common decorative noise in AI-drafted prose |
| U+1F300–1FAFF (Misc Symbols & Pictographs, Supplemental, Extended-A) | weather, food, animals, hands, objects, newer 2020+ emoji | Letterlike symbols (tm) U+2122, (c) U+00A9, (R) U+00AE | Trademark and copyright marks survive — they are not emoji |
| U+2600–27BF (Misc Symbols + Dingbats) | check mark, star, heart, warning, arrows-as-dingbats | Box-drawing, geometric shapes below U+2600, math operators | A literal check (check) in a feature list is treated as decorative and removed |
| U+1F680–1F6FF (Transport & Map) | rocket, car, plane, warning-triangle transport icons | Map pin variants outside the block | Common in README hero lines and roadmap bullets |
| U+1F900–1F9FF (Supplemental Symbols) | additional hands, gestures, body parts, fantasy | Skin-tone modifier in isolation is matched as its own token | Skin-tone + base render as two bracket tokens in Replace mode |
| U+1F1E6–1F1FF (Regional Indicators) | the two-letter pieces that combine into flags | The flag as a single grapheme is matched as two code points | A flag becomes two bracket tokens in Replace mode, not one |
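The ranges in the table can be written as a single character class followed by an optional U+FE0F. This is a minimal sketch built from the table, not a copy of the matcher in lib/markdown/markdown-processor.ts; `EMOJI_PATTERN` and `findEmoji` are hypothetical names, and the real pattern may be organised differently.

```typescript
// Approximation of the matcher described in the table above. The real
// pattern in lib/markdown/markdown-processor.ts may be written differently.
const EMOJI_PATTERN: string =
  "[" +
  "\\u{1F600}-\\u{1F64F}" + // Emoticons
  "\\u{1F300}-\\u{1FAFF}" + // Misc Symbols & Pictographs through Extended-A
  "\\u{2600}-\\u{27BF}" +   // Misc Symbols + Dingbats
  "\\u{1F680}-\\u{1F6FF}" + // Transport & Map
  "\\u{1F900}-\\u{1F9FF}" + // Supplemental Symbols
  "\\u{1F1E6}-\\u{1F1FF}" + // Regional Indicators
  "]\\uFE0F?";              // optional variation selector

// Return every match in document order (hypothetical helper).
function findEmoji(text: string): string[] {
  const matches = text.match(new RegExp(EMOJI_PATTERN, "gu"));
  return matches === null ? [] : matches.slice();
}

console.log(findEmoji("Ship it \u{1F680} today").length);  // rocket: one match
console.log(findEmoji("\u{1F1EB}\u{1F1F7}").length);       // flag: two matches
console.log(findEmoji("Status: :) \u00A9 \u2122").length); // nothing matched
```

The `u` flag makes the class operate on whole code points, which is why a flag built from two regional indicators yields two matches rather than one.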
Before / after by mode (verified output)
Exact transformations produced by the strip and replace passes. Note the leftover space after removal and the bracket-the-glyph behaviour in Replace mode.
| Input fragment | Strip output | Replace output |
|---|---|---|
| Ship it (rocket-emoji) today | `Ship it  today` (double space remains) | Ship it [rocket-glyph] today |
| Done :white_check_mark: | Done (shortcode removed) | Done [white_check_mark] |
| Status: :) | Status: :) (ASCII emoticon untouched) | Status: :) (ASCII emoticon untouched) |
| (c) 2026 ACME (tm) | (c) 2026 ACME (tm) (marks survive) | (c) 2026 ACME (tm) (marks survive) |
| thumb-emoji + skin-tone | (empty; both code points removed) | [thumb-glyph][skin-glyph] (two tokens) |
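The rows above can be reproduced with two regular-expression replacements per mode. This is a sketch under stated assumptions, not the tool's implementation: `deEmoji`, `EMOJI_SOURCE`, and `SHORTCODE_SOURCE` are hypothetical names, the emoji class collapses the documented ranges into their widest spans, and the shortcode shape (lowercase letters and underscores between colons) is a guess based on the examples shown here.

```typescript
type StripOrReplace = "strip" | "replace";

// U+1F300-1FAFF already contains the Emoticons, Transport and Supplemental
// blocks, so three spans cover all the documented ranges.
const EMOJI_SOURCE =
  "[\\u{1F300}-\\u{1FAFF}\\u{2600}-\\u{27BF}\\u{1F1E6}-\\u{1F1FF}]\\uFE0F?";

// ASSUMPTION: a shortcode is lowercase letters/underscores between colons.
// The real matcher may accept more characters.
const SHORTCODE_SOURCE = ":([a-z_]+):";

function deEmoji(text: string, mode: StripOrReplace): string {
  const emojiRe = new RegExp(EMOJI_SOURCE, "gu");
  const shortcodeRe = new RegExp(SHORTCODE_SOURCE, "g");
  if (mode === "strip") {
    // Delete matches outright; surrounding spaces are left alone.
    return text.replace(emojiRe, "").replace(shortcodeRe, "");
  }
  // Replace: bracket the literal glyph ($&) and the shortcode name ($1).
  return text.replace(emojiRe, "[$&]").replace(shortcodeRe, "[$1]");
}

console.log(JSON.stringify(deEmoji("Ship it \u{1F680} today", "strip")));
console.log(deEmoji("Done :white_check_mark:", "replace"));
console.log(deEmoji("Status: :)", "strip"));
```

Printing through `JSON.stringify` makes the leftover double space in the Strip result visible in a terminal.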
Cookbook
Manuscript fragments before and after. (rocket), (check), (warning) stand in for literal emoji glyphs; math is shown literally.
Strip an emoji from a section flag
A co-author flagged a section heading with an emoji. Strip removes it before the manuscript hits the template.
- Mode: Strip
- Input: `## Results (check) (finalised)`
- Output: `## Results  (finalised)`
- Note: Double space remains where the emoji was.
Math and Greek survive untouched
Inline math, operators, and Greek letters are not emoji. This is the expected no-op that proves equations are safe.
- Mode: Strip
- Input: `We set alpha = 0.05 and report effect size d >= 0.8 (degree sign 25C).`
- Output: `We set alpha = 0.05 and report effect size d >= 0.8 (degree sign 25C).`
- Note: Nothing matched; statistics text preserved.
Accented author names kept
Author lists and citations full of diacritics must not change. Only emoji are targeted.
- Mode: Strip
- Input: `After Erdos, Poincare, and Schrodinger (rocket new section)`
- Output: `After Erdos, Poincare, and Schrodinger`
- Note: Names preserved; trailing emoji removed.
Cross-reference key caught as a shortcode
Some Markdown preprocessors use :fig: / :eq: cross-references. The shortcode matcher will catch them — verify before trusting.
- Mode: Strip
- Input: `As shown in :fig: and proven in :eq: the bound holds (check)`
- Output: `As shown in  and proven in  the bound holds`
- Note: :fig: and :eq: removed as shortcodes. Review and restore if needed.
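Before a Strip pass, it can help to list every colon-delimited token along with its line number, so cross-reference keys are spotted early. The helper below is a hypothetical pre-check that sits outside the tool; `listShortcodeLikeTokens` and `ShortcodeHit` are illustrative names, and the token shape (lowercase letters and underscores between colons) is an assumption rather than the tool's actual shortcode pattern.

```typescript
interface ShortcodeHit {
  line: number;  // 1-based line number in the manuscript
  token: string; // the matched text, colons included
}

// List every :word: token so keys such as :fig: or :eq: can be reviewed
// before they are stripped. ASSUMPTION: lowercase letters and underscores.
function listShortcodeLikeTokens(markdown: string): ShortcodeHit[] {
  const hits: ShortcodeHit[] = [];
  markdown.split("\n").forEach((text, index) => {
    const found = text.match(/:[a-z_]+:/g);
    if (found === null) {
      return;
    }
    found.forEach((token) => {
      hits.push({ line: index + 1, token: token });
    });
  });
  return hits;
}

const sampleHits = listShortcodeLikeTokens(
  "As shown in :fig: and proven in :eq: the bound holds\nTodo :memo:"
);
sampleHits.forEach((hit) => console.log(hit.line + ": " + hit.token));
```

Any token in the list that is not an emoji name is a candidate for conversion or manual restoration after cleaning.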
Audit a long manuscript before deleting
Replace mode turns a daunting search into a skim: every emoji is bracketed in place.
- Mode: Replace with [name]
- Input: `Discussion (warning) needs a citation here :memo:`
- Output: `Discussion [(warning)] needs a citation here [memo]`
- Note: Now grep for [ to find every former emoji site.
Edge cases and what actually happens
Cross-reference keys like :fig: / :eq: stripped
Expected: The shortcode matcher catches any lowercase :word:, so cross-reference or equation keys written in that form are removed or bracketed. If your manuscript uses colon-delimited reference keys, run Replace first and verify, or convert them before cleaning.
Math operators and Greek letters preserved
Preserved: Mathematical operators, Greek letters, the degree sign, and per-mille marks are outside the matched emoji ranges, so equations and statistics text are never altered. The pass is safe for STEM manuscripts.
Accented author names preserved
Preserved: Diacritics in names and citations (Erdős, Poincaré, Gödel) are not emoji and pass through unchanged.
Double space after a removed emoji
By design: Strip leaves the spaces that surrounded a glyph. `Results (check) here` becomes `Results  here`. Tidy with md-prettifier before rendering the submission PDF.
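For readers who want to see what tidying those leftover spaces involves, here is a minimal sketch. It is independent of md-prettifier, whose behaviour is not described here; `collapseInnerSpaces` is a hypothetical helper. It collapses only runs of spaces that sit between two non-space characters, so list indentation and trailing double-space line breaks are left alone, and it skips fenced code.

```typescript
// Collapse runs of two or more spaces between words, outside fenced code.
// Leading indentation and trailing spaces (Markdown hard breaks) are kept.
function collapseInnerSpaces(markdown: string): string {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      // A line starting with 3+ backticks or tildes opens or closes a fence.
      if (/^\s*(`{3,}|~{3,})/.test(line)) {
        inFence = !inFence;
        return line;
      }
      if (inFence) {
        return line; // leave code listings byte-for-byte intact
      }
      // "$1 " keeps the preceding character and a single space.
      return line.replace(/(\S) {2,}(?=\S)/g, "$1 ");
    })
    .join("\n");
}

console.log(collapseInnerSpaces("## Results  (finalised)"));
```

One side effect of this design: padded table cells would also be tightened, which does not change how a table renders but does change its source layout.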
Algorithm listings in fences untouched
By design: With 'Process code blocks too' off, fenced code or pseudocode keeps any characters it contains. Enable the option only if a listing genuinely contains decorative emoji you want gone.
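The skip-fences behaviour can be pictured as a line-by-line pass that toggles an "inside fence" flag. The sketch below is one simple way to get that effect and is not the tool's code: `mapOutsideFences` is a hypothetical name, and fence detection is reduced to "any line starting with three or more backticks or tildes", which ignores edge cases such as mismatched fence lengths.

```typescript
// Apply `transform` to each line, leaving fenced code untouched unless
// includeCodeBlocks is true. Fence marker lines are never rewritten.
function mapOutsideFences(
  markdown: string,
  transform: (line: string) => string,
  includeCodeBlocks: boolean = false
): string {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      const isFenceMarker = /^\s*(`{3,}|~{3,})/.test(line);
      if (isFenceMarker) {
        inFence = !inFence;
        return line;
      }
      if (inFence && !includeCodeBlocks) {
        return line; // protected listing
      }
      return transform(line);
    })
    .join("\n");
}

// Illustrative transform: delete code points in U+1F300-1FAFF.
const dropPictographs = (line: string): string =>
  line.replace(new RegExp("[\\u{1F300}-\\u{1FAFF}]", "gu"), "");

console.log(mapOutsideFences("Run \u{1F680}\n~~~\necho \u{1F680}\n~~~", dropPictographs));
```

Passing `true` as the third argument corresponds to checking 'Process code blocks too': the same transform then runs on lines inside fences as well.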
Emoji-only reviewer comment vanishes
Expected: If a tracked comment was reduced to just an emoji and then folded into prose, Strip removes it entirely. Use Replace first to catch any such comment that should have been a real note.
Manuscript over the character limit
Limit: Free caps pasted text at 500,000 characters; Pro 5,000,000; Pro-media 20,000,000; Developer unlimited. A book-length manuscript may exceed the Free cap; upload the file or split it with md-splitter.
Replace keeps the literal glyph, not a LaTeX name
Expected: Replace wraps a Unicode emoji as [ glyph ]; it does not emit a LaTeX command or an English name. Use the bracket only as a positional marker during audit, then Strip for the clean copy.
Newest Unicode emoji above U+1FAFF
Mostly covered: The matcher reaches the U+1FAFF Extended-A block. A glyph from a Unicode release that assigns code points above that range may not be caught. If one survives, it is outside the implemented ranges.
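One hedged way to look for survivors after a Strip pass is to scan the cleaned text for any character carrying the Unicode `Extended_Pictographic` property, which JavaScript regular expressions expose through `\p{...}` with the `u` flag. This check is not part of the tool; `findPictographicSurvivors` and `toCodePointLabel` are hypothetical helpers. That property also covers the copyright, registered, and trademark signs, so those expected survivors will appear in the report too.

```typescript
// Format one character (one or two UTF-16 units) as U+XXXX.
function toCodePointLabel(ch: string): string {
  const hi = ch.charCodeAt(0);
  const lo = ch.charCodeAt(1); // NaN when ch is a single UTF-16 unit
  const isPair = hi >= 0xd800 && hi <= 0xdbff && !isNaN(lo);
  const cp = isPair ? (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000 : hi;
  const hex = cp.toString(16).toUpperCase();
  return "U+" + (hex.length < 4 ? ("0000" + hex).slice(-4) : hex);
}

// Report any pictographic character still present in already-cleaned text.
function findPictographicSurvivors(cleaned: string): string[] {
  const found = cleaned.match(new RegExp("\\p{Extended_Pictographic}", "gu"));
  return found === null ? [] : found.map((ch) => toCodePointLabel(ch));
}

// U+2B50 (a star outside the documented ranges) and the copyright sign.
console.log(findPictographicSurvivors("Rated \u2B50 by \u00A9 ACME"));
```

An empty result means nothing pictographic remains. A non-empty one is a review list: decide entry by entry whether each code point is a legitimate mark or an emoji that the implemented ranges did not reach.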
Frequently asked questions
Will this break my equations or Greek letters?
No. Math operators, Greek letters, the degree sign, and similar symbols are outside the matched emoji ranges and are never touched. The pass is safe for STEM manuscripts.
Are accented author names preserved?
Yes. Diacritics like those in Erdős or Poincaré are ordinary letters, not emoji, so they pass through unchanged.
I use `:fig:` and `:eq:` cross-references — are they safe?
No, watch out. The shortcode matcher catches any lowercase :word:, including cross-reference keys in that form. Run a Replace pass first to see what would change, then convert those keys or restore them after cleaning.
Does Strip remove `:shortcode:` emoji too?
Yes. Strip removes both Unicode emoji and :name: shortcodes such as :memo: or :white_check_mark:. This is intentional in the implementation.
Is my unpublished manuscript uploaded anywhere?
No. The transform runs entirely in your browser, so an embargoed or under-review manuscript never leaves your machine.
How do I audit a long paper before deleting anything?
Use Replace mode. Every emoji becomes a bracketed token in place, so you can search for [ to find each former emoji site and confirm none was meaningful before running Strip.
Why do I see a double space after stripping?
Removal leaves surrounding whitespace untouched, so `Results (check) here` becomes `Results  here`. Clean it with md-prettifier before rendering.
Can I produce a submission-ready PDF or Word file?
Yes — clean here, then render with md-to-pdf-academic for a paper-style PDF or md-to-docx for a journal Word template. This tool outputs Markdown text only.
Does it process code listings?
Not by default. Fenced listings are skipped unless you check 'Process code blocks too', protecting algorithm pseudocode and reproducibility snippets.
Does Replace give a readable name for a Unicode emoji?
Only for shortcodes. :memo: becomes [memo], but a literal Unicode emoji is bracketed with the original glyph inside. The token is a position marker, not a translation.
How large a manuscript can it handle?
Pasted text caps at 500,000 characters (Free), 5,000,000 (Pro), 20,000,000 (Pro-media), or unlimited (Developer). File uploads follow the 1/10/50/500 MB size tiers. The character cap is separate from file size.
Can I merge chapters, clean, then split for submission?
Yes. Combine chapter files with md-merger, run De-Emoji on the whole manuscript, then split back if needed with md-splitter.
Privacy first
All Markdown processing runs locally in your browser using JavaScript. No file is ever uploaded to JAD Apps servers — only metadata counters are saved for signed-in dashboard stats.