How to whitelist vs charset: choosing a font subset strategy
- Step 1: Decide whether your text is fixed or variable. Fixed means you know every character at build time and it never changes per visitor (logo, hero headline, app name, ticker symbols). Variable means the text depends on data or user input (names, search results, comments, any CMS field). This single question decides the strategy.
- Step 2: For fixed text, whitelist the exact characters. Use the [Character Whitelist Builder](/font-tools/character-whitelist-builder). Paste the literal string, including spaces and punctuation. It deduplicates by codepoint and keeps only those glyphs. This gives the smallest output, but any character you forgot renders `.notdef`.
- Step 3: For variable text, pick a Unicode range. Use the [Font Subsetter](/font-tools/font-subsetter) and choose a named subset: Latin, Latin-Ext, Cyrillic, Greek, Vietnamese, or symbols. The [Latin Filter](/font-tools/latin-filter) is a one-click Latin shortcut. The range covers any character in that block, so unexpected accents and names still render.
- Step 4: Confirm coverage before either approach. Run the [Character Coverage Map](/font-tools/character-coverage-map) to see which Unicode blocks the source font actually fills (it scores against 346 real Unicode blocks). A 'Latin' charset only helps if the font has Latin glyphs; a whitelist only helps if the font has your specific characters.
- Step 5: Run the subset and download the TTF. Both tools output `<stem>.<label>.ttf` (uncompressed). The result panel shows glyphs-in-source vs glyphs-kept so you can compare the two strategies' footprints on the same font.
- Step 6: Compress and embed identically. Regardless of strategy, the last steps are the same: [TTF→WOFF2](/font-tools/ttf-to-woff2), then an `@font-face` from the [Font-Face Generator](/font-tools/font-face-generator). For charset subsets, set the CSS `unicode-range` to match so the browser only downloads the subset when it's needed.
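To make the "deduplicates by codepoint" behaviour in Step 2 concrete, here is a minimal Python sketch. The function names are hypothetical and this is not the tool's actual code; the glyph count assumes the font contains every pasted character, maps one glyph per codepoint, and always keeps `.notdef`.

```python
def whitelist_codepoints(text: str) -> list[int]:
    """Return the sorted, de-duplicated codepoints of a pasted string."""
    return sorted({ord(ch) for ch in text})


def expected_glyph_count(text: str) -> int:
    """Unique codepoints plus one for .notdef (assumes full font coverage)."""
    return len(whitelist_codepoints(text)) + 1


for sample in ("JAD APPS", "0123456789.,$€£ /mo"):
    codepoints = whitelist_codepoints(sample)
    listing = " ".join(f"U+{cp:04X}" for cp in codepoints)
    print(f"{sample!r}: {expected_glyph_count(sample)} glyphs -> {listing}")
```

Repeated letters (the two A's in the logo string) and the space each count once, which is why a pasted string usually yields fewer glyphs than it has characters.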
Whitelist vs charset: the decision matrix
Both strategies use JAD's in-browser opentype.js engine, output uncompressed TTF, and drop kerning + OT features. The difference is what you specify and how forgiving it is.
| Dimension | Whitelist (exact list) | Charset (Unicode range) |
|---|---|---|
| JAD tool | Character Whitelist Builder | Font Subsetter / Latin Filter |
| You specify | The literal characters (JAD APPS) | A named block (latin, cyrillic, …) |
| Glyphs kept | Exactly your unique codepoints + .notdef | Every glyph in the range the font has + .notdef |
| Typical output | Smallest (logo: ~6–20 glyphs) | Larger (Latin: ~95–192 glyphs; the range holds 191 codepoints plus .notdef) |
| Breaks when… | Text contains a character you didn't paste | Text uses a script outside the chosen range |
| Best for | Logos, headlines, app names, fixed labels | Body copy, user names, CMS content, i18n |
Named charsets available in the Font Subsetter
The Font Subsetter's range dropdown maps to these UNICODE_SUBSETS definitions. The Whitelist Builder has no dropdown — it only takes a literal string. Ranges are inclusive.
| Charset key | Covers | Codepoint ranges |
|---|---|---|
| `latin` | Basic Latin + Latin-1 Supplement | U+0020–007E, U+00A0–00FF |
| `latin-ext` | Latin Extended-A + Extended-B + Additional | U+0100–024F, U+1E00–1EFF |
| `cyrillic` | Cyrillic + Cyrillic Supplement | U+0400–04FF, U+0500–052F |
| `greek` | Greek + Greek Extended | U+0370–03FF, U+1F00–1FFF |
| `vietnamese` | Latin Ext ranges + đồng sign | U+0100–024F, U+1E00–1EFF, U+20AB |
| `symbols` | General punctuation + super/subscripts + currency | U+2000–206F, U+2070–209F, U+20A0–20CF |
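The table can be turned into matching CSS `unicode-range` values mechanically. The sketch below copies the ranges from the table into a Python dict (the dict reuses the `UNICODE_SUBSETS` name for readability, but its layout is an assumption, not the tool's source) and merges adjacent ranges so that, for example, the two Cyrillic blocks collapse into one.

```python
# Ranges copied from the table above; the data layout is illustrative only.
UNICODE_SUBSETS = {
    "latin": [(0x0020, 0x007E), (0x00A0, 0x00FF)],
    "latin-ext": [(0x0100, 0x024F), (0x1E00, 0x1EFF)],
    "cyrillic": [(0x0400, 0x04FF), (0x0500, 0x052F)],
    "greek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
    "vietnamese": [(0x0100, 0x024F), (0x1E00, 0x1EFF), (0x20AB, 0x20AB)],
    "symbols": [(0x2000, 0x206F), (0x2070, 0x209F), (0x20A0, 0x20CF)],
}


def merge_ranges(ranges):
    """Merge overlapping or directly adjacent inclusive ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def css_unicode_range(charset: str) -> str:
    """Format a named charset as a CSS unicode-range value."""
    parts = []
    for start, end in merge_ranges(UNICODE_SUBSETS[charset]):
        parts.append(f"U+{start:04X}" if start == end else f"U+{start:04X}-{end:04X}")
    return ", ".join(parts)


for name in UNICODE_SUBSETS:
    print(f"{name}: unicode-range: {css_unicode_range(name)};")
```

Generating the descriptor from the same ranges used for subsetting keeps the CSS and the font file from drifting apart.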
Cookbook
Concrete scenarios mapped to the right strategy. The pattern: fixed → whitelist, variable → charset, and confirm coverage either way.
Logo wordmark — whitelist wins
Example: Six unique characters (seven glyphs once .notdef is counted), and the string never changes per visitor. A Latin charset would keep ~200 glyphs you'll never render. Whitelist the exact string.
```
Text:     "JAD APPS" (fixed)
Strategy: WHITELIST
Tool:     /font-tools/character-whitelist-builder
Paste:    JAD APPS
  -> 7 glyphs (6 unique characters + .notdef), ~5 KB TTF, ~2 KB WOFF2
(A 'latin' charset here would keep ~200 glyphs, ~30 KB.)
```
English blog body — charset wins
Example: Body copy varies per post and may include any Latin-1 character (accented author names, ©, fractions). Whitelisting build-time content would tofu on the next article. Use a Latin range, and add the 'symbols' charset if the copy uses curly quotes or em dashes, which sit outside Latin-1.
```
Text:     article body (varies per post)
Strategy: CHARSET
Tool:     /font-tools/font-subsetter (charset: latin)
  -> keeps U+0020-007E + U+00A0-00FF
  -> survives 'naïve', 'café', ©, ½
  -> curly quotes and em dashes need the 'symbols' charset as well
(A whitelist of today's article tofus on tomorrow's.)
```
Multilingual site — charset, possibly two
Example: An English + Russian site needs Latin and Cyrillic. Subset twice and ship two @font-face blocks with matching unicode-range so the browser fetches only what a page uses.
```
Strategy: CHARSET x2
Font Subsetter (charset: latin)    -> Brand.latin.ttf
Font Subsetter (charset: cyrillic) -> Brand.cyrillic.ttf
```
```css
/* "Brand" is a placeholder family name; both blocks share it so unicode-range picks the file. */
@font-face { font-family: "Brand"; src: url(brand.latin.woff2) format("woff2"); unicode-range: U+0020-007E, U+00A0-00FF; }
@font-face { font-family: "Brand"; src: url(brand.cyrillic.woff2) format("woff2"); unicode-range: U+0400-052F; }
```
Pricing-page numerals + currency — whitelist
Example: A tabular figure set for a pricing widget is a fixed, known glyph set. Whitelist the digits, decimal point, comma, and currency symbols you display.
```
Text:     "0123456789.,$€£ /mo" (fixed glyph set)
Strategy: WHITELIST
Paste:    0123456789.,$€£ /mo
  -> 20 glyphs (19 unique characters + .notdef), tiny TTF
Guard:    include EVERY currency you show; a missing € = box.
```
When you need kerning — neither in-browser strategy works
Example: Both JAD in-browser strategies drop kerning. If letter-pair spacing matters (fine typographic headlines), move to a layout-preserving pipeline engine.
```
Need kerning/ligatures preserved?
  -> NOT the in-browser whitelist or charset tools (both drop GPOS/GSUB)
  -> Use pyftsubset --layout-features='*'
     OR hb-subset in a build pipeline (Python or WASM, layout-safe)
  -> See /font-tools/guides/automate-font-subsetting-build-pipeline
```
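As a rough illustration of the pipeline route, the sketch below assembles a `pyftsubset` command line from Python. The file names and the helper function are hypothetical; the flags (`--unicodes`, `--layout-features`, `--flavor`, `--output-file`) are standard `pyftsubset` options, and `--flavor=woff2` assumes the Brotli module is installed alongside fontTools. The command is only printed here, not executed.

```python
import shlex


def pyftsubset_command(font_path: str, unicodes: str, output_path: str) -> list[str]:
    """Build an argv list for a layout-preserving subset (hypothetical helper)."""
    return [
        "pyftsubset",
        font_path,
        f"--unicodes={unicodes}",      # codepoints or ranges to keep
        "--layout-features=*",         # keep all GSUB/GPOS features, incl. kerning
        "--flavor=woff2",              # write WOFF2 directly
        f"--output-file={output_path}",
    ]


cmd = pyftsubset_command("Brand.ttf", "U+0020-007E,U+00A0-00FF", "brand.latin.woff2")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) in a real build
```

Building the arguments as a list avoids shell-quoting problems with the `*` in `--layout-features`.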
Edge cases and what actually happens
The cases below are where the two strategies most often fail or surprise, with what actually happens in each.
Whitelist tofus when the text changes
Strategy mismatch: This is the classic whitelist failure. You whitelist today's hero copy, marketing edits it next week to add an em dash or an accented name, and the new character renders .notdef. Whitelisting is only safe for text that is genuinely fixed and under your control. If an editor can change the string, use a charset range or re-run the whitelist on every content change (a build-pipeline job).
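A build-pipeline guard for this failure can be very small. The following is a sketch only, with hypothetical names and sample strings: it compares the characters a whitelist was built from against the current copy and reports anything that would render `.notdef`.

```python
def missing_from_whitelist(whitelist: str, text: str) -> list[str]:
    """Characters in `text` that the whitelist string does not contain."""
    allowed = set(whitelist)
    return sorted({ch for ch in text if ch not in allowed})


def describe(chars: list[str]) -> str:
    """Human-readable list such as 'é' (U+00E9)."""
    return ", ".join(f"{ch!r} (U+{ord(ch):04X})" for ch in chars)


old_copy = "Ship faster"        # what the whitelist font was built from
new_copy = "Ship faster, café"  # what marketing changed it to

missing = missing_from_whitelist(old_copy, new_copy)
if missing:
    print("Re-run the whitelist; not covered:", describe(missing))
```

In a real pipeline the non-empty result would fail the build or trigger a fresh subset rather than just printing.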
Charset keeps glyphs you'll never use
By design: A 'latin' charset keeps every Latin-1 glyph the font has, including currency symbols, fractions, and accented letters your English site never renders. That's the safety/size trade: you carry ~150–200 glyphs to guarantee you never tofu. For a fixed string this is pure waste; for variable text it's insurance. Pick based on whether the text varies, not on which file is smaller.
Both strategies drop kerning — the choice doesn't fix it
Kerning dropped: Neither the Whitelist Builder nor the Font Subsetter preserves kern/GPOS, because both call the same `subsetByCodepoints` core that rebuilds the font from `new Font({ glyphs })`. Switching strategies does not recover kerning. If kerning is non-negotiable, the answer is a different engine (hb-subset / pyftsubset in a pipeline), not a different in-browser strategy.
Charset range exists but the font doesn't fill it
Coverage gap: Choosing 'cyrillic' only helps if the source font actually contains Cyrillic glyphs. A Latin-only font subset to 'cyrillic' would keep nothing and error 'Subset would be empty'. Run the Character Coverage Map (it scores 346 real Unicode blocks) to confirm the font fills the range before you subset to it.
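The coverage check amounts to intersecting the codepoints a font maps with the chosen range. The sketch below simulates that with a hypothetical ASCII-only font represented as a plain set of codepoints; in a real pipeline the set could come from the font's cmap table (for instance via fontTools' `TTFont.getBestCmap()`), which is not shown here.

```python
LATIN = [(0x0020, 0x007E), (0x00A0, 0x00FF)]
CYRILLIC = [(0x0400, 0x04FF), (0x0500, 0x052F)]


def codepoints_kept(font_codepoints, ranges):
    """Codepoints the font maps that also fall inside the inclusive ranges."""
    return sorted(
        cp for cp in font_codepoints
        if any(lo <= cp <= hi for lo, hi in ranges)
    )


# Hypothetical Latin-only font: maps printable ASCII and nothing else.
ascii_only_font = set(range(0x0020, 0x007F))

for name, ranges in (("latin", LATIN), ("cyrillic", CYRILLIC)):
    kept = codepoints_kept(ascii_only_font, ranges)
    status = f"{len(kept)} codepoints kept" if kept else "subset would be empty"
    print(f"{name}: {status}")
```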
Whitelist with a base + combining-mark cluster
Split into codepoints: If your text uses a decomposed character (e.g. e + U+0301 combining acute) rather than precomposed é, the whitelist must include both codepoints, and the rendered result depends on the font's mark positioning, which may sit incorrectly with GPOS dropped. Prefer precomposed characters in fixed marketing strings. A charset range does not help here: none of the six named charsets covers the Combining Diacritical Marks block (U+0300–036F).
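One way to get precomposed characters before pasting is Unicode NFC normalization, shown here with Python's standard `unicodedata` module. The sample string is illustrative, and NFC only helps where a precomposed form exists in Unicode.

```python
import unicodedata

decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)

for label, text in (("decomposed", decomposed), ("precomposed", precomposed)):
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {len(text)} codepoints -> {codepoints}")
```

The normalized string needs one whitelist entry (U+00E9) instead of two, and no longer relies on mark positioning.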
Mixing strategies on one site
Supported: It's common and correct to use both: whitelist the logo and hero headline to tiny files, charset-subset the body font to a Latin range. They're independent @font-face declarations. Just give them distinct font-family names so the logo's kerning-less micro-font isn't accidentally applied to body copy.
Vietnamese needs more than 'latin'
Use the right range: Vietnamese uses Latin Extended ranges plus the đồng sign (U+20AB), so a plain 'latin' charset will tofu on ệ, ữ, ọ. Use the dedicated 'vietnamese' charset, which the Font Subsetter defines as U+0100–024F + U+1E00–1EFF + U+20AB. That range does not include Basic Latin, so ship it alongside a 'latin' subset. A whitelist of the exact Vietnamese characters in fixed copy also works and is smaller.
Symbols/punctuation fall outside 'latin'
Add the symbols charset: Em dashes (U+2014), curly quotes (U+2018–201D), and the euro sign (U+20AC) live in General Punctuation / Currency blocks, not Latin-1. An English body subset to 'latin' will tofu on typographic punctuation. Either add the 'symbols' charset (U+2000–20CF) via a second subset, or whitelist the specific punctuation you use.
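To see which characters in a sample of real copy fall outside a chosen charset, and which named charset would pick them up, a check like the following can help. It is a sketch: the range table is copied from this guide, and the helper names are hypothetical.

```python
# Ranges as listed in this guide; the data layout is illustrative only.
UNICODE_SUBSETS = {
    "latin": [(0x0020, 0x007E), (0x00A0, 0x00FF)],
    "latin-ext": [(0x0100, 0x024F), (0x1E00, 0x1EFF)],
    "cyrillic": [(0x0400, 0x052F)],
    "greek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
    "vietnamese": [(0x0100, 0x024F), (0x1E00, 0x1EFF), (0x20AB, 0x20AB)],
    "symbols": [(0x2000, 0x20CF)],
}


def charsets_for(ch: str) -> list[str]:
    """Names of every charset whose ranges contain this character."""
    cp = ord(ch)
    return [
        name for name, ranges in UNICODE_SUBSETS.items()
        if any(lo <= cp <= hi for lo, hi in ranges)
    ]


def uncovered(text: str, charset: str) -> list[str]:
    """Characters in `text` that the named charset does not cover."""
    return sorted({ch for ch in text if charset not in charsets_for(ch)})


sample = "\u201cna\u00efve\u201d \u2014 \u20ac5"  # curly quotes, em dash, euro sign
for ch in uncovered(sample, "latin"):
    print(f"U+{ord(ch):04X} is outside 'latin'; covered by: {charsets_for(ch)}")
```

Running this over a representative sample of content before choosing charsets shows whether 'latin' alone is enough or a second subset is needed.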
Frequently asked questions
What's the actual difference between whitelisting and charset subsetting on JAD Apps?
Whitelisting (the Character Whitelist Builder) keeps an exact list of characters you paste. Charset subsetting (the Font Subsetter / Latin Filter) keeps every glyph in a chosen Unicode range. Both run in-browser on the same opentype.js engine, both output uncompressed TTF, and both drop kerning + OpenType features. The only difference is granularity: whitelist is smaller but brittle; charset is larger but safe for varying text.
Which one should I use for a logo?
Whitelist. A logo is a fixed string you know completely at build time, so keeping only those 6–20 glyphs gives the smallest possible file — often an order of magnitude smaller than even a Latin subset. Just be sure to include spaces, punctuation, and any symbol (®, ™, &) that appears in the mark. See the tiny-logo-font guide.
Which one should I use for body text?
Charset. Body copy varies — different articles, user-entered names, CMS edits — so whitelisting build-time content will tofu on the next character that wasn't on the page. A Latin (or Latin-Ext, or Cyrillic) range covers any glyph in that block, so unexpected accents and names still render. Pair it with a CSS unicode-range so the browser only downloads the subset when needed.
Does choosing one strategy over the other affect kerning?
No. Both JAD in-browser strategies drop kerning and all OpenType layout features, because they share the same subset core that rebuilds the font from scratch. If you need kerning preserved, the fix is a different engine — hb-subset or pyftsubset --layout-features='*' in a build pipeline — not a different in-browser strategy. See the build-pipeline guide.
What Unicode ranges can the Font Subsetter target?
Six named charsets: latin (U+0020–007E, U+00A0–00FF), latin-ext (U+0100–024F, U+1E00–1EFF), cyrillic (U+0400–052F), greek (U+0370–03FF, U+1F00–1FFF), vietnamese (Latin-Ext ranges + U+20AB), and symbols (U+2000–20CF). The Whitelist Builder has no range option at all — it only takes a literal character string.
Can I get a custom range that isn't in the dropdown?
Not directly in the charset tool — it only offers the six named subsets. But the whitelist approach effectively gives you arbitrary granularity: paste exactly the characters you want and you get exactly those glyphs, which is a custom 'range' of one or many characters. For programmatic arbitrary ranges (e.g. U+2190–21FF arrows), use pyftsubset's --unicodes flag in a pipeline.
Is the output size really that different?
Yes, when text is fixed. A 6-glyph logo whitelist might be ~5 KB TTF (~2 KB WOFF2), while the same font subset to 'latin' keeps ~200 glyphs at ~30 KB TTF (~12 KB WOFF2). But for body text you can't whitelist safely, so the comparison is moot — you pay for the Latin range because you need the coverage.
Can I combine both on the same page?
Yes, and it's a good pattern: whitelist the logo and hero headline to micro-fonts, charset-subset the body font to a Latin range. Each is an independent @font-face with its own font-family name. Keep the names distinct so the kerning-less logo font isn't applied to running text where its missing glyphs and spacing would show.
How do I avoid tofu boxes with either strategy?
Run the Character Coverage Map first — it scores the font against 346 real Unicode blocks so you can confirm the glyphs you need exist. For whitelist, include every literal character your text uses (especially punctuation and symbols). For charset, pick a range wide enough for your content; add the 'symbols' charset for typographic punctuation, which lives outside Latin-1.
Why do both tools output TTF instead of WOFF2?
Because they both use opentype.js's Font.toArrayBuffer(), which writes sfnt/TTF. There's no WOFF2 writer in either tool. Whichever strategy you choose, the next step is the same: run the TTF through TTF→WOFF2. The strategy choice doesn't change the output format — only the glyph set.
Is whitelisting ever wrong even for fixed-looking text?
Yes — when 'fixed' isn't truly fixed. Navigation labels translated by a CMS, A/B-tested headlines, or any string an editor can change are deceptively variable. If a non-developer can edit the text, treat it as variable and use a charset, or automate the whitelist regeneration on every content change so it never falls out of sync.
Which strategy is better for performance?
Smaller files load faster, so whitelist wins on raw bytes when it's safe. But correctness beats size: a tofu box on a real user's name is worse than a few extra KB. Use whitelist where text is provably fixed (logos, hero copy under your control) and charset everywhere text can vary. Both compress to WOFF2 the same way, and both benefit equally from font-display: swap.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.