How to subset fonts for a multilingual design system
- Step 1: Inventory locales and group by Unicode subset. List every locale the product targets and bucket them: Latin Basic, Latin Extended, Cyrillic, and Greek (covered by JAD presets), then Simplified/Traditional Chinese, Japanese, Korean, Arabic, and Devanagari (handled elsewhere). Most products land on 5–8 distinct buckets.
- Step 2: Freeze the brand weights you ship. Because the Subsetter outputs static styles, run [Variable Font Freezer](/font-tools/variable-font-freezer) first to pin each weight you use (e.g. 400/600/700). You'll subset each frozen weight separately, so plan the matrix as weight × language.
- Step 3: Subset each Latin-family, Cyrillic, and Greek bucket in JAD. Run the [Font Subsetter](/font-tools/font-subsetter) once per applicable preset against each frozen weight: `brand-400-latin.ttf`, `brand-400-cyrillic.ttf`, and so on. Confirm coverage first with the [Character Coverage Map](/font-tools/character-coverage-map) so that the script behind each preset you pick actually exists in the font.
- Step 4: Handle CJK / Arabic / Indic with a CI engine. The six JAD presets don't include these scripts, and complex scripts need the layout tables JAD drops. Use `pyftsubset`/`hb-subset` for them; see the [build-pipeline guide](/font-tools/guides/automate-font-subsetting-build-pipeline). Subset CJK to a common-character set (JIS Level 1, GB2312, KS X 1001) rather than the full font.
- Step 5: Compress and declare unicode-range per file. Push every subset TTF through [TTF→WOFF2](/font-tools/ttf-to-woff2), then emit one `@font-face` per (weight × language) with a disjoint `unicode-range`. Mirror Google Fonts' subset boundaries so you can swap self-hosted files in without CSS churn.
- Step 6: Define fallback chains per script. Each `font-family` stack should fall back to a sensible system font for the script (`system-ui` for Latin, `'Noto Sans CJK'`/`sans-serif` for CJK). Visitors who type a codepoint outside your subsets then get a graceful system glyph instead of a tofu box.
Per-locale subset matrix — who handles what
The JAD Subsetter covers the Latin/Cyrillic/Greek scripts via its six presets. Scripts marked CI engine need pyftsubset/hb-subset because they fall outside the presets and rely on layout tables JAD drops.
| Script bucket | Example locales | Subset with | Notes |
|---|---|---|---|
| Latin Basic | en, fr, de, es, it, pt | JAD: Latin preset | U+0020–00FF, ~190 codepoints |
| Latin Extended | pl, cs, tr, vi, cy | JAD: Latin-Ext / Vietnamese | Adds U+0100–024F, U+1E00–1EFF |
| Cyrillic | ru, uk, bg, sr | JAD: Cyrillic preset | U+0400–052F |
| Greek | el | JAD: Greek preset | U+0370–03FF, U+1F00–1FFF |
| CJK (Chinese Simplified / Traditional, Japanese, Korean) | zh-CN, zh-TW, ja, ko | CI engine (hb-subset) | Subset to common set; full font is multi-MB |
| Arabic / Devanagari / Thai | ar, hi, th | CI engine (pyftsubset) | Need GSUB/GPOS shaping JAD drops |
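The table above can be turned into a small planning helper. This is a minimal sketch, not part of any JAD tool: the bucket names (`latin`, `cjk`, `complex`) and the `plan_buckets` function are hypothetical, and the locale list is copied from the table, so extend it for your own product.

```python
from collections import defaultdict

# Locale -> script bucket, copied from the table above (bucket names are hypothetical).
LOCALE_BUCKET = {
    "en": "latin", "fr": "latin", "de": "latin",
    "es": "latin", "it": "latin", "pt": "latin",
    "pl": "latin-ext", "cs": "latin-ext", "tr": "latin-ext",
    "vi": "latin-ext", "cy": "latin-ext",
    "ru": "cyrillic", "uk": "cyrillic", "bg": "cyrillic", "sr": "cyrillic",
    "el": "greek",
    "zh-CN": "cjk", "zh-TW": "cjk", "ja": "cjk", "ko": "cjk",
    "ar": "complex", "hi": "complex", "th": "complex",
}

# Buckets the in-browser presets cover; everything else goes to the CI engine.
JAD_BUCKETS = {"latin", "latin-ext", "cyrillic", "greek"}


def plan_buckets(locales):
    """Group locales by bucket and tag each bucket with the tool that handles it."""
    grouped = defaultdict(list)
    for locale in locales:
        grouped[LOCALE_BUCKET[locale]].append(locale)
    return {
        bucket: {
            "tool": "JAD" if bucket in JAD_BUCKETS else "CI engine",
            "locales": members,
        }
        for bucket, members in grouped.items()
    }


plan = plan_buckets(["en", "de", "pl", "ru", "ja", "ar"])
```

In a real matrix each CJK language would likely get its own file; the single `cjk` bucket here only mirrors the table row.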
CJK common-character subsets — the big wins
CJK fonts dominate payload. Subsetting to a standard common-character set is the highest-leverage change. These need a layout-preserving CI engine, not the JAD in-browser tool.
| Language | Common-character set | Approx. glyphs | Typical size drop |
|---|---|---|---|
| Japanese | JIS Level 1 (kanji) + kana | ~3,000 | 5+ MB → ~1 MB |
| Simplified Chinese | GB2312 | ~6,700 | 8+ MB → ~1.5 MB |
| Traditional Chinese | Big5 common | ~5,400 | 8+ MB → ~1.5 MB |
| Korean | KS X 1001 (modern hangul) | ~2,350 | 4+ MB → ~900 KB |
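One way to produce the codepoint lists for these sets is to enumerate the legacy encodings that define them. The sketch below uses only Python's built-in codecs; the lead-byte ranges are assumptions based on the EUC row layout of each standard, and the helper names are hypothetical. The resulting `U+XXXX` lines can be saved to a text file for a subsetter option such as `--unicodes-file`.

```python
def double_byte_chars(codec, first_lead, last_lead):
    """Decode every two-byte sequence in a lead-byte range; skip unassigned cells."""
    chars = []
    for lead in range(first_lead, last_lead + 1):
        for trail in range(0xA1, 0xFF):  # trail bytes 0xA1..0xFE
            try:
                chars.append(bytes([lead, trail]).decode(codec))
            except UnicodeDecodeError:
                continue  # cell is not assigned in this character set
    return chars


def unicodes_lines(chars):
    """Format characters as one U+XXXX codepoint per line, sorted and de-duplicated."""
    return [f"U+{ord(ch):04X}" for ch in sorted(set(chars))]


# Lead-byte ranges below are assumptions about where each set lives in its encoding.
gb2312_hanzi = double_byte_chars("gb2312", 0xB0, 0xF7)    # GB2312 hanzi rows
ksx1001_hangul = double_byte_chars("euc_kr", 0xB0, 0xC8)  # KS X 1001 hangul rows
jis_level1 = double_byte_chars("euc_jp", 0xB0, 0xCF)      # JIS Level 1 kanji rows

lines = unicodes_lines(gb2312_hanzi)
```

For Japanese, the kana blocks (U+3040–30FF) would still need to be appended, since the table's set is Level 1 kanji plus kana.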
Cookbook
Patterns for the most common parts of the matrix. The Latin/Cyrillic/Greek examples use the JAD tool; the CJK/Arabic examples are flagged where a CI engine is required.
The weight × language matrix
Example: Three weights and four scripts is twelve files. Plan it as a grid so you don't forget that subsetting multiplies by weight, and freeze each weight first because the Subsetter is single-style.

```
Weights frozen (Variable Font Freezer): 400, 600, 700
Scripts: latin, latin-ext, cyrillic, greek

Font Subsetter runs = 3 × 4 = 12 files:
  brand-400-latin.ttf      brand-600-latin.ttf      brand-700-latin.ttf
  brand-400-latin-ext.ttf  brand-600-latin-ext.ttf  brand-700-latin-ext.ttf
  brand-400-cyrillic.ttf   ...                      ...
  brand-400-greek.ttf      ...                      ...

then TTF→WOFF2 on each.
```
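The grid can also be generated rather than typed out. This is a minimal sketch; `subset_matrix` is a hypothetical helper, and the `brand-<weight>-<script>.ttf` naming simply follows the example above.

```python
from itertools import product

WEIGHTS = [400, 600, 700]
SCRIPTS = ["latin", "latin-ext", "cyrillic", "greek"]


def subset_matrix(family, weights, scripts):
    """Return one output file name per (script, weight) pair."""
    return [f"{family}-{w}-{s}.ttf" for s, w in product(scripts, weights)]


files = subset_matrix("brand", WEIGHTS, SCRIPTS)
```

Adding a weight or a script to either list grows the plan automatically, which makes the multiplication visible before any subsetting run starts.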
Matching Google Fonts' subset boundaries
Example: Use Google's well-tested unicode-range boundaries so a self-hosted brand font drops into the same CSS shape, with no rewrite when you migrate away from the Google CDN.
```css
@font-face {
  font-family: 'Brand'; font-weight: 400;
  src: url('/f/brand-400-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+2000-206F;
}
@font-face {
  font-family: 'Brand'; font-weight: 400;
  src: url('/f/brand-400-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-045F, U+0490-0491, U+04B0-04B1;
}
```
Per-script fallback chain
Example: When a visitor renders a codepoint outside your subsets, the system font for that script keeps text readable instead of showing tofu. Define the chain per script.
```css
/* Custom properties must be declared inside a rule, so they live on :root. */
:root {
  /* Latin UI */
  --font-latin: 'Brand', system-ui, -apple-system, sans-serif;
  /* CJK fallback while/if subset is unavailable */
  --font-cjk: 'Brand', 'Noto Sans CJK SC', 'PingFang SC',
              'Hiragino Sans', sans-serif;
}
body { font-family: var(--font-latin); }
:lang(zh), :lang(ja), :lang(ko) { font-family: var(--font-cjk); }
```
CJK common-subset (CI engine, not JAD)
Example: The JAD presets stop at Greek, and CJK needs layout tables JAD drops, so this step belongs in the build pipeline. Subset to a common set, keep layout features.

```shell
# JAD Subsetter has no CJK preset → use hb-subset / pyftsubset.
pyftsubset BrandCJK.ttf \
  --unicodes-file=jis-level1.txt \
  --layout-features='*' \
  --flavor=woff2 \
  --output-file=brand-ja.woff2
# 5.4 MB → ~1.0 MB, kerning + ligatures preserved.
```
Granular cache invalidation in practice
Example: When the design team tweaks the Cyrillic glyphs, only that file's hash changes, so Latin and Greek visitors keep their cached copies. This is why per-language files beat one combined file.
```
Before edit:  brand-400-cyrillic.a1b2c3.woff2
After edit:   brand-400-cyrillic.d4e5f6.woff2  ← new hash
              brand-400-latin.9f8e7d.woff2     ← UNCHANGED, still cached
              brand-400-greek.5c4b3a.woff2     ← UNCHANGED, still cached
```
One combined file would have been invalidated for every visitor instead.
Edge cases and what actually happens
Every row below was probed against the live API. Some documented requirements (alphabetical axis order, numerical tuple order) are not actually enforced in practice — useful to know if you've been blaming the wrong thing for a 400.
CJK / Arabic / Devanagari aren't in the JAD presets
Out of scope: The Subsetter's dropdown offers Latin, Latin-Ext, Cyrillic, Greek, Vietnamese, and Symbols, with no CJK or complex-script options, and the empty-subset guard will fire if you try Cyrillic on a CJK-only font. Handle these scripts with pyftsubset/hb-subset in CI (see the build-pipeline guide).
Arabic/Indic subset rendered but shaping broke
Error: Complex scripts depend on GSUB/GPOS for joining and reordering, which are exactly the tables the in-browser Subsetter drops. A JAD-subset Arabic font would render disconnected, mis-ordered letters. Always subset complex scripts with a layout-preserving engine and verify shaping before shipping.
Brand font lost its weight axis across the matrix
Expected: The Subsetter outputs a single static style, so a variable brand font collapses to one weight per file. Freeze each weight you ship with the Variable Font Freezer before subsetting, and treat weight as a dimension of the matrix.
A localized name renders tofu for a real user
Coverage gap: Per-language subsets only cover the scripts you planned for. A user named José on a Cyrillic-only page, or a Vietnamese name on a Latin-Basic page, hits a missing glyph. For user-generated text, subset to the wider script range (Latin-Ext, not Latin) or lean on the system fallback chain.
Kerning/ligatures dropped on display headings
Expected: For body and UI text the dropped kern/GSUB is invisible, but a hero headline in a script font can look loose or lose its fi/fl joins. Subset display faces with a layout-preserving CI engine, or keep the full font for the few headline glyphs and JAD-subset only body weights.
Vietnamese locale needs more than the Latin preset
Coverage gap: Vietnamese uses Latin plus stacked diacritics in U+1E00–1EFF. The JAD Vietnamese preset covers this (Latin-Ext ranges + đồng sign); the plain Latin preset does not. Pick Vietnamese (or Latin-Ext) for vi locales, or tone marks show as tofu.
Free-tier file limit blocks the source font
413: Multi-script brand fonts often exceed the 5 MB free limit before you even subset. Upgrade to Pro (50 MB), or pre-strip the source with Hinting Stripper / Colour Table Remover to get under the cap, then subset per language.
Overlapping ranges download two language files for one page
Wasted bytes: If your Latin and Latin-Ext @font-face blocks both claim basic Latin, an English page fetches both. Keep each block's unicode-range disjoint and aligned to the subset it contains; the matrix only saves bytes when the ranges don't overlap.
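Overlaps like this can be caught before deploy by comparing the declared ranges pairwise. The following is a minimal sketch that handles single codepoints, `A-B` ranges, and `?` wildcards; the function names and the `bad_latin_ext` value are hypothetical, and the other two range strings are taken from the cookbook example.

```python
def parse_unicode_range(value):
    """Parse a CSS unicode-range value into a list of (start, end) tuples."""
    spans = []
    for token in value.split(","):
        token = token.strip().upper()
        if not token:
            continue
        body = token[2:] if token.startswith("U+") else token
        if "-" in body:
            lo, hi = body.split("-", 1)
        else:
            # A wildcard such as U+4?? spans 400..4FF; a plain codepoint spans itself.
            lo, hi = body.replace("?", "0"), body.replace("?", "F")
        spans.append((int(lo, 16), int(hi, 16)))
    return spans


def overlaps(range_a, range_b):
    """Return the overlapping (start, end) spans between two unicode-range values."""
    found = []
    for a_lo, a_hi in parse_unicode_range(range_a):
        for b_lo, b_hi in parse_unicode_range(range_b):
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo <= hi:
                found.append((lo, hi))
    return found


latin = "U+0000-00FF, U+0131, U+0152-0153, U+2000-206F"
cyrillic = "U+0400-045F, U+0490-0491, U+04B0-04B1"
# Hypothetical Latin-Ext block that wrongly re-claims basic Latin.
bad_latin_ext = "U+0020-007F, U+1E00-1EFF"

clean = overlaps(latin, cyrillic)
clash = overlaps(latin, bad_latin_ext)
```

Running every pair of blocks through `overlaps` in a build step would flag the double download before it reaches production.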
Combined file chosen over per-language for simplicity
By-design tradeoff: For a product with few locales and modest traffic, one combined Latin+Cyrillic+Greek subset is simpler to ship and still far smaller than the full font. The cost is that every visitor downloads all three scripts and any edit invalidates the whole file. Pick per-language files only when traffic justifies the maintenance.
Frequently asked questions
Can I subset CJK fonts with the JAD tool?
No — the Subsetter's presets stop at Greek and it has no CJK option, and CJK needs the GSUB/GPOS layout tables the in-browser tool drops. Subset CJK in CI with pyftsubset/hb-subset to a common set (JIS Level 1 for Japanese, GB2312 for Simplified Chinese, KS X 1001 for Korean) — that's where multi-MB fonts drop to ~1 MB. See the build-pipeline guide.
Which scripts can I subset directly in JAD?
The six presets cover Latin Basic + Latin-1, Latin Extended-A/B, Cyrillic, Greek, Vietnamese (Latin-Ext + đồng sign), and a Punctuation/Symbols set. For everything else — CJK, Arabic, Hebrew, Devanagari, Thai — use a CI engine. Confirm what the source font contains with the Character Coverage Map first.
Should I match Google Fonts' subset boundaries?
Yes. Google's unicode-range definitions are battle-tested across real-world traffic, and mirroring them means your self-hosted JAD subsets drop into the same CSS shape — you can swap away from the Google CDN with no CSS change. Generate the matching blocks with the Font-Face Generator.
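If you would rather script the blocks, the CSS shape is simple enough to template. This is a minimal sketch and not the Font-Face Generator's actual output: `font_face` is a hypothetical helper, the `/f` URL prefix follows the cookbook example, and the two ranges are the abbreviated ones shown there, so substitute the full boundaries you mirror.

```python
def font_face(family, weight, script, unicode_range, base_url="/f"):
    """Render one @font-face block for a single (weight, script) file."""
    file_name = f"{family.lower()}-{weight}-{script}.woff2"
    return (
        "@font-face {\n"
        f"  font-family: '{family}'; font-weight: {weight};\n"
        f"  src: url('{base_url}/{file_name}') format('woff2');\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


# Ranges copied from the cookbook example (abbreviated, for illustration only).
RANGES = {
    "latin": "U+0000-00FF, U+0131, U+0152-0153, U+2000-206F",
    "cyrillic": "U+0400-045F, U+0490-0491, U+04B0-04B1",
}

css = "\n".join(
    font_face("Brand", weight, script, rng)
    for weight in (400, 600, 700)
    for script, rng in RANGES.items()
)
```

Keeping the ranges in one table means the CSS and the subsetting step can read the same source of truth.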
How much smaller does the brand font get per visitor?
Typically 70–95% per request, because a visitor only downloads their script's subset rather than the full multi-script file. The exact figure depends on the source font's glyph count and which unicode-range files the page touches. CJK sees the largest absolute savings (megabytes), Latin the largest relative ones.
Does subsetting keep my variable weight axis?
No — the in-browser Subsetter outputs a single static style. Freeze each weight you ship with the Variable Font Freezer first, then subset each frozen weight. Plan the file matrix as weight × language.
What about emoji in a multilingual UI?
Most products rely on the OS emoji font rather than shipping their own, so emoji codepoints (U+1F300–1FAFF) don't need a brand subset. If you do ship branded emoji, treat it as one more unicode-range file — but note JAD's subsetter drops colour tables, so it can't carry colour emoji; that needs a colour-preserving pipeline.
How do I avoid tofu for user-generated names?
Subset to the wider script range (Latin-Ext rather than Latin) so accented names survive, and define a per-script system fallback chain so any codepoint outside your subsets renders in the OS font instead of a box. Plan coverage for the names your users actually have, not just your UI strings.
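A quick way to test a candidate range against real names is to list the characters that fall outside it. This sketch assumes the text is in precomposed (NFC) form and uses the Latin Basic and Latin Extended ranges from the per-locale table; `uncovered` is a hypothetical helper, and it checks declared ranges only, not whether the font actually contains the glyphs.

```python
LATIN_BASIC = [(0x0020, 0x00FF)]
LATIN_EXT = LATIN_BASIC + [(0x0100, 0x024F), (0x1E00, 0x1EFF)]


def uncovered(text, ranges):
    """Return the characters in text whose codepoints fall outside every range."""
    return [ch for ch in text if not any(lo <= ord(ch) <= hi for lo, hi in ranges)]


# Escapes keep the sample names precomposed: José, Łukasz, Nguyễn.
names = ["Jos\u00e9", "\u0141ukasz", "Nguy\u1ec5n"]

gaps_basic = {name: uncovered(name, LATIN_BASIC) for name in names}
gaps_ext = {name: uncovered(name, LATIN_EXT) for name in names}
```

Here the first name survives Latin Basic, while the other two only pass once the Latin Extended ranges are included.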
Why per-language files instead of one combined subset?
Granular cache invalidation and per-visitor savings. Editing the Cyrillic file only busts that file's hash — Latin and Greek visitors keep their cached copies — and each visitor downloads only their script. A combined file is simpler but ships every script to everyone and invalidates wholesale on any edit.
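The invalidation behaviour follows directly from content-hashed file names. The sketch below is illustrative only: `hashed_name` is a hypothetical helper, SHA-256 truncated to six hex characters is an assumed naming scheme, and the byte strings are placeholders standing in for real WOFF2 files.

```python
import hashlib


def hashed_name(stem, data, digest_len=6):
    """Build a cache-busting file name from the font bytes."""
    digest = hashlib.sha256(data).hexdigest()[:digest_len]
    return f"{stem}.{digest}.woff2"


# Placeholder bytes stand in for real font files.
before = {
    "brand-400-latin": hashed_name("brand-400-latin", b"latin v1"),
    "brand-400-cyrillic": hashed_name("brand-400-cyrillic", b"cyrillic v1"),
}
after = {
    "brand-400-latin": hashed_name("brand-400-latin", b"latin v1"),
    "brand-400-cyrillic": hashed_name("brand-400-cyrillic", b"cyrillic v2"),
}

# Only files whose bytes changed get a new name, and therefore a new URL.
changed = [stem for stem in before if before[stem] != after[stem]]
```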
Do complex scripts like Arabic work after JAD subsetting?
No. Arabic and Indic scripts need GSUB/GPOS for joining and reordering, and the in-browser Subsetter drops those tables — the result would render disconnected, mis-shaped letters. Subset complex scripts only with a layout-preserving engine and verify shaping before deploy.
What fallback font should each script use?
Latin: system-ui, -apple-system, sans-serif. CJK: 'Noto Sans CJK', 'PingFang SC', 'Hiragino Sans', sans-serif. Scope them with :lang() selectors so the right system font catches any glyph outside your subsets. Build the stack with the System Font Stack Generator if you want OS-tuned defaults.
Can I automate the whole matrix in CI?
Yes — the per-language subset + compress + @font-face emission is a natural build step. The Latin/Cyrillic/Greek parts can use a layout-preserving engine alongside CJK so the whole matrix is one pipeline. The build-pipeline guide has working CI snippets and a size-budget gate.
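As a rough illustration of that build step, the sketch below assembles one `pyftsubset` argument list per (weight, script) pair without running anything. The `pyftsubset_args` helper and the `brand-<weight>.ttf` frozen-weight source names are hypothetical; the codepoint ranges come from the per-locale table, and the `woff2` flavor assumes Brotli support is installed alongside fontTools.

```python
def pyftsubset_args(source, output, unicodes):
    """Build an argv list for one pyftsubset run (not executed here)."""
    return [
        "pyftsubset", source,
        f"--unicodes={unicodes}",
        "--layout-features=*",
        "--flavor=woff2",
        f"--output-file={output}",
    ]


# Ranges from the per-locale table; source file names are hypothetical.
BUCKETS = {
    "latin": "U+0020-00FF",
    "cyrillic": "U+0400-052F",
    "greek": "U+0370-03FF,U+1F00-1FFF",
}

jobs = [
    pyftsubset_args(f"brand-{w}.ttf", f"brand-{w}-{script}.woff2", rng)
    for w in (400, 600, 700)
    for script, rng in BUCKETS.items()
]
```

Each entry in `jobs` could then be handed to `subprocess.run(args, check=True)` in the pipeline, so a failed subset stops the build.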
How do I verify each page fetches only its language?
DevTools → Network → filter Font. Load an English page and confirm only the Latin WOFF2 loads; switch locale and confirm the right file loads on demand. If multiple language files load for one page, your unicode-range declarations overlap — make them disjoint.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.