Font Subsetting for Multilingual Design Systems

How to subset fonts for a multilingual design system

Step 1
Inventory locales and group by Unicode subset — List every locale the product targets and bucket them: Latin Basic, Latin Extended, Cyrillic, Greek (JAD presets), then CJK Simplified/Traditional, Korean, Arabic, Devanagari (handled elsewhere). Most products land on 5–8 distinct buckets.
Step 2
Freeze the brand weights you ship — Because the Subsetter outputs static styles, run [Variable Font Freezer](/font-tools/variable-font-freezer) first to pin each weight you use (e.g. 400/600/700). You'll subset each frozen weight separately — plan the matrix as weight × language.
Step 3
Subset each Latin-family/Cyrillic/Greek bucket in JAD — Run the [Font Subsetter](/font-tools/font-subsetter) once per applicable preset against each frozen weight: `brand-400-latin.ttf`, `brand-400-cyrillic.ttf`, and so on. Confirm coverage first with the [Character Coverage Map](/font-tools/character-coverage-map) so a preset you pick actually exists in the font.
Step 4
Handle CJK / Arabic / Indic with a CI engine — The six JAD presets don't include these scripts, and complex scripts need the layout tables JAD drops. Use `pyftsubset`/`hb-subset` for them — see the [build-pipeline guide](/font-tools/guides/automate-font-subsetting-build-pipeline). Subset CJK to a common-character set (JIS Level 1, GB2312, KS X 1001) rather than the full font.
Step 5
Compress and declare unicode-range per file — Push every subset TTF through [TTF→WOFF2](/font-tools/ttf-to-woff2), then emit one `@font-face` per (weight × language) with a disjoint `unicode-range`. Mirror Google Fonts' subset boundaries so you can swap self-hosted files in without CSS churn.
Step 6
Define fallback chains per script — Each `font-family` stack should fall back to a sensible system font for the script (`system-ui` for Latin, `'Noto Sans CJK'`/`sans-serif` for CJK). Visitors who type a codepoint outside your subsets then get a graceful system glyph instead of a tofu box.

Per-locale subset matrix — who handles what

The JAD Subsetter covers the Latin/Cyrillic/Greek scripts via its six presets. Scripts marked CI engine need pyftsubset/hb-subset because they fall outside the presets and rely on layout tables JAD drops.

| Script bucket | Example locales | Subset with | Notes |
| --- | --- | --- | --- |
| Latin Basic | en, fr, de, es, it, pt | JAD: Latin preset | `U+0020–00FF`, ~190 codepoints |
| Latin Extended | pl, cs, tr, vi, cy | JAD: Latin-Ext / Vietnamese | Adds `U+0100–024F`, `U+1E00–1EFF` |
| Cyrillic | ru, uk, bg, sr | JAD: Cyrillic preset | `U+0400–052F` |
| Greek | el | JAD: Greek preset | `U+0370–03FF`, `U+1F00–1FFF` |
| CJK (Chinese Simplified / Traditional, Japanese, Korean) | zh-CN, zh-TW, ja, ko | CI engine (`hb-subset`) | Subset to a common set; the full font is multi-MB |
| Arabic / Devanagari / Thai | ar, hi, th | CI engine (`pyftsubset`) | Need the `GSUB`/`GPOS` shaping tables that JAD drops |
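
The matrix above can be kept as data so the inventory step is repeatable. The following is a minimal sketch, not part of any JAD tool: the bucket names, the `engine` labels, and the `plan` helper are assumptions made for illustration, and the locale lists simply mirror the example locales in the table.

```python
from collections import defaultdict

# Buckets mirror the matrix above; names and "engine" labels are illustrative.
SCRIPT_BUCKETS = {
    "latin": {"locales": ["en", "fr", "de", "es", "it", "pt"], "engine": "jad"},
    "latin-ext": {"locales": ["pl", "cs", "tr", "vi", "cy"], "engine": "jad"},
    "cyrillic": {"locales": ["ru", "uk", "bg", "sr"], "engine": "jad"},
    "greek": {"locales": ["el"], "engine": "jad"},
    "cjk": {"locales": ["zh-CN", "zh-TW", "ja", "ko"], "engine": "ci"},
    "complex": {"locales": ["ar", "hi", "th"], "engine": "ci"},
}


def plan(locales):
    """Group the product's locales into script buckets, keyed by engine.

    Locales that match no bucket are reported under "unassigned" so they
    are not silently dropped from the inventory.
    """
    lookup = {
        locale: (name, bucket["engine"])
        for name, bucket in SCRIPT_BUCKETS.items()
        for locale in bucket["locales"]
    }
    grouped = defaultdict(set)
    for locale in locales:
        bucket, engine = lookup.get(locale, (locale, "unassigned"))
        grouped[engine].add(bucket)
    return {engine: sorted(buckets) for engine, buckets in grouped.items()}


if __name__ == "__main__":
    print(plan(["en", "de", "ru", "ja", "ar", "xx"]))
```

Reporting unknown locales separately makes a missing bucket visible at planning time rather than as tofu in production.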

CJK common-character subsets — the big wins

CJK fonts dominate payload. Subsetting to a standard common-character set is the highest-leverage change. These need a layout-preserving CI engine, not the JAD in-browser tool.

| Language | Common-character set | Approx. glyphs | Typical size drop |
| --- | --- | --- | --- |
| Japanese | JIS Level 1 (kanji) + kana | ~3,000 | 5+ MB → ~1 MB |
| Simplified Chinese | GB2312 | ~6,700 | 8+ MB → ~1.5 MB |
| Traditional Chinese | Big5 common | ~5,400 | 8+ MB → ~1.5 MB |
| Korean | KS X 1001 (modern hangul) | ~2,350 | 4+ MB → ~900 KB |
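
A common-character list can be derived from the legacy encoding that defines it. The sketch below, offered as one possible approach rather than the method the tools use, walks the hangul rows of KS X 1001 through Python's built-in `euc_kr` codec and formats the result as `U+XXXX` lines, a form `pyftsubset --unicodes-file` accepts. The helper names are hypothetical, and a real subset would also need Latin, digits, and punctuation added to the list.

```python
def ksx1001_hangul():
    """Return the hangul syllables encoded in KS X 1001 rows 16 to 40."""
    chars = []
    for lead in range(0xB0, 0xC9):        # EUC-KR lead bytes for rows 16..40
        for trail in range(0xA1, 0xFF):   # 94 cells per row
            try:
                chars.append(bytes([lead, trail]).decode("euc_kr"))
            except UnicodeDecodeError:
                continue  # skip any cell the codec treats as unassigned
    return chars


def to_unicodes_file(chars):
    """Format characters as one 'U+XXXX' codepoint per line, sorted and de-duplicated."""
    return "\n".join(f"U+{ord(c):04X}" for c in sorted(set(chars))) + "\n"


if __name__ == "__main__":
    hangul = ksx1001_hangul()
    print(len(hangul), "hangul syllables")
    # Hypothetical output path; pass it to pyftsubset via --unicodes-file.
    with open("ksx1001-hangul.txt", "w", encoding="ascii") as handle:
        handle.write(to_unicodes_file(hangul))
```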

Cookbook

Patterns for the most common multilingual setups. The Latin, Cyrillic, and Greek steps use the JAD tools; the CJK and Arabic steps are flagged where a CI engine is required.

The weight × language matrix

Example

Three weights and four scripts is twelve files. Plan it as a grid so you don't forget that subsetting multiplies by weight — and freeze each weight first because the Subsetter is single-style.

Weights frozen (Variable Font Freezer): 400, 600, 700
Scripts: latin, latin-ext, cyrillic, greek

Font Subsetter runs = 3 × 4 = 12 files:
  brand-400-latin.ttf      brand-600-latin.ttf      brand-700-latin.ttf
  brand-400-latin-ext.ttf  brand-600-latin-ext.ttf  brand-700-latin-ext.ttf
  brand-400-cyrillic.ttf   ...                       ...
  brand-400-greek.ttf      ...                       ...
then TTF→WOFF2 on each.
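
The same grid can be generated rather than typed by hand. This is a small sketch that assumes the `brand-<weight>-<script>` naming used above; the `subset_matrix` helper is hypothetical.

```python
from itertools import product

WEIGHTS = (400, 600, 700)
SCRIPTS = ("latin", "latin-ext", "cyrillic", "greek")


def subset_matrix(weights=WEIGHTS, scripts=SCRIPTS, family="brand"):
    """Return one entry per (script, weight) cell with its TTF and WOFF2 names."""
    return [
        {
            "weight": weight,
            "script": script,
            "ttf": f"{family}-{weight}-{script}.ttf",
            "woff2": f"{family}-{weight}-{script}.woff2",
        }
        for script, weight in product(scripts, weights)
    ]


if __name__ == "__main__":
    matrix = subset_matrix()
    print(len(matrix), "files")
    for cell in matrix:
        print(cell["ttf"], "->", cell["woff2"])
```

Adding a weight or a script to either tuple grows the grid automatically, which keeps the file count honest when the brand adds a style.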

Matching Google Fonts' subset boundaries

Example

Use Google's well-tested unicode-range boundaries so a self-hosted brand font drops into the same CSS shape — no rewrite when you migrate away from the Google CDN.

@font-face {
  font-family: 'Brand'; font-weight: 400;
  src: url('/f/brand-400-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+2000-206F;
}
@font-face {
  font-family: 'Brand'; font-weight: 400;
  src: url('/f/brand-400-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-045F, U+0490-0491, U+04B0-04B1;
}
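
With more than a couple of files, the blocks are easier to generate than to maintain by hand. The sketch below emits one `@font-face` per (weight, script) pair in the same shape as the CSS above. The `RANGES` values are copied from that example and are abbreviated; the `/f` URL prefix and both helper names are assumptions.

```python
# Abbreviated ranges copied from the example above, not a complete list.
RANGES = {
    "latin": "U+0000-00FF, U+0131, U+0152-0153, U+2000-206F",
    "cyrillic": "U+0400-045F, U+0490-0491, U+04B0-04B1",
}


def font_face(family, weight, script, unicode_range, base_url="/f"):
    """Render a single @font-face block for one weight and one script."""
    file_name = f"{family.lower()}-{weight}-{script}.woff2"
    return (
        "@font-face {\n"
        f"  font-family: '{family}'; font-weight: {weight};\n"
        f"  src: url('{base_url}/{file_name}') format('woff2');\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


def stylesheet(family, weights, ranges):
    """Render every (weight, script) combination as one CSS string."""
    return "\n".join(
        font_face(family, weight, script, unicode_range)
        for weight in weights
        for script, unicode_range in ranges.items()
    )


if __name__ == "__main__":
    print(stylesheet("Brand", [400, 600, 700], RANGES))
```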

Per-script fallback chain

Example

When a visitor renders a codepoint outside your subsets, the system font for that script keeps text readable instead of showing tofu. Define the chain per script.

:root {
  /* Latin UI */
  --font-latin: 'Brand', system-ui, -apple-system, sans-serif;
  /* CJK fallback while/if subset is unavailable */
  --font-cjk: 'Brand', 'Noto Sans CJK SC', 'PingFang SC',
              'Hiragino Sans', sans-serif;
}

body { font-family: var(--font-latin); }
:lang(zh), :lang(ja), :lang(ko) { font-family: var(--font-cjk); }

CJK common-subset (CI engine, not JAD)

Example

The JAD presets include no CJK ranges, and CJK needs the layout tables that JAD drops, so this step belongs in the build pipeline. Subset to a common set and keep the layout features.

# JAD Subsetter has no CJK preset → use hb-subset / pyftsubset.
pyftsubset BrandCJK.ttf \
  --unicodes-file=jis-level1.txt \
  --layout-features='*' \
  --flavor=woff2 \
  --output-file=brand-ja.woff2
# 5.4 MB → ~1.0 MB, kerning + ligatures preserved.
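
In a build script the same command is usually assembled per language. This sketch only builds and prints the argument lists, using the flags from the command above; the job table is hypothetical apart from `BrandCJK.ttf` and `jis-level1.txt`. Running the commands needs fontTools installed, and its WOFF2 output additionally depends on a Brotli module.

```python
import shlex


def pyftsubset_args(source, unicodes_file, output):
    """Build the pyftsubset argument list used in the example above."""
    return [
        "pyftsubset",
        source,
        f"--unicodes-file={unicodes_file}",
        "--layout-features=*",   # keep all layout features
        "--flavor=woff2",
        f"--output-file={output}",
    ]


# Hypothetical job table: language tag -> (source font, codepoint list).
JOBS = {
    "ja": ("BrandCJK.ttf", "jis-level1.txt"),
    "ko": ("BrandCJK.ttf", "ksx1001-hangul.txt"),
}

if __name__ == "__main__":
    for lang, (source, unicodes_file) in JOBS.items():
        args = pyftsubset_args(source, unicodes_file, f"brand-{lang}.woff2")
        # shlex.join quotes the '*' so the line is safe to paste into a shell.
        print(shlex.join(args))
        # To execute instead of printing: subprocess.run(args, check=True)
```

Passing the list straight to `subprocess.run` avoids shell quoting problems with the `*` wildcard.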

Granular cache invalidation in practice

Example

When the design team tweaks the Cyrillic glyphs, only that file's hash changes — Latin and Greek visitors keep their cached copies. This is why per-language files beat one combined file.

Before edit:  brand-400-cyrillic.a1b2c3.woff2
After edit:   brand-400-cyrillic.d4e5f6.woff2   ← new hash
              brand-400-latin.9f8e7d.woff2      ← UNCHANGED, still cached
              brand-400-greek.5c4b3a.woff2       ← UNCHANGED, still cached

One combined file would have invalidated for every visitor instead.
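
The hashed names above typically come from the file contents. As a minimal sketch, assuming a SHA-256 digest truncated to six hex characters (the scheme and helper names are illustrative, and the byte strings stand in for real font data):

```python
import hashlib


def hashed_name(stem, data, length=6):
    """Return '<stem>.<hash>.woff2', where hash is a truncated SHA-256 of the bytes."""
    digest = hashlib.sha256(data).hexdigest()[:length]
    return f"{stem}.{digest}.woff2"


def build(files):
    """Map each file stem to its content-hashed name."""
    return {stem: hashed_name(stem, data) for stem, data in files.items()}


def invalidated(before, after):
    """List the stems whose hashed name changed between two builds."""
    return sorted(stem for stem in after if before.get(stem) != after[stem])


if __name__ == "__main__":
    # Placeholder bytes standing in for real WOFF2 contents.
    v1 = {
        "brand-400-latin": b"latin glyphs v1",
        "brand-400-cyrillic": b"cyrillic glyphs v1",
        "brand-400-greek": b"greek glyphs v1",
    }
    v2 = dict(v1, **{"brand-400-cyrillic": b"cyrillic glyphs v2"})
    print(invalidated(build(v1), build(v2)))
```

Only the stem whose bytes changed gets a new URL, so the other files stay valid in every visitor's cache.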

Edge cases and what actually happens

Each entry below names a situation you are likely to hit when rolling out the matrix, what actually happens, and how to handle it.

CJK / Arabic / Devanagari aren't in the JAD presets

Out of scope

The Subsetter's dropdown is Latin, Latin-Ext, Cyrillic, Greek, Vietnamese, Symbols — no CJK or complex-script options, and the empty-subset guard will fire if you try Cyrillic on a CJK-only font. Handle these scripts with pyftsubset/hb-subset in CI (see the build-pipeline guide).

Arabic/Indic subset rendered but shaping broke

Error

Complex scripts depend on GSUB/GPOS for joining and reordering — exactly the tables the in-browser Subsetter drops. A JAD-subset Arabic font would render disconnected, mis-ordered letters. Always subset complex scripts with a layout-preserving engine and verify shaping before shipping.

Brand font lost its weight axis across the matrix

Expected

The Subsetter outputs a single static style, so a variable brand font collapses to one weight per file. Freeze each weight you ship with the Variable Font Freezer before subsetting, and treat weight as a dimension of the matrix.

A localized name renders tofu for a real user

Coverage gap

Glyph-by-language subsets only cover the scripts you planned for. A user named José on a Cyrillic-only page, or a Vietnamese name on a Latin-Basic page, hits a missing glyph. For user-generated text, subset to the wider script range (Latin-Ext, not Latin) or lean on the system fallback chain.
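
A coverage gap like this can be caught before shipping by checking sample names against the ranges a page loads. The sketch below uses the ranges from the per-locale matrix; `parse_ranges` and `uncovered` are hypothetical helpers, and the parser handles only plain `U+XXXX` and `U+XXXX-YYYY` tokens, not wildcards.

```python
import unicodedata

# Ranges taken from the per-locale matrix in this guide.
LATIN_BASIC = "U+0020-00FF"
LATIN_EXT = "U+0020-00FF, U+0100-024F, U+1E00-1EFF"
CYRILLIC = "U+0400-052F"


def parse_ranges(spec):
    """Parse 'U+0020-00FF, U+0131' into a list of (low, high) integer pairs."""
    ranges = []
    for part in spec.split(","):
        token = part.strip().upper()
        if token.startswith("U+"):
            token = token[2:]
        low, _, high = token.partition("-")
        ranges.append((int(low, 16), int(high or low, 16)))
    return ranges


def uncovered(text, spec):
    """Return the distinct characters of text that fall outside every range."""
    ranges = parse_ranges(spec)
    missing = []
    # Normalise to NFC so accented letters are checked as single codepoints.
    for char in unicodedata.normalize("NFC", text):
        codepoint = ord(char)
        if not any(low <= codepoint <= high for low, high in ranges):
            if char not in missing:
                missing.append(char)
    return missing


if __name__ == "__main__":
    print(uncovered("Jos\u00e9", CYRILLIC))       # every Latin letter is missing
    print(uncovered("Nguy\u1ec5n", LATIN_BASIC))  # the tone-marked letter is missing
    print(uncovered("Nguy\u1ec5n", LATIN_EXT))    # nothing is missing
```

Running real user names through a check like this shows whether the wider subset or the system fallback chain is the cheaper fix.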

Kerning/ligatures dropped on display headings

Expected

For body and UI text the dropped kern/GSUB is invisible, but a hero headline in a script font can look loose or lose its fi/fl joins. Subset display faces with a layout-preserving CI engine, or keep the full font for the few headline glyphs and JAD-subset only body weights.

Vietnamese locale needs more than the Latin preset

Coverage gap

Vietnamese uses Latin plus stacked diacritics in U+1E00–1EFF. The JAD Vietnamese preset covers this (Latin-Ext ranges + đồng sign); the plain Latin preset does not. Pick Vietnamese (or Latin-Ext) for vi locales or tone marks show as tofu.

Free-tier file limit blocks the source font

413

Multi-script brand fonts often exceed the 5 MB free limit before you even subset. Upgrade to Pro (50 MB), or pre-strip the source with Hinting Stripper / Colour Table Remover to get under the cap, then subset per language.

Overlapping ranges download two language files for one page

Wasted bytes

If your Latin and Latin-Ext @font-face blocks both claim basic Latin, an English page fetches both. Keep each block's unicode-range disjoint and aligned to the subset it contains — the matrix only saves bytes when the ranges don't overlap.
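
Overlaps are easy to detect mechanically. As a rough sketch, assuming each face's `unicode-range` has already been reduced to (low, high) integer pairs, a pairwise intersection test reports every shared span; the `find_overlaps` helper and the face names are hypothetical.

```python
from itertools import combinations


def find_overlaps(faces):
    """Return (face_a, face_b, low, high) for every shared codepoint span.

    faces maps a face name to a list of inclusive (low, high) ranges.
    """
    found = []
    for (name_a, ranges_a), (name_b, ranges_b) in combinations(faces.items(), 2):
        for low_a, high_a in ranges_a:
            for low_b, high_b in ranges_b:
                low, high = max(low_a, low_b), min(high_a, high_b)
                if low <= high:  # the two inclusive ranges intersect
                    found.append((name_a, name_b, low, high))
    return found


if __name__ == "__main__":
    overlapping = {
        "latin": [(0x0000, 0x00FF)],
        "latin-ext": [(0x0020, 0x007F), (0x0100, 0x024F)],  # re-claims basic Latin
    }
    disjoint = {
        "latin": [(0x0000, 0x00FF)],
        "latin-ext": [(0x0100, 0x024F), (0x1E00, 0x1EFF)],
    }
    for a, b, low, high in find_overlaps(overlapping):
        print(f"{a} and {b} both claim U+{low:04X}-{high:04X}")
    print("disjoint set has overlaps:", bool(find_overlaps(disjoint)))
```

A check like this fits naturally into CI next to the size budget, failing the build when two faces claim the same codepoints.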

Combined file chosen over per-language for simplicity

By design tradeoff

For a product with few locales and modest traffic, one Latin+Cyrillic+Greek combined subset is simpler to ship and still far smaller than the full font — at the cost of every visitor downloading all three scripts and whole-file cache invalidation. Pick per-language only when traffic justifies the maintenance.

Frequently asked questions

Can I subset CJK fonts with the JAD tool?

No. The Subsetter's six presets cover only Latin-family, Cyrillic, Greek, and symbol ranges, with no CJK option, and CJK needs the GSUB/GPOS layout tables the in-browser tool drops. Subset CJK in CI with pyftsubset/hb-subset to a common set (JIS Level 1 for Japanese, GB2312 for Simplified Chinese, KS X 1001 for Korean); that is where multi-MB fonts drop to ~1 MB. See the build-pipeline guide.

Which scripts can I subset directly in JAD?

The six presets cover Latin Basic + Latin-1, Latin Extended-A/B, Cyrillic, Greek, Vietnamese (Latin-Ext + đồng sign), and a Punctuation/Symbols set. For everything else — CJK, Arabic, Hebrew, Devanagari, Thai — use a CI engine. Confirm what the source font contains with the Character Coverage Map first.

Should I match Google Fonts' subset boundaries?

Yes. Google's unicode-range definitions are battle-tested across real-world traffic, and mirroring them means your self-hosted JAD subsets drop into the same CSS shape — you can swap away from the Google CDN with no CSS change. Generate the matching blocks with the Font-Face Generator.

How much smaller does the brand font get per visitor?

Typically 70–95% per request, because a visitor only downloads their script's subset rather than the full multi-script file. The exact figure depends on the source font's glyph count and which unicode-range files the page touches. CJK sees the largest absolute savings (megabytes), Latin the largest relative ones.
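
As a worked check of that range, the sketch below applies the saving formula to the approximate sizes from the CJK table in this guide. The "before" figures there are lower bounds ("5+ MB"), so the results are lower bounds too; the `saving_pct` helper is illustrative.

```python
def saving_pct(before_mb, after_mb):
    """Percentage of bytes saved when a file shrinks from before_mb to after_mb."""
    return (1 - after_mb / before_mb) * 100


# Approximate sizes in MB, taken from the CJK common-character table.
CJK_ROWS = {
    "Japanese": (5.0, 1.0),
    "Simplified Chinese": (8.0, 1.5),
    "Traditional Chinese": (8.0, 1.5),
    "Korean": (4.0, 0.9),
}

if __name__ == "__main__":
    for language, (before, after) in CJK_ROWS.items():
        print(f"{language}: {saving_pct(before, after):.1f}% smaller")
```

All four rows land between roughly 77 and 82 percent, inside the 70–95% range quoted above.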

Does subsetting keep my variable weight axis?

No — the in-browser Subsetter outputs a single static style. Freeze each weight you ship with the Variable Font Freezer first, then subset each frozen weight. Plan the file matrix as weight × language.

What about emoji in a multilingual UI?

Most products rely on the OS emoji font rather than shipping their own, so emoji codepoints (U+1F300–1FAFF) don't need a brand subset. If you do ship branded emoji, treat it as one more unicode-range file — but note JAD's subsetter drops colour tables, so it can't carry colour emoji; that needs a colour-preserving pipeline.

How do I avoid tofu for user-generated names?

Subset to the wider script range (Latin-Ext rather than Latin) so accented names survive, and define a per-script system fallback chain so any codepoint outside your subsets renders in the OS font instead of a box. Plan coverage for the names your users actually have, not just your UI strings.

Why per-language files instead of one combined subset?

Granular cache invalidation and per-visitor savings. Editing the Cyrillic file only busts that file's hash — Latin and Greek visitors keep their cached copies — and each visitor downloads only their script. A combined file is simpler but ships every script to everyone and invalidates wholesale on any edit.

Do complex scripts like Arabic work after JAD subsetting?

No. Arabic and Indic scripts need GSUB/GPOS for joining and reordering, and the in-browser Subsetter drops those tables — the result would render disconnected, mis-shaped letters. Subset complex scripts only with a layout-preserving engine and verify shaping before deploy.

What fallback font should each script use?

Latin: system-ui, -apple-system, sans-serif. CJK: 'Noto Sans CJK SC', 'PingFang SC', 'Hiragino Sans', sans-serif. Scope them with :lang() selectors so the right system font catches any glyph outside your subsets. Build the stack with the System Font Stack Generator if you want OS-tuned defaults.

Can I automate the whole matrix in CI?

Yes — the per-language subset + compress + @font-face emission is a natural build step. The Latin/Cyrillic/Greek parts can use a layout-preserving engine alongside CJK so the whole matrix is one pipeline. The build-pipeline guide has working CI snippets and a size-budget gate.

How do I verify each page fetches only its language?

DevTools → Network → filter Font. Load an English page and confirm only the Latin WOFF2 loads; switch locale and confirm the right file loads on demand. If multiple language files load for one page, your unicode-range declarations overlap — make them disjoint.

Privacy first

Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.
