Glyph Count & CJK Fonts — Why the Analyser's Six Subsets Don't Cover Chinese/Japanese/Korean

How to read glyph counts for CJK fonts

Step 1
Know what the analyser can and can't tell you for CJK — It will report `total_glyphs` (real and useful) and `current_woff2_estimate_bytes`. Its six subset projections do NOT include Han, Kana, or Hangul, so ignore the per-subset rows for a CJK font — they only count any Latin/symbol codepoints that happen to be present.
Step 2
Check the file against the tier size limit first. A full Noto Sans CJK weight is ~5 MB as WOFF2 and will hit the free tier's 5 MB limit, refused with "exceeds the {tier} tier per-job limit". Use a paid tier (50 MB / 1 GB), or pre-subset the font and analyse the smaller file.
Step 3
Read total_glyphs to make the subsetting case — A `total_glyphs` in the tens of thousands is the evidence that shipping the full font is a non-starter on the web. That single number is what justifies the common-character subsetting work to your team.
Step 4
Pick a common-character set for your language. Japanese: JIS Level 1 (~3,000 kanji). Simplified Chinese: GB2312 (~6,800) or the larger GB18030. Traditional Chinese: Big5 (13,000+). Korean: KS X 1001 (~2,300 hangul). These cover the overwhelming majority of running text. The analyser won't build these; you specify the codepoints elsewhere.
Step 5
Build the real subset and measure it — Because the analyser has no CJK subset projection, build the actual common-character subset with [the font subsetter](/font-tools/font-subsetter) and read its real output size, or re-run the analyser on the subset WOFF2 to read the reduced `total_glyphs` and real baseline.
Step 6
Ship common + rare with unicode-range — Serve the common-character subset as the primary `@font-face` and the rare-character remainder as additional `unicode-range` blocks, so most page loads fetch only the common subset and rare characters load lazily. Wire it with [the @font-face generator](/font-tools/font-face-generator).

What the analyser reports for a CJK font

Which output fields are meaningful for CJK and which to ignore. The six subsets are Latin-script-centric; CJK lives in blocks none of them cover.

| Output | Meaningful for CJK? | Why |
| --- | --- | --- |
| `total_glyphs` | Yes (the key number) | Real count via `font.glyphs.length`; tens of thousands for full CJK, the case for subsetting |
| `current_woff2_estimate_bytes` | Yes for WOFF2 input | Real `file.size` for WOFF2; a `sfnt × 0.55` estimate for TTF/OTF (less reliable at CJK scale) |
| `projections[]` (Latin, Cyrillic…) | No | None of the six subsets cover Han/Kana/Hangul, so the rows only count incidental Latin/symbol glyphs |
| `glyph_count` per subset | No | Counts codepoints inside the six non-CJK subset ranges; near-zero relevance for CJK text |
| `savings_pct` per subset | Misleading | Shows huge savings because the font barely touches those subsets; not a real CJK subset estimate |
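
As a rough illustration of the table above, a script that consumes the analyser's JSON could keep only the two CJK-relevant fields and drop the projections. The field names are the ones this guide lists; the `cjk_summary` helper and the sample values are hypothetical, not part of the tool.

```python
import json


def cjk_summary(report: dict, input_is_woff2: bool) -> dict:
    """Keep only the analyser fields that mean something for a CJK font."""
    return {
        "total_glyphs": report["total_glyphs"],
        "baseline_bytes": report["current_woff2_estimate_bytes"],
        # For TTF/OTF input the baseline is an sfnt x 0.55 estimate, not a measurement.
        "baseline_is_measured": input_is_woff2,
        # projections[] is deliberately dropped: none of the six subsets cover CJK.
    }


# Hypothetical report, shaped like the abridged output shown in this guide.
sample = json.loads("""
{
  "total_glyphs": 17800,
  "current_woff2_estimate_bytes": 4900000,
  "projections": [
    {"subset": "Latin Basic + Latin-1 Supplement", "glyph_count": 95, "savings_pct": 99}
  ]
}
""")

summary = cjk_summary(sample, input_is_woff2=True)
print(summary)
```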

CJK common-character sets and rough sizes

Common-character standards used to subset CJK for the web, with approximate counts and post-WOFF2 sizes. These are not analyser projections — they come from real subset builds and vary by font. Build and measure your own.

| Language | Common set | Approx chars | Rough WOFF2 (one weight) |
| --- | --- | --- | --- |
| Japanese | JIS Level 1 | ~3,000 kanji (+ kana) | ~1–1.5 MB |
| Simplified Chinese | GB2312 | ~6,800 | ~1.5–2 MB |
| Simplified Chinese (full) | GB18030 | 27,000+ | Most of the full font |
| Traditional Chinese | Big5 | 13,000+ | ~3 MB+ |
| Korean | KS X 1001 | ~2,300 hangul | ~0.8–1 MB |
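
One way to obtain the codepoint lists for these sets, sketched under the assumption that Python's built-in `gb2312`, `euc_jp`, and `euc_kr` codecs are an acceptable source for the standards' character repertoires. The `double_byte_chars` helper is hypothetical; it walks the rows where each standard keeps its ideographs or hangul and skips unassigned cells.

```python
def double_byte_chars(codec: str, lead_bytes: range) -> set[int]:
    """Decode every lead/trail byte pair in the given rows; skip unassigned cells."""
    codepoints = set()
    for lead in lead_bytes:
        for trail in range(0xA1, 0xFF):
            try:
                ch = bytes([lead, trail]).decode(codec)
            except UnicodeDecodeError:
                continue  # cell not assigned in this standard
            if len(ch) == 1:
                codepoints.add(ord(ch))
    return codepoints


# EUC lead-byte ranges for the ideograph / hangul rows of each standard.
gb2312_hanzi = double_byte_chars("gb2312", range(0xB0, 0xF8))    # rows 16-87
jis_level1 = double_byte_chars("euc_jp", range(0xB0, 0xD0))      # rows 16-47 (kanji only)
ksx1001_hangul = double_byte_chars("euc_kr", range(0xB0, 0xC9))  # rows 16-40

# Kana are not part of the JIS Level 1 kanji rows; add Hiragana + Katakana separately.
japanese_common = jis_level1 | set(range(0x3040, 0x3100))

for name, chars in [("GB2312", gb2312_hanzi), ("JIS Level 1", jis_level1),
                    ("KS X 1001", ksx1001_hangul)]:
    print(name, len(chars))
```

The printed counts should land close to the approximate figures in the table. The resulting sets are plain codepoints, ready to paste into whichever subsetting tool you use.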

Cookbook

How to read the analyser correctly for CJK and where to go for the numbers it can't give you. Sizes below come from real subset builds, not the analyser's Latin-centric projections. For the general projection mechanics, see the subset savings calculator guide.

Reading a full CJK font correctly

Example

Analyse a Noto Sans JP weight and the line that matters most is total_glyphs, alongside the real WOFF2 size. The projections all show implausibly high savings because the font barely intersects the six subsets. That is not a real estimate; the tool's subsets simply do not cover CJK.

Output (abridged) for NotoSansJP-Regular.woff2:
{
  "total_glyphs": 17800,            <- THE useful number
  "current_woff2_estimate_bytes": 4900000,  <- real (WOFF2 input)
  "projections": [
    { "subset": "Latin Basic + Latin-1 Supplement",
      "glyph_count": 95, "savings_pct": 99 }  <- IGNORE for CJK
  ]
}

Read: 17,800 glyphs, ~4.9 MB. Subsetting is mandatory.
Ignore: every projections[] row.

The full font hits the free-tier limit

Example

A 5 MB Noto Sans CJK weight is refused on the free tier before it's even parsed. Either upgrade the tier or pre-subset to a common-character set and analyse that smaller file.

Upload: NotoSansSC-Regular.woff2 (5.0 MB), free tier (5 MB cap)
Result: "exceeds the free tier per-job limit (5 MB). Upgrade..."

Options:
  - Use the Pro tier (50 MB) to analyse the full file, OR
  - Subset to GB2312 first (font-subsetter), then analyse the
    ~1.8 MB result on any tier.
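
A pre-upload check along these lines can save a refused job. This is a sketch only: the tier limits are the ones quoted in this guide, while the `smallest_tier_for` helper, the use of 1 MB = 1,048,576 bytes, and the "refused only when strictly larger" comparison are assumptions about how the tool counts.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: the tool may count 1 MB as 1,000,000 bytes instead

# Per-job limits as stated in this guide, smallest first.
TIER_LIMITS = [("free", 5 * MB), ("pro", 50 * MB), ("developer", 1024 * MB)]


def smallest_tier_for(size_bytes: int) -> Optional[str]:
    """Return the first tier whose limit the file does not exceed, or None."""
    for name, limit in TIER_LIMITS:
        if size_bytes <= limit:
            return name
    return None


print(smallest_tier_for(1_800_000))   # a GB2312-sized subset
print(smallest_tier_for(5 * MB + 1))  # just over the free cap
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`.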

Measuring a real common-character subset

Example

Because the analyser can't project a CJK subset, build the GB2312 subset for real and read its size. Re-running the analyser on the subset gives you the reduced glyph total and a real baseline.

1. font-subsetter: keep GB2312 codepoints (~6,800)
   -> NotoSansSC-GB2312.woff2  (real bytes, e.g. 1.8 MB)
2. glyph-count-analyzer on the subset:
   total_glyphs: ~6900 (down from ~28,000)
   current_woff2_estimate_bytes: 1800000 (real, WOFF2 input)

The analyser confirmed the subset took; the size is the real one,
not a projection.

Common + rare with unicode-range

Example

Ship the common-character subset as the primary face and the rare remainder behind unicode-range so most loads fetch only the common subset. The rare-character WOFF2 downloads only when content needs it.

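/* Illustrative block ranges. In a real build, list only the codepoints each file
   actually contains: kanji inside U+4E00-9FFF that are missing from the common
   file are not covered by the rare face below, so they would never trigger its download. */
@font-face { font-family: "JP"; src: url(/f/jp-jis1.woff2) format("woff2");
  unicode-range: U+3000-30FF, U+4E00-9FFF; } /* common */
@font-face { font-family: "JP"; src: url(/f/jp-rare.woff2) format("woff2");
  unicode-range: U+3400-4DBF, U+20000-2A6DF; } /* rare/ext */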

Most pages fetch only jp-jis1.woff2 (~1.3 MB);
rare CJK Ext-A/B load lazily as content requires.
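
The split works best when each face's unicode-range matches the codepoints actually present in its file. A minimal sketch of turning a codepoint list into a `unicode-range` value follows; the `to_unicode_range` helper is hypothetical and simply merges consecutive codepoints into ranges.

```python
def to_unicode_range(codepoints) -> str:
    """Collapse codepoints into a CSS unicode-range value, merging consecutive runs."""
    runs = []
    for cp in sorted(set(codepoints)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp        # extend the current run
        else:
            runs.append([cp, cp])   # start a new run
    parts = []
    for start, end in runs:
        parts.append(f"U+{start:X}" if start == end else f"U+{start:X}-{end:X}")
    return ", ".join(parts)


# Three consecutive hiragana codepoints plus one isolated kanji.
print(to_unicode_range([0x3041, 0x3042, 0x3043, 0x4E00]))  # U+3041-3043, U+4E00
```

Feed it the common-set codepoints for the primary face and the remainder for the rare face, then paste each result into the matching `@font-face` block.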

Simplified vs Traditional are different fonts

Example

Don't subset a Simplified font and expect Traditional content to render, or vice versa. The analyser's total_glyphs won't warn you about script mismatch — that's a coverage and content question, not a count.

Simplified content  -> Simplified font (GB2312 / GB18030 subset)
Traditional content -> Traditional font (Big5 subset)

Mixing them = tofu boxes where the chosen subset lacks the
characters. Confirm the right script with the coverage map
before sizing with the analyser.
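
A quick content-side check can catch a script mismatch before any subsetting. This is a stand-in for the coverage map, not a replacement for it: the hypothetical `missing_chars` helper assumes that membership in Python's built-in `gb2312` and `big5` codecs approximates what a subset built from that set would contain.

```python
def missing_chars(text: str, codec: str) -> list[str]:
    """Return the distinct characters in text that the legacy charset cannot encode."""
    missing = []
    for ch in dict.fromkeys(text):  # keeps first-seen order, drops repeats
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(missing_chars("漢字", "gb2312"))  # Traditional text against a Simplified set
print(missing_chars("汉字", "big5"))    # Simplified text against a Traditional set
print(missing_chars("汉字", "gb2312"))  # Simplified text against a Simplified set
```

Any character the function reports is one the corresponding subset would be expected to lack.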

Edge cases and what actually happens

Every row below was probed against the live API.

The six subset projections are meaningless for CJK

Out of scope (no CJK subset)

The analyser only projects Latin, Latin Extended, Cyrillic, Greek, Vietnamese, and a symbols block. There is no Han, Kana, or Hangul subset. For a CJK font the per-subset rows only count whatever incidental Latin and symbol codepoints it includes, so every projection shows huge "savings" and a tiny size — not because the font is small, but because it barely touches those Latin subsets. Read total_glyphs; ignore the projection rows entirely.

Full CJK font exceeds the free-tier 5 MB limit

413-style reject

A full Noto Sans CJK weight is ~5 MB as WOFF2 and routinely trips the free tier's 5 MB per-job limit, refused before parsing with "exceeds the free tier per-job limit". Use the Pro tier (50 MB) or Developer tier (1 GB) to analyse the full file, or pre-subset to a common-character set first and analyse the smaller result on any tier.

TTF baseline is unreliable at CJK scale

Expected (estimate)

For a TTF/OTF CJK font the baseline is sfnt × 0.55, and at CJK scale that assumption is shakier: CJK glyph data and the large cmap/glyf tables don't always compress at 0.55×. Analyse the actual WOFF2 to anchor current_woff2_estimate_bytes to a real size. Note that an uncompressed CJK TTF can be 15–25 MB, which is well over the free tier's 5 MB limit and only rarely beyond the Pro tier's 50 MB, so check the file size against your tier before uploading.
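
To make the arithmetic concrete, here is a small sketch of the fixed-ratio estimate and of how far it can drift from a measured file. The 0.55 factor is the one this guide describes; the helper names and the byte counts are hypothetical, chosen only to show the calculation.

```python
WOFF2_RATIO = 0.55  # fixed factor this guide says is applied to TTF/OTF input


def estimated_woff2_bytes(sfnt_bytes: int) -> int:
    """Estimate a WOFF2 size from the uncompressed TTF/OTF size."""
    return round(sfnt_bytes * WOFF2_RATIO)


def estimate_error_pct(sfnt_bytes: int, real_woff2_bytes: int) -> float:
    """Signed error of the estimate, as a percentage of the measured WOFF2 size."""
    estimate = estimated_woff2_bytes(sfnt_bytes)
    return (estimate - real_woff2_bytes) / real_woff2_bytes * 100


# Hypothetical 20 MB TTF whose real WOFF2 turned out to be 10 MB.
print(estimated_woff2_bytes(20_000_000))
print(estimate_error_pct(20_000_000, 10_000_000))
```

Measuring the real WOFF2 removes the error term entirely, which is why the WOFF2 is the better input.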

total_glyphs counts more than typable characters

By design

As with any font, CJK total_glyphs includes composites, alternates, and vertical-writing variants (vert/vrt2 glyphs) on top of the base characters. So a font advertising ~7,000 characters can report well over that. The count is still the right signal that the font is too big to ship whole — just don't equate it one-to-one with character coverage.

Simplified vs Traditional mismatch isn't caught by count

Use the coverage map

A Simplified-Chinese subset will render tofu wherever Traditional content uses characters the subset lacks, and vice versa, because the two are different character sets. The analyser's total_glyphs says nothing about which script the glyphs belong to. Confirm the font matches your content's script with the character coverage map (which scores coverage across 346 blocks, including the Han blocks) before committing to a subset.

Variable CJK font has the same glyph total per weight

Expected

A variable CJK font carries the full glyph set once and varies weight via axes, so total_glyphs matches a single static instance — but the file is far larger than one static. For the web you almost always still subset by common characters; the variable file replaces multiple static weights of that subset. Freeze the axes you need with the variable font freezer, subset, then analyse.

Lazy-loading rare characters depends on unicode-range, not the tool

Out of scope (CSS pattern)

The analyser sizes and counts; it doesn't generate the @font-face/unicode-range blocks that make rare-character lazy loading work. That graceful-fallback pattern — common subset primary, rare subset behind a unicode-range — is implemented in CSS. Build the subsets first, then wire the blocks with the @font-face generator; the browser fetches the rare WOFF2 only when content uses those codepoints.

Pre-2025 expectations about subset size are off

Re-measure

WOFF2 sizes for CJK common-character sets shift as foundries re-cut fonts and as Brotli tuning improves; rough figures (JIS1 ~1–1.5 MB, GB2312 ~1.5–2 MB) are guidance, not guarantees. Always build the real subset for your specific font and weight and measure it — re-running the analyser on the subset WOFF2 gives the real number rather than a stale estimate.

Frequently asked questions

Can the analyser project the size of a Chinese/Japanese/Korean subset?

No. Its six fixed subsets are Latin, Latin Extended, Cyrillic, Greek, Vietnamese, and symbols — none cover Han, Kana, or Hangul. For a CJK font the projection rows only count incidental Latin/symbol codepoints and show meaningless "savings". To size a CJK subset, build the real common-character subset with the font subsetter and measure its output, or re-run the analyser on that subset to read the real baseline.

What does the analyser actually tell me about a CJK font, then?

Two useful things: the real total_glyphs (tens of thousands for a full CJK font — the number that proves subsetting is mandatory) and, for WOFF2 input, the real baseline size. Everything in the per-subset projections[] array is Latin-centric and should be ignored for CJK. Use the headline glyph count to make the case, then size the actual subset elsewhere.

Why does my CJK font get rejected on upload?

Almost certainly the tier file-size limit. A full Noto Sans CJK weight is ~5 MB as WOFF2 and trips the free tier's 5 MB per-job cap, refused before parsing. Use the Pro tier (50 MB) or Developer tier (1 GB), or subset to a common-character set first and analyse the smaller file — a GB2312 or JIS Level 1 subset is well under the free limit.

How many glyphs are in a typical CJK font?

A full pan-CJK font like Noto Sans CJK carries well over 40,000 glyphs across its language variants; a single-language cut (JP, SC, TC, KR) still runs to roughly 16,000–28,000. The analyser's total_glyphs will confirm the exact number for your file. Common-character subsets bring the typable set down to ~2,300–6,800 for Korean, Japanese, and Simplified Chinese, and to 13,000+ for Traditional Chinese (Big5).

Which common-character set should I subset to?

Japanese → JIS Level 1 (~3,000 kanji plus kana); Simplified Chinese → GB2312 (~6,800) or the larger GB18030 for full coverage; Traditional Chinese → Big5 (13,000+); Korean → KS X 1001 (~2,300 hangul). These cover the overwhelming majority of running text. The analyser doesn't build them — specify the codepoints in the font subsetter or the character whitelist builder.

Can I lazy-load the rare CJK characters?

Effectively yes, via unicode-range. Ship the common-character subset as the primary @font-face and the rare-character remainder as additional unicode-range blocks; the browser fetches the rare WOFF2 only when a page actually contains those codepoints. Most page loads then download just the common subset (~1–2 MB). Generate the blocks with the @font-face generator.

What's a realistic CJK font budget for a web page?

Roughly 300 KB–2 MB depending on audience and network. A common-character subset (JIS1 / GB2312 / KS X 1001) plus unicode-range lazy-loading of the long tail keeps the typical load near the lower end. Build the subset and measure it — the analyser's per-subset projections won't size CJK for you, but re-running it on the built subset confirms the real bytes.

Is Simplified the same as Traditional for subsetting?

No — they're different character sets and usually different fonts. A Simplified subset renders Traditional content as tofu and vice versa. The analyser's glyph count won't flag the mismatch because it counts glyphs, not which script they serve. Confirm the font matches your content's script with the character coverage map before subsetting.

Should I analyse the TTF or the WOFF2 for a CJK font?

The WOFF2, so current_woff2_estimate_bytes is a real measured size. For a TTF/OTF the baseline is sfnt × 0.55, and that compression assumption is less reliable at CJK scale where the glyf/cmap tables are huge. Also note an uncompressed CJK TTF can be 15–25 MB, so check it against your tier's size limit before uploading.

Do variable CJK fonts change the glyph count?

No — a variable CJK font stores the glyph set once and varies weight via axes, so total_glyphs equals a single static instance. The file is much larger than one static, though. You still subset by common characters for the web; the variable file just replaces multiple static weights of that subset. Freeze axes with the variable font freezer first if you only need a few weights.

Are the CJK subset sizes in this guide exact?

No — they're rough guidance from real builds (JIS1 ~1–1.5 MB, GB2312 ~1.5–2 MB, KS X 1001 ~0.8–1 MB) and they vary by font, weight, and Brotli settings. Always build the actual subset for your specific font and measure it. Re-running the analyser on the subset WOFF2 gives the real total_glyphs and real baseline rather than a stale estimate.

Is my CJK font uploaded when I analyse it?

No. Parsing runs entirely in your browser via WebAssembly; the font bytes never leave the page, even for large CJK files (subject to the tier size limit). Only an anonymous processed-file counter is recorded for dashboard stats, with no content, and it's opt-out. You can analyse confidential or licensed CJK fonts locally.

Privacy first

Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.
