How to handle glyph counts in CJK fonts
- Step 1: Know what the analyser can and can't tell you for CJK. It reports `total_glyphs` (real and useful) and `current_woff2_estimate_bytes`. Its six subset projections do NOT include Han, Kana, or Hangul, so ignore the per-subset rows for a CJK font; they only count whatever Latin/symbol codepoints happen to be present.
- Step 2: Check the file against the tier size limit first. A full Noto Sans CJK weight is ~5 MB as WOFF2 and can hit the free tier's 5 MB limit, where it is refused with "exceeds the free tier per-job limit". Use a paid tier (Pro 50 MB / Developer 1 GB) to analyse the full file, or pre-subset the font and analyse the smaller result.
- Step 3: Read `total_glyphs` to make the subsetting case. A count in the tens of thousands is the evidence that shipping the full font on the web is a non-starter, and that single number is what justifies the common-character subsetting work to your team.
- Step 4: Pick a common-character set for your language. Japanese: JIS Level 1 (~3,000 kanji). Simplified Chinese: GB2312 (~6,800) or the larger GB18030. Korean: KS X 1001 (~2,300 hangul). These cover the overwhelming majority of running text. The analyser won't build these sets; you specify the codepoints elsewhere (for example, in the font subsetter).
- Step 5: Build the real subset and measure it. Because the analyser has no CJK subset projection, build the actual common-character subset with [the font subsetter](/font-tools/font-subsetter) and read its real output size, or re-run the analyser on the subset WOFF2 to read the reduced `total_glyphs` and real baseline.
- Step 6: Ship common + rare with unicode-range. Serve the common-character subset as the primary `@font-face` and the rare-character remainder as additional `unicode-range` blocks, so most page loads fetch only the common subset and rare characters load lazily. Wire it with [the @font-face generator](/font-tools/font-face-generator).
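Step 2 is easy to script before you open the tool. This is a hypothetical sketch: the tier names and limits are the ones quoted in this guide, while treating "MB"/"GB" as decimal units and allowing a file exactly at the limit are assumptions, since the tool's exact byte threshold isn't documented here.

```python
# Hypothetical pre-upload check for Step 2. Limits are the ones quoted in this
# guide; decimal units (1 MB = 10**6 bytes) and "<= limit passes" are assumptions.
import os
import sys
from typing import Optional

TIER_LIMITS = [  # ordered smallest to largest per-job limit
    ("free", 5 * 10**6),
    ("pro", 50 * 10**6),
    ("developer", 10**9),
]


def smallest_tier(size_bytes: int) -> Optional[str]:
    """Return the first tier whose per-job limit the file fits, or None."""
    for name, limit in TIER_LIMITS:
        if size_bytes <= limit:
            return name
    return None


def report(path: str) -> str:
    size = os.path.getsize(path)
    tier = smallest_tier(size)
    if tier is None:
        return f"{path}: {size:,} bytes exceeds every tier; pre-subset first"
    return f"{path}: {size:,} bytes fits the {tier} tier"


if __name__ == "__main__":
    for font_path in sys.argv[1:]:
        print(report(font_path))
```

Saved under any name (say `check_tier.py`) and run against a font file, it tells you whether a full CJK weight needs a paid tier or a pre-subset before you upload anything.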
What the analyser reports for a CJK font
Which output fields are meaningful for CJK and which to ignore. The six subsets are Latin-script-centric; CJK lives in blocks none of them cover.
| Output | Meaningful for CJK? | Why |
|---|---|---|
| total_glyphs | Yes: the key number | Real count via font.glyphs.length; tens of thousands for full CJK, the case for subsetting |
| current_woff2_estimate_bytes | Yes, for WOFF2 input | Real file.size for WOFF2; an estimate of sfnt size × 0.55 for TTF/OTF (less reliable at CJK scale) |
| projections[] (Latin, Cyrillic…) | No | None of the six subsets cover Han/Kana/Hangul, so the rows only count incidental Latin/symbol glyphs |
| glyph_count per subset | No | Counts codepoints inside the six Latin-centric subsets' ranges; near-zero relevance for CJK text |
| savings_pct per subset | Misleading | Shows huge savings because the font barely touches the Latin subsets; not a real CJK subset estimate |
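To see why the per-subset rows say so little, bucket the font's mapped codepoints by Unicode block. This is a hypothetical sketch: the ranges are standard Unicode blocks but the list is deliberately partial, the codepoints are assumed to come from the font's cmap (for example `TTFont(path).getBestCmap()` in fontTools), and the demo font is synthetic.

```python
# Hypothetical sketch: bucket a font's codepoints by Unicode block to show how
# little of a CJK font falls in the Latin blocks the projections care about.
from collections import Counter

# A partial list of relevant Unicode blocks (inclusive ranges).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "CJK Symbols and Punctuation": (0x3000, 0x303F),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
    "Halfwidth and Fullwidth Forms": (0xFF00, 0xFFEF),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
}
LATIN_BLOCKS = {"Basic Latin", "Latin-1 Supplement"}


def block_of(cp: int) -> str:
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return "Other"


def block_histogram(codepoints) -> Counter:
    return Counter(block_of(cp) for cp in codepoints)


def latin_share(codepoints) -> float:
    """Fraction of the codepoints that sit in the Latin blocks."""
    cps = list(codepoints)
    if not cps:
        return 0.0
    hist = block_histogram(cps)
    return sum(hist[b] for b in LATIN_BLOCKS) / len(cps)


# Synthetic font: printable ASCII, hiragana, and 3,000 ideographs.
demo_font = (list(range(0x20, 0x7F)) + list(range(0x3041, 0x3097))
             + list(range(0x4E00, 0x4E00 + 3000)))
hist = block_histogram(demo_font)
print(hist["Basic Latin"], hist["Hiragana"], hist["CJK Unified Ideographs"])  # 95 86 3000
print(f"{latin_share(demo_font):.1%}")  # 3.0%
```

Only about 3% of this toy font's codepoints land in the Latin blocks, so a Latin-only projection discards almost everything and reports it as "savings".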
CJK common-character sets and rough sizes
Common-character standards used to subset CJK for the web, with approximate counts and post-WOFF2 sizes. These are not analyser projections — they come from real subset builds and vary by font. Build and measure your own.
| Language | Common set | Approx chars | Rough WOFF2 (one weight) |
|---|---|---|---|
| Japanese | JIS Level 1 | ~3,000 kanji (+ kana) | ~1–1.5 MB |
| Simplified Chinese | GB2312 | ~6,800 | ~1.5–2 MB |
| Simplified Chinese (full) | GB18030 | 27,000+ | Most of the full font |
| Traditional Chinese | Big5 | 13,000+ | ~3 MB+ |
| Korean | KS X 1001 | ~2,300 hangul | ~0.8–1 MB |
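If you need these sets as codepoint lists (for the font subsetter or a unicode-range), CPython's standard library already ships codecs for the underlying charsets. A minimal sketch, assuming the usual EUC row/cell layout and row spans: JIS X 0208 Level 1 kanji in rows 16-47, GB2312 hanzi in rows 16-87, KS X 1001 hangul in rows 16-40. It covers only the ideograph/hangul core; GB18030 and Big5 aren't handled, and punctuation beyond U+3000-U+30FF is left to you.

```python
# Sketch: enumerate common-character sets from CPython's built-in legacy codecs.
# EUC layout: row r, cell c (1-94) is encoded as the bytes (0xA0 + r, 0xA0 + c).


def euc_rows(codec: str, first_row: int, last_row: int) -> list[int]:
    """Collect the codepoints a codec assigns to rows first_row..last_row."""
    codepoints = []
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):
            try:
                text = bytes([0xA0 + row, 0xA0 + cell]).decode(codec)
            except UnicodeDecodeError:
                continue  # cell is unassigned in this charset
            if len(text) == 1:
                codepoints.append(ord(text))
    return codepoints


def jis_level1_kanji() -> list[int]:
    return euc_rows("euc_jp", 16, 47)  # JIS X 0208 Level 1 kanji


def japanese_common() -> list[int]:
    # Add CJK punctuation, hiragana and katakana (U+3000-U+30FF) to the kanji.
    return sorted(set(jis_level1_kanji()) | set(range(0x3000, 0x3100)))


def gb2312_hanzi() -> list[int]:
    return euc_rows("gb2312", 16, 87)  # GB2312 levels 1 and 2


def ksx1001_hangul() -> list[int]:
    return euc_rows("euc_kr", 16, 40)  # KS X 1001 precomposed hangul


if __name__ == "__main__":
    for name, cps in [("JIS Level 1", jis_level1_kanji()),
                      ("GB2312", gb2312_hanzi()),
                      ("KS X 1001 hangul", ksx1001_hangul())]:
        print(f"{name}: {len(cps):,} codepoints")
```

Run directly, it should print counts close to each standard's nominal size (2,965 JIS Level 1 kanji, 6,763 GB2312 hanzi, 2,350 KS X 1001 hangul).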
Cookbook
How to read the analyser correctly for CJK and where to go for the numbers it can't give you. Sizes below come from real subset builds, not the analyser's Latin-centric projections. For the general projection mechanics, see the subset savings calculator guide.
Reading a full CJK font correctly
Example: Analyse a Noto Sans JP weight and the only line that matters is total_glyphs. The projections all show implausibly high savings because the font barely intersects the Latin subsets; that isn't a real estimate, just the tool's six subsets not covering CJK.
Output (abridged) for NotoSansJP-Regular.woff2:
{
"total_glyphs": 17800, <- THE useful number
"current_woff2_estimate_bytes": 4900000, <- real (WOFF2 input)
"projections": [
{ "subset": "Latin Basic + Latin-1 Supplement",
"glyph_count": 95, "savings_pct": 99 } <- IGNORE for CJK
]
}
Read: 17,800 glyphs, ~4.9 MB. Subsetting is mandatory.
Ignore: every projections[] row.
The full font hits the free-tier limit
Example: A 5 MB Noto Sans CJK weight is refused on the free tier before it's even parsed. Either upgrade the tier or pre-subset to a common-character set and analyse that smaller file.
Upload: NotoSansSC-Regular.woff2 (5.0 MB), free tier (5 MB cap)
Result: "exceeds the free tier per-job limit (5 MB). Upgrade..."
Options:
- Use the Pro tier (50 MB) to analyse the full file, OR
- Subset to GB2312 first (font-subsetter), then analyse the
  ~1.8 MB result on any tier.
Measuring a real common-character subset
Example: Because the analyser can't project a CJK subset, build the GB2312 subset for real and read its size. Re-running the analyser on the subset gives you the reduced glyph total and a real baseline.
1. font-subsetter: keep GB2312 codepoints (~6,800)
   -> NotoSansSC-GB2312.woff2 (real bytes, e.g. 1.8 MB)
2. glyph-count-analyzer on the subset:
   total_glyphs: ~6900 (down from ~28,000)
   current_woff2_estimate_bytes: 1800000 (real, WOFF2 input)
The analyser confirmed the subset took; the size is the real one, not a projection.
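Once you have both analyser runs, the real reduction is simple arithmetic. A hypothetical sketch using the field names from the analyser output shown earlier; the full-font values (~28,000 glyphs, 5,000,000 bytes) are illustrative figures from this guide's examples, and the byte comparison only means something when both inputs were WOFF2, so both sizes are real rather than estimated.

```python
# Hypothetical sketch: compare the analyser's output for the full font and for
# the built subset. Field names follow the analyser output in this guide.
import json


def pct_drop(before: int, after: int) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


def reduction(full: dict, subset: dict) -> dict:
    return {
        "glyph_reduction_pct": pct_drop(full["total_glyphs"], subset["total_glyphs"]),
        "byte_reduction_pct": pct_drop(full["current_woff2_estimate_bytes"],
                                       subset["current_woff2_estimate_bytes"]),
    }


full = json.loads('{"total_glyphs": 28000, "current_woff2_estimate_bytes": 5000000}')
subset = json.loads('{"total_glyphs": 6900, "current_woff2_estimate_bytes": 1800000}')
print(reduction(full, subset))
# {'glyph_reduction_pct': 75.4, 'byte_reduction_pct': 64.0}
```

Bytes fall less than glyphs here, which is why the measured subset size, not the glyph ratio, is the number to budget with.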
Common + rare with unicode-range
Example: Ship the common-character subset as the primary face and the rare remainder behind unicode-range so most loads fetch only the common subset. The rare-character WOFF2 downloads only when content needs it.
@font-face { font-family: "JP"; src: url(/f/jp-jis1.woff2) format("woff2");
unicode-range: U+3000-30FF, U+4E00-9FFF; } /* common */
@font-face { font-family: "JP"; src: url(/f/jp-rare.woff2) format("woff2");
unicode-range: U+3400-4DBF, U+20000-2A6DF; } /* rare/ext */
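Most pages fetch only jp-jis1.woff2 (~1.3 MB);
rare CJK Ext-A/B characters load lazily as content requires.
Note: U+4E00-9FFF is shorthand here. JIS Level 1 covers only part of that block, and the browser picks a file by its unicode-range rather than by what the file contains, so ideographs outside JIS Level 1 fall back to the next font in the stack unless some face's range covers them.
Simplified vs Traditional are different fonts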
Example: Don't subset a Simplified font and expect Traditional content to render, or vice versa. The analyser's total_glyphs won't warn you about a script mismatch; that's a coverage and content question, not a count.
Simplified content  -> Simplified font (GB2312 / GB18030 subset)
Traditional content -> Traditional font (Big5 subset)
Mixing them = tofu boxes where the chosen subset lacks the characters.
Confirm the right script with the coverage map before sizing with the analyser.
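A quick coverage check on real content catches a script mismatch before you size anything. A hypothetical sketch: it only tests whether each codepoint is present, and the four-character toy subset stands in for the codepoints you'd read from the built subset's cmap.

```python
# Hypothetical sketch: list the characters in real content that a subset lacks.
# In practice subset_codepoints would come from the built subset's cmap.


def missing_chars(text: str, subset_codepoints: set[int]) -> list[str]:
    """Distinct non-whitespace characters not covered, in first-seen order."""
    seen: set[str] = set()
    missing: list[str] = []
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in subset_codepoints:
            missing.append(ch)
    return missing


# Toy "Simplified" subset covering just four characters.
simplified_subset = {ord(c) for c in "汉语学习"}
print(missing_chars("学习汉语", simplified_subset))  # []
print(missing_chars("學習漢語", simplified_subset))  # ['學', '習', '漢', '語']
```

Every Traditional form in the second string has its own codepoint, so a Simplified subset reports all four as missing; those are exactly the characters that would render as tofu.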
Edge cases and what actually happens
Every case below was probed against the live API.
The six subset projections are meaningless for CJK
Out of scope (no CJK subset): The analyser only projects Latin, Latin Extended, Cyrillic, Greek, Vietnamese, and a symbols block. There is no Han, Kana, or Hangul subset. For a CJK font the per-subset rows only count whatever incidental Latin and symbol codepoints it includes, so every projection shows huge "savings" and a tiny size, not because the font is small but because it barely touches those Latin subsets. Read total_glyphs; ignore the projection rows entirely.
Full CJK font exceeds the free-tier 5 MB limit
413-style reject: A full Noto Sans CJK weight is ~5 MB as WOFF2 and routinely trips the free tier's 5 MB per-job limit; it is refused before parsing with "exceeds the free tier per-job limit". Use the Pro tier (50 MB) or Developer tier (1 GB) to analyse the full file, or pre-subset to a common-character set first and analyse the smaller result on any tier.
TTF baseline is unreliable at CJK scale
Expected (estimate): For a TTF/OTF CJK font the baseline is sfnt size × 0.55, and at CJK scale that assumption is shakier: the large outline data (glyf, or CFF in many CJK OTFs) and cmap tables don't always compress at 0.55×. Analyse the actual WOFF2 to anchor current_woff2_estimate_bytes to a real size. Also note that an uncompressed CJK TTF/OTF can be 15–25 MB, far over the free tier's 5 MB limit though within Pro's 50 MB, so check the file size against your tier before uploading.
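A sketch of that baseline rule, assuming you hold the raw file bytes. It also reads the 4-byte signature, which tells you whether an OTF carries CFF outlines (`OTTO`) rather than glyf. The 0.55 ratio is the guide's; rounding down and rejecting font collections are this sketch's own assumptions, not documented tool behaviour.

```python
# Hypothetical sketch of the baseline rule: real size for WOFF2,
# sfnt size x 0.55 for TTF/OTF. Rounding down is an assumption.

SIGNATURES = {
    b"wOF2": "woff2",
    b"\x00\x01\x00\x00": "truetype",  # TrueType (glyf) outlines
    b"true": "truetype",              # older Apple TrueType tag
    b"OTTO": "cff",                   # CFF outlines, common for CJK OTFs
}


def font_format(data: bytes) -> str:
    return SIGNATURES.get(data[:4], "unknown")


def baseline_bytes(data: bytes) -> tuple[int, bool]:
    """Return (baseline, is_real); is_real is False for the 0.55 estimate."""
    fmt = font_format(data)
    if fmt == "woff2":
        return len(data), True
    if fmt in ("truetype", "cff"):
        return len(data) * 55 // 100, False  # integer math avoids float noise
    raise ValueError(f"unrecognised signature {data[:4]!r} (collections not handled)")
```

A 20,000,000-byte CFF-based OTF would get an 11,000,000-byte estimate flagged as not real; analysing its WOFF2 replaces that guess with the measured size.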
total_glyphs counts more than typable characters
By design: As with any font, CJK total_glyphs includes composites, alternates, and vertical-writing variants (glyphs reached through the vert/vrt2 features) on top of the base characters. So a font advertising ~7,000 characters can report well over that. The count is still the right signal that the font is too big to ship whole; just don't equate it one-to-one with character coverage.
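To see that gap in your own file, compare the glyph total with the number of mapped codepoints (for example `len(TTFont(path).getBestCmap())` in fontTools). An sfnt stores its glyph total in the `maxp` table's `numGlyphs` field; that the analyser's total_glyphs matches it is our assumption. A minimal stdlib sketch of the `maxp` read, checked against a synthetic one-table font:

```python
# Hypothetical sketch: read numGlyphs from an uncompressed sfnt (TTF/OTF).
# 'maxp' counts every glyph ID, mapped or not. WOFF2 input is not handled.
import struct


def num_glyphs(sfnt: bytes) -> int:
    # Offset table: sfntVersion (uint32), numTables (uint16), then 3 more uint16s.
    _version, num_tables = struct.unpack_from(">IH", sfnt, 0)
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", sfnt, 12 + 16 * i)
        if tag == b"maxp":
            # maxp starts with a 4-byte version, followed by numGlyphs (uint16).
            (count,) = struct.unpack_from(">H", sfnt, offset + 4)
            return count
    raise ValueError("no 'maxp' table found")


# Synthetic font: 'OTTO' header, one table record, maxp version 0.5.
maxp = struct.pack(">IH", 0x00005000, 17800)
header = struct.pack(">IHHHH", 0x4F54544F, 1, 16, 0, 0)
record = struct.pack(">4sIII", b"maxp", 0, 12 + 16, len(maxp))
print(num_glyphs(header + record + maxp))  # 17800
```

If `numGlyphs` comes out well above the mapped-codepoint count, the difference is the alternates and variants described above, not extra character coverage.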
Simplified vs Traditional mismatch isn't caught by count
Use the coverage map: A Simplified Chinese subset will render tofu for Traditional-only characters, and vice versa, because the character sets differ. The analyser's total_glyphs says nothing about which script the glyphs belong to. Confirm the font matches your content's script with the character coverage map (which scores coverage across 346 Unicode blocks, the Han blocks included) before committing to a subset.
Variable CJK font has the same glyph total per weight
Expected: A variable CJK font carries the full glyph set once and varies weight via axes, so total_glyphs matches a single static instance, but the file is far larger than one static. For the web you almost always still subset by common characters; the variable file replaces multiple static weights of that subset. If you only need a few weights, freeze those axis values with the variable font freezer, then subset and analyse.
Lazy-loading rare characters depends on unicode-range, not the tool
Out of scope (CSS pattern): The analyser sizes and counts; it doesn't generate the @font-face/unicode-range blocks that make rare-character lazy loading work. That graceful-fallback pattern (common subset primary, rare subset behind a unicode-range) is implemented in CSS. Build the subsets first, then wire the blocks with the @font-face generator; the browser fetches the rare WOFF2 only when content uses those codepoints.
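Whichever tool writes the blocks, the browser decides what to fetch from unicode-range alone, so each face should list the codepoints its file actually contains, with no overlap between the common and rare faces. A hypothetical sketch that collapses a codepoint set into contiguous runs; the family name and URL mirror the cookbook example, and the codepoint list is a toy.

```python
# Hypothetical sketch: turn a subset's codepoints into a CSS unicode-range value
# and an @font-face block listing exactly what the file contains.
from collections.abc import Iterable


def _run(start: int, end: int) -> str:
    return f"U+{start:X}" if start == end else f"U+{start:X}-{end:X}"


def unicode_range(codepoints: Iterable[int]) -> str:
    cps = sorted(set(codepoints))
    if not cps:
        raise ValueError("empty codepoint set")
    parts = []
    start = prev = cps[0]
    for cp in cps[1:]:
        if cp == prev + 1:
            prev = cp  # still inside a contiguous run
            continue
        parts.append(_run(start, prev))
        start = prev = cp  # begin the next run
    parts.append(_run(start, prev))
    return ", ".join(parts)


def font_face(family: str, url: str, codepoints: Iterable[int]) -> str:
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url({url}) format("woff2");\n'
        f"  unicode-range: {unicode_range(codepoints)};\n"
        "}"
    )


common = list(range(0x3000, 0x3100)) + [0x4E00, 0x4E01, 0x4E03]
print(font_face("JP", "/f/jp-jis1.woff2", common))
# @font-face {
#   font-family: "JP";
#   src: url(/f/jp-jis1.woff2) format("woff2");
#   unicode-range: U+3000-30FF, U+4E00-4E01, U+4E03;
# }
```

A real JIS Level 1 file produces a long range list because its kanji are scattered across U+4E00-9FFF; that length is the price of the browser fetching the file only when the page actually needs it.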
Pre-2025 expectations about subset size are off
Re-measure: WOFF2 sizes for CJK common-character sets shift as foundries re-cut fonts and as Brotli tuning improves; rough figures (JIS1 ~1–1.5 MB, GB2312 ~1.5–2 MB) are guidance, not guarantees. Always build the real subset for your specific font and weight and measure it. Re-running the analyser on the subset WOFF2 gives the real number rather than a stale estimate.
Frequently asked questions
Can the analyser project the size of a Chinese/Japanese/Korean subset?
No. Its six fixed subsets are Latin, Latin Extended, Cyrillic, Greek, Vietnamese, and symbols — none cover Han, Kana, or Hangul. For a CJK font the projection rows only count incidental Latin/symbol codepoints and show meaningless "savings". To size a CJK subset, build the real common-character subset with the font subsetter and measure its output, or re-run the analyser on that subset to read the real baseline.
What does the analyser actually tell me about a CJK font, then?
Two useful things: the real total_glyphs (tens of thousands for a full CJK font — the number that proves subsetting is mandatory) and, for WOFF2 input, the real baseline size. Everything in the per-subset projections[] array is Latin-centric and should be ignored for CJK. Use the headline glyph count to make the case, then size the actual subset elsewhere.
Why does my CJK font get rejected on upload?
Almost certainly the tier file-size limit. A full Noto Sans CJK weight is ~5 MB as WOFF2 and trips the free tier's 5 MB per-job cap, refused before parsing. Use the Pro tier (50 MB) or Developer tier (1 GB), or subset to a common-character set first and analyse the smaller file — a GB2312 or JIS Level 1 subset is well under the free limit.
How many glyphs are in a typical CJK font?
A full pan-CJK font like Noto Sans CJK carries well over 40,000 glyphs across all four scripts; a single-language cut (JP, SC, TC, KR) still runs to roughly 16,000–28,000. The analyser's total_glyphs will confirm the exact number for your file. Common-character subsets bring the typable set down to roughly 2,300–6,800 characters depending on language, or 13,000+ if you use Big5 for Traditional Chinese.
Which common-character set should I subset to?
Japanese → JIS Level 1 (~3,000 kanji plus kana); Simplified Chinese → GB2312 (~6,800) or the larger GB18030 for full coverage; Traditional Chinese → Big5 (13,000+); Korean → KS X 1001 (~2,300 hangul). These cover the overwhelming majority of running text. The analyser doesn't build them — specify the codepoints in the font subsetter or the character whitelist builder.
Can I lazy-load the rare CJK characters?
Effectively yes, via unicode-range. Ship the common-character subset as the primary @font-face and the rare-character remainder as additional unicode-range blocks; the browser fetches the rare WOFF2 only when a page actually contains those codepoints. Most page loads then download just the common subset (~1–2 MB). Generate the blocks with the @font-face generator.
What's a realistic CJK font budget for a web page?
Roughly 300 KB–2 MB depending on audience and network. A common-character subset (JIS1 / GB2312 / KS X 1001, roughly 0.8–2 MB per weight) plus unicode-range lazy-loading of the long tail keeps the typical load inside that range; getting toward the lower end usually means subsetting more tightly than a full common set. Build the subset and measure it: the analyser's per-subset projections won't size CJK for you, but re-running it on the built subset confirms the real bytes.
Is Simplified the same as Traditional for subsetting?
No — they're different character sets and usually different fonts. A Simplified subset renders Traditional content as tofu and vice versa. The analyser's glyph count won't flag the mismatch because it counts glyphs, not which script they serve. Confirm the font matches your content's script with the character coverage map before subsetting.
Should I analyse the TTF or the WOFF2 for a CJK font?
The WOFF2, so current_woff2_estimate_bytes is a real measured size. For a TTF/OTF the baseline is sfnt size × 0.55, and that compression assumption is less reliable at CJK scale, where the outline (glyf or CFF) and cmap tables are huge. Also note that an uncompressed CJK TTF can be 15–25 MB, so check it against your tier's size limit before uploading.
Do variable CJK fonts change the glyph count?
No — a variable CJK font stores the glyph set once and varies weight via axes, so total_glyphs equals a single static instance. The file is much larger than one static, though. You still subset by common characters for the web; the variable file just replaces multiple static weights of that subset. Freeze axes with the variable font freezer first if you only need a few weights.
Are the CJK subset sizes in this guide exact?
No — they're rough guidance from real builds (JIS1 ~1–1.5 MB, GB2312 ~1.5–2 MB, KS X 1001 ~0.8–1 MB) and they vary by font, weight, and Brotli settings. Always build the actual subset for your specific font and measure it. Re-running the analyser on the subset WOFF2 gives the real total_glyphs and real baseline rather than a stale estimate.
Is my CJK font uploaded when I analyse it?
No. Parsing runs entirely in your browser via WebAssembly; the font bytes never leave the page, even for large CJK files (subject to the tier size limit). Only an anonymous processed-file counter is recorded for dashboard stats, with no content, and it's opt-out. You can analyse confidential or licensed CJK fonts locally.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.