How to use the Unicode blocks reference for font selection
- Step 1: Latin family. Basic Latin (U+0000–U+007F, 128 codepoints; printable from U+0020). Latin-1 Supplement (U+0080–U+00FF, 128). Latin Extended-A (U+0100–U+017F, 128; Central/Eastern European). Latin Extended-B (U+0180–U+024F, 208). Latin Extended Additional (U+1E00–U+1EFF, 256; where Vietnamese lives).
- Step 2: Cyrillic and Greek. Cyrillic (U+0400–U+04FF, 256). Cyrillic Supplement (U+0500–U+052F, 48). Greek and Coptic (U+0370–U+03FF, 144). Greek Extended (U+1F00–U+1FFF, 256; polytonic Greek).
- Step 3: Punctuation and symbols. General Punctuation (U+2000–U+206F, 112; typographic quotes, dashes, spaces). Currency Symbols (U+20A0–U+20CF, 48; €, ₫, ₹). Superscripts and Subscripts (U+2070–U+209F, 48). Letterlike Symbols, Arrows, and Mathematical Operators follow from U+2100.
- Step 4: CJK essentials. CJK Unified Ideographs (U+4E00–U+9FFF, 20,992; the main Han block). CJK Symbols and Punctuation (U+3000–U+303F, 64). Hiragana (U+3040–U+309F, 96). Katakana (U+30A0–U+30FF, 96). Hangul Syllables (U+AC00–U+D7AF, 11,184).
- Step 5: Right-to-left and Indic. Hebrew (U+0590–U+05FF, 112). Arabic (U+0600–U+06FF, 256) plus Arabic Supplement and Presentation Forms. Devanagari (U+0900–U+097F, 128). Bengali, Tamil, Telugu, and other Indic blocks follow contiguously from U+0980.
- Step 6: Symbols, emoji, and private use. Emoji span five blocks (see the emoji table). Private Use Area (U+E000–U+F8FF, 6,400) holds custom icon fonts. Two Supplementary PUAs add ~131k more codepoints for niche use. Cross-check any of these against a font with the [Coverage Map](/font-tools/character-coverage-map).
Latin, Cyrillic, Greek — the everyday blocks
These are the blocks most web fonts must cover. Size is the number of codepoint slots in the block; the count of assigned, printable characters may be lower. Ranges match the Coverage Map's 346-block table.
| Block | Range | Size | Languages / use |
|---|---|---|---|
| Basic Latin | U+0000–U+007F | 128 | ASCII; English, code, digits (printable from U+0020) |
| Latin-1 Supplement | U+0080–U+00FF | 128 | Western European accents: é à ç ñ ü ø, plus © and £ (the euro sign € is U+20AC in Currency Symbols, not here) |
| Latin Extended-A | U+0100–U+017F | 128 | Central/Eastern European: Polish, Czech, Hungarian, Romanian |
| Latin Extended-B | U+0180–U+024F | 208 | African Latin, phonetics, Pinyin tone marks |
| Latin Extended Additional | U+1E00–U+1EFF | 256 | Vietnamese (stacked diacritics), Welsh, dotted/ring forms |
| Cyrillic | U+0400–U+04FF | 256 | Russian, Ukrainian, Bulgarian, Serbian |
| Cyrillic Supplement | U+0500–U+052F | 48 | Komi, Abkhaz, other minority Cyrillic |
| Greek and Coptic | U+0370–U+03FF | 144 | Modern Greek, math Greek (α β γ) |
| Greek Extended | U+1F00–U+1FFF | 256 | Polytonic (classical) Greek |
Punctuation, symbols, CJK, Indic, RTL
Block sizes drive subset weight. CJK Unified Ideographs and Hangul Syllables dominate any file that includes them.
| Block | Range | Size | Languages / use |
|---|---|---|---|
| General Punctuation | U+2000–U+206F | 112 | Typographic quotes “ ” ‘ ’, dashes – —, spaces, ellipsis |
| Currency Symbols | U+20A0–U+20CF | 48 | € ₫ ₹ ₽ ₩ and other currency signs |
| CJK Symbols and Punctuation | U+3000–U+303F | 64 | CJK quotes, brackets, ideographic space |
| Hiragana | U+3040–U+309F | 96 | Japanese phonetic (cursive) |
| Katakana | U+30A0–U+30FF | 96 | Japanese phonetic (loanwords) |
| CJK Unified Ideographs | U+4E00–U+9FFF | 20,992 | Chinese, Japanese kanji, Korean hanja — huge |
| Hangul Syllables | U+AC00–U+D7AF | 11,184 | Korean — precomposed syllables |
| Hebrew | U+0590–U+05FF | 112 | Hebrew (RTL) |
| Arabic | U+0600–U+06FF | 256 | Arabic, Persian, Urdu (RTL; needs shaping) |
| Devanagari | U+0900–U+097F | 128 | Hindi, Marathi, Sanskrit (needs conjunct shaping) |
Where emoji and icons actually live
Emoji are not one block: most are spread across the five emoji blocks below, with a few more in blocks such as Dingbats (U+2700–U+27BF). Icon fonts use the Private Use Area. The Coverage Map reports each of these by its real block name.
| Block | Range | Size | Contents |
|---|---|---|---|
| Emoticons | U+1F600–U+1F64F | 80 | Smileys 😀😢😎 |
| Miscellaneous Symbols and Pictographs | U+1F300–U+1F5FF | 768 | Weather, animals, objects 🌧🐶📦 |
| Supplemental Symbols and Pictographs | U+1F900–U+1F9FF | 256 | Newer emoji 🤖🧠🦷 |
| Transport and Map Symbols | U+1F680–U+1F6FF | 128 | Vehicles, signs 🚀🚦 |
| Miscellaneous Symbols | U+2600–U+26FF | 256 | Older symbols/emoji ☀⚡♻ |
| Private Use Area | U+E000–U+F8FF | 6,400 | Custom icon fonts — no standard meaning |
| Supplementary PUA-A / B | U+F0000–U+FFFFD, U+100000–U+10FFFD | 131,068 (65,534 each) | Large private-use ranges for niche systems |
Cookbook
Lookup workflows that turn a Coverage Map row into an actionable decision. Use these to interpret block names, sizes, and ranges.
Find which block your missing character lives in
Example: You see tofu on a specific character. Get its codepoint, then look up the block to check that block's coverage percentage.

- Missing character: ř (Czech r with caron)
- Codepoint: U+0159
- U+0159 falls in Latin Extended-A (U+0100–U+017F).
- → Check the Latin Extended-A row in the Coverage Map. If it's below 100%, ř may be one of the missing slots. Confirm, then pick a font that covers Latin Extended-A fully.
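The lookup above can be sketched in a few lines of Python. This is a minimal illustration, not the Coverage Map's own code: the `BLOCKS` list holds only rows copied from the tables in this reference (not all 346 blocks), and `block_of` is a hypothetical helper name.

```python
from bisect import bisect_right

# (start, end, name) rows copied from this reference; deliberately partial.
BLOCKS = sorted([
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x1F00, 0x1FFF, "Greek Extended"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x20A0, 0x20CF, "Currency Symbols"),
])
STARTS = [start for start, _, _ in BLOCKS]


def block_of(char):
    """Return (codepoint label, block name); the name is None if the table has no match."""
    cp = ord(char)
    # Find the last block whose start is <= cp, then confirm cp is inside it.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        start, end, name = BLOCKS[i]
        if start <= cp <= end:
            return f"U+{cp:04X}", name
    return f"U+{cp:04X}", None


print(block_of("\u0159"))  # ('U+0159', 'Latin Extended-A')
```

A `None` block name here only means the character falls outside this partial table, not that it is unassigned.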
Estimate subset weight from block size
Example: Block size tells you roughly how heavy a subset will be. Tiny blocks are free; CJK and Hangul are not.

| Subset step | Codepoints | Result |
|---|---|---|
| latin (Basic Latin + Latin-1) | ~191 | tiny subset |
| + Latin Extended-A | +128 | still small |
| + Cyrillic | +256 | small |
| + Greek + Greek Extended | +400 | small |
| + CJK Unified Ideographs | +20,992 | large file |
| + Hangul Syllables | +11,184 | large file |

Latin, Cyrillic, and Greek together stay lean. Adding CJK or Hangul changes the file from KB to hundreds of KB even when subset.
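The block figures above come straight from the ranges: a block's size is `end - start + 1`. A small sketch of that arithmetic follows; the dictionary and function names are illustrative, and codepoint count is only a rough proxy for file size, since glyph complexity matters too.

```python
# Ranges copied from the tables in this reference.
BLOCK_RANGES = {
    "Latin Extended-A": (0x0100, 0x017F),
    "Cyrillic": (0x0400, 0x04FF),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Greek Extended": (0x1F00, 0x1FFF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
}


def block_size(name):
    """Number of codepoint slots in the block, including any unassigned ones."""
    start, end = BLOCK_RANGES[name]
    return end - start + 1


def added_codepoints(names):
    """Total slots added by including every named block in a subset."""
    return sum(block_size(name) for name in names)


for name in BLOCK_RANGES:
    print(f"{name:<24} +{block_size(name):,}")

print(added_codepoints(["Greek and Coptic", "Greek Extended"]))  # 400
```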
Map Google Fonts subset names to block ranges
Example: Google Fonts uses subset labels; this is what they actually map to, so you can match them to Coverage Map blocks.

| Google Fonts subset | Underlying ranges |
|---|---|
| latin | U+0000–U+00FF + General Punctuation + a few extras |
| latin-ext | Latin Extended-A/B/Additional + Latin Extended-D |
| cyrillic | Cyrillic block |
| cyrillic-ext | Cyrillic + Supplement + extensions |
| greek | Greek and Coptic |
| greek-ext | Greek and Coptic + Greek Extended |
| vietnamese | Latin Extended Additional + ₫ (U+20AB) |

The Coverage Map's 6 web-subset rows align with latin, latin-ext, cyrillic, greek, vietnamese, and a punctuation+symbols set.
Locate emoji across five blocks
Example: Emoji aren't one block. If a font 'has some emoji', check all five emoji blocks to see which it covers.

Emoji blocks to check in a Coverage Map:

- Emoticons: U+1F600–U+1F64F
- Miscellaneous Symbols and Pictographs: U+1F300–U+1F5FF
- Supplemental Symbols and Pictographs: U+1F900–U+1F9FF
- Transport and Map Symbols: U+1F680–U+1F6FF
- Miscellaneous Symbols: U+2600–U+26FF

Most text fonts cover none; the OS supplies colour emoji. To strip stray emoji from a font, see /font-tools/emoji-remover.
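To see which of these five blocks a given piece of text actually draws on, a tally like the following can help. It is a sketch only: `emoji_blocks_used` is a hypothetical helper, and it counts individual codepoints, so multi-codepoint emoji sequences are counted per component.

```python
from collections import Counter

# The five emoji blocks listed in this reference.
EMOJI_BLOCKS = [
    (0x1F600, 0x1F64F, "Emoticons"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F900, 0x1F9FF, "Supplemental Symbols and Pictographs"),
    (0x1F680, 0x1F6FF, "Transport and Map Symbols"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
]


def emoji_blocks_used(text):
    """Count the codepoints in text that fall inside each emoji block."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for start, end, name in EMOJI_BLOCKS:
            if start <= cp <= end:
                counts[name] += 1
                break
    return dict(counts)


print(emoji_blocks_used("\U0001F600 launch \U0001F680 sunny \u2600"))
# {'Emoticons': 1, 'Transport and Map Symbols': 1, 'Miscellaneous Symbols': 1}
```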
Recognise an icon font from its block
Example: If the only covered block is the Private Use Area, you're looking at a UI icon font, not a text font.

| Block | Range | Covered | Coverage |
|---|---|---|---|
| Private Use Area | U+E000–U+F8FF | 480 / 6,400 | 8% |

No Latin and no other block is covered, so this is a custom icon font. PUA codepoints have no standard meaning: the glyphs are whatever the foundry assigned. Inspect individual icons with /font-tools/glyph-inspector before use.
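The same check can be expressed as a heuristic over a font's mapped codepoints. In this sketch the input set stands in for whatever your font parser reports as the character map, and `pua_summary` is a hypothetical name.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF


def pua_summary(mapped_codepoints):
    """Summarise how much of a font's mapping sits in the Private Use Area."""
    in_pua = sum(1 for cp in mapped_codepoints if PUA_START <= cp <= PUA_END)
    size = PUA_END - PUA_START + 1
    return {
        "covered": in_pua,
        "size": size,
        "percent": round(100 * in_pua / size, 1),
        # True only when the font maps something and all of it is in the PUA.
        "only_pua": bool(mapped_codepoints) and in_pua == len(mapped_codepoints),
    }


# Hypothetical icon font: 480 icons mapped from U+E000 upward.
icon_font = set(range(0xE000, 0xE000 + 480))
print(pua_summary(icon_font))
# {'covered': 480, 'size': 6400, 'percent': 7.5, 'only_pua': True}
```

The 7.5% computed here corresponds to the 8% shown in the example row once rounded to a whole number.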
Edge cases and what actually happens
The cases below are the naming traps and coverage quirks most likely to mislead you when reading a block-level report. Each one is labelled by kind: naming trap, by design, verify, or expected.
Vietnamese has no 'Vietnamese' block
Naming trap: The Vietnamese-specific stacked-diacritic letters live in Latin Extended Additional (U+1E00–U+1EFF), and the đồng sign U+20AB is in Currency Symbols; there is no block literally named 'Vietnamese'. Everyday Vietnamese text also uses letters from Latin-1 Supplement, Latin Extended-A (ă, đ), and Latin Extended-B (ơ, ư). Google Fonts' vietnamese subset is a curated range, not a Unicode block. In the Coverage Map, start with Latin Extended Additional when checking Vietnamese coverage.
Emoji are spread across five blocks, not one
Naming trap: There's no single 'Emoji' block. They span Emoticons, Miscellaneous Symbols and Pictographs, Supplemental Symbols and Pictographs, Transport and Map Symbols, and Miscellaneous Symbols. A font with 'some emoji' may cover one block and not the others, so check all five in the Coverage Map.
Private Use Area codepoints have no standard meaning
By design: PUA (U+E000–U+F8FF) plus the two Supplementary PUAs are reserved for private agreement. An icon font assigns its own glyphs there. Coverage in PUA tells you the font uses those slots, but not what they render: two icon fonts at 'PUA 50%' are entirely different. Inspect glyphs directly.
Block 'covered' but a specific character missing
Verify: A block at 80% has 20% of its slots empty. The character you need might be in that 20%. Block-level coverage is a summary; for a critical character, confirm its exact codepoint is mapped rather than trusting the block percentage alone.
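A per-character check is easy to script once you have the set of codepoints a font maps. The sketch below assumes you already obtained that set from a font parser; `missing_characters` and the sample font are hypothetical.

```python
def missing_characters(required_text, mapped_codepoints):
    """Return the required characters the font does not map, in first-seen order."""
    missing = []
    for ch in required_text:
        if ord(ch) not in mapped_codepoints and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical font: every slot of Latin Extended-A except U+0159.
mapped = set(range(0x0100, 0x0180)) - {0x0159}

coverage = 100 * len(mapped) / 128
print(f"{coverage:.1f}%")  # 99.2%

# The block looks almost complete, yet the one character we need is absent.
print(missing_characters("\u0159\u0161\u017e", mapped))  # ['ř']
```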
Three surrogate blocks aren't real characters
Expected: High Surrogates, High Private Use Surrogates, and Low Surrogates (together U+D800–U+DFFF) exist only to encode astral codepoints in UTF-16. They contain no assignable characters, so a correct font never covers them. They appear in the 346-block table for completeness but you'll never need them in a font.
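For context, this is how UTF-16 uses that range: a codepoint above U+FFFF is split into one high and one low surrogate. The sketch follows the standard UTF-16 arithmetic; the function name is illustrative.

```python
def utf16_surrogates(cp):
    """Split an astral codepoint (U+10000..U+10FFFF) into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codepoints above U+FFFF use surrogate pairs")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = utf16_surrogates(0x1F600)
print(f"U+{high:04X} U+{low:04X}")  # U+D83D U+DE00

# Python's own UTF-16 encoder produces the same two code units.
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00
```

The surrogate values are code units of an encoding, not characters, which is why a font has nothing to map there.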
Block size includes unassigned slots
Expected: A block's size is end - start + 1, which can include reserved or unassigned codepoints. A font that maps every assigned character can therefore still score below 100%, because the Coverage Map's percentage is covered ÷ block span. Near-100% is the realistic target.
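You can see the gap between span and assigned characters with Python's standard `unicodedata` module, which marks unassigned codepoints with the general category `Cn`. This is a sketch: the exact count depends on the Unicode version bundled with your Python, so no specific figure is claimed here, and `assigned_in_block` is a hypothetical helper.

```python
import unicodedata


def assigned_in_block(start, end):
    """Return (assigned, span) for a block, using the local Unicode database."""
    span = end - start + 1
    assigned = sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"  # Cn = unassigned
    )
    return assigned, span


# Greek and Coptic has reserved slots, for example U+03A2.
assigned, span = assigned_in_block(0x0370, 0x03FF)
print(f"{assigned} of {span} slots assigned (Unicode {unicodedata.unidata_version})")
print(f"best possible coverage: {100 * assigned / span:.1f}%")
```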
Hangul Syllables is one enormous block
Expected: Korean precomposed syllables fill Hangul Syllables (U+AC00–U+D7AF, 11,184 codepoints). A Korean font covers nearly all of it; a Latin font covers none. Because the block is so large, including Korean dramatically increases file size, so plan subsetting accordingly.
CJK spans many extension blocks
Expected: Beyond the main CJK Unified Ideographs block, there are extension blocks A through J plus compatibility blocks. A font 'with CJK' may cover the main block and none of the extensions (rare characters). For pan-CJK requirements, check the extension blocks too, and expect very large files.
Google Fonts subset names ≠ Unicode block names
Naming trap: latin, latin-ext, and cyrillic-ext are Google's curated delivery subsets, not Unicode blocks. They mix and trim ranges (e.g. latin includes some General Punctuation). When mapping a Coverage Map block to a Google subset, use the mapping table; don't assume a 1:1 name match.
Frequently asked questions
How many Unicode blocks are there?
Unicode 17.0.0 defines 346 named blocks — the exact set the Coverage Map scores against. Most fonts touch only 5–20 of them. The rest are historic scripts, specialised symbols, and CJK extensions you'll rarely need unless you're building for a specific niche.
Which blocks does an English-only site need?
Basic Latin (U+0000–U+007F) and Latin-1 Supplement (U+0080–U+00FF) cover English plus common European accents and symbols. Add General Punctuation (U+2000–U+206F) for proper typographic quotes and dashes. That's 368 block slots, roughly 300 of them printable characters, which makes for a tiny, fast subset.
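If you serve the font yourself, those three blocks translate directly into a CSS `unicode-range` descriptor value. The helper below is a sketch that only formats the string; the function and variable names are assumptions for illustration.

```python
def unicode_range(ranges):
    """Format (start, end) pairs as a CSS unicode-range value."""
    parts = []
    for start, end in ranges:
        if start == end:
            parts.append(f"U+{start:04X}")
        else:
            parts.append(f"U+{start:04X}-{end:04X}")
    return ", ".join(parts)


# Basic Latin, Latin-1 Supplement, General Punctuation.
ENGLISH_SITE = [(0x0000, 0x007F), (0x0080, 0x00FF), (0x2000, 0x206F)]

print(unicode_range(ENGLISH_SITE))
# U+0000-007F, U+0080-00FF, U+2000-206F

print(sum(end - start + 1 for start, end in ENGLISH_SITE))  # 368
```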
Where is Vietnamese in Unicode?
Vietnamese lives in Latin Extended Additional (U+1E00–U+1EFF) — the stacked-diacritic letters — plus the đồng sign U+20AB. There is no block named 'Vietnamese'; Google Fonts' vietnamese subset is a curated range. Check Latin Extended Additional coverage for Vietnamese fitness.
How big is the CJK block?
CJK Unified Ideographs (U+4E00–U+9FFF) is 20,992 codepoints — by far the largest commonly-needed block. Hangul Syllables (U+AC00–U+D7AF) is 11,184. Including either turns a font from kilobytes to hundreds of kilobytes even after subsetting, so size them deliberately.
Where do emoji live?
Across five blocks: Emoticons (U+1F600–U+1F64F), Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF), Supplemental Symbols and Pictographs (U+1F900–U+1F9FF), Transport and Map Symbols (U+1F680–U+1F6FF), and Miscellaneous Symbols (U+2600–U+26FF). Most text fonts carry none; the OS supplies colour emoji. To remove stray emoji, see /font-tools/emoji-remover.
What's the Private Use Area for?
PUA (U+E000–U+F8FF, 6,400 codepoints) is reserved for private agreement — custom icon fonts map their glyphs there. The two Supplementary PUAs (U+F0000–U+10FFFD) add ~131k more. PUA codepoints have no standard meaning, so coverage there tells you the slots are used, not what they render.
How does block size relate to subset weight?
Roughly proportionally for the characters you keep. A 95-codepoint Basic Latin subset is tiny; a full CJK subset is huge. Block size is your first estimate of how heavy including a script will be — use the Glyph Count Analyzer for precise projected WOFF2 sizes.
Do Google Fonts subset names match Unicode block names?
No. latin, latin-ext, cyrillic, greek-ext, vietnamese are Google's curated delivery subsets that mix and trim ranges (e.g. latin includes some General Punctuation). Use the mapping table above to translate between them and the Coverage Map's block rows.
What's the difference between Latin Extended-A and Additional?
Latin Extended-A (U+0100–U+017F) holds Central/Eastern European letters (Polish, Czech, Hungarian). Latin Extended Additional (U+1E00–U+1EFF) holds Vietnamese and other heavily-diacritic forms. They're separate blocks — a font can cover one and not the other, which is why CE-Europe and Vietnamese are independent coverage checks.
Why does Greek have two blocks?
Greek and Coptic (U+0370–U+03FF) covers modern Greek; Greek Extended (U+1F00–U+1FFF) covers polytonic (classical) Greek with breathing marks and accents. Modern Greek text needs only the first; classical/academic Greek needs both. Google's greek-ext subset combines them.
How do I use this with the Coverage Map?
When the Coverage Map shows a block name and percentage, look it up here to learn its range, size, and languages. Then decide: is the percentage high enough for the characters I ship, and is the block small enough to subset affordably? Run the Coverage Map on each candidate.
Which blocks need shaping beyond coverage?
Arabic, Devanagari (and other Indic scripts), Thai, Khmer, and similar complex scripts need GSUB/GPOS shaping rules on top of codepoint coverage. 100% block coverage with broken shaping still renders wrong. Confirm coverage here, then proof rendered text and check features with the OpenType Features Inspector.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.