Latin-1 vs Latin Extended: which charset do you need?
- Step 1: List the languages your content ships in. Not the languages you might support someday, but the ones in your actual content and translations today. English-only sites need just `latin`. A site with French, German, and Spanish still fits `latin` except for French `œ`/`Œ`. Add Polish, Czech, or Turkish and you've crossed into `latin-ext`.
- Step 2: Check coverage against the font, not just the spec. A range only helps if the font has those glyphs. Run [character-coverage-map](/font-tools/character-coverage-map), which now scores against 346 real Unicode blocks, to see whether your font even contains Latin Extended-A before you bother subsetting to it.
- Step 3: Pick the smallest preset that covers your languages. If every language fits inside `U+0020–U+00FF`, use the one-click [Latin Filter](/font-tools/latin-filter). If you need Extended-A letters, use [font-subsetter](/font-tools/font-subsetter) and select `latin-ext`. For Vietnamese specifically, use the dedicated `vietnamese` preset, which includes the precomposed block that Extended-A alone misses.
- Step 4: Subset, then compress. Both presets output an uncompressed TTF and drop kerning/layout tables (opentype.js rebuild). Always finish with [ttf-to-woff2](/font-tools/ttf-to-woff2). The size difference between `latin` and `latin-ext` is real but modest after compression, typically a few KB.
- Step 5: Serve with the matching unicode-range. Declare `unicode-range` in `@font-face` so the browser only downloads the subset for pages that use those codepoints. `latin` ≈ `U+0000-00FF`; `latin-ext` ≈ `U+0100-024F, U+1E00-1EFF`. This lets you layer a small Latin file plus an on-demand extended file.
- Step 6: Test with a real string per language. Paste a sentence containing the trickiest letters of each target language and confirm none render as `□`. Polish `źćłą`, Turkish `ığşĞİŞ`, Czech `řčšž`, Vietnamese `ếợữ`: if any box out, you under-subsetted.
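Steps 1 and 3 can be sketched as a small lookup. This is a minimal sketch, not the tools' actual logic: the ranges are the ones quoted in this guide, `smallest_preset` is a hypothetical helper, and it assumes the extended file is served alongside (or already contains) the `latin` range.

```python
# Ranges as quoted in this guide; the tiering logic is an assumption
# made for illustration, not the font-subsetter implementation.
LATIN = [(0x0020, 0x007E), (0x00A0, 0x00FF)]
LATIN_EXT = [(0x0100, 0x024F), (0x1E00, 0x1EFF)]
DONG = [(0x20AB, 0x20AB)]  # the extra codepoint the vietnamese preset adds
TIERS = [("latin", LATIN), ("latin-ext", LATIN_EXT), ("vietnamese", DONG)]


def smallest_preset(text):
    """Return the smallest tier covering text, or None if a character
    falls outside all three (for example Cyrillic or Greek)."""
    needed = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:  # skip newlines, tabs and other control characters
            continue
        for rank, (_, ranges) in enumerate(TIERS):
            if any(lo <= cp <= hi for lo, hi in ranges):
                needed = max(needed, rank)
                break
        else:
            return None  # no Latin preset covers this character
    return TIERS[needed][0]


print(smallest_preset("B\u00fccher ni\u00f1o Stra\u00dfe"))  # latin
print(smallest_preset("s\u0153ur"))                          # latin-ext
```

Feeding it one sample sentence per shipped language gives a quick first answer before you open either tool; it says nothing about whether the font actually contains the glyphs.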
The five Latin blocks and what they unlock
Codepoint ranges and the languages each block adds. The latin preset = Basic Latin + Latin-1 Supplement. The latin-ext preset = Extended-A + Extended-B (together U+0100–U+024F) + the precomposed U+1E00–U+1EFF block.
| Block | Range | In which JAD preset | Languages it unlocks |
|---|---|---|---|
| Basic Latin | U+0020–U+007E | latin, latin-ext-adjacent | ASCII — English, all unaccented text |
| Latin-1 Supplement | U+00A0–U+00FF | latin (the preset ends here) | German, Spanish, Italian, Portuguese, Nordic, most French |
| Latin Extended-A | U+0100–U+017F | latin-ext | Polish, Czech, Slovak, Hungarian, Croatian, Romanian, Turkish, French œ |
| Latin Extended-B | U+0180–U+024F | latin-ext (covers U+0100–U+024F) | Pinyin tones, African orthographies, phonetic letters |
| Latin Extended Additional | U+1E00–U+1EFF | latin-ext, vietnamese | Vietnamese precomposed, Welsh ŵŷ, medievalist text |
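The table can be turned into a quick classifier for spotting which block a given letter lives in. A minimal sketch using only the ranges listed above; `block_of` and `block_histogram` are hypothetical helper names.

```python
from collections import Counter

# Block boundaries copied from the table above.
BLOCKS = [
    ("Basic Latin", 0x0020, 0x007E),
    ("Latin-1 Supplement", 0x00A0, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]


def block_of(ch):
    """Name of the Latin block containing ch, or 'other'."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def block_histogram(text):
    """Count non-whitespace characters per block."""
    return Counter(block_of(ch) for ch in text if not ch.isspace())


print(block_of("\u0153"))  # Latin Extended-A
print(block_of("\u01ce"))  # Latin Extended-B
```

Any count under `other` means no Latin preset will help for those characters.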
Per-language: smallest preset that works
What to choose by target language. 'Latin Filter' is the fixed latin preset; 'font-subsetter (latin-ext)' / '(vietnamese)' are the named presets in that tool.
| Language | Smallest preset | Why |
|---|---|---|
| English | Latin Filter | Pure ASCII fits Basic Latin |
| German / Spanish / Italian / Portuguese | Latin Filter | ä ö ü ß ñ á é are all in Latin-1 Supplement |
| French | font-subsetter (latin-ext) | Œ/œ (U+0152/U+0153) are Extended-A, so Latin Filter misses them |
| Polish / Czech / Hungarian / Croatian | font-subsetter (latin-ext) | ł ą ę ř č ž ő ű are Extended-A |
| Turkish / Romanian | font-subsetter (latin-ext) | ğ ş İ ı ă are Extended-A; Romanian ș ț (U+0219/U+021B) are Extended-B |
| Vietnamese | font-subsetter (vietnamese) | Needs Extended-A + U+1E00–U+1EFF precomposed + ₫ U+20AB |
Size & count trade-off
Codepoints kept per preset (the font must actually contain them). Sizes are illustrative for a typical sans at one weight; your numbers depend on the font and on WOFF2 compression downstream.
| Preset | Codepoints | Typical subset (TTF → WOFF2) | Tool |
|---|---|---|---|
| latin | 191 | ~40 KB → ~16 KB | Latin Filter |
| latin-ext | 592 | ~70 KB → ~26 KB | font-subsetter |
| vietnamese | 593 (latin-ext + ₫) | ~72 KB → ~27 KB | font-subsetter |
| Full font (no subset) | 3,000–60,000 | 200 KB – several MB | — |
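The codepoint counts in the table can be reproduced from the ranges quoted in this guide. This is a worked check under the assumption that each preset is exactly those ranges; on that reading, 592 counts the extended ranges on their own.

```python
def count(ranges):
    """Number of codepoints in a list of inclusive (lo, hi) ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


# Ranges as quoted in this guide (assumed to be the full preset definitions).
LATIN = [(0x0020, 0x007E), (0x00A0, 0x00FF)]      # 95 + 96
LATIN_EXT = [(0x0100, 0x024F), (0x1E00, 0x1EFF)]  # 336 + 256
VIETNAMESE = LATIN_EXT + [(0x20AB, 0x20AB)]       # latin-ext plus the dong sign

print(count(LATIN), count(LATIN_EXT), count(VIETNAMESE))  # 191 592 593
```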
Cookbook
Decision recipes by audience. Every subset drops kerning/layout tables (opentype.js) and outputs TTF — compress with ttf-to-woff2 before shipping.
English + German + Spanish marketing site
Example: All three fit inside Basic Latin plus Latin-1 Supplement. No need for Extended-A. One-click Latin Filter is the right call.
Languages: en, de, es
Test: Bücher niño Straße ¿Qué? über
All glyphs in U+0020-00FF -> latin preset is enough.
Use: Latin Filter (fixed preset) -> ttf-to-woff2
@font-face {
  font-family: "Brand";
  src: url(/f/brand.latin.woff2) format("woff2");
  unicode-range: U+0000-00FF;
}
Add French, and trip on œ
Example: French looks like it fits Latin-1, but the ligature œ/Œ is in Extended-A. A Latin-only subset renders 'sœur' as 's□ur'. Switch to latin-ext.
Latin Filter output, French text:
sœur -> s□ur (œ U+0153 missing)
cœur -> c□ur
Œuvre -> □uvre
Fix: font-subsetter, preset = latin-ext
unicode-range adds U+0100-024F
sœur -> sœur ✓
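The boxed output above can be mimicked before subsetting anything. A minimal sketch: `render_with_subset` is a hypothetical helper that swaps every character outside the given ranges for □, following this guide's notation. A real browser may instead fall back to another font if one is available.

```python
LATIN = [(0x0020, 0x007E), (0x00A0, 0x00FF)]  # latin preset, per this guide
EXTENDED = [(0x0100, 0x024F)]                 # range the latin-ext fix adds


def render_with_subset(text, ranges, missing="\u25a1"):
    """Replace characters outside ranges with a box, as a rough preview."""
    return "".join(
        ch if any(lo <= ord(ch) <= hi for lo, hi in ranges) else missing
        for ch in text
    )


print(render_with_subset("s\u0153ur", LATIN))             # box in place of the ligature
print(render_with_subset("s\u0153ur", LATIN + EXTENDED))  # unchanged
```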
Central European (Polish + Czech)
Example: These languages rely heavily on Latin Extended-A. Latin-1 covers only their plain acute vowels (such as ó or á) and none of the letters with ogonek, caron, or stroke, so latin-ext is mandatory.
Polish: źdźbło łąka Gdańsk zażółć
Czech: příliš žluťoučký kůň
Latin-1 only: ?d?b?o ??ka Gda?sk
latin-ext: źdźbło łąka Gdańsk ✓
Use: font-subsetter, preset = latin-ext
unicode-range: U+0100-024F, U+1E00-1EFF
Vietnamese — why it needs its own preset
Example: Vietnamese precomposed letters span Extended-A and the U+1E00–U+1EFF block, plus the đồng currency sign U+20AB. The dedicated vietnamese preset bundles all three; latin-ext alone misses ₫.
Text: Tiếng Việt đồng Phở ngữ pháp
latin-ext covers tiếng/việt (U+1E00-1EFF)
but ₫ (U+20AB) is in Currency Symbols, NOT Extended.
Use: font-subsetter, preset = vietnamese
ranges: U+0100-024F, U+1E00-1EFF, U+20AB
-> Tiếng Việt ... 5₫ ✓
Layered loading — small Latin + on-demand extended
Example: Ship the 191-glyph Latin file to everyone, and a separate Extended file the browser only fetches when a page actually contains Extended-A codepoints. unicode-range does the routing.
@font-face { /* everyone downloads this */
  font-family: "Brand";
  src: url(/f/brand.latin.woff2) format("woff2");
  unicode-range: U+0000-00FF;
}
@font-face { /* fetched only on pages with Extended-A characters */
  font-family: "Brand";
  src: url(/f/brand.ext.woff2) format("woff2");
  unicode-range: U+0100-024F, U+1E00-1EFF;
}
Edge cases and what actually happens
French œ renders as a box on a Latin-only subset
Under-subsetted: œ (U+0153) and Œ (U+0152) are in Latin Extended-A, not Latin-1 Supplement. The Latin Filter (latin preset) silently drops them: there's no error, the reader just sees □. For any French content, use font-subsetter with latin-ext. The same applies to the rarely used uppercase Ÿ (U+0178), which sits in Extended-A even though lowercase ÿ (U+00FF) is in Latin-1.
Turkish dotted/dotless I confusion
Under-subsetted: Turkish has four I-letters: I i İ ı. I and i are ASCII (kept by latin), but İ (U+0130) and ı (U+0131) are Extended-A and dropped by Latin Filter. A Turkish site on a Latin-only subset shows boxes for the dotted-uppercase and dotless-lowercase forms. Use latin-ext.
Vietnamese covered by latin-ext but ₫ still boxes
Under-subsetted: latin-ext includes U+1E00–U+1EFF, so most Vietnamese precomposed letters survive. But the đồng currency sign ₫ (U+20AB) is in the Currency Symbols block, outside any Latin range. Only the dedicated vietnamese preset in font-subsetter bundles it. If you don't show prices, latin-ext may be enough.
The font doesn't contain Extended-A even though you subset to it
Source limitation: Subsetting to latin-ext only keeps glyphs the font actually has. Many display and free fonts cover Latin-1 but stop there. Subsetting to latin-ext on such a font yields the same ~191 glyphs as the Latin Filter, because there's nothing in U+0100–U+024F to keep. Check first with character-coverage-map.
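The check amounts to intersecting the font's codepoints with the preset's ranges. A minimal sketch, assuming you already have the set of codepoints the font maps (for example exported from character-coverage-map); `coverage` is a hypothetical helper and does not read font files itself.

```python
LATIN = [(0x0020, 0x007E), (0x00A0, 0x00FF)]
LATIN_EXT = [(0x0100, 0x024F), (0x1E00, 0x1EFF)]


def coverage(font_codepoints, ranges):
    """Return (kept, wanted): how many codepoints of the preset the font has."""
    wanted = {cp for lo, hi in ranges for cp in range(lo, hi + 1)}
    kept = wanted & set(font_codepoints)
    return len(kept), len(wanted)


# Hypothetical font that stops at Latin-1.
latin1_only_font = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

print(coverage(latin1_only_font, LATIN))      # (191, 191)
print(coverage(latin1_only_font, LATIN_EXT))  # (0, 592)
```

A result of zero kept codepoints for the extended ranges is the signal that choosing `latin-ext` on this font adds nothing.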
Combining diacritics don't compose after subsetting
By design: Precomposed letters (é = single codepoint U+00E9) are kept by the relevant preset. But text using a base letter + combining accent (e + U+0301) relies on GPOS mark positioning, which the opentype.js subset drops. So decomposed text may stack the accent badly. Use precomposed (NFC-normalised) text, or keep the layout tables via a harfbuzz subsetter.
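NFC normalisation is available in Python's standard library. A minimal sketch of converting decomposed text to precomposed form before it reaches the page; `to_precomposed` is a hypothetical wrapper name around `unicodedata.normalize`.

```python
import unicodedata


def to_precomposed(text):
    """Normalise to NFC so base letter + combining mark becomes one codepoint
    wherever Unicode defines a precomposed form."""
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = to_precomposed(decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e9")            # True
```

Sequences with no precomposed equivalent stay decomposed after NFC, so they still depend on mark positioning in the font.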
Romanian comma-below vs cedilla
Under-subsetted: Correct Romanian uses ș ț with comma-below (U+0219/U+021B, Extended-B), not the Turkish cedilla forms. Both are outside Latin-1. latin-ext (which covers U+0100–U+024F) includes them. A Latin-only subset boxes them entirely; an Extended subset built from a font that only has the cedilla variants will substitute incorrectly.
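On the content side, it can help to detect whether Romanian text was typed with the cedilla codepoints. A minimal sketch with hypothetical helper names; it is only appropriate for text known to be Romanian, because ş with cedilla is the correct letter in Turkish.

```python
# Cedilla codepoints (Extended-A) mapped to comma-below codepoints (Extended-B).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})


def uses_cedilla_forms(text):
    """True if text contains any of the four cedilla codepoints."""
    return any(ch in "\u015e\u015f\u0162\u0163" for ch in text)


def fix_romanian(text):
    """Rewrite cedilla forms to comma-below. Romanian text only."""
    return text.translate(CEDILLA_TO_COMMA)


word = "\u015ftiin\u0163\u0103"  # a Romanian word typed with cedilla forms
print(uses_cedilla_forms(word))                # True
print(uses_cedilla_forms(fix_romanian(word)))  # False
```

The font still has to contain the comma-below glyphs for the corrected text to render.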
ð and þ missing even within Latin-1
Source limitation: Icelandic ð (U+00F0) and þ (U+00FE) ARE in Latin-1 Supplement and within the Latin Filter's range, but many fonts simply don't draw them. The filter can't keep a glyph that isn't there, so Icelandic text on such a font boxes out despite using a 'covered' range. This is a font-coverage problem, not a preset problem.
Choosing latin-ext 'to be safe' bloats every page
Over-subsetted: If your content is English-only, shipping latin-ext to everyone wastes bandwidth on 400 glyphs nobody renders. The safe-by-default instinct costs real KB on the critical path. Prefer the smallest preset that covers your actual languages, and layer Extended behind unicode-range so only pages that need it pay for it.
Cyrillic / Greek mistaken for 'Latin extended'
Wrong preset: Cyrillic (U+0400–U+04FF) and Greek (U+0370–U+03FF) are entirely separate blocks, and no Latin preset touches them. Use font-subsetter's cyrillic or greek presets. A 'Latin Extended' subset will not render Russian or Greek at all.
Pinyin tone marks need Extended-B
Under-subsetted: Chinese Pinyin uses toned vowels like ǎ ē ǐ ǔ, several of which (e.g. ǎ U+01CE) sit in Latin Extended-B (U+0180–U+024F). latin-ext covers U+0100–U+024F so it includes them; the Latin Filter does not. If your romanised-Chinese content shows boxes over vowels, you're on the wrong preset.
Frequently asked questions
What's the difference between Latin-1 and Latin Extended?
Latin-1 Supplement (U+00A0–U+00FF) holds the accented letters of Western European languages — German ä ö ü ß, Spanish ñ, French é à, Nordic å æ ø. Latin Extended-A (U+0100–U+017F) adds Central/Eastern European letters (Polish ł, Czech ř, Turkish ğ), French œ, and more. Extended-B (U+0180–U+024F) adds phonetic and African-orthography letters plus Pinyin tones. JAD's latin preset = Basic + Latin-1; latin-ext = Extended-A + B + the precomposed U+1E00–U+1EFF block.
Will Latin-1 cover French?
Almost. All the accents (é è à â ç ù û ï ë) are in Latin-1, but the ligature Œ/œ (U+0152/U+0153) is in Latin Extended-A. Words like sœur, cœur, and œuvre will box out on a Latin-only subset. For correct French, use font-subsetter with latin-ext.
Which preset do I need for Polish?
latin-ext. Polish uses ł ą ę ó ś ź ż ć ń. All of these are in Latin Extended-A except ó (U+00F3), which is in Latin-1. The one-click Latin Filter would box every other distinctive Polish letter. Pick latin-ext in font-subsetter, and verify the font actually contains those glyphs with character-coverage-map.
Why does Vietnamese have its own preset?
Vietnamese precomposed letters are split across Latin Extended-A AND the Latin Extended Additional block (U+1E00–U+1EFF), and prices use the đồng sign ₫ (U+20AB) from the Currency Symbols block. No single Latin range covers all three, so JAD's vietnamese preset bundles U+0100–U+024F, U+1E00–U+1EFF, and U+20AB together. Use it in font-subsetter.
How much bigger is latin-ext than latin?
About 3× the codepoints (592 vs 191), but the on-disk difference is smaller and shrinks further after WOFF2 compression — often only a few KB at one weight. The bigger lever is dropping the rest of a full-Unicode font, which both presets do. If your audience needs Extended-A, the size cost is well worth correct rendering.
Can I mix presets — Latin-1 plus a few Extended letters?
Not within a single preset. Use character-whitelist-builder and paste exactly the characters you want (the full Latin-1 set plus, say, œ and ł), or ship two @font-face rules with different unicode-range values so the browser fetches the small Latin file plus a tiny Extended file on demand.
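If you go the two-rule route with a hand-picked character list, the matching `unicode-range` value can be derived from the characters themselves. A minimal sketch; `to_unicode_range` is a hypothetical helper, not part of character-whitelist-builder.

```python
def to_unicode_range(chars):
    """Build a CSS unicode-range value, merging consecutive codepoints."""
    runs = []
    for cp in sorted({ord(c) for c in chars}):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current run
        else:
            runs.append([cp, cp])     # start a new run
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in runs
    )


print(to_unicode_range("\u0153\u0142"))  # U+0142, U+0153
print(to_unicode_range("abc"))           # U+0061-0063
```

Paste the result into the `unicode-range` descriptor of the `@font-face` rule that serves the small extra file.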
Does choosing a bigger preset hurt performance?
Slightly — more glyphs means a larger file on the critical path. The standard mitigation is unicode-range: declare the subset's range so the browser only downloads it for pages that contain those codepoints. That way an English page never pays for the Extended file even if it exists.
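The routing can be modelled in a few lines. This is a simplified sketch of the idea, not of actual browser internals: the file names and ranges come from the layered-loading example in this guide, and `files_needed` is a hypothetical helper.

```python
# Declared unicode-range per font file, as in the layered-loading example.
FACES = {
    "brand.latin.woff2": [(0x0000, 0x00FF)],
    "brand.ext.woff2": [(0x0100, 0x024F), (0x1E00, 0x1EFF)],
}


def files_needed(page_text, faces=FACES):
    """List the font files whose declared range matches any character used."""
    cps = {ord(ch) for ch in page_text}
    return [
        name
        for name, ranges in faces.items()
        if any(lo <= cp <= hi for cp in cps for lo, hi in ranges)
    ]


print(files_needed("Hello"))                # ['brand.latin.woff2']
print(files_needed("G\u0142\u00f3wna"))     # both files
```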
What if my font doesn't have the Extended glyphs at all?
Subsetting only keeps glyphs the font contains. If the source font stops at Latin-1, subsetting to latin-ext produces the same result as the Latin Filter — there's nothing in U+0100–U+024F to keep. Always check coverage first with character-coverage-map, which scores against 346 real Unicode blocks.
Do these presets keep kerning?
No. Both latin and latin-ext go through the same opentype.js rebuild, which drops GPOS/GSUB, so kerning and ligatures are lost. This is independent of which preset you pick. If you need layout tables preserved, use a harfbuzz-based subsetter — the batch-script guide shows the pyftsubset alternative.
Is Cyrillic or Greek part of 'Latin Extended'?
No — they're entirely separate Unicode blocks (U+0400–U+04FF Cyrillic, U+0370–U+03FF Greek). A Latin Extended subset renders neither. Use font-subsetter's cyrillic or greek presets for those scripts.
How do I test that I picked the right preset?
After subsetting and compressing, render a sentence per target language containing its hardest letters — Polish źćłą, Turkish ığİŞ, Czech řčšž, Vietnamese ếợữ. If anything shows as □, you under-subsetted; bump up to the next preset. opentype-features-inspector and character-coverage-map help confirm coverage up front.
Which tool selects each preset?
Only the latin preset has a dedicated one-click tool — Latin Filter. Every other named preset (latin-ext, cyrillic, greek, vietnamese, symbols) is selected from the charset dropdown in font-subsetter. Both tools share the same opentype.js engine, so identical charsets produce identical output.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.