Glyph data formats reference: names, Unicode, metrics
- Step 1: Glyph index. A sequential integer 0..N-1. Index 0 is always `.notdef` (the tofu rectangle). Indices are font-internal: the same character can have different indices in two fonts, so never use an index as a portable identifier. The inspector reports it as the `index` field, which is also the loop variable in the inspector's walk.
- Step 2: Glyph name (post / CFF charset). Optional human-readable name: `A`, `home`, `f_i`, `a.smcp`. TrueType stores it in the `post` table; OTF in the CFF charset. Many web fonts strip `post` to save bytes, so the inspector's `name` is `null` for those; fall back to `index` or `unicode` to identify the glyph.
- Step 3: Unicode codepoint (cmap). The `cmap` table maps codepoints to glyph indices. A glyph can be reached by several codepoints, but the inspector reports only the glyph's primary `unicode`, formatted as `U+XXXX` (or `null` for unencoded glyphs). For the full codepoint listing, read the cmap directly or use the coverage map.
- Step 4: Advance width (hmtx). The `hmtx` table stores each glyph's horizontal advance: how far the pen moves after drawing it. The inspector surfaces it as `advance` (font units). It drives layout spacing and is independent of the visible ink width (the bbox).
- Step 5: Bounding box (outline). `xMin/xMax/yMin/yMax` are the outline's extent in font units, taken from `getBoundingBox()` (with a fallback to the glyph table's recorded extents). A negative `yMin` means the glyph descends below the baseline. An empty box (all zeros) means no ink: a blank glyph.
- Step 6: SVG path and viewBox. `svgPath` is `getPath(0, em, em).toPathData(2)`, where `em` is `units_per_em`: a y-down `d` string scaled to the em box with the baseline at `y = units_per_em`. `viewBox` is `0 0 <width> <units_per_em>`. Together they render the glyph upright with no extra transforms. Contour-less glyphs give `svgPath: ""`.
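As a rough illustration of Step 6, the sketch below wraps a record's `svgPath` and `viewBox` in a standalone SVG string. The helper name `glyphToSvg`, the `RenderableGlyph` interface, and the sample path data are hypothetical; only the two field names come from this reference.

```typescript
// Minimal shape of the two record fields used here (field names from this reference).
interface RenderableGlyph {
  svgPath: string; // y-down path data, "" for contour-less glyphs
  viewBox: string; // "0 0 <width> <units_per_em>"
}

// Hypothetical helper: builds an <svg> element string with no extra transforms.
function glyphToSvg(glyph: RenderableGlyph, heightPx: number): string {
  // A contour-less glyph has nothing to draw, so no <path> is emitted.
  const path = glyph.svgPath === "" ? "" : `<path d="${glyph.svgPath}"/>`;
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="${glyph.viewBox}" ` +
    `height="${heightPx}">${path}</svg>`
  );
}

// Placeholder path data, invented for illustration only.
const sampleSvg = glyphToSvg(
  { svgPath: "M60 870L190 870L190 1000Z", viewBox: "0 0 250 1000" },
  48
);
```

Because the path is already y-down and scaled to the em box, the markup needs no `transform` attribute; sizing is left to the `height` attribute and the viewBox aspect ratio.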
The four glyph identifiers
Every glyph carries these. Confusing index with codepoint is the root of most font-tooling bugs.
| Identifier | Source table | Portable? | Inspector field |
|---|---|---|---|
| Glyph index | the glyph order itself | No — font-internal | index |
| Glyph name | post (TT) / CFF charset (OTF) | Somewhat — names can be custom | name (or null) |
| Unicode codepoint | cmap | Yes — the portable identity | unicode (primary, or null) |
| Advance + bbox | hmtx + outline | Values in font units | advance, xMin/xMax/yMin/yMax |
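One way to act on the portability column is to derive a lookup key that prefers the codepoint, then the name, and uses the index only as a last resort. This is a minimal sketch; the helper `portableKey` and the `uni:` / `name:` / `gid:` prefixes are illustrative choices, not part of the inspector's output.

```typescript
// The three identifier fields of a record, as described in the table above.
interface GlyphIdentity {
  index: number;
  name: string | null;
  unicode: string | null; // "U+XXXX" or null
}

// Hypothetical helper: picks the most portable identifier available.
function portableKey(g: GlyphIdentity): string {
  if (g.unicode !== null) return `uni:${g.unicode}`; // portable across fonts
  if (g.name !== null) return `name:${g.name}`; // portable only if names are stable
  return `gid:${g.index}`; // font-internal: valid within this one font file only
}

const keyForA = portableKey({ index: 36, name: "A", unicode: "U+0041" }); // "uni:U+0041"
const keyForSmallCap = portableKey({ index: 401, name: "A.smcp", unicode: null }); // "name:A.smcp"
```

Keeping the prefix in the key makes it obvious later which kind of identifier a stored key relies on.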
Inspector field → OpenType source
Where each value in a GlyphRecord comes from, and how it's computed.
| Field | Origin | How it's produced |
|---|---|---|
| index | glyph order | Loop counter 0..min(total,5000)-1 |
| name | post / CFF charset | glyph.name ?? null |
| unicode | cmap | glyph.unicode formatted U+XXXX, else null |
| advance | hmtx | glyph.advanceWidth ?? null |
| xMin/xMax/yMin/yMax | outline | getBoundingBox(), fallback to table extents |
| svgPath | glyf/CFF outline | getPath(0, em, em).toPathData(2); "" if no contours |
| viewBox | computed | 0 0 <advance or em> <units_per_em> |
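The table can be read as a small mapping function. The sketch below is not the inspector's source: `GlyphLike` and `PathLike` are structural stand-ins for the opentype.js glyph and path objects, `toRecord` and `formatCodepoint` are hypothetical names, the bbox fields are left out for brevity, and the viewBox width is assumed to fall back to the em size when the advance is missing.

```typescript
// Structural stand-ins for the opentype.js objects named in the table.
interface PathLike {
  toPathData(decimalPlaces: number): string;
}
interface GlyphLike {
  name?: string | null;
  unicode?: number;
  advanceWidth?: number;
  getPath(x: number, y: number, fontSize: number): PathLike;
}

// Subset of the record fields (bbox omitted in this sketch).
interface GlyphRecordSketch {
  index: number;
  name: string | null;
  unicode: string | null;
  advance: number | null;
  svgPath: string;
  viewBox: string;
}

// Formats a numeric codepoint as U+XXXX (at least four hex digits), or null.
function formatCodepoint(cp: number | undefined): string | null {
  if (cp === undefined) return null;
  let hex = cp.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

function toRecord(glyph: GlyphLike, index: number, em: number): GlyphRecordSketch {
  const advance = glyph.advanceWidth ?? null;
  const width = advance ?? em; // assumption: em is the fallback width
  return {
    index,
    name: glyph.name ?? null,
    unicode: formatCodepoint(glyph.unicode),
    advance,
    // Baseline at y = em, font size = em: y-down path scaled 1:1 to font units.
    svgPath: glyph.getPath(0, em, em).toPathData(2),
    viewBox: `0 0 ${width} ${em}`,
  };
}
```

Typing the glyph structurally keeps the sketch testable with a plain object and avoids depending on a specific opentype.js version.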
Output header fields and the cap
Fields that appear once, before the glyphs array, plus the safety cap.
| Field | Meaning | Notes |
|---|---|---|
| total_glyphs | Every glyph in the font | font.glyphs.length, including .notdef and unencoded glyphs |
| sampled | Records actually serialised | min(total_glyphs, 5000); lower than total means truncated |
| units_per_em | Em square size | Use as SVG fontSize and viewBox height; 1000 (CFF) or 2048 (TT) typical |
| ascender / descender | Font-unit vertical metrics | From opentype; may be null |
| coordinate_system | Path coordinate note | y-down; fontSize=units_per_em; baseline at y=units_per_em |
| Safety cap | 5,000 records, all tiers | GLYPH_INSPECT_SAFETY_CAP; bounds payload, not a tier limit |
Cookbook
Field-by-field, with real records. UPM is 1000 unless the example says otherwise.
Index vs codepoint — they're different numbers
Example: the letter A is glyph index 36 in this font and Unicode U+0041. The index is where it sits in the font; the codepoint is its portable identity. In another font, A might be index 4.
{ "index": 36, "name": "A", "unicode": "U+0041", ... }
index 36 → position inside THIS font only
U+0041 → the character "A" anywhere
Same glyph, two unrelated numbers. Never key on index.
A glyph with a name but no codepoint
Example: the small-cap A is unencoded. It is reached via the smcp feature, not by typing a character. It has a name and an outline, but unicode is null.
{
"index": 401,
"name": "A.smcp",
"unicode": null, ← unencoded
"advance": 600,
"svgPath": "M... Z",
"viewBox": "0 0 600 1000"
}
A glyph with a codepoint but no name
Example: a web font stripped its post table. The euro sign is encoded (U+20AC) but has no name, so identify it by codepoint instead.
{
"index": 1203,
"name": null, ← post table stripped
"unicode": "U+20AC",
"advance": 556,
"svgPath": "M... Z",
"viewBox": "0 0 556 1000"
}
Advance vs ink width
Example: a comma has a small outline (the ink), but its advance includes spacing. Note the negative yMin: the tail dips below the baseline.
{
"index": 15, "name": "comma", "unicode": "U+002C",
"advance": 250, ← cursor moves 250 units
"xMin": 60, "xMax": 190, ← ink only 130 wide
"yMin": -180, "yMax": 130, ← tail below baseline
"svgPath": "M... Z", "viewBox": "0 0 250 1000"
}
Reading the output header
Example: a large CJK font. total_glyphs exceeds the cap, so sampled is 5,000. UPM is 1000 here. The header tells you the list is truncated before you scan a single glyph.
{
"total_glyphs": 18452,
"sampled": 5000, ← truncated at the cap
"units_per_em": 1000,
"ascender": 880, "descender": -120,
"coordinate_system": "y-down; fontSize=units_per_em; baseline at y=units_per_em",
"glyphs": [ ...5000 records... ]
}
Edge cases and what actually happens
Every row below was probed against the live API.
Index used as a stable identifier across fonts
Don't do it: glyph indices are font-internal. The same character routinely has a different index in two fonts, and subsetting renumbers indices entirely. Tooling that keys on index across fonts (or across versions of one font) will silently point at the wrong glyph. Key on unicode for encoded glyphs, or on name where names are stable; treat index as ephemeral.
Glyph has no name (post / CFF charset absent)
Name is null: TrueType names live in post; OTF names in the CFF charset. Optimised web fonts often strip post (5–20 KB saved), so opentype can't recover names and name is null. The glyph is fully usable; identify it by index or unicode. Check whether post is present with the font-metadata-extractor.
Glyph has no codepoint (unencoded)
Unicode is null: ligatures (f_i), small caps (a.smcp), stylistic alternates, and .notdef are unencoded, meaning no cmap entry maps a codepoint to them. They have an index, often a name, and an outline, but unicode is null. They're reached through OpenType layout (GSUB), not by typing. This is normal and expected, not a defect.
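When the name is all you have for an unencoded glyph, it can often be split into parts. The sketch below assumes the common naming convention in which underscores join ligature components and the first period starts a variant suffix (as in `f_i` and `a.smcp`). Names can be custom, so treat the result as a heuristic; `parseGlyphName` and `ParsedGlyphName` are hypothetical names.

```typescript
interface ParsedGlyphName {
  components: string[]; // base glyph names, more than one for a ligature
  suffix: string | null; // variant suffix such as "smcp", or null
}

// Heuristic parser for conventionally named glyphs.
function parseGlyphName(name: string): ParsedGlyphName {
  // Search from position 1 so a leading dot (".notdef") is not treated as a suffix.
  const dot = name.indexOf(".", 1);
  const base = dot === -1 ? name : name.slice(0, dot);
  const suffix = dot === -1 ? null : name.slice(dot + 1);
  return { components: base.split("_"), suffix };
}

const ligature = parseGlyphName("f_i"); // components ["f", "i"], suffix null
const smallCap = parseGlyphName("a.smcp"); // components ["a"], suffix "smcp"
```

The suffix hints at which layout feature is likely to reach the glyph, but the font's GSUB table is what actually decides.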
One glyph, several codepoints
Primary only: a glyph can be mapped from multiple codepoints (a unified hyphen, a shared quote). The inspector reports the glyph's primary unicode and doesn't enumerate every codepoint that reaches it. To list all of them, read font.tables.cmap.glyphIndexMap or use the character-coverage-map, which works from the full cmap across 346 blocks.
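To list every codepoint that reaches a glyph, the codepoint-to-index map can be inverted. This sketch assumes `glyphIndexMap` is a plain object whose keys are codepoints and whose values are glyph indices; the helper names and the sample map are invented for illustration.

```typescript
// Assumed shape: keys are codepoints (stringified by JavaScript), values are glyph indices.
type GlyphIndexMap = Record<string, number>;

// Formats a codepoint as U+XXXX with at least four hex digits.
function toUPlus(cp: number): string {
  let hex = cp.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

// Inverts codepoint -> glyph index into glyph index -> list of codepoints.
function codepointsByGlyph(map: GlyphIndexMap): Record<number, string[]> {
  const out: Record<number, string[]> = {};
  for (const key of Object.keys(map)) {
    const glyphIndex = map[key];
    if (glyphIndex === undefined) continue;
    const list = out[glyphIndex] || [];
    list.push(toUPlus(Number(key)));
    out[glyphIndex] = list;
  }
  return out;
}

// Invented sample: hyphen-minus and hyphen share glyph 16; A maps to glyph 36.
const reach = codepointsByGlyph({ 45: 16, 8208: 16, 65: 36 });
// reach[16] is ["U+002D", "U+2010"]; reach[36] is ["U+0041"]
```

Unencoded glyphs simply never appear as keys in the result, which matches their `unicode: null` in the per-glyph record.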
Advance differs sharply from ink width
Expected: advance (from hmtx) is how far the pen moves; the bbox (xMin..xMax) is the ink. They differ for spacing-heavy glyphs: a comma has a tiny outline but a normal advance; a combining mark may have zero advance but real ink. Don't conflate the two: use advance for layout, the bbox for visual bounds.
Negative yMin / descenders below the baseline
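Expected: the bbox is in font units, where y increases upward from the baseline at 0, so descenders (the tails of g, p, comma) produce a negative yMin. That's correct, not a sign error. In the y-down svgPath the same points land at y greater than units_per_em, which is below the baseline at y = units_per_em. Because the record's viewBox is units_per_em tall measured from the top, that part of the outline sits past the bottom edge of the box; allow the SVG to overflow or extend the viewBox if the full descender must be visible.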
Bounding box disagrees with glyph-table extents
Outline wins: the inspector prefers the precise getBoundingBox() result and only falls back to the glyph table's recorded xMin/xMax/yMin/yMax if that's non-finite or inverted. Subset or auto-generated fonts sometimes carry stale table extents, so trust the reported bbox: it's derived from the actual contours, not a possibly-outdated header value.
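The fallback rule can be sketched as a small guard. This assumes "inverted" means a minimum greater than its maximum; `BBox`, `isUsable`, and `resolveBBox` are hypothetical names, not the inspector's own.

```typescript
interface BBox {
  xMin: number;
  yMin: number;
  xMax: number;
  yMax: number;
}

// A box is usable when all four values are finite and not inverted.
function isUsable(b: BBox): boolean {
  const finite =
    isFinite(b.xMin) && isFinite(b.yMin) && isFinite(b.xMax) && isFinite(b.yMax);
  return finite && b.xMin <= b.xMax && b.yMin <= b.yMax;
}

// Prefer the box computed from the outline; use the table extents only as a fallback.
function resolveBBox(computed: BBox, tableExtents: BBox): BBox {
  return isUsable(computed) ? computed : tableExtents;
}
```

An all-zero box passes the guard, which is consistent with an empty box meaning a blank glyph rather than an error.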
Header shows sampled below total_glyphs
Truncated at 5,000: when total_glyphs exceeds 5,000, only the first 5,000 records are serialised (GLYPH_INSPECT_SAFETY_CAP, applied on all tiers). The remaining glyphs aren't in the output, so absence of a glyph from the array isn't proof it's absent from the font. For full-coverage questions, use the coverage map or a direct opentype walk; the cap exists to bound the JSON payload, not to gate features.
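A lookup over the output can make the truncation caveat explicit by returning three states instead of two. The sketch below is illustrative: `findByUnicode` and the `Lookup` type are hypothetical, and the match is against each record's primary `unicode` only, so "not-found" means "not listed as a primary codepoint", not "missing from the cmap".

```typescript
// Only the header and record fields this sketch needs.
interface InspectOutput {
  total_glyphs: number;
  sampled: number;
  glyphs: { index: number; unicode: string | null }[];
}

type Lookup = "found" | "not-found" | "inconclusive";

// Searches the sampled records for a primary codepoint such as "U+20AC".
function findByUnicode(out: InspectOutput, unicode: string): Lookup {
  for (const g of out.glyphs) {
    if (g.unicode === unicode) return "found";
  }
  // A truncated list cannot prove absence: the glyph may sit past the cap.
  return out.sampled < out.total_glyphs ? "inconclusive" : "not-found";
}
```

Callers that get "inconclusive" can then fall back to the coverage map or a direct walk of the font.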
Frequently asked questions
What's the difference between a glyph index and a Unicode codepoint?
The index is the glyph's position inside one font (0..N-1, with 0 = .notdef). The codepoint is the character's portable identity from Unicode (e.g. U+0041 for A), mapped to a glyph index by the cmap table. The same character can have different indices in different fonts, and subsetting renumbers indices, so always key on the codepoint (or a stable name), never the index.
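A small sketch of that relationship: each font carries its own codepoint-to-index map, so the same codepoint resolves to different indices. The two maps reuse the index values from the cookbook example (36 in one font, 4 in another); `Cmap`, `glyphIndexFor`, and the font names are hypothetical.

```typescript
// Simplified per-font cmap: codepoint -> glyph index.
type Cmap = Record<number, number>;

// Resolves a codepoint in one font; unmapped codepoints fall back to index 0 (.notdef).
function glyphIndexFor(cmap: Cmap, codepoint: number): number {
  const index = cmap[codepoint];
  return index === undefined ? 0 : index;
}

const fontOne: Cmap = { 0x41: 36 };
const fontTwo: Cmap = { 0x41: 4 };

const inFontOne = glyphIndexFor(fontOne, 0x41); // 36
const inFontTwo = glyphIndexFor(fontTwo, 0x41); // 4
const unmapped = glyphIndexFor(fontOne, 0x20ac); // 0, the .notdef glyph
```

The codepoint 0x41 is the only value that means the same thing in both fonts, which is why it is the safe key.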
Why doesn't every glyph have a name?
Names come from the post table (TrueType) or CFF charset (OTF), and both are optional. Optimised web fonts — including most of Google Fonts — strip post to save 5–20 KB, so opentype can't recover names and the inspector reports name: null. The glyph still has an index and (if encoded) a unicode to identify it by.
Why doesn't every glyph have a Unicode value?
Some glyphs are unencoded — no codepoint in the cmap maps to them. Ligatures (f_i), small caps (a.smcp), stylistic alternates, and the mandatory .notdef are typical. They're reachable only through OpenType layout features, not by typing a character, so the inspector reports unicode: null for them.
Why does the inspector show only one codepoint per glyph?
It reports the glyph's primary unicode. A glyph can legitimately be reached by several codepoints (a shared quote, a unified hyphen), but the per-glyph record carries just the primary one. To enumerate every codepoint that maps to a glyph, read font.tables.cmap.glyphIndexMap directly, or use the character-coverage-map for the block-level picture.
What's the .notdef glyph?
It's the mandatory glyph at index 0 — the box ("tofu") the engine draws when a codepoint has no real glyph in the active font. Every OpenType font must include it. It has unicode: null (no codepoint maps to it) and is always the first record in the inspector's output.
How do I read advance width per glyph?
The hmtx table holds horizontal metrics; the inspector surfaces each glyph's advance as the advance field, in font units. It's how far the pen moves after the glyph — layout spacing, not the visible ink width. For the ink, look at the bbox (xMin..xMax). You can derive side bearings as xMin (left) and advance - xMax (right).
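The side-bearing arithmetic is short enough to sketch directly. The field names follow the inspector's record; `sideBearings` and the returned property names are hypothetical, and the sample values are the comma from the cookbook.

```typescript
// The three record fields needed for horizontal spacing.
interface HorizontalMetrics {
  advance: number;
  xMin: number;
  xMax: number;
}

// Derives left/right side bearings and ink width, all in font units.
function sideBearings(m: HorizontalMetrics): { left: number; right: number; inkWidth: number } {
  return {
    left: m.xMin, // gap between the origin and the ink
    right: m.advance - m.xMax, // gap between the ink and the next glyph's origin
    inkWidth: m.xMax - m.xMin,
  };
}

const comma = sideBearings({ advance: 250, xMin: 60, xMax: 190 });
// comma.left = 60, comma.right = 60, comma.inkWidth = 130
```

A negative bearing is possible and simply means the ink overhangs the advance box on that side.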
What coordinate system is the svgPath in?
y-down screen space, produced by getPath(0, em, em).toPathData(2) where em = units_per_em. The baseline sits at y = units_per_em, so an upright glyph occupies roughly 0..units_per_em vertically and pairs directly with the record's viewBox (0 0 <width> <units_per_em>). No Y flip or scale is needed to render it.
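To relate bbox values (font units, y-up) to points in the svgPath (y-down), the mapping can be sketched as below. It assumes the path is produced exactly as described, with x offset 0, baseline at y = units_per_em, and font size equal to units_per_em, so the scale factor is 1; `fontToPathPoint` is a hypothetical helper.

```typescript
interface Point {
  x: number;
  y: number;
}

// Maps a font-unit point (y-up, baseline at 0) to svgPath space (y-down, baseline at unitsPerEm).
function fontToPathPoint(p: Point, unitsPerEm: number): Point {
  return { x: p.x, y: unitsPerEm - p.y };
}

const baselinePoint = fontToPathPoint({ x: 0, y: 0 }, 1000); // { x: 0, y: 1000 }
const commaTop = fontToPathPoint({ x: 60, y: 130 }, 1000); // { x: 60, y: 870 }
const commaTail = fontToPathPoint({ x: 60, y: -180 }, 1000); // { x: 60, y: 1180 }
```

The same function works for any UPM because only the baseline offset changes, not the scale.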
Why is yMin negative on some glyphs?
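Descenders (the tails of g, p, y, and the comma) extend below the baseline. The bbox is in font units with the baseline at 0 and y increasing upward, so those parts produce a negative yMin. It's expected geometry, not a sign error. In the y-down svgPath the same parts fall at y greater than units_per_em, past the bottom edge of the record's viewBox, so allow overflow or extend the viewBox if the full descender must be visible.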
What are units_per_em and why does it matter?
It's the size of the font's design grid — the em square — reported once in the output header. All font-unit metrics (advance, bbox, path coords) are relative to it. Typical values are 1000 (CFF/OTF) and 2048 (many TrueType fonts). Use it as both the SVG fontSize and the viewBox height when rendering a path, so glyphs from different fonts render at a comparable scale.
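Converting a font-unit value to pixels is a single proportion. The helper name `fontUnitsToPx` is hypothetical, and the sample numbers are chosen only to show two fonts with different UPM values landing on the same pixel size.

```typescript
// Scales a font-unit measurement (advance, bbox edge, path coordinate) to pixels.
function fontUnitsToPx(value: number, unitsPerEm: number, fontSizePx: number): number {
  return (value * fontSizePx) / unitsPerEm;
}

// A quarter-em advance in a 1000-UPM font and in a 2048-UPM font, both at 16px.
const advanceCff = fontUnitsToPx(250, 1000, 16); // 4
const advanceTt = fontUnitsToPx(512, 2048, 16); // 4
```

Raw font-unit values from two fonts are only comparable after this normalisation.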
Why might the bounding box differ from the glyph table values?
The inspector computes the bbox from the actual outline via getBoundingBox(), falling back to the glyph table's recorded extents only when that result is non-finite or inverted. Subset or programmatically generated fonts sometimes carry stale table extents, so the computed box is the trustworthy one: it reflects the real contours.
What does sampled vs total_glyphs tell me?
total_glyphs is every glyph in the font; sampled is how many records were serialised, capped at 5,000 on all tiers. When sampled is below total_glyphs, the list is truncated, so a glyph's absence from the array doesn't prove it's absent from the font. For large fonts, confirm coverage with the character-coverage-map or a direct walk.
How do I get a font's table list or overall stats instead of per-glyph data?
Use the font-metadata-extractor for tables_present, family names, UPM, and glyph count; the glyph-count-analyzer for counts; and the opentype-features-inspector for the GSUB/GPOS features that drive unencoded glyphs. The glyph inspector is the per-glyph drill-down; those tools give the font-level view.
Privacy first
Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.