Glyph Data Reference: Names, Unicode, Metrics

How to read the glyph data formats: names, Unicode, metrics

Step 1
Glyph index — A sequential integer 0..N-1. Index 0 is always `.notdef` (the tofu rectangle). Indices are font-internal — the same character can have different indices in two fonts — so never use an index as a portable identifier. It's the `index` field, and it's the loop variable in the inspector's walk.
Step 2
Glyph name (post / CFF charset) — Optional human-readable name: `A`, `home`, `f_i`, `a.smcp`. TrueType stores it in the `post` table; OTF in the CFF charset. Many web fonts strip `post` to save bytes, so the inspector's `name` is `null` for those — fall back to `index` or `unicode` to identify the glyph.
Step 3
Unicode codepoint (cmap) — The `cmap` table maps codepoints to glyph indices. A glyph can be reached by several codepoints, but the inspector reports the glyph's primary `unicode` as `U+XXXX` (or `null` for unencoded glyphs). For the full codepoint listing, read the cmap directly or use the coverage map.
Step 4
Advance width (hmtx) — The `hmtx` table stores each glyph's horizontal advance — how far the pen moves after drawing it. The inspector surfaces it as `advance` (font units). It drives layout spacing and is independent of the visible ink width (the bbox).
Step 5
Bounding box (outline) — `xMin/xMax/yMin/yMax` are the outline's extent in font units, taken from `getBoundingBox()` (with a fallback to the glyph table's recorded extents). Negative `yMin` means the glyph descends below the baseline. Empty box (all zero) means no ink — a blank glyph.
Step 6
SVG path and viewBox — `svgPath` is `getPath(0, em, em).toPathData(2)` — a y-down `d` string scaled to the em box with the baseline at `y = units_per_em`. `viewBox` is `0 0 <width> <units_per_em>`. Together they render the glyph upright with no extra transforms. Contour-less glyphs give `svgPath: ""`.

The four glyph identifiers

Every glyph carries these. Confusing index with codepoint is the root of most font-tooling bugs.

Identifier	Source table	Portable?	Inspector field
Glyph index	the glyph order itself	No — font-internal	`index`
Glyph name	`post` (TT) / CFF charset (OTF)	Somewhat — names can be custom	`name` (or null)
Unicode codepoint	`cmap`	Yes — the portable identity	`unicode` (primary, or null)
Advance + bbox	`hmtx` + outline	Values in font units	`advance`, `xMin/xMax/yMin/yMax`

Inspector field → OpenType source

Where each value in a GlyphRecord comes from, and how it's computed.

Field	Origin	How it's produced
`index`	glyph order	Loop counter 0..min(total,5000)-1
`name`	`post` / CFF charset	`glyph.name ?? null`
`unicode`	`cmap`	`glyph.unicode` formatted `U+XXXX`, else null
`advance`	`hmtx`	`glyph.advanceWidth ?? null`
`xMin/xMax/yMin/yMax`	outline	`getBoundingBox()`, fallback to table extents
`svgPath`	`glyf`/CFF outline	`getPath(0, em, em).toPathData(2)`; `""` if no contours
`viewBox`	computed	`0 0 <advance|em> <units_per_em>`
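
The mapping in the table can be sketched as one function. This is a minimal sketch, not the inspector's actual source: `toGlyphRecord` and `formatUnicode` are hypothetical names, the glyph is assumed to expose the opentype.js-style `name`, `unicode`, `advanceWidth`, `getBoundingBox()` and `getPath()` members, the table-extent fallback is left out for brevity, and reading `<advance|em>` as "advance, else the em" is an assumption.

```javascript
// Hypothetical helper: format a codepoint as U+XXXX (at least four hex digits).
function formatUnicode(codepoint) {
  if (codepoint === undefined || codepoint === null) return null;
  return "U+" + codepoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical sketch: build one record from an opentype.js-style glyph.
function toGlyphRecord(glyph, index, unitsPerEm) {
  const box = glyph.getBoundingBox(); // { x1, y1, x2, y2 } in font units
  const advance = glyph.advanceWidth ?? null;
  // Assumption: use the em as the viewBox width when advance is missing or 0.
  const width = advance || unitsPerEm;
  return {
    index,
    name: glyph.name ?? null,
    unicode: formatUnicode(glyph.unicode),
    advance,
    xMin: box.x1,
    xMax: box.x2,
    yMin: box.y1,
    yMax: box.y2,
    svgPath: glyph.getPath(0, unitsPerEm, unitsPerEm).toPathData(2),
    viewBox: `0 0 ${width} ${unitsPerEm}`,
  };
}
```

With a real font you would call it as `toGlyphRecord(font.glyphs.get(i), i, font.unitsPerEm)` inside the walk.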

Output header fields and the cap

Fields that appear once, before the glyphs array, plus the safety cap.

Field	Meaning	Notes
`total_glyphs`	Every glyph in the font	`font.glyphs.length`, including .notdef and unencoded glyphs
`sampled`	Records actually serialised	`min(total_glyphs, 5000)` — lower than total means truncated
`units_per_em`	Em square size	Use as SVG fontSize and viewBox height; 1000 (CFF) or 2048 (TT) typical
`ascender` / `descender`	Font-unit vertical metrics	From opentype; may be null
`coordinate_system`	Path coordinate note	y-down; fontSize=units_per_em; baseline at y=units_per_em
Safety cap	5,000 records, all tiers	`GLYPH_INSPECT_SAFETY_CAP` — bounds payload, not a tier limit
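
The header rows above can be sketched as a small builder. This is an illustration under stated assumptions: `buildHeader` and `isTruncated` are hypothetical names, and the font is assumed to be an opentype.js-style object with `glyphs.length`, `unitsPerEm`, `ascender` and `descender`.

```javascript
const GLYPH_INSPECT_SAFETY_CAP = 5000; // value taken from the table above

// Hypothetical sketch: derive the header from an opentype.js-style font object.
function buildHeader(font) {
  const total = font.glyphs.length;
  return {
    total_glyphs: total,
    sampled: Math.min(total, GLYPH_INSPECT_SAFETY_CAP),
    units_per_em: font.unitsPerEm,
    ascender: font.ascender ?? null,
    descender: font.descender ?? null,
    coordinate_system: "y-down; fontSize=units_per_em; baseline at y=units_per_em",
  };
}

// True when the glyphs array does not cover the whole font.
function isTruncated(header) {
  return header.sampled < header.total_glyphs;
}
```

Checking `isTruncated` first tells a consumer whether a missing glyph can be trusted as missing.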

Cookbook

Field-by-field, with real records. UPM is 1000 unless the example says otherwise.

Index vs codepoint — they're different numbers

Example

The letter A: glyph index 36 in this font, Unicode U+0041. The index is where it sits in the font; the codepoint is its portable identity. In another font, A might be index 4.

{ "index": 36, "name": "A", "unicode": "U+0041", ... }

index 36   → position inside THIS font only
U+0041     → the character "A" anywhere
Same glyph, two unrelated numbers. Never key on index.
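
A short sketch of keying on the codepoint instead of the index. The helper name `byUnicode` and both sample glyph lists are made up for illustration; only the record fields come from the inspector output described above.

```javascript
// Hypothetical helper: index a glyphs array by its portable identifier.
function byUnicode(glyphs) {
  const map = new Map();
  for (const record of glyphs) {
    // Unencoded glyphs have no codepoint, so they cannot be keyed this way.
    if (record.unicode !== null) map.set(record.unicode, record);
  }
  return map;
}

// Two made-up fonts that place "A" at different glyph indices.
const fontA = [{ index: 36, name: "A", unicode: "U+0041" }];
const fontB = [{ index: 4, name: "A", unicode: "U+0041" }];

const aInFontA = byUnicode(fontA).get("U+0041");
const aInFontB = byUnicode(fontB).get("U+0041");
console.log(aInFontA.index, aInFontB.index); // 36 4
```

The same key finds the right glyph in both fonts even though the indices disagree.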

A glyph with a name but no codepoint

Example

The small-cap A is unencoded — reached via the smcp feature, not by typing a character. It has a name and an outline but unicode is null.

{
  "index": 401,
  "name": "A.smcp",
  "unicode": null,        ← unencoded
  "advance": 600,
  "svgPath": "M... Z",
  "viewBox": "0 0 600 1000"
}

A glyph with a codepoint but no name

Example

A web font stripped its post table. The euro sign is encoded (U+20AC) but has no name — identify it by codepoint instead.

{
  "index": 1203,
  "name": null,           ← post table stripped
  "unicode": "U+20AC",
  "advance": 556,
  "svgPath": "M... Z",
  "viewBox": "0 0 556 1000"
}

Advance vs ink width

Example

A comma has a small outline (the ink) but its advance includes spacing. Note the negative yMin — the tail dips below the baseline.

{
  "index": 15, "name": "comma", "unicode": "U+002C",
  "advance": 250,         ← cursor moves 250 units
  "xMin": 60, "xMax": 190, ← ink only 130 wide
  "yMin": -180, "yMax": 130, ← tail below baseline
  "svgPath": "M... Z", "viewBox": "0 0 250 1000"
}
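
The arithmetic behind this record can be sketched as follows. `horizontalMetrics` is a hypothetical helper name; the side-bearing formulas are the ones given in the FAQ (left = xMin, right = advance - xMax).

```javascript
// Hypothetical helper: derive ink width and side bearings from one record.
function horizontalMetrics(record) {
  return {
    inkWidth: record.xMax - record.xMin,
    leftSideBearing: record.xMin,
    rightSideBearing: record.advance - record.xMax,
  };
}

// The comma record from the example above.
const comma = { advance: 250, xMin: 60, xMax: 190, yMin: -180, yMax: 130 };
const commaMetrics = horizontalMetrics(comma);
// inkWidth 130, leftSideBearing 60, rightSideBearing 60
console.log(commaMetrics);
```

Ink width plus both side bearings adds back up to the advance (130 + 60 + 60 = 250).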

Reading the output header

Example

A large CJK font: total exceeds the cap, so sampled is 5,000. UPM is 1000 here. The header tells you the list is truncated before you scan a single glyph.

{
  "total_glyphs": 18452,
  "sampled": 5000,        ← truncated at the cap
  "units_per_em": 1000,
  "ascender": 880, "descender": -120,
  "coordinate_system": "y-down; fontSize=units_per_em; baseline at y=units_per_em",
  "glyphs": [ ...5000 records... ]
}

Edge cases and what actually happens

Every row below was probed against the live API. Some documented requirements (alphabetical axis order, numerical tuple order) are not actually enforced in practice — useful to know if you've been blaming the wrong thing for a 400.

Index used as a stable identifier across fonts

Don't do it

Glyph indices are font-internal. The same character routinely has a different index in two fonts, and subsetting renumbers indices entirely. Tooling that keys on index across fonts (or across versions of one font) will silently point at the wrong glyph. Key on unicode for encoded glyphs, or on name where names are stable; treat index as ephemeral.

Glyph has no name (post / CFF charset absent)

Name is null

TrueType names live in post; OTF names in the CFF charset. Optimised web fonts often strip post (5–20 KB saved), so opentype can't recover names and name is null. The glyph is fully usable — identify it by index or unicode. Check whether post is present with the font-metadata-extractor.
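
One way to handle the fallback is a small labelling function. This is only a sketch: `glyphLabel` and the `gid` prefix are hypothetical, and the preference order (codepoint, then name, then index) is an assumption that suits work inside a single font.

```javascript
// Hypothetical helper: pick the best available identifier for one record.
function glyphLabel(record) {
  if (record.unicode !== null) return record.unicode; // portable
  if (record.name !== null) return record.name; // stable only if names are
  return `gid${record.index}`; // font-internal, last resort
}

console.log(glyphLabel({ index: 1203, name: null, unicode: "U+20AC" })); // U+20AC
console.log(glyphLabel({ index: 401, name: "A.smcp", unicode: null })); // A.smcp
console.log(glyphLabel({ index: 7, name: null, unicode: null })); // gid7
```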

Glyph has no codepoint (unencoded)

Unicode is null

Ligatures (f_i), small caps (a.smcp), stylistic alternates, and .notdef are unencoded — no cmap entry maps a codepoint to them. They have an index, often a name, and an outline, but unicode is null. They're reached through OpenType layout (GSUB), not by typing. This is normal and expected, not a defect.

One glyph, several codepoints

Primary only

A glyph can be mapped from multiple codepoints (a unified hyphen, a shared quote). The inspector reports the glyph's primary unicode — it doesn't enumerate every codepoint that reaches it. To list all of them, read font.tables.cmap.glyphIndexMap or use the character-coverage-map, which works from the full cmap across 346 blocks.
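
Listing every codepoint per glyph amounts to inverting the cmap. The sketch below assumes `glyphIndexMap` is a plain object keyed by codepoint with glyph indices as values, which is how opentype.js exposes it as far as this guide relies on it; `codepointsByGlyph` and the sample data are hypothetical.

```javascript
// Hypothetical helper: group every codepoint by the glyph index it reaches.
function codepointsByGlyph(glyphIndexMap) {
  const result = new Map();
  for (const [key, glyphIndex] of Object.entries(glyphIndexMap)) {
    const label = "U+" + Number(key).toString(16).toUpperCase().padStart(4, "0");
    if (!result.has(glyphIndex)) result.set(glyphIndex, []);
    result.get(glyphIndex).push(label);
  }
  return result;
}

// Made-up data: hyphen-minus (U+002D) and soft hyphen (U+00AD) share glyph 16.
const sampleCmap = { 45: 16, 173: 16, 65: 36 };
const hyphenCodepoints = codepointsByGlyph(sampleCmap).get(16);
console.log(hyphenCodepoints); // U+002D and U+00AD
```

With a loaded font the input would be `font.tables.cmap.glyphIndexMap`.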

Advance differs sharply from ink width

Expected

advance (from hmtx) is how far the pen moves; the bbox (xMin..xMax) is the ink. They differ for spacing-heavy glyphs: a comma has a tiny outline but a normal advance; a combining mark may have zero advance but real ink. Don't conflate the two — use advance for layout, the bbox for visual bounds.

Negative yMin / descenders below the baseline

Expected

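The bbox is reported in font units, where y grows upward from the baseline at 0, so descenders (the tails of g, p, comma) produce a negative yMin. That's correct, not a sign error. In the y-down svgPath the baseline sits at y = units_per_em, so the same ink is drawn at y values greater than units_per_em, below the baseline line. That is past the bottom edge of a viewBox whose height is units_per_em, so if a descender looks clipped when you render with the record's viewBox, allow overflow on the SVG or extend the viewBox height.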


Bounding box disagrees with glyph-table extents

Outline wins

The inspector prefers the precise getBoundingBox() result and only falls back to the glyph table's recorded xMin/xMax/yMin/yMax if that's non-finite or inverted. Subset or auto-generated fonts sometimes carry stale table extents, so trust the reported bbox — it's derived from the actual contours, not a possibly-outdated header value.
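
The preference order can be sketched like this. The helper names are hypothetical, both boxes are assumed to use the record's `xMin/xMax/yMin/yMax` shape, and the all-zero box returned when neither source is usable is an assumption rather than documented behaviour.

```javascript
// A box is usable when all four values are finite and it is not inverted.
function isUsableBox(box) {
  return (
    [box.xMin, box.xMax, box.yMin, box.yMax].every((v) => Number.isFinite(v)) &&
    box.xMin <= box.xMax &&
    box.yMin <= box.yMax
  );
}

// Hypothetical sketch: prefer the computed outline box, then the table extents.
function pickBoundingBox(computed, tableExtents) {
  if (isUsableBox(computed)) return computed;
  if (isUsableBox(tableExtents)) return tableExtents;
  return { xMin: 0, xMax: 0, yMin: 0, yMax: 0 }; // assumption: treat as blank
}
```

Passing a computed box of `{ xMin: NaN, ... }` or one with `xMin > xMax` makes the function return the table extents instead.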

Header shows sampled below total_glyphs

Truncated at 5,000

When total_glyphs exceeds 5,000, only the first 5,000 records are serialised (GLYPH_INSPECT_SAFETY_CAP, applied on all tiers). The remaining glyphs aren't in the output, so absence of a glyph from the array isn't proof it's absent from the font. For full-coverage questions, use the coverage map or a direct opentype walk; the cap exists to bound the JSON payload, not to gate features.
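
A lookup that respects truncation needs three outcomes rather than two. The sketch below is illustrative: `findByUnicode` and the status strings are hypothetical, and the input is assumed to be the inspector output with its `total_glyphs`, `sampled` and `glyphs` fields.

```javascript
// Hypothetical helper: look up a codepoint without over-trusting a truncated list.
function findByUnicode(output, unicode) {
  const hit = output.glyphs.find((g) => g.unicode === unicode);
  if (hit) return { status: "found", record: hit };
  const truncated = output.sampled < output.total_glyphs;
  // A miss in a truncated list proves nothing about the font itself.
  return { status: truncated ? "unknown" : "absent", record: null };
}
```

Treat `"unknown"` as a cue to consult the coverage map or walk the font directly.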

Frequently asked questions

What's the difference between a glyph index and a Unicode codepoint?

The index is the glyph's position inside one font (0..N-1, with 0 = .notdef). The codepoint is the character's portable identity from Unicode (e.g. U+0041 for A), mapped to a glyph index by the cmap table. The same character can have different indices in different fonts, and subsetting renumbers indices, so always key on the codepoint (or a stable name), never the index.

Why doesn't every glyph have a name?

Names come from the post table (TrueType) or CFF charset (OTF), and both are optional. Optimised web fonts — including most of Google Fonts — strip post to save 5–20 KB, so opentype can't recover names and the inspector reports name: null. The glyph still has an index and (if encoded) a unicode to identify it by.

Why doesn't every glyph have a Unicode value?

Some glyphs are unencoded — no codepoint in the cmap maps to them. Ligatures (f_i), small caps (a.smcp), stylistic alternates, and the mandatory .notdef are typical. They're reachable only through OpenType layout features, not by typing a character, so the inspector reports unicode: null for them.

Why does the inspector show only one codepoint per glyph?

It reports the glyph's primary unicode. A glyph can legitimately be reached by several codepoints (a shared quote, a unified hyphen), but the per-glyph record carries just the primary one. To enumerate every codepoint that maps to a glyph, read font.tables.cmap.glyphIndexMap directly, or use the character-coverage-map for the block-level picture.

What's the .notdef glyph?

It's the mandatory glyph at index 0 — the box ("tofu") the engine draws when a codepoint has no real glyph in the active font. Every OpenType font must include it. It has unicode: null (no codepoint maps to it) and is always the first record in the inspector's output.

How do I read advance width per glyph?

The hmtx table holds horizontal metrics; the inspector surfaces each glyph's advance as the advance field, in font units. It's how far the pen moves after the glyph — layout spacing, not the visible ink width. For the ink, look at the bbox (xMin..xMax). You can derive side bearings as xMin (left) and advance - xMax (right).

What coordinate system is the svgPath in?

y-down screen space, produced by getPath(0, em, em).toPathData(2) where em = units_per_em. The baseline sits at y = units_per_em, so an upright glyph occupies roughly 0..units_per_em vertically and pairs directly with the record's viewBox (0 0 <width> <units_per_em>). No Y flip or scale is needed to render it.
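
To relate bbox values (font units, y up) to path coordinates (y down), and to wrap a record in SVG markup, something like the following works. Both helper names are hypothetical, and the conversion assumes the path was produced with fontSize equal to units_per_em, so the scale factor is 1.

```javascript
// With fontSize = units_per_em only the axis flips: baseline 0 maps to units_per_em.
function fontYToPathY(fontY, unitsPerEm) {
  return unitsPerEm - fontY;
}

// Hypothetical helper: wrap one record's path and viewBox in standalone SVG markup.
function toSvg(record) {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="${record.viewBox}">` +
    `<path d="${record.svgPath}"/></svg>`
  );
}

console.log(fontYToPathY(0, 1000)); // 1000 (the baseline)
console.log(fontYToPathY(700, 1000)); // 300 (cap height sits above the baseline)
console.log(fontYToPathY(-180, 1000)); // 1180 (descender, below the baseline)
```

Font-unit values below the baseline map to path y values greater than units_per_em.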

Why is yMin negative on some glyphs?

Descenders (the tails of g, p, y, and the comma) extend below the baseline. The bbox is in font units, with the baseline at y = 0 and y growing upward, so those parts produce a negative yMin. It's expected geometry, not a sign error. In the y-down svgPath the same ink sits below the baseline line at y = units_per_em.

What are units_per_em and why does it matter?

It's the size of the font's design grid — the em square — reported once in the output header. All font-unit metrics (advance, bbox, path coords) are relative to it. Typical values are 1000 (CFF/OTF) and 2048 (many TrueType fonts). Use it as both the SVG fontSize and the viewBox height when rendering a path, so glyphs from different fonts render at a comparable scale.
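
Dividing by units_per_em is what makes values from different fonts comparable. A minimal sketch, with hypothetical helper names and made-up numbers:

```javascript
// Convert a font-unit value to a fraction of the em.
function toEm(value, unitsPerEm) {
  return value / unitsPerEm;
}

// Convert a font-unit value to pixels at a given font size.
function toPixels(value, unitsPerEm, fontSizePx) {
  return (value / unitsPerEm) * fontSizePx;
}

// 500 units at UPM 1000 and 1024 units at UPM 2048 are the same relative width.
console.log(toEm(500, 1000), toEm(1024, 2048)); // 0.5 0.5
console.log(toPixels(250, 1000, 16)); // 4
```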

Why might the bounding box differ from the glyph table values?

The inspector computes the bbox from the actual outline via getBoundingBox(), falling back to the glyph table's recorded extents only when that result is non-finite or inverted. Subset or programmatically generated fonts sometimes carry stale table extents, so the computed box is the trustworthy one: it reflects the real contours.

What does sampled vs total_glyphs tell me?

total_glyphs is every glyph in the font; sampled is how many records were serialised, capped at 5,000 on all tiers. When sampled is below total_glyphs, the list is truncated, so a glyph's absence from the array doesn't prove it's absent from the font. For large fonts, confirm coverage with the character-coverage-map or a direct walk.

How do I get a font's table list or overall stats instead of per-glyph data?

Use the font-metadata-extractor for tables_present, family names, UPM, and glyph count; the glyph-count-analyzer for counts; and the opentype-features-inspector for the GSUB/GPOS features that drive unencoded glyphs. The glyph inspector is the per-glyph drill-down; those tools give the font-level view.

Privacy first

Every JAD Font tool runs entirely in your browser using opentype.js and the wawoff2 WASM Brotli encoder. Your fonts never leave your device — verified by zero outbound network requests during processing.
