OTFbuild
Python toolchain that builds an OpenType (CFF) font and a WOFF2 (Web Open Font Format 2) font from the TGA sprite sheets used by the bitmap font engine.
Building
    # builds both OTF and WOFF2
    make all
Debugging with HarfBuzz
Install uharfbuzz for shaping tests:
    pip install uharfbuzz
Shape text and inspect glyph substitutions, advances, and positioning:
    import uharfbuzz as hb
    from fontTools.ttLib import TTFont

    # Load the font into HarfBuzz
    with open('OTFbuild/TerrarumSansBitmap.otf', 'rb') as f:
        font_data = f.read()
    blob = hb.Blob(font_data)
    face = hb.Face(blob)
    font = hb.Font(face)

    # Shape a test string; script and direction are guessed from the text
    text = "ऐतिहासिक"
    buf = hb.Buffer()
    buf.add_str(text)
    buf.guess_segment_properties()
    hb.shape(font, buf)

    # After shaping, info.codepoint holds a glyph ID, so map it to a glyph name
    ttfont = TTFont('OTFbuild/TerrarumSansBitmap.otf')
    glyph_order = ttfont.getGlyphOrder()
    for info, pos in zip(buf.glyph_infos, buf.glyph_positions):
        name = glyph_order[info.codepoint]
        print(f" {name} advance=({pos.x_advance},{pos.y_advance}) cluster={info.cluster}")
Key things to check:
- advance=(0,0) on a visible character means the glyph is zero-width (likely missing outline or failed GSUB substitution)
- a glyph name starting with uF0 means GSUB substituted to an internal PUA form (expected for Devanagari consonants, Hangul jamo variants, etc.)
- cluster groups glyphs that originated from the same input character(s)
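The checks above can be applied mechanically to the shaper output. This is a minimal sketch, assuming the shaped glyphs have been collected as (name, x_advance, y_advance, cluster) tuples. `flag_shaping_issues` and the sample values are hypothetical. Zero advance is normal for combining marks, so treat the hits as prompts to look closer, not as errors.

```python
def flag_shaping_issues(glyphs):
    """Flag glyphs worth a closer look.

    glyphs: iterable of (name, x_advance, y_advance, cluster) tuples,
    e.g. collected from the shaping loop. Returns a list of (name, note).
    """
    notes = []
    for name, x_advance, y_advance, _cluster in glyphs:
        if x_advance == 0 and y_advance == 0:
            # Suspicious on a visible base character; expected on marks
            notes.append((name, "zero advance"))
        if name.startswith("uF0"):
            # GSUB substituted to an internal PUA form
            notes.append((name, "internal PUA form"))
    return notes


# Hypothetical sample data, not real output from the font
sample = [("uF0140", 600, 0, 0), ("uni093F", 0, 0, 0)]
for name, note in flag_shaping_issues(sample):
    print(f"{name}: {note}")
```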
Inspecting GSUB tables
    from fontTools.ttLib import TTFont

    font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
    gsub = font['GSUB']

    # List scripts and their features
    for sr in gsub.table.ScriptList.ScriptRecord:
        tag = sr.ScriptTag
        if sr.Script.DefaultLangSys:
            for idx in sr.Script.DefaultLangSys.FeatureIndex:
                fr = gsub.table.FeatureList.FeatureRecord[idx]
                print(f" {tag}/{fr.FeatureTag}: lookups={fr.Feature.LookupListIndex}")

    # Inspect a specific lookup's substitution mappings
    lookup = gsub.table.LookupList.Lookup[18]  # e.g. DevaConsonantMap
    for st in lookup.SubTable:
        for src, dst in st.mapping.items():
            print(f" {src} -> {dst}")
Checking glyph outlines and metrics
    from fontTools.ttLib import TTFont

    font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
    hmtx = font['hmtx']
    cff = font['CFF ']

    name = 'uni0915'  # Devanagari KA
    w, lsb = hmtx[name]  # (advance width, left side bearing)
    cs = cff.cff.topDictIndex[0].CharStrings[name]
    cs.decompile()
    has_outlines = len(cs.program) > 2  # more than just width + endchar
    print(f"{name}: advance={w}, has_outlines={has_outlines}")
Architecture
Build pipeline (font_builder.py)
- Parse sheets: glyph_parser.py reads each TGA sprite sheet, extracts per-glyph bitmaps and tag-column metadata (width, alignment, diacritics anchors, kerning data, directives)
- Compose Hangul: hangul.py assembles 11,172 precomposed Hangul syllables from jamo components and stores jamo variants in PUA for GSUB
- Populate Devanagari: consonants U+0915-0939 have width=0 in the sprite sheet (the Kotlin engine normalises them to PUA forms); the builder copies PUA glyph data back to the Unicode positions so they render without GSUB
- Expand replacewith: glyphs with the replacewith directive (opcode 0x80-0x87) are collected for GSUB multiple substitution (e.g. U+0910 -> U+090F U+0947)
- Build glyph order and cmap: PUA internal forms (0xF0000-0xF0FFF) get glyphs but no cmap entries
- Trace bitmaps: bitmap_tracer.py converts 1-bit bitmaps to CFF rectangle contours (50 units/pixel)
- Set metrics: hmtx, hhea, OS/2, head, name, post tables
- OpenType features: opentype_features.py generates feaLib code, compiled via fontTools.feaLib
- Bitmap strike: optional EBDT/EBLC at 20ppem via TTX import
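As an illustration of the Trace bitmaps step, the sketch below merges horizontal pixel runs that repeat on consecutive rows into single rectangles and scales them by 50 units per pixel. It is not taken from bitmap_tracer.py: the function name, the merging strategy, and placing the bottom bitmap row on y = 0 are all assumptions.

```python
UNITS_PER_PIXEL = 50  # scale stated in the pipeline description


def trace_rectangles(bitmap):
    """Turn a 1-bit bitmap into merged rectangles in font units.

    bitmap: list of rows, top to bottom, each a list of 0/1 values.
    Returns (x0, y0, x1, y1) tuples with y pointing up. The bottom
    bitmap row sits on y = 0 (an assumption made for this sketch).
    """
    height = len(bitmap)
    open_runs = {}  # (start_col, end_col) -> row index where the run began
    rects = []
    # The trailing empty row is a sentinel that closes every open run
    for row_idx, row in enumerate(list(bitmap) + [[]]):
        runs = set()
        col = 0
        while col < len(row):
            if row[col]:
                start = col
                while col < len(row) and row[col]:
                    col += 1
                runs.add((start, col))
            else:
                col += 1
        # Close runs that do not continue unchanged on this row
        for run in list(open_runs):
            if run not in runs:
                top_row = open_runs.pop(run)
                x0, x1 = run
                y1 = (height - top_row) * UNITS_PER_PIXEL
                y0 = (height - row_idx) * UNITS_PER_PIXEL
                rects.append((x0 * UNITS_PER_PIXEL, y0, x1 * UNITS_PER_PIXEL, y1))
        # Start tracking runs that are new on this row
        for run in runs:
            open_runs.setdefault(run, row_idx)
    return rects


print(trace_rectangles([[1, 1, 0],
                        [1, 1, 0],
                        [0, 0, 1]]))
```

Merging only identical runs keeps the sketch short; a production tracer could merge more aggressively to reduce the number of contours.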
Module overview
| Module | Purpose |
|---|---|
| build_font.py | CLI entry point |
| font_builder.py | Orchestrates the build pipeline |
| sheet_config.py | Sheet indices, code ranges, index functions, metric constants, Hangul/Devanagari/Tamil/Sundanese constants |
| glyph_parser.py | TGA sprite sheet parsing; extracts bitmaps and tag-column properties |
| tga_reader.py | Low-level TGA image reader |
| bitmap_tracer.py | Converts 1-bit bitmaps to CFF outlines (rectangle merging) |
| opentype_features.py | Generates GSUB/GPOS feature code for feaLib |
| keming_machine.py | Generates kerning pairs from glyph kern masks |
| hangul.py | Hangul syllable composition and jamo GSUB data |
| otf2woff2.py | OTF to WOFF2 wrapper |
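The 11,172 syllables composed by hangul.py correspond to the standard Unicode arithmetic for precomposed Hangul: 19 initial, 21 medial, and 28 final slots (including "no final"), starting at U+AC00. The sketch below shows only that index arithmetic. `decompose_syllable` is a hypothetical helper, and how hangul.py actually picks jamo bitmaps is not covered here.

```python
S_BASE = 0xAC00  # first precomposed syllable, U+AC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # initial, medial, final (incl. none)
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # 11,172 syllables


def decompose_syllable(codepoint):
    """Return (initial, medial, final) jamo indices for a Hangul syllable."""
    index = codepoint - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"U+{codepoint:04X} is not a precomposed Hangul syllable")
    initial, rest = divmod(index, V_COUNT * T_COUNT)
    medial, final = divmod(rest, T_COUNT)
    return initial, medial, final


print(S_COUNT)                       # 11172
print(decompose_syllable(ord("한")))  # (18, 0, 4)
```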
OpenType features generated (opentype_features.py)
- ccmp: replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2); vowel decompositions (tml2)
- kern: pair positioning from keming_machine.py
- liga: Latin ligatures (ff, fi, fl, ffi, ffl, st) and Armenian ligatures
- locl: Bulgarian/Serbian Cyrillic alternates
- nukt, akhn, half, vatu, pres, blws, rphf: Devanagari complex script shaping (all under script dev2)
- pres (tml2): Tamil consonant+vowel ligatures
- pres (sund): Sundanese diacritic combinations
- ljmo, vjmo, tjmo: Hangul jamo positional variants
- mark: GPOS mark-to-base diacritics positioning
- mkmk: GPOS mark-to-mark diacritics stacking (successive marks shift by H_DIACRITICS)
Devanagari PUA mapping
The bitmap font engine normalises Devanagari consonants to internal PUA forms before rendering. The OTF builder mirrors this:
| Unicode range | PUA range | Purpose |
|---|---|---|
| U+0915-0939 | 0xF0140-0xF0164 | Base consonants |
| U+0915-0939 +48 | 0xF0170-0xF0194 | Nukta forms (consonant + U+093C) |
| U+0915-0939 +240 | 0xF0230-0xF0254 | Half forms (consonant + virama) |
| U+0915-0939 +480 | 0xF0320-0xF0404 | RA-appended forms (consonant + virama + RA) |
| U+0915-0939 +720 | 0xF0410-0xF04F4 | RA-appended half forms (consonant + virama + RA + virama) |
Mapping formula: to_deva_internal(c) = c - 0x0915 + 0xF0140 for U+0915-0939.
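The mapping formula can be written out directly. In this sketch the `form` parameter and the `FORM_OFFSETS` names are assumptions derived from the offsets in the table (+48, +240, +480, +720). The table confirms only where each range starts, and the RA-appended rows span a wider block than 37 code points, so the offsets are checked below against the range starts only.

```python
DEVA_FIRST, DEVA_LAST = 0x0915, 0x0939  # consonants KA..HA
PUA_BASE = 0xF0140

# Offsets from the table; the key names are hypothetical
FORM_OFFSETS = {
    "base": 0,
    "nukta": 48,
    "half": 240,
    "ra_appended": 480,
    "ra_appended_half": 720,
}


def to_deva_internal(c, form="base"):
    """Map a Devanagari consonant code point to its internal PUA form."""
    if not DEVA_FIRST <= c <= DEVA_LAST:
        raise ValueError(f"U+{c:04X} is outside U+0915-0939")
    return c - DEVA_FIRST + PUA_BASE + FORM_OFFSETS[form]


print(hex(to_deva_internal(0x0915)))          # 0xf0140
print(hex(to_deva_internal(0x0915, "half")))  # 0xf0230
```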
Script tag gotcha
When a script-specific feature exists in GSUB (e.g. ccmp under dev2), HarfBuzz uses only the script-specific lookups and does not fall back to the DFLT script's lookups for that feature. Any substitutions needed for a specific script must be registered under that script's tag.
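A toy model may make the gotcha easier to reason about. It assumes the shaper picks one script record and reads features only from that record, using DFLT only when the script has no record at all. The dictionary layout and the lookup name `ReplacewithExpansion` are hypothetical, and this is not HarfBuzz code.

```python
def lookups_for(gsub, script, feature):
    """Return the lookups a shaper would apply for (script, feature).

    gsub: dict mapping script tag -> {feature tag -> list of lookup names}.
    """
    # DFLT is consulted only when the script has no record of its own
    features = gsub[script] if script in gsub else gsub.get("DFLT", {})
    return features.get(feature, [])


gsub = {
    "DFLT": {"ccmp": ["ReplacewithExpansion"]},
    "dev2": {"ccmp": ["DevaConsonantMap"]},
}
print(lookups_for(gsub, "dev2", "ccmp"))  # DFLT's ccmp lookups are not included
print(lookups_for(gsub, "latn", "ccmp"))  # no latn record, so DFLT applies
```

In this model, a replacewith expansion needed for Devanagari text has to be listed under dev2 as well, which matches the rule stated above.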
languagesystem and language records
The languagesystem declarations in the preamble control which script/language records are created in the font tables. Key rules:
- languagesystem declarations must be at the top level of the feature file, not inside any feature block. Putting them inside feature aalt { } is invalid feaLib syntax and causes silent compilation failure.
- When a language-specific record exists (e.g. dev2/MAR from languagesystem dev2 MAR;), features registered under script dev2; only populate dev2/dflt. They are not automatically copied to dev2/MAR. The language record inherits only from DFLT, resulting in incomplete feature sets.
- Only declare language-specific records when you have locl or other language-differentiated features. Otherwise, use only languagesystem <script> dflt; to avoid partial feature inheritance that breaks DirectWrite and CoreText.
Inspecting feature registration per script
To verify that features are correctly registered under each script:
    from fontTools.ttLib import TTFont

    font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
    gsub = font['GSUB']

    for sr in gsub.table.ScriptList.ScriptRecord:
        tag = sr.ScriptTag
        # Features in the script's default language system
        if sr.Script.DefaultLangSys:
            feats = []
            for idx in sr.Script.DefaultLangSys.FeatureIndex:
                fr = gsub.table.FeatureList.FeatureRecord[idx]
                feats.append(fr.FeatureTag)
            print(f"{tag}/dflt: {' '.join(sorted(set(feats)))}")
        # Features in each language-specific record
        for lsr in (sr.Script.LangSysRecord or []):
            feats = []
            for idx in lsr.LangSys.FeatureIndex:
                fr = gsub.table.FeatureList.FeatureRecord[idx]
                feats.append(fr.FeatureTag)
            print(f"{tag}/{lsr.LangSysTag}: {' '.join(sorted(set(feats)))}")
Expected output for dev2: dev2/dflt: abvs akhn blwf blws calt ccmp cjct half liga nukt pres psts rphf. If language-specific records (e.g. dev2/MAR) appear with only ccmp liga, the language records have incomplete feature inheritance — remove the corresponding languagesystem declaration.
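The comparison between a language record and its script default can also be automated. The sketch below assumes the per-record feature sets have already been gathered into a dictionary keyed by (script, language). `incomplete_language_records` and the sample data are hypothetical.

```python
def incomplete_language_records(records):
    """Report language records that lack features present in script/dflt.

    records: dict mapping (script_tag, langsys_tag) -> set of feature tags,
    with langsys_tag "dflt" for the default language system.
    Returns {(script_tag, langsys_tag): sorted list of missing features}.
    """
    problems = {}
    for (script, lang), feats in records.items():
        if lang == "dflt":
            continue
        default = records.get((script, "dflt"), set())
        missing = default - feats
        if missing:
            problems[(script, lang)] = sorted(missing)
    return problems


# Hypothetical sample mirroring the symptom described above
records = {
    ("dev2", "dflt"): {"akhn", "ccmp", "half", "liga", "nukt", "rphf"},
    ("dev2", "MAR"): {"ccmp", "liga"},
}
print(incomplete_language_records(records))
```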
Debugging feature compilation failures
The build writes debugout_features.fea with the raw feature code before compilation. When compilation fails, inspect this file to find syntax errors. Common issues:
- languagesystem inside a feature block: it must be at the top level
- Named lookup defined inside a feature block: it applies unconditionally to all input. Define the lookup outside the feature block and reference it via contextual rules inside.
- Glyph not in font: a substitution references a glyph name that doesn't exist in the font's glyph order (e.g. a control character was removed)
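The first issue in the list can be spotted with a quick scan of debugout_features.fea. This is a rough sketch that tracks brace depth and reports languagesystem statements found inside any block. `misplaced_languagesystems` is a hypothetical helper, and the comment stripping is crude (it would also cut a `#` inside a quoted string), so it is a first-pass check and not a parser.

```python
import re

TOKEN = re.compile(r"\{|\}|\blanguagesystem\b")


def misplaced_languagesystems(fea_text):
    """Return 1-based line numbers of languagesystem statements inside a block."""
    hits = []
    depth = 0
    for lineno, line in enumerate(fea_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing comments
        for token in TOKEN.findall(code):
            if token == "{":
                depth += 1
            elif token == "}":
                depth -= 1
            elif depth > 0:
                hits.append(lineno)
    return hits


sample = """languagesystem DFLT dflt;
feature aalt {
    languagesystem dev2 dflt;
} aalt;
"""
print(misplaced_languagesystems(sample))  # [3]
```

To check a real build, read the file first, for example `misplaced_languagesystems(open('debugout_features.fea').read())`, adjusting the path to wherever the build writes it.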
HarfBuzz Indic shaper (dev2) feature order
Understanding feature application order is critical for Devanagari debugging:
- Pre-reordering (Unicode order): ccmp
- Reordering: HarfBuzz reorders pre-base matras (e.g. I-matra U+093F moves before the consonant)
- Post-reordering: nukt → akhn → rphf → half → blwf → cjct → pres → abvs → blws → psts → haln → calt
- GPOS: kern → mark/abvm → mkmk
Implication: GSUB rules that need to match pre-base matras adjacent to post-base marks (e.g. anusvara substitution triggered by I-matra) must go in ccmp, not psts, because reordering separates them.
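When deciding where a new rule belongs, it can help to encode the order above as data. The sketch below simply transcribes the GSUB list from this section. The helper names are hypothetical, and the order reflects this document's description, not a query against HarfBuzz.

```python
# GSUB order as listed above; ccmp runs before reordering, the rest after
GSUB_ORDER = ["ccmp", "nukt", "akhn", "rphf", "half", "blwf", "cjct",
              "pres", "abvs", "blws", "psts", "haln", "calt"]
PRE_REORDERING = {"ccmp"}


def sees_unicode_order(feature):
    """True if the feature's rules match glyphs in their input (Unicode) order."""
    return feature in PRE_REORDERING


def applies_before(first, second):
    """True if `first` is applied earlier than `second` in the dev2 GSUB order."""
    return GSUB_ORDER.index(first) < GSUB_ORDER.index(second)


print(sees_unicode_order("ccmp"))      # True
print(sees_unicode_order("psts"))      # False
print(applies_before("rphf", "half"))  # True
```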