somewhat working in CoreText, not at all in DirectWrite

2026-06-17 02:44:05 +09:00 · 2026-03-02 21:02:55 +09:00
parent 73c2b6986d
commit 3e79181aa3
4 changed files with 108 additions and 21 deletions
--- a/OTFbuild/CLAUDE.md
+++ b/OTFbuild/CLAUDE.md
@@ -118,7 +118,7 @@ print(f"{name}: advance={w}, has_outlines={has_outlines}")

 ### OpenType features generated (`opentype_features.py`)

- **ccmp** — replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions (dev2)
+- **ccmp** — replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2); vowel decompositions (tml2)
 - **kern** — pair positioning from `keming_machine.py`
 - **liga** — Latin ligatures (ff, fi, fl, ffi, ffl, st) and Armenian ligatures
 - **locl** — Bulgarian/Serbian Cyrillic alternates
@@ -127,6 +127,7 @@ print(f"{name}: advance={w}, has_outlines={has_outlines}")
 - **pres** (sund) — Sundanese diacritic combinations
 - **ljmo, vjmo, tjmo** — Hangul jamo positional variants
 - **mark** — GPOS mark-to-base diacritics positioning
+- **mkmk** — GPOS mark-to-mark diacritics stacking (successive marks shift by H_DIACRITICS)

 ### Devanagari PUA mapping

@@ -145,3 +146,58 @@ Mapping formula: `to_deva_internal(c)` = `c - 0x0915 + 0xF0140` for U+0915-0939.
 ### Script tag gotcha

 When a script-specific feature exists in GSUB (e.g. `ccmp` under `dev2`), HarfBuzz uses **only** the script-specific lookups and does **not** fall back to the DFLT script's lookups for that feature. Any substitutions needed for a specific script must be registered under that script's tag.
+
+### languagesystem and language records
+
+The `languagesystem` declarations in the preamble control which script/language records are created in the font tables. Key rules:
+
+- `languagesystem` declarations must be at the **top level** of the feature file, not inside any `feature` block. Putting them inside `feature aalt { }` is invalid feaLib syntax and causes silent compilation failure.
+- When a language-specific record exists (e.g. `dev2/MAR` from `languagesystem dev2 MAR;`), features registered under `script dev2;` only populate `dev2/dflt`; they are **not** automatically copied to `dev2/MAR`. The language record receives only the rules written before any `script` statement (the ones applied to every declared language system), so its feature set is incomplete.
+- Only declare language-specific records when you have `locl` or other language-differentiated features. Otherwise, use only `languagesystem <script> dflt;` to avoid partial feature inheritance that breaks DirectWrite and CoreText.
+
+### Inspecting feature registration per script
+
+To verify that features are correctly registered under each script:
+
+```python
+from fontTools.ttLib import TTFont
+
+font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
+gsub = font['GSUB']
+
+for sr in gsub.table.ScriptList.ScriptRecord:
+    tag = sr.ScriptTag
+    if sr.Script.DefaultLangSys:
+        feats = []
+        for idx in sr.Script.DefaultLangSys.FeatureIndex:
+            fr = gsub.table.FeatureList.FeatureRecord[idx]
+            feats.append(fr.FeatureTag)
+        print(f"{tag}/dflt: {' '.join(sorted(set(feats)))}")
+    for lsr in (sr.Script.LangSysRecord or []):
+        feats = []
+        for idx in lsr.LangSys.FeatureIndex:
+            fr = gsub.table.FeatureList.FeatureRecord[idx]
+            feats.append(fr.FeatureTag)
+        print(f"{tag}/{lsr.LangSysTag}: {' '.join(sorted(set(feats)))}")
+```
+
+Expected output for dev2: `dev2/dflt: abvs akhn blwf blws calt ccmp cjct half liga nukt pres psts rphf`. If language-specific records (e.g. `dev2/MAR`) appear with only `ccmp liga`, the language records have incomplete feature inheritance — remove the corresponding `languagesystem` declaration.
+
+### Debugging feature compilation failures
+
+The build writes `debugout_features.fea` with the raw feature code before compilation. When compilation fails, inspect this file to find syntax errors. Common issues:
+
+- **`languagesystem` inside a feature block** — must be at the top level
+- **Named lookup defined inside a feature block** — applies unconditionally to all input. Define the lookup outside the feature block and reference it via contextual rules inside.
+- **Glyph not in font** — a substitution references a glyph name that doesn't exist in the font's glyph order (e.g. a control character was removed)
+
+### HarfBuzz Indic shaper (dev2) feature order
+
+Understanding feature application order is critical for Devanagari debugging:
+
+1. **Pre-reordering** (Unicode order): `ccmp`
+2. **Reordering**: HarfBuzz reorders pre-base matras (e.g. I-matra U+093F moves before the consonant)
+3. **Post-reordering**: `nukt` → `akhn` → `rphf` → `blwf` → `half` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt`
+4. **GPOS**: `kern` → `mark`/`abvm` → `mkmk`
+
+Implication: GSUB rules that need to match pre-base matras adjacent to post-base marks (e.g. anusvara substitution triggered by I-matra) must go in `ccmp`, not `psts`, because reordering separates them.
--- a/OTFbuild/font_builder.py
+++ b/OTFbuild/font_builder.py
@@ -401,11 +401,12 @@ def build_font(assets_dir, output_path, no_bitmap=False, no_features=False):
        if fea_code.strip():
            print("  Compiling features with feaLib...")
            try:
-                fea_stream = io.StringIO(fea_code)
-                addOpenTypeFeatures(font, fea_stream)
                # Obtain raw .fea text for debugging
                with open("debugout_features.fea", "w") as text_file:
                    text_file.write(fea_code)
+
+                fea_stream = io.StringIO(fea_code)
+                addOpenTypeFeatures(font, fea_stream)
                print("  Features compiled successfully")
            except Exception as e:
                print(f"  [WARNING] Feature compilation failed: {e}")
--- a/OTFbuild/opentype_features.py
+++ b/OTFbuild/opentype_features.py
@@ -48,7 +48,7 @@ def generate_features(glyphs, kern_pairs, font_glyph_set,
    def has(cp):
        return glyph_name(cp) in font_glyph_set

-    preamble = """feature aalt {
+    preamble = """\
 languagesystem DFLT dflt;
 languagesystem latn dflt;
 languagesystem cyrl dflt;
@@ -57,13 +57,9 @@ languagesystem hang KOR ;
 languagesystem hang KOH ;
 languagesystem cyrl SRB ;
 languagesystem cyrl BGR ;
-languagesystem dev2 MAR ;
-languagesystem dev2 NEP ;
-languagesystem dev2 SAN ;
-languagesystem dev2 SAT ;
-languagesystem tml2 TAM ;
-languagesystem sund SUN ;
-} aalt;
+languagesystem dev2 dflt;
+languagesystem tml2 dflt;
+languagesystem sund dflt;
 """
    if preamble:
        parts.append(preamble)
@@ -99,7 +95,7 @@ languagesystem sund SUN ;
        parts.append(deva_code)

    # Tamil features
-    tamil_code = _generate_tamil(glyphs, has)
+    tamil_code = _generate_tamil(glyphs, has, replacewith_subs or [])
    if tamil_code:
        parts.append(tamil_code)

@@ -127,12 +123,26 @@ languagesystem sund SUN ;


 def _generate_ccmp(replacewith_subs, has):
-    """Generate ccmp feature for replacewith directives (multiple substitution)."""
+    """Generate ccmp feature for replacewith directives (multiple substitution).
+
+    Devanagari (0x0900-097F) and Tamil (0x0B80-0BFF) source codepoints are
+    excluded here because their ccmp lookups must live under the script-
+    specific tags (dev2, tml2).  DirectWrite and CoreText do not fall back
+    from a script-specific ccmp to DFLT.
+    """
    if not replacewith_subs:
        return ""

+    # Ranges handled by script-specific ccmp features
+    _SCRIPT_RANGES = (
+        range(0x0900, 0x0980),  # Devanagari → dev2 ccmp
+        range(0x0B80, 0x0C00),  # Tamil → tml2 ccmp
+    )
+
    subs = []
    for src_cp, target_cps in replacewith_subs:
+        if any(src_cp in r for r in _SCRIPT_RANGES):
+            continue
        if not has(src_cp):
            continue
        if not all(has(t) for t in target_cps):
@@ -1399,8 +1409,28 @@ def _generate_psts_open_ya(glyphs, has):
    return lookups, body


-def _generate_tamil(glyphs, has):
-    """Generate Tamil GSUB features."""
+def _generate_tamil(glyphs, has, replacewith_subs=None):
+    """Generate Tamil GSUB features (ccmp + pres under tml2)."""
+    features = []
+
+    # --- tml2 ccmp: Tamil replacewith decompositions ---
+    # Must be under tml2 so DirectWrite/CoreText see them.
+    if replacewith_subs:
+        tamil_ccmp = []
+        for src_cp, target_cps in replacewith_subs:
+            if not (0x0B80 <= src_cp <= 0x0BFF):
+                continue
+            if not has(src_cp) or not all(has(t) for t in target_cps):
+                continue
+            src = glyph_name(src_cp)
+            targets = ' '.join(glyph_name(t) for t in target_cps)
+            tamil_ccmp.append(f"        sub {src} by {targets};")
+        if tamil_ccmp:
+            features.append("feature ccmp {\n    script tml2;\n"
+                            "    lookup TamilDecomp {\n"
+                            + '\n'.join(tamil_ccmp)
+                            + "\n    } TamilDecomp;\n} ccmp;")
+
    subs = []

    _tamil_i_rules = [
@@ -1436,13 +1466,13 @@ def _generate_tamil(glyphs, has):
    if has(0x0BB8) and has(0x0BCD) and has(0x0BB0) and has(0x0BC0) and has(SC.TAMIL_SHRII):
        subs.append(f"    sub {glyph_name(0x0BB8)} {glyph_name(0x0BCD)} {glyph_name(0x0BB0)} {glyph_name(0x0BC0)} by {glyph_name(SC.TAMIL_SHRII)}; # SHRII (sa)")

-    if not subs:
-        return ""
+    if subs:
+        lines = ["feature pres {", "    script tml2;"]
+        lines.extend(subs)
+        lines.append("} pres;")
+        features.append('\n'.join(lines))

-    lines = ["feature pres {", "    script tml2;"]
-    lines.extend(subs)
-    lines.append("} pres;")
-    return '\n'.join(lines)
+    return '\n\n'.join(features) if features else ""


 def _generate_sundanese(glyphs, has):
--- a/demo.PNG
+++ b/demo.PNG