diff --git a/OTFbuild/CLAUDE.md b/OTFbuild/CLAUDE.md index f49833b..baf3203 100644 --- a/OTFbuild/CLAUDE.md +++ b/OTFbuild/CLAUDE.md @@ -118,11 +118,11 @@ print(f"{name}: advance={w}, has_outlines={has_outlines}") ### OpenType features generated (`opentype_features.py`) -- **ccmp** — replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2); vowel decompositions (tml2) +- **ccmp** — replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2/deva); vowel decompositions (tml2) - **kern** — pair positioning from `keming_machine.py` - **liga** — Latin ligatures (ff, fi, fl, ffi, ffl, st) and Armenian ligatures -- **locl** — Bulgarian/Serbian Cyrillic alternates; Devanagari consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2, duplicated from ccmp for DirectWrite compatibility) -- **nukt, akhn, half, blwf, cjct, pres, blws, rphf, abvs, psts, calt** — Devanagari complex script shaping (all under `script dev2`) +- **locl** — Bulgarian/Serbian Cyrillic alternates; Devanagari consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2/deva, duplicated from ccmp for DirectWrite compatibility) +- **nukt, akhn, half, blwf, cjct, pres, blws, rphf, abvs, psts, calt** — Devanagari complex script shaping (all under both `script dev2` and `script deva`) - **pres** (tml2) — Tamil consonant+vowel ligatures - **pres** (sund) — Sundanese diacritic combinations - **ljmo, vjmo, tjmo** — Hangul jamo positional variants @@ -204,23 +204,125 @@ Implication: GSUB rules that need to match pre-base matras adjacent to post-base ### Cross-platform shaper differences (DirectWrite, CoreText, HarfBuzz) -The three major shapers behave differently for Devanagari (dev2): +The three major shapers behave differently for Devanagari. The font registers all Devanagari features under **both** `dev2` (new Indic) and `deva` (old Indic) script tags. HarfBuzz and DirectWrite use `dev2`; CoreText uses `deva`. -**DirectWrite (Windows)**: -- Feature order: `locl` → `nukt` → `akhn` → `rphf` → `rkrf` → `blwf` → `half` → `vatu` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt` → GPOS: `kern` → `dist` → `abvm` → `blwm` -- **Does NOT apply `ccmp`** for the dev2 script. All lookups that must run before `nukt` (e.g. consonant-to-PUA mapping) must be registered under `locl` instead. -- Tests reph eligibility via `would_substitute([RA, virama], rphf)` using **original Unicode codepoints** (before locl/ccmp). The `rphf` feature must include a rule with the Unicode form of RA, not just the PUA form. +#### Script tag selection -**CoreText (macOS)**: -- Applies `ccmp` but may do so **after** reordering (unlike HarfBuzz which applies ccmp before reordering). This means pre-base matras (I-matra U+093F) are already reordered before the consonant, breaking adjacency rules like `sub 093F 0902'`. -- Tests reph eligibility using `would_substitute()` with Unicode codepoints, same as DirectWrite. -- Solution: add wider-context fallback rules in `abvs` (post-reordering) that match I-matra separated from anusvara by 1-3 intervening glyphs. +| Shaper | Script tag used | Indic model | +|---|---|---| +| HarfBuzz | `dev2` | New Indic (ot-indic2) | +| DirectWrite | `dev2` | New Indic | +| CoreText | `deva` | Old Indic | -**HarfBuzz (reference)**: -- Applies `ccmp` **before** reordering (Unicode order). -- Reph detection is pattern-based (RA + halant + consonant at syllable start), not feature-based. -- Most lenient — works with PUA-only rules. +Both tags must exist, and all GSUB/GPOS features must be registered under both, otherwise CoreText silently breaks. -**Practical implication**: Define standalone lookups (e.g. `DevaConsonantMap`, `DevaVowelDecomp`) **outside** any feature block, then reference them from both `locl` and `ccmp`. This ensures DirectWrite (via locl) and HarfBuzz (via ccmp) both fire the lookups. The second application is a no-op since glyphs are already transformed. +#### Feature order differences + +**HarfBuzz (dev2, reference implementation)**: +1. Pre-reordering: `locl` → `ccmp` +2. Reordering (I-matra moves before consonant, reph moves to end) +3. Post-reordering: `nukt` → `akhn` → `rphf` → `half` → `blwf` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt` +4. GPOS: `kern` → `abvm` → `blwm` + +**DirectWrite (dev2)**: +- `locl` → `nukt` → `akhn` → `rphf` → `rkrf` → `blwf` → `half` → `vatu` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt` +- GPOS: `kern` → `dist` → `abvm` → `blwm` +- **Does NOT apply `ccmp`** for the dev2 script. All lookups that must run before `nukt` (e.g. consonant-to-PUA mapping, anusvara upper) must be registered under `locl` instead. + +**CoreText (deva)**: +- Applies `locl` and `ccmp`, but may apply `ccmp` **after** reordering (unlike HarfBuzz). +- Post-reordering features same as above: `nukt` → `akhn` → `rphf` → ... → `abvs` → ... → `psts` +- GPOS: `kern` → `abvm` (+ `mark`/`mkmk` if registered under `deva`) + +#### Key behavioural differences + +**1. ccmp timing (CoreText vs HarfBuzz)** + +HarfBuzz applies `ccmp` in Unicode order (before reordering). CoreText may apply it after reordering. This breaks adjacency-based rules: + +``` +# In ccmp — works on HarfBuzz (Unicode order: C + matra + anusvara): +sub uni093F uni0902' lookup AnusvaraUpper; # I-matra + anusvara + +# After reordering on CoreText: I-matra + [consonants] + anusvara +# The I-matra and anusvara are no longer adjacent → rule fails +``` + +**Fix**: duplicate these rules in `abvs` (post-reordering) with wildcard gaps: +``` +sub uni093F @devaAny uni0902' lookup AnusvaraUpper; +sub uni093F @devaAny @devaAny uni0902' lookup AnusvaraUpper; +``` + +**2. Reph eligibility testing** + +| Shaper | Method | +|---|---| +| HarfBuzz | Pattern-based (RA + halant + consonant at syllable start) | +| DirectWrite | `would_substitute([RA, virama], rphf)` with **Unicode** codepoints | +| CoreText | `would_substitute()` with Unicode codepoints (same as DW) | + +The `rphf` feature must include a rule with the Unicode form of RA (`uni0930`), not just the PUA form. Otherwise DW and CT won't detect reph. + +**3. Within-lookup glyph visibility (CoreText)** + +In OpenType, a single lookup processes the glyph string left-to-right. Per spec, a substitution at position N should be visible when the lookup reaches position N+1. CoreText appears to **not** propagate substitutions within a single lookup pass to subsequent positions' backtrack context. + +Example: two rules in one anonymous lookup: +``` +sub @trigger uF010C' lookup ComplexReph; # rule at pos N: uF010C → uF010D +sub uF010D uF016C' lookup AnusvaraLower; # rule at pos N+1: needs uF010D in backtrack +``` + +On HarfBuzz/DirectWrite, rule 2 sees the updated `uF010D` at position N. On CoreText, it still sees the original `uF010C` → rule 2 fails to match. + +**Fix**: split into separate **named lookups** so each runs as an independent pass: +``` +lookup AbvsPass1 { + sub @trigger uF010C' lookup ComplexReph; +} AbvsPass1; +lookup AbvsPass2 { + sub uF010D uF016C' lookup AnusvaraLower; +} AbvsPass2; +feature abvs { + script dev2; lookup AbvsPass1; lookup AbvsPass2; + script deva; lookup AbvsPass1; lookup AbvsPass2; +} abvs; +``` + +**4. GPOS mark stacking heuristics** + +When two marks share the same base without MarkToMark, each shaper applies different internal Y adjustments: + +| Shaper | Internal Y shift | +|---|---| +| HarfBuzz | 0 (no heuristic) | +| DirectWrite | -100 | +| CoreText | -200 | + +No single GPOS Y value satisfies all three. **Fix**: use explicit MarkToMark positioning (e.g. `AnusvaraToComplexReph`) which suppresses shaper heuristics and gives consistent results across all three. + +**5. GPOS double-application with dev2+deva** + +When both script tags exist, CoreText/DirectWrite may merge lookup lists from both scripts. Inline (anonymous) GPOS rules create separate lookups per script → cumulative positioning doubles. **Fix**: use **named lookups** for all GPOS contextual positioning so both scripts reference the same lookup index. + +**6. mark/mkmk feature scoping** + +The `mark` and `mkmk` features are registered under `deva` (for CoreText) but **not** `dev2`. Under `dev2`, all mark positioning goes through `abvm` instead. This prevents double-application on HarfBuzz/DirectWrite where `abvm` already contains the same mark/mkmk lookups. + +``` +# GPOS features per script: +# dev2/dflt: abvm kern +# deva/dflt: abvm kern mark mkmk +``` + +#### Practical rules + +1. **Standalone lookups**: define all substitution/positioning lookups (e.g. `DevaConsonantMap`, `DevaVowelDecomp`, `ComplexReph`) **outside** any feature block, then reference from both `locl`/`ccmp` and script-specific features. +2. **locl mirrors ccmp** for Devanagari: DirectWrite skips `ccmp`, so anything that must run early (consonant mapping, anusvara upper, vowel decomposition) must also be in `locl`. +3. **abvs post-reordering fallbacks**: rules that depend on matra+anusvara adjacency (broken by reordering on CoreText) need wildcard-gap variants in `abvs`. +4. **Separate lookup passes**: if rule B's backtrack context depends on rule A's output at an adjacent position, put them in separate named lookups. CoreText may not propagate within-pass substitutions. +5. **Named GPOS lookups**: all contextual GPOS rules must use named lookups to avoid double-application across dev2/deva. +6. **MarkToMark for multi-mark stacking**: never rely on shaper heuristics for positioning multiple marks on the same base — always provide explicit MarkToMark. Source: [Microsoft Devanagari shaping spec](https://learn.microsoft.com/en-us/typography/script-development/devanagari) diff --git a/OTFbuild/calligra_font_tests.odt b/OTFbuild/calligra_font_tests.odt index 17d1296..0e47f49 100644 Binary files a/OTFbuild/calligra_font_tests.odt and b/OTFbuild/calligra_font_tests.odt differ diff --git a/OTFbuild/opentype_features.py b/OTFbuild/opentype_features.py index 01af629..40e2c7f 100644 --- a/OTFbuild/opentype_features.py +++ b/OTFbuild/opentype_features.py @@ -1100,7 +1100,12 @@ def _generate_devanagari(glyphs, has, replacewith_subs=None): deva_any_glyphs = [glyph_name(cp) for cp in sorted(set(deva_any_cps)) if has(cp)] abvs_lookups = [] - abvs_body = [] + # Split abvs into two named lookups so that AnusvaraLower (pass 2) + # sees the ComplexReph output (pass 1). CoreText may not update the + # glyph stream within a single lookup pass, so rules that depend on + # earlier substitutions at adjacent positions must be in separate lookups. + abvs_pass1_rules = [] # ComplexReph + AnusvaraUpper + abvs_pass2_rules = [] # AnusvaraLower (needs to see uF010D from pass 1) if has(SC.DEVANAGARI_RA_SUPER) and has(SC.DEVANAGARI_RA_SUPER_COMPLEX) and deva_any_glyphs: trigger_cps = ( @@ -1121,48 +1126,66 @@ def _generate_devanagari(glyphs, has, replacewith_subs=None): abvs_lookups.append(f" sub {reph} by {complex_reph};") abvs_lookups.append(f"}} ComplexReph;") - abvs_body.append(f" @complexRephTriggers = [{' '.join(trigger_glyphs)}];") + abvs_pass1_rules.append(f" @complexRephTriggers = [{' '.join(trigger_glyphs)}];") # Rule 1: trigger mark/vowel immediately before reph - abvs_body.append(f" sub @complexRephTriggers {reph}' lookup ComplexReph;") + abvs_pass1_rules.append(f" sub @complexRephTriggers {reph}' lookup ComplexReph;") # Rules 2-4: i-matra separated from reph by 1-3 intervening glyphs - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny {reph}' lookup ComplexReph;") - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny {reph}' lookup ComplexReph;") - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny @devaAny {reph}' lookup ComplexReph;") + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny {reph}' lookup ComplexReph;") + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny {reph}' lookup ComplexReph;") + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny @devaAny {reph}' lookup ComplexReph;") # Post-reordering anusvara upper: catch I-matra separated from # anusvara by reordering (1-3 intervening consonants/marks). # On HarfBuzz, ccmp already handled this (no-op here); on CoreText, # ccmp may run after reordering so the adjacency rule didn't match. if has(0x093F) and has(0x0902) and has(anusvara_upper) and deva_any_glyphs: - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny" + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny" f" {glyph_name(0x0902)}' lookup AnusvaraUpper;") - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny" + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny" f" {glyph_name(0x0902)}' lookup AnusvaraUpper;") - abvs_body.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny @devaAny" + abvs_pass1_rules.append(f" sub {glyph_name(0x093F)} @devaAny @devaAny @devaAny" f" {glyph_name(0x0902)}' lookup AnusvaraUpper;") # Reverse anusvara upper when complex reph is present: the vertical # offset is encoded in the glyph choice (uni0902 sits 2px lower than # uF016C), so when complex reph precedes, use the lower form. + # This MUST be in a separate lookup from ComplexReph so it runs in a + # second pass and sees the uF010C→uF010D substitution output. if has(0x0902) and has(anusvara_upper) and has(SC.DEVANAGARI_RA_SUPER_COMPLEX): abvs_lookups.append(f"lookup AnusvaraLower {{") abvs_lookups.append(f" sub {glyph_name(anusvara_upper)} by {glyph_name(0x0902)};") abvs_lookups.append(f"}} AnusvaraLower;") - abvs_body.append( + abvs_pass2_rules.append( f" sub {glyph_name(SC.DEVANAGARI_RA_SUPER_COMPLEX)}" f" {glyph_name(anusvara_upper)}' lookup AnusvaraLower;" ) - if abvs_body: + has_abvs = abvs_pass1_rules or abvs_pass2_rules + if has_abvs: abvs_lines = abvs_lookups[:] if abvs_lookups: abvs_lines.append("") + # Pass 1: ComplexReph + AnusvaraUpper + if abvs_pass1_rules: + abvs_lines.append("lookup AbvsPass1 {") + if deva_any_glyphs: + abvs_lines.append(f" @devaAny = [{' '.join(deva_any_glyphs)}];") + abvs_lines.extend(abvs_pass1_rules) + abvs_lines.append("} AbvsPass1;") + abvs_lines.append("") + # Pass 2: AnusvaraLower (sees ComplexReph output) + if abvs_pass2_rules: + abvs_lines.append("lookup AbvsPass2 {") + abvs_lines.extend(abvs_pass2_rules) + abvs_lines.append("} AbvsPass2;") + abvs_lines.append("") abvs_lines.append("feature abvs {") for _st in ['dev2', 'deva']: abvs_lines.append(f" script {_st};") - if deva_any_glyphs: - abvs_lines.append(f" @devaAny = [{' '.join(deva_any_glyphs)}];") - abvs_lines.extend(abvs_body) + if abvs_pass1_rules: + abvs_lines.append(" lookup AbvsPass1;") + if abvs_pass2_rules: + abvs_lines.append(" lookup AbvsPass2;") abvs_lines.append("} abvs;") features.append('\n'.join(abvs_lines)) @@ -1807,23 +1830,25 @@ def _generate_mark(glyphs, has): lines.append("") mkmk_lookup_names.append(mkmk_name) - # Register MarkToBase lookups under mark for non-Devanagari scripts. - # For dev2/deva, abvm already includes these lookups. Registering - # mark/mkmk under dev2/deva too risks double-application on shapers - # (CoreText, DirectWrite) that may process mark AND abvm separately. - _NON_DEVA_SCRIPTS = ['DFLT', 'latn', 'cyrl', 'grek', 'hang', 'tml2', 'sund'] + # Register MarkToBase lookups under mark. + # dev2 is excluded: HarfBuzz/DirectWrite use abvm for Devanagari marks. + # deva is INCLUDED: CoreText's old-Indic shaper may need mark/mkmk + # in addition to abvm for correct mark attachment. + # Double-application is safe because MarkToBase is idempotent and + # all ChainContextPos adjustments are X-only (no Y doubling risk). + _MARK_SCRIPTS = ['DFLT', 'latn', 'cyrl', 'grek', 'hang', 'tml2', 'sund', 'deva'] lines.append("feature mark {") - for _st in _NON_DEVA_SCRIPTS: + for _st in _MARK_SCRIPTS: lines.append(f" script {_st};") for ln in lookup_names: lines.append(f" lookup {ln};") lines.append("} mark;") - # Register MarkToMark lookups under mkmk (non-Devanagari only) + # Register MarkToMark lookups under mkmk (includes deva for CoreText) if mkmk_lookup_names: lines.append("") lines.append("feature mkmk {") - for _st in _NON_DEVA_SCRIPTS: + for _st in _MARK_SCRIPTS: lines.append(f" script {_st};") for ln in mkmk_lookup_names: lines.append(f" lookup {ln};")