110 Commits

Author SHA1 Message Date
minjaesong
4dbcbb7df5 working on Han char sheet 2026-03-15 22:11:24 +09:00
minjaesong
ad4a044ace ascii fractions to use char decompo 2026-03-15 01:18:43 +09:00
minjaesong
9ca8117b5f coptic 2026-03-15 01:00:23 +09:00
minjaesong
76f223aee8 ogham 2026-03-14 19:32:15 +09:00
minjaesong
662dc5b093 glyph texture atlas (2) 2026-03-14 16:06:31 +09:00
minjaesong
f4e1db5846 glyph texture atlas 2026-03-14 14:24:23 +09:00
minjaesong
1c7471ccf3 added missing glyphs in Greek and Coptic block 2026-03-14 11:14:18 +09:00
minjaesong
5fca96a861 fix: number forms using wrong slash 2026-03-13 21:17:25 +09:00
minjaesong
539a2c9f46 autokem: more filtering 2026-03-13 20:08:56 +09:00
CuriousTorvald
d57707b210 Add funding options for GitHub and PayPal
Updated funding options to include GitHub Sponsors and a PayPal link.
2026-03-13 19:16:55 +09:00
minjaesong
7c4ab9d4be reshaping Tironian et 2026-03-13 19:14:58 +09:00
minjaesong
175fe4edfb demo text update 2026-03-13 14:13:28 +09:00
minjaesong
4d7aa79740 fixed some mislabeling 2026-03-13 13:59:58 +09:00
minjaesong
9d9efce9d4 Latin Ext F and G 2026-03-13 13:29:43 +09:00
minjaesong
8daa968d80 cyrillic extb relabelling 2026-03-11 09:11:28 +09:00
minjaesong
199519fe1c latin ext e 2026-03-10 22:51:44 +09:00
minjaesong
194cda93fe latin ext-e 2026-03-10 04:02:03 +09:00
minjaesong
71370a82a7 revised autokem model 2026-03-09 23:54:51 +09:00
minjaesong
268610a8b3 revised autokem model 2026-03-09 23:46:28 +09:00
minjaesong
244371aa9d revised autokem model 2026-03-08 20:34:45 +09:00
minjaesong
39603d897b more fixes 2026-03-08 03:12:40 +09:00
minjaesong
2b1f9a866f more fixes 2026-03-08 02:38:03 +09:00
minjaesong
af1d720ec2 anchor fixes 2026-03-08 01:44:17 +09:00
minjaesong
8a52fcfb91 tfw part of your code was full of hacks 2026-03-08 00:32:42 +09:00
minjaesong
b82dbecc30 keming machine dot removal directive for OTF 2026-03-07 22:47:01 +09:00
minjaesong
2008bbf6dd keming machine dot removal directive 2026-03-07 22:41:24 +09:00
minjaesong
163e3d7b3e Arrow and Number Forms unicode block 2026-03-07 22:13:22 +09:00
minjaesong
f26998b641 Cyrillic Ext-C 2026-03-07 21:34:40 +09:00
minjaesong
4121648dfc minor fixes (2) 2026-03-07 21:08:44 +09:00
minjaesong
30de279a08 minor fixes 2026-03-07 21:03:10 +09:00
minjaesong
956599b83f Full Cyrillic, CyrlExtA/B support (enables church slavonic) 2026-03-07 20:55:47 +09:00
minjaesong
71ea63b48e calculators update 2026-03-07 18:03:36 +09:00
minjaesong
3a4aa165f6 more readme (2) 2026-03-07 12:59:40 +09:00
minjaesong
f402563e6c more readme 2026-03-07 12:57:16 +09:00
minjaesong
5a251ad381 readme update 2026-03-07 12:53:45 +09:00
minjaesong
7c90766394 optimised contour generation 2026-03-06 23:27:57 +09:00
minjaesong
bc827be492 minor changes 2026-03-06 16:34:28 +09:00
minjaesong
9c4c8d3153 gitignore updates 2026-03-06 15:51:52 +09:00
minjaesong
0c99a27ffe Autokem: CNN-based glyph labeller for Keming Machine 2026-03-06 15:43:47 +09:00
minjaesong
adab8fa0ef fix: some glyphs having rgb of 254,254,254 2026-03-06 11:15:23 +09:00
minjaesong
a5572b1d95 final touch 2026-03-05 21:14:57 +09:00
minjaesong
0f9fbe9713 glyph touchups 2026-03-05 20:49:10 +09:00
minjaesong
f2bc61928b keming calculator 2026-03-05 20:38:00 +09:00
minjaesong
0b730c7a47 support for symbols for legacy computing (unicode 17) 2026-03-05 17:54:24 +09:00
minjaesong
0afbfdf043 better proportion for uni20C1 2026-03-05 17:32:21 +09:00
minjaesong
12629ee3e8 bump for Unicode 17 2026-03-05 17:20:51 +09:00
minjaesong
99c6ed5c8c added control pictures unicode block 2026-03-05 11:22:50 +09:00
minjaesong
f7ffeec0e2 licence update 2026-03-05 10:49:48 +09:00
minjaesong
b106e1c1b0 coloured font with COLRv0 2026-03-04 20:44:16 +09:00
minjaesong
ec911e568d more fixes 2026-03-04 18:17:04 +09:00
minjaesong
68873c8d80 finally "fixed" well enough 2026-03-04 14:07:12 +09:00
minjaesong
fed73338e2 aaaaaaaaanusvara 2026-03-04 07:58:16 +09:00
minjaesong
da59fe24d4 anusvara positioning fixed for DirectWrite and CoreText but now broken for HarfBuzz 2026-03-04 07:00:14 +09:00
minjaesong
cb2f432479 fix: Armenian lowercase ho had lone misplaced dot on spritesheet 2026-03-03 19:33:16 +09:00
minjaesong
bc2dbf8b69 more attempts at cross system compatibility 2026-03-02 21:40:45 +09:00
minjaesong
3e79181aa3 somewhat working in CoreText, not at all in DirectWrite 2026-03-02 21:03:02 +09:00
minjaesong
73c2b6986d waaaaaaaaaa 2026-03-02 16:48:19 +09:00
minjaesong
f4573536e4 actually writing metadata that Windows likes 2026-03-02 15:23:09 +09:00
minjaesong
c8b197ec01 fix: hangul filler not working on OTF 2026-03-02 14:12:43 +09:00
minjaesong
c299aa8d50 fix: mixed direction diacritics stacking up wrongly 2026-03-02 13:53:37 +09:00
minjaesong
673ca100d4 builder now has opentype sanitiser 2026-03-02 13:03:53 +09:00
minjaesong
db327d8357 WOFF builder 2026-03-02 12:35:50 +09:00
minjaesong
cce9d62bd1 fix: diacritics not stacking 2026-03-02 10:07:31 +09:00
minjaesong
b3acbf1c0e old hangul composition 2026-03-02 07:21:28 +09:00
minjaesong
602923f5bc intonation marks 2026-03-02 06:04:34 +09:00
minjaesong
714cca79be tamil ii-matra adjustments 2026-03-02 05:28:11 +09:00
minjaesong
83303603c0 OTF diacritics positioning with nudging 2026-03-02 05:18:42 +09:00
minjaesong
e2550b6ef6 diacritics anchoring reads isLowheight now 2026-03-01 16:14:50 +09:00
minjaesong
a69aee9aa7 anusvara handling reworked 2026-03-01 15:10:44 +09:00
minjaesong
5c6da36fa8 fix: tamil uu-matra is disjoined by one pixel 2026-03-01 14:23:36 +09:00
minjaesong
3c9bc38dfd more minor changes 2026-03-01 14:07:47 +09:00
minjaesong
0811971a8e minor cyrillic changes 2026-03-01 11:31:33 +09:00
minjaesong
b0391e5d80 a hack was added 2026-03-01 10:59:44 +09:00
minjaesong
95fafe51a9 contextual devanagari anusvara positioning 2026-03-01 10:46:39 +09:00
minjaesong
b78b4711fb marwari dd conjuncts 2026-02-28 12:22:19 +09:00
minjaesong
35d4d94818 wrong reph_super_complex fix 2026-02-28 10:51:08 +09:00
minjaesong
7c788eb9d8 diacritics pos for glyphs sans explicit anchor tags 2026-02-28 05:55:09 +09:00
minjaesong
23e748cc88 tamil vowel positioning fix 2026-02-28 05:30:46 +09:00
minjaesong
5d10bdb8e8 improved visarga handling 2026-02-27 10:59:42 +09:00
minjaesong
95912acc32 contextual visarga 2026-02-27 08:26:19 +09:00
minjaesong
e3a3079fb2 cjk baseline fix 2026-02-27 05:25:25 +09:00
minjaesong
3e3e20e5d4 fix: stack-down diacritic with nudge-Y values for both Kotlin and OTF 2026-02-26 11:19:21 +09:00
minjaesong
f55f90352b eyelash ra 2026-02-26 10:44:21 +09:00
minjaesong
80b67a3886 shla conjunct 2026-02-26 10:24:31 +09:00
minjaesong
982fb94828 more devanagari obscurities 2026-02-26 10:15:23 +09:00
minjaesong
be1c8e2f79 Devanagari RA-appended half forms 2026-02-26 09:10:41 +09:00
minjaesong
08d1b41cc0 slightly better fi fl ligs 2026-02-26 06:12:17 +09:00
minjaesong
3f9f5fb679 more visually balanced Jya conjunct 2026-02-26 05:52:40 +09:00
minjaesong
a2a73128e0 nudge-y fixed 2026-02-26 05:25:32 +09:00
minjaesong
488304b7b4 trigraphs with reph but wip 2026-02-26 05:10:20 +09:00
minjaesong
b73aa76285 nudge-x 2026-02-26 04:31:07 +09:00
minjaesong
f38cd8f4da adding missing sirorekha for visarga 2026-02-25 05:54:16 +09:00
minjaesong
a567b9f7fc CLAUDE.md 2026-02-25 03:13:06 +09:00
minjaesong
86699af92d fix: devanagari candrabindu and anusvara off by one pixel 2026-02-25 03:10:43 +09:00
minjaesong
cdc3499f38 fix: devanagari SHA has wrong anchor point 2026-02-25 02:46:23 +09:00
minjaesong
fca02f1a3d otf more deva 2026-02-24 21:15:56 +09:00
minjaesong
73fcd7d922 otf deva complex combi 2026-02-24 09:57:43 +09:00
minjaesong
1d6eb7b2c8 otf wip5 2026-02-24 07:25:55 +09:00
minjaesong
d94bac6186 otf wip4 2026-02-24 06:01:24 +09:00
minjaesong
63adbba1bb otf wip3 2026-02-24 04:29:11 +09:00
minjaesong
8d1e669a93 otf wip2 2026-02-24 02:32:49 +09:00
minjaesong
949b6aa777 otf wip 2026-02-23 19:32:25 +09:00
minjaesong
5e2cacd491 TTF build using fontforge 2026-02-23 18:32:03 +09:00
minjaesong
208466bbb2 bitsnpicas probably not decent 2026-02-23 11:18:09 +09:00
minjaesong
b5f01a4d41 why are you still looking for tga.gz 2026-02-20 10:45:03 +09:00
minjaesong
e7afe0135e moving assets inside classpath 2026-02-19 09:17:20 +09:00
minjaesong
e3904790dc hangul: minor legibility improvements 2025-08-03 03:10:50 +09:00
minjaesong
0c3a73c2f9 symbols for legacy computing wip 2025-01-19 19:51:49 +09:00
minjaesong
648f3ffadd pua: added ESC keycap 2025-01-19 19:50:57 +09:00
minjaesong
2dc148116e minor improvements 2024-10-01 22:37:33 +09:00
200 changed files with 10103 additions and 442 deletions

.gitattributes vendored

@@ -8,6 +8,7 @@
 *.kra filter=lfs diff=lfs merge=lfs -text
 *.png filter=lfs diff=lfs merge=lfs -text
 *.wav filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
 *.tga binary diff=hex
 *.kra binary diff=hex

.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,2 @@
+github: [curioustorvald]
+custom: ["https://paypal.me/curioustorvald"]

.gitignore vendored

@@ -13,3 +13,23 @@ tmp_*
 *.bak
 *-autosave.kra
 .directory
+# from OTF build
+**/__pycache__
+OTFbuild/*.ttf
+OTFbuild/*.otf
+OTFbuild/*.woff
+OTFbuild/*.woff2
+OTFbuild/*.md
+*.fea
+*.xdp-*
+# from Autokem build
+Autokem/*.o
+Autokem/autokem
+Autokem/*.md
+# exceptions
+!**/CLAUDE.md
+!CLAUDE.md


@@ -1,12 +1,12 @@
 <component name="ArtifactManager">
-<artifact type="jar" name="TerrarumSansBitmap">
+<artifact type="jar" build-on-make="true" name="TerrarumSansBitmap">
 <output-path>$PROJECT_DIR$/lib</output-path>
 <root id="archive" name="TerrarumSansBitmap.jar">
-<element id="module-output" name="BuildJAR_TerrarumSansBitmap" />
 <element id="directory" name="META-INF">
 <element id="file-copy" path="$PROJECT_DIR$/META-INF/MANIFEST.MF" />
 </element>
 <element id="dir-copy" path="$PROJECT_DIR$/src" />
+<element id="module-output" name="BuildJAR_TerrarumSansBitmap" />
 </root>
 </artifact>
 </component>

.idea/kotlinc.xml generated Executable file → Normal file

@@ -1,7 +1,13 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <project version="4">
+<component name="Kotlin2JsCompilerArguments">
+<option name="moduleKind" value="plain" />
+</component>
+<component name="Kotlin2JvmCompilerArguments">
+<option name="jvmTarget" value="21" />
+</component>
 <component name="KotlinCommonCompilerArguments">
-<option name="apiVersion" value="1.4" />
-<option name="languageVersion" value="1.4" />
+<option name="apiVersion" value="2.0" />
+<option name="languageVersion" value="2.0" />
 </component>
 </project>


@@ -1,19 +1,33 @@
 <component name="libraryTable">
-<library name="KotlinJavaRuntime">
+<library name="KotlinJavaRuntime" type="repository">
+<properties maven-id="org.jetbrains.kotlin:kotlin-stdlib-jdk8:2.1.21" />
 <CLASSES>
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-reflect.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-test.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib-jdk7.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib-jdk8.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk8/2.1.21/kotlin-stdlib-jdk8-2.1.21.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib/2.1.21/kotlin-stdlib-2.1.21.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/annotations/13.0/annotations-13.0.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk7/2.1.21/kotlin-stdlib-jdk7-2.1.21.jar!/" />
 </CLASSES>
-<JAVADOC />
+<JAVADOC>
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk8/2.1.21/kotlin-stdlib-jdk8-2.1.21-javadoc.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib/2.1.21/kotlin-stdlib-2.1.21-javadoc.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/annotations/13.0/annotations-13.0-javadoc.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk7/2.1.21/kotlin-stdlib-jdk7-2.1.21-javadoc.jar!/" />
+</JAVADOC>
 <SOURCES>
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib-sources.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-reflect-sources.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-test-sources.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib-jdk7-sources.jar!/" />
 <root url="jar://$KOTLIN_BUNDLED$/lib/kotlin-stdlib-jdk8-sources.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk8/2.1.21/kotlin-stdlib-jdk8-2.1.21-sources.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib/2.1.21/kotlin-stdlib-2.1.21-sources.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/annotations/13.0/annotations-13.0-sources.jar!/" />
+<root url="jar://$MAVEN_REPOSITORY$/org/jetbrains/kotlin/kotlin-stdlib-jdk7/2.1.21/kotlin-stdlib-jdk7-2.1.21-sources.jar!/" />
 </SOURCES>
 </library>
 </component>

.idea/misc.xml generated

@@ -1,6 +1,6 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <project version="4">
-<component name="ProjectRootManager" version="2" languageLevel="JDK_1_8" default="true" project-jdk-name="1.8.0_242" project-jdk-type="JavaSDK">
+<component name="ProjectRootManager" version="2" languageLevel="JDK_21" default="true" project-jdk-name="21" project-jdk-type="JavaSDK">
 <output url="file://$PROJECT_DIR$/out" />
 </component>
 </project>

.idea/modules.xml generated

@@ -4,6 +4,7 @@
 <modules>
 <module fileurl="file://$PROJECT_DIR$/BuildJAR_TerrarumSansBitmap.iml" filepath="$PROJECT_DIR$/BuildJAR_TerrarumSansBitmap.iml" />
 <module fileurl="file://$PROJECT_DIR$/FontTestGDX/FontTestGDX.iml" filepath="$PROJECT_DIR$/FontTestGDX/FontTestGDX.iml" />
+<module fileurl="file://$PROJECT_DIR$/OTFbuild/OTFbuild.iml" filepath="$PROJECT_DIR$/OTFbuild/OTFbuild.iml" />
 </modules>
 </component>
 </project>

.idea/workspace.xml generated

@@ -9,31 +9,10 @@
 <option name="autoReloadType" value="SELECTIVE" />
 </component>
 <component name="ChangeListManager">
-<list default="true" id="22c5bc80-996c-4846-b173-7dc8c2096fe3" name="Default" comment="">
+<list default="true" id="22c5bc80-996c-4846-b173-7dc8c2096fe3" name="Default" comment="why are you still looking for tga.gz">
+<change beforePath="$PROJECT_DIR$/.idea/modules.xml" beforeDir="false" afterPath="$PROJECT_DIR$/.idea/modules.xml" afterDir="false" />
 <change beforePath="$PROJECT_DIR$/.idea/workspace.xml" beforeDir="false" afterPath="$PROJECT_DIR$/.idea/workspace.xml" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/FontTestGDX/src/FontTestGDX.kt" beforeDir="false" afterPath="$PROJECT_DIR$/FontTestGDX/src/FontTestGDX.kt" afterDir="false" />
+<change beforePath="$PROJECT_DIR$/src/net/torvald/terrarumsansbitmap/gdx/TerrarumSansBitmap.kt" beforeDir="false" afterPath="$PROJECT_DIR$/src/net/torvald/terrarumsansbitmap/gdx/TerrarumSansBitmap.kt" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/cjkpunct.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/cjkpunct.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/currencies_variable.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/currencies_variable.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/futhark.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/futhark.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/latinExtC_variable.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/latinExtC_variable.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/richtext_furigana.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/richtext_furigana.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/typewriter/typewriter_intl_qwerty.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/typewriter/typewriter_intl_qwerty.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/typewriter/typewriter_ko_3set-390.tga" beforeDir="false" afterPath="$PROJECT_DIR$/assets/typewriter/typewriter_ko_3set-390.tga" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/assets/wenquanyi.tga.gz" beforeDir="false" afterPath="$PROJECT_DIR$/assets/wenquanyi.tga.gz" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/font_drawing_template.png" beforeDir="false" afterPath="$PROJECT_DIR$/font_drawing_template.png" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/glyph_height_pos_annotation.png" beforeDir="false" afterPath="$PROJECT_DIR$/glyph_height_pos_annotation.png" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/samples/wikipedia_x86.png" beforeDir="false" afterPath="$PROJECT_DIR$/samples/wikipedia_x86.png" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/terrarum_sans_cyrillic_2.png" beforeDir="false" afterPath="$PROJECT_DIR$/terrarum_sans_cyrillic_2.png" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/testing.PNG" beforeDir="false" afterPath="$PROJECT_DIR$/testing.PNG" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/testtext.txt" beforeDir="false" afterPath="$PROJECT_DIR$/testtext.txt" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/width_bit_encoding_annotated.png" beforeDir="false" afterPath="$PROJECT_DIR$/width_bit_encoding_annotated.png" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/alphnum_glyphs_master.kra" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/alphnum_glyphs_master.kra" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/alphnum_glyphs_resized.kra" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/alphnum_glyphs_resized.kra" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/hangul_3set_glyphs_master.kra" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/hangul_3set_glyphs_master.kra" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_input_template.psd" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_input_template.psd" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_intl_qwerty.psd" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_intl_qwerty.psd" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_ko_3set-390.psd" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_ko_3set-390.psd" afterDir="false" />
-<change beforePath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_ko_3set_glyphs_resized.kra" beforeDir="false" afterPath="$PROJECT_DIR$/work_files/typewriter_input/typewriter_ko_3set_glyphs_resized.kra" afterDir="false" />
 </list>
 <option name="SHOW_DIALOG" value="false" />
 <option name="HIGHLIGHT_CONFLICTS" value="true" />
@@ -43,13 +22,31 @@
 <component name="FileTemplateManagerImpl">
 <option name="RECENT_TEMPLATES">
 <list>
-<option value="Kotlin Class" />
 <option value="Interface" />
 <option value="Class" />
+<option value="Kotlin Class" />
 </list>
 </option>
 </component>
 <component name="Git.Settings">
+<excluded-from-favorite>
+<branch-storage>
+<map>
+<entry type="LOCAL">
+<value>
+<list>
+<branch-info repo="$PROJECT_DIR$" source="master" />
+</list>
+</value>
+</entry>
+</map>
+</branch-storage>
+</excluded-from-favorite>
+<option name="RECENT_BRANCH_BY_REPOSITORY">
+<map>
+<entry key="$PROJECT_DIR$" value="ttf-otf-build-system" />
+</map>
+</option>
 <option name="RECENT_GIT_ROOT_PATH" value="$PROJECT_DIR$" />
 </component>
 <component name="GitSEFilterConfiguration">
@@ -60,9 +57,20 @@
 <filtered-out-file-type name="COMMIT_BY_MESSAGE" />
 </file-type-list>
 </component>
+<component name="HighlightingSettingsPerFile">
+<setting file="jar://$PROJECT_DIR$/lib/gdx-1.10.0-sources.jar!/com/badlogic/gdx/Input.java" root0="SKIP_INSPECTION" />
+<setting file="jar://$PROJECT_DIR$/lib/gdx-1.10.0-sources.jar!/com/badlogic/gdx/graphics/g2d/BitmapFont.java" root0="SKIP_INSPECTION" />
+</component>
+<component name="KotlinCompilerWorkspaceSettings">
+<option name="preciseIncrementalEnabled" value="false" />
+</component>
 <component name="MarkdownSettingsMigration">
 <option name="stateVersion" value="1" />
 </component>
+<component name="ProjectColorInfo">{
+&quot;customColor&quot;: &quot;&quot;,
+&quot;associatedIndex&quot;: 2
+}</component>
 <component name="ProjectId" id="1aVE5t6KObkWt36lb07GBy1GY1S" />
 <component name="ProjectViewState">
 <option name="hideEmptyMiddlePackages" value="true" />
@@ -70,13 +78,31 @@
 </component>
 <component name="PropertiesComponent">{
 &quot;keyToString&quot;: {
+&quot;Kotlin.FontTestGDXKt.executor&quot;: &quot;Debug&quot;,
+&quot;Kotlin.TypewriterGDXKt.executor&quot;: &quot;Debug&quot;,
+&quot;RunOnceActivity.CodyAccountHistoryMigration&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyAccountsIdsRefresh&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyAssignOrphanedChatsToActiveAccount&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyConvertUrlToCodebaseName&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyHistoryLlmMigration&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyMigrateChatHistory-v2&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.CodyProjectSettingsMigration&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.OpenProjectViewOnStart&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.ToggleCodyToolWindowAfterMigration&quot;: &quot;true&quot;,
+&quot;RunOnceActivity.git.unshallow&quot;: &quot;true&quot;,
+&quot;git-widget-placeholder&quot;: &quot;master&quot;,
+&quot;kotlin-language-version-configured&quot;: &quot;true&quot;,
 &quot;last_opened_file_path&quot;: &quot;/home/torvald/Documents/Terrarum-sans-bitmap&quot;,
-&quot;project.structure.last.edited&quot;: &quot;Artifacts&quot;,
-&quot;project.structure.proportion&quot;: &quot;0.0&quot;,
-&quot;project.structure.side.proportion&quot;: &quot;0.0&quot;
+&quot;project.structure.last.edited&quot;: &quot;Modules&quot;,
+&quot;project.structure.proportion&quot;: &quot;0.15&quot;,
+&quot;project.structure.side.proportion&quot;: &quot;0.20724516&quot;,
+&quot;settings.editor.selected.configurable&quot;: &quot;project.kotlinCompiler&quot;
 }
 }</component>
 <component name="RecentsManager">
+<key name="CopyFile.RECENT_KEYS">
+<recent name="$PROJECT_DIR$" />
+</key>
 <key name="MoveFile.RECENT_KEYS">
 <recent name="C:\Users\minjaesong\Documents\Terrarum-sans-bitmap\" />
 <recent name="C:\Users\minjaesong\Documents\Terrarum-sans-bitmap" />
@@ -90,6 +116,15 @@
 <option name="Make" enabled="true" />
 </method>
 </configuration>
+<configuration default="true" type="#org.jetbrains.idea.devkit.run.PluginConfigurationType">
+<module name="" />
+<option name="VM_PARAMETERS" value="-Xmx512m -Xms256m -XX:MaxPermSize=250m -ea" />
+<option name="PROGRAM_PARAMETERS" />
+<predefined_log_file enabled="true" id="idea.log" />
+<method v="2">
+<option name="Make" enabled="true" />
+</method>
+</configuration>
 <configuration default="true" type="executeSpecs" factoryName="Gauge Execution">
 <setting name="environment" value="" />
 <setting name="specsToExecute" value="" />
@@ -148,15 +183,6 @@
 <option name="Make" enabled="true" />
 </method>
 </configuration>
-<configuration default="true" type="#org.jetbrains.idea.devkit.run.PluginConfigurationType">
-<module name="" />
-<option name="VM_PARAMETERS" value="-Xmx512m -Xms256m -XX:MaxPermSize=250m -ea" />
-<option name="PROGRAM_PARAMETERS" />
-<predefined_log_file enabled="true" id="idea.log" />
-<method v="2">
-<option name="Make" enabled="true" />
-</method>
-</configuration>
 <recent_temporary>
 <list>
 <item itemvalue="Kotlin.FontTestGDXKt" />
@@ -176,6 +202,39 @@
 <option name="presentableId" value="Default" />
 <updated>1497950823354</updated>
 </task>
+<task id="LOCAL-00001" summary="Old hangul rendering fix">
+<option name="closed" value="true" />
+<created>1705647715000</created>
+<option name="number" value="00001" />
+<option name="presentableId" value="LOCAL-00001" />
+<option name="project" value="LOCAL" />
+<updated>1705647715000</updated>
+</task>
+<task id="LOCAL-00002" summary="fix: characters not on overriden charset would not print">
+<option name="closed" value="true" />
+<created>1726151824465</created>
+<option name="number" value="00002" />
+<option name="presentableId" value="LOCAL-00002" />
+<option name="project" value="LOCAL" />
+<updated>1726151824465</updated>
+</task>
+<task id="LOCAL-00003" summary="moving assets inside classpath">
+<option name="closed" value="true" />
+<created>1771460240293</created>
+<option name="number" value="00003" />
+<option name="presentableId" value="LOCAL-00003" />
+<option name="project" value="LOCAL" />
+<updated>1771460240293</updated>
+</task>
+<task id="LOCAL-00004" summary="why are you still looking for tga.gz">
+<option name="closed" value="true" />
+<created>1771551906182</created>
+<option name="number" value="00004" />
+<option name="presentableId" value="LOCAL-00004" />
+<option name="project" value="LOCAL" />
+<updated>1771551906182</updated>
+</task>
+<option name="localTasksCounter" value="5" />
 <servers />
 </component>
 <component name="TodoView">
@@ -198,6 +257,13 @@
 </map>
 </option>
 </component>
+<component name="VcsManagerConfiguration">
+<MESSAGE value="Old hangul rendering fix" />
+<MESSAGE value="fix: characters not on overriden charset would not print" />
+<MESSAGE value="moving assets inside classpath" />
+<MESSAGE value="why are you still looking for tga.gz" />
+<option name="LAST_COMMIT_MESSAGE" value="why are you still looking for tga.gz" />
+</component>
 <component name="XSLT-Support.FileAssociations.UIState">
 <expand />
 <select />

Autokem/CLAUDE.md Normal file

@@ -0,0 +1,134 @@
# Autokem
CNN-based tool that predicts kerning tag bits for font sprite sheets.
Trains on manually-tagged `*_variable.tga` sheets (~2650 samples across 24 sheets), then applies learned predictions to new or untagged sheets.
## Building
```bash
cd Autokem
make # optimised build (-Ofast)
make debug # ASan + UBSan, no optimisation
make clean
```
## Usage
```bash
./autokem train # train on ../src/assets/*_variable.tga
./autokem apply ../src/assets/foo_variable.tga # apply model to a sheet
./autokem stats # print model tensor shapes + metadata
./autokem help
```
- `train` scans `../src/assets/` for `*_variable.tga` (skips `*extrawide*`), collects labelled samples, trains with 80/20 split + early stopping, saves `autokem.safetensors`
- `apply` creates `.bak` backup, runs inference per cell, writes Y+5 (lowheight) and Y+6 (kern data) pixels. Skips cells with width=0, writeOnTop, or compiler directives
- Model file `autokem.safetensors` must be in the working directory
### PyTorch training (faster prototyping)
```bash
cd Autokem
.venv/bin/python train_torch.py # train with defaults
.venv/bin/python train_torch.py --epochs 300 # override max epochs
.venv/bin/python train_torch.py --lr 0.0005 # override learning rate
.venv/bin/python train_torch.py --load model.safetensors # resume from weights
```
- Drop-in replacement for `./autokem train` — reads the same sheets, produces the same safetensors format
- The exported `autokem.safetensors` is directly loadable by the C inference code (`./autokem apply`)
- Requires: `pip install torch numpy` (venv at `.venv/`)
## Architecture
### Neural network
```
Input: 15x20x1 binary (300 values, alpha >= 0x80 → 1.0)
Conv2D(1→32, 7x7, pad=1) → SiLU
Conv2D(32→64, 7x7, pad=1) → SiLU
Global Average Pool → [batch, 64]
Dense(64→256) → SiLU
Dense(256→12) → sigmoid (10 shape bits + 1 ytype + 1 lowheight)
Total: ~121,740 params (~476 KB float32)
```
Training: Adam (lr=0.001, beta1=0.9, beta2=0.999), BCE loss, batch size 32, early stopping patience 10.
### File layout
| File | Purpose |
|------|---------|
| `main.c` | CLI dispatch |
| `tga.h/tga.c` | TGA reader/writer — BGRA↔RGBA8888, row-order handling, per-pixel write-in-place |
| `nn.h/nn.c` | Tensor, Conv2D (configurable padding), Dense, SiLU, sigmoid, global avg pool, Adam, He init |
| `safetensor.h/safetensor.c` | `.safetensors` serialisation — 8 named tensors + JSON metadata |
| `train_torch.py` | PyTorch training script — same data pipeline and architecture, exports C-compatible safetensors |
| `train.h/train.c` | Data collection from sheets, training loop, validation, label distribution |
| `apply.h/apply.c` | Backup, eligibility checks, inference, pixel composition |
## Pixel format
All pixels are RGBA8888: `(R<<24) | (G<<16) | (B<<8) | A`. TGA files store bytes as BGRA — the reader/writer swaps B↔R.
### Tag column (rightmost pixel column of each 16x20 cell)
| Row | Field | Encoding |
|-----|-------|----------|
| Y+0..Y+4 | Width | 5-bit binary, alpha != 0 → bit set |
| Y+5 | lowheight | alpha=0xFF → lowheight, alpha=0 → not |
| Y+6 | Kern data | See below |
| Y+9 | Compiler directive | opcode in R byte; skip cell if != 0 |
| Y+17 | writeOnTop | alpha != 0 → skip cell |
### Y+6 kern data pixel
```
R byte: Y0000000 (Y-type flag in MSB, bit 31)
G byte: JK000000 (J = bit 23, K = bit 22)
B byte: ABCDEFGH (A = bit 15, ..., H = bit 8)
A byte: 0xFF (hasKernData flag — must be 0xFF, not 0x01)
```
`tagify(pixel)`: returns 0 if alpha == 0, else full pixel value.
`kerningMask = (pixel >> 8) & 0xFFFFFF` then extract individual bits.
### Shape bit layout
```
A-B top (unset for lowheight minuscules like e)
|-|
C-D middle hole for majuscules (like C)
E-F middle hole for minuscules (like c)
G-H
--- baseline
|-|
J-K descender
```
## Key pitfalls
- **Alpha must be 0xFF, not 0x01.** All manually-tagged sheets use alpha=255 for kern/lowheight pixels. Writing alpha=1 is functionally accepted by the font engine (`& 0xFF != 0`) but produces visually transparent pixels that look like nothing was written.
- **TGA byte order**: file stores BGRA, memory is RGBA8888. Must swap B↔R on both read and write.
- **Row order**: check TGA descriptor bit 5 (`top_to_bottom`) for both read and write paths.
- **XY-swap**: `*_xyswap_variable.tga` sheets use column-major cell enumeration. Both train and apply detect `xyswap` in the filename.
- **Overfitting**: ~122K params vs ~2650 samples — early stopping is essential. The model will memorise training data almost perfectly.
- **Sigmoid stability**: two-branch form (`x >= 0` vs `x < 0`) to avoid `exp()` overflow.
## Reference files
| File | What to check |
|------|---------------|
| `TerrarumSansBitmap.kt:917-930` | Tag parsing (Y+5, Y+6, tagify) |
| `TerrarumSansBitmap.kt:3082-3134` | Keming rules, kemingBitMask, rule matching |
| `OTFbuild/tga_reader.py` | TGA BGRA→RGBA conversion (reference impl) |
| `OTFbuild/glyph_parser.py:107-194` | Sheet parsing, eligibility, xyswap |
| `keming_machine.txt` | Bit encoding spec, shape examples, rule definitions |
## Verification
1. `make && ./autokem train` — should find ~2650 samples, label distribution should show A~55%, C~92%, etc.
2. `./autokem stats` — prints tensor shapes, training metadata
3. `./autokem apply ../src/assets/currencies_variable.tga` — creates `.bak`, writes kern bits
4. Check applied pixels with Python: `from tga_reader import read_tga; img.get_pixel(tag_x, tag_y+6)` — alpha should be 0xFF, not 0x00
5. `java -jar FontDemoGDX.jar` with modified sheet to visually verify kerning

Autokem/Makefile Normal file

@@ -0,0 +1,22 @@
CC = gcc
CFLAGS = -Ofast -Wall -Wextra -std=c11
LDFLAGS = -lm
SRC = main.c tga.c nn.c safetensor.c train.c apply.c
OBJ = $(SRC:.c=.o)

all: autokem

autokem: $(OBJ)
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

debug: CFLAGS = -g -Wall -Wextra -std=c11 -fsanitize=address,undefined
debug: LDFLAGS += -fsanitize=address,undefined
debug: clean autokem

clean:
	rm -f *.o autokem

.PHONY: all debug clean

Autokem/apply.c Normal file

@@ -0,0 +1,181 @@
#include "apply.h"
#include "tga.h"
#include "nn.h"
#include "safetensor.h"
#include "unicode_filter.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Copy file for backup */
static int copy_file(const char *src, const char *dst) {
FILE *in = fopen(src, "rb");
if (!in) return -1;
FILE *out = fopen(dst, "wb");
if (!out) { fclose(in); return -1; }
char buf[4096];
size_t n;
while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
if (fwrite(buf, 1, n, out) != n) {
fclose(in); fclose(out);
return -1;
}
}
fclose(in);
fclose(out);
return 0;
}
int apply_model(const char *tga_path) {
/* Validate filename */
const char *basename = strrchr(tga_path, '/');
basename = basename ? basename + 1 : tga_path;
if (strstr(basename, "variable") == NULL) {
fprintf(stderr, "Error: %s does not appear to be a variable sheet\n", tga_path);
return 1;
}
if (strstr(basename, "extrawide") != NULL) {
fprintf(stderr, "Error: extrawide sheets are not supported\n");
return 1;
}
int is_xyswap = (strstr(basename, "xyswap") != NULL);
/* Create backup */
char bakpath[512];
snprintf(bakpath, sizeof(bakpath), "%s.bak", tga_path);
if (copy_file(tga_path, bakpath) != 0) {
fprintf(stderr, "Error: failed to create backup %s\n", bakpath);
return 1;
}
printf("Backup: %s\n", bakpath);
/* Load model */
Network *net = network_create();
if (safetensor_load("autokem.safetensors", net) != 0) {
fprintf(stderr, "Error: failed to load model\n");
network_free(net);
return 1;
}
/* Load TGA */
TgaImage *img = tga_read(tga_path);
if (!img) {
fprintf(stderr, "Error: cannot read %s\n", tga_path);
network_free(net);
return 1;
}
int cell_w = 16, cell_h = 20;
int cols = img->width / cell_w;
int rows = img->height / cell_h;
int total_cells = cols * rows;
int start_code = sheet_start_code(basename);
int processed = 0, updated = 0, skipped = 0, fixed_lm = 0;
for (int index = 0; index < total_cells; index++) {
int cell_x, cell_y;
if (is_xyswap) {
cell_x = (index / cols) * cell_w;
cell_y = (index % cols) * cell_h;
} else {
cell_x = (index % cols) * cell_w;
cell_y = (index / cols) * cell_h;
}
int tag_x = cell_x + (cell_w - 1);
int tag_y = cell_y;
/* Read width */
int width = 0;
for (int y = 0; y < 5; y++) {
if (tga_get_pixel(img, tag_x, tag_y + y) & 0xFF)
width |= (1 << y);
}
if (width == 0) { skipped++; continue; }
/* Check writeOnTop at Y+17 — skip if defined */
uint32_t wot = tga_get_pixel(img, tag_x, tag_y + 17);
if ((wot & 0xFF) != 0) { skipped++; continue; }
/* Check compiler directive at Y+9 — skip if opcode != 0 */
uint32_t dir_pixel = tagify(tga_get_pixel(img, tag_x, tag_y + 9));
int opcode = (int)((dir_pixel >> 24) & 0xFF);
if (opcode != 0) { skipped++; continue; }
/* Modifier letters: fixed kern pixel, skip inference */
if (start_code >= 0 && is_modifier_letter(start_code + index)) {
if (is_subscript_modifier(start_code + index)) {
/* Subscript: CDEFGHJK(B), lowheight=1 */
tga_write_pixel(tga_path, img, tag_x, tag_y + 5, 0xFFFFFFFF);
tga_write_pixel(tga_path, img, tag_x, tag_y + 6, 0x00C03FFF);
} else {
/* Superscript: ABCDEF(B), lowheight=0 */
tga_write_pixel(tga_path, img, tag_x, tag_y + 5, 0x00000000);
tga_write_pixel(tga_path, img, tag_x, tag_y + 6, 0x0000FCFF);
}
processed++; updated++; fixed_lm++;
continue;
}
/* Extract 15x20 binary input */
float input[300];
for (int gy = 0; gy < 20; gy++) {
for (int gx = 0; gx < 15; gx++) {
uint32_t p = tga_get_pixel(img, cell_x + gx, cell_y + gy);
input[gy * 15 + gx] = ((p & 0x80) != 0) ? 1.0f : 0.0f;
}
}
/* Inference */
float output[12];
network_infer(net, input, output);
/* Threshold at 0.5 */
int A = output[0] >= 0.5f;
int B = output[1] >= 0.5f;
int C = output[2] >= 0.5f;
int D = output[3] >= 0.5f;
int E = output[4] >= 0.5f;
int F = output[5] >= 0.5f;
int G = output[6] >= 0.5f;
int H = output[7] >= 0.5f;
int J = output[8] >= 0.5f;
int K = output[9] >= 0.5f;
int ytype = output[10] >= 0.5f;
int lowheight = output[11] >= 0.5f;
/* Compose Y+5 pixel: lowheight (alpha=0xFF when set) */
uint32_t lh_pixel = lowheight ? 0xFFFFFFFF : 0x00000000;
tga_write_pixel(tga_path, img, tag_x, tag_y + 5, lh_pixel);
/* Compose Y+6 pixel:
* Red byte: Y0000000 -> bit 31
* Green byte: JK000000 -> bits 23,22
* Blue byte: ABCDEFGH -> bits 15-8
* Alpha: 0xFF = hasKernData */
uint32_t pixel = 0;
pixel |= (uint32_t)(ytype ? 0x80 : 0) << 24;
pixel |= (uint32_t)((J ? 0x80 : 0) | (K ? 0x40 : 0)) << 16;
pixel |= (uint32_t)(A<<7 | B<<6 | C<<5 | D<<4 | E<<3 | F<<2 | G<<1 | H) << 8;
pixel |= 0xFF;
tga_write_pixel(tga_path, img, tag_x, tag_y + 6, pixel);
processed++;
updated++;
}
printf("Processed: %d cells, Updated: %d, Skipped: %d, Fixed Lm: %d (of %d total)\n",
processed, updated, skipped, fixed_lm, total_cells);
tga_free(img);
network_free(net);
return 0;
}

Autokem/apply.h Normal file

@@ -0,0 +1,8 @@
#ifndef APPLY_H
#define APPLY_H
/* Apply trained model to a spritesheet.
Creates .bak backup, then writes predicted kerning bits. */
int apply_model(const char *tga_path);
#endif

Autokem/autokem.safetensors LFS Normal file

Binary file not shown.

Autokem/eval.sh Executable file

@@ -0,0 +1,68 @@
#!/usr/bin/env bash
# Run train_torch.py N times and report mean ± stddev of per-bit and overall accuracy.
# Usage: ./eval.sh [runs] [extra train_torch.py args...]
# e.g. ./eval.sh 10
# ./eval.sh 5 --epochs 300 --lr 0.0005
set -euo pipefail
cd "$(dirname "$0")"
RUNS="${1:-42}"
shift 2>/dev/null || true
EXTRA_ARGS="$*"
PYTHON="${PYTHON:-.venv/bin/python3}"
RESULTS_FILE=$(mktemp)
trap 'rm -f "$RESULTS_FILE"' EXIT
echo "=== Autokem evaluation: $RUNS runs ==="
[ -n "$EXTRA_ARGS" ] && echo "Extra args: $EXTRA_ARGS"
echo
for i in $(seq 1 "$RUNS"); do
    echo "--- Run $i/$RUNS ---"
    OUT=$("$PYTHON" train_torch.py --save /dev/null $EXTRA_ARGS 2>&1)
    # Extract per-bit line (the one after "Per-bit accuracy"): A:53.9% B:46.7% ...
    PERBIT=$(echo "$OUT" | grep -A1 'Per-bit accuracy' | tail -1)
    # Extract overall line: Overall: 5267/6660 (79.08%)
    OVERALL=$(echo "$OUT" | grep -oP 'Overall:.*\(\K[0-9.]+')
    # Extract val_loss
    VALLOSS=$(echo "$OUT" | grep -oP 'val_loss: \K[0-9.]+' | tail -1)
    # Parse per-bit percentages into a tab-separated line
    BITS=$(echo "$PERBIT" | grep -oP '[0-9.]+(?=%)' | tr '\n' '\t')
    echo "$BITS$OVERALL $VALLOSS" >> "$RESULTS_FILE"
    echo "    val_loss=$VALLOSS overall=$OVERALL%"
done
echo
echo "=== Results ($RUNS runs) ==="
"$PYTHON" - "$RESULTS_FILE" <<'PYEOF'
import sys
import numpy as np
names = ['A','B','C','D','E','F','G','H','J','K','Ytype','LowH','Overall','ValLoss']
data = []
with open(sys.argv[1]) as f:
    for line in f:
        vals = line.strip().split('\t')
        if len(vals) >= len(names):
            data.append([float(v) for v in vals[:len(names)]])
if not data:
    print("No data collected!")
    sys.exit(1)
arr = np.array(data)
means = arr.mean(axis=0)
stds = arr.std(axis=0)
print(f"{'Metric':<10s} {'Mean':>8s} {'StdDev':>8s}")
print("-" * 28)
for i, name in enumerate(names):
    unit = '' if name == 'ValLoss' else '%'
    print(f"{name:<10s} {means[i]:>7.2f}{unit} {stds[i]:>7.2f}{unit}")
PYEOF

Autokem/main.c Normal file

@@ -0,0 +1,40 @@
#include <stdio.h>
#include <string.h>
#include "train.h"
#include "apply.h"
#include "safetensor.h"
static void print_usage(void) {
    printf("Usage: autokem <command> [args]\n");
    printf("Commands:\n");
    printf("  train             Train model on existing spritesheets\n");
    printf("  apply <file.tga>  Apply trained model to a spritesheet\n");
    printf("  stats             Print model statistics\n");
    printf("  help              Print this message\n");
}

int main(int argc, char **argv) {
    if (argc < 2) {
        print_usage();
        return 1;
    }
    if (strcmp(argv[1], "train") == 0) {
        return train_model();
    } else if (strcmp(argv[1], "apply") == 0) {
        if (argc < 3) {
            fprintf(stderr, "Error: apply requires a TGA file path\n");
            return 1;
        }
        return apply_model(argv[2]);
    } else if (strcmp(argv[1], "stats") == 0) {
        return safetensor_stats("autokem.safetensors");
    } else if (strcmp(argv[1], "help") == 0) {
        print_usage();
        return 0;
    } else {
        fprintf(stderr, "Unknown command: %s\n", argv[1]);
        print_usage();
        return 1;
    }
}

Autokem/nn.c Normal file

@@ -0,0 +1,542 @@
#define _GNU_SOURCE
#include "nn.h"
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
/* ---- Tensor ---- */
Tensor *tensor_alloc(int ndim, const int *shape) {
Tensor *t = malloc(sizeof(Tensor));
t->ndim = ndim;
t->size = 1;
for (int i = 0; i < ndim; i++) {
t->shape[i] = shape[i];
t->size *= shape[i];
}
for (int i = ndim; i < 4; i++) t->shape[i] = 0;
t->data = malloc((size_t)t->size * sizeof(float));
return t;
}
Tensor *tensor_zeros(int ndim, const int *shape) {
Tensor *t = tensor_alloc(ndim, shape);
memset(t->data, 0, (size_t)t->size * sizeof(float));
return t;
}
void tensor_free(Tensor *t) {
if (!t) return;
free(t->data);
free(t);
}
/* ---- RNG (Box-Muller) ---- */
static uint64_t rng_state = 0;
static void rng_seed(uint64_t s) { rng_state = s; }
static uint64_t xorshift64(void) {
uint64_t x = rng_state;
x ^= x << 13;
x ^= x >> 7;
x ^= x << 17;
rng_state = x;
return x;
}
static float rand_uniform(void) {
return (float)(xorshift64() & 0x7FFFFFFF) / (float)0x7FFFFFFF;
}
static float rand_normal(void) {
float u1, u2;
do { u1 = rand_uniform(); } while (u1 < 1e-10f);
u2 = rand_uniform();
return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}
/* He init: std = sqrt(2/fan_in) */
static void he_init(Tensor *w, int fan_in) {
float std = sqrtf(2.0f / (float)fan_in);
for (int i = 0; i < w->size; i++)
w->data[i] = rand_normal() * std;
}
/* ---- Activations ---- */
static inline float sigmoid_f(float x) {
if (x >= 0.0f) {
float ez = expf(-x);
return 1.0f / (1.0f + ez);
} else {
float ez = expf(x);
return ez / (1.0f + ez);
}
}
static inline float silu_f(float x) {
return x * sigmoid_f(x);
}
static inline float silu_grad(float x) {
float s = sigmoid_f(x);
return s * (1.0f + x * (1.0f - s));
}
/* ---- Conv2D forward/backward ---- */
static void conv2d_init(Conv2D *c, int in_ch, int out_ch, int kh, int kw, int pad) {
c->in_ch = in_ch;
c->out_ch = out_ch;
c->kh = kh;
c->kw = kw;
c->pad_h = pad;
c->pad_w = pad;
int wshape[] = {out_ch, in_ch, kh, kw};
int bshape[] = {out_ch};
c->weight = tensor_alloc(4, wshape);
c->bias = tensor_zeros(1, bshape);
c->grad_weight = tensor_zeros(4, wshape);
c->grad_bias = tensor_zeros(1, bshape);
c->m_weight = tensor_zeros(4, wshape);
c->v_weight = tensor_zeros(4, wshape);
c->m_bias = tensor_zeros(1, bshape);
c->v_bias = tensor_zeros(1, bshape);
c->input_cache = NULL;
he_init(c->weight, in_ch * kh * kw);
}
static void conv2d_free(Conv2D *c) {
tensor_free(c->weight);
tensor_free(c->bias);
tensor_free(c->grad_weight);
tensor_free(c->grad_bias);
tensor_free(c->m_weight);
tensor_free(c->v_weight);
tensor_free(c->m_bias);
tensor_free(c->v_bias);
tensor_free(c->input_cache);
}
/* Forward: input [batch, in_ch, H, W] -> output [batch, out_ch, oH, oW] */
static Tensor *conv2d_forward(Conv2D *c, Tensor *input, int training) {
int batch = input->shape[0];
int in_ch = c->in_ch, out_ch = c->out_ch;
int H = input->shape[2], W = input->shape[3];
int kh = c->kh, kw = c->kw;
int ph = c->pad_h, pw = c->pad_w;
if (training) {
tensor_free(c->input_cache);
c->input_cache = tensor_alloc(input->ndim, input->shape);
memcpy(c->input_cache->data, input->data, (size_t)input->size * sizeof(float));
}
int oH = H + 2 * ph - kh + 1;
int oW = W + 2 * pw - kw + 1;
int oshape[] = {batch, out_ch, oH, oW};
Tensor *out = tensor_alloc(4, oshape);
for (int b = 0; b < batch; b++) {
for (int oc = 0; oc < out_ch; oc++) {
for (int oh = 0; oh < oH; oh++) {
for (int ow = 0; ow < oW; ow++) {
float sum = c->bias->data[oc];
for (int ic = 0; ic < in_ch; ic++) {
for (int fh = 0; fh < kh; fh++) {
for (int fw = 0; fw < kw; fw++) {
int ih = oh + fh - ph;
int iw = ow + fw - pw;
if (ih >= 0 && ih < H && iw >= 0 && iw < W) {
float inp = input->data[((b * in_ch + ic) * H + ih) * W + iw];
float wt = c->weight->data[((oc * in_ch + ic) * kh + fh) * kw + fw];
sum += inp * wt;
}
}
}
}
out->data[((b * out_ch + oc) * oH + oh) * oW + ow] = sum;
}
}
}
}
return out;
}
/* Backward: grad_output [batch, out_ch, oH, oW] -> grad_input [batch, in_ch, H, W] */
static Tensor *conv2d_backward(Conv2D *c, Tensor *grad_output) {
Tensor *input = c->input_cache;
int batch = input->shape[0];
int in_ch = c->in_ch, out_ch = c->out_ch;
int H = input->shape[2], W = input->shape[3];
int kh = c->kh, kw = c->kw;
int ph = c->pad_h, pw = c->pad_w;
int oH = grad_output->shape[2], oW = grad_output->shape[3];
Tensor *grad_input = tensor_zeros(input->ndim, input->shape);
for (int b = 0; b < batch; b++) {
for (int oc = 0; oc < out_ch; oc++) {
for (int oh = 0; oh < oH; oh++) {
for (int ow = 0; ow < oW; ow++) {
float go = grad_output->data[((b * out_ch + oc) * oH + oh) * oW + ow];
c->grad_bias->data[oc] += go;
for (int ic = 0; ic < in_ch; ic++) {
for (int fh = 0; fh < kh; fh++) {
for (int fw = 0; fw < kw; fw++) {
int ih = oh + fh - ph;
int iw = ow + fw - pw;
if (ih >= 0 && ih < H && iw >= 0 && iw < W) {
float inp = input->data[((b * in_ch + ic) * H + ih) * W + iw];
c->grad_weight->data[((oc * in_ch + ic) * kh + fh) * kw + fw] += go * inp;
grad_input->data[((b * in_ch + ic) * H + ih) * W + iw] +=
go * c->weight->data[((oc * in_ch + ic) * kh + fh) * kw + fw];
}
}
}
}
}
}
}
}
return grad_input;
}
/* ---- Dense forward/backward ---- */
static void dense_init(Dense *d, int in_f, int out_f) {
d->in_features = in_f;
d->out_features = out_f;
int wshape[] = {out_f, in_f};
int bshape[] = {out_f};
d->weight = tensor_alloc(2, wshape);
d->bias = tensor_zeros(1, bshape);
d->grad_weight = tensor_zeros(2, wshape);
d->grad_bias = tensor_zeros(1, bshape);
d->m_weight = tensor_zeros(2, wshape);
d->v_weight = tensor_zeros(2, wshape);
d->m_bias = tensor_zeros(1, bshape);
d->v_bias = tensor_zeros(1, bshape);
d->input_cache = NULL;
he_init(d->weight, in_f);
}
static void dense_free(Dense *d) {
tensor_free(d->weight);
tensor_free(d->bias);
tensor_free(d->grad_weight);
tensor_free(d->grad_bias);
tensor_free(d->m_weight);
tensor_free(d->v_weight);
tensor_free(d->m_bias);
tensor_free(d->v_bias);
tensor_free(d->input_cache);
}
/* Forward: input [batch, in_f] -> output [batch, out_f] */
static Tensor *dense_forward(Dense *d, Tensor *input, int training) {
int batch = input->shape[0];
int in_f = d->in_features, out_f = d->out_features;
if (training) {
tensor_free(d->input_cache);
d->input_cache = tensor_alloc(input->ndim, input->shape);
memcpy(d->input_cache->data, input->data, (size_t)input->size * sizeof(float));
}
int oshape[] = {batch, out_f};
Tensor *out = tensor_alloc(2, oshape);
for (int b = 0; b < batch; b++) {
for (int o = 0; o < out_f; o++) {
float sum = d->bias->data[o];
for (int i = 0; i < in_f; i++) {
sum += input->data[b * in_f + i] * d->weight->data[o * in_f + i];
}
out->data[b * out_f + o] = sum;
}
}
return out;
}
/* Backward: grad_output [batch, out_f] -> grad_input [batch, in_f] */
static Tensor *dense_backward(Dense *d, Tensor *grad_output) {
Tensor *input = d->input_cache;
int batch = input->shape[0];
int in_f = d->in_features, out_f = d->out_features;
int gshape[] = {batch, in_f};
Tensor *grad_input = tensor_zeros(2, gshape);
for (int b = 0; b < batch; b++) {
for (int o = 0; o < out_f; o++) {
float go = grad_output->data[b * out_f + o];
d->grad_bias->data[o] += go;
for (int i = 0; i < in_f; i++) {
d->grad_weight->data[o * in_f + i] += go * input->data[b * in_f + i];
grad_input->data[b * in_f + i] += go * d->weight->data[o * in_f + i];
}
}
}
return grad_input;
}
/* ---- SiLU helpers on tensors ---- */
static Tensor *apply_silu(Tensor *input) {
Tensor *out = tensor_alloc(input->ndim, input->shape);
for (int i = 0; i < input->size; i++)
out->data[i] = silu_f(input->data[i]);
return out;
}
static Tensor *apply_silu_backward(Tensor *grad_output, Tensor *pre_activation) {
Tensor *grad = tensor_alloc(grad_output->ndim, grad_output->shape);
for (int i = 0; i < grad_output->size; i++)
grad->data[i] = grad_output->data[i] * silu_grad(pre_activation->data[i]);
return grad;
}
/* ---- Global Average Pooling ---- */
/* Forward: input [batch, C, H, W] -> output [batch, C] */
static Tensor *global_avg_pool_forward(Tensor *input) {
int batch = input->shape[0];
int C = input->shape[1];
int H = input->shape[2];
int W = input->shape[3];
int hw = H * W;
int oshape[] = {batch, C};
Tensor *out = tensor_alloc(2, oshape);
for (int b = 0; b < batch; b++) {
for (int c = 0; c < C; c++) {
float sum = 0.0f;
int base = (b * C + c) * hw;
for (int i = 0; i < hw; i++)
sum += input->data[base + i];
out->data[b * C + c] = sum / (float)hw;
}
}
return out;
}
/* Backward: grad_output [batch, C] -> grad_input [batch, C, H, W] */
static Tensor *global_avg_pool_backward(Tensor *grad_output, int H, int W) {
int batch = grad_output->shape[0];
int C = grad_output->shape[1];
int hw = H * W;
float scale = 1.0f / (float)hw;
int ishape[] = {batch, C, H, W};
Tensor *grad_input = tensor_alloc(4, ishape);
for (int b = 0; b < batch; b++) {
for (int c = 0; c < C; c++) {
float go = grad_output->data[b * C + c] * scale;
int base = (b * C + c) * hw;
for (int i = 0; i < hw; i++)
grad_input->data[base + i] = go;
}
}
return grad_input;
}
/* ---- Sigmoid on tensor ---- */
static Tensor *apply_sigmoid(Tensor *input) {
Tensor *out = tensor_alloc(input->ndim, input->shape);
for (int i = 0; i < input->size; i++)
out->data[i] = sigmoid_f(input->data[i]);
return out;
}
/* ---- Adam step for a single parameter tensor ---- */
static void adam_update(Tensor *param, Tensor *grad, Tensor *m, Tensor *v,
float lr, float beta1, float beta2, float eps, int t) {
float bc1 = 1.0f - powf(beta1, (float)t);
float bc2 = 1.0f - powf(beta2, (float)t);
for (int i = 0; i < param->size; i++) {
m->data[i] = beta1 * m->data[i] + (1.0f - beta1) * grad->data[i];
v->data[i] = beta2 * v->data[i] + (1.0f - beta2) * grad->data[i] * grad->data[i];
float m_hat = m->data[i] / bc1;
float v_hat = v->data[i] / bc2;
param->data[i] -= lr * m_hat / (sqrtf(v_hat) + eps);
}
}
/* ---- Network ---- */
Network *network_create(void) {
rng_seed((uint64_t)time(NULL) ^ 0xDEADBEEF);
Network *net = calloc(1, sizeof(Network));
conv2d_init(&net->conv1, 1, 32, 7, 7, 1);
conv2d_init(&net->conv2, 32, 64, 7, 7, 1);
dense_init(&net->fc1, 64, 256);
dense_init(&net->output, 256, 12);
return net;
}
void network_free(Network *net) {
if (!net) return;
conv2d_free(&net->conv1);
conv2d_free(&net->conv2);
dense_free(&net->fc1);
dense_free(&net->output);
tensor_free(net->act_conv1);
tensor_free(net->act_silu1);
tensor_free(net->act_conv2);
tensor_free(net->act_silu2);
tensor_free(net->act_pool);
tensor_free(net->act_fc1);
tensor_free(net->act_silu3);
tensor_free(net->act_logits);
tensor_free(net->out_all);
free(net);
}
static void free_activations(Network *net) {
tensor_free(net->act_conv1); net->act_conv1 = NULL;
tensor_free(net->act_silu1); net->act_silu1 = NULL;
tensor_free(net->act_conv2); net->act_conv2 = NULL;
tensor_free(net->act_silu2); net->act_silu2 = NULL;
tensor_free(net->act_pool); net->act_pool = NULL;
tensor_free(net->act_fc1); net->act_fc1 = NULL;
tensor_free(net->act_silu3); net->act_silu3 = NULL;
tensor_free(net->act_logits); net->act_logits = NULL;
tensor_free(net->out_all); net->out_all = NULL;
}
void network_forward(Network *net, Tensor *input, int training) {
free_activations(net);
/* Conv1 -> SiLU */
net->act_conv1 = conv2d_forward(&net->conv1, input, training);
net->act_silu1 = apply_silu(net->act_conv1);
/* Conv2 -> SiLU */
net->act_conv2 = conv2d_forward(&net->conv2, net->act_silu1, training);
net->act_silu2 = apply_silu(net->act_conv2);
/* Global Average Pool */
net->act_pool = global_avg_pool_forward(net->act_silu2);
/* FC1 -> SiLU */
net->act_fc1 = dense_forward(&net->fc1, net->act_pool, training);
net->act_silu3 = apply_silu(net->act_fc1);
/* Output -> Sigmoid */
net->act_logits = dense_forward(&net->output, net->act_silu3, training);
net->out_all = apply_sigmoid(net->act_logits);
}
void network_backward(Network *net, Tensor *target) {
int batch = net->out_all->shape[0];
int n_out = 12;
/* BCE gradient at sigmoid: d_logit = (pred - target) / batch */
int gs[] = {batch, n_out};
Tensor *grad_logits = tensor_alloc(2, gs);
for (int i = 0; i < batch * n_out; i++)
grad_logits->data[i] = (net->out_all->data[i] - target->data[i]) / (float)batch;
/* Output layer backward */
Tensor *grad_silu3 = dense_backward(&net->output, grad_logits);
tensor_free(grad_logits);
/* SiLU backward (fc1) */
Tensor *grad_fc1_out = apply_silu_backward(grad_silu3, net->act_fc1);
tensor_free(grad_silu3);
/* FC1 backward */
Tensor *grad_pool = dense_backward(&net->fc1, grad_fc1_out);
tensor_free(grad_fc1_out);
/* Global Average Pool backward */
int H = net->act_silu2->shape[2], W = net->act_silu2->shape[3];
Tensor *grad_silu2 = global_avg_pool_backward(grad_pool, H, W);
tensor_free(grad_pool);
/* SiLU backward (conv2) */
Tensor *grad_conv2_out = apply_silu_backward(grad_silu2, net->act_conv2);
tensor_free(grad_silu2);
/* Conv2 backward */
Tensor *grad_silu1 = conv2d_backward(&net->conv2, grad_conv2_out);
tensor_free(grad_conv2_out);
/* SiLU backward (conv1) */
Tensor *grad_conv1_out = apply_silu_backward(grad_silu1, net->act_conv1);
tensor_free(grad_silu1);
/* Conv1 backward */
Tensor *grad_input = conv2d_backward(&net->conv1, grad_conv1_out);
tensor_free(grad_conv1_out);
tensor_free(grad_input);
}
void network_adam_step(Network *net, float lr, float beta1, float beta2, float eps, int t) {
adam_update(net->conv1.weight, net->conv1.grad_weight, net->conv1.m_weight, net->conv1.v_weight, lr, beta1, beta2, eps, t);
adam_update(net->conv1.bias, net->conv1.grad_bias, net->conv1.m_bias, net->conv1.v_bias, lr, beta1, beta2, eps, t);
adam_update(net->conv2.weight, net->conv2.grad_weight, net->conv2.m_weight, net->conv2.v_weight, lr, beta1, beta2, eps, t);
adam_update(net->conv2.bias, net->conv2.grad_bias, net->conv2.m_bias, net->conv2.v_bias, lr, beta1, beta2, eps, t);
adam_update(net->fc1.weight, net->fc1.grad_weight, net->fc1.m_weight, net->fc1.v_weight, lr, beta1, beta2, eps, t);
adam_update(net->fc1.bias, net->fc1.grad_bias, net->fc1.m_bias, net->fc1.v_bias, lr, beta1, beta2, eps, t);
adam_update(net->output.weight, net->output.grad_weight, net->output.m_weight, net->output.v_weight, lr, beta1, beta2, eps, t);
adam_update(net->output.bias, net->output.grad_bias, net->output.m_bias, net->output.v_bias, lr, beta1, beta2, eps, t);
}
void network_zero_grad(Network *net) {
memset(net->conv1.grad_weight->data, 0, (size_t)net->conv1.grad_weight->size * sizeof(float));
memset(net->conv1.grad_bias->data, 0, (size_t)net->conv1.grad_bias->size * sizeof(float));
memset(net->conv2.grad_weight->data, 0, (size_t)net->conv2.grad_weight->size * sizeof(float));
memset(net->conv2.grad_bias->data, 0, (size_t)net->conv2.grad_bias->size * sizeof(float));
memset(net->fc1.grad_weight->data, 0, (size_t)net->fc1.grad_weight->size * sizeof(float));
memset(net->fc1.grad_bias->data, 0, (size_t)net->fc1.grad_bias->size * sizeof(float));
memset(net->output.grad_weight->data, 0, (size_t)net->output.grad_weight->size * sizeof(float));
memset(net->output.grad_bias->data, 0, (size_t)net->output.grad_bias->size * sizeof(float));
}
float network_bce_loss(Network *net, Tensor *target) {
float loss = 0.0f;
int batch = net->out_all->shape[0];
int n = batch * 12;
for (int i = 0; i < n; i++) {
float p = fmaxf(1e-7f, fminf(1.0f - 1e-7f, net->out_all->data[i]));
float t = target->data[i];
loss -= t * logf(p) + (1.0f - t) * logf(1.0f - p);
}
return loss / (float)batch;
}
void network_infer(Network *net, const float *input300, float *output12) {
int ishape[] = {1, 1, 20, 15};
Tensor *input = tensor_alloc(4, ishape);
memcpy(input->data, input300, 300 * sizeof(float));
network_forward(net, input, 0);
for (int i = 0; i < 12; i++)
output12[i] = net->out_all->data[i];
tensor_free(input);
}

Autokem/nn.h Normal file

@@ -0,0 +1,88 @@
#ifndef NN_H
#define NN_H
#include <stdint.h>
/* ---- Tensor ---- */
typedef struct {
float *data;
int shape[4]; /* up to 4 dims */
int ndim;
int size; /* total number of elements */
} Tensor;
Tensor *tensor_alloc(int ndim, const int *shape);
Tensor *tensor_zeros(int ndim, const int *shape);
void tensor_free(Tensor *t);
/* ---- Layers ---- */
typedef struct {
int in_ch, out_ch, kh, kw;
int pad_h, pad_w;
Tensor *weight; /* [out_ch, in_ch, kh, kw] */
Tensor *bias; /* [out_ch] */
Tensor *grad_weight;
Tensor *grad_bias;
/* Adam moments */
Tensor *m_weight, *v_weight;
Tensor *m_bias, *v_bias;
/* cached input for backward */
Tensor *input_cache;
} Conv2D;
typedef struct {
int in_features, out_features;
Tensor *weight; /* [out_features, in_features] */
Tensor *bias; /* [out_features] */
Tensor *grad_weight;
Tensor *grad_bias;
Tensor *m_weight, *v_weight;
Tensor *m_bias, *v_bias;
Tensor *input_cache;
} Dense;
/* ---- Network ---- */
typedef struct {
Conv2D conv1; /* 1->32, 7x7, pad=1 */
Conv2D conv2; /* 32->64, 7x7, pad=1 */
Dense fc1; /* 64->256 */
Dense output; /* 256->12 (10 shape + 1 ytype + 1 lowheight) */
/* activation caches (allocated per forward) */
Tensor *act_conv1;
Tensor *act_silu1;
Tensor *act_conv2;
Tensor *act_silu2;
Tensor *act_pool; /* global average pool output */
Tensor *act_fc1;
Tensor *act_silu3;
Tensor *act_logits; /* pre-sigmoid */
Tensor *out_all; /* sigmoid output [batch, 12] */
} Network;
/* Init / free */
Network *network_create(void);
void network_free(Network *net);
/* Forward pass. input: [batch, 1, 20, 15]. Output stored in net->out_all */
void network_forward(Network *net, Tensor *input, int training);
/* Backward pass. target: [batch, 12] */
void network_backward(Network *net, Tensor *target);
/* Adam update step */
void network_adam_step(Network *net, float lr, float beta1, float beta2, float eps, int t);
/* Zero all gradients */
void network_zero_grad(Network *net);
/* Compute BCE loss */
float network_bce_loss(Network *net, Tensor *target);
/* Single-sample inference: input float[300], output float[12] (A-H,J,K,ytype,lowheight) */
void network_infer(Network *net, const float *input300, float *output12);
#endif

Autokem/safetensor.c Normal file

@@ -0,0 +1,263 @@
#include "safetensor.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
/* Tensor registry entry */
typedef struct {
const char *name;
float *data;
int size;
int ndim;
int shape[4];
} TensorEntry;
static void collect_tensors(Network *net, TensorEntry *entries, int *count) {
int n = 0;
#define ADD(nm, layer, field) do { \
entries[n].name = nm; \
entries[n].data = net->layer.field->data; \
entries[n].size = net->layer.field->size; \
entries[n].ndim = net->layer.field->ndim; \
for (int i = 0; i < net->layer.field->ndim; i++) \
entries[n].shape[i] = net->layer.field->shape[i]; \
n++; \
} while(0)
ADD("conv1.weight", conv1, weight);
ADD("conv1.bias", conv1, bias);
ADD("conv2.weight", conv2, weight);
ADD("conv2.bias", conv2, bias);
ADD("fc1.weight", fc1, weight);
ADD("fc1.bias", fc1, bias);
ADD("output.weight", output, weight);
ADD("output.bias", output, bias);
#undef ADD
*count = n;
}
int safetensor_save(const char *path, Network *net, int total_samples, int epochs, float val_loss) {
TensorEntry entries[8];
int count;
collect_tensors(net, entries, &count);
/* Build JSON header */
char header[8192];
int pos = 0;
pos += snprintf(header + pos, sizeof(header) - (size_t)pos, "{");
/* metadata */
pos += snprintf(header + pos, sizeof(header) - (size_t)pos,
"\"__metadata__\":{\"samples\":\"%d\",\"epochs\":\"%d\",\"val_loss\":\"%.6f\"},",
total_samples, epochs, (double)val_loss);
/* tensor entries */
size_t data_offset = 0;
for (int i = 0; i < count; i++) {
size_t byte_size = (size_t)entries[i].size * sizeof(float);
pos += snprintf(header + pos, sizeof(header) - (size_t)pos,
"\"%s\":{\"dtype\":\"F32\",\"shape\":[", entries[i].name);
for (int d = 0; d < entries[i].ndim; d++) {
if (d > 0) pos += snprintf(header + pos, sizeof(header) - (size_t)pos, ",");
pos += snprintf(header + pos, sizeof(header) - (size_t)pos, "%d", entries[i].shape[d]);
}
pos += snprintf(header + pos, sizeof(header) - (size_t)pos,
"],\"data_offsets\":[%zu,%zu]}", data_offset, data_offset + byte_size);
if (i < count - 1)
pos += snprintf(header + pos, sizeof(header) - (size_t)pos, ",");
data_offset += byte_size;
}
pos += snprintf(header + pos, sizeof(header) - (size_t)pos, "}");
/* Pad header to 8-byte alignment */
size_t header_len = (size_t)pos;
size_t padded = (header_len + 7) & ~(size_t)7;
while (header_len < padded) {
header[header_len++] = ' ';
}
FILE *f = fopen(path, "wb");
if (!f) {
fprintf(stderr, "Error: cannot open %s for writing\n", path);
return -1;
}
/* 8-byte LE header length */
uint64_t hlen = (uint64_t)header_len;
fwrite(&hlen, 8, 1, f);
/* JSON header */
fwrite(header, 1, header_len, f);
/* Raw tensor data */
for (int i = 0; i < count; i++) {
fwrite(entries[i].data, sizeof(float), (size_t)entries[i].size, f);
}
fclose(f);
printf("Saved model to %s (%zu bytes)\n", path, 8 + header_len + data_offset);
return 0;
}
/* Minimal JSON parser: find tensor by name, extract data_offsets */
static int find_tensor_offsets(const char *json, size_t json_len, const char *name,
size_t *off_start, size_t *off_end) {
/* Search for "name": */
size_t nlen = strlen(name);
for (size_t i = 0; i + nlen + 3 < json_len; i++) {
if (json[i] == '"' && strncmp(json + i + 1, name, nlen) == 0 && json[i + 1 + nlen] == '"') {
/* Found the key, now find data_offsets */
const char *doff = strstr(json + i, "\"data_offsets\"");
if (!doff || (size_t)(doff - json) > json_len) return -1;
const char *bracket = strchr(doff, '[');
if (!bracket) return -1;
if (sscanf(bracket, "[%zu,%zu]", off_start, off_end) != 2) return -1;
return 0;
}
}
return -1;
}
int safetensor_load(const char *path, Network *net) {
FILE *f = fopen(path, "rb");
if (!f) {
fprintf(stderr, "Error: cannot open %s\n", path);
return -1;
}
uint64_t header_len;
if (fread(&header_len, 8, 1, f) != 1) { fclose(f); return -1; }
char *json = malloc((size_t)header_len + 1);
if (!json) { fclose(f); return -1; }
if (fread(json, 1, (size_t)header_len, f) != (size_t)header_len) {
free(json);
fclose(f);
return -1;
}
json[header_len] = '\0';
long data_start = 8 + (long)header_len;
TensorEntry entries[8];
int count;
collect_tensors(net, entries, &count);
for (int i = 0; i < count; i++) {
size_t off_start, off_end;
if (find_tensor_offsets(json, (size_t)header_len, entries[i].name, &off_start, &off_end) != 0) {
fprintf(stderr, "Error: tensor '%s' not found in %s\n", entries[i].name, path);
free(json);
fclose(f);
return -1;
}
size_t byte_size = off_end - off_start;
if (byte_size != (size_t)entries[i].size * sizeof(float)) {
fprintf(stderr, "Error: size mismatch for '%s': expected %zu, got %zu\n",
entries[i].name, (size_t)entries[i].size * sizeof(float), byte_size);
free(json);
fclose(f);
return -1;
}
fseek(f, data_start + (long)off_start, SEEK_SET);
if (fread(entries[i].data, 1, byte_size, f) != byte_size) {
fprintf(stderr, "Error: failed to read tensor '%s'\n", entries[i].name);
free(json);
fclose(f);
return -1;
}
}
free(json);
fclose(f);
return 0;
}
int safetensor_stats(const char *path) {
FILE *f = fopen(path, "rb");
if (!f) {
fprintf(stderr, "Error: cannot open %s\n", path);
return -1;
}
uint64_t header_len;
if (fread(&header_len, 8, 1, f) != 1) { fclose(f); return -1; }
char *json = malloc((size_t)header_len + 1);
if (!json) { fclose(f); return -1; }
if (fread(json, 1, (size_t)header_len, f) != (size_t)header_len) {
free(json);
fclose(f);
return -1;
}
json[header_len] = '\0';
fclose(f);
printf("Model: %s\n", path);
printf("Header length: %lu bytes\n", (unsigned long)header_len);
/* Extract metadata values: for each key, find "key":"value" within the __metadata__ block */
const char *meta = strstr(json, "\"__metadata__\"");
if (meta) {
const char *keys[] = {"samples", "epochs", "val_loss"};
const char *labels[] = {"Training samples", "Epochs", "Validation loss"};
for (int k = 0; k < 3; k++) {
char search[64];
snprintf(search, sizeof(search), "\"%s\"", keys[k]);
const char *found = strstr(meta, search);
if (!found) continue;
/* skip past key and colon to opening quote of value */
const char *colon = strchr(found + strlen(search), ':');
if (!colon) continue;
const char *vstart = strchr(colon, '"');
if (!vstart) continue;
vstart++;
const char *vend = strchr(vstart, '"');
if (!vend) continue;
printf("%s: %.*s\n", labels[k], (int)(vend - vstart), vstart);
}
}
/* List tensors */
const char *tensor_names[] = {
"conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias",
"fc1.weight", "fc1.bias",
"output.weight", "output.bias"
};
int total_params = 0;
printf("\nTensors:\n");
for (int i = 0; i < 8; i++) {
size_t off_start, off_end;
if (find_tensor_offsets(json, (size_t)header_len, tensor_names[i], &off_start, &off_end) == 0) {
int params = (int)(off_end - off_start) / 4;
total_params += params;
/* Extract shape */
const char *key = strstr(json, tensor_names[i]);
if (key) {
const char *shp = strstr(key, "\"shape\"");
if (shp) {
const char *br = strchr(shp, '[');
const char *bre = strchr(shp, ']');
if (br && bre) {
printf(" %-28s shape=[%.*s] params=%d\n",
tensor_names[i], (int)(bre - br - 1), br + 1, params);
}
}
}
}
}
printf("\nTotal parameters: %d (%.1f KB as float32)\n", total_params, (float)total_params * 4.0f / 1024.0f);
free(json);
return 0;
}

16
Autokem/safetensor.h Normal file

@@ -0,0 +1,16 @@
#ifndef SAFETENSOR_H
#define SAFETENSOR_H
#include "nn.h"
/* Save network weights to .safetensors format, embedding training
   metadata (sample count, epochs, validation loss) in the JSON header. */
int safetensor_save(const char *path, Network *net, int total_samples, int epochs, float val_loss);
/* Load network weights from .safetensors file. */
int safetensor_load(const char *path, Network *net);
/* Print model stats from .safetensors file. */
int safetensor_stats(const char *path);
#endif
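The on-disk framing this API produces (an 8-byte little-endian header length, a JSON header space-padded to 8-byte alignment, then raw tensor bytes) can be sketched independently in Python for quick inspection. `build_safetensors` and `read_header` are illustrative helper names, not part of this repository:

```python
import json
import struct

def build_safetensors(tensors, metadata=None):
    """Pack {name: raw_bytes} into the safetensors framing used above:
    8-byte LE header length, JSON header, concatenated tensor data."""
    header = {}
    if metadata:
        header["__metadata__"] = metadata
    offset, payload = 0, b""
    for name, data in tensors.items():
        header[name] = {"dtype": "F32", "shape": [len(data) // 4],
                        "data_offsets": [offset, offset + len(data)]}
        offset += len(data)
        payload += data
    hjson = json.dumps(header, separators=(",", ":")).encode()
    hjson += b" " * (-len(hjson) % 8)  # pad header to 8-byte alignment
    return struct.pack("<Q", len(hjson)) + hjson + payload

def read_header(blob):
    """Parse the JSON header back out of a safetensors blob."""
    (hlen,) = struct.unpack_from("<Q", blob, 0)
    return json.loads(blob[8:8 + hlen])
```

This mirrors the layout `safetensor_save` writes, so it can be used to eyeball a saved model file without the C tooling.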

631
Autokem/sheet_stats.py Normal file

@@ -0,0 +1,631 @@
#!/usr/bin/env python3
"""
Spritesheet statistics generator for TerrarumSansBitmap.
Scans all *_variable.tga sheets and reports:
- Width distribution
- Compiler directives (replaceWith breakdown)
- Kerning shape distribution
- Lowheight count
- Diacritics (anchors, writeOnTop, stacking)
- Glyphs missing kerning data
- Dot removal directives
- Nudge usage
- Alignment modes
- Per-sheet summary
Usage:
python sheet_stats.py [assets_dir]
python sheet_stats.py ../src/assets
"""
import os
import struct
import sys
from collections import Counter, defaultdict
# ---- TGA reader ----
class TgaImage:
__slots__ = ('width', 'height', 'pixels')
def __init__(self, width, height, pixels):
self.width = width
self.height = height
self.pixels = pixels
def get_pixel(self, x, y):
if x < 0 or x >= self.width or y < 0 or y >= self.height:
return 0
return self.pixels[y * self.width + x]
def read_tga(path):
with open(path, 'rb') as f:
data = f.read()
pos = 0
id_length = data[pos]; pos += 1
pos += 1 # colour_map_type
image_type = data[pos]; pos += 1
    pos += 5  # colour map spec
pos += 4 # x/y origin
width = struct.unpack_from('<H', data, pos)[0]; pos += 2
height = struct.unpack_from('<H', data, pos)[0]; pos += 2
bits_per_pixel = data[pos]; pos += 1
descriptor = data[pos]; pos += 1
top_to_bottom = (descriptor & 0x20) != 0
bpp = bits_per_pixel // 8
pos += id_length
if image_type != 2 or bpp not in (3, 4):
raise ValueError(f"Unsupported TGA: type={image_type}, bpp={bits_per_pixel}")
pixels = [0] * (width * height)
for row in range(height):
y = row if top_to_bottom else (height - 1 - row)
for x in range(width):
b = data[pos]; g = data[pos+1]; r = data[pos+2]
a = data[pos+3] if bpp == 4 else 0xFF
pos += bpp
pixels[y * width + x] = (r << 24) | (g << 16) | (b << 8) | a
return TgaImage(width, height, pixels)
def tagify(pixel):
return 0 if (pixel & 0xFF) == 0 else pixel
def signed_byte(val):
return val - 256 if val >= 128 else val
# ---- Unicode range classification ----
# Ranges to EXCLUDE from "missing kern" report
EXCLUDE_KERN_RANGES = [
(0x3400, 0xA000, 'CJK Unified Ideographs'),
(0x1100, 0x1200, 'Hangul Jamo'),
(0xA960, 0xA980, 'Hangul Jamo Extended-A'),
(0xD7B0, 0xD800, 'Hangul Jamo Extended-B'),
(0x3130, 0x3190, 'Hangul Compatibility Jamo'),
(0xAC00, 0xD7A4, 'Hangul Syllables'),
(0xE000, 0xE100, 'Custom Symbols (PUA)'),
(0xF0000, 0xF0600, 'Internal PUA'),
(0xFFE00, 0x100000, 'Internal control/PUA'),
(0x2800, 0x2900, 'Braille'),
(0x1FB00, 0x1FC00, 'Legacy Computing Symbols'),
(0x2400, 0x2440, 'Control Pictures'),
(0x3000, 0x3040, 'CJK Punctuation'),
(0x3040, 0x3100, 'Hiragana/Katakana'),
(0x31F0, 0x3200, 'Katakana Phonetic Ext'),
(0xFF00, 0x10000, 'Halfwidth/Fullwidth'),
(0x16A0, 0x1700, 'Runic'),
(0x300, 0x370, 'Combining Diacritical Marks'),
(0x1B000, 0x1B170, 'Hentaigana'),
]
def is_excluded_from_kern(cp):
for lo, hi, _ in EXCLUDE_KERN_RANGES:
if lo <= cp < hi:
return True
return False
def unicode_block_name(cp):
"""Rough Unicode block classification for display."""
blocks = [
(0x0000, 0x0080, 'Basic Latin'),
(0x0080, 0x0100, 'Latin-1 Supplement'),
(0x0100, 0x0180, 'Latin Extended-A'),
(0x0180, 0x0250, 'Latin Extended-B'),
(0x0250, 0x02B0, 'IPA Extensions'),
(0x02B0, 0x0300, 'Spacing Modifier Letters'),
(0x0300, 0x0370, 'Combining Diacritical Marks'),
(0x0370, 0x0400, 'Greek and Coptic'),
(0x0400, 0x0530, 'Cyrillic'),
(0x0530, 0x0590, 'Armenian'),
(0x0900, 0x0980, 'Devanagari'),
(0x0980, 0x0A00, 'Bengali'),
(0x0B80, 0x0C00, 'Tamil'),
(0x0E00, 0x0E80, 'Thai'),
(0x10D0, 0x1100, 'Georgian'),
(0x1100, 0x1200, 'Hangul Jamo'),
(0x13A0, 0x13F6, 'Cherokee'),
(0x1B80, 0x1BC0, 'Sundanese'),
(0x1C80, 0x1CC0, 'Cyrillic Extended'),
(0x1D00, 0x1DC0, 'Phonetic Extensions'),
(0x1E00, 0x1F00, 'Latin Extended Additional'),
(0x1F00, 0x2000, 'Greek Extended'),
(0x2000, 0x2070, 'General Punctuation'),
(0x20A0, 0x20D0, 'Currency Symbols'),
(0x2100, 0x2200, 'Letterlike Symbols'),
(0x2C60, 0x2C80, 'Latin Extended-C'),
(0x2DE0, 0x2E00, 'Cyrillic Extended-A'),
(0xA640, 0xA6A0, 'Cyrillic Extended-B'),
(0xA720, 0xA800, 'Latin Extended-D'),
(0xFB00, 0xFB50, 'Alphabetic Presentation Forms'),
(0x1F100, 0x1F200, 'Enclosed Alphanumeric Supplement'),
(0xF0000, 0xF0060, 'PUA Bulgarian'),
(0xF0060, 0xF00C0, 'PUA Serbian'),
(0xF0100, 0xF0500, 'PUA Devanagari Internal'),
(0xF0500, 0xF0600, 'PUA Sundanese/Codestyle'),
]
for lo, hi, name in blocks:
if lo <= cp < hi:
return name
return f'U+{cp:04X}'
# ---- Code ranges (from sheet_config.py) ----
CODE_RANGE = [
list(range(0x00, 0x100)),
list(range(0x1100, 0x1200)) + list(range(0xA960, 0xA980)) + list(range(0xD7B0, 0xD800)),
list(range(0x100, 0x180)),
list(range(0x180, 0x250)),
list(range(0x3040, 0x3100)) + list(range(0x31F0, 0x3200)),
list(range(0x3000, 0x3040)),
list(range(0x3400, 0xA000)),
list(range(0x400, 0x530)),
list(range(0xFF00, 0x10000)),
list(range(0x2000, 0x20A0)),
list(range(0x370, 0x3CF)),
list(range(0xE00, 0xE60)),
list(range(0x530, 0x590)),
list(range(0x10D0, 0x1100)),
list(range(0x250, 0x300)),
list(range(0x16A0, 0x1700)),
list(range(0x1E00, 0x1F00)),
list(range(0xE000, 0xE100)),
list(range(0xF0000, 0xF0060)),
list(range(0xF0060, 0xF00C0)),
list(range(0x13A0, 0x13F6)),
list(range(0x1D00, 0x1DC0)),
list(range(0x900, 0x980)) + list(range(0xF0100, 0xF0500)),
list(range(0x1C90, 0x1CC0)),
list(range(0x300, 0x370)),
list(range(0x1F00, 0x2000)),
list(range(0x2C60, 0x2C80)),
list(range(0xA720, 0xA800)),
list(range(0x20A0, 0x20D0)),
list(range(0xFFE00, 0xFFFA0)),
list(range(0x2100, 0x2200)),
list(range(0x1F100, 0x1F200)),
list(range(0x0B80, 0x0C00)) + list(range(0xF00C0, 0xF0100)),
list(range(0x980, 0xA00)),
list(range(0x2800, 0x2900)),
list(range(0x1B80, 0x1BC0)) + list(range(0x1CC0, 0x1CD0)) + list(range(0xF0500, 0xF0510)),
list(range(0xF0110, 0xF0130)),
list(range(0xF0520, 0xF0580)),
list(range(0xFB00, 0xFB18)),
list(range(0x1B000, 0x1B170)),
list(range(0x2400, 0x2440)),
list(range(0x1FB00, 0x1FC00)),
list(range(0xA640, 0xA6A0)),
list(range(0x2DE0, 0x2E00)),
list(range(0x1C80, 0x1C8F)),
]
FILE_LIST = [
"ascii_variable.tga",
"hangul_johab.tga",
"latinExtA_variable.tga",
"latinExtB_variable.tga",
"kana_variable.tga",
"cjkpunct_variable.tga",
"wenquanyi.tga",
"cyrilic_variable.tga",
"halfwidth_fullwidth_variable.tga",
"unipunct_variable.tga",
"greek_variable.tga",
"thai_variable.tga",
"hayeren_variable.tga",
"kartuli_variable.tga",
"ipa_ext_variable.tga",
"futhark.tga",
"latinExt_additional_variable.tga",
"puae000-e0ff.tga",
"cyrilic_bulgarian_variable.tga",
"cyrilic_serbian_variable.tga",
"tsalagi_variable.tga",
"phonetic_extensions_variable.tga",
"devanagari_variable.tga",
"kartuli_allcaps_variable.tga",
"diacritical_marks_variable.tga",
"greek_polytonic_xyswap_variable.tga",
"latinExtC_variable.tga",
"latinExtD_variable.tga",
"currencies_variable.tga",
"internal_variable.tga",
"letterlike_symbols_variable.tga",
"enclosed_alphanumeric_supplement_variable.tga",
"tamil_extrawide_variable.tga",
"bengali_variable.tga",
"braille_variable.tga",
"sundanese_variable.tga",
"devanagari_internal_extrawide_variable.tga",
"pua_codestyle_ascii_variable.tga",
"alphabetic_presentation_forms_extrawide_variable.tga",
"hentaigana_variable.tga",
"control_pictures_variable.tga",
"symbols_for_legacy_computing_variable.tga",
"cyrilic_extB_variable.tga",
"cyrilic_extA_variable.tga",
"cyrilic_extC_variable.tga",
]
def is_variable(fn):
return fn.endswith('_variable.tga')
def is_extra_wide(fn):
return 'extrawide' in fn.lower()
def is_xyswap(fn):
return 'xyswap' in fn.lower()
# ---- Shape tag formatting ----
SHAPE_CHARS = 'ABCDEFGHJK'
def format_shape(mask, is_ytype):
"""Format kerning mask + ytype as keming_machine tag, e.g. 'ABCDEFGH(B)'."""
bits = []
for i, ch in enumerate(SHAPE_CHARS):
bit_pos = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14][i]
if (mask >> bit_pos) & 1:
bits.append(ch)
chars = ''.join(bits) if bits else '(empty)'
mode = '(Y)' if is_ytype else '(B)'
return f'{chars}{mode}'
# ---- Parsing ----
def parse_diacritics_anchors(img, tag_x, tag_y):
"""Return number of defined diacritics anchors (0-6)."""
count = 0
for i in range(6):
y_pos = 13 - (i // 3) * 2
shift = (3 - (i % 3)) * 8
y_pixel = tagify(img.get_pixel(tag_x, tag_y + y_pos))
x_pixel = tagify(img.get_pixel(tag_x, tag_y + y_pos + 1))
y_used = ((y_pixel >> shift) & 128) != 0
x_used = ((x_pixel >> shift) & 128) != 0
if y_used or x_used:
count += 1
return count
def parse_variable_sheet(path, code_range, is_xy, is_ew):
"""Parse a variable-width sheet and yield per-glyph stats dicts."""
img = read_tga(path)
cell_w = 32 if is_ew else 16
cell_h = 20
cols = img.width // cell_w
for index, code in enumerate(code_range):
if is_xy:
cell_x = (index // cols) * cell_w
cell_y = (index % cols) * cell_h
else:
cell_x = (index % cols) * cell_w
cell_y = (index // cols) * cell_h
tag_x = cell_x + (cell_w - 1)
tag_y = cell_y
# Width
width = 0
for y in range(5):
if img.get_pixel(tag_x, tag_y + y) & 0xFF:
width |= (1 << y)
if width == 0:
continue # empty cell
# Lowheight
is_low_height = (img.get_pixel(tag_x, tag_y + 5) & 0xFF) != 0
# Kerning data
kern_pixel = tagify(img.get_pixel(tag_x, tag_y + 6))
has_kern = (kern_pixel & 0xFF) != 0
is_ytype = (kern_pixel & 0x80000000) != 0 if has_kern else False
kern_mask = ((kern_pixel >> 8) & 0xFFFFFF) if has_kern else 0
# Dot removal (Y+7)
dot_pixel = tagify(img.get_pixel(tag_x, tag_y + 7))
has_dot_removal = dot_pixel != 0
# Compiler directive (Y+9)
dir_pixel = tagify(img.get_pixel(tag_x, tag_y + 9))
opcode = (dir_pixel >> 24) & 0xFF
arg1 = (dir_pixel >> 16) & 0xFF
arg2 = (dir_pixel >> 8) & 0xFF
# Nudge (Y+10)
nudge_pixel = tagify(img.get_pixel(tag_x, tag_y + 10))
nudge_x = signed_byte((nudge_pixel >> 24) & 0xFF) if nudge_pixel else 0
nudge_y = signed_byte((nudge_pixel >> 16) & 0xFF) if nudge_pixel else 0
has_nudge = nudge_x != 0 or nudge_y != 0
# Diacritics anchors (Y+11..Y+14)
n_anchors = parse_diacritics_anchors(img, tag_x, tag_y)
# Alignment (Y+15..Y+16)
align = 0
for y in range(2):
if img.get_pixel(tag_x, tag_y + 15 + y) & 0xFF:
align |= (1 << y)
# WriteOnTop (Y+17)
wot_raw = img.get_pixel(tag_x, tag_y + 17)
has_write_on_top = (wot_raw & 0xFF) != 0
# Stack (Y+18..Y+19)
s0 = tagify(img.get_pixel(tag_x, tag_y + 18))
s1 = tagify(img.get_pixel(tag_x, tag_y + 19))
if s0 == 0x00FF00FF and s1 == 0x00FF00FF:
stack_where = 4 # STACK_DONT
else:
stack_where = 0
for y in range(2):
if img.get_pixel(tag_x, tag_y + 18 + y) & 0xFF:
stack_where |= (1 << y)
yield {
'code': code,
'width': width,
'lowheight': is_low_height,
'has_kern': has_kern,
'is_ytype': is_ytype,
'kern_mask': kern_mask,
'has_dot_removal': has_dot_removal,
'opcode': opcode,
'opcode_arg1': arg1,
'opcode_arg2': arg2,
'has_nudge': has_nudge,
'nudge_x': nudge_x,
'nudge_y': nudge_y,
'n_anchors': n_anchors,
'align': align,
'has_write_on_top': has_write_on_top,
'stack_where': stack_where,
}
# ---- Main ----
def main():
assets_dir = sys.argv[1] if len(sys.argv) > 1 else '../src/assets'
# Accumulators
all_glyphs = []
per_sheet = defaultdict(lambda: {'total': 0, 'kern': 0, 'lowh': 0, 'directives': 0})
sheets_scanned = 0
print(f"Scanning {assets_dir}...\n")
for sheet_idx, filename in enumerate(FILE_LIST):
if not is_variable(filename):
continue
if sheet_idx >= len(CODE_RANGE):
continue
path = os.path.join(assets_dir, filename)
if not os.path.exists(path):
continue
is_xy = is_xyswap(filename)
is_ew = is_extra_wide(filename)
code_range = CODE_RANGE[sheet_idx]
count = 0
for g in parse_variable_sheet(path, code_range, is_xy, is_ew):
g['sheet'] = filename
all_glyphs.append(g)
s = per_sheet[filename]
s['total'] += 1
if g['has_kern']:
s['kern'] += 1
if g['lowheight']:
s['lowh'] += 1
if g['opcode'] != 0:
s['directives'] += 1
count += 1
sheets_scanned += 1
total = len(all_glyphs)
if total == 0:
print("No glyphs found!")
return 1
print(f"Scanned {sheets_scanned} variable sheets, {total} glyphs with width > 0\n")
# ---- 1. Width distribution ----
width_counter = Counter(g['width'] for g in all_glyphs)
print("=" * 60)
print("WIDTH DISTRIBUTION")
print("=" * 60)
for w in sorted(width_counter):
c = width_counter[w]
bar = '#' * (c * 40 // max(width_counter.values()))
print(f" w={w:2d}: {c:5d} ({100*c/total:5.1f}%) {bar}")
print(f" Total: {total}")
# ---- 2. Compiler directives ----
dir_glyphs = [g for g in all_glyphs if g['opcode'] != 0]
print(f"\n{'=' * 60}")
print("COMPILER DIRECTIVES")
print("=" * 60)
print(f" Total glyphs with directives: {len(dir_glyphs)}/{total} ({100*len(dir_glyphs)/total:.1f}%)")
opcode_counter = Counter()
replace_counts = Counter()
illegal_count = 0
for g in dir_glyphs:
op = g['opcode']
opcode_counter[op] += 1
if 0x80 <= op <= 0x87:
n_replace = op & 0x07
replace_counts[n_replace] += 1
if op == 255:
illegal_count += 1
if opcode_counter:
print(f"\n By opcode:")
for op in sorted(opcode_counter):
c = opcode_counter[op]
if 0x80 <= op <= 0x87:
label = f'replaceWith (n={op & 0x07})'
elif op == 255:
label = 'ILLEGAL (0xFF)'
else:
                label = 'unknown'
print(f" 0x{op:02X} ({label}): {c}")
if replace_counts:
print(f"\n replaceWith breakdown:")
for n in sorted(replace_counts):
print(f" {n} replacement char(s): {replace_counts[n]}")
if illegal_count:
print(f" Illegal glyphs: {illegal_count}")
# ---- 3. Kerning shapes ----
kern_glyphs = [g for g in all_glyphs if g['has_kern']]
print(f"\n{'=' * 60}")
print("KERNING SHAPES")
print("=" * 60)
print(f" Glyphs with kern data: {len(kern_glyphs)}/{total} ({100*len(kern_glyphs)/total:.1f}%)")
shape_counter = Counter()
for g in kern_glyphs:
tag = format_shape(g['kern_mask'], g['is_ytype'])
shape_counter[tag] += 1
n_unique = len(shape_counter)
n_kern = len(kern_glyphs)
ytype_count = sum(1 for g in kern_glyphs if g['is_ytype'])
btype_count = n_kern - ytype_count
print(f" Unique shapes: {n_unique}")
print(f" B-type: {btype_count} ({100*btype_count/n_kern:.1f}%)")
print(f" Y-type: {ytype_count} ({100*ytype_count/n_kern:.1f}%)")
# Per-bit occurrences
bit_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
bit_positions = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14]
print(f"\n Per-bit occurrences ({n_kern} glyphs with kern):")
for name, pos in zip(bit_names, bit_positions):
c = sum(1 for g in kern_glyphs if (g['kern_mask'] >> pos) & 1)
bar = '#' * (c * 30 // n_kern)
print(f" {name}: {c:5d}/{n_kern} ({100*c/n_kern:5.1f}%) {bar}")
print(f"\n Top shapes (of {n_unique} unique):")
for tag, c in shape_counter.most_common(30):
bar = '#' * (c * 30 // shape_counter.most_common(1)[0][1])
print(f" {tag:<22s} {c:4d} ({100*c/len(kern_glyphs):5.1f}%) {bar}")
if n_unique > 30:
remaining = sum(c for _, c in shape_counter.most_common()[30:])
print(f" ... {n_unique - 30} more shapes: {remaining} glyphs")
# ---- 4. Lowheight ----
lowh_glyphs = [g for g in all_glyphs if g['lowheight']]
print(f"\n{'=' * 60}")
print("LOWHEIGHT")
print("=" * 60)
print(f" Lowheight glyphs: {len(lowh_glyphs)}/{total} ({100*len(lowh_glyphs)/total:.1f}%)")
# ---- 5. Diacritics / stacking ----
anchor_glyphs = [g for g in all_glyphs if g['n_anchors'] > 0]
wot_glyphs = [g for g in all_glyphs if g['has_write_on_top']]
stack_names = {0: 'STACK_UP', 1: 'STACK_DOWN', 2: 'STACK_BEFORE_N_AFTER',
3: 'STACK_UP_N_DOWN', 4: 'STACK_DONT'}
stack_counter = Counter(g['stack_where'] for g in all_glyphs if g['stack_where'] != 0)
print(f"\n{'=' * 60}")
print("DIACRITICS & STACKING")
print("=" * 60)
print(f" Glyphs with diacritics anchors: {len(anchor_glyphs)}/{total} ({100*len(anchor_glyphs)/total:.1f}%)")
anchor_count_dist = Counter(g['n_anchors'] for g in anchor_glyphs)
for n in sorted(anchor_count_dist):
print(f" {n} anchor(s): {anchor_count_dist[n]}")
print(f" Glyphs with writeOnTop: {len(wot_glyphs)}")
if stack_counter:
print(f" Stack modes:")
for sw, c in stack_counter.most_common():
print(f" {stack_names.get(sw, f'?{sw}')}: {c}")
# ---- 6. Dot removal ----
dot_glyphs = [g for g in all_glyphs if g['has_dot_removal']]
print(f"\n{'=' * 60}")
print("DOT REMOVAL")
print("=" * 60)
print(f" Glyphs with dot removal directive: {len(dot_glyphs)}/{total} ({100*len(dot_glyphs)/total:.1f}%)")
# ---- 7. Nudge ----
nudge_glyphs = [g for g in all_glyphs if g['has_nudge']]
print(f"\n{'=' * 60}")
print("NUDGE")
print("=" * 60)
print(f" Glyphs with nudge: {len(nudge_glyphs)}/{total} ({100*len(nudge_glyphs)/total:.1f}%)")
if nudge_glyphs:
nudge_x_vals = Counter(g['nudge_x'] for g in nudge_glyphs if g['nudge_x'] != 0)
nudge_y_vals = Counter(g['nudge_y'] for g in nudge_glyphs if g['nudge_y'] != 0)
if nudge_x_vals:
print(f" X nudge values: {dict(sorted(nudge_x_vals.items()))}")
if nudge_y_vals:
print(f" Y nudge values: {dict(sorted(nudge_y_vals.items()))}")
# ---- 8. Alignment ----
align_names = {0: 'LEFT', 1: 'RIGHT', 2: 'CENTRE', 3: 'BEFORE'}
align_counter = Counter(g['align'] for g in all_glyphs if g['align'] != 0)
print(f"\n{'=' * 60}")
print("ALIGNMENT")
print("=" * 60)
if align_counter:
for a, c in align_counter.most_common():
print(f" {align_names.get(a, f'?{a}')}: {c}")
else:
print(" All glyphs use default (LEFT) alignment")
# ---- 9. Missing kern data ----
missing = [g for g in all_glyphs
if not g['has_kern']
and g['opcode'] == 0
and not is_excluded_from_kern(g['code'])]
print(f"\n{'=' * 60}")
print("MISSING KERNING DATA")
print("=" * 60)
print(f" Glyphs without kern (excl. CJK/Hangul/symbols/diacriticals): "
f"{len(missing)}/{total} ({100*len(missing)/total:.1f}%)")
if missing:
by_block = defaultdict(list)
for g in missing:
by_block[unicode_block_name(g['code'])].append(g['code'])
print(f"\n By block:")
for block in sorted(by_block, key=lambda b: by_block[b][0]):
cps = by_block[block]
sample = ', '.join(f'U+{c:04X}' for c in cps[:8])
more = f' ... +{len(cps)-8}' if len(cps) > 8 else ''
print(f" {block}: {len(cps)} ({sample}{more})")
# ---- 10. Per-sheet summary ----
print(f"\n{'=' * 60}")
print("PER-SHEET SUMMARY")
print("=" * 60)
print(f" {'Sheet':<52s} {'Total':>5s} {'Kern':>5s} {'LowH':>5s} {'Dir':>4s}")
print(f" {'-'*52} {'-'*5} {'-'*5} {'-'*5} {'-'*4}")
for fn in sorted(per_sheet):
s = per_sheet[fn]
print(f" {fn:<52s} {s['total']:5d} {s['kern']:5d} {s['lowh']:5d} {s['directives']:4d}")
return 0
if __name__ == '__main__':
sys.exit(main())
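The shape-tag encoding that `format_shape` implements can be exercised standalone. This is a minimal restatement of the same mapping (A..H in mask bits 7..0, J in bit 15, K in bit 14, with a `(Y)`/`(B)` type suffix):

```python
SHAPE_CHARS = 'ABCDEFGHJK'
BIT_POSITIONS = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14]  # A..H, then J, K

def format_shape(mask, is_ytype):
    """Render a kerning mask + ytype flag as a keming_machine tag string."""
    bits = [ch for ch, pos in zip(SHAPE_CHARS, BIT_POSITIONS)
            if (mask >> pos) & 1]
    return (''.join(bits) or '(empty)') + ('(Y)' if is_ytype else '(B)')
```

For example, a mask with all eight low bits set formats as `ABCDEFGH(B)`, and a mask with only bits 15 and 14 set formats as `JK(Y)` when the ytype flag is on.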

105
Autokem/tga.c Normal file

@@ -0,0 +1,105 @@
#include "tga.h"
#include <stdlib.h>
#include <string.h>
TgaImage *tga_read(const char *path) {
FILE *f = fopen(path, "rb");
if (!f) return NULL;
uint8_t header[18];
if (fread(header, 1, 18, f) != 18) { fclose(f); return NULL; }
uint8_t id_length = header[0];
uint8_t colour_map_type = header[1];
uint8_t image_type = header[2];
/* skip colour map spec (bytes 3-7) */
/* image spec starts at byte 8 */
uint16_t width = header[12] | (header[13] << 8);
uint16_t height = header[14] | (header[15] << 8);
uint8_t bpp = header[16];
uint8_t descriptor = header[17];
if (colour_map_type != 0 || image_type != 2 || bpp != 32) {
fclose(f);
return NULL;
}
int top_to_bottom = (descriptor & 0x20) != 0;
/* skip image ID */
if (id_length > 0) fseek(f, id_length, SEEK_CUR);
long pixel_data_offset = 18 + id_length;
TgaImage *img = malloc(sizeof(TgaImage));
if (!img) { fclose(f); return NULL; }
img->width = width;
img->height = height;
img->pixel_data_offset = pixel_data_offset;
img->top_to_bottom = top_to_bottom;
img->pixels = malloc((size_t)width * height * sizeof(uint32_t));
if (!img->pixels) { free(img); fclose(f); return NULL; }
for (int row = 0; row < height; row++) {
int y = top_to_bottom ? row : (height - 1 - row);
for (int x = 0; x < width; x++) {
uint8_t bgra[4];
if (fread(bgra, 1, 4, f) != 4) {
free(img->pixels); free(img); fclose(f);
return NULL;
}
/* TGA stores BGRA, convert to RGBA8888 */
uint32_t r = bgra[2], g = bgra[1], b = bgra[0], a = bgra[3];
img->pixels[y * width + x] = (r << 24) | (g << 16) | (b << 8) | a;
}
}
fclose(f);
return img;
}
uint32_t tga_get_pixel(const TgaImage *img, int x, int y) {
if (x < 0 || x >= img->width || y < 0 || y >= img->height) return 0;
return img->pixels[y * img->width + x];
}
int tga_write_pixel(const char *path, TgaImage *img, int x, int y, uint32_t rgba) {
if (x < 0 || x >= img->width || y < 0 || y >= img->height) return -1;
/* compute file row: reverse the mapping used during read */
int file_row;
if (img->top_to_bottom) {
file_row = y;
} else {
file_row = img->height - 1 - y;
}
long offset = img->pixel_data_offset + ((long)file_row * img->width + x) * 4;
FILE *f = fopen(path, "r+b");
if (!f) return -1;
fseek(f, offset, SEEK_SET);
/* convert RGBA8888 to TGA BGRA */
uint8_t bgra[4];
bgra[2] = (rgba >> 24) & 0xFF; /* R */
bgra[1] = (rgba >> 16) & 0xFF; /* G */
bgra[0] = (rgba >> 8) & 0xFF; /* B */
bgra[3] = rgba & 0xFF; /* A */
size_t written = fwrite(bgra, 1, 4, f);
fclose(f);
/* also update in-memory pixel array */
img->pixels[y * img->width + x] = rgba;
return (written == 4) ? 0 : -1;
}
void tga_free(TgaImage *img) {
if (!img) return;
free(img->pixels);
free(img);
}

33
Autokem/tga.h Normal file

@@ -0,0 +1,33 @@
#ifndef TGA_H
#define TGA_H
#include <stdint.h>
#include <stdio.h>
typedef struct {
int width;
int height;
uint32_t *pixels; /* RGBA8888: R<<24 | G<<16 | B<<8 | A */
long pixel_data_offset; /* byte offset of pixel data in file */
int top_to_bottom;
} TgaImage;
/* Read an uncompressed 32-bit TGA file. Returns NULL on error. */
TgaImage *tga_read(const char *path);
/* Get pixel at (x,y) as RGBA8888. Returns 0 for out-of-bounds. */
uint32_t tga_get_pixel(const TgaImage *img, int x, int y);
/* Write a single pixel (RGBA8888) to TGA file on disk at (x,y).
Opens/closes the file internally. */
int tga_write_pixel(const char *path, TgaImage *img, int x, int y, uint32_t rgba);
/* Free a TgaImage. */
void tga_free(TgaImage *img);
/* tagify: returns 0 if alpha==0, else full pixel value */
static inline uint32_t tagify(uint32_t pixel) {
return (pixel & 0xFF) == 0 ? 0 : pixel;
}
#endif
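The RGBA8888 packing and `tagify` convention documented in this header translate directly into a few lines of Python; a minimal sketch (helper names are illustrative, not part of the repository):

```python
def pack_rgba(r, g, b, a):
    # Same packing as tga.h: R<<24 | G<<16 | B<<8 | A
    return (r << 24) | (g << 16) | (b << 8) | a

def tagify(pixel):
    # A fully transparent pixel (alpha byte == 0) carries no tag data
    return 0 if (pixel & 0xFF) == 0 else pixel

def unpack_bgra(bgra):
    # TGA stores pixels as B,G,R,A on disk; convert to the RGBA8888 word
    b, g, r, a = bgra
    return pack_rgba(r, g, b, a)
```

This matches the byte swizzle done in `tga_read` and reversed in `tga_write_pixel`.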

415
Autokem/train.c Normal file

@@ -0,0 +1,415 @@
#include "train.h"
#include "tga.h"
#include "nn.h"
#include "safetensor.h"
#include "unicode_filter.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <dirent.h>
/* ---- Data sample ---- */
typedef struct {
float input[300]; /* 15x20 binary */
float shape[10]; /* A,B,C,D,E,F,G,H,J,K */
float ytype;
float lowheight;
} Sample;
/* ---- Bit extraction from kerning mask ---- */
/* kerningMask = pixel >> 8 & 0xFFFFFF
* Layout: Red=Y0000000, Green=JK000000, Blue=ABCDEFGH
* After >> 8: bits 23-16 = Red[7:0], bits 15-8 = Green[7:0], bits 7-0 = Blue[7:0]
* Y = bit 23 (already extracted separately as isKernYtype)
* J = bit 15, K = bit 14
* A = bit 7, B = bit 6, ..., H = bit 0
*/
static void extract_shape_bits(int kerning_mask, float *shape) {
shape[0] = (float)((kerning_mask >> 7) & 1); /* A */
shape[1] = (float)((kerning_mask >> 6) & 1); /* B */
shape[2] = (float)((kerning_mask >> 5) & 1); /* C */
shape[3] = (float)((kerning_mask >> 4) & 1); /* D */
shape[4] = (float)((kerning_mask >> 3) & 1); /* E */
shape[5] = (float)((kerning_mask >> 2) & 1); /* F */
shape[6] = (float)((kerning_mask >> 1) & 1); /* G */
shape[7] = (float)((kerning_mask >> 0) & 1); /* H */
shape[8] = (float)((kerning_mask >> 15) & 1); /* J */
shape[9] = (float)((kerning_mask >> 14) & 1); /* K */
}
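The layout described in the comment above can be checked with a standalone sketch that decodes a whole Y+6 tag pixel into its labels; `decode_kern_pixel` is an illustrative name, not part of the repository:

```python
def decode_kern_pixel(kern_pixel):
    """Decode a Y+6 tag pixel (RGBA8888) into (is_ytype, [A..K] bits),
    mirroring the extraction in train.c."""
    is_ytype = (kern_pixel & 0x80000000) != 0      # Y = top bit of Red
    mask = (kern_pixel >> 8) & 0xFFFFFF            # drop alpha byte
    order = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14]       # A,B,C,D,E,F,G,H,J,K
    return is_ytype, [(mask >> pos) & 1 for pos in order]
```

For instance, a pixel with Red=0x80 (Y set), Green=0xC0 (J and K), and Blue=0x81 (A and H) decodes to the ytype flag plus the A, H, J, K shape bits.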
/* ---- Collect samples from one TGA ---- */
static int collect_from_sheet(const char *path, int is_xyswap, int start_code,
Sample *samples, int max_samples) {
TgaImage *img = tga_read(path);
if (!img) {
fprintf(stderr, "Warning: cannot read %s\n", path);
return 0;
}
int cell_w = 16, cell_h = 20;
int cols = img->width / cell_w;
int rows = img->height / cell_h;
int total_cells = cols * rows;
int count = 0;
for (int index = 0; index < total_cells && count < max_samples; index++) {
int cell_x, cell_y;
if (is_xyswap) {
cell_x = (index / cols) * cell_w;
cell_y = (index % cols) * cell_h;
} else {
cell_x = (index % cols) * cell_w;
cell_y = (index / cols) * cell_h;
}
int tag_x = cell_x + (cell_w - 1); /* rightmost column */
int tag_y = cell_y;
/* Read width (5-bit binary from Y+0..Y+4) */
int width = 0;
for (int y = 0; y < 5; y++) {
if (tga_get_pixel(img, tag_x, tag_y + y) & 0xFF)
width |= (1 << y);
}
if (width == 0) continue;
/* Skip modifier letters, symbols, punctuation */
if (start_code >= 0 && is_excluded_from_training(start_code + index))
continue;
/* Read kerning data pixel at Y+6 */
uint32_t kern_pixel = tagify(tga_get_pixel(img, tag_x, tag_y + 6));
if ((kern_pixel & 0xFF) == 0) continue; /* no kern data */
/* Extract labels */
int is_kern_ytype = (kern_pixel & 0x80000000u) != 0;
int kerning_mask = (int)((kern_pixel >> 8) & 0xFFFFFF);
int is_low_height = (tga_get_pixel(img, tag_x, tag_y + 5) & 0xFF) != 0;
Sample *s = &samples[count];
extract_shape_bits(kerning_mask, s->shape);
s->ytype = (float)is_kern_ytype;
s->lowheight = (float)is_low_height;
/* Extract 15x20 binary input from the glyph area */
for (int gy = 0; gy < 20; gy++) {
for (int gx = 0; gx < 15; gx++) {
uint32_t p = tga_get_pixel(img, cell_x + gx, cell_y + gy);
s->input[gy * 15 + gx] = ((p & 0x80) != 0) ? 1.0f : 0.0f;
}
}
count++;
}
tga_free(img);
return count;
}
/* ---- Fisher-Yates shuffle ---- */
static void shuffle_indices(int *arr, int n) {
for (int i = n - 1; i > 0; i--) {
int j = rand() % (i + 1);
int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
}
}
/* ---- Copy network weights ---- */
static void copy_tensor_data(Tensor *dst, Tensor *src) {
memcpy(dst->data, src->data, (size_t)src->size * sizeof(float));
}
static void save_weights(Network *net, Network *best) {
copy_tensor_data(best->conv1.weight, net->conv1.weight);
copy_tensor_data(best->conv1.bias, net->conv1.bias);
copy_tensor_data(best->conv2.weight, net->conv2.weight);
copy_tensor_data(best->conv2.bias, net->conv2.bias);
copy_tensor_data(best->fc1.weight, net->fc1.weight);
copy_tensor_data(best->fc1.bias, net->fc1.bias);
copy_tensor_data(best->output.weight, net->output.weight);
copy_tensor_data(best->output.bias, net->output.bias);
}
/* ---- Training ---- */
int train_model(void) {
const char *assets_dir = "../src/assets";
const int max_total = 16384;
Sample *all_samples = calloc((size_t)max_total, sizeof(Sample));
if (!all_samples) { fprintf(stderr, "Error: out of memory\n"); return 1; }
int total = 0;
/* Scan for *_variable.tga files */
DIR *dir = opendir(assets_dir);
if (!dir) {
fprintf(stderr, "Error: cannot open %s\n", assets_dir);
free(all_samples);
return 1;
}
struct dirent *ent;
int file_count = 0;
while ((ent = readdir(dir)) != NULL) {
const char *name = ent->d_name;
size_t len = strlen(name);
/* Must end with _variable.tga */
if (len < 14) continue;
if (strcmp(name + len - 13, "_variable.tga") != 0) continue;
/* Skip extrawide */
if (strstr(name, "extrawide") != NULL) continue;
/* Check for xyswap */
int is_xyswap = (strstr(name, "xyswap") != NULL);
char fullpath[512];
snprintf(fullpath, sizeof(fullpath), "%s/%s", assets_dir, name);
int start_code = sheet_start_code(name);
int got = collect_from_sheet(fullpath, is_xyswap, start_code,
all_samples + total, max_total - total);
if (got > 0) {
printf(" %s: %d samples\n", name, got);
total += got;
file_count++;
}
}
closedir(dir);
printf("Collected %d samples from %d sheets\n", total, file_count);
if (total < 10) {
fprintf(stderr, "Error: too few samples to train\n");
free(all_samples);
return 1;
}
/* Print label distribution */
{
const char *bit_names[] = {"A","B","C","D","E","F","G","H","J","K","Ytype","LowH"};
int counts[12] = {0};
int nonzero_input = 0;
for (int i = 0; i < total; i++) {
for (int b = 0; b < 10; b++)
counts[b] += (int)all_samples[i].shape[b];
counts[10] += (int)all_samples[i].ytype;
counts[11] += (int)all_samples[i].lowheight;
for (int p = 0; p < 300; p++)
if (all_samples[i].input[p] > 0.5f) { nonzero_input++; break; }
}
printf("Label distribution:\n ");
for (int b = 0; b < 12; b++)
printf("%s:%d(%.0f%%) ", bit_names[b], counts[b], 100.0 * counts[b] / total);
printf("\n Non-empty inputs: %d/%d\n\n", nonzero_input, total);
}
/* Shuffle and split 80/20 */
srand((unsigned)time(NULL));
int *indices = malloc((size_t)total * sizeof(int));
for (int i = 0; i < total; i++) indices[i] = i;
shuffle_indices(indices, total);
int n_train = (int)(total * 0.8);
int n_val = total - n_train;
printf("Train: %d, Validation: %d\n\n", n_train, n_val);
/* Create network */
Network *net = network_create();
Network *best_net = network_create();
int batch_size = 32;
float lr = 0.001f, beta1 = 0.9f, beta2 = 0.999f, eps = 1e-8f;
int max_epochs = 200;
int patience = 10;
float best_val_loss = 1e30f;
int patience_counter = 0;
int best_epoch = 0;
int adam_t = 0;
for (int epoch = 0; epoch < max_epochs; epoch++) {
/* Shuffle training indices */
shuffle_indices(indices, n_train);
float train_loss = 0.0f;
int n_batches = 0;
/* Training loop */
for (int start = 0; start < n_train; start += batch_size) {
int bs = (start + batch_size <= n_train) ? batch_size : (n_train - start);
/* Build batch tensors */
int ishape[] = {bs, 1, 20, 15};
Tensor *input = tensor_alloc(4, ishape);
int tshape[] = {bs, 12};
Tensor *target = tensor_alloc(2, tshape);
for (int i = 0; i < bs; i++) {
Sample *s = &all_samples[indices[start + i]];
memcpy(input->data + i * 300, s->input, 300 * sizeof(float));
memcpy(target->data + i * 12, s->shape, 10 * sizeof(float));
target->data[i * 12 + 10] = s->ytype;
target->data[i * 12 + 11] = s->lowheight;
}
/* Forward */
network_zero_grad(net);
network_forward(net, input, 1);
/* Loss */
float loss = network_bce_loss(net, target);
train_loss += loss;
n_batches++;
/* Backward */
network_backward(net, target);
/* Adam step */
adam_t++;
network_adam_step(net, lr, beta1, beta2, eps, adam_t);
tensor_free(input);
tensor_free(target);
}
train_loss /= (float)n_batches;
/* Validation */
float val_loss = 0.0f;
int val_batches = 0;
for (int start = 0; start < n_val; start += batch_size) {
int bs = (start + batch_size <= n_val) ? batch_size : (n_val - start);
int ishape[] = {bs, 1, 20, 15};
Tensor *input = tensor_alloc(4, ishape);
int tshape[] = {bs, 12};
Tensor *target = tensor_alloc(2, tshape);
for (int i = 0; i < bs; i++) {
Sample *s = &all_samples[indices[n_train + start + i]];
memcpy(input->data + i * 300, s->input, 300 * sizeof(float));
memcpy(target->data + i * 12, s->shape, 10 * sizeof(float));
target->data[i * 12 + 10] = s->ytype;
target->data[i * 12 + 11] = s->lowheight;
}
network_forward(net, input, 0);
val_loss += network_bce_loss(net, target);
val_batches++;
tensor_free(input);
tensor_free(target);
}
val_loss /= (float)val_batches;
printf("Epoch %3d: train_loss=%.4f val_loss=%.4f", epoch + 1, (double)train_loss, (double)val_loss);
if (val_loss < best_val_loss) {
best_val_loss = val_loss;
best_epoch = epoch + 1;
patience_counter = 0;
save_weights(net, best_net);
printf(" *best*");
} else {
patience_counter++;
}
printf("\n");
if (patience_counter >= patience) {
printf("\nEarly stopping at epoch %d (best epoch: %d)\n", epoch + 1, best_epoch);
break;
}
}
/* Restore best weights and save */
save_weights(best_net, net);
safetensor_save("autokem.safetensors", net, total, best_epoch, best_val_loss);
/* Compute final per-bit accuracy on validation set */
{
const char *bit_names[] = {"A","B","C","D","E","F","G","H","J","K","Ytype","LowH"};
int correct_per_bit[12] = {0};
int total_per_bit = n_val;
int n_examples = 0;
const int max_examples = 8;
printf("\nGlyph Tags — validation predictions:\n");
for (int i = 0; i < n_val; i++) {
Sample *s = &all_samples[indices[n_train + i]];
float output[12];
network_infer(net, s->input, output);
int pred_bits[12], tgt_bits[12];
int any_mismatch = 0;
for (int b = 0; b < 10; b++) {
pred_bits[b] = output[b] >= 0.5f ? 1 : 0;
tgt_bits[b] = (int)s->shape[b];
if (pred_bits[b] == tgt_bits[b]) correct_per_bit[b]++;
else any_mismatch = 1;
}
pred_bits[10] = output[10] >= 0.5f ? 1 : 0;
tgt_bits[10] = (int)s->ytype;
if (pred_bits[10] == tgt_bits[10]) correct_per_bit[10]++;
else any_mismatch = 1;
pred_bits[11] = output[11] >= 0.5f ? 1 : 0;
tgt_bits[11] = (int)s->lowheight;
if (pred_bits[11] == tgt_bits[11]) correct_per_bit[11]++;
else any_mismatch = 1;
/* Print a few examples (mix of correct and mismatched) */
if (n_examples < max_examples && (any_mismatch || i < 4)) {
/* Build tag string: e.g. "ABCDEFGH(B)" or "AB(Y)" */
char actual[32] = "", predicted[32] = "";
int ap = 0, pp = 0;
const char shape_chars[] = "ABCDEFGHJK";
for (int b = 0; b < 10; b++) {
if (tgt_bits[b]) actual[ap++] = shape_chars[b];
if (pred_bits[b]) predicted[pp++] = shape_chars[b];
}
actual[ap] = '\0'; predicted[pp] = '\0';
char actual_tag[48], pred_tag[48];
snprintf(actual_tag, sizeof(actual_tag), "%s%s%s",
ap > 0 ? actual : "(empty)",
tgt_bits[10] ? "(Y)" : "(B)",
tgt_bits[11] ? " low" : "");
snprintf(pred_tag, sizeof(pred_tag), "%s%s%s",
pp > 0 ? predicted : "(empty)",
pred_bits[10] ? "(Y)" : "(B)",
pred_bits[11] ? " low" : "");
printf(" actual=%-20s pred=%-20s %s\n", actual_tag, pred_tag,
any_mismatch ? "MISMATCH" : "ok");
n_examples++;
}
}
printf("\nPer-bit accuracy (%d val samples):\n ", n_val);
int total_correct = 0;
for (int b = 0; b < 12; b++) {
printf("%s:%.1f%% ", bit_names[b], 100.0 * correct_per_bit[b] / total_per_bit);
total_correct += correct_per_bit[b];
}
printf("\n Overall: %d/%d (%.2f%%)\n",
total_correct, n_val * 12, 100.0 * total_correct / (n_val * 12));
}
network_free(net);
network_free(best_net);
free(all_samples);
free(indices);
return 0;
}
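Both train.c and train_torch.py decode the same kern-data tag pixel into the 12 training labels: shape bits A–K from the 24-bit kerning mask, the Y-type flag from the top bit, and low-height from a separate pixel. As a minimal sketch of that encoding, here is a hypothetical Python helper (`decode_kern_pixel` is illustrative, not part of the codebase):

```python
def decode_kern_pixel(kern_pixel, low_height_pixel=0):
    """Sketch of the label decoding used by train.c / train_torch.py.

    kern_pixel is the RGBA8888 pixel at (tag_x, tag_y + 6): the low 8 bits
    are alpha, bit 31 is the Y-type flag, bits 8..31 hold the kerning mask.
    """
    if (kern_pixel & 0xFF) == 0:
        return None  # alpha == 0: glyph carries no kern data
    ytype = 1.0 if (kern_pixel & 0x80000000) else 0.0
    mask = (kern_pixel >> 8) & 0xFFFFFF
    # Shape bits: A(7) B(6) C(5) D(4) E(3) F(2) G(1) H(0) J(15) K(14)
    shape = [float((mask >> b) & 1) for b in (7, 6, 5, 4, 3, 2, 1, 0, 15, 14)]
    low = 1.0 if (low_height_pixel & 0xFF) else 0.0
    return shape + [ytype, low]
```

For example, a glyph whose mask has only bits 7 and 0 set decodes to shape A+H with all other shape bits zero.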

Autokem/train.h Normal file

@@ -0,0 +1,8 @@
#ifndef TRAIN_H
#define TRAIN_H
/* Train model on existing spritesheets in ../src/assets/
Saves to autokem.safetensors */
int train_model(void);
#endif

Autokem/train_torch.py Normal file

@@ -0,0 +1,717 @@
#!/usr/bin/env python3
"""
PyTorch training script for Autokem — drop-in replacement for `autokem train`.
Reads the same *_variable.tga sprite sheets, trains the same architecture,
and saves weights in safetensors format loadable by the C inference code.
Usage:
    python train_torch.py                  # train with defaults
    python train_torch.py --epochs 300     # override max epochs
    python train_torch.py --lr 0.0005      # override learning rate
    python train_torch.py --save model.safetensors
Requirements:
pip install torch numpy
"""
import argparse
import json
import os
import struct
import sys
import unicodedata
from pathlib import Path
import numpy as np
# ---- Sheet code ranges (imported from OTFbuild/sheet_config.py) ----
_otfbuild = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'OTFbuild')
try:
sys.path.insert(0, _otfbuild)
from sheet_config import FILE_LIST as _FILE_LIST, CODE_RANGE as _CODE_RANGE
sys.path.pop(0)
_CODE_RANGE_MAP = {}
for _i, _fn in enumerate(_FILE_LIST):
if _i < len(_CODE_RANGE):
_CODE_RANGE_MAP[_fn] = _CODE_RANGE[_i]
except ImportError:
_CODE_RANGE_MAP = {}
# ---- TGA reader (matches OTFbuild/tga_reader.py and Autokem/tga.c) ----
class TgaImage:
__slots__ = ('width', 'height', 'pixels')
def __init__(self, width, height, pixels):
self.width = width
self.height = height
self.pixels = pixels # flat list of RGBA8888 ints
def get_pixel(self, x, y):
if x < 0 or x >= self.width or y < 0 or y >= self.height:
return 0
return self.pixels[y * self.width + x]
def read_tga(path):
with open(path, 'rb') as f:
data = f.read()
pos = 0
id_length = data[pos]; pos += 1
_colour_map_type = data[pos]; pos += 1
image_type = data[pos]; pos += 1
pos += 5 # colour map spec
pos += 2 # x_origin
pos += 2 # y_origin
width = struct.unpack_from('<H', data, pos)[0]; pos += 2
height = struct.unpack_from('<H', data, pos)[0]; pos += 2
bits_per_pixel = data[pos]; pos += 1
descriptor = data[pos]; pos += 1
top_to_bottom = (descriptor & 0x20) != 0
bpp = bits_per_pixel // 8
pos += id_length
if image_type != 2 or bpp not in (3, 4):
raise ValueError(f"Unsupported TGA: type={image_type}, bpp={bits_per_pixel}")
pixels = [0] * (width * height)
for row in range(height):
y = row if top_to_bottom else (height - 1 - row)
for x in range(width):
b = data[pos]; g = data[pos+1]; r = data[pos+2]
a = data[pos+3] if bpp == 4 else 0xFF
pos += bpp
pixels[y * width + x] = (r << 24) | (g << 16) | (b << 8) | a
return TgaImage(width, height, pixels)
def tagify(pixel):
return 0 if (pixel & 0xFF) == 0 else pixel
# ---- Data collection (matches Autokem/train.c) ----
def collect_from_sheet(path, is_xyswap, code_range=None):
"""Extract labelled samples from a single TGA sheet."""
img = read_tga(path)
cell_w, cell_h = 16, 20
cols = img.width // cell_w
rows = img.height // cell_h
total_cells = cols * rows
inputs = []
labels = []
skipped_lm = 0
for index in range(total_cells):
if is_xyswap:
cell_x = (index // cols) * cell_w
cell_y = (index % cols) * cell_h
else:
cell_x = (index % cols) * cell_w
cell_y = (index // cols) * cell_h
tag_x = cell_x + (cell_w - 1)
tag_y = cell_y
# Width (5-bit)
width = 0
for y in range(5):
if img.get_pixel(tag_x, tag_y + y) & 0xFF:
width |= (1 << y)
if width == 0:
continue
# Skip modifier letters, symbols, punctuation
if code_range is not None and index < len(code_range):
cp = code_range[index]
try:
cat = unicodedata.category(chr(cp))
if cat == 'Lm' or cat[0] in ('S', 'P'):
skipped_lm += 1
continue
except (ValueError, OverflowError):
pass
# Kern data pixel at Y+6
kern_pixel = tagify(img.get_pixel(tag_x, tag_y + 6))
if (kern_pixel & 0xFF) == 0:
continue # no kern data
# Extract labels
is_kern_ytype = 1.0 if (kern_pixel & 0x80000000) != 0 else 0.0
kerning_mask = (kern_pixel >> 8) & 0xFFFFFF
is_low_height = 1.0 if (img.get_pixel(tag_x, tag_y + 5) & 0xFF) != 0 else 0.0
# Shape bits: A(7) B(6) C(5) D(4) E(3) F(2) G(1) H(0) J(15) K(14)
shape = [
float((kerning_mask >> 7) & 1), # A
float((kerning_mask >> 6) & 1), # B
float((kerning_mask >> 5) & 1), # C
float((kerning_mask >> 4) & 1), # D
float((kerning_mask >> 3) & 1), # E
float((kerning_mask >> 2) & 1), # F
float((kerning_mask >> 1) & 1), # G
float((kerning_mask >> 0) & 1), # H
float((kerning_mask >> 15) & 1), # J
float((kerning_mask >> 14) & 1), # K
]
# 15x20 binary input
inp = np.zeros((20, 15), dtype=np.float32)
for gy in range(20):
for gx in range(15):
p = img.get_pixel(cell_x + gx, cell_y + gy)
if (p & 0x80) != 0:
inp[gy, gx] = 1.0
inputs.append(inp)
labels.append(shape + [is_kern_ytype, is_low_height])
return inputs, labels, skipped_lm
def collect_all_samples(assets_dir):
"""Scan assets_dir for *_variable.tga, collect all labelled samples."""
all_inputs = []
all_labels = []
file_count = 0
total_skipped_lm = 0
for name in sorted(os.listdir(assets_dir)):
if not name.endswith('_variable.tga'):
continue
if 'extrawide' in name:
continue
is_xyswap = 'xyswap' in name
code_range = _CODE_RANGE_MAP.get(name, None)
path = os.path.join(assets_dir, name)
inputs, labels, skipped_lm = collect_from_sheet(path, is_xyswap, code_range)
total_skipped_lm += skipped_lm
if inputs:
suffix = f" (skipped {skipped_lm})" if skipped_lm else ""
print(f" {name}: {len(inputs)} samples{suffix}")
all_inputs.extend(inputs)
all_labels.extend(labels)
file_count += 1
if total_skipped_lm:
print(f" Filtered (Lm/S/P): {total_skipped_lm}")
return np.array(all_inputs), np.array(all_labels, dtype=np.float32), file_count
# ---- Model (matches Autokem/nn.c architecture) ----
def build_model():
"""
Conv2D(1->32, 7x7, padding=1) -> SiLU
Conv2D(32->64, 7x7, padding=1) -> SiLU
GlobalAveragePooling2D -> [64]
Dense(256) -> SiLU
Dense(12) -> sigmoid
"""
import torch
import torch.nn as nn
class Keminet(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 7, padding=1)
self.conv2 = nn.Conv2d(32, 64, 7, padding=1)
self.fc1 = nn.Linear(64, 256)
# self.fc2 = nn.Linear(256, 128)
self.output = nn.Linear(256, 12)
self.tf = nn.SiLU()
# He init
for m in self.modules():
if isinstance(m, (nn.Conv2d, nn.Linear)):
nn.init.kaiming_normal_(m.weight, a=0.01, nonlinearity='leaky_relu')
if m.bias is not None:
nn.init.zeros_(m.bias)
def forward(self, x):
x = self.tf(self.conv1(x))
x = self.tf(self.conv2(x))
x = x.mean(dim=(2, 3)) # global average pool
x = self.tf(self.fc1(x))
# x = self.tf(self.fc2(x))
x = torch.sigmoid(self.output(x))
return x
return Keminet()
# ---- Safetensors export (matches Autokem/safetensor.c layout) ----
def export_safetensors(model, path, total_samples, epochs, val_loss):
"""
Save model weights in safetensors format compatible with the C code.
C code expects these tensor names with these shapes:
conv1.weight [out_ch, in_ch, kh, kw] — PyTorch matches this layout
conv1.bias [out_ch]
conv2.weight [out_ch, in_ch, kh, kw]
conv2.bias [out_ch]
fc1.weight [out_features, in_features] — PyTorch matches this layout
fc1.bias [out_features]
    (fc2.weight / fc2.bias omitted while the fc2 layer is commented out)
output.weight [out_features, in_features]
output.bias [out_features]
"""
tensor_names = [
'conv1.weight', 'conv1.bias',
'conv2.weight', 'conv2.bias',
'fc1.weight', 'fc1.bias',
# 'fc2.weight', 'fc2.bias',
'output.weight', 'output.bias',
]
state = model.state_dict()
header = {}
header['__metadata__'] = {
'samples': str(total_samples),
'epochs': str(epochs),
'val_loss': f'{val_loss:.6f}',
}
data_parts = []
offset = 0
for name in tensor_names:
arr = state[name].detach().cpu().numpy().astype(np.float32)
raw = arr.tobytes()
header[name] = {
'dtype': 'F32',
'shape': list(arr.shape),
'data_offsets': [offset, offset + len(raw)],
}
data_parts.append(raw)
offset += len(raw)
header_json = json.dumps(header, separators=(',', ':')).encode('utf-8')
padded_len = (len(header_json) + 7) & ~7
header_json = header_json + b' ' * (padded_len - len(header_json))
with open(path, 'wb') as f:
f.write(struct.pack('<Q', len(header_json)))
f.write(header_json)
for part in data_parts:
f.write(part)
total_bytes = 8 + len(header_json) + offset
print(f"Saved model to {path} ({total_bytes} bytes)")
def load_safetensors(model, path):
"""Load weights from safetensors file into the PyTorch model."""
import torch
with open(path, 'rb') as f:
header_len = struct.unpack('<Q', f.read(8))[0]
header_json = f.read(header_len)
header = json.loads(header_json)
data_start = 8 + header_len
state = model.state_dict()
for name in state:
if name not in header:
print(f" Warning: tensor '{name}' not in safetensors")
continue
entry = header[name]
off_start, off_end = entry['data_offsets']
f.seek(data_start + off_start)
raw = f.read(off_end - off_start)
arr = np.frombuffer(raw, dtype=np.float32).reshape(entry['shape'])
state[name] = torch.from_numpy(arr.copy())
model.load_state_dict(state)
print(f"Loaded weights from {path}")
# ---- Pretty-print helpers ----
BIT_NAMES = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'Ytype', 'LowH']
SHAPE_CHARS = 'ABCDEFGHJK'
MIRROR_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)] # A↔B, C↔D, E↔F, G↔H, J↔K
def format_tag(bits_12):
"""Format 12 binary bits as keming_machine tag string, e.g. 'ABCDEFGH(B)'."""
chars = ''.join(SHAPE_CHARS[i] for i in range(10) if bits_12[i])
if not chars:
chars = '(empty)'
mode = '(Y)' if bits_12[10] else '(B)'
low = ' low' if bits_12[11] else ''
return f'{chars}{mode}{low}'
def print_label_distribution(labels, total):
counts = labels.sum(axis=0).astype(int)
parts = [f'{BIT_NAMES[b]}:{counts[b]}({100*counts[b]/total:.0f}%)' for b in range(12)]
print(f"Label distribution:\n {' '.join(parts)}")
def print_examples_and_accuracy(model, X_val, y_val, max_examples=8):
"""Print example predictions and per-bit accuracy on validation set."""
import torch
model.eval()
with torch.no_grad():
preds = model(X_val).cpu().numpy()
y_np = y_val.cpu().numpy() if hasattr(y_val, 'cpu') else y_val
pred_bits = (preds >= 0.5).astype(int)
tgt_bits = y_np.astype(int)
n_val = len(y_np)
n_examples = 0
print("\nGlyph Tags — validation predictions:")
for i in range(n_val):
mismatch = not np.array_equal(pred_bits[i], tgt_bits[i])
if n_examples < max_examples and (mismatch or i < 4):
actual_tag = format_tag(tgt_bits[i])
pred_tag = format_tag(pred_bits[i])
status = 'MISMATCH' if mismatch else 'ok'
print(f" actual={actual_tag:<20s} pred={pred_tag:<20s} {status}")
n_examples += 1
correct = (pred_bits == tgt_bits)
per_bit = correct.sum(axis=0)
total_correct = correct.sum()
print(f"\nPer-bit accuracy ({n_val} val samples):")
parts = [f'{BIT_NAMES[b]}:{100*per_bit[b]/n_val:.1f}%' for b in range(12)]
print(f" {' '.join(parts)}")
print(f" Overall: {total_correct}/{n_val*12} ({100*total_correct/(n_val*12):.2f}%)")
# ---- Data augmentation ----
def _shape_key(label):
"""10-bit shape tuple from label (A through K)."""
return tuple(int(label[i]) for i in range(10))
def _mirror_shape(key):
"""Swap mirror pairs: A↔B, C↔D, E↔F, G↔H, J↔K."""
m = list(key)
for a, b in MIRROR_PAIRS:
m[a], m[b] = m[b], m[a]
return tuple(m)
def _mirror_label(label):
"""Mirror shape bits in label, keep ytype and lowheight."""
m = label.copy()
for a, b in MIRROR_PAIRS:
m[a], m[b] = m[b], m[a]
return m
def _shift_image(img, dx, dy):
"""Shift 2D image by (dx, dy), fill with 0."""
h, w = img.shape
shifted = np.zeros_like(img)
sx0, sx1 = max(0, -dx), min(w, w - dx)
sy0, sy1 = max(0, -dy), min(h, h - dy)
dx0, dx1 = max(0, dx), min(w, w + dx)
dy0, dy1 = max(0, dy), min(h, h + dy)
shifted[dy0:dy1, dx0:dx1] = img[sy0:sy1, sx0:sx1]
return shifted
def _augment_one(img, label, rng):
"""One augmented copy: random 1px shift + 1% pixel dropout."""
dx = rng.integers(-1, 2) # -1, 0, or 1
dy = rng.integers(-1, 2) # -1, 0, or 1
aug = _shift_image(img, dx, dy)
# mask = rng.random(aug.shape) > 0.01
# aug = aug * mask
return aug, label.copy()
def _do_mirror_augmentation(X, y, rng):
"""For each mirror pair (S, mirror(S)), fill deficit from the common side."""
shape_counts = {}
shape_indices = {}
for i in range(len(y)):
key = _shape_key(y[i])
shape_counts[key] = shape_counts.get(key, 0) + 1
shape_indices.setdefault(key, []).append(i)
new_X, new_y = [], []
done = set() # avoid processing both directions
for key, count in shape_counts.items():
if key in done:
continue
mkey = _mirror_shape(key)
done.add(key)
done.add(mkey)
if mkey == key:
continue # symmetric shape
mcount = shape_counts.get(mkey, 0)
if count == mcount:
continue
# Mirror from the larger side to fill the smaller side
if count > mcount:
src_key, deficit = key, count - mcount
else:
src_key, deficit = mkey, mcount - count
indices = shape_indices.get(src_key, [])
if not indices:
continue
chosen = rng.choice(indices, size=deficit, replace=True)
for idx in chosen:
new_X.append(np.fliplr(X[idx]).copy())
new_y.append(_mirror_label(y[idx]))
if new_X:
X = np.concatenate([X, np.array(new_X)])
y = np.concatenate([y, np.array(new_y)])
return X, y
def _compute_rarity_weights(y):
"""Per-sample weight: sum of inverse bit frequencies for all 12 bits.
Samples with rare bit values (e.g. J=1 at 13%, C=0 at 8%) get higher weight.
"""
bit_freq = y.mean(axis=0) # [12], P(bit=1)
weights = np.zeros(len(y))
for i in range(len(y)):
w = 0.0
for b in range(12):
p = bit_freq[b] if y[i, b] > 0.5 else (1.0 - bit_freq[b])
w += 1.0 / max(p, 0.01)
weights[i] = w
return weights
def _do_rarity_augmentation(X, y, rng, target_new):
"""Create target_new augmented samples, drawn proportionally to rarity weight."""
if target_new <= 0:
return X, y
weights = _compute_rarity_weights(y)
weights /= weights.sum()
chosen = rng.choice(len(X), size=target_new, replace=True, p=weights)
new_X, new_y = [], []
for idx in chosen:
aug_img, aug_label = _augment_one(X[idx], y[idx], rng)
new_X.append(aug_img)
new_y.append(aug_label)
X = np.concatenate([X, np.array(new_X)])
y = np.concatenate([y, np.array(new_y)])
return X, y
def _print_bit_freq(y, label):
"""Print per-bit frequencies for diagnostics."""
freq = y.mean(axis=0)
names = BIT_NAMES
parts = [f'{names[b]}:{freq[b]*100:.0f}%' for b in range(12)]
print(f" {label}: {' '.join(parts)}")
def augment_training_data(X_train, y_train, rng, aug_factor=3.0):
"""
Three-phase data augmentation:
1. Mirror augmentation — fill deficit between mirror-paired shapes
2. Rarity-weighted — samples with rare bit values get more copies (shift+dropout)
3. Y-type boost — repeat phases 1-2 scoped to Y-type samples only
"""
n0 = len(X_train)
_print_bit_freq(y_train, 'Before')
# Phase 1: Mirror augmentation
X_train, y_train = _do_mirror_augmentation(X_train, y_train, rng)
n1 = len(X_train)
# Phase 2: Rarity-weighted augmentation — target aug_factor × original size
target_new = int(n0 * aug_factor) - n1
X_train, y_train = _do_rarity_augmentation(X_train, y_train, rng, target_new)
n2 = len(X_train)
# Phase 3: Y-type boost — same pipeline for Y-type subset only
ytype_mask = y_train[:, 10] > 0.5
n_ytype_existing = int(ytype_mask.sum())
if n_ytype_existing > 0:
X_yt = X_train[ytype_mask]
y_yt = y_train[ytype_mask]
X_yt, y_yt = _do_mirror_augmentation(X_yt, y_yt, rng)
# Double the Y-type subset via rarity augmentation
yt_new = n_ytype_existing
X_yt, y_yt = _do_rarity_augmentation(X_yt, y_yt, rng, yt_new)
if len(X_yt) > n_ytype_existing:
X_train = np.concatenate([X_train, X_yt[n_ytype_existing:]])
y_train = np.concatenate([y_train, y_yt[n_ytype_existing:]])
n3 = len(X_train)
_print_bit_freq(y_train, 'After ')
    print(f"Data augmentation: {n0} → {n3} samples ({n3/n0:.1f}×)")
print(f" Mirror: +{n1 - n0}, Rarity: +{n2 - n1}, Y-type boost: +{n3 - n2}")
return X_train, y_train
# ---- Main ----
def main():
parser = argparse.ArgumentParser(description='Train Autokem model (PyTorch)')
parser.add_argument('--assets', default='../src/assets',
help='Path to assets directory (default: ../src/assets)')
parser.add_argument('--save', default='autokem.safetensors',
help='Output safetensors path (default: autokem.safetensors)')
parser.add_argument('--load', default=None,
help='Load weights from safetensors before training')
parser.add_argument('--epochs', type=int, default=200, help='Max epochs (default: 200)')
parser.add_argument('--batch-size', type=int, default=32, help='Batch size (default: 32)')
parser.add_argument('--lr', type=float, default=0.001, help='Learning rate (default: 0.001)')
parser.add_argument('--patience', type=int, default=10,
help='Early stopping patience (default: 10)')
parser.add_argument('--val-split', type=float, default=0.2,
help='Validation split (default: 0.2)')
parser.add_argument('--no-augment', action='store_true',
help='Disable data augmentation')
parser.add_argument('--aug-factor', type=float, default=3.0,
help='Augmentation target multiplier (default: 3.0)')
args = parser.parse_args()
import torch
import torch.nn as nn
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")
# Collect data
print("Collecting samples...")
X, y, file_count = collect_all_samples(args.assets)
if len(X) < 10:
print(f"Error: too few samples ({len(X)})", file=sys.stderr)
return 1
total = len(X)
print(f"Collected {total} samples from {file_count} sheets")
print_label_distribution(y, total)
nonzero = np.any(X.reshape(total, -1) > 0.5, axis=1).sum()
print(f" Non-empty inputs: {nonzero}/{total}\n")
# Shuffle and split
rng = np.random.default_rng(42)
perm = rng.permutation(total)
X, y = X[perm], y[perm]
n_val = int(total * args.val_split)
n_train = total - n_val
X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]
print(f"Train: {n_train}, Validation: {n_val}")
# Data augmentation (training set only)
if not args.no_augment:
X_train, y_train = augment_training_data(X_train, y_train, rng, args.aug_factor)
n_train = len(X_train)
print()
# Convert to tensors — PyTorch conv expects [N, C, H, W]
X_train_t = torch.from_numpy(X_train[:, np.newaxis, :, :]).to(device) # [N,1,20,15]
y_train_t = torch.from_numpy(y_train).to(device)
X_val_t = torch.from_numpy(X_val[:, np.newaxis, :, :]).to(device)
y_val_t = torch.from_numpy(y_val).to(device)
# Build model
model = build_model().to(device)
if args.load:
load_safetensors(model, args.load)
total_params = sum(p.numel() for p in model.parameters())
print(f"Model parameters: {total_params} ({total_params * 4 / 1024:.1f} KB)\n")
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
loss_fn = nn.BCELoss()
best_val_loss = float('inf')
best_epoch = 0
patience_counter = 0
best_state = None
for epoch in range(1, args.epochs + 1):
# Training
model.train()
perm_train = torch.randperm(n_train, device=device)
train_loss = 0.0
n_batches = 0
for start in range(0, n_train, args.batch_size):
end = min(start + args.batch_size, n_train)
idx = perm_train[start:end]
optimizer.zero_grad()
pred = model(X_train_t[idx])
loss = loss_fn(pred, y_train_t[idx])
loss.backward()
optimizer.step()
train_loss += loss.item()
n_batches += 1
train_loss /= n_batches
# Validation
model.eval()
with torch.no_grad():
val_pred = model(X_val_t)
val_loss = loss_fn(val_pred, y_val_t).item()
marker = ''
if val_loss < best_val_loss:
best_val_loss = val_loss
best_epoch = epoch
patience_counter = 0
best_state = {k: v.clone() for k, v in model.state_dict().items()}
marker = ' *best*'
else:
patience_counter += 1
print(f"Epoch {epoch:3d}: train_loss={train_loss:.4f} val_loss={val_loss:.4f}{marker}")
if patience_counter >= args.patience:
print(f"\nEarly stopping at epoch {epoch} (best epoch: {best_epoch})")
break
# Restore best weights
if best_state is not None:
model.load_state_dict(best_state)
print(f"\nBest epoch: {best_epoch}, val_loss: {best_val_loss:.6f}")
# Print accuracy
model.eval()
print_examples_and_accuracy(model, X_val_t, y_val, max_examples=8)
# Save
export_safetensors(model, args.save, total, best_epoch, best_val_loss)
return 0
if __name__ == '__main__':
sys.exit(main())
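The safetensors container written by export_safetensors is simple enough to inspect by hand: an 8-byte little-endian header length, the space-padded JSON header, then the raw F32 tensor data. A sketch of a header-only reader, assuming only the format shown above:

```python
import json
import struct

def read_safetensors_header(path):
    """Read just the JSON header of a safetensors file (8-byte little-endian
    length prefix, then JSON padded with spaces to an 8-byte boundary)."""
    with open(path, 'rb') as f:
        (header_len,) = struct.unpack('<Q', f.read(8))
        return json.loads(f.read(header_len))
```

For the model above, `read_safetensors_header('autokem.safetensors')['conv1.weight']['shape']` should report `[32, 1, 7, 7]`.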

Autokem/unicode_filter.h Normal file

@@ -0,0 +1,191 @@
#ifndef UNICODE_FILTER_H
#define UNICODE_FILTER_H
#include <string.h>
/*
* Unicode category filters for training/apply.
* Generated from Python unicodedata (Unicode 16.0).
*
* is_modifier_letter(cp) — category Lm
* is_subscript_modifier(cp) — Lm with <sub> decomposition
* is_symbol_or_punctuation(cp) — categories S* or P*
* is_excluded_from_training(cp) — Lm or S* or P*
*/
/* ---- Lm (modifier letter) ---- */
static inline int is_modifier_letter(int cp) {
if (cp >= 0x02B0 && cp <= 0x02C1) return 1;
if (cp >= 0x02C6 && cp <= 0x02D1) return 1;
if (cp >= 0x02E0 && cp <= 0x02E4) return 1;
if (cp == 0x02EC) return 1;
if (cp == 0x02EE) return 1;
if (cp == 0x0374) return 1;
if (cp == 0x037A) return 1;
if (cp == 0x0559) return 1;
if (cp == 0x0640) return 1;
if (cp >= 0x06E5 && cp <= 0x06E6) return 1;
if (cp >= 0x07F4 && cp <= 0x07F5) return 1;
if (cp == 0x07FA) return 1;
if (cp == 0x081A) return 1;
if (cp == 0x0824) return 1;
if (cp == 0x0828) return 1;
if (cp == 0x08C9) return 1;
if (cp == 0x0971) return 1;
if (cp == 0x0E46) return 1;
if (cp == 0x0EC6) return 1;
if (cp == 0x10FC) return 1;
if (cp == 0x17D7) return 1;
if (cp == 0x1843) return 1;
if (cp == 0x1AA7) return 1;
if (cp >= 0x1C78 && cp <= 0x1C7D) return 1;
if (cp >= 0x1D2C && cp <= 0x1D6A) return 1;
if (cp == 0x1D78) return 1;
if (cp >= 0x1D9B && cp <= 0x1DBF) return 1;
if (cp == 0x2071) return 1;
if (cp == 0x207F) return 1;
if (cp >= 0x2090 && cp <= 0x209C) return 1;
if (cp >= 0x2C7C && cp <= 0x2C7D) return 1;
if (cp == 0x2D6F) return 1;
if (cp == 0x2E2F) return 1;
if (cp == 0x3005) return 1;
if (cp >= 0x3031 && cp <= 0x3035) return 1;
if (cp == 0x303B) return 1;
if (cp >= 0x309D && cp <= 0x309E) return 1;
if (cp >= 0x30FC && cp <= 0x30FE) return 1;
if (cp == 0xA015) return 1;
if (cp >= 0xA4F8 && cp <= 0xA4FD) return 1;
if (cp == 0xA60C) return 1;
if (cp == 0xA67F) return 1;
if (cp >= 0xA69C && cp <= 0xA69D) return 1;
if (cp >= 0xA717 && cp <= 0xA71F) return 1;
if (cp == 0xA770) return 1;
if (cp == 0xA788) return 1;
if (cp >= 0xA7F2 && cp <= 0xA7F4) return 1;
if (cp >= 0xA7F8 && cp <= 0xA7F9) return 1;
if (cp == 0xA9CF) return 1;
if (cp == 0xA9E6) return 1;
if (cp == 0xAA70) return 1;
if (cp == 0xAADD) return 1;
if (cp >= 0xAAF3 && cp <= 0xAAF4) return 1;
if (cp >= 0xAB5C && cp <= 0xAB5F) return 1;
if (cp == 0xAB69) return 1;
if (cp == 0xFF70) return 1;
if (cp >= 0xFF9E && cp <= 0xFF9F) return 1;
if (cp >= 0x10780 && cp <= 0x10785) return 1;
if (cp >= 0x10787 && cp <= 0x107B0) return 1;
if (cp >= 0x107B2 && cp <= 0x107BA) return 1;
if (cp >= 0x16B40 && cp <= 0x16B43) return 1;
if (cp >= 0x16F93 && cp <= 0x16F9F) return 1;
if (cp >= 0x16FE0 && cp <= 0x16FE1) return 1;
if (cp == 0x16FE3) return 1;
if (cp >= 0x1AFF0 && cp <= 0x1AFF3) return 1;
if (cp >= 0x1AFF5 && cp <= 0x1AFFB) return 1;
if (cp >= 0x1AFFD && cp <= 0x1AFFE) return 1;
if (cp >= 0x1E030 && cp <= 0x1E06D) return 1;
if (cp >= 0x1E137 && cp <= 0x1E13D) return 1;
if (cp == 0x1E4EB) return 1;
if (cp == 0x1E94B) return 1;
return 0;
}
static inline int is_subscript_modifier(int cp) {
if (cp >= 0x1D62 && cp <= 0x1D6A) return 1;
if (cp >= 0x2090 && cp <= 0x209C) return 1;
if (cp == 0x2C7C) return 1;
if (cp >= 0x1E051 && cp <= 0x1E06A) return 1;
return 0;
}
/* ---- S* (Symbol) and P* (Punctuation) ---- */
/* Table of {start, end} ranges for S/P codepoints in font sheets */
static const int sp_ranges[][2] = {
{0x00021, 0x0002F}, {0x0003A, 0x00040}, {0x0005B, 0x00060},
{0x0007B, 0x0007E}, {0x000A1, 0x000A9}, {0x000AB, 0x000AC},
{0x000AE, 0x000B1}, {0x000B4, 0x000B4}, {0x000B6, 0x000B8},
{0x000BB, 0x000BB}, {0x000BF, 0x000BF}, {0x000D7, 0x000D7},
{0x000F7, 0x000F7}, {0x002C2, 0x002C5}, {0x002D2, 0x002DF},
{0x002E5, 0x002EB}, {0x002ED, 0x002ED}, {0x002EF, 0x002FF},
{0x00375, 0x00375}, {0x0037E, 0x0037E}, {0x00384, 0x00385},
{0x00387, 0x00387}, {0x00482, 0x00482}, {0x0055A, 0x0055F},
{0x00589, 0x0058A}, {0x0058D, 0x0058F}, {0x00964, 0x00965},
{0x00970, 0x00970}, {0x009F2, 0x009F3}, {0x009FA, 0x009FB},
{0x009FD, 0x009FD}, {0x00BF3, 0x00BFA}, {0x00E3F, 0x00E3F},
{0x00E4F, 0x00E4F}, {0x00E5A, 0x00E5B}, {0x010FB, 0x010FB},
{0x016EB, 0x016ED}, {0x01CC0, 0x01CC7}, {0x01FBD, 0x01FBD},
{0x01FBF, 0x01FC1}, {0x01FCD, 0x01FCF}, {0x01FDD, 0x01FDF},
{0x01FED, 0x01FEF}, {0x01FFD, 0x01FFE}, {0x02010, 0x02027},
{0x02030, 0x0205E}, {0x0207A, 0x0207E}, {0x0208A, 0x0208E},
{0x020A0, 0x020C0}, {0x02100, 0x02101}, {0x02103, 0x02106},
{0x02108, 0x02109}, {0x02114, 0x02114}, {0x02116, 0x02118},
{0x0211E, 0x02123}, {0x02125, 0x02125}, {0x02127, 0x02127},
{0x02129, 0x02129}, {0x0212E, 0x0212E}, {0x0213A, 0x0213B},
{0x02140, 0x02144}, {0x0214A, 0x0214D}, {0x0214F, 0x0214F},
{0x0218A, 0x0218B}, {0x02190, 0x021FF}, {0x02400, 0x02426},
{0x02800, 0x028FF}, {0x03001, 0x03004}, {0x03008, 0x03020},
{0x03030, 0x03030}, {0x03036, 0x03037}, {0x0303D, 0x0303F},
{0x0309B, 0x0309C}, {0x030A0, 0x030A0}, {0x030FB, 0x030FB},
{0x04DC0, 0x04DFF}, {0x0A673, 0x0A673}, {0x0A67E, 0x0A67E},
{0x0A720, 0x0A721}, {0x0A789, 0x0A78A}, {0x0AB5B, 0x0AB5B},
{0x0AB6A, 0x0AB6B}, {0x0FF01, 0x0FF0F}, {0x0FF1A, 0x0FF20},
{0x0FF3B, 0x0FF40}, {0x0FF5B, 0x0FF65}, {0x0FFE0, 0x0FFE6},
{0x0FFE8, 0x0FFEE}, {0x0FFFC, 0x0FFFD}, {0x1F10D, 0x1F1AD},
{0x1F1E6, 0x1F1FF}, {0x1FB00, 0x1FB92}, {0x1FB94, 0x1FBCA},
};
static inline int is_symbol_or_punctuation(int cp) {
int n = (int)(sizeof(sp_ranges) / sizeof(sp_ranges[0]));
for (int i = 0; i < n; i++) {
if (cp >= sp_ranges[i][0] && cp <= sp_ranges[i][1])
return 1;
}
return 0;
}
/* ---- Combined filter for training exclusion ---- */
static inline int is_excluded_from_training(int cp) {
return is_modifier_letter(cp) || is_symbol_or_punctuation(cp);
}
/* ---- Sheet filename → start codepoint ---- */
/* note: "cyrilic" (sic) matches the on-disk sprite sheet filenames */
static int sheet_start_code(const char *basename) {
    if (strstr(basename, "ascii_variable")) return 0x00;
    if (strstr(basename, "latinExtA_variable")) return 0x100;
    if (strstr(basename, "latinExtB_variable")) return 0x180;
    if (strstr(basename, "cyrilic_extC_variable")) return 0x1C80;
    if (strstr(basename, "cyrilic_extB_variable")) return 0xA640;
    if (strstr(basename, "cyrilic_bulgarian_variable")) return 0xF0000;
    if (strstr(basename, "cyrilic_serbian_variable")) return 0xF0060;
    if (strstr(basename, "cyrilic_variable")) return 0x400;
    if (strstr(basename, "halfwidth_fullwidth_variable")) return 0xFF00;
    if (strstr(basename, "unipunct_variable")) return 0x2000;
    if (strstr(basename, "greek_polytonic")) return 0x1F00;
    if (strstr(basename, "greek_variable")) return 0x370;
    if (strstr(basename, "thai_variable")) return 0xE00;
    if (strstr(basename, "hayeren_variable")) return 0x530;
    if (strstr(basename, "kartuli_allcaps_variable")) return 0x1C90;
    if (strstr(basename, "kartuli_variable")) return 0x10D0;
    if (strstr(basename, "ipa_ext_variable")) return 0x250;
    if (strstr(basename, "latinExt_additional_variable")) return 0x1E00;
    if (strstr(basename, "tsalagi_variable")) return 0x13A0;
    if (strstr(basename, "phonetic_extensions_variable")) return 0x1D00;
    if (strstr(basename, "latinExtC_variable")) return 0x2C60;
    if (strstr(basename, "latinExtD_variable")) return 0xA720;
    if (strstr(basename, "internal_variable")) return 0xFFE00;
    if (strstr(basename, "letterlike_symbols_variable")) return 0x2100;
    if (strstr(basename, "enclosed_alphanumeric")) return 0x1F100;
    if (strstr(basename, "sundanese_variable")) return 0x1B80;
    if (strstr(basename, "control_pictures_variable")) return 0x2400;
    if (strstr(basename, "latinExtE_variable")) return 0xAB30;
    if (strstr(basename, "latinExtF_variable")) return 0x10780;
    if (strstr(basename, "latinExtG_variable")) return 0x1DF00;
    if (strstr(basename, "devanagari") && !strstr(basename, "internal"))
        return 0x900;
    return -1;
}
#endif /* UNICODE_FILTER_H */


@@ -1,13 +1,15 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <module type="JAVA_MODULE" version="4">
-  <component name="NewModuleRootManager" inherit-compiler-output="true">
+  <component name="NewModuleRootManager">
+    <output url="file://$MODULE_DIR$/out/production/BuildJAR_TerrarumSansBitmap" />
+    <output-test url="file://$MODULE_DIR$/out/test/BuildJAR_TerrarumSansBitmap" />
     <exclude-output />
     <content url="file://$MODULE_DIR$">
       <sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
     </content>
-    <orderEntry type="jdk" jdkName="1.8.0_242" jdkType="JavaSDK" />
+    <orderEntry type="inheritedJdk" />
     <orderEntry type="sourceFolder" forTests="false" />
     <orderEntry type="library" name="KotlinJavaRuntime" level="project" />
-    <orderEntry type="library" scope="PROVIDED" name="lib" level="project" />
+    <orderEntry type="library" name="lib" level="project" />
   </component>
 </module>

CLAUDE.md (new file, 82 lines)

@@ -0,0 +1,82 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Commands
### Building the JAR
The project uses IntelliJ IDEA project files (`.iml`) for building. Build artifacts:
- Main library JAR: `lib/TerrarumSansBitmap.jar`
- Font test application JAR: `FontDemoGDX.jar`
### Testing Font Rendering
Run the font test application:
```bash
java -jar FontDemoGDX.jar
```
The test application demonstrates font rendering with text from `demotext_unaligned.txt` and outputs to `demo.PNG`.
### Key Development Files
- **Source code**: `src/net/torvald/terrarumsansbitmap/`
- **Font assets**: `assets/` directory (TGA format with alpha channel)
- **Test text**: `demotext.txt`, `demotext_unaligned.txt`, `testtext.txt`
- **Demo output**: Generated PNG files for visual verification
## Architecture Overview
### Core Components
**TerrarumSansBitmap** (`src/net/torvald/terrarumsansbitmap/gdx/TerrarumSansBitmap.kt`)
- Main font class extending LibGDX's BitmapFont
- Handles font asset loading from TGA sprite sheets
- Manages variable-width character rendering with complex glyph tagging system
- Supports multiple writing systems (Latin, CJK, Cyrillic, etc.)
**MovableType** (`src/net/torvald/terrarumsansbitmap/MovableType.kt`)
- Advanced typesetting engine with justified text layout
- Implements line-breaking, hyphenation, and kerning
- Supports multiple typesetting strategies (justified, ragged, centered)
- Handles complex text shaping for international scripts
**GlyphProps** (`src/net/torvald/terrarumsansbitmap/GlyphProps.kt`)
- Defines glyph properties including width, diacritics anchors, alignment
- Manages kerning data and special rendering directives
- Handles complex glyph tagging system for font behavior
### Font Asset System
**Glyph Encoding**
- Font data stored in TGA sprite sheets with embedded metadata
- Width encoded in binary dots on rightmost column
- Complex tagging system for diacritics, kerning, and special behaviors
- Variable-width sheets use `_variable` naming convention
**Character Support**
- Latin scripts with full diacritics support
- CJK ideographs (Chinese variant)
- Korean Hangul with syllable composition
- Cyrillic with Bulgarian/Serbian variants (requires control characters U+FFFC1, U+FFFC2)
- Devanagari, Tamil with ligature support
- Many other scripts (see assets directory)
**Typewriter Font**
- Separate typewriter bitmap font in `src/net/torvald/terrarumtypewriterbitmap/`
- Includes audio feedback system with typing sounds
- Supports international QWERTY and Korean 3-set layouts
### Key Technical Details
**Color Coding System**
- Uses Unicode private use area for color codes
- Utility functions: `GameFontBase.toColorCode()` for ARGB4444 format
- U+100000 disables color codes
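A hypothetical sketch of such a packing. The actual `GameFontBase.toColorCode()` signature and PUA layout are not shown in this excerpt; this only assumes, from the notes above, that a 16-bit ARGB4444 value is offset into plane-16 PUA (which starts at U+100000), with U+100000 itself meaning "color codes off":

```python
# HYPOTHETICAL: assumes ARGB4444 maps into plane-16 PUA at U+100000,
# so the all-zero code U+100000 doubles as "disable color codes".
def to_color_code(a: int, r: int, g: int, b: int) -> int:
    for v in (a, r, g, b):
        if not 0 <= v <= 0xF:
            raise ValueError("each ARGB4444 channel is 4 bits")
    return 0x100000 | (a << 12) | (r << 8) | (g << 4) | b
```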
**Korean Hangul Assembly**
- Decomposes Unicode Hangul into jamo components
- Assembles glyphs from initial/medial/final sprite pieces
- Supports modern Hangul range (U+AC00-U+D7A3)
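The decomposition above is standard Unicode arithmetic: every precomposed syllable in U+AC00..U+D7A3 factors into initial (19) × medial (21) × final (28, including "none") jamo indices, giving 11,172 combinations. A minimal sketch (not the engine's actual API):

```python
# Standard Unicode Hangul syllable arithmetic.
def decompose_hangul(cp: int):
    if not 0xAC00 <= cp <= 0xD7A3:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    s = cp - 0xAC00
    # 588 = 21 medials * 28 finals; 28 = number of final slots
    return s // 588, (s % 588) // 28, s % 28  # (initial, medial, final) indices

# U+D55C HAN (한) -> initial 18 (ㅎ), medial 0 (ㅏ), final 4 (ㄴ)
```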
**Font Metrics**
- Variable-width sheets parse glyph tags from sprite metadata
- Fixed-width sheets: `cjkpunct` (10px), `kana`/`hangul_johab` (12px), `wenquanyi` (16px)
- Diacritics positioning via anchor point system


@@ -46,8 +46,8 @@ Rightmost vertical column (should be 20 px tall) contains the tags. Tags are def
 W |= Width of the character
 W |
 W -'
-m --Is this character lowheight?
 K -,
+K |
 K |= Tags used by the "Keming Machine"
 K -'
 Q ---Compiler Directive (see below)
@@ -77,29 +77,32 @@ Up&Down:
 <MSB,Red> SXXXXXXX SYYYYYYY 00000000 <LSB,Blue>
-Each X and Y numbers are Signed 8-Bit Integer.
+Each X and Y numbers are TWO'S COMPLEMENT Signed 8-Bit Integer.
 X-positive: nudges towards left
-Y-positive: nudges towards up
+Y-positive: nudges towards down
 #### Diacritics Anchor Point Encoding
 4 Pixels are further divided as follows:
 | LSB | | Red | Green | Blue |
 | ------------ | ------------ | ------------ | ------------ | ------------ |
 | Y | Anchor point Y for: | undefined | undefined | undefined |
 | X | Anchor point X for: | undefined | undefined | undefined |
-| Y | Anchor point Y for: | (unused) | (unused) | (unused) |
+| Y | Anchor point Y for: | Type-0 | Type-1 | Type-2 |
 | X | Anchor point X for: | Type-0 | Type-1 | Type-2 |
 | **MSB** | | | | |
-<MSB,Red> 1Y1Y1Y1Y 1Y2Y2Y2Y 1Y3Y3Y3Y <LSB,Blue>
-<MSB,Red> 1X1X1X1X 1X2X2X2X 1X3X3X3X <LSB,Blue>
+<MSB,Red> 1Y1Y1Y1Y 2Y2Y2Y2Y 3Y3Y3Y3Y <LSB,Blue>
+<MSB,Red> 1X1X1X1X 2X2X2X2X 3X3X3X3X <LSB,Blue>
 where Red is first, Green is second, Blue is the third diacritics.
-MSB for each word must be set so that the pixel would appear brighter on the image editor.
-(the font program will only read low 7 bits for each RGB channel)
+Each X and Y numbers are SIGN AND MAGNITUDE 8-Bit Integer.
+X-positive: nudges towards left
+Y-positive: nudges towards down
 #### Diacritics Type Bit Encoding
@@ -150,7 +153,7 @@ To implement those, this two extra code points are needed, which are provided in
 For working examples, take a note at the bengali sprite sheet.
-This tag can be used as a general "replace this with these" directive, as long as you're replacing it into two letters. This directive is exploited to construct dutch ligature "IJ" (U+0132 and U+0133), in the sheet LatinExtA.
+This tag might be exploited as a general "replace this with these" directive, as long as you're replacing it into two letters. Such construction is FORBIDDEN due to diacritics incompatibility. Use Compiler Directives for such purposes.
 Also note that the font compiler will not "stack" these diacritics.
@@ -170,7 +173,7 @@ Keming Machine Tags define the rough shape of the glyph. Please read `keming_mac
 ## Technical Limitations
 - Each spritesheet is 4096x4096 maximum, which is a size of 4K Texture. However it is recommended to be smaller or equal to 1024x1024.
-- Glyphs exceeding 15px of width needs to be broken down with 2 or more characters. Wider sheets WILL NOT BE IMPLEMENTED, can't waste much pixels just for few superwide glyphs.
+- Glyphs exceeding 15px of width needs to be broken down with 2 or more characters, or use EXTRAWIDE spritesheets.
 - Due to how the compiler is coded, actual glyph must have alpha value of 255, the tags must have alpha values LESS THAN 255 (and obviously greater than zero). RGB plane of the TGA image doesn't do anything, keep it as #FFFFFF white.
 ## Implementation of the Korean writing system


@@ -5,9 +5,9 @@
   <content url="file://$MODULE_DIR$">
     <sourceFolder url="file://$MODULE_DIR$/src" isTestSource="false" />
   </content>
-  <orderEntry type="jdk" jdkName="1.8.0_242" jdkType="JavaSDK" />
+  <orderEntry type="inheritedJdk" />
   <orderEntry type="sourceFolder" forTests="false" />
-  <orderEntry type="module" module-name="BuildJAR_TerrarumSansBitmap" />
+  <orderEntry type="module" module-name="BuildJAR_TerrarumSansBitmap" scope="PROVIDED" />
   <orderEntry type="library" name="KotlinJavaRuntime" level="project" />
   <orderEntry type="library" name="lib" level="project" />
 </component>


@@ -45,7 +45,7 @@ class FontTestGDX : Game() {
 private lateinit var testtex: TextureRegion
 override fun create() {
-    font = TerrarumSansBitmap("./assets", debug = true, flipY = false, errorOnUnknownChar = false, shadowAlpha = 0.5f) // must test for two flipY cases
+    font = TerrarumSansBitmap(debug = true, flipY = false, errorOnUnknownChar = false, shadowAlpha = 0.5f) // must test for two flipY cases
 // font.scale = 2
 // font.interchar = 1


@@ -39,7 +39,6 @@ class TypewriterGDX(val width: Int, val height: Int, val cols: Int, val hmargin:
 override fun create() {
     font = TerrarumTypewriterBitmap(
-        "./assets/typewriter",
         StringReader(
             """ko_kr_3set-390_typewriter,typewriter_ko_3set-390.tga,16
             |en_intl_qwerty_typewriter,typewriter_intl_qwerty.tga,0
@@ -61,17 +60,17 @@ class TypewriterGDX(val width: Int, val height: Int, val cols: Int, val hmargin:
 inputStrober = InputStrober(this)
 try {
-    sndMovingkey = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/movingkey.wav"))
-    sndDeadkey = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/deadkey.wav"))
-    sndShiftin = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/shiftin.wav"))
-    sndShiftout = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/shiftout.wav"))
-    sndSpace = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/space.wav"))
+    sndMovingkey = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/movingkey.wav"))
+    sndDeadkey = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/deadkey.wav"))
+    sndShiftin = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/shiftin.wav"))
+    sndShiftout = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/shiftout.wav"))
+    sndSpace = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/space.wav"))
     sndCRs = Array(6) {
-        Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/cr$it.wav"))
+        Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/cr$it.wav"))
     }
-    sndLF = Gdx.audio.newSound(Gdx.files.internal("assets/typewriter/audio/crlf.wav"))
+    sndLF = Gdx.audio.newSound(Gdx.files.classpath("assets/typewriter/audio/crlf.wav"))
 }
 catch (e: GdxRuntimeException) {
     e.printStackTrace()


@@ -1,19 +1,54 @@
-Copyright (c) 2017-2024 CuriousTorvald (minjaesong)
+Copyright (c) 2017-2026 CuriousTorvald (curioustorvald.com), with Reserved Font Name TERRARUM.
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+This Font Software is licensed under the SIL Open Font License, Version 1.1.
+This license is copied below, and is also available with a FAQ at:
+https://openfontlicense.org
+
+-———————————————————————
+SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
+-———————————————————————
+
+PREAMBLE
+The goals of the Open Font License (OFL) are to stimulate worldwide development of collaborative font projects, to support the font creation efforts of academic and linguistic communities, and to provide a free and open framework in which fonts may be shared and improved in partnership with others.
+The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves. The fonts, including any derivative works, can be bundled, embedded, redistributed and/or sold with any software provided that any reserved names are not used by derivative works. The fonts and derivatives, however, cannot be released under any other type of license. The requirement for fonts to remain under this license does not apply to any document created using the fonts or their derivatives.
+DEFINITIONS
+“Font Software” refers to the set of files released by the Copyright Holder(s) under this license and clearly marked as such. This may include source files, build scripts and documentation.
+“Reserved Font Name” refers to any names specified as such after the copyright statement(s).
+“Original Version” refers to the collection of Font Software components as distributed by the Copyright Holder(s).
+“Modified Version” refers to any derivative made by adding to, deleting, or substituting in part or in whole any of the components of the Original Version, by changing formats or by porting the Font Software to a new environment.
+“Author” refers to any designer, engineer, programmer, technical writer or other person who contributed to the Font Software.
+PERMISSION & CONDITIONS
+Permission is hereby granted, free of charge, to any person obtaining a copy of the Font Software, to use, study, copy, merge, embed, modify, redistribute, and sell modified and unmodified copies of the Font Software, subject to the following conditions:
+Neither the Font Software nor any of its individual components, in Original or Modified Versions, may be sold by itself.
+Original or Modified Versions of the Font Software may be bundled, redistributed and/or sold with any software, provided that each copy contains the above copyright notice and this license. These can be included either as stand-alone text files, human-readable headers or in the appropriate machine-readable metadata fields within text or binary files as long as those fields can be easily viewed by the user.
+No Modified Version of the Font Software may use the Reserved Font Name(s) unless explicit written permission is granted by the corresponding Copyright Holder. This restriction only applies to the primary font name as presented to the users.
+The name(s) of the Copyright Holder(s) or the Author(s) of the Font Software shall not be used to promote, endorse or advertise any Modified Version, except to acknowledge the contribution(s) of the Copyright Holder(s) and the Author(s) or with their explicit written permission.
+The Font Software, modified or unmodified, in part or in whole, must be distributed entirely under this license, and must not be distributed under any other license. The requirement for fonts to remain under this license does not apply to any document created using the Font Software.
+TERMINATION
+This license becomes null and void if any of the above conditions are not met.
+DISCLAIMER
+THE FONT SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER DEALINGS IN THE FONT SOFTWARE.

OTFbuild/CLAUDE.md (new file, 328 lines)

@@ -0,0 +1,328 @@
# OTFbuild
Python toolchain that builds an OpenType (CFF) and Web Open Font (WOFF2) font from the TGA sprite sheets used by the bitmap font engine.
## Building
```bash
# builds both OTF and WOFF2
make all
```
## Debugging with HarfBuzz
Install `uharfbuzz` for shaping tests:
```bash
pip install uharfbuzz
```
Shape text and inspect glyph substitutions, advances, and positioning:
```python
import uharfbuzz as hb
from fontTools.ttLib import TTFont
with open('OTFbuild/TerrarumSansBitmap.otf', 'rb') as f:
font_data = f.read()
blob = hb.Blob(font_data)
face = hb.Face(blob)
font = hb.Font(face)
text = "ऐतिहासिक"
buf = hb.Buffer()
buf.add_str(text)
buf.guess_segment_properties()
hb.shape(font, buf)
ttfont = TTFont('OTFbuild/TerrarumSansBitmap.otf')
glyph_order = ttfont.getGlyphOrder()
for info, pos in zip(buf.glyph_infos, buf.glyph_positions):
name = glyph_order[info.codepoint]
print(f" {name} advance=({pos.x_advance},{pos.y_advance}) cluster={info.cluster}")
```
Key things to check:
- **advance=(0,0)** on a visible character means the glyph is zero-width (likely missing outline or failed GSUB substitution)
- **glyph name starts with `uF0`** means GSUB substituted to an internal PUA form (expected for Devanagari consonants, Hangul jamo variants, etc.)
- **cluster** groups glyphs that originated from the same input character(s)
### Inspecting GSUB tables
```python
from fontTools.ttLib import TTFont
font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
gsub = font['GSUB']
# List scripts and their features
for sr in gsub.table.ScriptList.ScriptRecord:
tag = sr.ScriptTag
if sr.Script.DefaultLangSys:
for idx in sr.Script.DefaultLangSys.FeatureIndex:
fr = gsub.table.FeatureList.FeatureRecord[idx]
print(f" {tag}/{fr.FeatureTag}: lookups={fr.Feature.LookupListIndex}")
# Inspect a specific lookup's substitution mappings
lookup = gsub.table.LookupList.Lookup[18] # e.g. DevaConsonantMap
for st in lookup.SubTable:
for src, dst in st.mapping.items():
print(f" {src} -> {dst}")
```
### Checking glyph outlines and metrics
```python
font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
hmtx = font['hmtx']
cff = font['CFF ']
name = 'uni0915' # Devanagari KA
w, lsb = hmtx[name]
cs = cff.cff.topDictIndex[0].CharStrings[name]
cs.decompile()
has_outlines = len(cs.program) > 2 # more than just width + endchar
print(f"{name}: advance={w}, has_outlines={has_outlines}")
```
## Architecture
### Build pipeline (`font_builder.py`)
1. **Parse sheets** — `glyph_parser.py` reads each TGA sprite sheet, extracts per-glyph bitmaps and tag-column metadata (width, alignment, diacritics anchors, kerning data, directives)
2. **Compose Hangul** — `hangul.py` assembles 11,172 precomposed Hangul syllables from jamo components and stores jamo variants in PUA for GSUB
3. **Populate Devanagari** — consonants U+0915-0939 have width=0 in the sprite sheet (the Kotlin engine normalises them to PUA forms); the builder copies PUA glyph data back to the Unicode positions so they render without GSUB
4. **Expand replacewith** — glyphs with the `replacewith` directive (opcode 0x80-0x87) are collected for GSUB multiple substitution (e.g. U+0910 -> U+090F U+0947)
5. **Build glyph order and cmap** — PUA internal forms (0xF0000-0xF0FFF) get glyphs but no cmap entries
6. **Trace bitmaps** — `bitmap_tracer.py` converts 1-bit bitmaps to CFF rectangle contours (50 units/pixel)
7. **Set metrics** — hmtx, hhea, OS/2, head, name, post tables
8. **OpenType features** — `opentype_features.py` generates feaLib code, compiled via `fontTools.feaLib`
9. **Bitmap strike** — optional EBDT/EBLC at 20ppem via TTX import
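The rectangle tracing in step 6 can be sketched as follows. This is a simplified illustration of the idea (merge identical horizontal runs of set pixels across consecutive rows into axis-aligned rectangles, then scale by 50 units/pixel), not `bitmap_tracer.py`'s actual algorithm, which may merge contours more aggressively:

```python
UPP = 50  # font units per pixel, as used by the build pipeline

def trace_rects(bitmap):
    """bitmap: list of rows, each a list of 0/1 pixels.
    Returns rectangles as (x0, y0, x1, y1) in font units."""
    active = {}  # (x0, x1) horizontal run -> row where the run started
    rects = []   # finished rectangles as ((x0, x1), (y0, y1)) in pixels

    def row_runs(row):
        runs, x = [], 0
        while x < len(row):
            if row[x]:
                x0 = x
                while x < len(row) and row[x]:
                    x += 1
                runs.append((x0, x))
            else:
                x += 1
        return runs

    height = len(bitmap)
    for y in range(height + 1):  # one extra pass to flush remaining runs
        current = set(row_runs(bitmap[y])) if y < height else set()
        for run in list(active):
            if run not in current:           # run ended: emit a rectangle
                rects.append((run, (active.pop(run), y)))
        for run in current:
            active.setdefault(run, y)        # new run starts at this row
    return [(x0 * UPP, y0 * UPP, x1 * UPP, y1 * UPP)
            for (x0, x1), (y0, y1) in rects]
```

A run that changes width between rows starts a new rectangle, so stepped shapes decompose into stacked rectangles, which is exactly what a pixel glyph traced to CFF outlines looks like.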
### Module overview
| Module | Purpose |
|---|---|
| `build_font.py` | CLI entry point |
| `font_builder.py` | Orchestrates the build pipeline |
| `sheet_config.py` | Sheet indices, code ranges, index functions, metric constants, Hangul/Devanagari/Tamil/Sundanese constants |
| `glyph_parser.py` | TGA sprite sheet parsing; extracts bitmaps and tag-column properties |
| `tga_reader.py` | Low-level TGA image reader |
| `bitmap_tracer.py` | Converts 1-bit bitmaps to CFF outlines (rectangle merging) |
| `opentype_features.py` | Generates GSUB/GPOS feature code for feaLib |
| `keming_machine.py` | Generates kerning pairs from glyph kern masks |
| `hangul.py` | Hangul syllable composition and jamo GSUB data |
| `otf2woff2.py` | OTF to WOFF2 wrapper |
### OpenType features generated (`opentype_features.py`)
- **ccmp** — replacewith expansions (DFLT); consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2/deva); vowel decompositions (tml2)
- **kern** — pair positioning from `keming_machine.py`
- **liga** — Latin ligatures (ff, fi, fl, ffi, ffl, st) and Armenian ligatures
- **locl** — Bulgarian/Serbian Cyrillic alternates; Devanagari consonant-to-PUA mapping + vowel decompositions + anusvara upper (dev2/deva, duplicated from ccmp for DirectWrite compatibility)
- **nukt, akhn, half, blwf, cjct, pres, blws, rphf, abvs, psts, calt** — Devanagari complex script shaping (all under both `script dev2` and `script deva`)
- **pres** (tml2) — Tamil consonant+vowel ligatures
- **pres** (sund) — Sundanese diacritic combinations
- **ljmo, vjmo, tjmo** — Hangul jamo positional variants
- **mark** — GPOS mark-to-base diacritics positioning
- **mkmk** — GPOS mark-to-mark diacritics stacking (successive marks shift by H_DIACRITICS)
### Devanagari PUA mapping
The bitmap font engine normalises Devanagari consonants to internal PUA forms before rendering. The OTF builder mirrors this:
| Unicode range | PUA range | Purpose |
|---|---|---|
| U+0915-0939 | 0xF0140-0xF0164 | Base consonants |
| U+0915-0939 +48 | 0xF0170-0xF0194 | Nukta forms (consonant + U+093C) |
| U+0915-0939 +240 | 0xF0230-0xF0254 | Half forms (consonant + virama) |
| U+0915-0939 +480 | 0xF0320-0xF0404 | RA-appended forms (consonant + virama + RA) |
| U+0915-0939 +720 | 0xF0410-0xF04F4 | RA-appended half forms (consonant + virama + RA + virama) |
Mapping formula: `to_deva_internal(c)` = `c - 0x0915 + 0xF0140` for U+0915-0939.
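A sketch of this mapping in Python (helper and constant names are illustrative, not from the codebase); the derived-form offsets follow directly from the table above (0xF0170 − 0xF0140 = 48, and so on):

```python
DEVA_BASE = 0x0915   # U+0915 DEVANAGARI LETTER KA
PUA_BASE = 0xF0140   # internal base-consonant range start

def to_deva_internal(c: int) -> int:
    """Map a Devanagari base consonant (U+0915..U+0939) to its internal PUA form."""
    if not 0x0915 <= c <= 0x0939:
        raise ValueError("base consonants only")
    return c - DEVA_BASE + PUA_BASE

# Derived-form ranges sit at fixed offsets from the base PUA range:
NUKTA_OFF, HALF_OFF, RA_OFF, RA_HALF_OFF = 48, 240, 480, 720
assert PUA_BASE + NUKTA_OFF == 0xF0170    # nukta forms
assert PUA_BASE + HALF_OFF == 0xF0230     # half forms
assert PUA_BASE + RA_OFF == 0xF0320       # RA-appended forms
assert PUA_BASE + RA_HALF_OFF == 0xF0410  # RA-appended half forms
```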
### Script tag gotcha
When a script-specific feature exists in GSUB (e.g. `ccmp` under `dev2`), HarfBuzz uses **only** the script-specific lookups and does **not** fall back to the DFLT script's lookups for that feature. Any substitutions needed for a specific script must be registered under that script's tag.
### languagesystem and language records
The `languagesystem` declarations in the preamble control which script/language records are created in the font tables. Key rules:
- `languagesystem` declarations must be at the **top level** of the feature file, not inside any `feature` block. Putting them inside `feature aalt { }` is invalid feaLib syntax and causes silent compilation failure.
- When a language-specific record exists (e.g. `dev2/MAR` from `languagesystem dev2 MAR;`), features registered under `script dev2;` only populate `dev2/dflt` — they are **not** automatically copied to `dev2/MAR`. The language record inherits only from DFLT, resulting in incomplete feature sets.
- Only declare language-specific records when you have `locl` or other language-differentiated features. Otherwise, use only `languagesystem <script> dflt;` to avoid partial feature inheritance that breaks DirectWrite and CoreText.
### Inspecting feature registration per script
To verify that features are correctly registered under each script:
```python
from fontTools.ttLib import TTFont
font = TTFont('OTFbuild/TerrarumSansBitmap.otf')
gsub = font['GSUB']
for sr in gsub.table.ScriptList.ScriptRecord:
tag = sr.ScriptTag
if sr.Script.DefaultLangSys:
feats = []
for idx in sr.Script.DefaultLangSys.FeatureIndex:
fr = gsub.table.FeatureList.FeatureRecord[idx]
feats.append(fr.FeatureTag)
print(f"{tag}/dflt: {' '.join(sorted(set(feats)))}")
for lsr in (sr.Script.LangSysRecord or []):
feats = []
for idx in lsr.LangSys.FeatureIndex:
fr = gsub.table.FeatureList.FeatureRecord[idx]
feats.append(fr.FeatureTag)
print(f"{tag}/{lsr.LangSysTag}: {' '.join(sorted(set(feats)))}")
```
Expected output for dev2: `dev2/dflt: abvs akhn blwf blws calt ccmp cjct half liga locl nukt pres psts rphf`. If language-specific records (e.g. `dev2/MAR`) appear with only `ccmp liga`, the language records have incomplete feature inheritance — remove the corresponding `languagesystem` declaration.
### Debugging feature compilation failures
The build writes `debugout_features.fea` with the raw feature code before compilation. When compilation fails, inspect this file to find syntax errors. Common issues:
- **`languagesystem` inside a feature block** — must be at the top level
- **Named lookup defined inside a feature block** — applies unconditionally to all input. Define the lookup outside the feature block and reference it via contextual rules inside.
- **Glyph not in font** — a substitution references a glyph name that doesn't exist in the font's glyph order (e.g. a control character was removed)
### HarfBuzz Indic shaper (dev2) feature order
Understanding feature application order is critical for Devanagari debugging:
1. **Pre-reordering** (Unicode order): `ccmp`
2. **Reordering**: HarfBuzz reorders pre-base matras (e.g. I-matra U+093F moves before the consonant)
3. **Post-reordering**: `nukt` → `akhn` → `rphf` → `half` → `blwf` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt`
4. **GPOS**: `kern` → `mark`/`abvm` → `mkmk`
Implication: GSUB rules that need to match pre-base matras adjacent to post-base marks (e.g. anusvara substitution triggered by I-matra) must go in `ccmp`, not `psts`, because reordering separates them.
### Cross-platform shaper differences (DirectWrite, CoreText, HarfBuzz)
The three major shapers behave differently for Devanagari. The font registers all Devanagari features under **both** `dev2` (new Indic) and `deva` (old Indic) script tags. HarfBuzz and DirectWrite use `dev2`; CoreText uses `deva`.
#### Script tag selection
| Shaper | Script tag used | Indic model |
|---|---|---|
| HarfBuzz | `dev2` | New Indic (ot-indic2) |
| DirectWrite | `dev2` | New Indic |
| CoreText | `deva` | Old Indic |
Both tags must exist, and all GSUB/GPOS features must be registered under both, otherwise CoreText silently breaks.
#### Feature order differences
**HarfBuzz (dev2, reference implementation)**:
1. Pre-reordering: `locl` → `ccmp`
2. Reordering (I-matra moves before consonant, reph moves to end)
3. Post-reordering: `nukt` → `akhn` → `rphf` → `half` → `blwf` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt`
4. GPOS: `kern` → `abvm` → `blwm`
**DirectWrite (dev2)**:
- `locl` → `nukt` → `akhn` → `rphf` → `rkrf` → `blwf` → `half` → `vatu` → `cjct` → `pres` → `abvs` → `blws` → `psts` → `haln` → `calt`
- GPOS: `kern` → `dist` → `abvm` → `blwm`
- **Does NOT apply `ccmp`** for the dev2 script. All lookups that must run before `nukt` (e.g. consonant-to-PUA mapping, anusvara upper) must be registered under `locl` instead.
**CoreText (deva)**:
- Applies `locl` and `ccmp`, but may apply `ccmp` **after** reordering (unlike HarfBuzz).
- Post-reordering features same as above: `nukt` → `akhn` → `rphf` → ... → `abvs` → ... → `psts`
- GPOS: `kern` → `abvm` (+ `mark`/`mkmk` if registered under `deva`)
#### Key behavioural differences
**1. ccmp timing (CoreText vs HarfBuzz)**
HarfBuzz applies `ccmp` in Unicode order (before reordering). CoreText may apply it after reordering. This breaks adjacency-based rules:
```
# In ccmp — works on HarfBuzz (Unicode order: C + matra + anusvara):
sub uni093F uni0902' lookup AnusvaraUpper; # I-matra + anusvara
# After reordering on CoreText: I-matra + [consonants] + anusvara
# The I-matra and anusvara are no longer adjacent → rule fails
```
**Fix**: duplicate these rules in `abvs` (post-reordering) with wildcard gaps:
```
sub uni093F @devaAny uni0902' lookup AnusvaraUpper;
sub uni093F @devaAny @devaAny uni0902' lookup AnusvaraUpper;
```
**2. Reph eligibility testing**
| Shaper | Method |
|---|---|
| HarfBuzz | Pattern-based (RA + halant + consonant at syllable start) |
| DirectWrite | `would_substitute([RA, virama], rphf)` with **Unicode** codepoints |
| CoreText | `would_substitute()` with Unicode codepoints (same as DW) |
The `rphf` feature must include a rule with the Unicode form of RA (`uni0930`), not just the PUA form. Otherwise DW and CT won't detect reph.
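A minimal sketch of such an `rphf` registration, assuming `uF010C` is the internal reph form (the PUA RA codepoint here is purely illustrative):
```
feature rphf {
    # Unicode RA + virama, so would_substitute([RA, virama], rphf)
    # succeeds on DirectWrite and CoreText:
    sub uni0930 uni094D by uF010C;
    # PUA RA variant for the post-locl glyph stream (codepoint illustrative):
    sub uF0155 uni094D by uF010C;
} rphf;
```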
**3. Within-lookup glyph visibility (CoreText)**
In OpenType, a single lookup processes the glyph string left-to-right. Per spec, a substitution at position N should be visible when the lookup reaches position N+1. CoreText appears to **not** propagate substitutions within a single lookup pass to subsequent positions' backtrack context.
Example: two rules in one anonymous lookup:
```
sub @trigger uF010C' lookup ComplexReph; # rule at pos N: uF010C → uF010D
sub uF010D uF016C' lookup AnusvaraLower; # rule at pos N+1: needs uF010D in backtrack
```
On HarfBuzz/DirectWrite, rule 2 sees the updated `uF010D` at position N. On CoreText, it still sees the original `uF010C` → rule 2 fails to match.
**Fix**: split into separate **named lookups** so each runs as an independent pass:
```
lookup AbvsPass1 {
sub @trigger uF010C' lookup ComplexReph;
} AbvsPass1;
lookup AbvsPass2 {
sub uF010D uF016C' lookup AnusvaraLower;
} AbvsPass2;
feature abvs {
script dev2; lookup AbvsPass1; lookup AbvsPass2;
script deva; lookup AbvsPass1; lookup AbvsPass2;
} abvs;
```
**4. GPOS mark stacking heuristics**
When two marks share the same base without MarkToMark, each shaper applies different internal Y adjustments:
| Shaper | Internal Y shift |
|---|---|
| HarfBuzz | 0 (no heuristic) |
| DirectWrite | -100 |
| CoreText | -200 |
No single GPOS Y value satisfies all three. **Fix**: use explicit MarkToMark positioning (e.g. `AnusvaraToComplexReph`) which suppresses shaper heuristics and gives consistent results across all three.
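A sketch of what such an explicit rule could look like; the anchor coordinates and mark class are assumptions, only the lookup name comes from the text above:
```
markClass [uni0902] <anchor 0 760> @ANUSVARA_MARK;

lookup AnusvaraToComplexReph {
    # Attach the anusvara to the complex-reph mark itself, overriding
    # any shaper-internal stacking shift:
    pos mark uF010D <anchor 0 840> mark @ANUSVARA_MARK;
} AnusvaraToComplexReph;
```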
**5. GPOS double-application with dev2+deva**
When both script tags exist, CoreText/DirectWrite may merge lookup lists from both scripts. Inline (anonymous) GPOS rules create separate lookups per script → cumulative positioning doubles. **Fix**: use **named lookups** for all GPOS contextual positioning so both scripts reference the same lookup index.
**6. mark/mkmk feature scoping**
The `mark` and `mkmk` features are registered under `deva` (for CoreText) but **not** `dev2`. Under `dev2`, all mark positioning goes through `abvm` instead. This prevents double-application on HarfBuzz/DirectWrite where `abvm` already contains the same mark/mkmk lookups.
```
# GPOS features per script:
# dev2/dflt: abvm kern
# deva/dflt: abvm kern mark mkmk
```
#### Practical rules
1. **Standalone lookups**: define all substitution/positioning lookups (e.g. `DevaConsonantMap`, `DevaVowelDecomp`, `ComplexReph`) **outside** any feature block, then reference from both `locl`/`ccmp` and script-specific features.
2. **locl mirrors ccmp** for Devanagari: DirectWrite skips `ccmp`, so anything that must run early (consonant mapping, anusvara upper, vowel decomposition) must also be in `locl`.
3. **abvs post-reordering fallbacks**: rules that depend on matra+anusvara adjacency (broken by reordering on CoreText) need wildcard-gap variants in `abvs`.
4. **Separate lookup passes**: if rule B's backtrack context depends on rule A's output at an adjacent position, put them in separate named lookups. CoreText may not propagate within-pass substitutions.
5. **Named GPOS lookups**: all contextual GPOS rules must use named lookups to avoid double-application across dev2/deva.
6. **MarkToMark for multi-mark stacking**: never rely on shaper heuristics for positioning multiple marks on the same base — always provide explicit MarkToMark.
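Rule 5 in FEA form, as a hedged sketch (the glyph name and adjustment value are illustrative):
```
# One named lookup, one lookup index — even if a shaper merges the
# dev2 and deva lookup lists, the adjustment is applied only once:
lookup RephKern {
    pos uF010D <0 40 0 0>;
} RephKern;

feature abvm {
    script dev2;
    lookup RephKern;
    script deva;
    lookup RephKern;
} abvm;
```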
Source: [Microsoft Devanagari shaping spec](https://learn.microsoft.com/en-us/typography/script-development/devanagari)

OTFbuild/Makefile (new file, 19 lines)

@@ -0,0 +1,19 @@
PYTHON ?= python3
ASSETS ?= ../src/assets
OTF = TerrarumSansBitmap.otf
WOFF2 = TerrarumSansBitmap.woff2

all: $(OTF) $(WOFF2)

$(OTF): $(wildcard $(ASSETS)/*.tga) build_font.py font_builder.py glyph_parser.py \
		bitmap_tracer.py tga_reader.py keming_machine.py hangul.py sheet_config.py \
		opentype_features.py
	$(PYTHON) build_font.py $(ASSETS) -o $@

$(WOFF2): $(OTF) otf2woff2.py
	$(PYTHON) otf2woff2.py $< $@

clean:
	rm -f $(OTF) $(WOFF2)

.PHONY: all clean

OTFbuild/bitmap_tracer.py (new file, 152 lines)

@@ -0,0 +1,152 @@
"""
Convert 1-bit bitmap arrays to CFF outlines by tracing connected pixel blobs.
Each connected component of filled pixels becomes a single closed contour
(plus additional contours for any holes). Adjacent collinear edges are
merged, minimising vertex count.
Scale: x = col * SCALE, y = (BASELINE_ROW - row) * SCALE
where BASELINE_ROW = 16 (ascent in pixels).
"""
from typing import List, Tuple
import sheet_config as SC
SCALE = SC.SCALE
BASELINE_ROW = 16 # pixels from top to baseline
def _turn_priority(in_dx, in_dy, out_dx, out_dy):
"""
Return priority for outgoing direction relative to incoming.
Lower = preferred (rightmost turn in y-down grid coordinates).
This produces outer contours that are CW in font coordinates (y-up)
and hole contours that are CCW, matching the non-zero winding rule.
"""
# Normalise to unit directions
nidx = (1 if in_dx > 0 else -1) if in_dx else 0
nidy = (1 if in_dy > 0 else -1) if in_dy else 0
ndx = (1 if out_dx > 0 else -1) if out_dx else 0
ndy = (1 if out_dy > 0 else -1) if out_dy else 0
# Right turn in y-down coords: (-in_dy, in_dx)
if (ndx, ndy) == (-nidy, nidx):
return 0
# Straight
if (ndx, ndy) == (nidx, nidy):
return 1
# Left turn: (in_dy, -in_dx)
if (ndx, ndy) == (nidy, -nidx):
return 2
# U-turn
return 3
def _simplify(contour):
"""Remove collinear intermediate vertices from a rectilinear contour."""
n = len(contour)
if n < 3:
return contour
result = []
for i in range(n):
p = contour[(i - 1) % n]
c = contour[i]
q = contour[(i + 1) % n]
# Cross product of consecutive edge vectors
if (c[0] - p[0]) * (q[1] - c[1]) - (c[1] - p[1]) * (q[0] - c[0]) != 0:
result.append(c)
return result if len(result) >= 3 else contour
def trace_bitmap(bitmap, glyph_width_px):
"""
Convert a bitmap to polygon contours by tracing connected pixel blobs.
Returns a list of contours, where each contour is a list of (x, y)
tuples in font units. Outer contours are clockwise, hole contours
counter-clockwise (non-zero winding rule).
"""
if not bitmap or not bitmap[0]:
return []
h = len(bitmap)
w = len(bitmap[0])
def filled(r, c):
return 0 <= r < h and 0 <= c < w and bitmap[r][c]
# -- Step 1: collect directed boundary edges --
# Pixel (r, c) occupies grid square (c, r)-(c+1, r+1).
# Edge direction keeps the filled region to the left (in y-down coords).
edge_map = {} # start_vertex -> [end_vertex, ...]
for r in range(h):
for c in range(w):
if not bitmap[r][c]:
continue
if not filled(r - 1, c): # top boundary
edge_map.setdefault((c, r), []).append((c + 1, r))
if not filled(r + 1, c): # bottom boundary
edge_map.setdefault((c + 1, r + 1), []).append((c, r + 1))
if not filled(r, c - 1): # left boundary
edge_map.setdefault((c, r + 1), []).append((c, r))
if not filled(r, c + 1): # right boundary
edge_map.setdefault((c + 1, r), []).append((c + 1, r + 1))
if not edge_map:
return []
# -- Step 2: trace contours using rightmost-turn rule --
used = set()
contours = []
for sv in sorted(edge_map):
for ev in edge_map[sv]:
if (sv, ev) in used:
continue
path = [sv]
prev, curr = sv, ev
used.add((sv, ev))
while curr != sv:
path.append(curr)
idx, idy = curr[0] - prev[0], curr[1] - prev[1]
candidates = [e for e in edge_map.get(curr, [])
if (curr, e) not in used]
if not candidates:
break
best = min(candidates,
key=lambda e: _turn_priority(
idx, idy, e[0] - curr[0], e[1] - curr[1]))
used.add((curr, best))
prev, curr = curr, best
path = _simplify(path)
if len(path) >= 3:
contours.append([
(x * SCALE, (BASELINE_ROW - y) * SCALE)
for x, y in path
])
return contours
def draw_glyph_to_pen(contours, pen, x_offset=0, y_offset=0):
"""
Draw polygon contours to a T2CharStringPen (or compatible pen).
Each contour is a list of (x, y) vertices forming a closed polygon.
x_offset/y_offset shift all contours (used for alignment positioning).
"""
for contour in contours:
x, y = contour[0]
pen.moveTo((x + x_offset, y + y_offset))
for x, y in contour[1:]:
pen.lineTo((x + x_offset, y + y_offset))
pen.closePath()

OTFbuild/build_font.py (new file, 93 lines)

@@ -0,0 +1,93 @@
#!/usr/bin/env python3
"""
Terrarum Sans Bitmap OTF Builder v2 — Python + fonttools

Builds an OTF font with both vector-traced outlines (CFF charstrings)
and an embedded bitmap strike (EBDT/EBLC) from TGA sprite sheets.

Usage:
    python3 OTFbuild/build_font.py src/assets -o OTFbuild/TerrarumSansBitmap.otf

Options:
    --no-bitmap    Skip EBDT/EBLC bitmap strike
    --no-features  Skip GSUB/GPOS OpenType features
"""
import argparse
import os
import sys

# Add OTFbuild dir to path for imports
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from font_builder import build_font


def main():
    parser = argparse.ArgumentParser(
        description="Build Terrarum Sans Bitmap OTF from TGA sprite sheets"
    )
    parser.add_argument(
        "assets_dir",
        help="Path to assets directory containing TGA sprite sheets"
    )
    parser.add_argument(
        "-o", "--output",
        default="OTFbuild/TerrarumSansBitmap.otf",
        help="Output OTF file path (default: OTFbuild/TerrarumSansBitmap.otf)"
    )
    parser.add_argument(
        "--no-bitmap",
        action="store_true",
        help="Skip EBDT/EBLC bitmap strike"
    )
    parser.add_argument(
        "--no-features",
        action="store_true",
        help="Skip GSUB/GPOS OpenType features"
    )
    args = parser.parse_args()

    if not os.path.isdir(args.assets_dir):
        print(f"Error: assets directory not found: {args.assets_dir}", file=sys.stderr)
        sys.exit(1)

    # Ensure output directory exists
    output_dir = os.path.dirname(args.output)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    print("Terrarum Sans Bitmap OTF Builder v2")
    print(f"  Assets: {args.assets_dir}")
    print(f"  Output: {args.output}")
    print()

    build_font(
        assets_dir=args.assets_dir,
        output_path=args.output,
        no_bitmap=args.no_bitmap,
        no_features=args.no_features,
    )

    # Run OpenType Sanitizer to catch issues browsers would reject
    try:
        import ots
        print("\nRunning OpenType Sanitizer...")
        result = ots.sanitize(args.output, capture_output=True)
        if result.returncode == 0:
            print("  OTS: passed")
        else:
            print(f"  OTS: FAILED (exit code {result.returncode})", file=sys.stderr)
            if result.stderr:
                for line in result.stderr.decode().strip().splitlines():
                    print(f"    {line}", file=sys.stderr)
            sys.exit(1)
    except ImportError:
        print("\nWarning: opentype-sanitizer not installed, skipping OTS validation",
              file=sys.stderr)
        print("  Install with: pip install opentype-sanitizer", file=sys.stderr)


if __name__ == "__main__":
    main()

Binary file not shown.

OTFbuild/font_builder.py (new file, 720 lines)

@@ -0,0 +1,720 @@
"""
Orchestrate fonttools TTFont assembly.
1. Parse all sheets -> glyphs dict
2. Compose Hangul -> add to dict
3. Expand replacewith directives
4. Create glyph order and cmap
5. Trace all bitmaps -> CFF charstrings
6. Set hmtx, hhea, OS/2, head, name, post
7. Generate and compile OpenType features via feaLib
8. Add EBDT/EBLC bitmap strike at ppem=20
9. Save OTF
"""
import time
from typing import Dict
from fontTools.fontBuilder import FontBuilder
from fontTools.pens.t2CharStringPen import T2CharStringPen
from fontTools.feaLib.builder import addOpenTypeFeatures
from fontTools.ttLib import TTFont
import io
from glyph_parser import ExtractedGlyph, GlyphProps, parse_all_sheets
from hangul import compose_hangul, get_jamo_gsub_data, HANGUL_PUA_BASE
from bitmap_tracer import trace_bitmap, draw_glyph_to_pen, SCALE, BASELINE_ROW
from keming_machine import generate_kerning_pairs
from opentype_features import generate_features, glyph_name
import sheet_config as SC
FONT_VERSION = "1.16"
# Codepoints that get cmap entries (user-visible)
# PUA forms used internally by GSUB get glyphs but NO cmap entries
def _should_have_cmap(cp):
"""Determine if a codepoint should have a cmap entry."""
# Standard Unicode characters always get cmap entries
if cp < 0xE000:
return True
# Custom sym PUA range
if 0xE000 <= cp <= 0xE0FF:
return True
# Codestyle PUA
if 0xF0520 <= cp <= 0xF057F:
return True
# Hangul syllables
if 0xAC00 <= cp <= 0xD7A3:
return True
# Hangul compat jamo
if 0x3130 <= cp <= 0x318F:
return True
# SMP characters (Enclosed Alphanumeric Supplement, Hentaigana, etc.)
if 0x1F100 <= cp <= 0x1F1FF:
return True
if 0x1B000 <= cp <= 0x1B16F:
return True
# Unicode noncharacters — never map these (U+FFFE, U+FFFF are reserved;
# format 4 cmap uses 0xFFFF as a sentinel, so mapping it causes OTS rejection)
if cp >= 0xFFFE and cp <= 0xFFFF:
return False
# Everything in standard Unicode ranges (up to 0xFFFF plus SMP)
if cp <= 0xFFFF:
return True
# Internal PUA forms — GSUB-only, no cmap
if 0xF0000 <= cp <= 0xF0FFF:
return False
# Internal control characters
if 0xFFE00 <= cp <= 0xFFFFF:
return False
return True
def _expand_replacewith(glyphs):
    """
    Find glyphs with 'replacewith' directive and generate GSUB multiple
    substitution data. Returns list of (source_cp, [target_cp, ...]).

    A replacewith glyph's extInfo contains up to 7 codepoints that the
    glyph expands to (e.g. U+01C7 "LJ" → [0x4C, 0x4A]).
    """
    replacements = []
    for cp, g in glyphs.items():
        if g.props.is_pragma("replacewith"):
            targets = []
            count = g.props.required_ext_info_count()
            for i in range(count):
                val = g.props.ext_info[i]
                if val != 0:
                    targets.append(val)
            if targets:
                replacements.append((cp, targets))
    return replacements
def build_font(assets_dir, output_path, no_bitmap=False, no_features=False):
    """Build the complete OTF font."""
    t0 = time.time()

    # Step 1: Parse all sheets
    print("Step 1: Parsing glyph sheets...")
    glyphs = parse_all_sheets(assets_dir)
    print(f"  Parsed {len(glyphs)} glyphs from sheets")

    # Step 2: Compose Hangul
    print("Step 2: Composing Hangul syllables...")
    hangul_glyphs = compose_hangul(assets_dir)
    glyphs.update(hangul_glyphs)
    print(f"  Total glyphs after Hangul: {len(glyphs)}")

    # Step 2b: Copy PUA consonant glyphs to Unicode positions.
    # In the bitmap font, consonants U+0915-0939 have width=0 and empty bitmaps
    # because the engine normalises them to PUA forms (0xF0140+) before rendering.
    # For OTF, we need the Unicode positions to have actual outlines so that
    # consonants render even without GSUB shaping.
    print("Step 2b: Populating Devanagari consonant glyphs from PUA forms...")
    deva_copied = 0
    for uni_cp in range(0x0915, 0x093A):
        try:
            pua_cp = SC.to_deva_internal(uni_cp)
        except ValueError:
            continue
        if pua_cp in glyphs and uni_cp in glyphs:
            pua_g = glyphs[pua_cp]
            uni_g = glyphs[uni_cp]
            if uni_g.props.width == 0 and pua_g.props.width > 0:
                uni_g.props.width = pua_g.props.width
                uni_g.bitmap = pua_g.bitmap
                uni_g.color_bitmap = pua_g.color_bitmap
                deva_copied += 1
    # Also copy nukta consonant forms U+0958-095F
    for uni_cp in range(0x0958, 0x0960):
        try:
            pua_cp = SC.to_deva_internal(uni_cp)
        except ValueError:
            continue
        if pua_cp in glyphs and uni_cp in glyphs:
            pua_g = glyphs[pua_cp]
            uni_g = glyphs[uni_cp]
            if uni_g.props.width == 0 and pua_g.props.width > 0:
                uni_g.props.width = pua_g.props.width
                uni_g.bitmap = pua_g.bitmap
                uni_g.color_bitmap = pua_g.color_bitmap
                deva_copied += 1
    print(f"  Copied {deva_copied} consonant glyphs from PUA forms")

    # Step 3: Expand replacewith directives
    print("Step 3: Processing replacewith directives...")
    replacewith_subs = _expand_replacewith(glyphs)
    print(f"  Found {len(replacewith_subs)} replacewith substitutions")

    # Step 3b: Compose fallback bitmaps for replacewith glyphs.
    # Glyphs with replacewith directives have width=0 and no bitmap; they
    # rely on GSUB ccmp to expand into their target sequence. Renderers
    # without GSUB support would show whitespace. Build a composite
    # bitmap by concatenating the target glyphs' bitmaps side by side.
    print("Step 3b: Composing fallback bitmaps for replacewith glyphs...")
    composed = 0
    for src_cp, target_cps in replacewith_subs:
        src_g = glyphs.get(src_cp)
        if src_g is None or src_g.props.width > 0:
            continue  # already has content (e.g. Deva consonants fixed above)
        # Resolve target glyphs
        target_gs = [glyphs.get(t) for t in target_cps]
        if not all(target_gs):
            continue
        # Compute total advance and composite height
        total_width = sum(g.props.width for g in target_gs)
        if total_width == 0:
            continue
        bm_height = max((len(g.bitmap) for g in target_gs if g.bitmap), default=SC.H)
        # Build composite bitmap
        composite = [[0] * total_width for _ in range(bm_height)]
        x = 0
        for tg in target_gs:
            if not tg.bitmap:
                x += tg.props.width
                continue
            cols = min(tg.props.width, len(tg.bitmap[0])) if tg.props.width > 0 else len(tg.bitmap[0])
            nudge = tg.props.nudge_x
            for row in range(min(len(tg.bitmap), bm_height)):
                for col in range(cols):
                    dst_col = x + col - nudge
                    if 0 <= dst_col < total_width and tg.bitmap[row][col]:
                        composite[row][dst_col] = 1
            # Zero-width targets (combining marks) overlay at current position
            if tg.props.width > 0:
                x += tg.props.width
        src_g.props.width = total_width
        src_g.bitmap = composite
        composed += 1
    print(f"  Composed {composed} fallback bitmaps")

    # Step 3c: Identify combining marks for zero advance width.
    # Glyphs with write_on_top >= 0 are combining marks positioned via
    # GPOS mark-to-base. In OpenType they must have zero advance width;
    # otherwise the cursor advances past the base and diacritics appear
    # shifted to the right. We record them here but keep props.width
    # intact so the mark anchor calculation can use the original width.
    mark_cps = set()
    for cp, g in glyphs.items():
        if g.props.write_on_top >= 0 and g.props.width > 0:
            mark_cps.add(cp)
    if mark_cps:
        print(f"Step 3c: Found {len(mark_cps)} combining marks to zero in hmtx")
    # Step 4: Create glyph order and cmap
    print("Step 4: Building glyph order and cmap...")
    glyph_order = [".notdef"]
    cmap = {}
    glyph_set = set()
    # Sort codepoints for deterministic output
    sorted_cps = sorted(glyphs.keys())
    for cp in sorted_cps:
        g = glyphs[cp]
        if g.props.is_illegal:
            continue
        # Skip C0/C1 control characters and DEL — some platforms render
        # their traced bitmaps, which is undesirable.
        if cp <= 0x001F or cp == 0x007F or 0x0080 <= cp <= 0x009F:
            continue
        name = glyph_name(cp)
        if name == ".notdef":
            continue
        if name in glyph_set:
            continue
        glyph_order.append(name)
        glyph_set.add(name)
        if _should_have_cmap(cp):
            cmap[cp] = name
    print(f"  Glyph order: {len(glyph_order)} glyphs, cmap: {len(cmap)} entries")

    # Step 4a: Detect coloured glyphs and prepare COLR layer data
    print("Step 4a: Detecting coloured glyphs...")
    colr_layer_data = {}  # base_name -> list of (layer_name, colour_rgb)
    palette_colours = {}  # (r, g, b) -> palette_index
    layer_bitmaps = {}    # layer_name -> 1-bit bitmap
    layer_insert = []     # (after_name, [layer_names]) for glyph_order insertion
    for cp in sorted_cps:
        g = glyphs[cp]
        if g.props.is_illegal or g.color_bitmap is None:
            continue
        name = glyph_name(cp)
        if name == ".notdef" or name not in glyph_set:
            continue
        # Group pixels by RGB value -> per-colour 1-bit masks
        colour_pixels = {}  # (r, g, b) -> set of (row, col)
        cbm = g.color_bitmap
        for row in range(len(cbm)):
            for col in range(len(cbm[row])):
                px = cbm[row][col]
                a = px & 0xFF
                if a == 0:
                    continue
                r = (px >> 24) & 0xFF
                g_ch = (px >> 16) & 0xFF
                b = (px >> 8) & 0xFF
                rgb = (r, g_ch, b)
                if rgb not in colour_pixels:
                    colour_pixels[rgb] = set()
                colour_pixels[rgb].add((row, col))
        if not colour_pixels:
            continue
        if len(colour_pixels) == 1 and (255, 255, 255) in colour_pixels:
            # Only white pixels — no colour layers needed
            continue
        # Assign palette indices for each unique colour
        for rgb in colour_pixels:
            if rgb not in palette_colours:
                palette_colours[rgb] = len(palette_colours)
        # Generate layer glyphs
        h = len(cbm)
        w = len(cbm[0]) if h > 0 else 0
        layers = []
        layer_names = []
        for i, (rgb, positions) in enumerate(sorted(colour_pixels.items())):
            layer_name = f"{name}.clr{i}"
            # Build 1-bit mask for this colour
            mask = [[0] * w for _ in range(h)]
            for (row, col) in positions:
                mask[row][col] = 1
            layer_bitmaps[layer_name] = mask
            layers.append((layer_name, rgb))
            layer_names.append(layer_name)
        colr_layer_data[name] = layers
        layer_insert.append((name, layer_names))
    # Insert layer glyph names into glyph_order immediately after their base glyph
    for base_name, lnames in layer_insert:
        idx = glyph_order.index(base_name)
        for j, ln in enumerate(lnames):
            glyph_order.insert(idx + 1 + j, ln)
            glyph_set.add(ln)
    if colr_layer_data:
        print(f"  Found {len(colr_layer_data)} coloured glyphs, "
              f"{len(palette_colours)} palette colours, "
              f"{sum(len(v) for v in colr_layer_data.values())} layer glyphs")
    else:
        print("  No coloured glyphs found")
    # Step 5: Build font with fonttools (CFF/OTF)
    print("Step 5: Building font tables...")
    fb = FontBuilder(SC.UNITS_PER_EM, isTTF=False)
    fb.setupGlyphOrder(glyph_order)
    fb.setupCharacterMap(cmap)

    # Step 6: Trace bitmaps -> CFF charstrings
    print("Step 6: Tracing bitmaps to CFF outlines...")
    charstrings = {}
    # .notdef glyph (empty box)
    pen = T2CharStringPen(SC.UNITS_PER_EM // 2, None)
    pen.moveTo((0, 0))
    pen.lineTo((0, SC.ASCENT))
    pen.lineTo((SC.UNITS_PER_EM // 2, SC.ASCENT))
    pen.lineTo((SC.UNITS_PER_EM // 2, 0))
    pen.closePath()
    _m = 2 * SCALE
    pen.moveTo((_m, _m))
    pen.lineTo((SC.UNITS_PER_EM // 2 - _m, _m))
    pen.lineTo((SC.UNITS_PER_EM // 2 - _m, SC.ASCENT - _m))
    pen.lineTo((_m, SC.ASCENT - _m))
    pen.closePath()
    charstrings[".notdef"] = pen.getCharString()

    _unihan_cps = set(SC.CODE_RANGE[SC.SHEET_UNIHAN])
    _base_offsets = {}  # glyph_name -> (x_offset, y_offset) for COLR layers
    traced_count = 0
    for cp in sorted_cps:
        g = glyphs[cp]
        if g.props.is_illegal:
            continue
        name = glyph_name(cp)
        if name == ".notdef" or name not in glyph_set:
            continue
        advance = 0 if cp in mark_cps else g.props.width * SCALE
        # Compute alignment offset (lsb shift).
        # The Kotlin code draws the full cell at an offset position:
        #   ALIGN_LEFT:   offset = 0
        #   ALIGN_RIGHT:  offset = width - W_VAR_INIT             (negative)
        #   ALIGN_CENTRE: offset = ceil((width - W_VAR_INIT) / 2) (negative)
        #   ALIGN_BEFORE: offset = 0
        # The bitmap cell width depends on the sheet type.
        # nudge_x shifts the glyph left by that many pixels in the
        # bitmap engine. The Kotlin engine always applies nudge_x to
        # the drawing position (posXbuffer = -nudgeX + ...) and the
        # next glyph compensates via extraWidth, so the effective
        # origin-to-origin advance stays at `width`. We must bake
        # the same leftward shift into the contour x_offset.
        import math
        # The Kotlin engine always uses W_VAR_INIT for alignment calculations,
        # even for EXTRAWIDE sheets. Use W_VAR_INIT here to match.
        bm_cols = SC.W_VAR_INIT
        if g.props.align_where == SC.ALIGN_RIGHT:
            x_offset = (g.props.width - bm_cols) * SCALE
        elif g.props.align_where == SC.ALIGN_CENTRE:
            x_offset = math.ceil((g.props.width - bm_cols) / 2) * SCALE
        else:
            x_offset = 0
        x_offset -= g.props.nudge_x * SCALE
        # For marks (write_on_top >= 0), positive nudge_y means shift UP
        # in the bitmap engine (opposite to non-marks where positive = down).
        if g.props.write_on_top >= 0:
            y_offset = g.props.nudge_y * SCALE
        else:
            y_offset = -g.props.nudge_y * SCALE
        # Unihan glyphs are 16px tall in a 20px cell; the bitmap engine
        # centres them vertically with offsetUnihan = (H - H_UNIHAN) / 2.
        if cp in _unihan_cps:
            y_offset -= ((SC.H - SC.H_UNIHAN) // 2) * SCALE
        # Hangul jungseong/jongseong PUA variants (rows 15-18) have zero
        # advance and overlay the preceding choseong. Shift their outlines
        # left by one syllable cell width so they render at the same position.
        if cp >= HANGUL_PUA_BASE:
            _pua_row = (cp - HANGUL_PUA_BASE) // 256
            if 15 <= _pua_row <= 18:
                x_offset -= SC.W_HANGUL_BASE * SCALE
        # Store offsets for COLR layer glyphs
        if name in colr_layer_data:
            _base_offsets[name] = (x_offset, y_offset)
        contours = trace_bitmap(g.bitmap, g.props.width)
        pen = T2CharStringPen(advance, None)
        if contours:
            draw_glyph_to_pen(contours, pen, x_offset=x_offset, y_offset=y_offset)
            traced_count += 1
        charstrings[name] = pen.getCharString()

    # Trace COLR layer glyphs
    layer_traced = 0
    for base_name, layers in colr_layer_data.items():
        base_xoff, base_yoff = _base_offsets.get(base_name, (0, 0))
        for layer_name, _rgb in layers:
            lbm = layer_bitmaps[layer_name]
            # Find the effective glyph width from the base glyph's bitmap
            lw = len(lbm[0]) if lbm and lbm[0] else 0
            contours = trace_bitmap(lbm, lw)
            pen = T2CharStringPen(0, None)  # advance width 0 for layers
            if contours:
                draw_glyph_to_pen(contours, pen, x_offset=base_xoff, y_offset=base_yoff)
                layer_traced += 1
            charstrings[layer_name] = pen.getCharString()
    print(f"  Traced {traced_count} glyphs with outlines"
          + (f" + {layer_traced} colour layers" if layer_traced else ""))

    fb.setupCFF(
        psName="TerrarumSansBitmap-Regular",
        fontInfo={},
        charStringsDict=charstrings,
        privateDict={},
    )
    # Step 7: Set metrics
    print("Step 7: Setting font metrics...")
    metrics = {}
    metrics[".notdef"] = (SC.UNITS_PER_EM // 2, 0)
    for cp in sorted_cps:
        g = glyphs[cp]
        if g.props.is_illegal:
            continue
        name = glyph_name(cp)
        if name == ".notdef" or name not in glyph_set:
            continue
        advance = 0 if cp in mark_cps else g.props.width * SCALE
        metrics[name] = (advance, 0)
    # Add zero-advance metrics for COLR layer glyphs
    for _base_name, layers in colr_layer_data.items():
        for layer_name, _rgb in layers:
            metrics[layer_name] = (0, 0)
    fb.setupHorizontalMetrics(metrics)
    fb.setupHorizontalHeader(
        ascent=SC.ASCENT,
        descent=-SC.DESCENT
    )
    fb.setupNameTable({
        "copyright": "Copyright (c) 2026 CuriousTorvald (curioustorvald.com), with Reserved Font Name TERRARUM.",
        "familyName": "Terrarum Sans Bitmap",
        "styleName": "Regular",
        "uniqueFontIdentifier": "TerrarumSansBitmap-Regular-" + FONT_VERSION,
        "fullName": "Terrarum Sans Bitmap Regular",
        "psName": "TerrarumSansBitmap-Regular",
        "version": FONT_VERSION,
        "licenseDescription": "This Font Software is licensed under the SIL Open Font License, Version 1.1.",
        "licenseInfoURL": "https://openfontlicense.org/"
    })
    fb.setupOS2(
        sTypoAscender=SC.ASCENT,
        sTypoDescender=-SC.DESCENT,
        sTypoLineGap=SC.LINE_GAP,
        usWinAscent=SC.ASCENT,
        usWinDescent=SC.DESCENT,
        sxHeight=SC.X_HEIGHT,
        sCapHeight=SC.CAP_HEIGHT,
        fsType=0,
    )
    # head timestamps count seconds since 1904-01-01 (2082844800s before Unix epoch)
    unix_ts = int(time.time())
    opentype_ts = unix_ts + 2082844800
    fb.setupPost()
    fb.setupHead(
        unitsPerEm=SC.UNITS_PER_EM,
        created=opentype_ts,
        modified=opentype_ts,
    )
    font = fb.font

    # Step 7a: Build COLR v0 / CPAL tables
    if colr_layer_data:
        print("Step 7a: Building COLR v0/CPAL tables...")
        from fontTools.colorLib.builder import buildCOLR, buildCPAL
        # CPAL: single palette normalised to 0..1
        palette = [(0, 0, 0, 1.0)] * len(palette_colours)
        for (r, g, b), idx in palette_colours.items():
            palette[idx] = (r / 255, g / 255, b / 255, 1.0)
        font["CPAL"] = buildCPAL([palette])
        # COLR v0: list of (layer_glyph_name, palette_index) per base glyph
        colr_v0 = {}
        for base_name, layers in colr_layer_data.items():
            colr_v0[base_name] = [
                (layer_name, palette_colours[rgb])
                for layer_name, rgb in layers
            ]
        font["COLR"] = buildCOLR(colr_v0, version=0)
        print(f"  COLR v0: {len(colr_v0)} base glyphs, {len(palette)} palette entries")

    # Step 8: Generate and compile OpenType features
    if not no_features:
        print("Step 8: Generating OpenType features...")
        kern_pairs = generate_kerning_pairs(glyphs)
        print(f"  {len(kern_pairs)} kerning pairs")
        jamo_data = get_jamo_gsub_data()
        fea_code = generate_features(glyphs, kern_pairs, glyph_set,
                                     replacewith_subs=replacewith_subs,
                                     jamo_data=jamo_data)
        if fea_code.strip():
            print("  Compiling features with feaLib...")
            try:
                # Write out the raw .fea text for debugging
                with open("debugout_features.fea", "w") as text_file:
                    text_file.write(fea_code)
                fea_stream = io.StringIO(fea_code)
                addOpenTypeFeatures(font, fea_stream)
                print("  Features compiled successfully")
            except Exception as e:
                print(f"  [WARNING] Feature compilation failed: {e}")
                print("  Continuing without OpenType features")
        else:
            print("  No features to compile")
    else:
        print("Step 8: Skipping OpenType features (--no-features)")

    # Step 9: Add bitmap strike (EBDT/EBLC)
    if not no_bitmap:
        print("Step 9: Adding bitmap strike...")
        _add_bitmap_strike(font, glyphs, glyph_order, glyph_set)
    else:
        print("Step 9: Skipping bitmap strike (--no-bitmap)")

    # Save
    print(f"Saving to {output_path}...")
    font.save(output_path)
    elapsed = time.time() - t0
    print(f"Done! Built {len(glyph_order)} glyphs in {elapsed:.1f}s")
    print(f"Output: {output_path}")
def _add_bitmap_strike(font, glyphs, glyph_order, glyph_set):
    """Add EBDT/EBLC embedded bitmap strike at ppem=20 via TTX roundtrip."""
    import tempfile
    import os as _os
    ppem = 20
    name_to_id = {name: idx for idx, name in enumerate(glyph_order)}
    bitmap_entries = []
    for name in glyph_order:
        if name == ".notdef":
            continue
        cp = _name_to_cp(name)
        if cp is None or cp not in glyphs:
            continue
        g = glyphs[cp]
        if g.props.is_illegal or g.props.width == 0:
            continue
        bitmap = g.bitmap
        h = len(bitmap)
        w = len(bitmap[0]) if h > 0 else 0
        if w == 0 or h == 0:
            continue
        # Pack each row into bytes, MSB first
        hex_rows = []
        for row in bitmap:
            row_bytes = bytearray()
            for col_start in range(0, w, 8):
                byte_val = 0
                for bit in range(8):
                    col = col_start + bit
                    if col < w and row[col]:
                        byte_val |= (0x80 >> bit)
                row_bytes.append(byte_val)
            hex_rows.append(row_bytes.hex())
        bitmap_entries.append({
            'name': name,
            'gid': name_to_id.get(name, 0),
            'height': h,
            'width': w,
            'advance': g.props.width,
            'hex_rows': hex_rows,
        })
    if not bitmap_entries:
        print("  No bitmap data to embed")
        return

    # Group consecutive glyph IDs into runs (one index subtable per run)
    gid_sorted = sorted(bitmap_entries, key=lambda e: e['gid'])
    runs = []
    current_run = [gid_sorted[0]]
    for i in range(1, len(gid_sorted)):
        if gid_sorted[i]['gid'] == gid_sorted[i - 1]['gid'] + 1:
            current_run.append(gid_sorted[i])
        else:
            runs.append(current_run)
            current_run = [gid_sorted[i]]
    runs.append(current_run)

    ebdt_xml = ['<EBDT>', '<header version="2.0"/>', '<strikedata index="0">']
    for entry in gid_sorted:
        ebdt_xml.append(f' <cbdt_bitmap_format_1 name="{entry["name"]}">')
        ebdt_xml.append(f' <SmallGlyphMetrics>')
        ebdt_xml.append(f' <height value="{entry["height"]}"/>')
        ebdt_xml.append(f' <width value="{entry["width"]}"/>')
        ebdt_xml.append(f' <BearingX value="0"/>')
        ebdt_xml.append(f' <BearingY value="{BASELINE_ROW}"/>')
        ebdt_xml.append(f' <Advance value="{entry["advance"]}"/>')
        ebdt_xml.append(f' </SmallGlyphMetrics>')
        ebdt_xml.append(f' <rawimagedata>')
        for hr in entry['hex_rows']:
            ebdt_xml.append(f' {hr}')
        ebdt_xml.append(f' </rawimagedata>')
        ebdt_xml.append(f' </cbdt_bitmap_format_1>')
    ebdt_xml.append('</strikedata>')
    ebdt_xml.append('</EBDT>')

    all_gids = [e['gid'] for e in gid_sorted]
    desc = -(SC.H - BASELINE_ROW)

    def _line_metrics_xml(direction, caret_num=1):
        return [
            f' <sbitLineMetrics direction="{direction}">',
            f' <ascender value="{BASELINE_ROW}"/>',
            f' <descender value="{desc}"/>',
            f' <widthMax value="{SC.W_WIDEVAR_INIT}"/>',
            f' <caretSlopeNumerator value="{caret_num}"/>',
            ' <caretSlopeDenominator value="0"/>',
            ' <caretOffset value="0"/>',
            ' <minOriginSB value="0"/>',
            ' <minAdvanceSB value="0"/>',
            f' <maxBeforeBL value="{BASELINE_ROW}"/>',
            f' <minAfterBL value="{desc}"/>',
            ' <pad1 value="0"/>',
            ' <pad2 value="0"/>',
            f' </sbitLineMetrics>',
        ]

    eblc_xml = [
        '<EBLC>', '<header version="2.0"/>',
        '<strike index="0">', ' <bitmapSizeTable>',
        ' <colorRef value="0"/>',
    ]
    eblc_xml.extend(_line_metrics_xml("hori", 1))
    eblc_xml.extend(_line_metrics_xml("vert", 0))
    eblc_xml.extend([
        f' <startGlyphIndex value="{all_gids[0]}"/>',
        f' <endGlyphIndex value="{all_gids[-1]}"/>',
        f' <ppemX value="{ppem}"/>',
        f' <ppemY value="{ppem}"/>',
        ' <bitDepth value="1"/>',
        ' <flags value="1"/>',
        ' </bitmapSizeTable>',
    ])
    for run in runs:
        first_gid = run[0]['gid']
        last_gid = run[-1]['gid']
        eblc_xml.append(f' <eblc_index_sub_table_1 imageFormat="1" firstGlyphIndex="{first_gid}" lastGlyphIndex="{last_gid}">')
        for entry in run:
            eblc_xml.append(f' <glyphLoc name="{entry["name"]}"/>')
        eblc_xml.append(' </eblc_index_sub_table_1>')
    eblc_xml.append('</strike>')
    eblc_xml.append('</EBLC>')

    try:
        ttx_content = '<?xml version="1.0" encoding="UTF-8"?>\n<ttFont>\n'
        ttx_content += '\n'.join(ebdt_xml) + '\n'
        ttx_content += '\n'.join(eblc_xml) + '\n'
        ttx_content += '</ttFont>\n'
        with tempfile.NamedTemporaryFile(mode='w', suffix='.ttx', delete=False) as f:
            f.write(ttx_content)
            ttx_path = f.name
        font.importXML(ttx_path)
        _os.unlink(ttx_path)
        print(f"  Added bitmap strike at {ppem}ppem with {len(bitmap_entries)} glyphs ({len(runs)} index subtables)")
    except Exception as e:
        print(f"  [WARNING] Bitmap strike failed: {e}")
        print("  Continuing without bitmap strike")
def _name_to_cp(name):
"""Convert glyph name back to codepoint."""
if name == ".notdef":
return None
if name == "space":
return 0x20
if name.startswith("uni"):
try:
return int(name[3:], 16)
except ValueError:
return None
if name.startswith("u"):
try:
return int(name[1:], 16)
except ValueError:
return None
return None
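The run grouping at the top of this chunk feeds the `eblc_index_sub_table_1` emission: consecutive glyph IDs are collected so each subtable spans one contiguous GID range. A minimal standalone sketch of that grouping, with plain ints standing in for the `gid_sorted` entries:

```python
# Group sorted glyph IDs into runs of consecutive values, so each
# EBLC index subtable (format 1) can cover one contiguous GID range.
def group_into_runs(gids):
    runs = []
    current = [gids[0]]
    for g in gids[1:]:
        if g == current[-1] + 1:
            current.append(g)      # extends the current run
        else:
            runs.append(current)   # gap found: close the run
            current = [g]
    runs.append(current)
    return runs

print(group_into_runs([3, 4, 5, 9, 10, 20]))  # [[3, 4, 5], [9, 10], [20]]
```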

OTFbuild/glyph_parser.py (new file)
@@ -0,0 +1,507 @@
"""
Extract glyph bitmaps and tag-column properties from TGA sprite sheets.
Ported from TerrarumSansBitmap.kt:buildWidthTable() and GlyphSheetParser.kt.
Enhancement over v1: extracts all 6 diacritics anchors for GPOS mark feature.
"""
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from tga_reader import TgaImage, read_tga
import sheet_config as SC
@dataclass
class DiacriticsAnchor:
type: int
x: int
y: int
x_used: bool
y_used: bool
@dataclass
class GlyphProps:
width: int
is_low_height: bool = False
nudge_x: int = 0
nudge_y: int = 0
diacritics_anchors: List[DiacriticsAnchor] = field(default_factory=lambda: [
DiacriticsAnchor(i, 0, 0, False, False) for i in range(6)
])
align_where: int = 0
write_on_top: int = -1
stack_where: int = 0
ext_info: List[int] = field(default_factory=lambda: [0] * 15)
has_kern_data: bool = False
is_kern_y_type: bool = False
kerning_mask: int = 255
dot_removal: Optional[int] = None # codepoint to replace with when followed by a STACK_UP mark
directive_opcode: int = 0
directive_arg1: int = 0
directive_arg2: int = 0
@property
def is_illegal(self):
return self.directive_opcode == 255
def required_ext_info_count(self):
if self.stack_where == SC.STACK_BEFORE_N_AFTER:
return 2
if 0b10000_000 <= self.directive_opcode <= 0b10000_111:
return 7
return 0
def is_pragma(self, pragma):
if pragma == "replacewith":
return 0b10000_000 <= self.directive_opcode <= 0b10000_111
return False
@dataclass
class ExtractedGlyph:
codepoint: int
props: GlyphProps
bitmap: List[List[int]] # [row][col], 0 or 1
color_bitmap: Optional[List[List[int]]] = None # [row][col], RGBA8888 values
def _is_coloured_pixel(px):
"""Return True if the pixel is visible (A > 0) and non-white (R+G+B < 765)."""
a = px & 0xFF
if a == 0:
return False
r = (px >> 24) & 0xFF
g = (px >> 16) & 0xFF
b = (px >> 8) & 0xFF
return (r + g + b) < 765
def _tagify(pixel):
"""Return 0 if alpha channel is zero, else return the original value."""
return 0 if (pixel & 0xFF) == 0 else pixel
def _signed_byte(val):
"""Convert unsigned byte to signed."""
return val - 256 if val >= 128 else val
def _parse_diacritics_anchors(image, code_start_x, code_start_y):
"""Parse 6 diacritics anchors from tag column rows 11-14."""
anchors = []
for i in range(6):
y_pos = 13 - (i // 3) * 2
shift = (3 - (i % 3)) * 8
y_pixel = _tagify(image.get_pixel(code_start_x, code_start_y + y_pos))
x_pixel = _tagify(image.get_pixel(code_start_x, code_start_y + y_pos + 1))
y_used = ((y_pixel >> shift) & 128) != 0
x_used = ((x_pixel >> shift) & 128) != 0
y_val = (y_pixel >> shift) & 127 if y_used else 0
x_val = (x_pixel >> shift) & 127 if x_used else 0
anchors.append(DiacriticsAnchor(i, x_val, y_val, x_used, y_used))
return anchors
def parse_variable_sheet(image, sheet_index, cell_w, cell_h, cols, is_xy_swapped):
"""Parse a variable-width sheet: extract tag column for properties, bitmap for glyph."""
code_range = SC.CODE_RANGE[sheet_index]
binary_code_offset = cell_w - 1 # tag column is last pixel column of cell
result = {}
for index, code in enumerate(code_range):
if is_xy_swapped:
cell_x = (index // cols) * cell_w
cell_y = (index % cols) * cell_h
else:
cell_x = (index % cols) * cell_w
cell_y = (index // cols) * cell_h
code_start_x = cell_x + binary_code_offset
code_start_y = cell_y
# Width (5 bits)
width = 0
for y in range(5):
if image.get_pixel(code_start_x, code_start_y + y) & 0xFF:
width |= (1 << y)
is_low_height = (image.get_pixel(code_start_x, code_start_y + 5) & 0xFF) != 0
# Kerning data
kerning_bit1 = _tagify(image.get_pixel(code_start_x, code_start_y + 6))
kerning_bit2 = _tagify(image.get_pixel(code_start_x, code_start_y + 7))
dot_removal = None if kerning_bit2 == 0 else (kerning_bit2 >> 8)
is_kern_y_type = (kerning_bit1 & 0x80000000) != 0
kerning_mask = (kerning_bit1 >> 8) & 0xFFFFFF
has_kern_data = (kerning_bit1 & 0xFF) != 0
if not has_kern_data:
is_kern_y_type = False
kerning_mask = 255
# Compiler directives
compiler_directives = _tagify(image.get_pixel(code_start_x, code_start_y + 9))
directive_opcode = (compiler_directives >> 24) & 255
directive_arg1 = (compiler_directives >> 16) & 255
directive_arg2 = (compiler_directives >> 8) & 255
# Nudge
nudging_bits = _tagify(image.get_pixel(code_start_x, code_start_y + 10))
nudge_x = _signed_byte((nudging_bits >> 24) & 0xFF)
nudge_y = _signed_byte((nudging_bits >> 16) & 0xFF)
# Diacritics anchors
diacritics_anchors = _parse_diacritics_anchors(image, code_start_x, code_start_y)
# Alignment
align_where = 0
for y in range(2):
if image.get_pixel(code_start_x, code_start_y + y + 15) & 0xFF:
align_where |= (1 << y)
# Write on top
write_on_top_raw = image.get_pixel(code_start_x, code_start_y + 17) # NO tagify
if (write_on_top_raw & 0xFF) == 0:
write_on_top = -1
else:
if (write_on_top_raw >> 8) == 0xFFFFFF:
write_on_top = 0
else:
write_on_top = (write_on_top_raw >> 28) & 15
# Stack where
stack_where0 = _tagify(image.get_pixel(code_start_x, code_start_y + 18))
stack_where1 = _tagify(image.get_pixel(code_start_x, code_start_y + 19))
if stack_where0 == 0x00FF00FF and stack_where1 == 0x00FF00FF:
stack_where = SC.STACK_DONT
else:
stack_where = 0
for y in range(2):
if image.get_pixel(code_start_x, code_start_y + y + 18) & 0xFF:
stack_where |= (1 << y)
ext_info = [0] * 15
props = GlyphProps(
width=width, is_low_height=is_low_height,
nudge_x=nudge_x, nudge_y=nudge_y,
diacritics_anchors=diacritics_anchors,
align_where=align_where, write_on_top=write_on_top,
stack_where=stack_where, ext_info=ext_info,
has_kern_data=has_kern_data, is_kern_y_type=is_kern_y_type,
kerning_mask=kerning_mask, dot_removal=dot_removal,
directive_opcode=directive_opcode, directive_arg1=directive_arg1,
directive_arg2=directive_arg2,
)
# Parse extInfo if needed
ext_count = props.required_ext_info_count()
if ext_count > 0:
for x in range(ext_count):
info = 0
for y in range(20):
if image.get_pixel(cell_x + x, cell_y + y) & 0xFF:
info |= (1 << y)
ext_info[x] = info
# Extract glyph bitmap: full cell minus the tag column.
# The Kotlin code draws the ENTIRE cell at a computed position;
# the tag column is the only thing excluded.
# Alignment and width only affect advance/positioning, not the bitmap.
max_w = cell_w - 1 # exclude tag column
bitmap = []
for row in range(cell_h):
row_data = []
for col in range(max_w):
px = image.get_pixel(cell_x + col, cell_y + row)
row_data.append(1 if (px & 0xFF) != 0 else 0)
bitmap.append(row_data)
# Now strip the tag column pixels that may have leaked into
# the glyph area. Tag data lives at column (cell_w - 1) which
# we already excluded, but extInfo columns 0..6 at the LEFT
# edge of the cell also contain tag data for replacewith glyphs.
# Clean those columns if they were used for extInfo.
if ext_count > 0:
for col_idx in range(min(ext_count, max_w)):
for row in range(cell_h):
bitmap[row][col_idx] = 0
# Colour extraction: check if any visible pixel is non-white
has_colour = False
color_bitmap = []
for row in range(cell_h):
row_data = []
for col in range(max_w):
px = image.get_pixel(cell_x + col, cell_y + row)
row_data.append(px)
if not has_colour and _is_coloured_pixel(px):
has_colour = True
color_bitmap.append(row_data)
if has_colour:
# Strip extInfo columns from color_bitmap too
if ext_count > 0:
for col_idx in range(min(ext_count, max_w)):
for row in range(cell_h):
color_bitmap[row][col_idx] = 0
else:
color_bitmap = None
result[code] = ExtractedGlyph(code, props, bitmap, color_bitmap)
return result
def _read_hangul_cell(image, column, row, cell_w=SC.W_HANGUL_BASE, cell_h=SC.H):
"""Read a single cell from the Hangul johab sheet at (column, row)."""
cell_x = column * cell_w
cell_y = row * cell_h
bitmap = []
for r in range(cell_h):
row_data = []
for c in range(cell_w):
px = image.get_pixel(cell_x + c, cell_y + r)
row_data.append(1 if (px & 0xFF) != 0 else 0)
bitmap.append(row_data)
return bitmap
def parse_hangul_jamo_sheet(image, cell_w, cell_h):
"""
Parse the Hangul Jamo sheet with correct row/column mapping.
Layout in hangul_johab.tga:
- Choseong (U+1100-U+115E): column = choseongIndex, row = 1
- Jungseong (U+1161-U+11A7): column = jungseongIndex+1, row = 15
(column 0 is filler U+1160, stored at row 15 col 0)
- Jongseong (U+11A8-U+11FF): column = jongseongIndex, row = 17
(index starts at 1 for 11A8)
- Extended Choseong (U+A960-U+A97F): column = 96+offset, row = 1
- Extended Jungseong (U+D7B0-U+D7C6): column = 72+offset, row = 15
- Extended Jongseong (U+D7CB-U+D7FB): column = 89+offset, row = 17
Each jamo gets a default-row bitmap. Multiple variant rows exist for
syllable composition (handled separately by hangul.py / GSUB).
"""
result = {}
# U+1160 (Hangul Jungseong Filler) — column 0, row 15
bm = _read_hangul_cell(image, 0, 15, cell_w, cell_h)
result[0x1160] = ExtractedGlyph(0x1160, GlyphProps(width=cell_w), bm)
# Choseong: U+1100-U+115E → column = cp - 0x1100, row = 1
for cp in range(0x1100, 0x115F):
col = cp - 0x1100
bm = _read_hangul_cell(image, col, 1, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
# U+115F (Hangul Choseong Filler)
col = 0x115F - 0x1100
bm = _read_hangul_cell(image, col, 1, cell_w, cell_h)
result[0x115F] = ExtractedGlyph(0x115F, GlyphProps(width=cell_w), bm)
# Jungseong: U+1161-U+11A7 → column = (cp - 0x1160), row = 15
for cp in range(0x1161, 0x11A8):
col = cp - 0x1160
bm = _read_hangul_cell(image, col, 15, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
# Jongseong: U+11A8-U+11FF → column = (cp - 0x11A8 + 1), row = 17
for cp in range(0x11A8, 0x1200):
col = cp - 0x11A8 + 1
bm = _read_hangul_cell(image, col, 17, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
# Extended Choseong: U+A960-U+A97F → column = (cp - 0xA960 + 96), row = 1
for cp in range(0xA960, 0xA980):
col = cp - 0xA960 + 96
bm = _read_hangul_cell(image, col, 1, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
# Extended Jungseong: U+D7B0-U+D7C6 → column = (cp - 0xD7B0 + 72), row = 15
for cp in range(0xD7B0, 0xD7C7):
col = cp - 0xD7B0 + 72
bm = _read_hangul_cell(image, col, 15, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
# Extended Jongseong: U+D7CB-U+D7FB → column = (cp - 0xD7CB + 88 + 1), row = 17
for cp in range(0xD7CB, 0xD7FC):
col = cp - 0xD7CB + 88 + 1
bm = _read_hangul_cell(image, col, 17, cell_w, cell_h)
result[cp] = ExtractedGlyph(cp, GlyphProps(width=cell_w), bm)
return result
def parse_fixed_sheet(image, sheet_index, cell_w, cell_h, cols):
"""Parse a fixed-width sheet (Hangul, Unihan, Runic, Custom Sym)."""
# Hangul Jamo sheet has special layout — handled separately
if sheet_index == SC.SHEET_HANGUL:
return parse_hangul_jamo_sheet(image, cell_w, cell_h)
code_range = SC.CODE_RANGE[sheet_index]
result = {}
fixed_width = {
SC.SHEET_CUSTOM_SYM: 20,
SC.SHEET_RUNIC: 9,
SC.SHEET_UNIHAN: SC.W_UNIHAN,
}.get(sheet_index, cell_w)
for index, code in enumerate(code_range):
cell_x = (index % cols) * cell_w
cell_y = (index // cols) * cell_h
bitmap = []
has_colour = False
color_bitmap = []
for row in range(cell_h):
row_data = []
color_row = []
for col in range(cell_w):
px = image.get_pixel(cell_x + col, cell_y + row)
row_data.append(1 if (px & 0xFF) != 0 else 0)
color_row.append(px)
if not has_colour and _is_coloured_pixel(px):
has_colour = True
bitmap.append(row_data)
color_bitmap.append(color_row)
props = GlyphProps(width=fixed_width)
result[code] = ExtractedGlyph(code, props, bitmap,
color_bitmap if has_colour else None)
return result
def _empty_bitmap(w=SC.W_VAR_INIT, h=SC.H):
return [[0] * w for _ in range(h)]
def parse_all_sheets(assets_dir):
"""Parse all sheets and return a map of codepoint -> ExtractedGlyph."""
result = {}
for sheet_index, filename in enumerate(SC.FILE_LIST):
filepath = os.path.join(assets_dir, filename)
if not os.path.exists(filepath):
print(f" [SKIP] {filename} not found")
continue
is_var = SC.is_variable(filename)
is_xy = SC.is_xy_swapped(filename)
is_ew = SC.is_extra_wide(filename)
cell_w = SC.get_cell_width(sheet_index)
cell_h = SC.get_cell_height(sheet_index)
cols = SC.get_columns(sheet_index)
tags = []
if is_var: tags.append("VARIABLE")
if is_xy: tags.append("XYSWAP")
if is_ew: tags.append("EXTRAWIDE")
if not tags: tags.append("STATIC")
print(f" Loading [{','.join(tags)}] {filename}")
image = read_tga(filepath)
if is_var:
sheet_glyphs = parse_variable_sheet(image, sheet_index, cell_w, cell_h, cols, is_xy)
else:
sheet_glyphs = parse_fixed_sheet(image, sheet_index, cell_w, cell_h, cols)
result.update(sheet_glyphs)
# Fixed-width overrides
_add_fixed_width_overrides(result)
return result
def _add_fixed_width_overrides(result):
"""Apply fixed-width overrides."""
# Hangul compat jamo
for code in SC.CODE_RANGE_HANGUL_COMPAT:
if code not in result:
result[code] = ExtractedGlyph(code, GlyphProps(width=SC.W_HANGUL_BASE), _empty_bitmap(SC.W_HANGUL_BASE))
# Zero-width ranges (only internal/PUA control ranges, not surrogates or full Plane 16)
for code in range(0xFFFA0, 0x100000):
result[code] = ExtractedGlyph(code, GlyphProps(width=0), _empty_bitmap(1, 1))
# Null char
result[0] = ExtractedGlyph(0, GlyphProps(width=0), _empty_bitmap(1, 1))
# Replacement character at U+007F
if 0x7F in result:
result[0x7F].props.width = 15
def get_hangul_jamo_bitmaps(assets_dir):
"""
Extract raw Hangul jamo bitmaps from the Hangul sheet for composition.
Returns a function: (column_index, row) -> bitmap (list of list of int)
"""
filename = SC.FILE_LIST[SC.SHEET_HANGUL]
filepath = os.path.join(assets_dir, filename)
if not os.path.exists(filepath):
print(" [WARNING] Hangul sheet not found")
return lambda idx, row: _empty_bitmap(SC.W_HANGUL_BASE)
image = read_tga(filepath)
cell_w = SC.W_HANGUL_BASE
cell_h = SC.H
def get_bitmap(index, row):
cell_x = index * cell_w
cell_y = row * cell_h
bitmap = []
for r in range(cell_h):
row_data = []
for c in range(cell_w):
px = image.get_pixel(cell_x + c, cell_y + r)
row_data.append(1 if (px & 0xFF) != 0 else 0)
bitmap.append(row_data)
return bitmap
return get_bitmap
def extract_hangul_jamo_variants(assets_dir):
"""
Extract ALL Hangul jamo variant bitmaps from hangul_johab.tga.
Returns dict of (column, row) -> bitmap for every non-empty cell.
Used by hangul.py to store variants in PUA for GSUB assembly.
Layout:
Row 0: Hangul Compatibility Jamo (U+3130-U+318F)
Rows 1-14: Choseong variants (row depends on jungseong context)
Rows 15-16: Jungseong variants (15=no final, 16=with final)
Rows 17-18: Jongseong variants (17=normal, 18=rightie jungseong)
Rows 19-24: Additional choseong variants (giyeok remapping)
"""
filename = SC.FILE_LIST[SC.SHEET_HANGUL]
filepath = os.path.join(assets_dir, filename)
if not os.path.exists(filepath):
return {}
image = read_tga(filepath)
cell_w = SC.W_HANGUL_BASE
cell_h = SC.H
variants = {}
# Scan all rows that contain jamo data
# Rows 0-24 at minimum, checking up to image height
max_row = image.height // cell_h
max_col = image.width // cell_w
for row in range(max_row):
for col in range(max_col):
bm = _read_hangul_cell(image, col, row, cell_w, cell_h)
# Check if non-empty
if any(px for r in bm for px in r):
variants[(col, row)] = bm
return variants
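parse_variable_sheet reads the glyph width from the first five rows of the tag column, LSB-first, one bit per pixel row (a bit is set when the pixel's alpha byte is non-zero). A minimal sketch of that decoding, assuming the caller has already extracted the alpha byte of each tag-column pixel:

```python
# Decode a 5-bit width from the top of a tag column.
# alpha_column holds the alpha bytes of rows 0..4; row y contributes bit y.
def decode_width(alpha_column):
    width = 0
    for y, a in enumerate(alpha_column[:5]):
        if a:
            width |= (1 << y)
    return width

print(decode_width([255, 0, 255, 0, 0]))  # bits 0 and 2 set -> 5
print(decode_width([1, 1, 1, 1, 1]))      # all five bits set -> 31
```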

OTFbuild/hangul.py (new file)
@@ -0,0 +1,160 @@
"""
Compose 11,172 Hangul syllables (U+AC00-U+D7A3) from jamo sprite pieces.
Also composes Hangul Compatibility Jamo (U+3130-U+318F).
Also stores all jamo variant bitmaps in PUA for GSUB-based jamo assembly.
Ported from HangulCompositor.kt and TerrarumSansBitmap.kt.
"""
from typing import Dict, List, Tuple
from glyph_parser import (
ExtractedGlyph, GlyphProps, get_hangul_jamo_bitmaps,
extract_hangul_jamo_variants, _read_hangul_cell, _empty_bitmap,
)
import sheet_config as SC
# PUA range for Hangul jamo variant storage.
# We need space for: max_col * max_row variants.
# Using 0xF0600-0xF1E7F
HANGUL_PUA_BASE = 0xF0600
def _compose_bitmaps(a, b, w, h):
"""OR two bitmaps together."""
result = []
for row in range(h):
row_data = []
for col in range(w):
av = a[row][col] if row < len(a) and col < len(a[row]) else 0
bv = b[row][col] if row < len(b) and col < len(b[row]) else 0
row_data.append(1 if av or bv else 0)
result.append(row_data)
return result
def _compose_bitmap_into(target, source, w, h):
"""OR source bitmap into target (mutates target)."""
for row in range(min(h, len(target), len(source))):
for col in range(min(w, len(target[row]), len(source[row]))):
if source[row][col]:
target[row][col] = 1
def _pua_for_jamo_variant(col, row):
"""Get PUA codepoint for a jamo variant at (column, row) in the sheet."""
# Encode as base + row * 256 + col (supports up to 256 columns per row)
return HANGUL_PUA_BASE + row * 256 + col
def compose_hangul(assets_dir) -> Dict[int, ExtractedGlyph]:
"""
Compose all Hangul syllables, compatibility jamo, and jamo variants.
Returns a dict of codepoint -> ExtractedGlyph.
"""
get_jamo = get_hangul_jamo_bitmaps(assets_dir)
cell_w = SC.W_HANGUL_BASE
cell_h = SC.H
result = {}
# Compose Hangul Compatibility Jamo (U+3130-U+318F)
for c in range(0x3130, 0x3190):
index = c - 0x3130
bitmap = get_jamo(index, 0)
props = GlyphProps(width=cell_w)
result[c] = ExtractedGlyph(c, props, bitmap)
# Compose 11,172 Hangul syllables (U+AC00-U+D7A3)
print(" Composing 11,172 Hangul syllables...")
for c in range(0xAC00, 0xD7A4):
c_int = c - 0xAC00
index_cho = c_int // (SC.JUNG_COUNT * SC.JONG_COUNT)
index_jung = c_int // SC.JONG_COUNT % SC.JUNG_COUNT
index_jong = c_int % SC.JONG_COUNT # 0 = no jongseong
# Map to jamo codepoints
cho_cp = 0x1100 + index_cho
jung_cp = 0x1161 + index_jung
jong_cp = 0x11A8 + index_jong - 1 if index_jong > 0 else 0
# Get sheet indices
i_cho = SC.to_hangul_choseong_index(cho_cp)
i_jung = SC.to_hangul_jungseong_index(jung_cp)
if i_jung is None:
i_jung = 0
i_jong = 0
if jong_cp != 0:
idx = SC.to_hangul_jongseong_index(jong_cp)
if idx is not None:
i_jong = idx
# Get row positions
cho_row = SC.get_han_initial_row(i_cho, i_jung, i_jong)
jung_row = SC.get_han_medial_row(i_cho, i_jung, i_jong)
jong_row = SC.get_han_final_row(i_cho, i_jung, i_jong)
# Get jamo bitmaps
cho_bitmap = get_jamo(i_cho, cho_row)
jung_bitmap = get_jamo(i_jung, jung_row)
# Compose
composed = _compose_bitmaps(cho_bitmap, jung_bitmap, cell_w, cell_h)
if index_jong > 0:
jong_bitmap = get_jamo(i_jong, jong_row)
_compose_bitmap_into(composed, jong_bitmap, cell_w, cell_h)
# Determine advance width
advance_width = cell_w + 1 if i_jung in SC.HANGUL_PEAKS_WITH_EXTRA_WIDTH else cell_w
props = GlyphProps(width=advance_width)
result[c] = ExtractedGlyph(c, props, composed)
print(f" Hangul syllable composition done: {len(result)} glyphs")
# Store jamo variant bitmaps in PUA for GSUB assembly
print(" Extracting jamo variants for GSUB...")
variants = extract_hangul_jamo_variants(assets_dir)
variant_count = 0
for (col, row), bm in variants.items():
pua = _pua_for_jamo_variant(col, row)
if pua not in result:
# Jungseong (rows 15-16) and jongseong (rows 17-18) overlay the
# choseong, so they need zero advance width.
w = 0 if 15 <= row <= 18 else cell_w
result[pua] = ExtractedGlyph(pua, GlyphProps(width=w), bm)
variant_count += 1
# Ensure jungseong filler PUA variants exist (col=0, rows 15-16).
# The filler has an empty bitmap so extract_hangul_jamo_variants skips
# it, but the vjmo GSUB lookup needs a PUA target to substitute to.
empty_bm = [[0] * cell_w for _ in range(cell_h)]
for row in [15, 16]:
pua = _pua_for_jamo_variant(0, row)
if pua not in result:
result[pua] = ExtractedGlyph(pua, GlyphProps(width=0), empty_bm)
variant_count += 1
print(f" Stored {variant_count} jamo variant glyphs in PUA (0x{HANGUL_PUA_BASE:05X}+)")
print(f" Total Hangul glyphs: {len(result)}")
return result
def get_jamo_gsub_data():
"""
Generate the data needed for Hangul jamo GSUB lookups.
Returns a dict with:
- 'pua_fn': function(col, row) -> PUA codepoint for a jamo variant
- 'pua_base': base PUA codepoint of the jamo variant storage
The row-selection rules (from the Kotlin code) are applied by the caller:
Choseong row = getHanInitialRow(i_cho, i_jung, i_jong)
Jungseong row = 15 if no final, else 16
Jongseong row = 17 if jungseong is not rightie, else 18
"""
return {
'pua_fn': _pua_for_jamo_variant,
'pua_base': HANGUL_PUA_BASE,
}
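compose_hangul derives its jamo indices with the standard Unicode Hangul decomposition, S = 0xAC00 + (cho * 21 + jung) * 28 + jong. A worked sketch of that arithmetic on its own:

```python
# Decompose a precomposed Hangul syllable (U+AC00..U+D7A3) into
# choseong/jungseong/jongseong indices, exactly as compose_hangul does.
JUNG_COUNT, JONG_COUNT = 21, 28

def decompose(syllable_cp):
    s = syllable_cp - 0xAC00
    cho = s // (JUNG_COUNT * JONG_COUNT)
    jung = s // JONG_COUNT % JUNG_COUNT
    jong = s % JONG_COUNT  # 0 = no final consonant
    return cho, jung, jong

# U+D55C HAN: choseong HIEUH (18), jungseong A (0), jongseong NIEUN (4)
print(decompose(0xD55C))  # (18, 0, 4)
```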

OTFbuild/keming_machine.py (new file)
@@ -0,0 +1,126 @@
"""
Generate kerning pairs from shape rules.
Ported from TerrarumSansBitmap.kt "The Keming Machine" section.
6 base rules + 6 mirrored (auto-generated) = 12 rules total.
Also includes r+dot special pairs.
Output kern values scaled by SCALE (50 units/pixel):
-1px -> -50 units, -2px -> -100 units
"""
from typing import Dict, Tuple
from glyph_parser import ExtractedGlyph
import sheet_config as SC
SCALE = SC.SCALE
class _Ing:
"""Pattern matcher for kerning shape bits."""
def __init__(self, s):
self.s = s
self.care_bits = 0
self.rule_bits = 0
for index, char in enumerate(s):
if char == '@':
self.care_bits |= SC.KEMING_BIT_MASK[index]
self.rule_bits |= SC.KEMING_BIT_MASK[index]
elif char == '`':
self.care_bits |= SC.KEMING_BIT_MASK[index]
def matches(self, shape_bits):
return (shape_bits & self.care_bits) == self.rule_bits
class _Kem:
def __init__(self, first, second, bb=2, yy=1):
self.first = first
self.second = second
self.bb = bb
self.yy = yy
def _build_kerning_rules():
"""Build the 12 kerning rules (6 base + 6 mirrored)."""
base_rules = [
_Kem(_Ing("_`_@___`__"), _Ing("`_`___@___")),
_Kem(_Ing("_@_`___`__"), _Ing("`_________")),
_Kem(_Ing("_@_@___`__"), _Ing("`___@_@___"), 1, 1),
_Kem(_Ing("_@_@_`_`__"), _Ing("`_____@___")),
_Kem(_Ing("___`_`____"), _Ing("`___@_`___")),
_Kem(_Ing("___`_`____"), _Ing("`_@___`___")),
]
mirrored = []
for rule in base_rules:
left = rule.first.s
right = rule.second.s
new_left = []
new_right = []
for c in range(0, len(left), 2):
new_left.append(right[c + 1])
new_left.append(right[c])
new_right.append(left[c + 1])
new_right.append(left[c])
mirrored.append(_Kem(
_Ing(''.join(new_left)),
_Ing(''.join(new_right)),
rule.bb, rule.yy
))
return base_rules + mirrored
_KERNING_RULES = _build_kerning_rules()
def generate_kerning_pairs(glyphs: Dict[int, ExtractedGlyph]) -> Dict[Tuple[int, int], int]:
"""
Generate kerning pairs from all glyphs that have kerning data.
Returns dict of (left_codepoint, right_codepoint) -> kern_offset_in_font_units.
Negative values = tighter spacing.
"""
result = {}
# Collect all codepoints with kerning data
kernable = {cp: g for cp, g in glyphs.items() if g.props.has_kern_data}
if not kernable:
print(" [KemingMachine] No glyphs with kern data found")
return result
print(f" [KemingMachine] {len(kernable)} glyphs with kern data")
# Special rule: lowercase r + dot
r_dot_count = 0
for r in SC.LOWERCASE_RS:
for d in SC.DOTS:
if r in glyphs and d in glyphs:
result[(r, d)] = -1 * SCALE
r_dot_count += 1
# Apply kerning rules to all pairs
kern_codes = list(kernable.keys())
pairs_found = 0
for left_code in kern_codes:
left_props = kernable[left_code].props
mask_l = left_props.kerning_mask
for right_code in kern_codes:
right_props = kernable[right_code].props
mask_r = right_props.kerning_mask
for rule in _KERNING_RULES:
if rule.first.matches(mask_l) and rule.second.matches(mask_r):
contraction = rule.yy if (left_props.is_kern_y_type or right_props.is_kern_y_type) else rule.bb
if contraction > 0:
result[(left_code, right_code)] = -contraction * SCALE
pairs_found += 1
break # first matching rule wins
print(f" [KemingMachine] Generated {pairs_found} kerning pairs (+ {r_dot_count} r-dot pairs)")
return result
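_Ing compiles each pattern string into a care mask and a rule mask; a shape matches when its masked bits equal the rule bits. A minimal sketch of that matching, assuming a hypothetical one-bit-per-position mask table in place of SC.KEMING_BIT_MASK (the real table lives in sheet_config):

```python
# '@' = bit must be 1, '`' = bit must be 0, '_' = don't care.
BIT_MASK = [1 << i for i in range(10)]  # hypothetical stand-in

def compile_pattern(s):
    care = rule = 0
    for i, ch in enumerate(s):
        if ch == '@':
            care |= BIT_MASK[i]
            rule |= BIT_MASK[i]
        elif ch == '`':
            care |= BIT_MASK[i]
    return care, rule

def matches(shape_bits, pattern):
    care, rule = pattern
    return (shape_bits & care) == rule

p = compile_pattern("_@_`______")   # bit 1 must be set, bit 3 must be clear
print(matches(0b0000000010, p))     # True
print(matches(0b0000001010, p))     # False (bit 3 is set)
```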

(diff of one file suppressed: too large)

OTFbuild/otf2woff2.py (new executable file)
@@ -0,0 +1,10 @@
#!/usr/bin/env python3
"""Convert an OTF/TTF font to WOFF2 format."""
import sys
from fontTools.ttLib import TTFont
src, dst = sys.argv[1], sys.argv[2]
font = TTFont(src)
font.flavor = 'woff2'
font.save(dst)
print(f" Written {dst}")


@@ -0,0 +1,3 @@
fonttools>=4.47.0
brotli>=1.1.0
opentype-sanitizer>=9.2.0

OTFbuild/sheet_config.py (new file)
@@ -0,0 +1,575 @@
"""
Sheet definitions, code ranges, index functions, and font metric constants.
Ported from TerrarumSansBitmap.kt companion object and SheetConfig.kt.
"""
# Font metrics
H = 20
H_UNIHAN = 16
W_HANGUL_BASE = 13
W_UNIHAN = 16
W_LATIN_WIDE = 9
W_VAR_INIT = 15
W_WIDEVAR_INIT = 31
HGAP_VAR = 1
SIZE_CUSTOM_SYM = 20
H_DIACRITICS = 3
H_STACKUP_LOWERCASE_SHIFTDOWN = 4
H_OVERLAY_LOWERCASE_SHIFTDOWN = 2
LINE_HEIGHT = 24
# OTF metrics (1000 UPM, scale = 50 units/pixel)
UNITS_PER_EM = 1000
SCALE = 50 # units per pixel
ASCENT = 16 * SCALE # 800
DESCENT = 4 * SCALE # 200
X_HEIGHT = 8 * SCALE # 400
CAP_HEIGHT = 12 * SCALE # 600
LINE_GAP = (LINE_HEIGHT - H) * SCALE # 200
# Sheet indices
SHEET_ASCII_VARW = 0
SHEET_HANGUL = 1
SHEET_EXTA_VARW = 2
SHEET_EXTB_VARW = 3
SHEET_KANA = 4
SHEET_CJK_PUNCT = 5
SHEET_UNIHAN = 6
SHEET_CYRILIC_VARW = 7
SHEET_HALFWIDTH_FULLWIDTH_VARW = 8
SHEET_UNI_PUNCT_VARW = 9
SHEET_GREEK_VARW = 10
SHEET_THAI_VARW = 11
SHEET_HAYEREN_VARW = 12
SHEET_KARTULI_VARW = 13
SHEET_IPA_VARW = 14
SHEET_RUNIC = 15
SHEET_LATIN_EXT_ADD_VARW = 16
SHEET_CUSTOM_SYM = 17
SHEET_BULGARIAN_VARW = 18
SHEET_SERBIAN_VARW = 19
SHEET_TSALAGI_VARW = 20
SHEET_PHONETIC_EXT_VARW = 21
SHEET_DEVANAGARI_VARW = 22
SHEET_KARTULI_CAPS_VARW = 23
SHEET_DIACRITICAL_MARKS_VARW = 24
SHEET_GREEK_POLY_VARW = 25
SHEET_EXTC_VARW = 26
SHEET_EXTD_VARW = 27
SHEET_CURRENCIES_VARW = 28
SHEET_INTERNAL_VARW = 29
SHEET_LETTERLIKE_MATHS_VARW = 30
SHEET_ENCLOSED_ALPHNUM_SUPL_VARW = 31
SHEET_TAMIL_VARW = 32
SHEET_BENGALI_VARW = 33
SHEET_BRAILLE_VARW = 34
SHEET_SUNDANESE_VARW = 35
SHEET_DEVANAGARI2_INTERNAL_VARW = 36
SHEET_CODESTYLE_ASCII_VARW = 37
SHEET_ALPHABETIC_PRESENTATION_FORMS = 38
SHEET_HENTAIGANA_VARW = 39
SHEET_CONTROL_PICTURES_VARW = 40
SHEET_LEGACY_COMPUTING_VARW = 41
SHEET_CYRILIC_EXTB_VARW = 42
SHEET_CYRILIC_EXTA_VARW = 43
SHEET_CYRILIC_EXTC_VARW = 44
SHEET_LATIN_EXTE_VARW = 45
SHEET_LATIN_EXTF_VARW = 46
SHEET_LATIN_EXTG_VARW = 47
SHEET_OGHAM_VARW = 48
SHEET_COPTIC_VARW = 49
SHEET_UNKNOWN = 254
FILE_LIST = [
"ascii_variable.tga",
"hangul_johab.tga",
"latinExtA_variable.tga",
"latinExtB_variable.tga",
"kana_variable.tga",
"cjkpunct_variable.tga",
"wenquanyi.tga",
"cyrilic_variable.tga",
"halfwidth_fullwidth_variable.tga",
"unipunct_variable.tga",
"greek_variable.tga",
"thai_variable.tga",
"hayeren_variable.tga",
"kartuli_variable.tga",
"ipa_ext_variable.tga",
"futhark.tga",
"latinExt_additional_variable.tga",
"puae000-e0ff.tga",
"cyrilic_bulgarian_variable.tga",
"cyrilic_serbian_variable.tga",
"tsalagi_variable.tga",
"phonetic_extensions_variable.tga",
"devanagari_variable.tga",
"kartuli_allcaps_variable.tga",
"diacritical_marks_variable.tga",
"greek_polytonic_xyswap_variable.tga",
"latinExtC_variable.tga",
"latinExtD_variable.tga",
"currencies_variable.tga",
"internal_variable.tga",
"letterlike_symbols_variable.tga",
"enclosed_alphanumeric_supplement_variable.tga",
"tamil_extrawide_variable.tga",
"bengali_variable.tga",
"braille_variable.tga",
"sundanese_variable.tga",
"devanagari_internal_extrawide_variable.tga",
"pua_codestyle_ascii_variable.tga",
"alphabetic_presentation_forms_extrawide_variable.tga",
"hentaigana_variable.tga",
"control_pictures_variable.tga",
"symbols_for_legacy_computing_variable.tga",
"cyrilic_extB_variable.tga",
"cyrilic_extA_variable.tga",
"cyrilic_extC_variable.tga",
"latinExtE_variable.tga",
"latinExtF_variable.tga",
"latinExtG_variable.tga",
"ogham_variable.tga",
"coptic_variable.tga",
]
CODE_RANGE = [
list(range(0x00, 0x100)), # 0: ASCII
list(range(0x1100, 0x1200)) + list(range(0xA960, 0xA980)) + list(range(0xD7B0, 0xD800)), # 1: Hangul Jamo
list(range(0x100, 0x180)), # 2: Latin Ext A
list(range(0x180, 0x250)), # 3: Latin Ext B
list(range(0x3040, 0x3100)) + list(range(0x31F0, 0x3200)), # 4: Kana
list(range(0x3000, 0x3040)), # 5: CJK Punct
list(range(0x3400, 0xA000)), # 6: Unihan
list(range(0x400, 0x530)), # 7: Cyrillic
list(range(0xFF00, 0x10000)), # 8: Halfwidth/Fullwidth
list(range(0x2000, 0x20A0)), # 9: Uni Punct
list(range(0x370, 0x400)), # 10: Greek
list(range(0xE00, 0xE60)), # 11: Thai
list(range(0x530, 0x590)), # 12: Armenian
list(range(0x10D0, 0x1100)), # 13: Georgian
list(range(0x250, 0x300)), # 14: IPA
list(range(0x16A0, 0x1700)), # 15: Runic
list(range(0x1E00, 0x1F00)), # 16: Latin Ext Additional
list(range(0xE000, 0xE100)), # 17: Custom Sym (PUA)
list(range(0xF0000, 0xF0060)), # 18: Bulgarian
list(range(0xF0060, 0xF00C0)), # 19: Serbian
list(range(0x13A0, 0x13F6)), # 20: Cherokee
list(range(0x1D00, 0x1DC0)), # 21: Phonetic Ext
list(range(0x900, 0x980)) + list(range(0xF0100, 0xF0500)), # 22: Devanagari
list(range(0x1C90, 0x1CC0)), # 23: Georgian Caps
list(range(0x300, 0x370)), # 24: Diacritical Marks
list(range(0x1F00, 0x2000)), # 25: Greek Polytonic
list(range(0x2C60, 0x2C80)), # 26: Latin Ext C
list(range(0xA720, 0xA800)), # 27: Latin Ext D
list(range(0x20A0, 0x20D0)), # 28: Currencies
list(range(0xFFE00, 0xFFFA0)), # 29: Internal
list(range(0x2100, 0x2200)), # 30: Letterlike
list(range(0x1F100, 0x1F200)), # 31: Enclosed Alphanum Supl
list(range(0x0B80, 0x0C00)) + list(range(0xF00C0, 0xF0100)), # 32: Tamil
list(range(0x980, 0xA00)), # 33: Bengali
list(range(0x2800, 0x2900)), # 34: Braille
list(range(0x1B80, 0x1BC0)) + list(range(0x1CC0, 0x1CD0)) + list(range(0xF0500, 0xF0510)), # 35: Sundanese
list(range(0xF0110, 0xF0130)), # 36: Devanagari2 Internal
list(range(0xF0520, 0xF0580)), # 37: Codestyle ASCII
list(range(0xFB00, 0xFB18)), # 38: Alphabetic Presentation
list(range(0x1B000, 0x1B170)), # 39: Hentaigana
list(range(0x2400, 0x2440)), # 40: Control Pictures
list(range(0x1FB00, 0x1FC00)), # 41: Legacy Computing
list(range(0xA640, 0xA6A0)), # 42: Cyrillic Ext B
list(range(0x2DE0, 0x2E00)), # 43: Cyrillic Ext A
list(range(0x1C80, 0x1C8F)), # 44: Cyrillic Ext C
list(range(0xAB30, 0xAB70)), # 45: Latin Ext E
list(range(0x10780, 0x107C0)), # 46: Latin Ext F
list(range(0x1DF00, 0x1E000)), # 47: Latin Ext G
list(range(0x1680, 0x16A0)), # 48: Ogham
list(range(0x2C80, 0x2D00)), # 49: Coptic
]
CODE_RANGE_HANGUL_COMPAT = range(0x3130, 0x3190)
ALT_CHARSET_CODEPOINT_OFFSETS = [
0,
0xF0000 - 0x400, # Bulgarian
0xF0060 - 0x400, # Serbian
0xF0520 - 0x20, # Codestyle
]
ALT_CHARSET_CODEPOINT_DOMAINS = [
range(0, 0x10FFFF + 1),
range(0x400, 0x460),
range(0x400, 0x460),
range(0x20, 0x80),
]
# Unicode spacing characters
NQSP = 0x2000
MQSP = 0x2001
ENSP = 0x2002
EMSP = 0x2003
THREE_PER_EMSP = 0x2004
QUARTER_EMSP = 0x2005
SIX_PER_EMSP = 0x2006
FSP = 0x2007
PSP = 0x2008
THSP = 0x2009
HSP = 0x200A
ZWSP = 0x200B
ZWNJ = 0x200C
ZWJ = 0x200D
SHY = 0xAD
NBSP = 0xA0
OBJ = 0xFFFC
FIXED_BLOCK_1 = 0xFFFD0
MOVABLE_BLOCK_M1 = 0xFFFE0
MOVABLE_BLOCK_1 = 0xFFFF0
CHARSET_OVERRIDE_DEFAULT = 0xFFFC0
CHARSET_OVERRIDE_BG_BG = 0xFFFC1
CHARSET_OVERRIDE_SR_SR = 0xFFFC2
CHARSET_OVERRIDE_CODESTYLE = 0xFFFC3
# Alignment constants
ALIGN_LEFT = 0
ALIGN_RIGHT = 1
ALIGN_CENTRE = 2
ALIGN_BEFORE = 3
# Stack constants
STACK_UP = 0
STACK_DOWN = 1
STACK_BEFORE_N_AFTER = 2
STACK_UP_N_DOWN = 3
STACK_DONT = 4
def is_variable(filename):
return filename.endswith("_variable.tga")
def is_xy_swapped(filename):
return "xyswap" in filename.lower()
def is_extra_wide(filename):
return "extrawide" in filename.lower()
def get_cell_width(sheet_index):
"""Returns the cell pitch in the sprite sheet (includes HGAP_VAR for variable sheets)."""
fn = FILE_LIST[sheet_index]
if is_extra_wide(fn):
return W_WIDEVAR_INIT + HGAP_VAR # 32
if is_variable(fn):
return W_VAR_INIT + HGAP_VAR # 16
if sheet_index == SHEET_UNIHAN:
return W_UNIHAN
if sheet_index == SHEET_HANGUL:
return W_HANGUL_BASE
if sheet_index == SHEET_CUSTOM_SYM:
return SIZE_CUSTOM_SYM
if sheet_index == SHEET_RUNIC:
return W_LATIN_WIDE
return W_VAR_INIT + HGAP_VAR
def get_cell_height(sheet_index):
if sheet_index == SHEET_UNIHAN:
return H_UNIHAN
if sheet_index == SHEET_CUSTOM_SYM:
return SIZE_CUSTOM_SYM
return H
def get_columns(sheet_index):
if sheet_index == SHEET_UNIHAN:
return 256
return 16
# Hangul constants
JUNG_COUNT = 21
JONG_COUNT = 28
# Hangul shape arrays (sorted sets)
JUNGSEONG_I = frozenset([21, 61])
JUNGSEONG_OU = frozenset([9, 13, 14, 18, 34, 35, 39, 45, 51, 53, 54, 64, 73, 80, 83])
JUNGSEONG_OU_COMPLEX = frozenset(
[10, 11, 16] + list(range(22, 34)) + [36, 37, 38] + list(range(41, 45)) +
list(range(46, 51)) + list(range(56, 60)) + [63] + list(range(67, 73)) +
list(range(74, 80)) + list(range(81, 84)) + list(range(85, 92)) + [93, 94]
)
JUNGSEONG_RIGHTIE = frozenset([2, 4, 6, 8, 11, 16, 32, 33, 37, 42, 44, 48, 50, 71, 72, 75, 78, 79, 83, 86, 87, 88, 94])
JUNGSEONG_OEWI = frozenset([12, 15, 17, 40, 52, 55, 89, 90, 91])
JUNGSEONG_EU = frozenset([19, 62, 66])
JUNGSEONG_YI = frozenset([20, 60, 65])
JUNGSEONG_UU = frozenset([14, 15, 16, 17, 18, 27, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 59, 67, 68, 73, 77, 78, 79, 80, 81, 82, 83, 84, 91])
JUNGSEONG_WIDE = frozenset(list(JUNGSEONG_OU) + list(JUNGSEONG_EU))
CHOSEONG_GIYEOKS = frozenset([0, 1, 15, 23, 30, 34, 45, 51, 56, 65, 82, 90, 100, 101, 110, 111, 115])
HANGUL_PEAKS_WITH_EXTRA_WIDTH = frozenset([2, 4, 6, 8, 11, 16, 32, 33, 37, 42, 44, 48, 50, 71, 75, 78, 79, 83, 86, 87, 88, 94])
GIYEOK_REMAPPING = {5: 19, 6: 20, 7: 21, 8: 22, 11: 23, 12: 24}
def is_hangul_choseong(c):
return 0x1100 <= c <= 0x115F or 0xA960 <= c <= 0xA97F
def is_hangul_jungseong(c):
return 0x1160 <= c <= 0x11A7 or 0xD7B0 <= c <= 0xD7C6
def is_hangul_jongseong(c):
return 0x11A8 <= c <= 0x11FF or 0xD7CB <= c <= 0xD7FB
def is_hangul_compat(c):
return 0x3130 <= c <= 0x318F
def to_hangul_choseong_index(c):
if 0x1100 <= c <= 0x115F:
return c - 0x1100
if 0xA960 <= c <= 0xA97F:
return c - 0xA960 + 96
raise ValueError(f"Not a choseong: U+{c:04X}")
def to_hangul_jungseong_index(c):
if 0x1160 <= c <= 0x11A7:
return c - 0x1160
if 0xD7B0 <= c <= 0xD7C6:
return c - 0xD7B0 + 72
return None
def to_hangul_jongseong_index(c):
if 0x11A8 <= c <= 0x11FF:
return c - 0x11A8 + 1
if 0xD7CB <= c <= 0xD7FB:
return c - 0xD7CB + 88 + 1
return None
def get_han_initial_row(i, p, f):
if p in JUNGSEONG_I:
ret = 3
elif p in JUNGSEONG_OEWI:
ret = 11
elif p in JUNGSEONG_OU_COMPLEX:
ret = 7
elif p in JUNGSEONG_OU:
ret = 5
elif p in JUNGSEONG_EU:
ret = 9
elif p in JUNGSEONG_YI:
ret = 13
else:
ret = 1
if f != 0:
ret += 1
if p in JUNGSEONG_UU and i in CHOSEONG_GIYEOKS:
mapped = GIYEOK_REMAPPING.get(ret)
if mapped is None:
raise ValueError(f"Giyeok remapping failed: i={i} p={p} f={f} ret={ret}")
return mapped
return ret
def get_han_medial_row(i, p, f):
return 15 if f == 0 else 16
def get_han_final_row(i, p, f):
return 17 if p not in JUNGSEONG_RIGHTIE else 18
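The three row-selection functions above take jamo-based indices: choseong counted from U+1100, jungseong from U+1160 (so the vowel ㅏ is index 1, not 0), and jongseong offset by one so that 0 means "no final". A minimal sketch, assuming the `JUNG_COUNT`/`JONG_COUNT` constants defined earlier, of deriving those indices from a precomposed syllable (the `decompose_syllable` helper is hypothetical, not part of the source):

```python
JUNG_COUNT, JONG_COUNT = 21, 28  # as defined above

def decompose_syllable(ch):
    """Hypothetical helper: split a precomposed syllable (U+AC00..U+D7A3)
    into the index convention the get_han_*_row functions expect."""
    s = ord(ch) - 0xAC00
    cho = s // (JUNG_COUNT * JONG_COUNT)       # 0 = U+1100 (giyeok)
    jung = (s // JONG_COUNT) % JUNG_COUNT + 1  # +1: jamo index 0 is the U+1160 filler
    jong = s % JONG_COUNT                      # 0 = no final consonant
    return cho, jung, jong

assert decompose_syllable('가') == (0, 1, 0)
assert decompose_syllable('굴') == (0, 14, 8)
```

With these indices, 구 (cho 0, jung 14, jong 0) takes the `JUNGSEONG_OU` branch (row 5) and, since jung 14 is in `JUNGSEONG_UU` and cho 0 is in `CHOSEONG_GIYEOKS`, gets remapped to row 19 via `GIYEOK_REMAPPING`.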
# Kerning constants
KEMING_BIT_MASK = [1 << b for b in [7, 6, 5, 4, 3, 2, 1, 0, 15, 14]]
# Special characters for r+dot kerning
LOWERCASE_RS = frozenset([0x72, 0x155, 0x157, 0x159, 0x211, 0x213, 0x27c, 0x1e59, 0x1e58, 0x1e5f])
DOTS = frozenset([0x2c, 0x2e])
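`KEMING_BIT_MASK` yields ten single-bit masks in MSB-first order for the low byte, then bits 15 and 14. A speculative sketch of packing per-zone occupancy flags into such a mask, assuming (this ordering is an assumption, not confirmed by the source) that the ten entries correspond to the shape zones A–H, J, K used by the keming calculator:

```python
# Assumption: the ten mask entries map to zones A..H, J, K in order.
KEMING_BIT_MASK = [1 << b for b in [7, 6, 5, 4, 3, 2, 1, 0, 15, 14]]
ZONES = "ABCDEFGHJK"

def zones_to_mask(zones):
    """Hypothetical: OR together the mask bits for each occupied zone."""
    mask = 0
    for z in zones:
        mask |= KEMING_BIT_MASK[ZONES.index(z)]
    return mask

assert zones_to_mask("A") == 0x80
assert zones_to_mask("ABCDEFGH") == 0xFF      # all above-baseline zones
assert zones_to_mask("JK") == 0x8000 | 0x4000 # descender zones in the high byte
```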
# Devanagari internal encoding
DEVANAGARI_UNICODE_NUQTA_TABLE = [0xF0170, 0xF0171, 0xF0172, 0xF0177, 0xF017C, 0xF017D, 0xF0186, 0xF018A]
def to_deva_internal(c):
if 0x0915 <= c <= 0x0939:
return c - 0x0915 + 0xF0140
if 0x0958 <= c <= 0x095F:
return DEVANAGARI_UNICODE_NUQTA_TABLE[c - 0x0958]
raise ValueError(f"No internal form for U+{c:04X}")
DEVANAGARI_CONSONANTS = frozenset(
list(range(0x0915, 0x093A)) + list(range(0x0958, 0x0960)) +
list(range(0x0978, 0x0980)) + list(range(0xF0140, 0xF0500)) +
list(range(0xF0106, 0xF010A))
)
# Sundanese internal forms
SUNDANESE_ING = 0xF0500
SUNDANESE_ENG = 0xF0501
SUNDANESE_EUNG = 0xF0502
SUNDANESE_IR = 0xF0503
SUNDANESE_ER = 0xF0504
SUNDANESE_EUR = 0xF0505
SUNDANESE_LU = 0xF0506
# Tamil constants
TAMIL_KSSA = 0xF00ED
TAMIL_SHRII = 0xF00EE
TAMIL_I = 0xBBF
TAMIL_LIGATING_CONSONANTS = [
0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3, 0x0BA4, 0x0BA8,
0x0BA9, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0, 0x0BB1, 0x0BB2, 0x0BB3,
0x0BB4, 0x0BB5,
]
# Devanagari special codepoints
DEVANAGARI_VIRAMA = 0x94D
DEVANAGARI_NUQTA = 0x93C
DEVANAGARI_RA = to_deva_internal(0x930)
DEVANAGARI_YA = to_deva_internal(0x92F)
DEVANAGARI_RRA = to_deva_internal(0x931)
DEVANAGARI_VA = to_deva_internal(0x935)
DEVANAGARI_HA = to_deva_internal(0x939)
DEVANAGARI_U = 0x941
DEVANAGARI_UU = 0x942
DEVANAGARI_I_VOWEL = 0x093F
DEVANAGARI_II_VOWEL = 0x0940
DEVANAGARI_RYA = 0xF0106
DEVANAGARI_HALF_RYA = 0xF0107
DEVANAGARI_OPEN_YA = 0xF0108
DEVANAGARI_OPEN_HALF_YA = 0xF0109
DEVANAGARI_ALT_HALF_SHA = 0xF010F
DEVANAGARI_RA_SUB = 0xF010A # below-base RA (rakaar); transient glyph for blwf/cjct
DEVANAGARI_EYELASH_RA = 0xF010B
DEVANAGARI_RA_SUPER = 0xF010C
DEVANAGARI_RA_SUPER_COMPLEX = 0xF010D
MARWARI_DD = 0x978
MARWARI_LIG_DD_R = 0xF010E
DEVANAGARI_SYLL_RU = 0xF0100
DEVANAGARI_SYLL_RUU = 0xF0101
DEVANAGARI_SYLL_RRU = 0xF0102
DEVANAGARI_SYLL_RRUU = 0xF0103
DEVANAGARI_SYLL_HU = 0xF0104
DEVANAGARI_SYLL_HUU = 0xF0105
# Devanagari ligature codepoints
DEVANAGARI_LIG_K_T = 0xF01BC
DEVANAGARI_LIG_K_SS = 0xF01A1
DEVANAGARI_LIG_J_NY = 0xF01A2
DEVANAGARI_LIG_T_T = 0xF01A3
DEVANAGARI_LIG_N_T = 0xF01A4
DEVANAGARI_LIG_N_N = 0xF01A5
DEVANAGARI_LIG_S_V = 0xF01A6
DEVANAGARI_LIG_SS_P = 0xF01A7
DEVANAGARI_LIG_SH_C = 0xF01A8
DEVANAGARI_LIG_SH_N = 0xF01A9
DEVANAGARI_LIG_SH_V = 0xF01AA
DEVANAGARI_LIG_J_Y = 0xF01AB
DEVANAGARI_LIG_J_J_Y = 0xF01AC
MARWARI_LIG_DD_DD = 0xF01BA
MARWARI_LIG_DD_DDH = 0xF01BB
DEVANAGARI_ANUSVARA_UPPER = 0xF016C
MARWARI_LIG_DD_Y = 0xF016E
MARWARI_HALFLIG_DD_Y = 0xF016F
# Devanagari range sets for feature generation
DEVANAGARI_PRESENTATION_CONSONANTS = range(0xF0140, 0xF0230)
DEVANAGARI_PRESENTATION_CONSONANTS_HALF = range(0xF0230, 0xF0320)
DEVANAGARI_PRESENTATION_CONSONANTS_WITH_RA = range(0xF0320, 0xF0410)
DEVANAGARI_PRESENTATION_CONSONANTS_WITH_RA_HALF = range(0xF0410, 0xF0500)
# Index functions
def _kana_index_y(c):
return 12 if 0x31F0 <= c <= 0x31FF else (c - 0x3040) // 16
def _unihan_index_y(c):
return (c - 0x3400) // 256
def _devanagari_index_y(c):
return ((c - 0x0900) if c < 0xF0000 else (c - 0xF0080)) // 16
def _tamil_index_y(c):
return ((c - 0x0B80) if c < 0xF0000 else (c - 0xF0040)) // 16
def _sundanese_index_y(c):
if c >= 0xF0500:
return (c - 0xF04B0) // 16
if c < 0x1BC0:
return (c - 0x1B80) // 16
return (c - 0x1C80) // 16
def index_x(c):
return c % 16
def unihan_index_x(c):
return (c - 0x3400) % 256
def index_y(sheet_index, c):
"""Y-index (row) for codepoint c in the given sheet."""
return {
SHEET_ASCII_VARW: lambda: c // 16,
SHEET_UNIHAN: lambda: _unihan_index_y(c),
SHEET_EXTA_VARW: lambda: (c - 0x100) // 16,
SHEET_EXTB_VARW: lambda: (c - 0x180) // 16,
SHEET_KANA: lambda: _kana_index_y(c),
SHEET_CJK_PUNCT: lambda: (c - 0x3000) // 16,
SHEET_CYRILIC_VARW: lambda: (c - 0x400) // 16,
SHEET_HALFWIDTH_FULLWIDTH_VARW: lambda: (c - 0xFF00) // 16,
SHEET_UNI_PUNCT_VARW: lambda: (c - 0x2000) // 16,
SHEET_GREEK_VARW: lambda: (c - 0x370) // 16,
SHEET_THAI_VARW: lambda: (c - 0xE00) // 16,
SHEET_CUSTOM_SYM: lambda: (c - 0xE000) // 16,
SHEET_HAYEREN_VARW: lambda: (c - 0x530) // 16,
SHEET_KARTULI_VARW: lambda: (c - 0x10D0) // 16,
SHEET_IPA_VARW: lambda: (c - 0x250) // 16,
SHEET_RUNIC: lambda: (c - 0x16A0) // 16,
SHEET_LATIN_EXT_ADD_VARW: lambda: (c - 0x1E00) // 16,
SHEET_BULGARIAN_VARW: lambda: (c - 0xF0000) // 16,
SHEET_SERBIAN_VARW: lambda: (c - 0xF0060) // 16,
SHEET_TSALAGI_VARW: lambda: (c - 0x13A0) // 16,
SHEET_PHONETIC_EXT_VARW: lambda: (c - 0x1D00) // 16,
SHEET_DEVANAGARI_VARW: lambda: _devanagari_index_y(c),
SHEET_KARTULI_CAPS_VARW: lambda: (c - 0x1C90) // 16,
SHEET_DIACRITICAL_MARKS_VARW: lambda: (c - 0x300) // 16,
SHEET_GREEK_POLY_VARW: lambda: (c - 0x1F00) // 16,
SHEET_EXTC_VARW: lambda: (c - 0x2C60) // 16,
SHEET_EXTD_VARW: lambda: (c - 0xA720) // 16,
SHEET_CURRENCIES_VARW: lambda: (c - 0x20A0) // 16,
SHEET_INTERNAL_VARW: lambda: (c - 0xFFE00) // 16,
SHEET_LETTERLIKE_MATHS_VARW: lambda: (c - 0x2100) // 16,
SHEET_ENCLOSED_ALPHNUM_SUPL_VARW: lambda: (c - 0x1F100) // 16,
SHEET_TAMIL_VARW: lambda: _tamil_index_y(c),
SHEET_BENGALI_VARW: lambda: (c - 0x980) // 16,
SHEET_BRAILLE_VARW: lambda: (c - 0x2800) // 16,
SHEET_SUNDANESE_VARW: lambda: _sundanese_index_y(c),
SHEET_DEVANAGARI2_INTERNAL_VARW: lambda: (c - 0xF0110) // 16,
SHEET_CODESTYLE_ASCII_VARW: lambda: (c - 0xF0520) // 16,
SHEET_ALPHABETIC_PRESENTATION_FORMS: lambda: (c - 0xFB00) // 16,
SHEET_HENTAIGANA_VARW: lambda: (c - 0x1B000) // 16,
SHEET_CONTROL_PICTURES_VARW: lambda: (c - 0x2400) // 16,
SHEET_LEGACY_COMPUTING_VARW: lambda: (c - 0x1FB00) // 16,
SHEET_CYRILIC_EXTB_VARW: lambda: (c - 0xA640) // 16,
SHEET_CYRILIC_EXTA_VARW: lambda: (c - 0x2DE0) // 16,
SHEET_CYRILIC_EXTC_VARW: lambda: (c - 0x1C80) // 16,
SHEET_LATIN_EXTE_VARW: lambda: (c - 0xAB30) // 16,
SHEET_LATIN_EXTF_VARW: lambda: (c - 0x10780) // 16,
SHEET_LATIN_EXTG_VARW: lambda: (c - 0x1DF00) // 16,
SHEET_OGHAM_VARW: lambda: (c - 0x1680) // 16,
SHEET_COPTIC_VARW: lambda: (c - 0x2C80) // 16,
SHEET_HANGUL: lambda: 0,
}.get(sheet_index, lambda: c // 16)()
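Together with `index_x` above, `index_y` addresses a 16-column grid in each sprite sheet; multiplying by the cell pitch gives pixel coordinates. An illustrative sketch with placeholder cell sizes (the real pitch comes from `get_cell_width()`/`get_cell_height()` above):

```python
# CELL_W/CELL_H are illustrative placeholder values, not from the source.
CELL_W, CELL_H = 16, 24

def cell_rect(c):
    """Pixel rectangle of the glyph cell for codepoint c in a 16-column sheet."""
    col, row = c % 16, c // 16
    return col * CELL_W, row * CELL_H, CELL_W, CELL_H

assert cell_rect(0x41) == (16, 96, 16, 24)  # 'A': column 1, row 4
```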

OTFbuild/tga_reader.py Normal file

@@ -0,0 +1,90 @@
"""
TGA reader for uncompressed true-colour images (Type 2).
Stores pixels as RGBA8888: (R<<24 | G<<16 | B<<8 | A).
Matches the convention in TerrarumSansBitmap.kt where .and(255) checks
the alpha channel (lowest byte).
"""
import struct
from typing import List
class TgaImage:
__slots__ = ('width', 'height', 'pixels')
def __init__(self, width: int, height: int, pixels: List[int]):
self.width = width
self.height = height
self.pixels = pixels # flat array, row-major
def get_pixel(self, x: int, y: int) -> int:
"""Get pixel at (x, y) as RGBA8888 (R in bits 31-24, A in bits 7-0)."""
if x < 0 or x >= self.width or y < 0 or y >= self.height:
return 0
return self.pixels[y * self.width + x]
def read_tga(path: str) -> TgaImage:
"""Read an uncompressed true-colour TGA file."""
with open(path, 'rb') as f:
data = f.read()
pos = 0
def u8():
nonlocal pos
val = data[pos]
pos += 1
return val
def u16():
nonlocal pos
val = struct.unpack_from('<H', data, pos)[0]
pos += 2
return val
id_length = u8()
colour_map_type = u8()
image_type = u8()
# colour map spec (5 bytes)
u16(); u16(); u8()
# image spec
x_origin = u16()
y_origin = u16()
width = u16()
height = u16()
bits_per_pixel = u8()
descriptor = u8()
top_to_bottom = (descriptor & 0x20) != 0
bytes_per_pixel = bits_per_pixel // 8
# skip ID
pos += id_length
if colour_map_type != 0:
raise ValueError("Colour-mapped TGA not supported")
if image_type != 2:
raise ValueError(f"Only uncompressed true-colour TGA supported (type 2), got type {image_type}")
if bytes_per_pixel not in (3, 4):
raise ValueError(f"Only 24-bit or 32-bit TGA supported, got {bits_per_pixel}-bit")
pixels = [0] * (width * height)
for row in range(height):
y = row if top_to_bottom else (height - 1 - row)
for x in range(width):
b = data[pos]; pos += 1
g = data[pos]; pos += 1
r = data[pos]; pos += 1
a = data[pos] if bytes_per_pixel == 4 else 0xFF
if bytes_per_pixel == 4:
pos += 1
# Store as RGBA8888: R in high byte, A in low byte
pixels[y * width + x] = (r << 24) | (g << 16) | (b << 8) | a
return TgaImage(width, height, pixels)
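The BGRA-on-disk to RGBA8888-in-memory convention can be sanity-checked in isolation. This sketch packs a minimal type-2 header and converts one pixel by hand; it does not call `read_tga`, and the colour values are arbitrary:

```python
import struct

# Minimal 1x1 32-bit type-2 TGA header (top-to-bottom origin).
# Fields: id length, colour map type, image type, colour map spec (3),
# x/y origin, width, height, bits per pixel, descriptor.
header = struct.pack('<BBBHHBHHHHBB',
                     0, 0, 2,
                     0, 0, 0,
                     0, 0, 1, 1,
                     32, 0x20)
assert len(header) == 18  # fixed TGA header size

# One pixel as stored on disk: B, G, R, A byte order.
b, g, r, a = bytes([0x33, 0x22, 0x11, 0xFF])
rgba = (r << 24) | (g << 16) | (b << 8) | a
assert rgba == 0x112233FF
assert rgba & 255 == 0xFF  # alpha in the low byte, matching the .and(255) check
```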


@@ -2,17 +2,21 @@
![Font sample — necessary information in this image is also provided below.](demo.PNG)
This font is a bitmap font used in [my game project called Terrarum](https://github.com/curioustorvald/Terrarum) (hence the name). The font supports more than 90 % of European languages, as well as Chinese, Japanese, and Korean.
The font is provided in the following formats:
* **OTF** — This is most likely the version you want. It is compatible with anything that supports OpenType fonts.
* **WOFF2** — This is the OTF font repackaged as a web font. You will want this if you want to use it on a web page.
* **JAR** — This is the version you want if you work with LibGDX. (extends the ```BitmapFont``` class)
The issue page is open. If you have an issue to submit, or a question, please leave it there.
#### Notes and Limitations
- The JAR version comes with its own shaping and typesetting engine, texture caching, and self-contained assets. It is NOT compatible with `GlyphLayout`.
- (JAR only) Displaying the Bulgarian/Serbian variants of Cyrillic requires special control characters. (`GameFontBase.charsetOverrideBulgarian` -- U+FFFC1; `GameFontBase.charsetOverrideSerbian` -- U+FFFC2)
- All Han characters are in the Mainland Chinese variant. There is no plan to support the other variants unless someone is willing to draw the characters.
- Among the Indic scripts, only Devanagari and Tamil have full (as much as I can manage) ligature support -- the Bengali script has no ligature support.
- Slick2d versions are now unsupported. I couldn't stretch myself to work on both versions, but I'm still happy to merge your pull requests.
### Design Goals
@@ -23,13 +27,11 @@ The issue page is open. If you have some issues to submit, or have a question, p
## Download
- Go ahead to the [release tab](https://github.com/curioustorvald/Terrarum-sans-bitmap/releases) and download the most recent version. It is **not** advised to use the .jar found within the repository; those are experimental builds I use during development and may contain bugs such as memory leaks.
## Using on your LibGDX project
- Firstly, place the .jar in your library path, then:
On your code (Kotlin):
@@ -40,7 +42,7 @@ On your code (Kotlin):
lateinit var fontGame: Font
override fun create() {
    fontGame = TerrarumSansBitmap(...)
    ...
}
@@ -62,7 +64,7 @@ On your code (Java):
Font fontGame;
@Override void create() {
    fontGame = new TerrarumSansBitmap(...);
    ...
}
@@ -91,7 +93,7 @@ U+100000 is used to disable previously-applied color codes (going back to origin
## Contribution guidelines
Please refer to [CONTRIBUTING.md](https://github.com/curioustorvald/Terrarum-sans-bitmap/blob/master/CONTRIBUTING.md)
## Acknowledgement

BIN
demo.PNG (167 KiB before, 180 KiB after)

@@ -25,6 +25,8 @@ How multilingual? Real multilingual!
􏻬আমি কাঁচ খেতে পারি, তাতে আমার কোনো ক্ষতি হয় না। 􀀀
􏻬󿿁Под южно дърво, цъфтящо в синьо, бягаше малко пухкаво зайче󿿀􀀀
􏻬ᎠᏍᎦᏯᎡᎦᎢᎾᎨᎢᎣᏍᏓᎤᎩᏍᏗᎥᎴᏓᎯᎲᎢᏔᎵᏕᎦᏟᏗᏖᎸᎳᏗᏗᎧᎵᎢᏘᎴᎩ ᏙᏱᏗᏜᏫᏗᏣᏚᎦᏫᏛᏄᏓᎦᏝᏃᎠᎾᏗᎭᏞᎦᎯᎦᏘᏓᏠᎨᏏᏕᏡᎬᏢᏓᏥᏩᏝᎡᎢᎪᎢ ᎠᎦᏂᏗᎮᎢᎫᎩᎬᏩᎴᎢᎠᏆᏅᏛᎫᏊᎾᎥᎠᏁᏙᎲᏐᏈᎵᎤᎩᎸᏓᏭᎷᏤᎢᏏᏉᏯᏌᏊ ᎤᏂᏋᎢᏡᎬᎢᎰᏩᎬᏤᎵᏍᏗᏱᎩᎱᎱᎤᎩᎴᎢᏦᎢᎠᏂᏧᏣᏨᎦᏥᎪᎥᏌᏊᎤᎶᏒᎢᎢᏡᎬᎢ ᎹᎦᎺᎵᏥᎻᎼᏏᎽᏗᏩᏂᎦᏘᎾᎿᎠᏁᎬᎢᏅᎩᎾᏂᎡᎢᏌᎶᎵᏎᎷᎠᏑᏍᏗᏪᎩ ᎠᎴ ᏬᏗᏲᏭᎾᏓᏍᏓᏴᏁᎢᎤᎦᏅᏮᏰᎵᏳᏂᎨᎢ􀀀
􏻬Ѳеѡфа́нъ и҆ Алеѯі́й, ѕѣлѡ̀ возлюби́вше ѱалти́рь, воспѣ́ша при свѣ́тѣ ѕвѣ́здъ, помазꙋ́юще сщ҃е́нное мѵ́ро; серафими мн̑оꙮ҆читїи̑, ꙗ҆́кѡ ѻ҆́гнь, ѡ҆крꙋжа́хꙋ прⷭ҇то́лъ Бж҃їй, и҆ всѧ̀ землѧ̀ и҆спо́лнисѧ свѣ́та, ꙗ҆́кѡ ѕмі́й попра́нъ є҆́сть􀀀
􏻬ⲡⲓⲝⲉⲛⲟⲥ ⲅⲁⲣ ⲁϥϫⲉⲙ ⲟⲩⲫⲱⲥ ϧⲉⲛ ⲡⲓⲍⲏⲗⲟⲥ ⲛⲧⲉ ϯⲯⲩⲭⲏ· ⲁϥϣⲱⲡⲓ ⲇⲉ ⲕⲁⲧⲁ ⲡⲓⲑⲉⲗⲏⲙⲁ· ⲁϥϭⲓ ⲛϩⲱⲃ ⲛⲓⲃⲉⲛ ⲟⲩⲟϩ ⲁϥϯⲙⲟⲧ􀀀
􏻬Příliš žluťoučký kůň úpěl ďábelské ódy􀀀
􏻬Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen Walther spillede på xylofon􀀀
􏻬PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS􀀀
@@ -35,7 +37,7 @@ How multilingual? Real multilingual!
􏻬სწრაფი ყავისფერი მელა გადაახტა ზარმაც ძაღლს ᲘᲜᲢᲔᲚ ᲞᲔᲜᲢᲘᲣᲛᲘ ᲛᲘᲙᲠᲝᲞᲠᲝᲪᲔᲡᲝᲠᲘ􀀀
􏻬ऋषियों को सताने वाले दुष्ट राक्षसों के राजा रावण का सर्वनाश करने वाले विष्णुवतार भगवान श्रीराम अयोध्या के महाराज दशरथ के􀀀
􏻬Kæmi ný öxi hér, ykist þjófum nú bæði víl og ádrepa􀀀
􏻬Scríoḃ Fergus ⁊ a ṁáṫaır dán le peann úr􀀀
􏻬あめつちほしそら やまかはみねたに くもきりむろこけ ひといぬうへすゑ ゆわさるおふせよ えの𛀁をなれゐて􀀀
􏻬トリナクコヱス ユメサマセ ミヨアケワタル ヒンカシヲ ソライロハエテ オキツヘニ ホフネムレヰヌ モヤノウチ􀀀
􏻬田居に出で 菜摘むわれをぞ 君召すと 求食り追ひゆく 山城の 打酔へる子ら 藻葉干せよ え舟繋けぬ􀀀
@@ -104,29 +106,37 @@ How multilingual? Real multilingual!
􎳌‣ Full support for Archaic Kana/Hentaigana􀀀
􏻬серафими мн̑оꙮ҆читїи̑, ꙗ҆́кѡ ѻ҆́гнь, ѡ҆крꙋжа́хꙋ прⷭ҇то́лъ Бж҃їй, и҆ всѧ̀ землѧ̀ и҆спо́лнисѧ свѣ́та􀀀
􎳌‣ Fan of Church Slavonic? We’ve got you!􀀀
􏃯Supported Unicode Blocks:􀀀
Basic Latin
Latin-1 Supplement
Latin Extended Additional
Latin Extended-A/B/C/D/E/F/G
Armenian
Arrows
Bengali􏿆ᶠⁱ􀀀
Braille Patterns
Cherokee􏿆􀀀
CJK Symbols and Punctuation
CJK Unified Ideographs􏿆⁶􀀀
CJK Unified Ideographs Extension A􏿆¹²·¹􀀀
Combining Diacritical Marks
Control Pictures
Coptic
Currency Symbols
Cyrillic
Cyrillic Supplement
Cyrillic Extended-A/B/C
Devanagari
Enclosed Alphanumeric Supplement
General Punctuation
Georgian􏿆ჼ􀀀
Georgian Extended
Greek and Coptic
Greek Extended
Halfwidth and Fullwidth Forms
Hangul Compatibility Jamo
@@ -141,6 +151,8 @@ How multilingual? Real multilingual!
Kana Extended-A
Small Kana Extension
Letterlike Symbols
Number Forms
Ogham
Phonetic Extensions
Phonetic Extensions Supplement
Runic
@@ -148,11 +160,12 @@ How multilingual? Real multilingual!
Sundanese
Sundanese Supplement
Superscripts and Subscripts
Symbols for Legacy Computing
Tamil
Thai
􏿆ᶠⁱ􀀀 No support for ligatures
􏿆ᴬ􀀀 Uppercase only 􏿆ჼ􀀀 Mkhedruli only
􏿆⁶􀀀 􏿆¹²·¹􀀀 Up to the specified Unicode version
GitHub’s issue page is open! You can report any 􏽕errors􀀀, or leave 􏽕suggestions􀀀. You can help make this font more versatile (more languages, more frameworks): 􏽕Clone􀀀 this repo, make changes, and open a 􏽕pull request􀀀! I appreciate any and all support.

keming_calculator.html Normal file

@@ -0,0 +1,612 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Keming Machine Tag Calculator</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: 'Segoe UI', system-ui, sans-serif; background: #f5f5f5; color: #222; padding: 24px; min-height: 100vh; }
h1 { font-size: 1.4em; margin-bottom: 4px; color: #111; }
.subtitle { color: #666; font-size: 0.85em; margin-bottom: 20px; }
.main { display: flex; gap: 24px; flex-wrap: wrap; align-items: flex-start; }
.panel { background: #fff; border-radius: 8px; padding: 20px; border: 1px solid #ddd; }
.panel h2 { font-size: 1em; margin-bottom: 12px; color: #1a5fb4; }
.col-left { display: flex; flex-direction: column; gap: 20px; }
/* Lowheight toggle */
.lowheight-section { min-width: 300px; }
.lowheight-row { display: flex; align-items: center; gap: 12px; }
.lowheight-btn {
width: 140px; height: 36px; border: 2px solid #bbb; border-radius: 4px;
background: #f0f0f0; color: #666; font-weight: bold; font-size: 0.9em;
cursor: pointer; transition: all 0.15s; user-select: none;
}
.lowheight-btn:hover { border-color: #888; background: #e8e8e8; }
.lowheight-btn.active { background: #2a5a8a; border-color: #1a4a7a; color: #fff; }
.lowheight-hint { font-size: 0.8em; color: #888; margin-top: 8px; }
/* Shape grid */
.shape-section { min-width: 300px; }
.grid-wrapper { display: flex; gap: 20px; align-items: flex-start; }
.shape-grid { display: grid; grid-template-columns: auto 56px 10px 56px auto; grid-template-rows: repeat(9, auto); align-items: center; gap: 2px 0; }
.zone-btn {
width: 52px; height: 32px; border: 2px solid #bbb; border-radius: 4px;
background: #f0f0f0; color: #666; font-weight: bold; font-size: 0.9em;
cursor: pointer; transition: all 0.15s; display: flex; align-items: center; justify-content: center;
user-select: none;
}
.zone-btn:hover { border-color: #888; background: #e8e8e8; }
.zone-btn.active { background: #2a5a8a; border-color: #1a4a7a; color: #fff; }
.zone-btn.active.wye { background: #7b3f9e; border-color: #5a2d75; }
.grid-label { color: #888; font-size: 0.75em; text-align: center; padding: 0 4px; white-space: nowrap; }
.grid-label-left { text-align: right; }
.grid-label-right { text-align: left; }
.grid-sep { grid-column: 1 / -1; height: 3px; background: #999; margin: 2px 0; border-radius: 1px; }
.grid-dot { text-align: center; color: #ccc; font-size: 0.7em; }
.grid-spacer { height: 36px; }
/* Y toggle */
.y-toggle { margin-top: 16px; }
.y-toggle label { display: flex; align-items: center; gap: 10px; cursor: pointer; font-size: 0.9em; }
.y-toggle input { display: none; }
.toggle-track {
width: 48px; height: 24px; background: #2a5a8a; border-radius: 12px;
position: relative; transition: background 0.2s;
}
.toggle-track::after {
content: ''; position: absolute; top: 2px; left: 2px;
width: 20px; height: 20px; background: #fff; border-radius: 50%;
transition: transform 0.2s; box-shadow: 0 1px 3px rgba(0,0,0,0.3);
}
.y-toggle input:checked + .toggle-track { background: #7b3f9e; }
.y-toggle input:checked + .toggle-track::after { transform: translateX(24px); }
.toggle-labels { display: flex; gap: 4px; font-size: 0.8em; }
.toggle-labels span { padding: 2px 6px; border-radius: 3px; }
.toggle-labels .active-label { background: #2a5a8a; color: #fff; }
.toggle-labels .active-label.wye { background: #7b3f9e; }
/* Codepoint input */
.cp-section { min-width: 300px; }
.cp-input-row { display: flex; gap: 8px; align-items: center; flex-wrap: wrap; }
.cp-input {
width: 180px; height: 34px; border: 2px solid #bbb; border-radius: 4px;
background: #fafafa; padding: 0 8px; font-family: 'Consolas', 'Fira Code', monospace;
font-size: 0.95em; color: #222; outline: none;
}
.cp-input:focus { border-color: #1a5fb4; }
.cp-input.error { border-color: #c00; background: #fff0f0; }
.cp-formats { font-size: 0.75em; color: #888; margin-top: 6px; line-height: 1.5; }
.cp-formats code { background: #eee; padding: 1px 4px; border-radius: 3px; font-family: 'Consolas', monospace; color: #333; }
.cp-resolved { margin-top: 8px; font-size: 0.85em; color: #444; }
.cp-resolved .cp-char { font-size: 1.3em; }
/* Output */
.output-section { min-width: 280px; }
.pixel-row { display: flex; align-items: center; gap: 12px; margin-bottom: 12px; padding: 10px; background: #f8f8f8; border-radius: 6px; border: 1px solid #e0e0e0; }
.colour-swatch {
width: 48px; height: 48px; border-radius: 6px; border: 2px solid #ccc;
flex-shrink: 0; image-rendering: pixelated;
}
.pixel-info { font-size: 0.85em; line-height: 1.6; }
.pixel-info .hex { font-family: 'Consolas', 'Fira Code', monospace; font-size: 1.1em; color: #111; }
.pixel-info .channels { color: #555; }
.pixel-info .channel-r { color: #c00; }
.pixel-info .channel-g { color: #070; }
.pixel-info .channel-b { color: #00c; }
.pixel-label { font-size: 0.8em; color: #1a5fb4; margin-bottom: 4px; font-weight: 600; }
.pixel-inactive { font-size: 0.85em; color: #999; }
.bit-display { font-family: 'Consolas', 'Fira Code', monospace; font-size: 0.8em; color: #777; margin-top: 2px; }
/* Mask input/display */
.mask-section { margin-top: 16px; padding: 10px; background: #f8f8f8; border-radius: 6px; border: 1px solid #e0e0e0; }
.mask-section .label { font-size: 0.8em; color: #1a5fb4; margin-bottom: 4px; }
.mask-input-row { display: flex; gap: 8px; align-items: center; }
/* Examples */
.examples-section { margin-top: 20px; }
.examples-section h2 { font-size: 1em; margin-bottom: 8px; color: #1a5fb4; }
.example-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(200px, 1fr)); gap: 4px; }
.example-item {
font-size: 0.8em; padding: 4px 8px; border-radius: 4px;
background: #f0f0f0; cursor: pointer; transition: background 0.15s;
display: flex; justify-content: space-between; align-items: center;
border: 1px solid #e0e0e0;
}
.example-item:hover { background: #e0ecf8; border-color: #b0c8e8; }
.example-item .ex-code { color: #555; }
.example-item .ex-char { font-size: 1.2em; min-width: 24px; text-align: center; }
/* Notes */
.notes { margin-top: 20px; font-size: 0.8em; color: #666; line-height: 1.5; }
.notes code { background: #eee; padding: 1px 5px; border-radius: 3px; font-family: 'Consolas', monospace; color: #333; }
</style>
</head>
<body>
<h1>Keming Machine Tag Calculator</h1>
<p class="subtitle">Calculate pixel colour values for the three Keming Machine tag pixels (K at Y+5, Y+6, Y+7)</p>
<div class="main">
<div class="col-left">
<!-- Pixel 1: Lowheight -->
<div class="panel lowheight-section">
<h2>Pixel 1 &mdash; Low Height (Y+5)</h2>
<div class="lowheight-row">
<button class="lowheight-btn" id="lowheightBtn" onclick="toggleLowheight()">Low Height: OFF</button>
</div>
<p class="lowheight-hint">
Set for lowercase-height characters (a, b, c, d, e, etc.).<br>
Set if above-diacritics should be lowered.
</p>
</div>
<!-- Pixel 2: Shape Grid -->
<div class="panel shape-section">
<h2>Pixel 2 &mdash; Glyph Shape (Y+6)</h2>
<p style="font-size:0.8em; color:#666; margin-bottom:12px;">Click zones to mark which parts of the glyph are occupied.</p>
<div class="grid-wrapper">
<div class="shape-grid" id="shapeGrid">
<!-- Row: A B (top / ascenders) -->
<span class="grid-label grid-label-left">top</span>
<button class="zone-btn" data-zone="A" onclick="toggleZone(this)">A</button>
<span class="grid-dot">&middot;</span>
<button class="zone-btn" data-zone="B" onclick="toggleZone(this)">B</button>
<span class="grid-label grid-label-right">ascender</span>
<!-- Spacer row -->
<span></span><span class="grid-spacer"></span><span></span><span class="grid-spacer"></span><span></span>
<!-- Row: C D -->
<span class="grid-label grid-label-left">mid</span>
<button class="zone-btn" data-zone="C" onclick="toggleZone(this)">C</button>
<span class="grid-dot">&middot;</span>
<button class="zone-btn" data-zone="D" onclick="toggleZone(this)">D</button>
<span class="grid-label grid-label-right">cap hole</span>
<!-- Row: E F -->
<span class="grid-label grid-label-left"></span>
<button class="zone-btn" data-zone="E" onclick="toggleZone(this)">E</button>
<span class="grid-dot">&middot;</span>
<button class="zone-btn" data-zone="F" onclick="toggleZone(this)">F</button>
<span class="grid-label grid-label-right">lc hole</span>
<!-- Row: G H -->
<span class="grid-label grid-label-left">btm</span>
<button class="zone-btn" data-zone="G" onclick="toggleZone(this)">G</button>
<span class="grid-dot">&middot;</span>
<button class="zone-btn" data-zone="H" onclick="toggleZone(this)">H</button>
<span class="grid-label grid-label-right">baseline</span>
<!-- Baseline separator -->
<div class="grid-sep"></div>
<!-- Row: J K (below baseline) -->
<span class="grid-label grid-label-left">desc</span>
<button class="zone-btn" data-zone="J" onclick="toggleZone(this)">J</button>
<span class="grid-dot">&middot;</span>
<button class="zone-btn" data-zone="K" onclick="toggleZone(this)">K</button>
<span class="grid-label grid-label-right">descender</span>
</div>
</div>
<!-- Y (Bar/Wye) toggle -->
<div class="y-toggle">
<label>
<input type="checkbox" id="yToggle" onchange="recalc()">
<span class="toggle-track"></span>
<span class="toggle-labels">
<span id="barLabel" class="active-label">Bar (B-type, 2px kern)</span>
<span id="wyeLabel">Wye (Y-type, 1px kern)</span>
</span>
</label>
<p style="font-size:0.75em; color:#888; margin-top:6px; margin-left:58px;">
Set Wye when the top/bottom of the glyph tapers to a point (V, Y, A, v, etc.).
</p>
</div>
<!-- Kerning mask input/output -->
<div class="mask-section">
<div class="label">Kerning Mask (24-bit hex colour)</div>
<div class="mask-input-row">
<input type="text" class="cp-input" id="maskInput" placeholder="e.g. #800AFF, 0x800AFF" oninput="updateFromMask()" value="">
</div>
<p class="cp-formats" style="margin-top:4px;">
Accepts: <code>#RRGGBB</code>, <code>0xRRGGBB</code>, or <code>RRGGBB</code>
</p>
<div id="maskBin" class="bit-display" style="margin-top:6px;">00000000 00000000 00000000</div>
</div>
</div>
<!-- Pixel 3: Dot Removal -->
<div class="panel cp-section">
<h2>Pixel 3 &mdash; Dot Removal (Y+7)</h2>
<p style="font-size:0.8em; color:#666; margin-bottom:12px;">Replacement character used for diacritic dot removal. All 24 bits encode the codepoint.</p>
<div class="cp-input-row">
<input type="text" class="cp-input" id="cpInput" placeholder="e.g. U+0041, 65, A" oninput="updateCodepoint()">
</div>
<p class="cp-formats">
Accepts: <code>U+0041</code> or <code>0x41</code> (hex), <code>65</code> (decimal), or a literal character <code>A</code>
</p>
<div class="cp-resolved" id="cpResolved"></div>
</div>
</div><!-- col-left -->
<!-- Output -->
<div class="panel output-section">
<h2>Pixel Colour Values</h2>
<div class="pixel-label">Pixel 1: Low Height (Y+5)</div>
<div class="pixel-row">
<canvas id="swatch1" class="colour-swatch" width="48" height="48"></canvas>
<div class="pixel-info">
<div class="hex" id="hex1">&mdash;</div>
<div id="p1desc" class="pixel-inactive">No pixel (not lowheight)</div>
</div>
</div>
<div class="pixel-label" style="margin-top:12px;">Pixel 2: Glyph Shape (Y+6)</div>
<div class="pixel-row">
<canvas id="swatch2" class="colour-swatch" width="48" height="48"></canvas>
<div class="pixel-info">
<div class="hex" id="hex2">#000000</div>
<div class="channels">
R: <span class="channel-r" id="r2">0</span> &nbsp;
G: <span class="channel-g" id="g2">0</span> &nbsp;
B: <span class="channel-b" id="b2">0</span>
</div>
<div class="bit-display" id="bits2">00000000 00000000 00000000</div>
</div>
</div>
<div class="pixel-label" style="margin-top:12px;">Pixel 3: Dot Removal (Y+7)</div>
<div class="pixel-row">
<canvas id="swatch3" class="colour-swatch" width="48" height="48"></canvas>
<div class="pixel-info">
<div class="hex" id="hex3">&mdash;</div>
<div class="channels" id="p3channels" style="display:none">
R: <span class="channel-r" id="r3">0</span> &nbsp;
G: <span class="channel-g" id="g3">0</span> &nbsp;
B: <span class="channel-b" id="b3">0</span>
</div>
<div id="p3desc" class="pixel-inactive">No replacement character set</div>
</div>
</div>
<div class="notes" style="margin-top:16px;">
<strong>Alpha channel:</strong> must be between 1 and 254 for the pixel to be read as a tag;
setting alpha to <code>1</code> is the conventional choice.<br>
A fully transparent pixel (alpha = 0) means &ldquo;no data&rdquo;, and a fully opaque one (alpha = 255) is not read as a tag.
</div>
</div>
</div>
<!-- Examples -->
<div class="panel examples-section" style="margin-top: 20px;">
<h2>Examples &mdash; Glyph Shape (click to load)</h2>
<div class="example-grid" id="exampleGrid"></div>
</div>
<script>
const ZONES = ['A','B','C','D','E','F','G','H','J','K'];
const state = { A:0, B:0, C:0, D:0, E:0, F:0, G:0, H:0, J:0, K:0 };
let isLowheight = false;
let maskInputActive = false; // prevent recalc from overwriting mask input while user types
// Bit positions within kerning_mask (24-bit RGB):
// Blue byte:  A=bit7, B=bit6, C=bit5, D=bit4, E=bit3, F=bit2, G=bit1, H=bit0
// Green byte: J=bit15 (= green bit 7), K=bit14 (= green bit 6)
// Red byte:   Y=bit23 (= red bit 7) -- read live from the Bar/Wye toggle (#yToggle), not stored in `state`
const BIT_POS = { A:7, B:6, C:5, D:4, E:3, F:2, G:1, H:0, J:15, K:14 };
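// Worked example of the packing above (a sanity check only; the UI does not
// call this, and the glyph/zone pairing is taken from the examples table below):
// 'C' occupies zones A,B,C,E,G,H with a Bar-type top/bottom, so
//   blue  = 11101011b = 0xEB,  green = 0x00,  red = 0x00 (Y bit clear)
// giving the kerning mask #0000EB.
console.assert(
  ((1 << 7) | (1 << 6) | (1 << 5) | (1 << 3) | (1 << 1) | (1 << 0)) === 0xEB,
  'zones ABCEGH should pack to blue byte 0xEB'
);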
function toggleLowheight() {
isLowheight = !isLowheight;
const btn = document.getElementById('lowheightBtn');
btn.classList.toggle('active', isLowheight);
btn.textContent = isLowheight ? 'Lowheight: ON' : 'Lowheight: OFF';
recalc();
}
function toggleZone(btn) {
const zone = btn.dataset.zone;
state[zone] = state[zone] ? 0 : 1;
btn.classList.toggle('active', !!state[zone]);
recalc();
}
function recalc() {
const isWye = document.getElementById('yToggle').checked;
// Update button styling for wye mode
document.querySelectorAll('.zone-btn.active').forEach(btn => {
btn.classList.toggle('wye', isWye);
});
// Update toggle labels
const barLabel = document.getElementById('barLabel');
const wyeLabel = document.getElementById('wyeLabel');
barLabel.className = isWye ? '' : 'active-label';
wyeLabel.className = isWye ? 'active-label wye' : '';
// --- Pixel 1: Lowheight ---
if (isLowheight) {
// Any non-zero pixel; use white with alpha=1 for visibility in editors
drawSwatchSolid('swatch1', 255, 255, 255);
document.getElementById('hex1').textContent = '#FFFFFF';
document.getElementById('p1desc').textContent = 'Any pixel with alpha > 0';
document.getElementById('p1desc').className = 'channels';
} else {
drawSwatchEmpty('swatch1');
document.getElementById('hex1').innerHTML = '&mdash;';
document.getElementById('p1desc').textContent = 'No pixel (not lowheight)';
document.getElementById('p1desc').className = 'pixel-inactive';
}
// --- Pixel 2: Shape Data ---
// Red: Y bit in MSB (bit 7)
const r = isWye ? 0x80 : 0x00;
// Green: J in bit 7, K in bit 6
const g = (state.J ? 0x80 : 0) | (state.K ? 0x40 : 0);
// Blue: ABCDEFGH
const b = (state.A ? 0x80 : 0) | (state.B ? 0x40 : 0) |
(state.C ? 0x20 : 0) | (state.D ? 0x10 : 0) |
(state.E ? 0x08 : 0) | (state.F ? 0x04 : 0) |
(state.G ? 0x02 : 0) | (state.H ? 0x01 : 0);
// Full 24-bit mask (same as what code extracts)
const fullMask = (r << 16) | (g << 8) | b;
document.getElementById('hex2').textContent = '#' + hex2(r) + hex2(g) + hex2(b);
document.getElementById('r2').textContent = r;
document.getElementById('g2').textContent = g;
document.getElementById('b2').textContent = b;
document.getElementById('bits2').textContent = bin8(r) + ' ' + bin8(g) + ' ' + bin8(b);
// Update mask input only if the change didn't come from the mask input itself
if (!maskInputActive) {
document.getElementById('maskInput').value = '#' + hex2(r) + hex2(g) + hex2(b);
document.getElementById('maskInput').classList.remove('error');
}
document.getElementById('maskBin').textContent = bin8((fullMask >> 16) & 0xFF) + ' ' + bin8((fullMask >> 8) & 0xFF) + ' ' + bin8(fullMask & 0xFF);
drawSwatchSolid('swatch2', r, g, b);
}
function updateFromMask() {
const input = document.getElementById('maskInput');
const raw = input.value.trim();
if (raw === '') {
input.classList.remove('error');
return;
}
// Parse hex colour: #RRGGBB, 0xRRGGBB, or bare RRGGBB
const m = raw.match(/^(?:#|0[xX])?([0-9A-Fa-f]{6})$/);
const hex = m ? parseInt(m[1], 16) : null;
if (hex === null) {
input.classList.add('error');
return;
}
input.classList.remove('error');
const r = (hex >> 16) & 0xFF;
const g = (hex >> 8) & 0xFF;
const b = hex & 0xFF;
// Reverse-map to zone states
state.A = (b & 0x80) ? 1 : 0;
state.B = (b & 0x40) ? 1 : 0;
state.C = (b & 0x20) ? 1 : 0;
state.D = (b & 0x10) ? 1 : 0;
state.E = (b & 0x08) ? 1 : 0;
state.F = (b & 0x04) ? 1 : 0;
state.G = (b & 0x02) ? 1 : 0;
state.H = (b & 0x01) ? 1 : 0;
state.J = (g & 0x80) ? 1 : 0;
state.K = (g & 0x40) ? 1 : 0;
// Reverse-map Y toggle
document.getElementById('yToggle').checked = !!(r & 0x80);
// Update all zone buttons
document.querySelectorAll('.zone-btn').forEach(btn => {
btn.classList.toggle('active', !!state[btn.dataset.zone]);
});
// Recalc without overwriting the mask input
maskInputActive = true;
recalc();
maskInputActive = false;
}
function updateCodepoint() {
const input = document.getElementById('cpInput');
const raw = input.value.trim();
if (raw === '') {
input.classList.remove('error');
drawSwatchEmpty('swatch3');
document.getElementById('hex3').innerHTML = '&mdash;';
document.getElementById('p3channels').style.display = 'none';
document.getElementById('p3desc').textContent = 'No replacement character set';
document.getElementById('p3desc').style.display = '';
document.getElementById('cpResolved').textContent = '';
return;
}
const cp = parseCodepoint(raw);
if (cp === null || cp < 0 || cp > 0xFFFFFF) {
input.classList.add('error');
drawSwatchEmpty('swatch3');
document.getElementById('hex3').innerHTML = '&mdash;';
document.getElementById('p3channels').style.display = 'none';
document.getElementById('p3desc').textContent = cp !== null ? 'Codepoint out of 24-bit range' : 'Invalid input';
document.getElementById('p3desc').style.display = '';
document.getElementById('cpResolved').textContent = '';
return;
}
input.classList.remove('error');
const r3 = (cp >> 16) & 0xFF;
const g3 = (cp >> 8) & 0xFF;
const b3 = cp & 0xFF;
document.getElementById('hex3').textContent = '#' + hex2(r3) + hex2(g3) + hex2(b3);
document.getElementById('r3').textContent = r3;
document.getElementById('g3').textContent = g3;
document.getElementById('b3').textContent = b3;
document.getElementById('p3channels').style.display = '';
document.getElementById('p3desc').style.display = 'none';
drawSwatchSolid('swatch3', r3, g3, b3);
// Show resolved character
let charDisplay = '';
try { charDisplay = String.fromCodePoint(cp); } catch(e) {}
document.getElementById('cpResolved').innerHTML =
'U+' + cp.toString(16).toUpperCase().padStart(4, '0') +
' (decimal ' + cp + ')' +
(charDisplay ? ' &mdash; <span class="cp-char">' + escapeHtml(charDisplay) + '</span>' : '');
}
function parseCodepoint(s) {
// U+XXXX / u+XXXX or 0xXXXX (hex)
const m = s.match(/^[Uu]\+([0-9A-Fa-f]+)$/) || s.match(/^0[xX]([0-9A-Fa-f]+)$/);
if (m) return parseInt(m[1], 16);
// Pure decimal number
if (/^[0-9]+$/.test(s)) return parseInt(s, 10);
// Literal character: a single code point (possibly a surrogate pair in UTF-16)
const codepoints = [...s];
if (codepoints.length === 1) {
return codepoints[0].codePointAt(0);
}
return null;
}
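// The parser above accepts four spellings of the same codepoint; a few
// illustrative sanity checks (function hoisting makes parseCodepoint
// available here at top level):
console.assert(parseCodepoint('U+0041') === 0x41, 'U+ form');
console.assert(parseCodepoint('0x41') === 0x41, '0x form');
console.assert(parseCodepoint('65') === 65, 'decimal form');
console.assert(parseCodepoint('A') === 0x41, 'literal character');
console.assert(parseCodepoint('AB') === null, 'multi-character input is rejected');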
function escapeHtml(s) {
const d = document.createElement('span');
d.textContent = s;
return d.innerHTML;
}
function drawSwatchSolid(id, r, g, b) {
const canvas = document.getElementById(id);
const ctx = canvas.getContext('2d');
// Chequerboard background
ctx.fillStyle = '#ddd';
ctx.fillRect(0, 0, 48, 48);
ctx.fillStyle = '#fff';
for (let y = 0; y < 48; y += 8) {
for (let x = (y % 16 === 0) ? 8 : 0; x < 48; x += 16) {
ctx.fillRect(x, y, 8, 8);
}
}
// Colour
ctx.fillStyle = `rgb(${r},${g},${b})`;
ctx.fillRect(4, 4, 40, 40);
}
function drawSwatchEmpty(id) {
const canvas = document.getElementById(id);
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#ddd';
ctx.fillRect(0, 0, 48, 48);
ctx.fillStyle = '#fff';
for (let y = 0; y < 48; y += 8) {
for (let x = (y % 16 === 0) ? 8 : 0; x < 48; x += 16) {
ctx.fillRect(x, y, 8, 8);
}
}
// Dash to indicate empty
ctx.fillStyle = '#aaa';
ctx.font = '20px sans-serif';
ctx.textAlign = 'center';
ctx.textBaseline = 'middle';
ctx.fillText('\u2014', 24, 24);
}
function hex2(v) { return v.toString(16).toUpperCase().padStart(2, '0'); }
function bin8(v) { return v.toString(2).padStart(8, '0'); }
// Load a preset (shape grid only)
function loadPreset(zones, wye) {
for (const z of ZONES) state[z] = 0;
for (const z of zones) state[z] = 1;
document.getElementById('yToggle').checked = wye;
document.querySelectorAll('.zone-btn').forEach(btn => {
btn.classList.toggle('active', !!state[btn.dataset.zone]);
});
recalc();
}
// Examples from keming_machine.txt
const EXAMPLES = [
{ zones: 'AB', wye: false, chars: 'T', desc: 'AB(B)' },
{ zones: 'ABCEGH', wye: false, chars: 'C', desc: 'ABCEGH(B)' },
{ zones: 'ABCEFGH', wye: true, chars: 'K', desc: 'ABCEFGH(Y)' },
{ zones: 'ABCDEG', wye: false, chars: 'P', desc: 'ABCDEG' },
{ zones: 'ABCDEFGH', wye: false, chars: 'B,D,O', desc: 'ABCDEFGH' },
{ zones: 'ABCDFH', wye: false, chars: '\u0427', desc: 'ABCDFH' },
{ zones: 'ABCEG', wye: false, chars: '\u0413', desc: 'ABCEG' },
{ zones: 'ABGH', wye: false, chars: '\u13C6', desc: 'ABGH' },
{ zones: 'ACDEG', wye: false, chars: '\u13B0', desc: 'ACDEG' },
{ zones: 'ACDEFGH', wye: false, chars: 'h,\u0184', desc: 'ACDEFGH' },
{ zones: 'ACDFH', wye: false, chars: '\u07C6', desc: 'ACDFH' },
{ zones: 'ACEGH', wye: false, chars: 'L', desc: 'ACEGH' },
{ zones: 'AH', wye: true, chars: '\\', desc: 'AH(Y)' },
{ zones: 'BDEFGH', wye: false, chars: 'J', desc: 'BDEFGH' },
{ zones: 'BDFGH', wye: false, chars: '\u027A', desc: 'BDFGH' },
{ zones: 'BG', wye: true, chars: '/', desc: 'BG(Y)' },
{ zones: 'CD', wye: false, chars: '\u10B5', desc: 'CD' },
{ zones: 'CDEF', wye: true, chars: '\u03A6,v', desc: 'CDEF(Y)' },
{ zones: 'CDEFGH', wye: false, chars: 'a,e', desc: 'CDEFGH' },
{ zones: 'CDEFGHJK', wye: false, chars: 'g', desc: 'CDEFGHJK' },
{ zones: 'CDEFGHK', wye: false, chars: '\u019E', desc: 'CDEFGHK' },
{ zones: 'CDEFGH', wye: true, chars: 'A', desc: 'CDEFGH(Y)' },
{ zones: 'CDEGH', wye: false, chars: 'c', desc: 'CDEGH' },
{ zones: 'AB', wye: true, chars: 'Y', desc: 'AB(Y)' },
{ zones: 'ABCD', wye: true, chars: 'V', desc: 'ABCD(Y)' },
{ zones: 'EFGH', wye: true, chars: '\u028C', desc: 'EFGH(Y)' },
];
function buildExamples() {
const grid = document.getElementById('exampleGrid');
for (const ex of EXAMPLES) {
const div = document.createElement('div');
div.className = 'example-item';
div.innerHTML = `<span class="ex-code">${ex.desc}</span> <span class="ex-char">${ex.chars}</span>`;
div.onclick = () => loadPreset(ex.zones.split(''), ex.wye);
grid.appendChild(div);
}
}
// Init
buildExamples();
drawSwatchEmpty('swatch1');
drawSwatchEmpty('swatch3');
recalc();
</script>
</body>
</html>


@@ -1,7 +1,7 @@
 --- Pixel 0
 - Lowheight bit
 - encoding: has pixel - it's low height
-- used by the diacritics system to quickly look up if the character is low height without parsing the Pixel 1
+- bit must be set if above-diacritics should be lowered (e.g. lowercase b, which has 'A' shape bit but considered lowheight)
 ### Legends
 #
@@ -106,3 +106,5 @@ dot removal for diacritics:
 - encoding:
 - <MSB> RRRRRRRR GGGGGGGG BBBBBBBB <LSB>
+--- Pixel 3
+Unused for now.

Binary assets changed (contents not shown by the diff viewer):
- one image asset, 75 KiB before, 130 B after
- src/assets/ascii_variable.tga (LFS, executable file)
- src/assets/coptic_variable.tga (LFS, new file)
- several further binary files whose names were not captured
- src/assets/cyrilic_serbian_variable.tga (LFS, executable file)
- src/assets/cyrilic_variable.tga (LFS, executable file)

Some files were not shown because too many files have changed in this diff.