fix: midi2taud midi release to taud fadeout

2026-06-15 00:44:05 +09:00 · 2026-06-14 18:16:43 +09:00
parent 5bfc1cca3a
commit ed43e4becc
3 changed files with 65 additions and 12 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -38,6 +38,27 @@ Current topics:
 - `reference_materials/MilkyTracker` — FastTracker 2 compatible tracker
 - `reference_materials/schismtracker` — Open-source re-implementation of ImpulseTracker
 - `reference_materials/pt2-clone` — Open-source re-implementation of ProTracker 2
 - `reference_materials/doom/` — id Software's GPL source release of DOOM
  (linuxdoom-1.10). Reference for the TSVM DOOM port in
  `assets/disk0/home/doom/`; demo-sync-critical tables, fixed-point maths and
  playsim call order must be translated from this source, never from memory.
 - `reference_materials/soundfont/` — SoundFont 2.04 spec (PDF + `pdftotext`
  rendering for citations) for `midi2taud.py`. The `README.md` digests SF2
  *layering* semantics (all matching preset+instrument zones sound at once —
  no "first wins"), a generator/modulator census of the three production banks
  (SGM, Timbres of Heaven, Evanescence2), the spec-vs-files layering table, and
  what implementing layering in Taud needs (no new per-layer params — Ixmp
  already carries them; only multi-fire engine semantics + a layer cap of 4–5).
  Probes: `devtests/sf2_layer_probe.py`, `devtests/sf2_gen_census.py`.
 - `reference_materials/fluidsynth/` — verbatim FluidSynth source, the reference
  SoundFont 2 synthesiser. The audible ground truth for Taud's **SF2 filter
  mode**: the SF2 voice low-pass is an **RBJ biquad** (cutoff in absolute cents
  via `fluid_ct2hz`, Q from cB with FluidSynth's −3.01 dB Butterworth offset,
  `1/√Q` passband gain-norm), NOT the IT all-pole filter. The `README.md`
  digests the cutoff/Q/coefficient maths with file:line citations; ported into
  `AudioAdapter.kt` `refreshVoiceFilter`/`applyVoiceFilter` (`filterSfMode`
  branch) to fix the muffling vs. the old overdamped all-pole port. Upstream's
  own README is preserved as `README.upstream.md`.
 When fetching new references, copy the relevant upstream files verbatim into
 a topic folder, write a `README.md` summarising the relevant maths /
@@ -171,7 +192,13 @@ The Taud playback engine lives in `tsvm_core/src/net/torvald/tsvm/peripheral/Aud
 ### Critical Implementation Notes
-**Re-bind the local `inst` after any mid-tick `triggerNote`.** `applyTrackerTick` binds `var inst = instruments[voice.instrumentId]` once at the top of the per-voice loop. When the note-delay (`S$Dx`) deferred trigger fires mid-tick, `triggerNote` swaps the voice's `instrumentId` — but the rest of that tick (playback-rate recompute at the `computePlaybackRate(inst, finalPitch)` line, `advanceEnvelope`, `advancePfEnvelope`, `advanceAutoVibrato`, and the fadeout / filter-env reads of `inst.*`) keeps using the captured binding. The damage on a **never-triggered voice** (`instrumentId == 0` → stale `inst = instruments[0]`, whose `samplingRate == 0`) is that `playbackRate` is overwritten with `0.0`, freezing the sample at its start for the trigger tick — perceived as "the first delayed note on a fresh channel doesn't fire" (canonical: WHEN.taud cue 0 voice 13 pattern 0x0A row 16, inst `0x11` SD2 on a fresh play). On a warm voice the stale `inst` is a real instrument with non-zero rate, so the note sounds (at the wrong rate for one tick — a sub-perceptual glitch). Re-bind `inst = instruments[voice.instrumentId]` immediately after the note-delay fire block. Any future in-tick trigger paths (currently only S$Dx) must do the same.
+**Re-bind the local `inst` after any mid-tick `triggerNote`.** `applyTrackerTick` binds `var inst = instruments[voice.instrumentId]` once at the top of the per-voice loop. When the note-delay (`S$Dx`) deferred trigger fires mid-tick, `triggerNote` swaps the voice's `instrumentId` — but the rest of that tick (playback-rate recompute at the `computePlaybackRate(inst, finalPitch)` line, `advanceEnvelope`, `advancePitchEnvelope`/`advanceFilterEnvelope`, `advanceAutoVibrato`, and the fadeout / filter-env reads of `inst.*`) keeps using the captured binding. The damage on a **never-triggered voice** (`instrumentId == 0` → stale `inst = instruments[0]`, whose `samplingRate == 0`) is that `playbackRate` is overwritten with `0.0`, freezing the sample at its start for the trigger tick — perceived as "the first delayed note on a fresh channel doesn't fire" (canonical: WHEN.taud cue 0 voice 13 pattern 0x0A row 16, inst `0x11` SD2 on a fresh play). On a warm voice the stale `inst` is a real instrument with non-zero rate, so the note sounds (at the wrong rate for one tick — a sub-perceptual glitch). Re-bind `inst = instruments[voice.instrumentId]` immediately after the note-delay fire block. Any future in-tick trigger paths (currently only S$Dx) must do the same.
 **Per-patch envelopes go through the Voice's ACTIVE-envelope view, never `inst.*` directly.** Since 2026-06-13 an Ixmp patch can carry its own volume / pan / filter / pitch envelopes (+ fadeout / cutoff / resonance) — see terranmon.txt §Ixmp, variable-length patches. `applyActiveSample` → `resolveActiveEnvelopes(voice, inst, patch)` snapshots the effective envelope source onto `voice.active{Vol,Pan,Pitch,Filter}Env{,Loop,Sustain}`, `voice.has{Pitch,Filter}Env`, and `voice.active{FadeoutStep,DefaultCutoff,DefaultResonance}`. The base instrument exposes **two** pf-envelope slots — bytes 19.. (`pfEnv*`) and bytes 197..250 (`pf2Env*`, the mandatory complement) — routed into the pitch/filter roles by each slot's m-bit (LOOP-word bit 7). `advanceEnvelope` (vol+pan), `advancePitchEnvelope`, `advanceFilterEnvelope`, `applyKeyLift`, the per-tick pitch/filter/fadeout application (foreground AND background), and `triggerNote`'s envelope seeds must ALL read the `voice.active*` view, not `inst.*`. `copyVoice` (NNA ghost) must copy the whole active view so ghosts keep their patch's envelopes. There is no single `envPf*`/`envPfIsFilter` field any more — it was split into explicit `envPitch*`/`envFilter*` pairs. Headless coverage: `devtests/ixmp/PatchEnvTest` (per-patch env applied) + `IxmpFileTest /tmp/m_e1m1.taud`.
 **The shared pitch/filter envelope walker (`advancePfRole`) must SKIP zero-duration nodes, not freeze on them.** A node whose `offset` rounds to 0 — sub-4 ms, since `ThreeFiveMinifloat`'s smallest non-zero step is ≈3.9 ms — represents an instant transition; the walk must advance to the next node. The old code `return`ed on `offset == 0.0` without advancing the index, stranding fast-attack envelopes at their first node. The audible damage: SF2 filter mod-envelopes (`midi2taud.py` `_filter_env_block_sf`) routinely have a ~1 ms attack that stores offset 0, so the filter never opened from its base cutoff to its sustain cutoff — Strings/Flute/Guitar (SGM base ~600 Hz, sustain ~6 kHz) and low-base sweep drums played permanently muffled at their floor. The skip loop stops at a sustain/loop boundary (`susEnd`, handled by the dispatch above) or `maxIdx`. This also affects pitch mod-envs and any IT/XM envelope with a zero-tick (vertical-jump) node, all now correct. There is still a one-tick (≈seed) delay before the env opens — inaudible on sustained notes; the seed value is the base node.
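The skip rule described above can be modelled in a few lines. This is a minimal sketch, not the engine's code: the real `advancePfRole` lives in `AudioAdapter.kt` and is not shown in this commit, so the function name `skip_zero_duration`, the `(value, offset_sec)` tuple layout, and the argument names are all assumptions made for illustration. Treating `offset` as the time to the next node is likewise an assumption.

```python
from typing import List, Optional, Tuple

Node = Tuple[float, float]  # (value, offset_sec) -- hypothetical layout


def skip_zero_duration(nodes: List[Node], idx: int,
                       sus_end: Optional[int], max_idx: int) -> int:
    """Advance idx past nodes whose offset is 0 (instant transitions).

    Stops at the sustain/loop boundary (sus_end) or at max_idx, mirroring
    the stopping conditions named in the note above.
    """
    while idx < max_idx and idx != sus_end and nodes[idx][1] == 0.0:
        idx += 1  # the old behaviour returned here without advancing
    return idx


# A fast-attack filter envelope: the ~1 ms attack was stored as offset 0.
env = [(600.0, 0.0), (6000.0, 0.5), (6000.0, 0.0)]
opened_at = skip_zero_duration(env, 0, sus_end=1, max_idx=2)
```

With the old early return the index would stay at node 0 (the 600 Hz floor); with the skip loop it lands on the sustain node.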
 **SoundFont filter mode uses an RBJ biquad, NOT the IT all-pole filter.** `refreshVoiceFilter` has two topologies. The IT/tracker path (`else` branch) is the all-pole 2-pole resonant LPF from `reference_materials/tracker_filter/` (no feedforward zeros) — must stay byte-faithful for tracker playback, do not touch it. The **`filterSfMode` branch ports FluidSynth's voice filter** (`reference_materials/fluidsynth/`, see its `README.md`): cutoff = absolute cents → Hz via `8.176·2^(cents/1200)` clamped to `[5 Hz, 0.45·fs]`; Q from centibels with FluidSynth's **−3.01 dB offset** (so Q=0 cB ⇒ q_lin = 1/√2 Butterworth, no resonance hump); RBJ cookbook low-pass coefficients with the SF2 `1/√Q` passband gain-norm. `applyVoiceFilter` runs the biquad (Direct Form I: `y = b02·(x+x₂) + b1·x₁ − a1·y₁ − a2·y₂`) when `voice.filterIsBiquad`. The old code reused the all-pole filter for SF mode too; it is overdamped and rolled the passband off ~3 dB @ 8 kHz / ~5 dB @ 12 kHz vs FluidSynth → audible muffling on every filtered GM instrument. Per-voice biquad state (`filterBqB02/B1/A1/A2`, input history `filterX1/X2`) must be reset on trigger/retrigger and copied in `copyVoice` (NNA ghost) alongside the output history. The background-voice filter-env path must branch on `filterSfMode` too, else an SF-mode ghost's cents-domain cutoff gets clamped into the IT 0..254 byte range (≈9 Hz → silence).
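The coefficient maths in the note above can be sketched as follows. This is an illustrative Python model built only from the formulas quoted in that note (cents to Hz, the −3.01 dB Q offset, RBJ low-pass coefficients, `1/√Q` gain, Direct Form I), not a copy of `refreshVoiceFilter`/`applyVoiceFilter`. All function names here are hypothetical, and details the note does not state, such as any range clamp on Q, are deliberately left out.

```python
import math


def cents_to_hz(cents: float, fs: float) -> float:
    """Absolute cents -> Hz, clamped to [5 Hz, 0.45*fs]."""
    hz = 8.176 * 2.0 ** (cents / 1200.0)
    return min(max(hz, 5.0), 0.45 * fs)


def q_lin_from_cb(q_cb: float) -> float:
    """Centibels -> linear Q with the -3.01 dB offset (0 cB => ~1/sqrt(2))."""
    q_db = q_cb / 10.0 - 3.01
    return 10.0 ** (q_db / 20.0)


def sf_lowpass_coeffs(cents: float, q_cb: float, fs: float):
    """RBJ low-pass coefficients, normalised by a0, with 1/sqrt(Q) gain."""
    fc = cents_to_hz(cents, fs)
    q = q_lin_from_cb(q_cb)
    gain = 1.0 / math.sqrt(q)
    omega = 2.0 * math.pi * fc / fs
    sin_w, cos_w = math.sin(omega), math.cos(omega)
    alpha = sin_w / (2.0 * q)
    a0_inv = 1.0 / (1.0 + alpha)
    a1 = -2.0 * cos_w * a0_inv
    a2 = (1.0 - alpha) * a0_inv
    b1 = (1.0 - cos_w) * a0_inv * gain
    b02 = 0.5 * b1  # b0 == b2 for a low-pass, hence one shared coefficient
    return b02, b1, a1, a2


def run_biquad(samples, coeffs):
    """Direct Form I: y = b02*(x + x2) + b1*x1 - a1*y1 - a2*y2."""
    b02, b1, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0  # history is reset, as on trigger/retrigger
    out = []
    for x in samples:
        y = b02 * (x + x2) + b1 * x1 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


coeffs = sf_lowpass_coeffs(cents=9000.0, q_cb=0.0, fs=48000.0)
step = run_biquad([1.0] * 5000, coeffs)
```

Because `b0` and `b2` are equal in a low-pass, the sketch stores them as one value, which matches the `b02` term in the Direct Form I expression quoted above.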
 ## TVDOS
--- a/midi2taud.py
+++ b/midi2taud.py
@@ -38,8 +38,10 @@ Behaviour (per midi2taud.md):
    is the Volume Fadeout (with NNA Note Fade): on key-off the voice holds at
    the sustain node and fades to silence over the SF2 releaseVolEnv time
    (measured against the 100 dB envelope floor: releaseVolEnv·(1000−sus_cb)/
-    1000 seconds). Per-layer Ixmp patches carry their own fadeout when their
-    release differs. The canonical zone's ADSR represents the instrument.
+    1000 seconds, then scaled to FluidSynth's PERCEIVED release length because
+    the engine's fadeout is linear in amplitude, not dB — see _zone_fadeout).
+    Per-layer Ixmp patches carry their own fadeout when their release differs.
+    The canonical zone's ADSR represents the instrument.
  * Polyphony rides the engine's New Note Action (matching MIDI semantics):
    every instrument (drum kits included) gets NNA = Note Fade, so a voice
    column is reusable the moment its note releases — the release/fade tail
@@ -1368,20 +1370,36 @@ def _filter_env_block_sf(z: SFZone, base_fc: float, amt: float, peak: int) -> di
    return {'loop': loop, 'sustain': sustain, 'nodes': nodes}
 # The engine's Volume Fadeout is LINEAR IN AMPLITUDE (fadeoutVolume drops 1→0 by
 # fadeStep/1024 per tick — AudioAdapter.kt ~L3679), whereas FluidSynth's release ramps
 # attenuation LINEARLY IN dB (amplitude decays exponentially: −96 dB over releaseVolEnv).
 # Matching the two on "time to the absolute floor" makes the linear fade sound MUCH longer:
 # a linear-amplitude fade is still at −6 dB at 50 % of its length and −20 dB only at 90 %,
 # while FluidSynth is already at −48 dB and −86 dB (effectively silent) at those points. The perceived release tail ends when
 # FluidSynth has dropped ≈22 dB; for the linear fade to land there at the same wall-clock
 # time it must complete in ≈0.25·releaseVolEnv (see the −18..−24 dB crossing band). This
 # scale brings the fadeout in line with FluidSynth's audible release length.
 _RELEASE_PERCEPTUAL_SCALE = 0.25
 def _zone_fadeout(z: SFZone, bpm0: int, fadeout_override) -> int:
    """Volume Fadeout step encoding the zone's SF2 release segment (gen 38,
    releaseVolEnv). With NNA Note Fade the fadeout IS the release: on key-off the
-    voice holds at the sustain level and fades linearly to silence. The SF2 release
-    ramps a constant 100 dB per `releaseVolEnv` seconds (spec sfspec24.txt:1934-1941
-    — "until 100dB attenuation were reached"), so the time from the sustain level
-    (sus_cb cB of attenuation) down to the 100 dB floor is
-    releaseVolEnv·(1000−sus_cb)/1000. fadeStep makes the fadeout complete in that
-    wall-clock time at bpm0: the engine subtracts fadeStep/1024 of unit volume per
-    song tick, and the tick rate is bpm0·2/5 Hz, giving fadeStep = 2560/(fade_sec·bpm0)."""
+    voice holds at the sustain level and fades to silence. The SF2 release ramps a
+    constant 100 dB per `releaseVolEnv` seconds (spec sfspec24.txt:1934-1941 — "until
+    100dB attenuation were reached"), so the time from the sustain level (sus_cb cB of
+    attenuation) down to the 100 dB floor is releaseVolEnv·(1000−sus_cb)/1000.
+
+    But the engine's fadeout is linear in AMPLITUDE while FluidSynth's release is linear
+    in dB (see [_RELEASE_PERCEPTUAL_SCALE]); matching the floor-reaching time would make
+    the audible tail ~4× too long, so fade_sec is scaled to FluidSynth's perceived release.
+    fadeStep makes the fadeout complete in fade_sec at bpm0: the engine subtracts
+    fadeStep/1024 of unit volume per song tick, and the tick rate is bpm0·2/5 Hz, giving
+    fadeStep = 2560/(fade_sec·bpm0)."""
    if fadeout_override is not None:
        return min(0xFFF, max(0, fadeout_override))
    sus_cb   = min(max(0.0, z.env_sustain_cb), 1000.0)
-    fade_sec = max(0.02, z.env_release * (1000.0 - sus_cb) / 1000.0)
+    fade_sec = max(0.02, _RELEASE_PERCEPTUAL_SCALE * z.env_release * (1000.0 - sus_cb) / 1000.0)
    return max(1, min(0xFFF, round(2560.0 / (fade_sec * bpm0))))
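A quick numeric check of the 0.25 scale and of the `fadeStep` formula, as a standalone sketch. It re-derives the arithmetic from the comment and docstring above rather than importing `midi2taud.py`. The helper names (`fade_step`, `linear_fade_db`, `fluidsynth_release_db`) and the sample values (1 s release, 125 BPM, full-level sustain) are assumptions chosen for illustration, and the FluidSynth curve is modelled as a straight −96 dB ramp over `releaseVolEnv`, as the comment describes.

```python
import math

RELEASE_PERCEPTUAL_SCALE = 0.25  # value introduced by this commit


def fade_step(release_sec: float, sus_cb: float, bpm0: int,
              scale: float = RELEASE_PERCEPTUAL_SCALE) -> int:
    """Same arithmetic as _zone_fadeout, minus the override branch."""
    sus_cb = min(max(0.0, sus_cb), 1000.0)
    fade_sec = max(0.02, scale * release_sec * (1000.0 - sus_cb) / 1000.0)
    return max(1, min(0xFFF, round(2560.0 / (fade_sec * bpm0))))


def linear_fade_db(t: float, fade_sec: float) -> float:
    """Level of a linear-in-amplitude fade that reaches 0 at fade_sec."""
    amp = max(0.0, 1.0 - t / fade_sec)
    return 20.0 * math.log10(amp) if amp > 0.0 else -math.inf


def fluidsynth_release_db(t: float, release_sec: float) -> float:
    """Linear-in-dB release: -96 dB over release_sec (modelling assumption)."""
    return -96.0 * min(1.0, t / release_sec)


release = 1.0                 # seconds, hypothetical zone
t_end = release * 22.0 / 96.0  # moment FluidSynth has dropped 22 dB

old_step = fade_step(release, 0.0, 125, scale=1.0)
new_step = fade_step(release, 0.0, 125)
old_level = linear_fade_db(t_end, 1.00 * release)  # unscaled fade
new_level = linear_fade_db(t_end, 0.25 * release)  # scaled fade

print(old_step, new_step)
print(round(old_level, 2), round(new_level, 2),
      round(fluidsynth_release_db(t_end, release), 2))
```

At 125 BPM the tick rate is 50 Hz, so the scaled step finishes the fade in roughly a quarter of a second. The unscaled fade has barely started to drop at the point where FluidSynth's release is already perceived as over, while the scaled fade sits within about half a decibel of it.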
--- a/terranmon.txt
+++ b/terranmon.txt
@@ -2799,7 +2799,15 @@ TODO:
        [ ] midi2taud: toggleable option for disabling filter for percussions [default: on]
            - Anything on bank 127 and 128 (usually associated with ch 10)
            - GeneralMIDI instruments 113..128
-        [ ] midi2taud: instrument fadeout (release) is significantly longer than Fluidsynth
+        [x] midi2taud: instrument fadeout (release) is significantly longer than Fluidsynth
            * DONE 2026-06-14. _zone_fadeout (midi2taud.py) now scales fade_sec by
              _RELEASE_PERCEPTUAL_SCALE = 0.25, bringing the fadeout in line with FluidSynth's perceived
              release. fadeStep comes out ~4× larger (faster fade). I kept the IT/FT2 engine path untouched
              — it's shared, byte-faithful tracker behaviour and must stay linear-amplitude; the
              compensation belongs on the encoder side. 0.25 targets the ~−22 dB "release ended" point.
              If you find sustained pads now cut a touch short (their long tails are more noticeable),
              nudging it toward 0.30–0.35 lengthens the tail without returning to the old over-long
              behaviour.
        [ ] auto-set optimal-ish Tickspeed and RPB using MIDI Time Signature events and note analysis. Break pattern when Time Signature changes.
            Time Signature