fix: midi2taud midi release to taud fadeout

minjaesong
2026-06-14 18:16:43 +09:00
parent 5bfc1cca3a
commit ed43e4becc
3 changed files with 65 additions and 12 deletions


@@ -38,6 +38,27 @@ Current topics:
- `reference_materials/MilkyTracker` — FastTracker 2 compatible tracker
- `reference_materials/schismtracker` — Open-source re-implementation of ImpulseTracker
- `reference_materials/pt2-clone` — Open-source re-implementation of ProTracker 2
- `reference_materials/doom/` — id Software's GPL source release of DOOM
(linuxdoom-1.10). Reference for the TSVM DOOM port in
`assets/disk0/home/doom/`; demo-sync-critical tables, fixed-point maths and
playsim call order must be translated from this source, never from memory.
- `reference_materials/soundfont/` — SoundFont 2.04 spec (PDF + `pdftotext`
rendering for citations) for `midi2taud.py`. The `README.md` digests SF2
*layering* semantics (all matching preset+instrument zones sound at once —
no "first wins"), a generator/modulator census of the three production banks
(SGM, Timbres of Heaven, Evanescence2), the spec-vs-files layering table, and
what implementing layering in Taud needs (no new per-layer params — Ixmp
already carries them; only multi-fire engine semantics + a layer cap of 45).
Probes: `devtests/sf2_layer_probe.py`, `devtests/sf2_gen_census.py`.
- `reference_materials/fluidsynth/` — verbatim FluidSynth source, the reference
SoundFont 2 synthesiser. The audible ground truth for Taud's **SF2 filter
mode**: the SF2 voice low-pass is an **RBJ biquad** (cutoff in absolute cents
via `fluid_ct2hz`, Q from cB with FluidSynth's 3.01 dB Butterworth offset,
`1/√Q` passband gain-norm), NOT the IT all-pole filter. The `README.md`
digests the cutoff/Q/coefficient maths with file:line citations; ported into
`AudioAdapter.kt` `refreshVoiceFilter`/`applyVoiceFilter` (`filterSfMode`
branch) to fix the muffling vs. the old overdamped all-pole port. Upstream's
own README is preserved as `README.upstream.md`.
When fetching new references, copy the relevant upstream files verbatim into
a topic folder, write a `README.md` summarising the relevant maths /
@@ -171,7 +192,13 @@ The Taud playback engine lives in `tsvm_core/src/net/torvald/tsvm/peripheral/Aud
### Critical Implementation Notes
-**Re-bind the local `inst` after any mid-tick `triggerNote`.** `applyTrackerTick` binds `var inst = instruments[voice.instrumentId]` once at the top of the per-voice loop. When the note-delay (`S$Dx`) deferred trigger fires mid-tick, `triggerNote` swaps the voice's `instrumentId` — but the rest of that tick (playback-rate recompute at the `computePlaybackRate(inst, finalPitch)` line, `advanceEnvelope`, `advancePfEnvelope`, `advanceAutoVibrato`, and the fadeout / filter-env reads of `inst.*`) keeps using the captured binding. The damage on a **never-triggered voice** (`instrumentId == 0` → stale `inst = instruments[0]`, whose `samplingRate == 0`) is that `playbackRate` is overwritten with `0.0`, freezing the sample at its start for the trigger tick — perceived as "the first delayed note on a fresh channel doesn't fire" (canonical: WHEN.taud cue 0 voice 13 pattern 0x0A row 16, inst `0x11` SD2 on a fresh play). On a warm voice the stale `inst` is a real instrument with non-zero rate, so the note sounds (at the wrong rate for one tick — a sub-perceptual glitch). Re-bind `inst = instruments[voice.instrumentId]` immediately after the note-delay fire block. Any future in-tick trigger paths (currently only S$Dx) must do the same.
+**Re-bind the local `inst` after any mid-tick `triggerNote`.** `applyTrackerTick` binds `var inst = instruments[voice.instrumentId]` once at the top of the per-voice loop. When the note-delay (`S$Dx`) deferred trigger fires mid-tick, `triggerNote` swaps the voice's `instrumentId` — but the rest of that tick (playback-rate recompute at the `computePlaybackRate(inst, finalPitch)` line, `advanceEnvelope`, `advancePitchEnvelope`/`advanceFilterEnvelope`, `advanceAutoVibrato`, and the fadeout / filter-env reads of `inst.*`) keeps using the captured binding.
The damage on a **never-triggered voice** (`instrumentId == 0` → stale `inst = instruments[0]`, whose `samplingRate == 0`) is that `playbackRate` is overwritten with `0.0`, freezing the sample at its start for the trigger tick — perceived as "the first delayed note on a fresh channel doesn't fire" (canonical: WHEN.taud cue 0 voice 13 pattern 0x0A row 16, inst `0x11` SD2 on a fresh play). On a warm voice the stale `inst` is a real instrument with non-zero rate, so the note sounds (at the wrong rate for one tick — a sub-perceptual glitch). Re-bind `inst = instruments[voice.instrumentId]` immediately after the note-delay fire block. Any future in-tick trigger paths (currently only S$Dx) must do the same.
**Per-patch envelopes go through the Voice's ACTIVE-envelope view, never `inst.*` directly.** Since 2026-06-13 an Ixmp patch can carry its own volume / pan / filter / pitch envelopes (+ fadeout / cutoff / resonance) — see terranmon.txt §Ixmp, variable-length patches. `applyActiveSample` → `resolveActiveEnvelopes(voice, inst, patch)` snapshots the effective envelope source onto `voice.active{Vol,Pan,Pitch,Filter}Env{,Loop,Sustain}`, `voice.has{Pitch,Filter}Env`, and `voice.active{FadeoutStep,DefaultCutoff,DefaultResonance}`. The base instrument exposes **two** pf-envelope slots — bytes 19.. (`pfEnv*`) and bytes 197..250 (`pf2Env*`, the mandatory complement) — routed into the pitch/filter roles by each slot's m-bit (LOOP-word bit 7). `advanceEnvelope` (vol+pan), `advancePitchEnvelope`, `advanceFilterEnvelope`, `applyKeyLift`, the per-tick pitch/filter/fadeout application (foreground AND background), and `triggerNote`'s envelope seeds must ALL read the `voice.active*` view, not `inst.*`. `copyVoice` (NNA ghost) must copy the whole active view so ghosts keep their patch's envelopes. There is no single `envPf*`/`envPfIsFilter` field any more — it was split into explicit `envPitch*`/`envFilter*` pairs. Headless coverage: `devtests/ixmp/PatchEnvTest` (per-patch env applied) + `IxmpFileTest /tmp/m_e1m1.taud`.
**The shared pitch/filter envelope walker (`advancePfRole`) must SKIP zero-duration nodes, not freeze on them.** A node whose `offset` rounds to 0 — sub-4 ms, since `ThreeFiveMinifloat`'s smallest non-zero step is ≈3.9 ms — represents an instant transition; the walk must advance to the next node. The old code `return`ed on `offset == 0.0` without advancing the index, stranding fast-attack envelopes at their first node. The audible damage: SF2 filter mod-envelopes (`midi2taud.py` `_filter_env_block_sf`) routinely have a ~1 ms attack that stores offset 0, so the filter never opened from its base cutoff to its sustain cutoff — Strings/Flute/Guitar (SGM base ~600 Hz, sustain ~6 kHz) and low-base sweep drums played permanently muffled at their floor. The skip loop stops at a sustain/loop boundary (`susEnd`, handled by the dispatch above) or `maxIdx`. This also affects pitch mod-envs and any IT/XM envelope with a zero-tick (vertical-jump) node, all now correct. There is still a one-tick (≈seed) delay before the env opens — inaudible on sustained notes; the seed value is the base node.
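The skip rule can be sketched in a few lines of Python. This is an illustration only, not the Kotlin in `AudioAdapter.kt`: the function name `skip_zero_duration`, the `(value, offset)` tuple layout, and the use of `-1` for "no sustain boundary" are assumptions made for the example.

```python
def skip_zero_duration(nodes, idx, sus_end, max_idx):
    """Advance past zero-duration nodes instead of freezing on them.

    nodes   -- list of (value, offset_seconds) tuples (assumed layout)
    idx     -- index of the node the walker currently sits on
    sus_end -- sustain/loop boundary index, or -1 if there is none (assumed)
    max_idx -- index of the last node
    """
    while idx < max_idx and idx != sus_end and nodes[idx][1] == 0.0:
        idx += 1  # instant transition: move on to the next node
    return idx


# Fast-attack filter envelope: the ~1 ms attack was stored as offset 0.
env = [(600.0, 0.0), (6000.0, 0.5), (6000.0, 0.0)]
print(skip_zero_duration(env, 0, -1, 2))
```

With the old early-return behaviour the walker would have stayed on node 0 (the 600 Hz base); with the skip it moves to node 1 and the envelope can open.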
**SoundFont filter mode uses an RBJ biquad, NOT the IT all-pole filter.** `refreshVoiceFilter` has two topologies. The IT/tracker path (`else` branch) is the all-pole 2-pole resonant LPF from `reference_materials/tracker_filter/` (no feedforward zeros) — must stay byte-faithful for tracker playback, do not touch it. The **`filterSfMode` branch ports FluidSynth's voice filter** (`reference_materials/fluidsynth/`, see its `README.md`): cutoff = absolute cents → Hz via `8.176·2^(cents/1200)` clamped to `[5 Hz, 0.45·fs]`; Q from centibels with FluidSynth's **3.01 dB offset** (so Q=0 cB ⇒ q_lin = 1/√2 Butterworth, no resonance hump); RBJ cookbook low-pass coefficients with the SF2 `1/√Q` passband gain-norm. `applyVoiceFilter` runs the biquad (Direct Form I: `y = b02·(x+x₂) + b1·x₁ − a1·y₁ − a2·y₂`) when `voice.filterIsBiquad`. The old code reused the all-pole filter for SF mode too; it is overdamped and rolled the passband off ~3 dB @ 8 kHz / ~5 dB @ 12 kHz vs FluidSynth → audible muffling on every filtered GM instrument. Per-voice biquad state (`filterBqB02/B1/A1/A2`, input history `filterX1/X2`) must be reset on trigger/retrigger and copied in `copyVoice` (NNA ghost) alongside the output history. The background-voice filter-env path must branch on `filterSfMode` too, else an SF-mode ghost's cents-domain cutoff gets clamped into the IT 0..254 byte range (≈9 Hz → silence).
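The coefficient maths in the note above can be written out as a short Python sketch. It follows only what the paragraph states (cents to Hz, the 3.01 dB Q offset, RBJ low-pass coefficients, the `1/√Q` gain-norm, Direct Form I); it is not the ported Kotlin, and the names `cents_to_hz`, `sf_lowpass_coeffs` and `run_biquad` are hypothetical.

```python
import math


def cents_to_hz(cents):
    """Absolute cents to Hz, as quoted in the note: 8.176 * 2^(cents/1200)."""
    return 8.176 * 2.0 ** (cents / 1200.0)


def sf_lowpass_coeffs(cutoff_cents, q_cb, fs):
    """Return (b02, b1, a1, a2) for the SF-mode low-pass described above."""
    fc = min(max(cents_to_hz(cutoff_cents), 5.0), 0.45 * fs)  # clamp to [5 Hz, 0.45*fs]
    q_db = q_cb / 10.0 - 3.01           # centibels to dB, minus the Butterworth offset
    q_lin = 10.0 ** (q_db / 20.0)       # 0 cB gives roughly 1/sqrt(2)
    gain = 1.0 / math.sqrt(q_lin)       # SF2 passband gain-norm
    omega = 2.0 * math.pi * fc / fs
    alpha = math.sin(omega) / (2.0 * q_lin)
    a0 = 1.0 + alpha
    b1 = (1.0 - math.cos(omega)) / a0 * gain
    b02 = 0.5 * b1                      # b0 and b2 are equal for an RBJ low-pass
    a1 = -2.0 * math.cos(omega) / a0
    a2 = (1.0 - alpha) / a0
    return b02, b1, a1, a2


def run_biquad(samples, coeffs):
    """Direct Form I: y = b02*(x + x2) + b1*x1 - a1*y1 - a2*y2."""
    b02, b1, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b02 * (x + x2) + b1 * x1 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


coeffs = sf_lowpass_coeffs(6900, 0, 48000)   # 6900 absolute cents is about 440 Hz
settled = run_biquad([1.0] * 2000, coeffs)[-1]
print(coeffs, settled)
```

A constant input settles at the gain-norm value `1/√q_lin` rather than at 1.0, which is a quick way to confirm that the feedback terms carry the right signs.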
## TVDOS


@@ -38,8 +38,10 @@ Behaviour (per midi2taud.md):
is the Volume Fadeout (with NNA Note Fade): on key-off the voice holds at
the sustain node and fades to silence over the SF2 releaseVolEnv time
(measured against the 100 dB envelope floor: releaseVolEnv·(1000−sus_cb)/
-1000 seconds). Per-layer Ixmp patches carry their own fadeout when their
-release differs. The canonical zone's ADSR represents the instrument.
+1000 seconds, then scaled to FluidSynth's PERCEIVED release length because
+the engine's fadeout is linear in amplitude, not dB — see _zone_fadeout).
+Per-layer Ixmp patches carry their own fadeout when their release differs.
+The canonical zone's ADSR represents the instrument.
* Polyphony rides the engine's New Note Action (matching MIDI semantics):
every instrument (drum kits included) gets NNA = Note Fade, so a voice
column is reusable the moment its note releases — the release/fade tail
@@ -1368,20 +1370,36 @@ def _filter_env_block_sf(z: SFZone, base_fc: float, amt: float, peak: int) -> di
    return {'loop': loop, 'sustain': sustain, 'nodes': nodes}
# The engine's Volume Fadeout is LINEAR IN AMPLITUDE (fadeoutVolume drops 1→0 by
# fadeStep/1024 per tick — AudioAdapter.kt ~L3679), whereas FluidSynth's release ramps
# attenuation LINEARLY IN dB (amplitude decays exponentially: 96 dB over releaseVolEnv).
# Matching the two on "time to the absolute floor" makes the linear fade sound MUCH longer:
# a linear-amplitude fade is still at −6 dB at 50 % of its length and −20 dB only at 90 %,
# while FluidSynth is already −96 dB (silent) by then. The perceived release tail ends when
# FluidSynth has dropped ≈22 dB; for the linear fade to land there at the same wall-clock
# time it must complete in ≈0.25·releaseVolEnv (see the 18..24 dB crossing band). This
# scale brings the fadeout in line with FluidSynth's audible release length.
_RELEASE_PERCEPTUAL_SCALE = 0.25
def _zone_fadeout(z: SFZone, bpm0: int, fadeout_override) -> int:
    """Volume Fadeout step encoding the zone's SF2 release segment (gen 38,
    releaseVolEnv). With NNA Note Fade the fadeout IS the release: on key-off the
-    voice holds at the sustain level and fades linearly to silence. The SF2 release
-    ramps a constant 100 dB per `releaseVolEnv` seconds (spec sfspec24.txt:1934-1941
-    "until 100dB attenuation were reached"), so the time from the sustain level
-    (sus_cb cB of attenuation) down to the 100 dB floor is
-    releaseVolEnv·(1000−sus_cb)/1000. fadeStep makes the fadeout complete in that
-    wall-clock time at bpm0: the engine subtracts fadeStep/1024 of unit volume per
-    song tick, and the tick rate is bpm0·2/5 Hz, giving fadeStep = 2560/(fade_sec·bpm0)."""
+    voice holds at the sustain level and fades to silence. The SF2 release ramps a
+    constant 100 dB per `releaseVolEnv` seconds (spec sfspec24.txt:1934-1941 "until
+    100dB attenuation were reached"), so the time from the sustain level (sus_cb cB of
+    attenuation) down to the 100 dB floor is releaseVolEnv·(1000−sus_cb)/1000.
+
+    But the engine's fadeout is linear in AMPLITUDE while FluidSynth's release is linear
+    in dB (see [_RELEASE_PERCEPTUAL_SCALE]); matching the floor-reaching time would make
+    the audible tail ~4× too long, so fade_sec is scaled to FluidSynth's perceived release.
+    fadeStep makes the fadeout complete in fade_sec at bpm0: the engine subtracts
+    fadeStep/1024 of unit volume per song tick, and the tick rate is bpm0·2/5 Hz, giving
+    fadeStep = 2560/(fade_sec·bpm0)."""
    if fadeout_override is not None:
        return min(0xFFF, max(0, fadeout_override))
    sus_cb = min(max(0.0, z.env_sustain_cb), 1000.0)
-    fade_sec = max(0.02, z.env_release * (1000.0 - sus_cb) / 1000.0)
+    fade_sec = max(0.02, _RELEASE_PERCEPTUAL_SCALE * z.env_release * (1000.0 - sus_cb) / 1000.0)
    return max(1, min(0xFFF, round(2560.0 / (fade_sec * bpm0))))
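The arithmetic behind the change can be checked with a small standalone sketch. It is not part of the commit: `fade_step`, `fade_seconds`, `linear_fade_db` and `fluidsynth_drop_db` are hypothetical helper names, the zone is reduced to two plain numbers (release time in seconds and sustain attenuation in cB), and the 96 dB FluidSynth release range is taken from the comment block above the constant.

```python
import math

RELEASE_PERCEPTUAL_SCALE = 0.25  # same value as the commit's constant


def fade_step(release_sec, sus_cb, bpm0, scale=RELEASE_PERCEPTUAL_SCALE):
    """The _zone_fadeout formula applied to plain numbers (hypothetical helper)."""
    sus_cb = min(max(0.0, sus_cb), 1000.0)
    fade_sec = max(0.02, scale * release_sec * (1000.0 - sus_cb) / 1000.0)
    return max(1, min(0xFFF, round(2560.0 / (fade_sec * bpm0))))


def fade_seconds(step, bpm0):
    """Fade length in seconds: 1024/step ticks at bpm0*2/5 ticks per second."""
    return (1024.0 / step) / (bpm0 * 2.0 / 5.0)


def linear_fade_db(fraction):
    """Level in dB of a linear-amplitude fade after `fraction` of its length."""
    return 20.0 * math.log10(1.0 - fraction)


def fluidsynth_drop_db(scale, total_db=96.0):
    """dB a linear-in-dB release has dropped when the scaled linear fade ends."""
    return total_db * scale


if __name__ == "__main__":
    # 2 s release, no sustain attenuation, 125 BPM
    old_step = fade_step(2.0, 0.0, 125, scale=1.0)
    new_step = fade_step(2.0, 0.0, 125)
    print(old_step, fade_seconds(old_step, 125))
    print(new_step, fade_seconds(new_step, 125))
    print(linear_fade_db(0.5), linear_fade_db(0.9))
    for s in (0.25, 0.30, 0.35):
        print(s, fluidsynth_drop_db(s))
```

For this example the step goes from 10 to 41, so the fade completes in roughly 0.5 s instead of about 2 s. The last loop shows how far the FluidSynth release has fallen at the moment the linear fade ends for each candidate scale, which is one way to reason about nudging the constant.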


@@ -2799,7 +2799,15 @@ TODO:
[ ] midi2taud: toggleable option for disabling filter for percussions [default: on]
- Anything on bank 127 and 128 (usually associated with ch 10)
- GeneralMIDI instruments 113..128
-[ ] midi2taud: instrument fadeout (release) is significantly longer than Fluidsynth
+[x] midi2taud: instrument fadeout (release) is significantly longer than Fluidsynth
* DONE 2026-06-14. _zone_fadeout (midi2taud.py) now scales fade_sec by
_RELEASE_PERCEPTUAL_SCALE = 0.25, bringing the fadeout in line with FluidSynth's perceived
release. fadeStep comes out ~4× larger (faster fade). I kept the IT/FT2 engine path untouched
— it's shared, byte-faithful tracker behaviour and must stay linear-amplitude; the
compensation belongs on the encoder side. 0.25 targets the ~22 dB "release ended" point.
If you find sustained pads now cut a touch short (their long tails are more noticeable),
nudging it toward 0.30–0.35 lengthens the tail without returning to the old over-long
behaviour.
[ ] auto-set optimal-ish Tickspeed and RPB using MIDI Time Signature events and note analysis. Break pattern when Time Signature changes.
Time Signature