tsvm/terranmon.txt
2026-05-08 17:27:27 +09:00

1 byte = 2 pixels
560x448@4bpp = 125 440 bytes
560x448@8bpp = 250 880 bytes
-> 262144 bytes (256 kB)
[USER AREA | HW AREA]
Number of peripherals = 8; the computer itself counts as one of them.
HW AREA = [Peripherals | MMIO | INTVEC]
User area: 8 MB, hardware area: 8 MB
8192 kB
User Space
1024 kB
Peripheral #7
1024 kB
Peripheral #6
...
1024 kB (where Peripheral #0 would be)
MMIO and Interrupt Vectors
128 kB
MMIO for Peri #7
128 kB
MMIO for Peri #6
...
128 kB (where Peripheral #0 would be)
MMIO for the computer
Certain memory mappers may allow an extra 4 MB of User Space in exchange for Peripheral slots #4 through #7.
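As a quick sanity check of the figures above, the following sketch adds up the layout and the two framebuffer sizes. The constant names are illustrative only, and "kB" is taken to mean 1024 bytes, as the 262144-byte figure implies.

```python
KIB = 1024

# Figures taken from the layout above.
USER_AREA = 8192 * KIB          # user space
PERIPHERAL_WINDOW = 1024 * KIB  # one memory window per peripheral slot
MMIO_WINDOW = 128 * KIB         # one MMIO window per slot
SLOTS = 8                       # slot #0 is the computer itself

# Slots #1..#7 each get a 1024 kB window; the window where slot #0
# would sit holds the MMIO regions and the interrupt vectors instead.
hw_area = (SLOTS - 1) * PERIPHERAL_WINDOW + SLOTS * MMIO_WINDOW
total = USER_AREA + hw_area

# Framebuffer sizes quoted at the top of the section.
fb_4bpp = 560 * 448 // 2  # 1 byte = 2 pixels
fb_8bpp = 560 * 448       # 1 byte = 1 pixel

print(hw_area // KIB, total // (KIB * KIB), fb_4bpp, fb_8bpp)
```

The hardware area comes out at exactly 8 MB, and the 8bpp framebuffer fits into one 256 kB VRAM bank with room left for the text buffers and palette.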
--------------------------------------------------------------------------------
IO Device
Endianness: little
Note: Always takes up the peripheral slot of zero
Latching: latching "locks" a fluctuating value at the moment you request it, so that subsequent reads return
a consistent snapshot. This matters most for multibyte values, where one byte could otherwise change after
you have read another, e.g. System uptime in nanoseconds
MMIO
0..31 RO: Raw Keyboard Buffer read. Won't shift the key buffer
32..33 RO: Mouse X pos
34..35 RO: Mouse Y pos
36 RO: Mouse down? (1 for TRUE, 0 for FALSE)
37 RW: Read/Write single key input. Key buffer will be shifted. Manual writing is
usually unnecessary as such action must be automatically managed via LibGDX
input processing.
Stores ASCII code representing the character, plus:
(1..26: Ctrl+[alph])
3 : Ctrl+C
4 : Ctrl+D
8 : Backspace
(13: Return)
19: Up arrow
20: Down arrow
21: Left arrow
22: Right arrow
38 RW: Request keyboard input be read (TTY Function). Write nonzero value to enable, write zero to
close it. Keyboard buffer will be cleared whenever request is received, so
MAKE SURE YOU REQUEST THE KEY INPUT ONLY ONCE!
39 WO: Latch Key/Mouse Input (Raw Input function). Write nonzero value to latch.
Stores LibGDX Key code
40..47 RO: Key Press buffer
stores keys that are held down. Can accommodate 8-key rollover (in keyboard geeks' terms)
0x0 is written for the empty area; numbers are always sorted
48..51 RO: System flags
48: 0b rq00 000t
t: STOP button (should raise SIGTERM)
r: RESET button (hypervisor should reset the system)
q: SysRq button (hypervisor should respond to it)
49: set to 1 if a key has been pushed into the key buffer (or, if the system has a key press to pull) via MMIO 38; otherwise 0
64..67 RO: User area memory size in bytes
68 WO: Counter latch
0b 0000 00ba
a: System uptime
b: RTC
72..79 RO: System uptime in nanoseconds
80..87 RO: RTC in microseconds
88 RW: Rom mapping
write 0xFF to NOT map any rom
write 0x00 to map BIOS
write 0x01 to map first "extra ROM"
89 RW: BMS flags
0b P000 b0ca
a: 1 if charging (accepting power from the AC adapter)
c: 1 if battery is detected
b: 1 if the device is battery-operated
P: 1 if CPU halted (so that the "smart" power supply can shut itself down)
note: only the high nybbles are writable!
if the device is battery-operated but currently running off of an AC adapter and there is no battery inserted,
the flag would be 0000 1001
90 RO: BMS calculated battery percentage where 255 is 100%
91 RO: BMS battery voltage multiplied by 10 (127 = "12.7 V")
92 RW: Memory Mapping
0: 8 MB Core, 8 MB Hardware-reserved, 7 card slots
1: 12 MB Core, 4 MB Hardware-reserved, 3 card slots (HW addr 131072..1048575 cannot be reclaimed though)
1024..2047 RW: Reserved for integrated peripherals (e.g. built-in status display)
2048..4075 RW: Used by the hypervisor
2048..3071 RW: Interrupt vectors (0-255), 32-bit address. Used regardless of the existence of the hypervisor.
If hypervisor is installed, the interrupt calls are handled using the hypervisor
If no hypervisors are installed, the interrupt call is performed by the "hardware"
Interrupt Vector Table:
0x00 - Initial Stack Pointer (currently unused)
0x01 - Reset
0x02 - NMI
0x03 - Out of Memory
0x0C - IRQ_COM1
0x0D - IRQ_COM2
0x0E - IRQ_COM3
0x0F - IRQ_COM4
0x10 - Core Memory Access Violation
0x11 - Card 1 Access Violation
0x12 - Card 2 Access Violation
0x13 - Card 3 Access Violation
0x14 - Card 4 Access Violation
0x15 - Card 5 Access Violation
0x16 - Card 6 Access Violation
0x17 - Card 7 Access Violation
0x20 - IRQ_Core
0x21 - IRQ_CARD1
0x22 - IRQ_CARD2
0x23 - IRQ_CARD3
0x24 - IRQ_CARD4
0x25 - IRQ_CARD5
0x26 - IRQ_CARD6
0x27 - IRQ_CARD7
3072..3075 RW: Status flags
4076..4079 RW: 8-bit status code for the port
4080..4083 RO: 8-bit status code for connected device
4084..4091 RO: Block transfer status
0b nnnnnnnn a00z mmmm
n-read: size of the block from the other device, LSB (4096-full block size is zero)
m-read: size of the block from the other device, MSB (4096-full block size is zero)
a-read: if the other device hasNext (doYouHaveNext), false if device not present
z-read: set if the size is actually 0 instead of 4096 (overrides n and m parameters)
n-write: size of the block I'm sending, LSB (4096-full block size is zero)
m-write: size of the block I'm sending, MSB (4096-full block size is zero)
a-write: if there's more to send (hasNext)
z-write: set if the size is actually 0 instead of 4096 (overrides n and m parameters)
4092..4095 RW: Block transfer control for Port 1 through 4
0b 00ms abcd
m-readonly: device in master setup
s-readonly: device in slave setup
a: 1 for send, 0 for receive
b-write: 1 to start sending if a-bit is set; if a-bit is unset, make the other device start sending
b-read: if this bit is set, you're currently receiving something (aka busy)
c-write: I'm ready to receive
c-read: Are you ready to receive?
d-read: Are you there? (if the other device's recipient is myself)
NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port
4096..8191 RW: Buffer for block transfer lane #1
8192..12287 RW: Buffer for block transfer lane #2
12288..16383 RW: Buffer for block transfer lane #3
16384..20479 RW: Buffer for block transfer lane #4
65536..131071 RO: Mapped to ROM
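The latch-then-read pattern and two of the bit-fields above can be sketched as follows. This is only an illustration: FakeIO is a hypothetical stand-in for the real MMIO window, the helper names are invented here, and decode_block_status assumes that the first status byte holds "nnnnnnnn" and the second holds "a00z mmmm".

```python
import struct


class FakeIO:
    """Hypothetical stand-in for the IO device's MMIO window."""

    def __init__(self):
        self.mem = bytearray(4096)
        self.live_uptime_ns = 0

    def peek(self, addr):
        return self.mem[addr]

    def poke(self, addr, value):
        # Writing bit a of MMIO 68 copies the live uptime into 72..79.
        if addr == 68 and value & 0b01:
            self.mem[72:80] = struct.pack("<Q", self.live_uptime_ns)
        else:
            self.mem[addr] = value & 0xFF


def read_uptime_ns(io):
    io.poke(68, 0b01)  # latch the uptime counter before reading it
    raw = bytes(io.peek(72 + i) for i in range(8))
    return struct.unpack("<Q", raw)[0]  # little-endian 64-bit value


def decode_bms_flags(flags):
    # Layout of MMIO 89: 0b P000 b0ca
    return {
        "charging": bool(flags & 0x01),
        "battery_detected": bool(flags & 0x02),
        "battery_operated": bool(flags & 0x08),
        "cpu_halted": bool(flags & 0x80),
    }


def decode_block_status(first, second):
    # Assumed byte order: first = nnnnnnnn, second = a00z mmmm
    size = first | ((second & 0x0F) << 8)
    if second & 0x10:      # z-bit: the size really is 0
        size = 0
    elif size == 0:        # otherwise an all-zero size means a full block
        size = 4096
    return {"size": size, "has_next": bool(second & 0x80)}
```

With the example from the BMS note, decode_bms_flags(0b00001001) reports a battery-operated device that is charging with no battery detected.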
--------------------------------------------------------------------------------
VRAM Bank 0 (256 kB)
Endianness: little
Memory Space
250880 bytes
Framebuffer
3 bytes
Initial background (and the border) colour RGB, 8 bits per channel
1 byte
command (writing to this memory address changes the status)
1: reset palette to default
2: fill framebuffer with given colour (arg1)
3: do '1' then do '2' (with arg1) then do '4' (with arg2)
4: fill framebuffer2 with given colour (arg1)
16: copy Low Font ROM (char 0-127) to mapping area
17: copy High Font ROM (char 128-255) to mapping area
18: write contents of the font ROM mapping area to the Low Font ROM
19: write contents of the font ROM mapping area to the High Font ROM
20: reset Low Font ROM to default
21: reset High Font ROM to default
12 bytes
argument for "command" (arg1: Byte, arg2: Byte)
write to this address FIRST and then write to "command" to execute the command
1008 bytes
reserved
2046 bytes
unused
2 bytes
Cursor position in: (y*80 + x)
2560 bytes
Text foreground colours
2560 bytes
Text background colours
2560 bytes
Text buffer of 80x32 (7x14 character size, and yes: actual character data is on the bottom)
512 bytes
Palette stored in following pattern: 0b rrrr gggg, 0b bbbb aaaa, ....
Palette number 255 is always full transparent (bits being all zero)
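The palette entry layout and the cursor position formula above can be illustrated with a few helpers. The function names are made up for this sketch, and the default of 80 columns is taken from the text buffer size given above.

```python
def pack_palette_entry(r, g, b, a):
    """Pack four 4-bit channels into the two bytes 0b rrrr gggg, 0b bbbb aaaa."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 15:
            raise ValueError("each channel is 4 bits wide")
    return bytes([(r << 4) | g, (b << 4) | a])


def unpack_palette_entry(two_bytes):
    rg, ba = two_bytes
    return (rg >> 4, rg & 0x0F, ba >> 4, ba & 0x0F)


def cursor_to_offset(x, y, columns=80):
    # Cursor position is stored as y*80 + x.
    return y * columns + x


def offset_to_cursor(offset, columns=80):
    y, x = divmod(offset, columns)
    return x, y
```

An all-zero entry, pack_palette_entry(0, 0, 0, 0), is the fully transparent colour that palette number 255 always holds.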
MMIO
0..1 RO
Framebuffer width in pixels
2..3 RO
Framebuffer height in pixels
4 RO
Text mode columns
5 RO
Text mode rows
6 RW
Text-mode attributes
0b 0000 00rc (r: TTY Raw mode, c: Cursor blink)
7 RW
Graphics-mode attributes
0b 0000 rrrr (r: Resolution/colour depth)
8 RO
Last used colour (set by poking at the framebuffer)
9 RW
current TTY foreground colour (useful for print() function)
10 RW
current TTY background colour (useful for print() function)
11 RO
Number of Banks, or VRAM size (1 = 256 kB, max 4)
12 RW
Graphics Mode
0: 560x448, 256 Colours, 1 layer
1: 280x224, 256 Colours, 4 layers
2: 280x224, 4096 Colours, 2 layers
3: 560x448, 256 Colours, 2 layers (if bank 2 is not installed, mode change will not happen)
4: 560x448, 4096 Colours, 1 layer (if bank 2 is not installed, mode change will not happen)
5: 560x448, 15-bit colour, 1 layer (if bank 2 is not installed, mode change will not happen)
8: 560x448, 24-bit colour, 1 layer (if bank 3 and 4 are not installed, mode change will not happen)
The 4096-colour modes are also known as "direct colour mode" (4096 colours * 16 transparency -> 65536 colours)
Two layers are grouped to make a frame: the "low layer" contains RG colours and the "high layer" has BA colours;
Red and Blue occupy the MSBs
13 RW
Layer Arrangement
If 4 layers are used:
Num LO<->HI
0 1234
1 1243
2 1324
3 1342
4 1423
5 1432
6 2134
7 2143
8 2314
9 2341
10 2413
11 2431
12 3124
13 3142
14 3214
15 3241
16 3412
17 3421
18 4123
19 4132
20 4213
21 4231
22 4312
23 4321
If 2 layers are used:
Num LO<->HI
0 12
1 12
2 12
3 12
4 12
5 12
6 12
7 21
8 21
9 21
10 21
11 21
12 12
13 12
14 21
15 21
16 12
17 21
18 12
19 12
20 21
21 21
22 12
23 21
If 1 layer is used, this field will do nothing and always fall back to 0
14..15 RW
framebuffer scroll X
16..17 RW
framebuffer scroll Y
18 RO
Busy flags
1: Codec in-use
2: Draw Instructions being decoded
19 WO
Write non-zero value to initiate the Draw Instruction decoding
20..21 RO
Program Counter for the Draw Instruction decoding
1024..2047 RW
horizontal scroll offset for scanlines
2048..4095 RW
!!NEW!! Font ROM Mapping Area
Format is always 8x16 pixels, 1bpp ROM format (so that it would be YY_CHR-Compatible)
(designer's note: it's still useful to divide the char rom to two halves, lower half being characters ROM and upper half being symbols ROM)
65536..131071 RW
Draw Instructions
Text-mode-font-ROM is immutable and does not belong to VRAM
Even in text mode, the framebuffer is still drawn onto the screen, and the text is drawn on top of it
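The 4-layer Layer Arrangement table (MMIO 13) lists the permutations of "1234" in lexicographic order, so it can be regenerated rather than stored. The sketch below covers only the 4-layer table; the 2-layer table is a separate lookup and is not derived here. The names are illustrative.

```python
from itertools import permutations

# permutations() yields tuples in lexicographic order when its input is
# sorted, which matches the Num -> LO<->HI rows of the 4-layer table.
LAYER_ORDERS_4 = ["".join(p) for p in permutations("1234")]


def layer_order(num):
    """Return the LO<->HI layer order string for a 4-layer arrangement number."""
    return LAYER_ORDERS_4[num]
```

For example, layer_order(9) gives "2341", the same as row 9 of the table.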
--------------------------------------------------------------------------------
TSVM MOV file format
Endianness: Little
\x1F T S V M M O V
[METADATA]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...
where:
METADATA -
uint16 WIDTH
uint16 HEIGHT
uint16 FPS (0: play as fast as can)
uint32 NUMBER OF FRAMES
uint16 UNUSED (fill with 255,0)
uint16 AUDIO QUEUE INFO
when read as little endian:
0b nnnn bbbb bbbb bbbb
[byte 21] [byte 20]
n: size of the queue (number of entries). Allocate at least 1 more entry than the number specified!
b: size of each entry in bytes DIVIDED BY FOUR (all zero = 16384; always 0x240 for MP2 because MP2-VBR is not supported)
n=0 indicates the video audio must be decoded on-the-fly instead of being queued, or has no audio packets
byte[10] RESERVED
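A minimal sketch of reading the magic and METADATA fields, assuming they are stored back to back with no padding (which places AUDIO QUEUE INFO at bytes 20..21, as noted above). The function and key names are invented for this example.

```python
import struct

MOV_MAGIC = b"\x1fTSVMMOV"
# magic, width, height, fps, frame count, unused, audio queue info, reserved
MOV_HEADER = struct.Struct("<8sHHHIHH10s")  # 32 bytes, little-endian


def decode_audio_queue_info(word):
    # 0b nnnn bbbb bbbb bbbb
    entries = word >> 12
    units = word & 0x0FFF
    # b is the entry size divided by four; all zero stands for 16384 bytes.
    entry_bytes = 16384 if units == 0 else units * 4
    return entries, entry_bytes


def parse_mov_header(data):
    (magic, width, height, fps, frames,
     _unused, queue, _reserved) = MOV_HEADER.unpack_from(data)
    if magic != MOV_MAGIC:
        raise ValueError("not a TSVM MOV file")
    entries, entry_bytes = decode_audio_queue_info(queue)
    return {
        "width": width,
        "height": height,
        "fps": fps,                    # 0 means "play as fast as possible"
        "frames": frames,
        "queue_entries": entries,      # allocate at least one more than this
        "queue_entry_bytes": entry_bytes,
    }
```

For MP2 audio the b field is always 0x240, which decodes to 2304 bytes per queue entry.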
Packet Types -
<video>
0,0: 256-Colour frame
1,0: 256-Colour frame with palette data
2,0: 4096-Colour frame (stored as two byte-planes)
4,t: iPF no-alpha indicator (see iPF Type Numbers for details)
5,t: iPF with alpha indicator (see iPF Type Numbers for details)
16,0: Series of JPEGs
18,0: Series of PNGs
20,0: Series of TGAs
21,0: Series of TGA/GZs
<audio>
0,16: Raw PCM Stereo
1,16: Raw PCM Mono
p,17: MP2, 32 kHz (see MP2 Format Details section for p-value)
q,18: ADPCM, 32 kHz (q = 2 * log_2(frameSize) + (1 if mono, 0 if stereo))
<special>
255,255: sync packet (wait until the next frame)
254,255: background colour packet
31,84 : prohibited
Packet Type High Byte (iPF Type Numbers)
0..7: iPF Type 1..8
- MP2 Format Details
Rate | 2ch | 1ch
32 | 0 | 1
48 | 2 | 3
56 | 4 | 5
64 | 6 | 7 (libtwolame does not allow bitrate lower than this on 32 kHz stereo)
80 | 8 | 9
96 | 10 | 11
112 | 12 | 13
128 | 14 | 15
160 | 16 | 17
192 | 18 | 19
224 | 20 | 21
256 | 22 | 23
320 | 24 | 25
384 | 26 | 27
Add 128 to the resulting number if the frame has a padding bit (should not happen on 32kHz sampling rate)
Special value of 255 may indicate some errors
To encode an audio to compliant format, use ffmpeg: ffmpeg -i <your_music> -acodec libtwolame -psymodel 4 -b:a <rate>k -ar 32000 <output.mp2>
Rationale:
-acodec libtwolame : ffmpeg has two mp2 encoders, and libtwolame produces vastly higher quality audio
-psymodel 4 : use alternative psychoacoustic model -- the default model (3) tends to insert "clunk" sounds throughout the audio
-b:a : 256k is recommended for high quality audio (trust me, you don't need 384k)
-ar 32000 : resample the audio to 32kHz, the sampling rate of the TSVM soundcard
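The p-value table above follows a regular pattern (twice the row index, plus one for mono), so it can be computed instead of looked up. The sketch below also covers the q-value formula for ADPCM given in the Packet Types list; it assumes the ADPCM frame size is a power of two so that log_2 is an integer. Function names are illustrative.

```python
MP2_RATES = [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384]


def mp2_p_value(rate_kbps, mono=False, padding=False):
    """Packet type low byte for an MP2 frame at the given bitrate."""
    p = 2 * MP2_RATES.index(rate_kbps) + (1 if mono else 0)
    # Add 128 if the frame has a padding bit.
    return p + 128 if padding else p


def adpcm_q_value(frame_size, mono=False):
    """q = 2 * log_2(frameSize) + (1 if mono, 0 if stereo)."""
    log2 = frame_size.bit_length() - 1
    if frame_size <= 0 or (1 << log2) != frame_size:
        raise ValueError("this sketch assumes a power-of-two frame size")
    return 2 * log2 + (1 if mono else 0)
```

With the recommended 256k stereo encoding, mp2_p_value(256) yields 22, matching the table.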
TYPE 0 Packet -
uint32 SIZE OF COMPRESSED FRAMEDATA
* COMPRESSED FRAMEDATA
TYPE 1 Packet -
byte[512] Palette Data
uint32 SIZE OF COMPRESSED FRAMEDATA
* COMPRESSED FRAMEDATA
TYPE 2 Packet -
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
* COMPRESSED FRAMEDATA
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 2
* COMPRESSED FRAMEDATA
iPF Packet -
uint32 SIZE OF COMPRESSED FRAMEDATA
* COMPRESSED FRAMEDATA // only the actual gzip (and no UNCOMPRESSED SIZE) of the "Blocks.gz" is stored
TYPE 3 Packet (Patch-encoded iPF 1 Packet) -
uint32 SIZE OF COMPRESSED PATCHES
* COMPRESSED PATCHES
PATCHES is a concatenation of individual PATCH entries
where each PATCH is encoded as:
uint8 X-coord of the patch (pixel position divided by four)
uint8 Y-coord of the patch (pixel position divided by four)
uint8 width of the patch (size divided by four)
uint8 height of the patch (size divided by four)
(calculating uncompressed size)
(iPF1 no alpha: width * height * 12)
(iPF1 with alpha: width * height * 20)
(iPF2 no alpha: width * height * 16)
(iPF2 with alpha: width * height * 24)
* UN-COMPRESSED PATCHDATA
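The size calculation for a PATCH can be sketched as below. Width and height are the stored header values (pixel sizes divided by four, i.e. counts of 4x4 blocks); the helper names are hypothetical.

```python
# Bytes per 4x4 block, keyed by (iPF type, has alpha), as listed above.
BYTES_PER_BLOCK = {
    (1, False): 12,
    (1, True): 20,
    (2, False): 16,
    (2, True): 24,
}


def patch_payload_size(width, height, ipf_type=1, has_alpha=False):
    """Uncompressed PATCHDATA size; width/height are in units of four pixels."""
    return width * height * BYTES_PER_BLOCK[(ipf_type, has_alpha)]


def parse_patch_header(data, offset=0):
    """Read the four uint8 fields and convert them back to pixel units."""
    x, y, w, h = data[offset:offset + 4]
    return {"x_px": x * 4, "y_px": y * 4, "w_px": w * 4, "h_px": h * 4}
```

A patch covering 8x12 pixels is stored with width 2 and height 3, and carries 72 bytes of iPF1 no-alpha data.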
TYPE 16+ Packet -
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
* FRAMEDATA (COMPRESSED for TGA/GZ)
MP2 Packet & ADPCM Packet -
uint16 TYPE OF PACKET // follows the Metadata Packet Type scheme
* MP2 FRAME/ADPCM BLOCK
Sync Packet (subset of GLOBAL TYPE 255 Packet) -
uint16 0xFFFF (type of packet for Global Type 255)
Background Colour Packet -
uint16 0xFEFF
uint8 Red (0-255)
uint8 Green (0-255)
uint8 Blue (0-255)
uint8 0x00 (pad byte)
Frame Timing
If the global type is not 255, each packet is interpreted as a single full frame, and then will wait for the next
frame time; For type 255 however, the assumption no longer holds and each frame can have multiple packets, and thus
needs explicit "sync" packet for proper frame timing.
Compression Method
Old standard used Gzip, new standard is Zstd.
tsvm will read the zip header and will use appropriate decompression method, so that the old Gzipped
files remain compatible.
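One way to pick the decompressor from the payload header is to check the well-known magic bytes of the two container formats: 1F 8B for Gzip and 28 B5 2F FD for a Zstandard frame. Whether tsvm does exactly this is an assumption. The Zstd decoder is passed in as a hypothetical callable because the choice of Zstd library is left open here.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(payload):
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


def decompress_frame(payload, zstd_decompress=None):
    """Decompress a frame; zstd_decompress is a caller-supplied function."""
    kind = detect_compression(payload)
    if kind == "gzip":
        return gzip.decompress(payload)
    if kind == "zstd":
        if zstd_decompress is None:
            raise RuntimeError("no Zstd decoder supplied")
        return zstd_decompress(payload)
    raise ValueError("unrecognised compression header")
```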
NOTE FROM DEVELOPER
In the future, the global packet type will be deprecated.
--------------------------------------------------------------------------------
TSVM Interchangeable Picture Format (aka iPF Type 1/2)
Image is divided into 4x4 blocks and each block is serialised, then the entire iPF blocks are Zstd-compressed
# File Structure
\x1F T S V M i P F
[HEADER]
[Blocks]
- Header
uint16 WIDTH
uint16 HEIGHT
uint8 Flags
0b p00z 000a
- a: has alpha
- z: Zstd-compressed (p flag always sets this flag)
- p: progressive ordering (Adam7)
uint8 iPF Type/Colour Mode
0: Type 1 (4:2:0 chroma subsampling; 2048 colours?)
1: Type 2 (4:2:2 chroma subsampling; 2048 colours?)
byte[10] RESERVED
uint32 UNCOMPRESSED SIZE (somewhat redundant but included for convenience)
- Chroma Subsampled Blocks
Zstd-compressed unless the z-flag is not set.
4x4 pixels are sampled, then divided into YCoCg planes.
CoCg planes are "chroma subsampled" by 4:2:0, then quantised to 4 bits (8 bits for CoCg combined)
Y plane is quantised to 4 bits
By doing so, CoCg planes will reduce to 4 pixels
For the description of packing, pixels in Y/Cx plane will be numbered as:
Y0 Y1 Y2 Y3 || Cx1 Cx2 | Cx1 Cx2
Y4 Y5 Y6 Y7 || (iPF 1) | Cx3 Cx4
Y8 Y9 YA YB || Cx3 Cx4 | Cx5 Cx6
YC YD YE YF || (iPF 1) | Cx7 Cx8
Bits are packed like so:
iPF1:
uint16 [Co4 | Co3 | Co2 | Co1]
uint16 [Cg4 | Cg3 | Cg2 | Cg1]
uint16 [Y1 | Y0 | Y5 | Y4]
uint16 [Y3 | Y2 | Y7 | Y6]
uint16 [Y9 | Y8 | YD | YC]
uint16 [YB | YA | YF | YE]
(total: 12 bytes)
iPF2:
uint32 [Co8 | Co7 | Co6 | Co5 | Co4 | Co3 | Co2 | Co1]
uint32 [Cg8 | Cg7 | Cg6 | Cg5 | Cg4 | Cg3 | Cg2 | Cg1]
uint16 [Y1 | Y0 | Y5 | Y4]
uint16 [Y3 | Y2 | Y7 | Y6]
uint16 [Y9 | Y8 | YD | YC]
uint16 [YB | YA | YF | YE]
(total: 16 bytes)
If has alpha, append following bytes for alpha values
uint16 [a1 | a0 | a5 | a4]
uint16 [a3 | a2 | a7 | a6]
uint16 [a9 | a8 | aD | aC]
uint16 [aB | aA | aF | aE]
(total: 20/24 bytes)
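The iPF1 packing can be sketched as follows. Two assumptions are made that the listing does not state outright: the leftmost name in each bracket is the most significant nibble of the word, and each uint16 is written little-endian. The function names are invented for this example.

```python
import struct


def pack_nibbles(n3, n2, n1, n0):
    # Assumption: the leftmost name in the listing is the most significant nibble.
    for n in (n3, n2, n1, n0):
        if not 0 <= n <= 15:
            raise ValueError("values must be quantised to 4 bits")
    return (n3 << 12) | (n2 << 8) | (n1 << 4) | n0


def pack_ipf1_block(y, co, cg):
    """y: 16 values Y0..YF; co, cg: 4 values Cx1..Cx4 (index 0 is Cx1)."""
    words = [
        pack_nibbles(co[3], co[2], co[1], co[0]),     # [Co4 | Co3 | Co2 | Co1]
        pack_nibbles(cg[3], cg[2], cg[1], cg[0]),     # [Cg4 | Cg3 | Cg2 | Cg1]
        pack_nibbles(y[0x1], y[0x0], y[0x5], y[0x4]),
        pack_nibbles(y[0x3], y[0x2], y[0x7], y[0x6]),
        pack_nibbles(y[0x9], y[0x8], y[0xD], y[0xC]),
        pack_nibbles(y[0xB], y[0xA], y[0xF], y[0xE]),
    ]
    return struct.pack("<6H", *words)  # 12 bytes, little-endian
```

An alpha plane, when present, would be appended as four more words in the same Y-style order, giving the 20-byte variant.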
Subsampling mask:
Least significant byte for top-left, most significant for bottom-right
For example, this default pattern
00 00 01 01
00 00 01 01
10 10 11 11
10 10 11 11
turns into:
01010000 -> 0x50
01010000 -> 0x50
11111010 -> 0xFA
11111010 -> 0xFA
which packs into: [ 50 | 50 | FA | FA ] (because little endian)
iPF1-delta (for video encoding):
Delta encoded frames contain "instructions" for patch-encoding the existing frame.
Or, a collection of [StateChangeCode] [Optional VarInts] [Payload...] pairs
States:
0x00 SKIP [varint skipCount]
0x01 PATCH [varint blockCount] [12x blockCount bytes]
0x02 REPEAT [varint repeatCount] [a block]
0xFF END
Sample stream:
[SKIP 10] [PATCH A] [REPEAT 3] [SKIP 5] [PATCH B] [END]
Delta block format:
Each PATCH delta payload is still:
8 bytes of Luma (4-bit deltas for 16 pixels)
2 bytes of Co deltas (4× 4-bit deltas)
2 bytes of Cg deltas (4× 4-bit deltas)
Total: 12 bytes per PATCH.
These are always relative to the same-position block in the previous frame.
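A decoder for the instruction stream might look like the sketch below. The varint encoding is not defined in this section, so unsigned LEB128 is assumed here; the REPEAT block is taken to be one 12-byte block, and END is taken to carry no varint. All names are illustrative.

```python
BLOCK_BYTES = 12  # one iPF1 delta block


def read_varint(data, pos):
    """Unsigned LEB128 (an assumption; the varint format is not specified)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_delta_stream(data):
    """Turn the byte stream into a list of (state, ...) tuples."""
    ops, pos = [], 0
    while pos < len(data):
        code = data[pos]
        pos += 1
        if code == 0xFF:
            ops.append(("END",))
            break
        count, pos = read_varint(data, pos)
        if code == 0x00:
            ops.append(("SKIP", count))
        elif code == 0x01:
            size = count * BLOCK_BYTES
            ops.append(("PATCH", count, data[pos:pos + size]))
            pos += size
        elif code == 0x02:
            ops.append(("REPEAT", count, data[pos:pos + BLOCK_BYTES]))
            pos += BLOCK_BYTES
        else:
            raise ValueError(f"unknown state code {code:#04x}")
    return ops
```

Applying the resulting operations means walking the previous frame's blocks in order: skip, patch, or repeat-patch as each tuple says.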
- Progressive Blocks
Ordered string of words (word size varies by the colour mode) are stored here.
If progressive mode is enabled, words are stored in the order that accommodates it.
--------------------------------------------------------------------------------
TSVM Enhanced Video (TEV) Format
Created by CuriousTorvald and Claude on 2025-08-17
TEV is a modern video codec optimized for TSVM's 4096-color hardware, featuring
DCT-based compression, optional motion compensation, and efficient temporal coding.
## Version History
- Version 2.0: YCoCg-R 4:2:0 with 16x16/8x8 DCT blocks
- Version 2.1: Added Rate Control Factor to all video packets (breaking change)
* Enables bitrate-constrained encoding alongside quality modes
* All video frames now include 4-byte rate control factor after payload size
- Version 3.0: Additional support of ICtCp Colour space
# File Structure
\x1F T S V M T E V (if video), \x1F T S V M T E P (if still picture)
[HEADER]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...
## Header (24 bytes)
uint8 Magic[8]: "\x1FTSVMTEV" or "\x1FTSVMTEP"
uint8 Version: 2 (YCoCg-R) or 3 (ICtCp)
uint16 Width: video width in pixels
uint16 Height: video height in pixels
uint8 FPS: frames per second
uint32 Total Frames: number of video frames
uint8 Quality Index for Y channel (0-99; 100 denotes that all quantisers are 1)
uint8 Quality Index for Co channel (0-99; 100 denotes that all quantisers are 1)
uint8 Quality Index for Cg channel (0-99; 100 denotes that all quantisers are 1)
uint8 Extra Feature Flags
- bit 0 = has audio
- bit 1 = has subtitle
- bit 2 = infinite loop (must be ignored when File Role is 1)
- bit 7 = has no actual packets, this file is header-only without an Intro Movie
uint8 Video Flags
- bit 0 = is interlaced (should be default for most non-archival TEV videos)
- bit 1 = is NTSC framerate (repeat every 1000th frame)
uint8 File Role
- 0 = generic
- 1 = this file is header-only, and UCF payload will be followed (used by seekable movie file)
When a header-only file contains video packets, they should be presented as an Intro Movie
before the user-interactable selector (served by the UCF payload)
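The 24-byte header maps directly onto a fixed struct layout. The sketch below assumes little-endian multi-byte fields (the section states this explicitly only for the Rate Control Factor) and uses invented key names.

```python
import struct

# magic, version, width, height, fps, total frames,
# quality Y/Co/Cg, extra feature flags, video flags, file role
TEV_HEADER = struct.Struct("<8sBHHBIBBBBBB")  # 24 bytes


def parse_tev_header(data):
    (magic, version, width, height, fps, frames,
     q_y, q_co, q_cg, extra, video, role) = TEV_HEADER.unpack_from(data)
    if magic not in (b"\x1fTSVMTEV", b"\x1fTSVMTEP"):
        raise ValueError("not a TEV/TEP file")
    return {
        "still_picture": magic.endswith(b"P"),
        "version": version,
        "width": width,
        "height": height,
        "fps": fps,
        "total_frames": frames,
        "quality": (q_y, q_co, q_cg),
        "has_audio": bool(extra & 0x01),
        "has_subtitle": bool(extra & 0x02),
        "loop": bool(extra & 0x04),
        "header_only": bool(extra & 0x80),
        "interlaced": bool(video & 0x01),
        "ntsc": bool(video & 0x02),
        "file_role": role,
    }
```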
## Packet Types
0x10: I-frame (intra-coded frame)
0x11: P-frame (predicted frame)
0x1F: prohibited
0x20: MP2 audio packet
0x30: Subtitle in "Simple" format
0x31: Subtitle in "Karaoke" format
0xE0: EXIF packet
0xE1: ID3v1 packet
0xE2: ID3v2 packet
0xE3: Vorbis Comment packet
0xE4: CD-text packet
0xFF: sync packet
## Standard metadata payload packet structure
uint8 0xE0/0xE1/0xE2/.../0xEF (see Packet Types section)
uint32 Length of the payload
* Standard payload
note: metadata packets must precede any non-metadata packets
## Video Packet Structure
uint8 Packet Type
uint32 Compressed Size
* Zstd-compressed Block Data
## Block Data (per 16x16 block)
uint8 Mode: encoding mode
0x00 = SKIP (copy from previous frame)
0x01 = INTRA (DCT-coded, no prediction)
0x02 = INTER (DCT-coded with motion compensation) -- currently unused due to bugs
0x03 = MOTION (motion vector only)
int16 Motion Vector X ("capable of" 1/4 pixel precision, integer precision for now)
int16 Motion Vector Y ("capable of" 1/4 pixel precision, integer precision for now)
float32 Rate Control Factor (4 bytes, little-endian)
uint16 Coded Block Pattern (which 8x8 have non-zero coeffs)
int16[256] DCT Coefficients Y
int16[64] DCT Coefficients Co (subsampled by two)
int16[64] DCT Coefficients Cg (subsampled by two, aggressively quantised)
For SKIP and MOTION mode, DCT coefficients are filled with zero
## DCT Quantisation and Rate Control
TEV uses 5 quality levels (0=lowest, 4=highest) with progressive quantisation
tables optimized for perceptual quality. DC coefficients are encoded losslessly,
while AC coefficients are quantised according to quality tables.
### Rate Control Factor
Each block includes a Rate Control Factor that modifies quality level for that specific block.
This makes coding more efficient by spending higher quality on complex blocks and lower quality on
flat blocks.
## Motion Compensation
- Search range: ±8 pixels
- Sub-pixel precision: 1/4 pixel (again, integer precision for now)
- Block size: 16x16 pixels
- Uses Sum of Absolute Differences (SAD) for motion estimation
- Bilinear interpolation for sub-pixel motion vectors
## Colour Space
TEV operates in 8-Bit colour mode, colour space conversion required
## Compression Features
- 16x16 DCT blocks (vs 4x4 in iPF)
- Temporal prediction with motion compensation
- Rate-distortion optimized mode selection
- Hardware-accelerated encoding/decoding functions
## Performance Comparison
TEV achieves 60-80% better compression than iPF formats while maintaining
equivalent visual quality, with significantly faster decode performance due
to larger block sizes and hardware acceleration.
## Audio Support
Reuses existing MP2 audio infrastructure from TSVM MOV format for seamless
compatibility with existing audio processing pipeline.
## NTSC Framerate handling
The encoder encodes the frames as-is. The decoder must duplicate every 1000th frame to keep the decoding
in-sync.
--------------------------------------------------------------------------------
Simple Subtitle Format (SSF)
SSF is a simple subtitle format that uses the text buffer to display text.
The format is designed to be compatible with SubRip and SAMI (without markups) and interoperable with
TEV and TAV formats.
SSF-TC is an SSF with extra timecode so that subtitle packets can be desynchronised with video frames
on encoding.
When SSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.
## Packet Structure
uint8 0x30/0x31 (SSF/SSF-TC)
uint32 Packet Size
* SSF Payload (see below)
## SSF Packet Structure
uint24 Subtitle object ID (used to specify target subtitle object)
uint64 Timecode in nanoseconds (only present on SSF-TC format; regular SSF must not write these bytes)
uint8 opcode
0x00 = <argument terminator>, is NOP when used here
0x01 = show (arguments: UTF-8 text)
0x02 = hide (arguments: none)
0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)
0x10..0x2F = show in alternative languages (arguments: char[5] language code, UTF-8 text)
0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
note: changing the font rom will change the appearance of every subtitle currently being displayed
* arguments separated AND terminated by 0x00
text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.
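A tolerant reader for the SSF payload could be sketched as below. It assumes little-endian integers, handles only the show, hide, and move opcodes in detail, and returns the raw argument bytes for everything else. The function and key names are hypothetical.

```python
def parse_ssf_payload(payload, has_timecode=False):
    """Decode one SSF (or SSF-TC) payload into a dict."""
    object_id = int.from_bytes(payload[0:3], "little")  # uint24
    pos = 3
    timecode_ns = None
    if has_timecode:  # only SSF-TC carries these eight bytes
        timecode_ns = int.from_bytes(payload[pos:pos + 8], "little")
        pos += 8
    opcode = payload[pos]
    args = payload[pos + 1:]
    result = {"id": object_id, "timecode_ns": timecode_ns, "opcode": opcode}
    if opcode == 0x01:
        # Text ends at the first 0x00; any extra trailing zeros are ignored.
        result["text"] = args.split(b"\x00", 1)[0].decode("utf-8")
    elif opcode == 0x03:
        result["nonant"] = args[0]
    elif opcode != 0x02:
        result["raw_args"] = args
    return result
```

Splitting on the first 0x00 is what lets the reader cope with the extra terminator bytes mentioned above.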
--------------------------------------------------------------------------------
Karaoke Subtitle Format (KSF)
KSF is a frame-synced subtitle format intended for karaoke-style subtitles.
The format is designed to be interoperable with TEV and TAV formats.
For non-karaoke style synced lyrics, use SSF.
KSF-TC is a KSF with an extra timecode so that subtitle packets can be desynchronised with video frames
on encoding.
When KSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.
## Packet Structure
uint8 0x32/0x33 (KSF/KSF-TC)
* KSF Payload (see below)
### KSF Packet Structure
KSF is line-based: you define an unrevealed line, then subsequent commands reveal words/syllables
on appropriate timings.
uint24 Subtitle object ID (used to specify target subtitle object)
uint64 Timecode in nanoseconds (only present on KSF-TC format; regular KSF must not write these bytes)
uint8 opcode
<definition opcodes>
0x00 = <argument terminator>, is NOP when used here
0x01 = define line (arguments: UTF-8 text. Players will also show it in grey)
0x02 = delete line (arguments: none)
0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)
<reveal opcodes>
0x30 = reveal text normally (arguments: UTF-8 text. The reveal text must contain spaces when required)
0x31 = reveal text slowly (arguments: UTF-8 text. The effect is implementation-dependent)
0x40 = reveal text normally with emphasis (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
0x41 = reveal text slowly with emphasis (arguments: UTF-8 text)
0x50 = reveal text normally with target colour (arguments: uint8 target colour; UTF-8 text)
0x51 = reveal text slowly with target colour (arguments: uint8 target colour; UTF-8 text)
<hardware control opcodes>
0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
note: changing the font rom will change the appearance of every subtitle currently being displayed
* arguments separated AND terminated by 0x00
text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.
--------------------------------------------------------------------------------
TSVM Advanced Video (TAV) Format
Created by CuriousTorvald and Claude on 2025-09-13
TAV is a next-generation video codec for TSVM utilising Discrete Wavelet Transform (DWT)
similar to JPEG2000, providing superior compression efficiency and scalability compared
to DCT-based codecs like TEV. Features include multi-resolution encoding, progressive
transmission capability, and region-of-interest coding.
# File Structure
\x1F T S V M T A V (if video), \x1F T S V M T A P (if still picture)
[HEADER]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...
## Header (32 bytes)
uint8 Magic[8]: "\x1FTSVMTAV" or "\x1FTSVMTAP"
uint8 Version:
Base version number:
- 1 = YCoCg-R multi-tile uniform
- 2 = ICtCp multi-tile uniform
- 3 = YCoCg-R monoblock uniform
- 4 = ICtCp monoblock uniform
- 5 = YCoCg-R monoblock perceptual
- 6 = ICtCp monoblock perceptual
- 7 = YCoCg-R multi-tile perceptual
- 8 = ICtCp multi-tile perceptual
When motion coder is Haar, take base version number.
When motion coder is CDF 5/3, add 8 to the base version number.
uint16 Width: picture width in pixels. Columns count for Videotex-only file.
uint16 Height: picture height in pixels. Rows count for Videotex-only file.
If either width or height exceeds 65535 pixels, above two fields must be filled with zero and the dimension must be sourced from XDIM entry of the Extended Header
uint8 FPS: frames per second. Use 0x00 for still pictures
If FPS is greater than 254 or fractional (excl. NTSC), the value must be 0xFF and the true framerate must be sourced from the XFPS entry of the Extended Header
uint32 Total Frames: number of video frames
- use 0 to denote not-finalised video stream
- use 0xFFFFFFFF to denote still picture (.im3 file)
uint8 Wavelet Filter Type:
- 0 = 5/3 reversible (LGT 5/3, JPEG 2000 standard)
- 1 = 9/7 irreversible (CDF 9/7, slight modification of JPEG 2000, default choice)
- 2 = CDF 13/7 (experimental)
- 16 = DD-4 (Four-point interpolating Deslauriers-Dubuc; experimental)
- 255 = Haar (demonstration purpose only)
uint8 Decomposition Levels: number of DWT levels (1-6+; use 0 if it has no video or Videotex only)
uint8 Quantiser Index for Y channel (uses exponential numeric system; 0: lossless, 255: potato)
uint8 Quantiser Index for Co channel (uses exponential numeric system; 0: lossless, 255: potato)
uint8 Quantiser Index for Cg channel (uses exponential numeric system; 0: lossless, 255: potato)
uint8 Extra Feature Flags
- bit 0 = has audio (for still pictures: has background music)
- bit 1 = has subtitle (for still pictures: has timed captions)
- bit 2 = infinite loop (has no effect for still pictures)
- bit 7 = has no actual packets, this file is header-only without an Intro Movie
uint8 Video Flags
- bit 0 = interlaced
- bit 1 = is NTSC framerate
- bit 2 = is lossless mode
- bit 3 = has region-of-interest coding (for still pictures only)
- bit 4 = no Zstd compression
- bit 7 = has no video
uint8 Encoder quality level (stored with bias of 1 (q0=1); used to derive anisotropy value)
uint8 Channel layout (bit-field: bit 0=has alpha, bit 1=has chroma inverted, bit 2=has luma inverted)
* Luma-only videos must be decoded with fixed Chroma=0
* Chroma-only videos must be decoded with fixed Luma=127
* No-alpha videos must be decoded with fixed Alpha=255
- 0 = Y-Co-Cg/I-Ct-Cp (000: no alpha, has chroma, has luma)
- 1 = Y-Co-Cg-A/I-Ct-Cp-A (001: has alpha, has chroma, has luma)
- 2 = Y/I only (010: no alpha, no chroma, has luma)
- 3 = Y-A/I-A (011: has alpha, no chroma, has luma)
- 4 = Co-Cg/Ct-Cp (100: no alpha, has chroma, no luma)
- 5 = Co-Cg-A/Ct-Cp-A (101: has alpha, has chroma, no luma)
- 6-7 = Reserved/invalid (would indicate no luma and no chroma)
uint8 Entropy Coder
- 0 = Twobit-plane significance map (deprecated)
- 1 = Embedded Zero Block Coding
- 2 = Raw coefficients (debugging purpose only)
uint8 Encoder Preset
- Bit 0 = use finer motion (finer temporal quantisation)
- Bit 1 = reduce grain synthesis
Preset "Default" -> 0x00
Preset "Sports" -> 0x01
Preset "Anime" -> 0x02
NOTE: not all presets have preset flags. See Preset section for details.
uint8 Reserved[1]: fill with zeros
uint8 Device Orientation
- 0 = No rotation
- 1 = Clockwise 90 deg
- 2 = 180 deg
- 3 = Clockwise 270 deg
- 4 = Mirrored, No rotation
- 5 = Mirrored, Clockwise 90 deg
- 6 = Mirrored, 180 deg
- 7 = Mirrored, Clockwise 270 deg
uint8 File Role
- 0 = generic
- 1 = this file is header-only, and UCF payload will be followed (used by movie file with chapters)
When a header-only file contains video packets, they should be presented as an Intro Movie
before the user-interactable selector (served by the UCF payload)
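The 32-byte TAV header, the version numbering rule, and the channel layout bit-field can be sketched as follows. Little-endian fields are assumed, the valid version range is taken to be 1..16 (eight base versions, plus eight for CDF 5/3), and all names are illustrative.

```python
import struct

# magic, version, width, height, fps, total frames, then 14 single-byte fields
TAV_HEADER = struct.Struct("<8sBHHBI14B")  # 32 bytes

BASE_VERSIONS = {
    1: ("YCoCg-R", "multi-tile", "uniform"),
    2: ("ICtCp", "multi-tile", "uniform"),
    3: ("YCoCg-R", "monoblock", "uniform"),
    4: ("ICtCp", "monoblock", "uniform"),
    5: ("YCoCg-R", "monoblock", "perceptual"),
    6: ("ICtCp", "monoblock", "perceptual"),
    7: ("YCoCg-R", "multi-tile", "perceptual"),
    8: ("ICtCp", "multi-tile", "perceptual"),
}


def decode_version(version):
    if not 1 <= version <= 16:
        raise ValueError("version outside the range described above")
    base = (version - 1) % 8 + 1
    motion_coder = "CDF 5/3" if version > 8 else "Haar"
    return BASE_VERSIONS[base] + (motion_coder,)


def decode_channel_layout(layout):
    # bit 0 = has alpha; bits 1 and 2 are inverted: set means chroma/luma absent
    return {
        "alpha": bool(layout & 1),
        "chroma": not layout & 2,
        "luma": not layout & 4,
    }


def parse_tav_header(data):
    magic, version, width, height, fps, frames, *rest = TAV_HEADER.unpack_from(data)
    if magic not in (b"\x1fTSVMTAV", b"\x1fTSVMTAP"):
        raise ValueError("not a TAV/TAP file")
    return {
        "still_picture": magic.endswith(b"P"),
        "version": decode_version(version),
        "width": width,      # zero means: read XDIM from the Extended Header
        "height": height,
        "fps": fps,          # 0xFF means: read XFPS from the Extended Header
        "total_frames": frames,
        "wavelet": rest[0],
        "levels": rest[1],
        "channels": decode_channel_layout(rest[8]),
    }
```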
### Presets
The encoder supports following presets:
- Sports: use finer temporal quantisation, resulting in better-preserved motion. Less effective as resolution goes up
- Anime: instructs the decoder to disable grain synthesis
2025-12-08 Addendum: TAV-DT should be its own encoder, not preset
- D1/D1PAL: encode to TAV-DT (NTSC/PAL) format. Any non-compliant setup will be ignored and substituted to compliant values
- D1P/D1PALP: encode to TAV-DT Progressive (NTSC/PAL) format. Any non-compliant setup will be ignored and substituted to compliant values
## Packet Structure (some special packets have no payload. See Packet Types for details)
uint8 Packet Type
uint32 Payload Size
* Payload
## Packet Types
<video packets>
0x10: I-frame (intra-coded frame)
0x11: P-frame (delta/skip frame)
0x12: GOP Unified (temporal 3D DWT with unified preprocessing)
0x1F: (prohibited)
<audio packets>
0x20: MP2 audio packet (32 kHz)
0x21: Zstd-compressed 8-bit PCM (32 kHz, audio hardware's native format)
0x22: Zstd-compressed 16-bit PCM (32 kHz, little endian)
0x23: Zstd-compressed ADPCM (32 kHz)
0x24: TAD (TSVM Advanced Audio)
<subtitles>
0x30: Subtitle in "Simple" format
0x31: Subtitle in "Simple" format with timecodes
0x32: Subtitle in "Karaoke" format
0x33: Subtitle in "Karaoke" format with timecodes
0x3F: Videotex (full-frame text buffer memory image)
<synchronised tracks>
0x40: MP2 audio track (32 kHz)
0x41: Zstd-compressed 8-bit PCM (32 kHz, audio hardware's native format)
0x42: Zstd-compressed 16-bit PCM (32 kHz, little endian)
0x43: Zstd-compressed ADPCM (32 kHz)
0x44: TAD (TSVM Advanced Audio)
<multiplexed video>
0x70..7F: Reserved for Future Version
<Standard metadata payloads>
(it's called "standard" because you're expected to just copy-paste the metadata bytes verbatim)
0xE0: EXIF packet
0xE1: ID3v1 packet
0xE2: ID3v2 packet
0xE3: Vorbis Comment packet
0xE4: CD-text packet
<Extensible>
0x01: Vendor-specific video packets
0x02: Vendor-specific audio frame
0x03: Vendor-specific subtitle
0x04: Vendor-specific audio file
0x0E: Vendor-specific metadata
<Special packets>
0x00: No-op (no payload)
0xEF: TAV Extended Header
0xF0: Loop point start (insert right AFTER the TC packet; no payload)
0xF1: Loop point end (insert right AFTER the TC packet; no payload)
0xF2: Screen masking info
0xFC: GOP Sync packet (indicates N frames decoded from GOP block)
0xFD: Timecode (TC) Packet [for frame 0, insert at the beginning; otherwise, insert right AFTER the sync]
0xFE: NTSC sync packet (used by player to calculate exact framerate-wise performance; no payload)
0xFF: Sync packet (no payload)
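A hedged Python sketch of walking a run of packets is shown here. It makes several assumptions that the text does not confirm: multi-byte fields are little-endian, and packets marked "no payload" consist of the type byte alone (no Payload Size field). Packets with their own layouts (GOP Unified, TAD, Extended Header, Screen Masking, Extensible) are deliberately rejected rather than parsed. The names `walk_packets`, `NO_PAYLOAD` and `SPECIAL` are illustrative.

```python
import struct

# Packet types listed as carrying no payload. Assumption: they are a single
# type byte with no Payload Size field.
NO_PAYLOAD = {0x00, 0xF0, 0xF1, 0xFE, 0xFF}
# Packets with their own layout, which this sketch does not parse.
SPECIAL = {0x01, 0x02, 0x03, 0x04, 0x0E, 0x12, 0x13, 0x24, 0xEF, 0xF2}

def walk_packets(buf: bytes):
    """Yield (packet_type, payload) tuples from a run of TAV packets."""
    pos = 0
    while pos < len(buf):
        ptype = buf[pos]
        pos += 1
        if ptype in NO_PAYLOAD:
            yield ptype, b""
        elif ptype == 0xFC:              # GOP Sync: one frame-count byte
            yield ptype, buf[pos:pos + 1]
            pos += 1
        elif ptype == 0xFD:              # Timecode: uint64 nanoseconds
            yield ptype, buf[pos:pos + 8]
            pos += 8
        elif ptype in SPECIAL:
            raise NotImplementedError(f"packet 0x{ptype:02X} has its own layout")
        else:                            # generic: uint32 size, then payload
            (size,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            yield ptype, buf[pos:pos + size]
            pos += size
```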
### Packet Precedence
Before the first frame group:
1. TAV Extended header (if any)
2. Standard metadata payloads (if any)
3. SSF-TC/KSF-TC packets (if any)
When time-coded subtitles are used, the entire subtitles must precede the first video frame.
Think of it as tacking the whole subtitle file before the actual video.
4. Screen Masking packets (if any)
Frame group:
1. Timecode Packet (0xFD) or Next TAV File (0x1F) [mutually exclusive!]
2. Loop point packet (if any)
3. Audio packets (if any)
4. Subtitle packets (if any) [mutually exclusive with SSF-TC/KSF-TC packets]
5. Main video packets (0x10-0x1E)
6. Multiplexed video packets (0x70-7F; if any)
After a frame group:
1. Sync packet (0xFC or 0xFF)
2. NTSC Sync packet (if required; it will instruct players to duplicate the current frame)
## TAV Extended Header Specification and Structure
uint8 Packet Type (0xEF)
uint16 Number of Key-Value pairs
* Key-Value pairs
### Key-Value Pair
uint8 Key[4]
uint8 Value Type
- 0x00: (U)Int16
- 0x01: (U)Int24
- 0x02: (U)Int32
- 0x03: (U)Int48
- 0x04: (U)Int64
- 0x10: Bytes
<if Value Type is Bytes>
uint16 Length of bytes
* Bytes
<else>
type_t Value
<fi>
### List of Keys
- Uint64 BGNT: Video begin time in nanoseconds (must be equal to the value of the first Timecode packet)
- Uint64 ENDT: Video end time in nanoseconds (must be equal to BGNT + playback time)
- Uint64 CDAT: Creation time in microseconds since UNIX Epoch (must be in UTC timezone)
- Bytes VNDR: Name and version of the encoder (for Reference encoder: "Encoder-TAV 20251014 (list,of,features)")
- Bytes FMPG: FFmpeg version (typically "ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers"; the first line of text FFmpeg emits)
- Bytes XDIM: Video dimension in '<width>,<height>' format. Mandatory if either width or height exceeds 65535
- Bytes XFPS: Framerate in '<numerator>/<denominator>' format. Mandatory if either:
1. FPS exceeds 254
2. denominator is not 1 or 1001
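The key-value layout above can be read with a short loop. The following Python sketch assumes little-endian fields and reads all integer types as unsigned, since the Value Type does not encode signedness; `parse_extended_header` and `INT_WIDTHS` are illustrative names.

```python
import struct

# Byte widths of the integer Value Types (0x00..0x04).
INT_WIDTHS = {0x00: 2, 0x01: 3, 0x02: 4, 0x03: 6, 0x04: 8}

def parse_extended_header(buf: bytes) -> dict:
    """Parse a TAV Extended Header packet, starting at its 0xEF type byte."""
    if buf[0] != 0xEF:
        raise ValueError("not a TAV Extended Header packet")
    (count,) = struct.unpack_from("<H", buf, 1)
    pos = 3
    pairs = {}
    for _ in range(count):
        key = buf[pos:pos + 4].decode("ascii")
        vtype = buf[pos + 4]
        pos += 5
        if vtype == 0x10:                       # Bytes: uint16 length + data
            (length,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            pairs[key] = buf[pos:pos + length]
            pos += length
        elif vtype in INT_WIDTHS:               # fixed-width integer
            width = INT_WIDTHS[vtype]
            pairs[key] = int.from_bytes(buf[pos:pos + width], "little")
            pos += width
        else:
            raise ValueError(f"unknown value type 0x{vtype:02X}")
    return pairs
```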
## Extensible Packet Structure
uint8 Packet Type
uint8 Flags
- 0x01: 64-bit size
uint8 Identifier[4]
<if 64-bit size>
uint64 Length of the payload
<else>
uint32 Length of the payload
<fi>
* Payload
## Standard Metadata Payload Packet Structure
uint8 Packet Type (0xE0/0xE1/0xE2/.../0xEE; see Packet Types section)
uint32 Length of the payload
* Standard payload
Notes:
- metadata packets must precede any non-metadata packets
- when multiple metadata packets are present (e.g. ID3v2 and Vorbis Comment both present),
which gets precedence is implementation-dependent. ONE EXCEPTION is ID3v1 and ID3v2 where ID3v2 gets
precedence.
## Timecode Packet Structure
uint8 Packet Type (0xFD)
uint64 Time since stream start in nanoseconds (this may NOT start from zero if the video is coming from a livestream)
## Screen Masking Packet Structure
When letterbox/pillarbox detection is active, the encoder will only encode pictures in the active area.
Decoders must use this value to derive the size of the active area for decoding, and fill the blank on playback.
Encoders only need to insert this packet at the start of the video (if necessary) and whenever the geometry changes.
uint8 Packet Type (0xF2)
uint32 Starting frame number
uint16 Mask size top in pixels
uint16 Mask size right in pixels
uint16 Mask size bottom in pixels
uint16 Mask size left in pixels
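Deriving the active area from the four mask sizes is a matter of subtraction. A minimal Python sketch, assuming little-endian fields and that the full frame dimensions come from the file header; `parse_screen_mask` and its result keys are illustrative.

```python
import struct

def parse_screen_mask(packet: bytes, frame_width: int, frame_height: int) -> dict:
    """Decode a Screen Masking packet and derive the active picture area."""
    ptype, start_frame, top, right, bottom, left = struct.unpack_from("<BIHHHH", packet)
    if ptype != 0xF2:
        raise ValueError("not a Screen Masking packet")
    return {
        "start_frame": start_frame,
        "x": left,                       # top-left corner of the active area
        "y": top,
        "active_width": frame_width - left - right,
        "active_height": frame_height - top - bottom,
    }
```

For a 560x448 frame with 60-pixel bars at the top and bottom, the active area is 560x328 starting at (0, 60).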
## Video Packet Structure
uint8 Packet Type (0x10/0x11)
uint32 Compressed Size
* Zstd-compressed Block Data
## TAD Packet Structure
uint8 Packet Type (0x24)
<header for decoding packet>
uint16 Sample Count
uint32 Compressed Size + 7
<header for decoding TAD chunk>
uint16 Sample Count
uint8 Quantiser Bits
uint32 Compressed Size
* Zstd-compressed TAD
## Videotex Packet Structure
uint8 Packet Type (0x3F)
uint32 Compressed Size
* Zstd-compressed payload, where:
uint8 Rows
uint8 Columns
* Foreground colours
* Background colours
* Characters
## GOP Unified Packet Structure (0x12)
Implemented on 2025-10-15 for temporal 3D DWT with unified preprocessing.
This packet contains multiple frames encoded as a single spacetime block for optimal
temporal compression.
uint8 Packet Type (0x12/0x13)
uint8 GOP Size (number of frames in this GOP)
<if packet type is 0x13>
uint32 Compressed Size
* Zstd-compressed Motion Data
<fi>
uint32 Compressed Size
* Zstd-compressed Unified Block Data
### Unified Block Data Format
The entire GOP (width×height×N_frames×3_channels) is preprocessed as a single block:
<if significance maps are used>
uint8 Y Significance Maps[(width*height + 7) / 8 * GOP Size] // All Y frames concatenated
uint8 Co Significance Maps[(width*height + 7) / 8 * GOP Size] // All Co frames concatenated
uint8 Cg Significance Maps[(width*height + 7) / 8 * GOP Size] // All Cg frames concatenated
int16 Y Non-zero Values[variable length] // All Y non-zero coefficients
int16 Co Non-zero Values[variable length] // All Co non-zero coefficients
int16 Cg Non-zero Values[variable length] // All Cg non-zero coefficients
<fi>
<if EZBC is used>
uint32 EZBC Size for Y
* EZBC Structure for Y
uint32 EZBC Size for Co
* EZBC Structure for Co
uint32 EZBC Size for Cg
* EZBC Structure for Cg
<fi>
This layout enables Zstd to find patterns across both spatial and temporal dimensions,
resulting in superior compression compared to per-frame encoding.
### Temporal 3D DWT Process
1. Detect where the scene change is happening on the first pass
2. Determine GOP slicing from the scene detection
3. Apply 1D DWT across temporal axis (GOP frames)
4. Apply 2D DWT on each spatial slice of temporal subbands
5. Perceptual quantisation with temporal-spatial awareness
6. Unified significance map preprocessing across all frames/channels
7. Single Zstd compression of entire GOP block
## GOP Sync Packet Structure (0xFC)
Indicates that N frames were decoded from a GOP Unified block.
Decoders must track this to maintain proper frame count and synchronization.
uint8 Packet Type (0xFC)
uint8 Frame Count (number of frames that were decoded from preceding GOP block)
Note: GOP Sync packets have no payload size field (fixed 2-byte packet).
## Block Data (per frame)
uint8 Mode: encoding mode
0x00 = SKIP (just use frame data from previous frame)
0x01 = INTRA (DWT-coded)
0x02 = DELTA (DWT delta)
- 0x02: DWT level 1
- 0x12: DWT level 2
- 0x22: DWT level 3
...
- 0xF2: DWT Level 16
uint8 Quantiser override Y (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
uint8 Quantiser override Co (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
uint8 Quantiser override Cg (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
- note: quantiser overrides are always present regardless of the channel layout
* Tile data (one compressed payload per tile)
### Coefficient Storage Format (Significance Map Compression)
Starting with encoder version 2025-09-29, DWT coefficients are stored using
significance map compression with concatenated maps layout for optimal efficiency:
#### Concatenated Maps Format
All channels are processed together to maximize Zstd compression:
uint8 Y Significance Map[(coeff_count + 7) / 8] // 1 bit per Y coefficient
uint8 Co Significance Map[(coeff_count + 7) / 8] // 1 bit per Co coefficient
uint8 Cg Significance Map[(coeff_count + 7) / 8] // 1 bit per Cg coefficient
uint8 A Significance Map[(coeff_count + 7) / 8] // 1 bit per A coefficient (if alpha present)
int16 Y Non-zero Values[variable length] // Only non-zero Y coefficients
int16 Co Non-zero Values[variable length] // Only non-zero Co coefficients
int16 Cg Non-zero Values[variable length] // Only non-zero Cg coefficients
int16 A Non-zero Values[variable length] // Only non-zero A coefficients (if alpha present)
#### Significance Map Encoding
Each significance map uses 1 bit per coefficient position:
- Bit = 1: coefficient is non-zero, read value from corresponding Non-zero Values array
- Bit = 0: coefficient is zero
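The map-plus-values split for one channel can be sketched in Python as follows. The bit order within each map byte is not stated here; the sketch assumes LSB-first, and the function names are illustrative.

```python
def pack_coefficients(coeffs):
    """Split coefficients into a 1-bit-per-position map and non-zero values."""
    sig_map = bytearray((len(coeffs) + 7) // 8)
    values = []
    for i, c in enumerate(coeffs):
        if c != 0:
            sig_map[i // 8] |= 1 << (i % 8)   # assumed LSB-first bit order
            values.append(c)
    return bytes(sig_map), values

def unpack_coefficients(sig_map, values, coeff_count):
    """Rebuild the coefficient list from the map and the non-zero values."""
    it = iter(values)
    return [next(it) if (sig_map[i // 8] >> (i % 8)) & 1 else 0
            for i in range(coeff_count)]
```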
#### Compression Benefits
- **Sparsity exploitation**: Typically 85-95% zeros in quantised DWT coefficients
- **Cross-channel patterns**: Concatenated maps allow Zstd to find patterns across similar significance maps
- **Overall improvement**: 16-18% compression improvement before Zstd compression
## DWT Implementation Details
### Wavelet Filters
- 5/3 Reversible Filter (lossless capable):
* Analysis: Low-pass [1/2, 1, 1/2], High-pass [-1/8, -1/4, 3/4, -1/4, -1/8]
* Synthesis: Low-pass [1/4, 1/2, 1/4], High-pass [-1/16, -1/8, 3/8, -1/8, -1/16]
- 9/7 Irreversible Filter (higher compression):
* Analysis: CDF 9/7 coefficients optimized for image compression
* Provides better energy compaction than 5/3 but lossy reconstruction
### Quantisation Strategy
#### Uniform Quantisation (Versions 3-4)
Traditional approach using same quantisation factor for all DWT subbands within each channel.
#### Perceptual Quantisation (Versions 5-6, Default)
TAV versions 5 and 6 implement Human Visual System (HVS) optimized quantisation with
frequency-aware subband weighting for superior visual quality:
Anisotropic quantisation is applied to both Luma and Chroma channels to preserve horizontal details.
It serves as an upgrade over traditional field interlacing and
chroma subsampling.
This perceptual approach allocates more bits to visually important low-frequency
details while aggressively quantising high-frequency noise, resulting in superior
visual quality at equivalent bitrates.
#### Grain Synthesis
The decoder must synthesise film grain on non-LL subbands at an amplitude of half the quantisation level.
The encoder may synthesise the exact same grain with reversed sign when encoding (not recommended for practical reasons).
The base noise function must be triangular noise in range [-1.0, 1.0].
## Colour Space
TAV supports two colour spaces:
**YCoCg-R (Versions 3, 5):**
- Y: Luma channel (full resolution)
- Co: Orange-Cyan chroma (full resolution)
- Cg: Green-Magenta chroma (full resolution)
**ICtCp (Versions 4, 6):**
- I: Intensity (similar to luma)
- Ct: Chroma tritanopia
- Cp: Chroma protanopia
Perceptual versions (5-6) apply HVS-optimized quantisation weights per channel,
while uniform versions (3-4) use consistent quantisation across all subbands.
The encoder expects linear alpha.
## Compression Features
- Single DWT tiles vs 16x16 DCT blocks in TEV
- Multi-resolution representation enables scalable decoding
- Better frequency localisation than DCT
- Reduced blocking artifacts due to overlapping basis functions
## Audio Support
MP2 frames, raw PCMu8, and TAD formats are supported.
## Subtitle Support
Uses same Simple Subtitle Format (SSF) as TEV for text overlay functionality.
## NTSC Framerate handling
Unlike the TEV format, the TAV encoder emits an extra sync packet every 1000th frame. The decoder can simply play the video without any special treatment.
## Exponential Numeric System
This system maps [0..255] to [1..4096]
Number|Index
------+-----
1|0
2|1
3|2
4|3
5|4
6|5
7|6
8|7
9|8
10|9
11|10
12|11
13|12
14|13
15|14
16|15
17|16
18|17
19|18
20|19
21|20
22|21
23|22
24|23
25|24
26|25
27|26
28|27
29|28
30|29
31|30
32|31
33|32
34|33
35|34
36|35
37|36
38|37
39|38
40|39
41|40
42|41
43|42
44|43
45|44
46|45
47|46
48|47
49|48
50|49
51|50
52|51
53|52
54|53
55|54
56|55
57|56
58|57
59|58
60|59
61|60
62|61
63|62
64|63
66|64
68|65
70|66
72|67
74|68
76|69
78|70
80|71
82|72
84|73
86|74
88|75
90|76
92|77
94|78
96|79
98|80
100|81
102|82
104|83
106|84
108|85
110|86
112|87
114|88
116|89
118|90
120|91
122|92
124|93
126|94
128|95
132|96
136|97
140|98
144|99
148|100
152|101
156|102
160|103
164|104
168|105
172|106
176|107
180|108
184|109
188|110
192|111
196|112
200|113
204|114
208|115
212|116
216|117
220|118
224|119
228|120
232|121
236|122
240|123
244|124
248|125
252|126
256|127
264|128
272|129
280|130
288|131
296|132
304|133
312|134
320|135
328|136
336|137
344|138
352|139
360|140
368|141
376|142
384|143
392|144
400|145
408|146
416|147
424|148
432|149
440|150
448|151
456|152
464|153
472|154
480|155
488|156
496|157
504|158
512|159
528|160
544|161
560|162
576|163
592|164
608|165
624|166
640|167
656|168
672|169
688|170
704|171
720|172
736|173
752|174
768|175
784|176
800|177
816|178
832|179
848|180
864|181
880|182
896|183
912|184
928|185
944|186
960|187
976|188
992|189
1008|190
1024|191
1056|192
1088|193
1120|194
1152|195
1184|196
1216|197
1248|198
1280|199
1312|200
1344|201
1376|202
1408|203
1440|204
1472|205
1504|206
1536|207
1568|208
1600|209
1632|210
1664|211
1696|212
1728|213
1760|214
1792|215
1824|216
1856|217
1888|218
1920|219
1952|220
1984|221
2016|222
2048|223
2112|224
2176|225
2240|226
2304|227
2368|228
2432|229
2496|230
2560|231
2624|232
2688|233
2752|234
2816|235
2880|236
2944|237
3008|238
3072|239
3136|240
3200|241
3264|242
3328|243
3392|244
3456|245
3520|246
3584|247
3648|248
3712|249
3776|250
3840|251
3904|252
3968|253
4032|254
4096|255
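The table follows a regular pattern: indices 0..63 step by 1, and each following group of 32 indices doubles the step (2, 4, 8, 16, 32, 64). A Python sketch that reproduces it, together with the index bias of 1 used by the quantiser override fields; the function names are illustrative.

```python
def exp_value(index: int) -> int:
    """Map an index in [0, 255] to its number in [1, 4096] per the table."""
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    if index < 64:
        return index + 1                     # 1..64 in steps of 1
    segment, pos = divmod(index - 64, 32)    # six segments of 32 entries
    base = 64 << segment                     # 64, 128, ..., 2048
    step = 2 << segment                      # 2, 4, ..., 64
    return base + step * (pos + 1)

def quantiser_override(stored: int):
    """Decode a quantiser override byte: 0 disables, otherwise index = stored - 1."""
    return None if stored == 0 else exp_value(stored - 1)
```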
--------------------------------------------------------------------------------
TSVM Advanced Video - Digital Tape (TAV-DT) Format
Created by CuriousTorvald on 2025-12-01
TAV-DT is an extension to the TAV format intended as a filesystem-independent packetised video stream
with easy syncing: playback can start from an arbitrary position and the decoder can easily sync up to the
start of the next packet
# Video Format
- Dimension: 720x480 for NTSC, 720x576 for PAL
- FPS: arbitrary (defined in packet header)
- Wavelet: 9/7 Spatial, Haar Temporal ("sport" preset always enabled)
- Decomposition levels: 4 spatial, 2 temporal
- Quantiser and encoder quality level: arbitrary (defined in packet header as quality index)
- Extra features:
- Audio is mandatory (TAD codec only)
- Everything else is unsupported
- Video flags: Interlaced/NTSC framerate (defined in packet header)
* interlaced is enabled by default
- Channel layout: Y-Co-Cg
- Entropy coder: EZBC
- Encoder preset: sports preset always enabled
- Tiles: monoblock
- GOP size: always 16 frames
# Packet Structure
uint32 Sync pattern (0xE3537A1F for NTSC Dimension, 0xD193A745 for PAL Dimension)
<packet header start>
uint8 Framerate
uint8 Flags
- bit 0 = interlaced
- bit 1 = is NTSC framerate
- bit 4-7 = quality index (0-5)
* Quality indices follow TSVM encoder's
int16 Reserved (zero-fill)
uint32 Total packet size (sum of TAD packet and TAV packet size)
uint64 Timecode in nanoseconds
uint32 Offset to video packet
uint32 Reserved (zero-fill)
uint32 CRC-32 of above
<packet header end; encoded with rate 1/2 LDPC> // NOTE: sync pattern must not be LDPC-coded
bytes TAD with forward error correction
<TAD header start>
uint16 Sample Count
uint8 Quantiser Bits
uint32 Compressed Size
uint24 Reed-Solomon Block Count
uint32 CRC-32 of above
<TAD chunk header end; encoded with rate 1/2 LDPC>
<Reed-Solomon (255,223) block start>
bytes TAD (EZBC, no Zstd)
bytes Parity for TAD
<Reed-Solomon (255,223) block end>
bytes TAV with forward error correction
uint32 TAV header sync pattern (0xA3F7C91E)
<TAV header start>
uint8 GOP Size (number of frames in this GOP)
uint16 Reserved (zero-fill)
uint32 Compressed Size
uint24 Reed-Solomon Block Count
uint32 CRC-32 of above
<TAV header end; encoded with rate 1/2 LDPC> // NOTE: sync pattern must not be LDPC-coded
<Reed-Solomon (255,223) block start>
bytes TAV (EZBC, no Zstd)
bytes Parity for TAV
<Reed-Solomon (255,223) block end>
Q1. Why do headers have such a low coding rate (n-byte input -> 2n-byte output)?
A1. Headers are crucial for decoding and thus must be protected rigorously
Q2. What to do when the payload is smaller than the RS block capacity?
A2. Fill with zeros. It shouldn't affect Zstd, and the compressed size is already specified, so they complement each other.
When decoding, reserved areas must be filled with zeros before the actual decoding.
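The zero-fill rule in A2 can be illustrated by splitting a payload into 223-byte data blocks, one per Reed-Solomon (255,223) codeword. This Python sketch only does the splitting and padding; the 32 parity bytes per block are not computed, and the names are illustrative.

```python
RS_DATA_BYTES = 223   # data bytes per Reed-Solomon (255,223) block

def split_into_rs_blocks(payload: bytes):
    """Return zero-padded 223-byte data blocks; len() gives the block count."""
    count = -(-len(payload) // RS_DATA_BYTES)           # ceiling division
    padded = payload.ljust(count * RS_DATA_BYTES, b"\x00")
    return [padded[i * RS_DATA_BYTES:(i + 1) * RS_DATA_BYTES]
            for i in range(count)]
```

A 500-byte payload yields three blocks, the last holding 54 payload bytes followed by 169 zeros.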
# How to sync to the stream
1. Find a sync pattern
2. Read remaining 8 bytes -> concatenate sync with what has been read
3. Calculate CRC-32 of concatenated 12 bytes
4. Read 4 bytes (stored CRC)
5. Check calculated CRC against stored CRC
6. If they match, sync to the stream; if not, find a next sync pattern
7. "Offset to video packet" and the actual length of the TAD packet can be used together to recover the video packet when the stream is damaged: in an error-free stream the length of the TAD packet equals "Offset to video packet", and the internal packet order is always audio-then-video
## Soft Sync Recovery
The decoder MAY try to sync to a sync pattern that appears damaged when its contents seem to be intact, using the following strategies.
### Stage 1
On the stream position where the sync pattern is supposed to be:
1. Substitute damaged sync pattern with known sync pattern (videos are not allowed to change NTSC/PAL mode mid-stream, so there's only one known value)
2. Zero-fill the reserved area if this has not already been done
3. Re-calculate CRC. If match, sync. If not, head to the next stage
### Stage 2
1. Further substitute the framerate, flags, and timecode with the last known values (as these values rarely change mid-stream; the timecode must be incremented appropriately, e.g. FPS=16, last known timecode=5.0, packets missed so far=4, then the assumed timecode is 5.0 + 4 + 1 = 10.0)
2. Re-calculate CRC. If match, sync. If not, head to the next stage
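The timecode arithmetic in the Stage 2 example can be written out as below. This Python sketch assumes each packet carries exactly one 16-frame GOP and ignores the NTSC framerate flag (no 1000/1001 scaling is applied); the function and constant names are illustrative.

```python
NS_PER_SECOND = 1_000_000_000
GOP_FRAMES = 16   # TAV-DT GOP size is always 16 frames

def assumed_timecode_ns(last_known_ns: int, packets_missed: int, fps: int) -> int:
    """Estimate the timecode of the current packet after missed packets."""
    gop_duration_ns = GOP_FRAMES * NS_PER_SECOND // fps
    # Skip the missed packets, plus one step from the last known packet.
    return last_known_ns + (packets_missed + 1) * gop_duration_ns
```

With FPS=16 one GOP lasts one second, so a last known timecode of 5.0 s with 4 missed packets gives 10.0 s.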
### Stage 3 (mostly throwaway efforts)
1. Search for 0xA3F7C91E or next sync pattern
2. If 0xA3F7C91E is found, try to decode the subpacket by verifying header with CRC; if next sync pattern is found, sync to that packet.
3. If successful, sync. If not, soft sync recovery has failed; discard the packet
Note: If CRC is unmatched, the packet MUST be discarded, as the header content cannot be trusted if all soft recovery stages have failed
--------------------------------------------------------------------------------
TSVM Advanced Audio (TAD) Format
Created by CuriousTorvald and Claude on 2025-10-23
Updated: 2025-10-30 (fixed non-power-of-2 sample count support)
TAD is a perceptual audio codec for TSVM utilising Discrete Wavelet Transform (DWT)
with CDF 9/7 biorthogonal wavelets, providing efficient compression through M/S stereo
decorrelation, frequency-dependent quantisation, and raw int8 coefficient storage.
Designed as an includable API for integration with TAV video encoder.
When used inside a video codec, only the Zstd-compressed payload is stored; the chunk length
is stored separately and the quality index is shared with that of the video.
# Suggested File Structure
\x1F T S V M T A D
[HEADER]
[CHUNK 0]
[CHUNK 1]
[CHUNK 2]
...
## Header (16 bytes)
uint8 Magic[8]: "\x1FTSVMTAD"
uint8 Version: 1
uint8 Quality Level: 0-5 (0=lowest quality/smallest, 5=highest quality/largest)
uint8 Flags:
- bit 0: Zstd compression enabled (1=compressed, 0=uncompressed)
- bits 1-7: Reserved (must be 0)
uint32 Sample Rate: audio sample rate in Hz (always 32000 for TSVM)
uint8 Channels: number of audio channels (always 2 for stereo)
uint8 Reserved[2]: fill with zeros
## Audio Properties
- **Sample Rate**: 32000 Hz (TSVM audio hardware native format)
- **Channels**: 2 (stereo)
- **Input Format**: PCM32fLE (32-bit float little-endian PCM)
- **Preprocessing**: 16 Hz highpass filter applied during extraction
- **Internal Representation**: Float32 throughout encoding, PCM8 conversion only at decoder
- **Chunk Size**: Variable (1024-32768+ samples per channel, any size ≥1024 supported)
- Default: 32768 samples (1.024 seconds at 32 kHz) for standalone files
- TAV integration: Uses exact GOP sample count (e.g., 32016 for 1 second at 32 kHz)
- Minimum: 1024 samples (32 ms at 32 kHz)
- DWT levels: Fixed at 9 levels for all chunk sizes
- **Target Compression**: 2:1 against PCMu8 baseline
- **Wavelet**: CDF 9/7 biorthogonal
## Chunk Structure
Each chunk encodes a variable number of stereo samples (minimum 1024, any size supported).
Default is 32768 samples (65536 total samples, 1.024 seconds) for standalone files.
TAV integration uses exact GOP sample counts (e.g., 32016 samples for 1 second at 32 kHz).
uint16 Sample Count: number of samples per channel (min 1024, any size ≥1024)
uint8 Max quantisation index: this number * 2 + 1 is the total steps of quantisation
uint32 Chunk Payload Size: size of following payload in bytes
* Chunk Payload: encoded M/S stereo data (Zstd compressed if flag set)
### Chunk Payload Structure (before Zstd compression)
* Mid Channel EZBC Data (embedded zero block coded bitstream)
* Side Channel EZBC Data (embedded zero block coded bitstream)
Each EZBC channel structure:
uint8 MSB Bitplane: highest bitplane with significant coefficient
uint16 Coefficient Count: number of coefficients in this channel
* Binary Tree EZBC Bitstream: significance map + refinement bits
## Encoding Pipeline
### Step 1: Pre-emphasis Filter
Input stereo PCM32fLE undergoes first-order FIR pre-emphasis filtering (α=0.5):
H(z) = 1 - α·z⁻¹
This shifts quantisation noise toward lower frequencies where it's more maskable by
the psychoacoustic model. The filter has persistent state across chunks to prevent
discontinuities at chunk boundaries.
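In the time domain, H(z) = 1 - α·z⁻¹ is y[n] = x[n] - α·x[n-1], and its inverse is x[n] = y[n] + α·x[n-1]. The Python sketch below shows both for a single channel with state carried across chunks; the class names are illustrative, and a stereo stream would presumably keep one instance per channel.

```python
class PreEmphasis:
    """y[n] = x[n] - alpha * x[n-1], with state kept across chunks."""
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.prev_in = 0.0

    def process(self, chunk):
        out = []
        for x in chunk:
            out.append(x - self.alpha * self.prev_in)
            self.prev_in = x
        return out

class DeEmphasis:
    """y[n] = x[n] + alpha * y[n-1], the inverse of PreEmphasis."""
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.prev_out = 0.0

    def process(self, chunk):
        out = []
        for x in chunk:
            self.prev_out = x + self.alpha * self.prev_out
            out.append(self.prev_out)
        return out
```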
### Step 2: Dynamic Range Compression (Gamma Compression)
Pre-emphasised audio undergoes gamma compression for perceptual uniformity:
encode(x) = sign(x) * |x|^γ where γ=0.5
This compresses dynamic range before quantisation, improving perceptual quality.
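A small Python sketch of the gamma curve and its inverse for a single sample; `gamma_encode` and `gamma_decode` are illustrative names.

```python
import math

GAMMA = 0.5

def gamma_encode(x: float) -> float:
    """sign(x) * |x|^gamma"""
    return math.copysign(abs(x) ** GAMMA, x)

def gamma_decode(y: float) -> float:
    """sign(y) * |y|^(1/gamma), the inverse of gamma_encode"""
    return math.copysign(abs(y) ** (1.0 / GAMMA), y)
```

With γ=0.5 a sample of 0.25 is stored as 0.5, so quiet samples are lifted towards full scale before quantisation.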
### Step 3: M/S Stereo Decorrelation
Mid-Side transformation exploits stereo correlation:
Mid = (Left + Right) / 2
Side = (Left - Right) / 2
This typically concentrates energy in the Mid channel while the Side channel
contains mostly small values, improving compression efficiency.
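The transform and its inverse for one sample pair, as a Python sketch with illustrative function names:

```python
def ms_encode(left: float, right: float):
    """Return (mid, side) for one stereo sample pair."""
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid: float, side: float):
    """Return (left, right); exact inverse of ms_encode."""
    return mid + side, mid - side
```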
### Step 4: 9-Level CDF 9/7 DWT
Each channel (Mid and Side) undergoes CDF 9/7 biorthogonal wavelet decomposition. The codec uses a fixed 9 decomposition levels for all chunk sizes:
DWT Levels = 9 (fixed)
For 32768-sample chunks:
- After 9 levels: 64 LL coefficients
- Frequency subbands: LL + 9 H bands (L9 to L1)
For 32016-sample chunks (TAV 1-second GOP):
- After 9 levels: 63 LL coefficients
- Supports non-power-of-2 sizes through proper length tracking (fixed 2025-10-30)
Sideband boundaries are calculated dynamically:
first_band_size = chunk_size >> dwt_levels
sideband[0] = 0
sideband[1] = first_band_size
sideband[i+1] = sideband[i] + (first_band_size << (i-1))
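The recurrence above, transcribed into Python; `sideband_boundaries` is an illustrative name, and the result is checked here only for the 32768-sample case.

```python
def sideband_boundaries(chunk_size: int, dwt_levels: int = 9):
    """Return the start offsets of each subband plus the final end offset."""
    first_band_size = chunk_size >> dwt_levels
    sideband = [0, first_band_size]
    for i in range(1, dwt_levels + 1):
        sideband.append(sideband[i] + (first_band_size << (i - 1)))
    return sideband
```

For a 32768-sample chunk this gives 0, 64, 128, 256, ..., 32768: eleven boundaries delimiting the LL band and the nine H bands.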
CDF 9/7 lifting coefficients:
α = -1.586134342
β = -0.052980118
γ = 0.882911076
δ = 0.443506852
K = 1.230174105
### Step 5: Frequency-Dependent Quantisation with Lambda Companding
DWT coefficients are quantized using:
1. **Lambda companding**: Maps normalised coefficients through Laplacian CDF with λ=6.0
2. **Perceptually-tuned weights**: Channel-specific (Mid/Side) frequency-dependent scaling
3. **Final quantisation**: base_weight[channel][subband] * quality_scale
The lambda companding provides perceptually uniform quantisation, allocating more bits
to perceptually important coefficient magnitudes.
Channel-specific base quantisation weights:
Mid (0): [4.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.0, 1.3, 2.0]
Side (1): [6.0, 5.0, 2.6, 2.4, 1.8, 1.3, 1.0, 1.0, 1.6, 3.2]
Output: Quantized int8 coefficients in range [-max_index, +max_index]
### Step 6: EZBC Encoding (Embedded Zero Block Coding)
Quantized int8 coefficients are compressed using binary tree EZBC, a 1D variant of
the embedded zero-block coding.
**EZBC Algorithm**:
1. Find MSB bitplane (highest bit position with significant coefficient)
2. Initialise root block covering all coefficients as insignificant
3. For each bitplane from MSB to LSB:
- **Insignificant Pass**: Test each insignificant block for significance
- If still zero at this bitplane: emit 0 bit, keep in insignificant queue
- If becomes significant: emit 1 bit, recursively subdivide using binary tree
- **Refinement Pass**: For already-significant coefficients, emit next bit
4. Binary tree subdivision continues until blocks of size 1 (single coefficients)
5. When coefficient becomes significant: emit sign bit and reconstruct value
**EZBC Output Structure** (per channel):
uint8 MSB Bitplane (8 bits)
uint16 Coefficient Count (16 bits)
* Bitstream: [significance_bits][sign_bits][refinement_bits]
**Compression Benefits**:
- Exploits coefficient sparsity through significance testing
- Progressive refinement enables quality scalability
- Binary tree exploits spatial clustering of significant coefficients
- Typical sparsity: 86.9% zeros (Mid), 97.8% zeros (Side)
### Step 7: Concatenation and Zstd Compression
The Mid and Side EZBC bitstreams are concatenated:
Payload = [Mid_EZBC_data][Side_EZBC_data]
Then compressed using Zstd level 7 for additional compression without significant
CPU overhead. Zstd exploits redundancy in the concatenated bitstreams.
## Decoding Pipeline
### Step 1: Chunk Extraction and Decompression
Read chunk header (sample_count, max_index, payload_size).
If compressed (default), decompress payload using Zstd.
### Step 2: EZBC Decoding
Decode Mid and Side channels from concatenated EZBC bitstreams using binary tree
embedded zero block decoder:
For each channel:
1. Read EZBC header: MSB bitplane (8 bits), coefficient count (16 bits)
2. Initialise root block as insignificant, track coefficient states
3. Process bitplanes from MSB to LSB:
- **Insignificant Pass**: Read significance bits, recursively decode significant blocks
- **Refinement Pass**: Read refinement bits for already-significant coefficients
4. Reconstruct quantized int8 coefficients from bitplane representation
Output: Quantized int8 coefficients for Mid and Side channels
### Step 3: Dequantisation with Lambda Decompanding
Convert quantized int8 values back to float coefficients using:
1. Lambda decompanding (inverse of Laplacian CDF compression)
2. Multiply by frequency-dependent quantisation steps
3. [Optional] Apply coefficient-domain dithering (TPDF, ~-60 dBFS)
### Step 4: 9-Level Inverse CDF 9/7 DWT
Reconstruct Float32 audio from DWT coefficients using inverse CDF 9/7 transform.
**Critical Implementation (Fixed 2025-10-30)**:
The multi-level inverse DWT must use the EXACT sequence of lengths from forward
transform, in reverse order. Using simple doubling (length *= 2) is INCORRECT
for non-power-of-2 sizes.
Correct approach:
1. Pre-calculate all forward transform lengths:
lengths[0] = chunk_size
lengths[i] = (lengths[i-1] + 1) / 2 for i=1..9
2. Apply inverse DWT in reverse order:
for level from 8 down to 0:
apply inverse_dwt(data, lengths[level])
This ensures correct reconstruction for all chunk sizes including non-power-of-2
values (e.g., 32016 samples for TAV 1-second GOPs).
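The length bookkeeping can be sketched in Python as follows; the inverse transform itself is omitted and the function names are illustrative.

```python
def forward_lengths(chunk_size: int, levels: int = 9):
    """lengths[0] = chunk_size, lengths[i] = (lengths[i-1] + 1) // 2."""
    lengths = [chunk_size]
    for _ in range(levels):
        lengths.append((lengths[-1] + 1) // 2)
    return lengths

def inverse_pass_lengths(chunk_size: int, levels: int = 9):
    """Lengths handed to the inverse DWT, from the deepest level outwards."""
    lengths = forward_lengths(chunk_size, levels)
    return [lengths[level] for level in range(levels - 1, -1, -1)]
```

For 32016 samples the inverse passes use 126, 251, 501, ... rather than the 126, 252, 504, ... that simple doubling from 63 would produce.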
### Step 5: M/S to L/R Conversion
Convert Mid/Side back to Left/Right stereo:
Left = Mid + Side
Right = Mid - Side
### Step 6: Gamma Expansion
Expand dynamic range (inverse of encoder's gamma compression):
decode(y) = sign(y) * |y|^(1/γ) where γ=0.5, so 1/γ=2.0
### Step 7: De-emphasis Filter
Apply de-emphasis filter to reverse the pre-emphasis (α=0.5):
H(z) = 1 / (1 - α·z⁻¹)
This is a first-order IIR filter with persistent state across chunks to prevent
discontinuities at chunk boundaries. The de-emphasis must be applied AFTER gamma
expansion but BEFORE PCM8 conversion to correctly reconstruct the original audio.
### Step 8: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
Convert Float32 samples to unsigned PCM8 (PCMu8) using second-order error-diffusion
dithering with reduced amplitude (0.2× TPDF) to coordinate with coefficient-domain
dithering.
## Compression Performance
- **Target Ratio**: 2:1 against PCMu8
- **Achieved Ratio**: 2.51:1 against PCMu8 at quality level 3
- **Quality**: Perceptually transparent at Q3+, preserves full 0-16 kHz bandwidth
- **Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
## Integration with TAV Encoder
TAD is designed as an includable API for TAV video encoder integration.
The encoder can be invoked programmatically to compress audio tracks:
#include "tad_encoder.h"
size_t encoded_size = tad_encode_from_file(
input_audio_path,
output_tad_path,
quality_level,
use_zstd,
verbose
);
This allows TAV video files to embed TAD-compressed audio using packet type 0x24.
## Audio Extraction Command
TAD encoder uses two-pass FFmpeg extraction for optimal quality:
# Pass 1: Extract at original sample rate
ffmpeg -i input.mp4 -f f32le -ac 2 temp.pcm
# Pass 2: High-quality resample with SoXR and highpass filter
ffmpeg -f f32le -ar {original_rate} -ac 2 -i temp.pcm \
-ar 32000 -af "aresample=resampler=soxr:precision=28:cutoff=0.99,highpass=f=16" \
output.pcm
This ensures resampling happens after extraction with optimal quality parameters.
--------------------------------------------------------------------------------
**TSVM Universal Cue format**
Created by CuriousTorvald on 2025-09-22
A universal, simple cue format designed to work both as a playlist to cue up external files and as a lookup table for internal bytes.
# File Structure
\x1F T S V M U C F
[HEADER]
[CUE ELEMENT 0]
[CUE ELEMENT 1]
[CUE ELEMENT 2]
...
## Header (16 bytes)
uint8 Magic[8]: "\x1FTSVMUCF"
uint8 Version: 1
uint16 Number of cue elements
uint32 (Optional) Size of the cue file, useful for allocating fixed length for future expansion; 0 when not used
uint8 Reserved
## Cue Element
uint8 Addressing Mode (low nybble) and Role Flags (high nybble)
- 0x01: External
- 0x02: Internal
- 0x10: Intended for machine interaction (GOP indices, frame indices, etc.)
- 0x20: Intended for human interaction (playlist, chapter markers, etc.)
- 0x30: Intended for both machine and human interaction
Role flags must be unset to assign no roles
uint16 String Length for name
* Name of the element in UTF-8
<if external addressing mode>
uint16 String Length for relative path
* Relative path
<fi>
<if internal addressing mode>
uint48 Offset to the file
<fi>
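A hedged Python sketch of writing and reading a UCF file per the layout above. It assumes little-endian fields and UTF-8 relative paths, neither of which is stated in this section, and it leaves the optional size field at 0. The names `build_ucf` and `parse_ucf` are illustrative.

```python
import struct

MAGIC = b"\x1FTSVMUCF"

def build_ucf(elements):
    """elements: (mode_and_flags, name, target) tuples; target is a relative
    path (str) for external addressing or a byte offset (int) for internal."""
    body = bytearray()
    for flags, name, target in elements:
        name_b = name.encode("utf-8")
        body += struct.pack("<BH", flags, len(name_b)) + name_b
        if flags & 0x0F == 0x01:                 # external: length + path
            path_b = target.encode("utf-8")
            body += struct.pack("<H", len(path_b)) + path_b
        elif flags & 0x0F == 0x02:               # internal: uint48 offset
            body += target.to_bytes(6, "little")
        else:
            raise ValueError("unknown addressing mode")
    # version 1, element count, size field unused (0), reserved byte
    header = MAGIC + struct.pack("<BHIB", 1, len(elements), 0, 0)
    return header + bytes(body)

def parse_ucf(buf: bytes):
    if buf[:8] != MAGIC:
        raise ValueError("bad magic")
    _version, count, _size, _reserved = struct.unpack_from("<BHIB", buf, 8)
    pos, out = 16, []
    for _ in range(count):
        flags, name_len = struct.unpack_from("<BH", buf, pos)
        pos += 3
        name = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        if flags & 0x0F == 0x01:
            (path_len,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            target = buf[pos:pos + path_len].decode("utf-8")
            pos += path_len
        elif flags & 0x0F == 0x02:
            target = int.from_bytes(buf[pos:pos + 6], "little")
            pos += 6
        else:
            raise ValueError("unknown addressing mode")
        out.append((flags, name, target))
    return out
```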
--------------------------------------------------------------------------------
**Audio Adapter**
Endianness: little
TSVM Audio Adapter consists of 4 playheads; each playhead is capable of playing one PCM or Tracker track.
Synchronisation between playheads is not guaranteed. Do not play music in multiple tracks.
Memory Space
0..720895 RW: Sample bin (704k)
720896..786431 RW: Instrument bin (256 instruments, 256 bytes each; instrument 0 does nothing; 64k)
786432..851967 RW: Play data 1 (currently exposed bank; 64k)
851968..917503 RW: Play data 2 (currently exposed bank; 64k)
917504..983039 RW: TAD Input Buffer (64k)
983040..1048575 RW: TAD Decode Output (64k)
(Layout note 2026-05-06: sample bin shrunk by 16k and instrument bin widened
by the same amount so all downstream dispatch ranges keep their existing
anchors at 786432. Total memory space stays at exactly 1 MiB.)
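As a quick sanity check of the memory map above, the regions can be written out as a table and addressed by instrument index. This is a minimal sketch; `REGIONS`, `instrument_address` and `region_of` are hypothetical names, not part of the adapter.

```python
# (name, start address, size in bytes), copied from the Memory Space table
REGIONS = [
    ("sample bin",        0,      720896),
    ("instrument bin",    720896, 65536),
    ("play data 1",       786432, 65536),
    ("play data 2",       851968, 65536),
    ("TAD input buffer",  917504, 65536),
    ("TAD decode output", 983040, 65536),
]
INSTRUMENT_SIZE = 256   # bytes per instrument record

def instrument_address(index):
    """Absolute address of instrument record `index` (0..255)."""
    if not 0 <= index <= 255:
        raise ValueError("instrument index out of range")
    return 720896 + index * INSTRUMENT_SIZE

def region_of(address):
    """Name of the region that contains `address`."""
    for name, start, size in REGIONS:
        if start <= address < start + size:
            return name
    raise ValueError("address outside the 1 MiB memory space")
```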
Sample bin: raw sample data with no framing. You must keep track of the starting point of each sample yourself.
Instrument bin: Registry for 256 instruments, formatted as:
The instrument record is 256 bytes wide. Envelopes are described by FOUR
independent regions per envelope (vol / pan / pitch-filter):
1. The 25 envelope nodes (offsets 21 / 71 / 121).
2. The LOOP word (offsets 15 / 17 / 19) — defines an always-active
wrap region. When enabled (b=1) and the envelope position reaches
loop_end, it wraps back to loop_start. Active regardless of key
state. This is the IT/FT2 envelope loop.
3. The SUSTAIN word (offsets 189 / 191 / 193) — defines a wrap
region that is ONLY active while the key is on. When the key
goes off the sustain "releases" and the envelope position is
free to walk past sus_end. Concretely:
- FT2-style "sustain point": store sus_start == sus_end (single
index). Engine wraps that index → itself, so the envelope
holds at the point until key-off.
- IT-style "sustain loop": store sus_start <= sus_end. Engine
wraps sus_end → sus_start while key is on, so the envelope
loops within the sustain range until key-off.
4. (none — there is no separate "release loop"; once sustain releases
the envelope walks forward and is captured by the LOOP region if
the LOOP region exists and the position enters it.)
Priority during playback follows schismtracker player/sndmix.c:480-499:
if SUSTAIN.b == 1 and !key_off : wrap (sus_start, sus_end)
elif LOOP.b == 1 : wrap (loop_start, loop_end)
else : hold at last node
This means SUSTAIN takes precedence over LOOP while the key is on; once
the key is released, LOOP becomes the active wrap region. Setting both
to b=0 disables envelope wrapping entirely (envelope plays once and holds
at its last node).
The b flag is the SOLE enable bit for each region; the historical 't'
(sustain breaks on key-off) and 'u' (sustain/loop enable) flags are NOT
present in this encoding — sustain vs loop is now a structural
distinction (different word at a different offset), not a flag bit.
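The priority rule above can be sketched at node granularity as follows. This is a simplification under stated assumptions: the real engine advances in time between nodes, whereas this hypothetical `next_node` helper only steps from one node index to the next, and it treats "reaches the end index" as an exact match.

```python
def next_node(pos, last, key_off, sustain=None, loop=None):
    """Return the node index that follows `pos`.

    sustain / loop are (start, end) tuples when the corresponding b bit
    is 1, or None when that wrap region is disabled. `last` is the index
    of the final envelope node.
    """
    if sustain is not None and not key_off:
        # SUSTAIN wins while the key is on; LOOP is not consulted at all.
        start, end = sustain
        if pos == end:
            return start
    elif loop is not None:
        # LOOP is active once sustain is disabled or has been released.
        start, end = loop
        if pos == end:
            return start
    # No wrap applies: walk forward and hold at the last node.
    return min(pos + 1, last)
```

With an FT2-style sustain point, `sustain=(2, 2)` makes node 2 wrap onto itself until key-off, after which the position walks on to node 3.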
Envelope PRESENCE — distinct from LOOP/SUSTAIN enable — is signalled by
the `P` bit at LOOP-word bit 13 (the high byte's bit 5; offsets 16/18/20
bit 5). Added 2026-05-06 to disambiguate two cases that the wrap-enable
bits cannot tell apart on their own:
P=0: the source had no envelope of this kind. Engine ignores the
node array entirely and the mixer skips envelope-driven output
for this voice (pan reads from channelPan only, cutoff/pitch
reads from sample defaults only). The 25 node slots may still
be left as default-fill garbage; nothing reads them.
P=1: envelope is defined. Engine evaluates the nodes every tick.
Wrap behaviour is independently controlled by LOOP.b and
SUSTAIN.b — when both are 0 the envelope walks once forward
and holds at its terminator (the IT idiom for envelope-driven
decay tails / shaped attacks).
The P bit was introduced to fix a gating ambiguity for pan and pitch/
filter envelopes: the engine could not distinguish "no envelope at all"
(treat as absent) from "envelope present but neither LOOP nor SUSTAIN
wrap is enabled" (evaluate and apply, just don't wrap). Volume envelope
evaluation has always been unconditional in the engine (a default
single-point envelope at value 63 is harmlessly held at unity), so
P_vol is currently informational only — converters should still set it
when the source defines a volume envelope, for consistency and to
support future per-voice gating.
P is the SOLE presence signal: converters MUST set P=1 whenever they
emit envelope nodes, regardless of whether the source enables LOOP or
SUSTAIN. Pre-2026-05-06 .taud files predate the P bit and will not have
their pan / pf envelopes evaluated by the current engine — re-convert
from source.
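The LOOP-word bit layout documented at offsets 15 / 17 / 19 (`0b 00P_sssss_?cb_eeeee`, little-endian) can be packed and unpacked as sketched below. The function names and the dictionary keys are hypothetical; bit 7 is exposed as `meta` because it means `p` for pan, `m` for pitch/filter, and is reserved for volume.

```python
def pack_loop_word(start, end, enable, present, carry=False, meta=False):
    """Build the 16-bit LOOP word from its fields."""
    return ((end & 0x1F)
            | (int(enable) << 5)
            | (int(carry) << 6)
            | (int(meta) << 7)
            | ((start & 0x1F) << 8)
            | (int(present) << 13))

def unpack_loop_word(record, offset):
    """Read the LOOP word stored at `offset` of a 256-byte instrument record."""
    word = record[offset] | (record[offset + 1] << 8)    # little-endian
    return {
        "end":     word & 0x1F,               # bits 4..0
        "enable":  bool(word & (1 << 5)),     # b
        "carry":   bool(word & (1 << 6)),     # c
        "meta":    bool(word & (1 << 7)),     # p / m, reserved for volume
        "start":   (word >> 8) & 0x1F,        # bits 12..8
        "present": bool(word & (1 << 13)),    # P
    }
```

A converter that emits nodes for an enabled-but-unwrapped pan envelope would write `pack_loop_word(0, 0, enable=False, present=True)` at offset 17, which sets only bit 5 of the high byte.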
0 Uint32 Sample Pointer
4 Uint16 Sample length
6 Uint16 Sampling rate at C4 (note number 0x5000)
8 Uint16 Play Start (usually 0 but not always)
10 Uint16 Loop Start (can be smaller than Play Start)
12 Uint16 Loop End
14 Bit8 Sample Flags
0b 0000 0spp
pp: loop mode. 0-no loop, 1-loop, 2-backandforth, 3-oneshot (ignores note length unless overridden by other notes)
s: loop is sustain (key-off escapes the loop)
- IT: look for sample's SusLoop flag
15 Bit16 Volume envelope LOOP word
* Always-active wrap region for the volume envelope. See SUSTAIN word at offset 189 for the key-on-only wrap.
* IMPORTANT: the `b` bit gates only the LOOP wrap behaviour. The volume
envelope itself is always evaluated whenever the per-voice volume-envelope
toggle is on (default true on note-on; switched by effect S $7x / S $8x).
This matches IT/Schism (player/sndmix.c:470-502): CHN_VOLENV is independent
of ENV_VOLLOOP / ENV_VOLSUSTAIN. An envelope with no LOOP and no SUSTAIN
(both `b` bits = 0) walks once from start to its terminator and holds —
which is the IT idiom for envelope-driven decay tails.
* The cut rule: when the volume envelope walks past the last real node in
fall-through (no active sustain or loop wrap) AND that node's value is 0,
the engine deactivates the voice (player/sndmix.c:493-498). Without this,
instruments with stored fadeout=0 + envelope ending at 0 would silently
hold their voices forever.
0b 00P_sssss_0cb_eeeee
s (bits 12..8) : loop start index (0..24)
e (bits 4..0) : loop end index (0..24)
b (bit 5) : enable the LOOP wrap (0 = envelope walks once to its
terminator and holds; non-zero loops between s and e)
c (bit 6) : envelope carry (cross-trigger envelope position carry)
P (bit 13) : envelope present in source (informational for vol —
engine evaluates vol env unconditionally; converters
should set P=1 when emitting nodes for consistency
with pan/pf envelopes, see file-header preamble)
(bits 7, 14..15 reserved — set to 0)
17 Bit16 Panning envelope LOOP word
* Always-active wrap region for the pan envelope.
0b 00P_sssss_pcb_eeeee
s (bits 12..8) : loop start index
e (bits 4..0) : loop end index
b (bit 5) : enable the LOOP
c (bit 6) : envelope carry
p (bit 7) : use default pan (see offset 177 "Default pan value" below).
Independent of LOOP enable; the engine reads this bit
from the LOOP word as the canonical home for envelope-
level meta flags.
P (bit 13) : envelope present in source. Gates whether the mixer
applies envelope-driven pan at all. P=0 ⇒ mixer uses
channelPan only and the node array is ignored. P=1 ⇒
evaluate every tick, even when both LOOP.b and SUSTAIN.b
are 0 (envelope walks once and holds — IT pan-env
flag=0x01 idiom).
(bits 14..15 reserved)
19 Bit16 Pitch/Filter envelope LOOP word
* Always-active wrap region for the pitch/filter envelope.
0b 00P_sssss_mcb_eeeee
s (bits 12..8) : loop start index
e (bits 4..0) : loop end index
b (bit 5) : enable the LOOP
c (bit 6) : envelope carry
m (bit 7) : mode — 0 = pitch envelope, 1 = filter envelope
P (bit 13) : envelope present in source. Same semantics as the
pan envelope's P bit: gates whether the mixer applies
envelope-driven pitch / cutoff at all. P=0 ⇒ no
envelope contribution (sample plays at its own pitch /
default cutoff). P=1 ⇒ evaluate every tick regardless
of LOOP.b / SUSTAIN.b.
(bits 14..15 reserved)
21 Bit16x25 Volume envelopes
Byte 1: Volume (00..3F)
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat, biased; range 0..15.75 s, smallest non-zero step 1/256 s ≈ 3.91 ms — chosen so single tracker ticks resolve at every supported BPM). 0 = hold at this point indefinitely.
71 Bit16x25 Panning envelopes
Byte 1: Pan (00..FF)
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat, biased; range 0..15.75 s, smallest non-zero step 1/256 s ≈ 3.91 ms — chosen so single tracker ticks resolve at every supported BPM). 0 = hold at this point indefinitely.
121 Bit16x25 Pitch/Filter envelopes
Byte 1: Value (00..FF)
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat, biased; range 0..15.75 s, smallest non-zero step 1/256 s ≈ 3.91 ms — chosen so single tracker ticks resolve at every supported BPM). 0 = hold at this point indefinitely.
171 Uint8 Instrument Global Volume (0..255)
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
- ImpulseTracker also has samplewise default volume (0..64) and samplewise global volume (0..64), and they must be taken into account because Taud has no samplewise config, following the ImpulseTracker spec
* FastTracker2 has range of 0..64; multiply by (255/64) then round to int
172 Uint8 Volume Fadeout low bits
173 Bit8 Volume Fadeout high bits
0b 0000 ffff
f: Volume Fadeout high bits (low nibble of byte 173; high nibble reserved, must be zero)
* Combined 12-bit unsigned value (range 0..4095). The engine maintains
a per-voice fadeoutVolume ∈ [0, 1] initialised to 1.0 on note-on, and
while the voice is in key-off or NNA Note-Fade state applies once per
song tick:
fadeoutVolume -= storedFadeout / 1024.0
clamp fadeoutVolume to [0, 1]
if fadeoutVolume == 0: voice deactivates
The voice's amplitude is multiplied by fadeoutVolume each tick.
* Stored value semantics (no separate "use fadeout" flag — like IT and
FT2 file formats, "no fade" and "instant cut" are both encoded as
extreme values of this same field):
- 0 : no fade. fadeoutVolume never moves; the voice plays
at envelope-driven volume indefinitely. Termination
must come from the volume envelope reaching a final
0-valued node, the sample ending, or a note-cut.
- 1..1023 : graduated fade. Completes in (1024 / storedFadeout)
ticks. e.g. 1 → 1024 ticks; 32 → 32 ticks.
- 1024 : exact 1-tick cut. fadeoutVolume goes 1.0 → 0.0 in
one tick (the canonical "kill on key-off" value).
- 1025..4095 : also a 1-tick cut (clamped at 0). The 4× headroom
over 1024 lets converters carry out-of-spec source
values without saturating prematurely.
* Tick-rate worked example at default 50 Hz (BPM 125, speed 6):
- storedFadeout = 1 → fade ≈ 20.5 s
- storedFadeout = 32 → fade ≈ 640 ms
- storedFadeout = 1024 → ~20 ms (one tick)
* Source-format mapping (converters scale source units → Taud field):
- IT: 16-bit field at IT instrument record offset 0x14, range
0..1024 per ITTECH (some loaders accept up to 2048). Schism's
per-tick decrement is stored / 1024 of unit volume — identical
to Taud's unit. Pass-through with clamp:
taud_fadeout = min(it_fadeout & 0xFFFF, 0x0FFF)
- FT2/XM: 16-bit field. Spec range 0..0xFFF; MilkyTracker writes
up to 32767 to encode the "cut" UI slider position
(SectionInstruments.cpp:499-500). FT2's per-tick decrement is
stored / 32768 of unit volume — to match Taud's stored / 1024
rate, divide source by 32 (round-to-nearest):
taud_fadeout = min((xm_fadeout + 16) // 32, 0x0FFF)
XM stored 1..15 round to Taud 0 (at stored / 32768 per tick the originals took roughly 44 s to 11 min to fade at 50 Hz, effectively "no fade" anyway). Stored 32 → Taud 1 (~20 s).
Stored 32767 (Milky cut sentinel) → Taud 1024 (1-tick cut).
- MOD/S3M/MON: no instrument-level fadeout in source; converters
write 0 (notes retire on sample-end or pattern note-cut).
174 Uint8 Volume swing (0..255 full range)
175 Uint8 Vibrato speed
* ImpulseTracker has samplewise vibrato speed (0..64), which must be taken into account because Taud has no samplewise config
* FastTracker2 has instrumentwise config (0..255)
* The spec follows FastTracker2, and conversion must be performed when importing from FastTracker2
176 Uint8 Vibrato sweep
* FastTracker2 instrument config
177 Uint8 Default pan value (0..255 full range, see offset 17 for the enable flag)
* ImpulseTracker has samplewise default pan and instrumentwise default pan, and they must be taken into account because Taud has no samplewise config
178 Uint16 Pitch-pan centre (4096-TET note value)
180 Sint8 Pitch-pan separation (-128..127 full range)
181 Uint8 Pan swing (0..255 full range)
182 Uint8 Default cutoff (0..254 full range; 255 = off, which is -1 in IT. The effect range equals that of ImpulseTracker: 127 in IT equals 254 in Taud)
183 Uint8 Default resonance (0..254 full range; 255 = off, which is -1 in IT. The effect range equals that of ImpulseTracker: 127 in IT equals 254 in Taud)
184 Uint16 Sample detune (in 4096-TET units) (the FT2 finetune scale needs to be rescaled accordingly)
186 Bit8 Instrument Flag
0b 000 www nn
n: New note action. 00: note off, 01: note cut, 10: continue, 11: note fade (arranged differently to IT)
www: Vibrato waveform (IT: sample config, FT2: instrument config). 000: sine, 001: ramp-down saw, 010: square, 011: random, 100: ramp-up saw (FT2 only)
187 Uint8 Vibrato Depth (0..255 full range)
* ImpulseTracker has range of 0..32 ON THE SAMPLE SETTINGS; multiply by (255/32) then round to int
* FastTracker2 has range of 0..16; multiply by (255/16) then round to int
188 Uint8 Vibrato Rate (0..255 full range)
* ImpulseTracker sample config. The spec follows ImpulseTracker precisely
189 Bit16 Volume envelope SUSTAIN word
* Wrap region active ONLY while key is on. Released on key-off.
* FT2 single-point sustain: store sus_start == sus_end (the engine
wraps that index → itself, so the envelope holds there).
* IT sustain loop: store sus_start <= sus_end (engine wraps the range
while key is on; same shape as the LOOP word).
0b 000_sssss_00b_eeeee
s (bits 12..8) : sustain start index (0..24)
e (bits 4..0) : sustain end index (0..24)
b (bit 5) : enable the SUSTAIN (0 = no sustain wrap)
(bits 6..7, 13..15 reserved — the 'c' carry bit lives in the LOOP word)
191 Bit16 Panning envelope SUSTAIN word
* Same encoding as offset 189, applied to the pan envelope.
0b 000_sssss_00b_eeeee
193 Bit16 Pitch/Filter envelope SUSTAIN word
* Same encoding as offset 189, applied to the pitch/filter envelope.
0b 000_sssss_00b_eeeee
195 Bit8 Duplicate Check / Action (IT-only; FT2 leaves this 0)
0b 0000 dcdt
dt (bits 0..1) : Duplicate Check Type. 0=off, 1=note, 2=sample, 3=instrument.
dc (bits 2..3) : Duplicate Check Action. 0=note cut, 1=note off, 2=note fade.
* Relocated from offset 189 (which is now the volume sustain word) on 2026-05-06.
* Semantics (matches IT/Schism player/effects.c:1664-1764 csf_check_nna):
- Fires on every fresh foreground note trigger on a channel, BEFORE the
NNA-spawn step that would ghost the existing voice. Does NOT fire on
tone portamento, on note-off (0x0000), on note-cut (0xFFFE), or on
empty cells.
- The DCT/DCA values consulted belong to the EXISTING voice's instrument
(i.e. the OLD note's instrument, not the incoming note's). Different
instruments on the same channel can therefore have asymmetric duplicate
behaviour — IT-correct.
- Targets: the foreground voice on the same channel AND every background
(NNA-ghost) voice spawned earlier from that channel. Each is checked
independently against the new (instrument, note) pair.
- DCT match conditions:
off (0) : never matches; DCA never fires
note (1) : same noteVal AND same instrumentId
sample (2) : same instrumentId AND same canonical sample (matched
by samplePtr + sampleLength)
instrument (3) : same instrumentId
- DCA actions on a matching voice:
note cut (0) : fadeoutVolume := 0; voice deactivates this tick
note off (1) : keyOff := true (sustain releases; volume envelope
continues past the sustain point; if the instrument
carries a non-zero fadeout, the fadeout decay starts
per byte 172/173 semantics)
note fade (2) : noteFading := true (begin fadeout immediately, no
sustain release — sample/envelope loops continue)
- Order with NNA: applyDuplicateCheck → maybeSpawnBackgroundForNNA →
triggerNote. So when DCA flags the foreground voice, the NNA-ghost it
spawns inherits that DCA-modified state (e.g. noteFading carries over).
- The new note then triggers normally on the foreground channel.
196..255 Reserved (60 bytes free for future per-instrument fields)
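The fadeout semantics of bytes 172/173 and the converter mappings quoted there can be checked with a short sketch. The helper names are hypothetical, and the loop in `ticks_until_silent` only models the per-tick decrement described above, not the rest of the voice state.

```python
def stored_fadeout(record):
    """Combine byte 172 (low 8 bits) with the low nibble of byte 173."""
    return record[172] | ((record[173] & 0x0F) << 8)

def fade_step(fadeout_volume, stored):
    """One song tick of fadeout while the voice is in key-off / Note-Fade state."""
    return min(1.0, max(0.0, fadeout_volume - stored / 1024.0))

def ticks_until_silent(stored):
    """Ticks before the voice deactivates, or None when stored == 0 (no fade)."""
    if stored == 0:
        return None
    volume, ticks = 1.0, 0
    while volume > 0.0:
        volume = fade_step(volume, stored)
        ticks += 1
    return ticks

def it_to_taud(it_fadeout):
    """IT pass-through with clamp, as given in the source-format mapping."""
    return min(it_fadeout & 0xFFFF, 0x0FFF)

def xm_to_taud(xm_fadeout):
    """FT2/XM: divide by 32 with round-to-nearest, then clamp."""
    return min((xm_fadeout + 16) // 32, 0x0FFF)
```

At the default 50 Hz tick rate, `ticks_until_silent(32) / 50` gives 0.64 s, matching the worked example for storedFadeout = 32.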
TODO:
[x] implement Instrument Flag, Vibrato Depth, Vibrato Rate, other samplewise/instrumentwise changes to it2taud and audio engine
[x] implement new note action on the audio engine (IT uses "background channels", maybe we can do the same but make "background channels" mixer-private)
[x] (same context as above) implement S7x command
[x] on playback, panning changes randomly on Taud made by s3m2taud.py and mod2taud.py, but not by it2taud.py (maybe something's off with the instrument exports?)
[x] NNA not disabled for S3M and MOD
[x] `S B000` and `S B100` not working as intended -- on first playback it jumps to the next cue same row, on subsequent playbacks the commands are completely ignored
[x] implement S6x command
[x] implement Wxx command (global volume slide)
[x] implement sample loop sustain
"Caveat: on a foreground voice, key-off (row.note == 0x0000) currently sets voice.active = false at AudioAdapter.kt:1713, which silences the channel immediately. Sustain-loop escape therefore only takes effect on background voices spawned by NNA "Note Off" — which matches the IT idiom of layering a new note over a sustained one. Let me know if you also want the foreground key-off to keep the voice playing through fadeout."
[x] cue and pattern compression of the Taud format (taud_common.py, taud.mjs)
[x] figure out how IT (0..256) and FT2 (0..FFF + cut) handles volume fadeout numbers, and come up with a compatible Taud spec, then implement
[x] Pitchbend in Amiga frequency mode sometimes works correctly and sometimes does not (the effect underdelivers). Affects every song in Amiga pitch mode, but ONLY on a fresh taut.js session
[x] Fix 4THSYM.it filters
[x] 4THSYM.it: pitchbend is wrong, some notes keep playing (loudly!) even if new notes are emitted
[x] `*2taud.py`: some notes are emitted with the wrong volume-set command. Tested with GSLINGER.mod: on order 0x15 channel 1, mod2taud.py emits volume 8 -- many of the effects are also dropped. Suggested solution: currently all converters write the default volume to the voleff when the original modules (.mod/.s3m/.it) specify nothing; we should instead write nothing and let the engine resolve the value just like other trackers do (we also now have "Instrument Global Volume" on the instrument definition, unlike before). This bug may also affect other converters, not just mod2taud.py
[x] nearly_there_.mod: `C#5 SD300 / ... / C-5 SD200 / A#4 / G#4 (at tickspeed 4)`: every `C-5 SD200` (there are four occurrences) gets skipped
[ ] low-number voleffs are too quiet (needs elaboration and test cases)
[x] scale Oxxxx when samples get resampled
[x] implement bitcrusher and overdrive (eff sym '8' and '9')
[x] a note trigger with inst and note fx set (e.g. portamento) but no volume set does not get its default volume and keeps the previous volume instead (SATELL.taud ptn 23) -- and simulateRowState() of taut.js always shows the old volume instead of the default volume, regardless of whether a note fx exists
[x] how does fadeout=0 work on IT? On XM, the note doesn't decay at all (that's why there's a separate CUT value). Also see what the Global Behaviour 'm' flag actually does on Taud (or, which slop AI had fed me *sigh*). `slumberjack.xm` plays normally but notes of `4THSYM.it` don't decay at all
Resolution: confirmed against schismtracker (player/sndmix.c:330-342) and
ft2-clone (src/ft2_replayer.c:1467-1481). Both IT and FT2 treat stored
fadeout=0 as "no fade" — there is no separate "use fadeout" flag in
either file format; "cut" is just the slider-extreme of the same
magnitude (MilkyTracker SectionInstruments.cpp:499-500 maps the slider's
4097th position to internal 32767). The 'm' flag's claim that FT2 cuts
on key-off when fadeout=0 was AI slop. Dropped the flag entirely; the
engine now uses a single divisor (1024) and converters scale their
source units to match (IT pass-through, XM ÷32). See byte 172-173 of
the instrument record for engine semantics.
Subsequent fixes for the 4THSYM.it hang:
(1) Implemented Schism's envelope-end + last-value-0 ⇒ cut rule
(player/sndmix.c:493-498) in AudioAdapter.kt advanceEnvelope.
(2) Volume envelope evaluation ungated from LOOP/SUSTAIN `b` bits.
IT envelopes with flags=0x01 (enabled-no-loop-no-sustain) had been
skipped because vEnvActive required either b bit. Now evaluation
is gated only by voice.volEnvOn (matches CHN_VOLENV in Schism).
See byte 15 spec for the LOOP word.
[x] Same gate fix needed for pan and pitch/filter envelopes.
Resolution (2026-05-06): added P (envelope present) bit at LOOP-word
bit 13 (offsets 16/18/20 bit 5) for all three envelopes. Engine
gates pan/pf envelope evaluation on P alone; converters set P=1
whenever they emit envelope nodes, regardless of LOOP/SUSTAIN
enable, so an enabled-no-wrap envelope (IT pan-env flag=0x01)
animates correctly. Mixer's hasPanEnv/hasPfEnv read the same gate,
so absent envelopes still bypass envelope-driven output. Pre-
2026-05-06 .taud files predate the P bit and need re-conversion
for pan/pf envelopes to play. See byte 15/17/19 spec for the LOOP
word bit layout.
[x] slumberjack.xm: E6x commands are not processed
[x] implement linear-freq tone mode (MONOTONE compat)
Resolution: ff=2 in song-table flags byte (was reserved). E / F / G
arguments are interpreted as Hz/tick at A4 = 440 Hz / C4 ≈ 261.6256 Hz
reference, exactly matching MONOTONE's MT_PLAY.PAS `Frequency`
arithmetic (MTSRC/MT_PLAY.PAS:606-630). Per-voice `linearFreq` cache
in AudioAdapter.kt preserves sub-noteVal precision across ticks; the
Voice cache reseeds on note trigger, fine slides, S$2x finetune, and
the start of a fresh multi-tick coarse slide. mon2taud.py now emits
Hz values verbatim (no SLIDE_UNITS_PER_HZ scaling) and sets the
linear-freq flag in the song-table flags byte. Spec details in
TAUD_NOTE_EFFECTS.md §1, §E, §F, §G.
[x] milkytracker-style volume ramping (on sample-end only)
[x] make Cues tab move faster
Resolution: Cues panel now uses memory-shift (`shiftOrdersAreaHorizontal`)
for LEFT/RIGHT and `shiftPatternArea` for UP/DOWN, plus per-row
(`drawOrdersRowAt`) and per-column (`drawOrdersVoiceColumnAt`) helpers,
replacing the full-panel redraw on every keystroke.
[x] volume and panning policy to match note effect policy: when a note is "retriggered" (note command with instrument specified), the volume/pan must take the default value; if not (note command with instrument 0), the volume/pan must stay at the old value. Make the change in both the audio engine and the taut.js simulator.
[ ] xm volume column commands (+x, -x, Dx, Lx, Mx, Px, Rx, Sx, Ux, Vx) are completely ignored
[x] theday.xm order 0x28, channel 6..8 has 'note trigger with inst 1 but no volume -> key-off -> set-volume to 0x20 -> key-off -> set-volume to 0x10 -> key-off -> ...' and it sounds like gating: key-off silences the output, set-volume turns on the output again; notably, this behaviour only works when volume envelope is turned off (any fadeouts progress normally). FT2's keyOff (ft2_replayer.c:411-435) zeroes realVol/outVol when the volume envelope is disabled — IT/Schism does not, and Taud's engine follows IT semantics (no fade when fadeStep == 0). Resolved in xm2taud.py: a pre-pass tracks per-channel bound XM instrument across the order-list walk, and any key-off cell whose bound instrument has vol_env_type & XM_ENV_ON == 0 is paired with `SEL_SET vol=0` in the same row. A subsequent vol-col SET on the channel restores audibility — exactly mirroring FT2's outVol/realVol gate without diverging the engine. Engine semantics stay IT-pure.
[ ] remove panning mode selection and replace the global panning rule with the 3 dB rule (not equal-energy)
[x] FT2/MOD double effects with 00 as arg (500, 600) missing volume column -> easiest solution: fully implement `L xy00` and `K xy00` and map 5xx to L, 6xx to K (xm2taud, mod2taud), Kxy and Lxy verbatim (s3m2taud.py, it2taud.py). This is justified because the volume effects rely on memory when 00 is given, and said memory effect is only recalled when NoteFx is used. TAUD_NOTE_EFFECTS already has detailed implementation notes. Mark those two commands as implemented solely for tracker compatibility.
Also document then implement `Mxx` (set channel volume, not just a note: 0x00 to 0x3F) `Nxy` (channel volume slide: similar to Dxy, but applies to the current channel's volume, not just a note) `Pxy` (channel panning slide. Similar to Dxx: P0y - to the right, Px0 - to the left, PFy - fine pan right, PxF - fine pan left) effects
[ ] 8 MB sample RAM via 512k banks
Play Data: play data are series of tracker-like instructions, visualised as:
rr||NOTE|Ins|E.Vol|E.Pan|EE.ff|
63||FFFF|255|3 63|3 63|FF FFFF| (8 bytes per line, 512 bytes per pattern, 128 patterns on 64 kB bank, 32 banks available (pattern 0xFFF -- bank 31, pattern 127 is a sentinel value for no-pattern))
Notes are tuned in 4096-tone equal temperament (4096-TET). Tuning is set per sample using its sampling rate value.
Special values:
note 0xFFFF: no-op
note 0xFFFE: note cut
note 0x0000: key-off
inst 0: no instrument change
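The pattern arithmetic and the special note values above can be sketched as below. The linear mapping from pattern number to bank is inferred from "pattern 0xFFF -- bank 31, pattern 127", and the pitch ratio ASSUMES that 4096 note steps span exactly one octave relative to the C4 reference (0x5000) named in the instrument record. All function names are hypothetical.

```python
ROW_BYTES = 8
ROWS_PER_PATTERN = 64          # 512 bytes per pattern / 8 bytes per row
PATTERNS_PER_BANK = 128        # 64 kB bank / 512 bytes per pattern
NO_PATTERN = 0xFFF             # bank 31, pattern 127: sentinel for no-pattern

def locate_row(pattern, row):
    """Return (bank, byte offset inside that 64 kB bank) for one row."""
    bank, slot = divmod(pattern, PATTERNS_PER_BANK)
    return bank, slot * ROWS_PER_PATTERN * ROW_BYTES + row * ROW_BYTES

def describe_note(note):
    """Classify the 16-bit note field of a row."""
    if note == 0xFFFF:
        return "no-op"
    if note == 0xFFFE:
        return "note cut"
    if note == 0x0000:
        return "key-off"
    return "note"

def pitch_ratio(note, reference=0x5000):
    """Playback-rate multiplier relative to the reference note (assumed C4)."""
    return 2.0 ** ((note - reference) / 4096)
```

Under these assumptions a sample whose "Sampling rate at C4" is 8363 Hz would be played at `8363 * pitch_ratio(0x6000)`, i.e. one octave up.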
Audio Adapter MMIO
0..1 RW: Play head #0 position
PCM mode: number of buffers uploaded and received by the adapter (writing does nothing)
Tracker mode: current position in the cuesheet (writing changes current position in the cuesheet and resets pattern cursor back to zero)
2..3 RW: Play head #0 length param
PCM mode: length of the samples to upload to the speaker
Tracker mode:
Byte 2: Play data 1 bank
Byte 3: Play data 2 bank
4 RW: Play head #0 master volume
5 RW: Play head #0 master pan
6..9 RW: Play head #0 flags (see below)
10..11 RW: Play head #1 position
12..13 RW: Play head #1 length param
14 RW: Play head #1 master volume
15 RW: Play head #1 master pan
16..19 RW: Play head #1 flags
... auto-fill to Play head #3 (the last of the 4 playheads; offsets 30..39)
40 WO: MP2 Decoder Control
Write 16 to initialise the MP2 context (call this before the decoding of NEW music)
Write 1 to decode the frame as MP2
Calling with more than one bit set will result in UNDEFINED BEHAVIOUR
41 RO: MP2 Decoder Status
A non-zero value indicates the decoder is busy. Different values may indicate different decoder states.
42 WO: TAD Decoder Control
Write 1 to decode TAD data
43 RW: TAD Quality
Must be set to appropriate value before decoding
44 RW: TAD Decoder Status
A non-zero value indicates the decoder is busy. Different values may indicate different decoder states.
45 RW: Select PCM Bin for playhead (writing causes side effects)
64..2367 RW: MP2 Decoded Samples (unsigned 8-bit stereo)
2368..4095 RW: MP2 Frame to be decoded
4096..4097 RO: MP2 Frame guard bytes; always return 0 on read
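The per-playhead registers at the start of this MMIO map repeat every 10 bytes, which can be sketched as follows. `mmio` stands in for whatever byte-addressable view of the adapter's MMIO space the host exposes; it and the helper names are hypothetical.

```python
PLAYHEAD_COUNT = 4
PLAYHEAD_STRIDE = 10    # bytes of MMIO per playhead

def playhead_registers(head):
    """MMIO offsets of the registers belonging to playhead `head` (0..3)."""
    if not 0 <= head < PLAYHEAD_COUNT:
        raise ValueError("playhead index out of range")
    base = head * PLAYHEAD_STRIDE
    return {
        "position": base,       # 2 bytes
        "length":   base + 2,   # 2 bytes
        "volume":   base + 4,   # 1 byte
        "pan":      base + 5,   # 1 byte
        "flags":    base + 6,   # 4 bytes
    }

def set_tracker_position(mmio, head, cue):
    """Write a 16-bit little-endian cue position for a Tracker-mode playhead."""
    offset = playhead_registers(head)["position"]
    mmio[offset:offset + 2] = cue.to_bytes(2, "little")
```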
Sound Hardware Info
- Sampling rate: 32000 Hz
- Bit depth: 8 bits/sample, unsigned
- Always operates in stereo (mono samples must be expanded to stereo before uploading)
Play Head Flags
Byte 1
- 0b mrqp ssss
m: mode (0 for Tracker, 1 for PCM)
r: reset parameters; always 0 when read
resetting will:
set position to 0,
set length param to 0,
set queue capacity to 8 samples,
unset play bit
q: purge queues (likely does nothing outside PCM mode); always 0 when read
p: play (0 if not -- mute all output)
ssss: in PCM mode, sets the PCM queue size
0 - 4 samples
1 - 6 samples
2 - 8 samples (the default size)
3 - 12 samples
4 - 16 samples
5 - 24 samples
6 - 32 samples
7 - 48 samples
8 - 64 samples
9 - 96 samples
10 - 128 samples
11 - 192 samples
12 - 256 samples
13 - 384 samples
14 - 512 samples
15 - 768 samples
NOTE: changing from PCM mode to Tracker mode or vice versa will also reset the parameters as described above
Byte 2
- PCM Mode: Write non-zero value to start uploading; always 0 when read
- Tracker Mode: Global mixer flags. Maps directly to Taud effect symbol '1'
0b 0000 0ffp
p: panning mode (0: linear, 1: equal-power)
ff: pitchshift mode (0: linear pitch slides, 1: Amiga period slides, 2: linear-frequency slides, 3: reserved)
Tracker commands may change the mixer state, but the changes WILL NOT BE REFLECTED BACK into this register.
Starting a new song uses whatever was written to this register. In other words, changes
made by songs do not persist.
Byte 3 (Tracker Mode)
- BPM (24 to 279. Play Data will change this register)
Byte 4 (Tracker Mode)
- Tick Rate (Play Data will change this register)
Uploaded PCM data will be stored onto the queue before being consumed by hardware.
If the queue is full, any more uploads will be silently discarded.
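Byte 1 of the play head flags (`0b mrqp ssss`) and the queue-size table can be sketched as below. The closed-form in `queue_size` is just an observation that the listed sizes alternate between 4 and 6 times a power of two; the function names are hypothetical.

```python
def queue_size(code):
    """PCM queue capacity in samples for a 4-bit ssss code (0..15)."""
    if not 0 <= code <= 15:
        raise ValueError("queue size code out of range")
    return (4 if code % 2 == 0 else 6) << (code // 2)

def make_flags_byte1(pcm_mode, play, queue_code=2, reset=False, purge=False):
    """Assemble byte 1: m (bit 7), r (bit 6), q (bit 5), p (bit 4), ssss (bits 3..0)."""
    return ((int(pcm_mode) << 7)
            | (int(reset) << 6)
            | (int(purge) << 5)
            | (int(play) << 4)
            | (queue_code & 0x0F))
```

For example, `make_flags_byte1(pcm_mode=True, play=True)` selects PCM mode, sets the play bit and keeps the default queue size of 8 samples (code 2).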
32768..65535 RW: Cue Sheet (1024 cues)
Byte 1..10: Pattern number low nybble for voice 1..20
Byte 11..20: Pattern number middle nybble for voice 1..20
Byte 21..30: Pattern number high nybble for voice 1..20
To recap:
Byte 1..10: 0b loV1 loV2, 0b loV3 loV4, 0b loV5 loV6, ... 0b loV19 loV20
Byte 11..20: 0b miV1 miV2, 0b miV3 miV4, 0b miV5 miV6, ... 0b miV19 miV20
Byte 21..30: 0b hiV1 hiV2, 0b hiV3 hiV4, 0b hiV5 hiV6, ... 0b hiV19 hiV20
Byte 31..32: instruction
1000xxxx yyyyyyyy (BAK000) - Go back 0bxxxxyyyyyyyy patterns
1001xxxx yyyyyyyy (FWD000) - Skip forward 0bxxxxyyyyyyyy patterns
1111xxxx yyyyyyyy (JMP000) - Go to absolute pattern number 0bxxxxyyyyyyyy
00000010 00xxxxxx (LEN 00) - Pattern length for this cue (0..63), where 0: 1 row, 63: 64 rows (decoded by AudioAdapter as of 2026-05-05; emitted by xm2taud / it2taud for non-multiple-of-64 source patterns)
00000001 00000000 - Halt (HALT ) - Play the full length of the pattern then stop the playback
00000001 00xxxxxx - Fadeout (FADOUT) - Gradually decrease global volume such that at row 0bxxxxxx it reaches zero, then stop the playback
00000000 - No operation
65536..131071 RW: PCM Sample buffer
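One 32-byte cue from the Cue Sheet region can be decoded as sketched below. Two points are ASSUMED rather than stated: that the odd-numbered voice (V1, V3, ...) occupies the high nybble of each shared byte, following the written order `0b loV1 loV2`, and that the first instruction byte listed is byte 31. `decode_cue` is a hypothetical helper.

```python
CUE_BYTES = 32
VOICES = 20

def decode_cue(cue):
    """Split one 32-byte cue into 20 pattern numbers and an instruction tuple."""
    def nybble(base, voice):
        # Two voices share a byte; voice index 0 (V1) is taken from the
        # high nybble, voice index 1 (V2) from the low nybble, and so on.
        byte = cue[base + voice // 2]
        return byte >> 4 if voice % 2 == 0 else byte & 0x0F

    patterns = [
        nybble(0, v) | nybble(10, v) << 4 | nybble(20, v) << 8
        for v in range(VOICES)          # v is 0-based: v == 0 is voice 1
    ]

    op, arg = cue[30], cue[31]          # "Byte 31..32" in the 1-based layout
    value = (op & 0x0F) << 8 | arg      # 12-bit operand 0bxxxxyyyyyyyy
    if op >> 4 == 0x8:
        instruction = ("BAK", value)
    elif op >> 4 == 0x9:
        instruction = ("FWD", value)
    elif op >> 4 == 0xF:
        instruction = ("JMP", value)
    elif op == 0x02:
        instruction = ("LEN", (arg & 0x3F) + 1)    # stored 0..63 means 1..64 rows
    elif op == 0x01:
        instruction = ("HALT",) if arg == 0 else ("FADOUT", arg & 0x3F)
    else:
        instruction = ("NOP",)
    return patterns, instruction
```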
Table of 3.5 Minifloat values (CSV).
Rebiased 2026-05-07 so the smallest non-zero step is 1/256 s and the maximum
is 15.75 s — every cell is the original LUT value divided by 8. Chosen for
tracker envelopes: a single song tick (≈ 8.9 ms at BPM 280, ≈ 41.7 ms at
BPM 24) now lands within ±17 % of an LUT entry across the whole supported
BPM range; the previous bias was ±150 % at common BPMs.
,000,001,010,011,100,101,110,111,MSB
00000,0,0.125,0.25,0.5,1,2,4,8
00001,0.00390625,0.12890625,0.2578125,0.515625,1.03125,2.0625,4.125,8.25
00010,0.0078125,0.1328125,0.265625,0.53125,1.0625,2.125,4.25,8.5
00011,0.01171875,0.13671875,0.2734375,0.546875,1.09375,2.1875,4.375,8.75
00100,0.015625,0.140625,0.28125,0.5625,1.125,2.25,4.5,9
00101,0.01953125,0.14453125,0.2890625,0.578125,1.15625,2.3125,4.625,9.25
00110,0.0234375,0.1484375,0.296875,0.59375,1.1875,2.375,4.75,9.5
00111,0.02734375,0.15234375,0.3046875,0.609375,1.21875,2.4375,4.875,9.75
01000,0.03125,0.15625,0.3125,0.625,1.25,2.5,5,10
01001,0.03515625,0.16015625,0.3203125,0.640625,1.28125,2.5625,5.125,10.25
01010,0.0390625,0.1640625,0.328125,0.65625,1.3125,2.625,5.25,10.5
01011,0.04296875,0.16796875,0.3359375,0.671875,1.34375,2.6875,5.375,10.75
01100,0.046875,0.171875,0.34375,0.6875,1.375,2.75,5.5,11
01101,0.05078125,0.17578125,0.3515625,0.703125,1.40625,2.8125,5.625,11.25
01110,0.0546875,0.1796875,0.359375,0.71875,1.4375,2.875,5.75,11.5
01111,0.05859375,0.18359375,0.3671875,0.734375,1.46875,2.9375,5.875,11.75
10000,0.0625,0.1875,0.375,0.75,1.5,3,6,12
10001,0.06640625,0.19140625,0.3828125,0.765625,1.53125,3.0625,6.125,12.25
10010,0.0703125,0.1953125,0.390625,0.78125,1.5625,3.125,6.25,12.5
10011,0.07421875,0.19921875,0.3984375,0.796875,1.59375,3.1875,6.375,12.75
10100,0.078125,0.203125,0.40625,0.8125,1.625,3.25,6.5,13
10101,0.08203125,0.20703125,0.4140625,0.828125,1.65625,3.3125,6.625,13.25
10110,0.0859375,0.2109375,0.421875,0.84375,1.6875,3.375,6.75,13.5
10111,0.08984375,0.21484375,0.4296875,0.859375,1.71875,3.4375,6.875,13.75
11000,0.09375,0.21875,0.4375,0.875,1.75,3.5,7,14
11001,0.09765625,0.22265625,0.4453125,0.890625,1.78125,3.5625,7.125,14.25
11010,0.1015625,0.2265625,0.453125,0.90625,1.8125,3.625,7.25,14.5
11011,0.10546875,0.23046875,0.4609375,0.921875,1.84375,3.6875,7.375,14.75
11100,0.109375,0.234375,0.46875,0.9375,1.875,3.75,7.5,15
11101,0.11328125,0.23828125,0.4765625,0.953125,1.90625,3.8125,7.625,15.25
11110,0.1171875,0.2421875,0.484375,0.96875,1.9375,3.875,7.75,15.5
11111,0.12109375,0.24609375,0.4921875,0.984375,1.96875,3.9375,7.875,15.75
LSB
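The table above follows a regular pattern, so it can be reproduced without a lookup table. The sketch below ASSUMES the byte is packed with the 3-bit column index ("MSB") in the top bits and the 5-bit row index ("LSB") in the bottom bits; `minifloat_seconds` and `nearest_minifloat` are hypothetical names.

```python
def minifloat_seconds(byte):
    """Decode a 3.5 unsigned minifloat byte into seconds."""
    exponent, mantissa = byte >> 5, byte & 0x1F
    if exponent == 0:
        # Subnormal column: plain steps of 1/256 s, starting at 0.
        return mantissa / 256
    # Normal columns: implicit leading 1, so (1 + mantissa/32) * 2^(exponent - 4).
    return (32 + mantissa) * 2.0 ** (exponent - 9)

def nearest_minifloat(seconds):
    """Closest non-zero code for a duration (0 is reserved for 'hold')."""
    return min(range(1, 256), key=lambda b: abs(minifloat_seconds(b) - seconds))
```

A converter could use `nearest_minifloat(0.02)` to encode one 20 ms tick at the default 50 Hz rate; the result decodes to 5/256 s, about 19.5 ms.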
## Tracker Note Effects
Tracker Note Effects have been moved to `TAUD_NOTE_EFFECTS.md`
--------------------------------------------------------------------------------
**Taud serialisation format**
Created by CuriousTorvald on 2026-04-19
This is a file format for Taud tracker data. Taud can be extended with Microtone (taut.js) project data in a backward- and forward-compatible manner.
Endianness: Little
# File Structure
\x1F T S V M a u d
[HEADER]
[SAMPLE+INSTRUMENT BIN IMAGE (GZip or Zstd compressed. Read 4-byte magic to determine)]
[SONG TABLE]
[PATTERN BIN for SONG 0 (GZip or Zstd compressed)]
[CUE SHEET for SONG 0 (GZip or Zstd compressed)]
[PATTERN BIN for SONG 1 (GZip or Zstd compressed)]
[CUE SHEET for SONG 1 (GZip or Zstd compressed)]
[PATTERN BIN for SONG 2 (GZip or Zstd compressed)]
[CUE SHEET for SONG 2 (GZip or Zstd compressed)]
...
[PROJECT DATA] (optional)
[DATA BLOCKS WITH FOURCC HEADER (see Project Data section)]
## Header
Byte[8] Magic
Uint8 Format version (always 1)
Uint8 Number of songs in SONG TABLE
Uint32 Compressed size of SAMPLE+INST section (used to calculate offset to SONG TABLE)
Uint32 Offset to Project Data. Zero if Project Data is nonexistent
Byte[14] Tracker/Converter signature
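The header fields above add up to 32 bytes (8 + 1 + 1 + 4 + 4 + 14). A minimal reading sketch follows. The function name `parse_taud_header` and the dictionary keys are made up for illustration, and the computed Song Table offset assumes that the SAMPLE+INSTRUMENT image starts immediately after the header.

```python
import struct

TAUD_MAGIC = b"\x1fTSVMaud"
# Magic, version, song count, compressed SAMPLE+INST size,
# Project Data offset, signature. "<" = little-endian, no padding.
HEADER = struct.Struct("<8sBBII14s")


def parse_taud_header(data: bytes) -> dict:
    """Decode the fixed-size Taud header (hypothetical helper)."""
    magic, version, num_songs, inst_size, proj_offset, signature = \
        HEADER.unpack_from(data, 0)
    if magic != TAUD_MAGIC:
        raise ValueError("not a Taud file")
    return {
        "version": version,
        "num_songs": num_songs,
        "sample_inst_size": inst_size,
        # Zero means "no Project Data" according to the field description.
        "project_data_offset": proj_offset or None,
        "signature": signature,
        # Assumption: the compressed image directly follows the header.
        "song_table_offset": HEADER.size + inst_size,
    }
```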
## Song Table
* Rows of 32 bytes:
Uint32 Song offset
Uint8 Number of voices
Uint16 Number of patterns (0 is invalid. pattern bin length = numPats * 8 bytes)
Uint8 Initial BPM (bias of -24. 0x00=24, 0xFF=279)
Uint8 Initial Tickrate (0 is invalid)
Uint16 Current Tuning base note (1..65533). A4 (western default) is 0x5C00. C9 (tracker default) is 0xA000. If zero, assume the tracker default value
Float32 Frequency at the base note. Tracker default is 8363.0. If zero, assume the tracker default
Uint8 Flags for Global Behaviour (effect symbol '1')
0b 0000 0ffp
p: panning law (0: linear, 1: equal-power)
ff: tone mode (0: linear pitch slides, 1: Amiga period slides, 2: linear-frequency slides, 3: reserved)
(bit 2 reserved — was 'm' fadeout-zero policy, removed; fadeout
scaling now lives entirely in the converter — see byte 172/173
of the instrument record for engine semantics)
Uint8 Song global volume
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
Uint8 Song mixing volume
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
Uint32 Compressed size of PATTERN BIN for this song
Uint32 Compressed size of CUE SHEET for this song
Byte[6] Reserved
The Taud device can queue up to 2 "playdata" entries in its buffer, each of which can be interpreted as a song.
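A Song Table row can be unpacked in one call, as sketched below. The name `parse_song_row` and the dictionary keys are illustrative only; the defaults for a zero base note and zero base frequency follow the field descriptions above.

```python
import struct

# offset, voices, patterns, BPM byte, tickrate, base note, base frequency,
# flags, global volume, mixing volume, pattern size, cue size, reserved
SONG_ROW = struct.Struct("<IBHBBHfBBBII6s")  # 32 bytes


def parse_song_row(row: bytes) -> dict:
    """Decode one 32-byte Song Table row (hypothetical helper)."""
    (offset, voices, num_pats, bpm_byte, tickrate, base_note, base_freq,
     flags, global_vol, mix_vol, pat_size, cue_size, _reserved) = \
        SONG_ROW.unpack(row)
    if num_pats == 0 or tickrate == 0:
        raise ValueError("pattern count and tickrate must be non-zero")
    return {
        "offset": offset,
        "voices": voices,
        "num_patterns": num_pats,
        "bpm": bpm_byte + 24,                 # stored with a bias of -24
        "tickrate": tickrate,
        "base_note": base_note or 0xA000,     # zero -> tracker default (C9)
        "base_freq": base_freq or 8363.0,     # zero -> tracker default
        "equal_power_pan": bool(flags & 0b001),   # 'p' bit
        "tone_mode": (flags >> 1) & 0b11,         # 'ff' bits
        "global_volume": global_vol,
        "mixing_volume": mix_vol,
        "pattern_bin_size": pat_size,
        "cue_sheet_size": cue_size,
    }
```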
* Known standard tunings:
A4 @ 440 Hz. ISO standard
A4 @ 435 Hz. Former French standard (year 1859)
A4 @ 452 Hz. Old Philharmonic pitch (19th century Britain)
C4 @ 256 Hz. Power of two
C4 @ 262 Hz. Modern Chinese a-ak tuning convention
C4 @ 311 Hz. Korean hyang-ak tuning standard (ROK National Gugak Center)
For your reference, tracker default tuning at A4 is 439.526 Hz (8363*2^(3/4) / 32)
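The reference figure above can be reproduced with a short calculation. This sketch assumes that note values count 4096 steps per octave, which is inferred from A4 = 0x5C00 and C9 = 0xA000 lying 4.25 octaves apart; `note_frequency` is a hypothetical name.

```python
def note_frequency(note: int, base_note: int = 0xA000,
                   base_freq: float = 8363.0) -> float:
    """Frequency of a note value given a base note and its frequency.

    Assumes 4096 note steps per octave (inferred from the spec's examples).
    """
    return base_freq * 2.0 ** ((note - base_note) / 4096)


# Tracker default tuning evaluated at A4 (0x5C00):
a4_default = note_frequency(0x5C00)
```

With the defaults, `a4_default` comes out at about 439.526 Hz, matching 8363*2^(3/4) / 32.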
## Pattern Bin and Cue Sheet
RAM image of Pattern Bin/Cue Sheet
## Project Data
Project Data is just a concatenation of blocks identified by their FourCC.
Byte[8] Magic (\x1E T a u d P r J)
Byte[8] Reserved
* Repetition of
Byte[4] Title of the section (fourcc)
Uint32 Section length
Byte[*] Section payload
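Walking the block list can be sketched as follows. This assumes that "Section length" counts the payload bytes only and that FourCC titles are ASCII; `iter_project_sections` is a hypothetical name.

```python
import struct

PROJECT_MAGIC = b"\x1eTaudPrJ"


def iter_project_sections(blob: bytes):
    """Yield (fourcc, payload) pairs from a Project Data blob."""
    if blob[:8] != PROJECT_MAGIC:
        raise ValueError("missing Project Data magic")
    pos = 16  # 8 bytes of magic + 8 reserved bytes
    while pos + 8 <= len(blob):
        fourcc = blob[pos:pos + 4].decode("ascii")
        (length,) = struct.unpack_from("<I", blob, pos + 4)
        # Assumption: length covers the payload only, not the 8-byte prefix.
        yield fourcc, blob[pos + 8:pos + 8 + length]
        pos += 8 + length
```

Readers that do not recognise a FourCC can simply skip the payload, which is what keeps the container extensible.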
### Predefined sections
prefixes:
- P: Project
- I: Instrument
- p: Pattern
- S: Sample
- s: Song
* PCom. Project author. Encoding: UTF-8
* PCpr. Project copyright string. Encoding: UTF-8
* PNam. Project name. Encoding: UTF-8
* INam. Instrument name table. Strings separated by 0x1E
* pNam. Pattern name table. Strings separated by 0x1E
* SNam. Sample name table. Strings separated by 0x1E
* sMet. Song metadata table
* Repetition of:
Uint8 Song index
Uint32 Size of this table following this field
Uint16 Notation used for this song (takes notation index)
0: raw numbers
10*n: TET-number times 10 (12-TET = 120)
* Following systems have alternative notation conventions:
531: 53-TET Pythagorean Notation
* Following list defines ethnic notations in 12-tone scale
10121: Pythagorean Diminished Fifth
10122: Pythagorean Augmented Fourth
10123: Shi'er lü (East Asian traditional tuning)
Uint8 Primary beat division (default: 4 rows)
Uint8 Secondary beat division (default: 16 rows)
Byte[*] Song name, null terminated. Encoding: UTF-8
Byte[*] Song composer, null terminated. Encoding: UTF-8
Byte[*] Song copyright string, null terminated. Encoding: UTF-8
* nota. Custom notation definition (version 'a')
* Repetition of:
Uint8 Notation index (starting from zero) used by songs
Uint32 Size of this notation following this field
Uint16 Reserved for flags
Float32 Interval size (octave system = 2.0f). If you are not using an interval system (which means you are responsible for defining every note expressible), this must be NaN. 0f and Infinity are considered illegal
Uint16 Notes between interval MINUS ONE (or octave); 12-TET will have value 11
Byte[8] Reserved
Byte[*] Name, null terminated. Encoding: UTF-8
Byte[*] Notation table. 0xFF-separated and null-terminated. Encoding: Taud charset
Uint16[*] Frequency table. Size of the table is defined by "Notes between interval MINUS ONE". This is a lookup table of relative pitch offsets (against the base tuning note) in 4096-TET space. Index zero of this table will be 0x0 if you read the spec right
Note: custom notations will use internal index 65535 down to 65520 (index 0 = 65535, index 15 = 65520)
Note Tuning:
1. "Base Note at C4" will be derived using "Current Tuning Base Note" and "Frequency at the Base Note" from the song table. If the values are A4,440Hz, it will be converted to C4,261.6255653Hz
2. Frequency at C5 will be (Base Note at C4) × (Interval Size)
3. 4096 notes will be distributed equidistantly between (Frequency at C4) and (Frequency at C5), with logarithmic pitch progression; this builds the frequency-offset table
4. Frequency-Offset Table from the previous step will be applied against the "Base Note at C4" to construct the notes within the notation. Value at index zero of the Frequency Table must be 0
5. The process will continue outside the "root interval" (C4..C5) to build a complete note-to-frequency table
Note: if your sample is pre-tuned for your system, keep the project setting as the defaults. If you are not working with the conventional octave system, you still need to specify the Interval Size
* Suggested notation serialisation format (for notation editor, etc.)
Byte[8] Magic (\x1E T a u d n o t)
Uint8 Version (Ascii 'a')
Bytes Notation definitions (see above)
--------------------------------------------------------------------------------
**S3M (ScreamTracker 3) to Taud conversion notes**
(Implemented in s3m2taud.py)
Created by CuriousTorvald on 2026-04-20
## Instrument indexing
S3M instrument numbers are 1-based on disk and in pattern cells. Taud's cell instrument byte preserves this: 0 means "no instrument change, reuse whatever was last loaded on this channel"; 1..255 select an instrument slot. The converter passes the raw S3M instrument byte through unchanged (no subtract-1). The instrument bin is written at base = instrument_index * 64, with slot 0 left as an empty/silent entry.
## Effect encoding
Taud opcodes are base-36 digit values: digits 0..9 map to bytes 0x00..0x09; letters A..Z map to bytes 0x0A..0x23. Effects are encoded into a 1-byte opcode plus a 2-byte argument.
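The symbol-to-opcode mapping is a base-36 digit lookup. A minimal sketch, with `effect_opcode` as a hypothetical name:

```python
BASE36_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def effect_opcode(symbol: str) -> int:
    """Map an effect symbol ('0'..'9', 'A'..'Z') to its opcode byte."""
    if len(symbol) != 1 or symbol not in BASE36_DIGITS:
        raise ValueError(f"not a base-36 effect symbol: {symbol!r}")
    return BASE36_DIGITS.index(symbol)
```

For example, `effect_opcode("A")` gives 0x0A and `effect_opcode("Z")` gives 0x23.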
## ST3 shared-memory recall (pre-pass)
ST3 backs effects D, E, F, I, J, K, L, Q, R, and S with a single per-channel memory slot. A $00 argument on any of these recalls the last non-zero argument. Taud uses narrower per-cohort memory, so the converter walks patterns in order-list order (per channel) and replaces every $00-arg recall with the current slot value before encoding. Patterns reused by multiple order entries are mutated once on their first visit; later visits may diverge from the ST3 original if cross-pattern memory state changed, but this is acceptable for typical usage.
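The recall substitution for one channel can be sketched like this. The cell representation (a list of `(effect, arg)` tuples already flattened into order-list order) is an assumption made for illustration and is not the data model of s3m2taud.py.

```python
SHARED_MEMORY_EFFECTS = set("DEFIJKLQRS")


def resolve_recalls(cells):
    """Replace $00-arg recalls with the current shared-memory value.

    cells: (effect_letter_or_None, arg) tuples for ONE channel, in the
    order the rows would be played.
    """
    memory = 0
    resolved = []
    for effect, arg in cells:
        if effect in SHARED_MEMORY_EFFECTS:
            if arg == 0:
                arg = memory      # recall the last non-zero argument
            else:
                memory = arg      # any of these effects updates the slot
        resolved.append((effect, arg))
    return resolved
```

Effects outside the shared set, such as A, pass through untouched and do not disturb the slot.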
## Cxx BCD decode
ST3 stores pattern-break row numbers as BCD on disk ($10 means decimal row 10, not hex row 16). The converter decodes: row = (byte >> 4) * 10 + (byte & 0xF). Values that decode to 64 or above clamp to row 0.
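The decode rule above, written out as a small function (`decode_pattern_break` is a hypothetical name):

```python
def decode_pattern_break(byte: int) -> int:
    """Decode an ST3 Cxx argument stored as BCD into a decimal row."""
    row = (byte >> 4) * 10 + (byte & 0x0F)
    # Rows 64 and above do not exist in a 64-row pattern: clamp to row 0.
    return row if row < 64 else 0
```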
## Pitch slide unit
ST3's coarse slide unit is 1/16 of a semitone. One semitone in Taud's 4096-TET grid is 4096/12 ≈ 341.33 units. One 1/16 semitone ≈ 21.33 units ≈ $0015. All E/F/G coarse arguments are therefore multiplied by $0015. Fine slide forms ($Fx, $Ex) are packed into Taud's $F0xx fine form after the same per-step scale.
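The scale factor can be derived rather than hard-coded. The sketch below covers only the coarse multiplication; the $F0xx fine-form packing is not shown because its exact layout is not given here. Names are illustrative.

```python
UNITS_PER_OCTAVE = 4096
UNITS_PER_SEMITONE = UNITS_PER_OCTAVE / 12      # about 341.33
SLIDE_SCALE = round(UNITS_PER_SEMITONE / 16)    # 21, i.e. $0015


def scale_coarse_slide(st3_arg: int) -> int:
    """Convert an ST3 coarse slide argument to Taud 4096-TET units."""
    return st3_arg * SLIDE_SCALE
```

Because 21.33 is rounded down to 21, a nominal one-semitone slide (16 steps) lands on 336 units instead of 341.33, about 1.6% short.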
## J arpeggio (12-TET to 4096-TET)
ST3 Jxy nibbles are 12-TET semitone offsets (0..15). Taud's J argument uses the high byte of a 16-bit pitch delta; one byte = 256 units ≈ 0.75 semitones.
Conversion: byte = round(semitones * 4 / 3).
The full lookup table:
Semitones 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Taud byte $00 $01 $03 $04 $05 $07 $08 $09 $0B $0C $0D $0F $10 $11 $13 $14
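The lookup table can be regenerated from the conversion formula. This sketch uses the integer form floor((8n + 3) / 6), which equals floor(4n/3 + 1/2), so no floating-point rounding is involved; `arpeggio_byte` is a hypothetical name.

```python
def arpeggio_byte(semitones: int) -> int:
    """Convert a 12-TET semitone offset (0..15) to a Taud J argument byte."""
    if not 0 <= semitones <= 15:
        raise ValueError("ST3 arpeggio nibble must be 0..15")
    return (8 * semitones + 3) // 6   # round(semitones * 4 / 3)


ARPEGGIO_TABLE = [arpeggio_byte(n) for n in range(16)]
```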
## K and L effects
The engine treats K and L as no-ops. The converter splits each into two parts:
K → effect column H $0000 (recall vibrato from HU memory) plus a volume-column slide derived from K's argument
L → effect column G $0000 plus the same volume-column slide. If the S3M cell already carries an explicit volume-column byte, the slide half is dropped with a -v warning.
## M, N (channel volume), X, P (pan) folding
M (set channel volume) and N (channel-vol slide) fold into the volume column. X (set pan) and P (pan slide) fold into the pan column. These effects consume no space in the effect slot. W (global vol slide) and Y (panbrello) are dropped with a -v warning.
## Volume column defaults
When a note trigger is present in a cell with no explicit S3M volume byte, the converter emits SEL_SET (selector 0) with the instrument's default volume. This prevents the channel's prior volume state from persisting into a fresh note. Cells with no note trigger and no explicit volume emit SEL_FINE value 0 (fine slide of 0 = no-op), which leaves channel volume unchanged.
## Pan column defaults
Row 0 of every pattern emits SEL_SET with the channel's default pan (derived from the S3M channel-setting byte: channels 0-7 → left ($10), channels 8-15 → right ($2F), otherwise centre ($1F)). All other rows emit SEL_FINE value 0 (no-op) unless an X, P, or S$8x effect overrides.
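The default-pan selection is a three-way range check on the channel-setting byte. A sketch with a hypothetical function name:

```python
PAN_LEFT, PAN_CENTRE, PAN_RIGHT = 0x10, 0x1F, 0x2F


def default_pan(channel_setting: int) -> int:
    """Default Taud pan for an S3M channel-setting byte."""
    if 0 <= channel_setting <= 7:
        return PAN_LEFT
    if 8 <= channel_setting <= 15:
        return PAN_RIGHT
    return PAN_CENTRE
```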
## Cue sheet halt placement
The halt instruction (byte value 0x01 at cue offset 30) is placed on the last active cue entry, not in a separate empty cue appended after it. This ensures playback stops immediately after the last pattern row completes, with no silent 64-row gap.
## Tempo mapping
S3M BPM is stored as a raw decimal value. Taud's initial BPM byte uses a bias of -24 (byte 0x00 = 24 BPM, 0xFF = 279 BPM). Conversion: taud_byte = bpm - 24. The converter also scans row 0 of the first pattern in the order list for A (set speed) and T (set tempo) effects and uses those values in preference to the S3M header defaults.
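The bias can be applied as below. Rejecting out-of-range values with an exception is this sketch's own choice; the text above does not say how the converter treats them.

```python
def bpm_to_taud_byte(bpm: int) -> int:
    """Encode a BPM value into Taud's initial BPM byte (bias of -24)."""
    byte = bpm - 24
    if not 0x00 <= byte <= 0xFF:
        # Assumption: out-of-range tempos are an error, not clamped.
        raise ValueError("BPM must be within 24..279")
    return byte
```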
## Global volume
ST3 global volume is 0..$40; Taud's is 0..$FF. Import scale: Taud_vol = ST3_vol × 4 (clamped to $FF).
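The clamp only matters at the top of the range: $40 × 4 is $100, which does not fit in a byte. A sketch with a hypothetical function name:

```python
def st3_global_volume_to_taud(st3_vol: int) -> int:
    """Scale an ST3 global volume (0..$40) to Taud's 0..$FF range."""
    return min(st3_vol * 4, 0xFF)
```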
--------------------------------------------------------------------------------
RomBank / RamBank
Endianness: Little
MMIO
0 RW : Bank number for the first 512 kbytes
1 RW : Bank number for the last 512 kbytes
16..23 RW : DMA Control for Lane 1..8
Write 0x01: copy from Core to Peripheral
Write 0x02: copy from Peripheral to Core
* NOTE: after the transfer, the bank numbers revert to the values they held before the operation
24..31 RW : DMA Control reserved
32..34 RW : DMA Lane 1 -- Addr on the Core Memory
35..37 RW : DMA Lane 1 -- Addr on the Peripheral's Memory (addr can be across-the-bank)
38..40 RW : DMA Lane 1 -- Transfer Length
41..42 RW : DMA Lane 1 -- First/Last Bank Number
43 RW : DMA Lane 1 -- (reserved)
44..55 RW : DMA Lane 2 Props
56..67 RW : DMA Lane 3 Props
68..79 RW : DMA Lane 4 Props
80..91 RW : DMA Lane 5 Props
92..103 RW : DMA Lane 6 Props
104..115 RW : DMA Lane 7 Props
116..127 RW : DMA Lane 8 Props
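The per-lane register offsets follow a fixed 12-byte stride. The sketch below assumes that lanes 2 through 8 repeat the field layout spelled out for lane 1; the function and key names are illustrative.

```python
def dma_lane_registers(lane: int) -> dict:
    """MMIO offsets for one DMA lane (1..8) as inclusive (first, last) ranges."""
    if not 1 <= lane <= 8:
        raise ValueError("lane must be 1..8")
    base = 32 + 12 * (lane - 1)
    return {
        "control": 16 + (lane - 1),
        # Assumption: every lane mirrors the lane 1 field layout.
        "core_addr": (base, base + 2),
        "peripheral_addr": (base + 3, base + 5),
        "length": (base + 6, base + 8),
        "first_last_bank": (base + 9, base + 10),
        "reserved": (base + 11, base + 11),
    }
```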
--------------------------------------------------------------------------------
High Speed Disk Peripheral Adapter (HSDPA)
An interface card for sequentially reading from and writing to a single large disk that has no filesystem on it.
Endianness: Little
MMIO
0..2 RW: Block transfer status for Disk 1
0b nnnn nnnn, nnnn nnnn , a00z mmmm
n-read: size of the block from the other device, LSB (a full block of 1048576 bytes is encoded as zero)
m-read: size of the block from the other device, MSB (a full block of 1048576 bytes is encoded as zero)
a-read: set if the other device hasNext (doYouHaveNext); false if the device is not present
z-read: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)
n-write: size of the block I'm sending, LSB (a full block of 1048576 bytes is encoded as zero)
m-write: size of the block I'm sending, MSB (a full block of 1048576 bytes is encoded as zero)
a-write: set if there's more to send (hasNext)
z-write: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)
3..5 RW: Block transfer status for Disk 2
6..8 RW: Block transfer status for Disk 3
9..11 RW: Block transfer status for Disk 4
12..15 RW: Block transfer control for Disk 1 through 4
0b 0000 abcd
a: 1 for send, 0 for receive
b-write: 1 to start sending if a-bit is set; if a-bit is unset, make the other device start sending
b-read: if this bit is set, you're currently receiving something (aka busy)
c-write: I'm ready to receive
c-read: Are you ready to receive?
d-read: Are you there? (if the other device's recipient is myself)
NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port
16..19 RW: 8-bit status code for the disk
20 RW: Currently active disk (0: deselect all disks, 1: select disk #1, ...)
Selecting a disk will automatically unset and hold down "I'm ready to receive" flags of the other disks,
however, the target disk will NOT have its "I'm ready to receive" flag automatically set.
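The 3-byte block transfer status packs a 20-bit size plus the a and z flags. The sketch below assumes that the first MMIO byte carries the lowest 8 bits of n (little-endian) and that the third byte is laid out as a00z mmmm with a in bit 7; function names are illustrative.

```python
FULL_BLOCK = 1048576  # 2**20, encoded as size zero with z clear


def decode_block_status(b0: int, b1: int, b2: int):
    """Return (size_in_bytes, has_next) from the three status bytes."""
    size = b0 | (b1 << 8) | ((b2 & 0x0F) << 16)
    if b2 & 0x10:            # z: the size really is zero
        size = 0
    elif size == 0:          # zero without z means a full block
        size = FULL_BLOCK
    return size, bool(b2 & 0x80)


def encode_block_status(size: int, has_next: bool):
    """Return the three status bytes for a block of the given size."""
    if not 0 <= size <= FULL_BLOCK:
        raise ValueError("size must be 0..1048576")
    z = 0x10 if size == 0 else 0x00
    raw = size % FULL_BLOCK  # a full block wraps to zero
    b2 = (0x80 if has_next else 0x00) | z | (raw >> 16)
    return raw & 0xFF, (raw >> 8) & 0xFF, b2
```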
-- SEQUENTIAL IO SUPPORT MODULE --
NOTE: Sequential I/O will clobber the peripheral memory space.
256..257 RW: Sequential I/O control flags
258 RW: Opcode. Writing a value to this memory will execute the operation
0x00 - No operation
0x01 - Skip (arg 1) bytes
0x02 - Read (arg 1) bytes and store to core memory pointer (arg 2)
0x03 - Write (arg 1) bytes using data from the core memory from pointer (arg 2)
0xF0 - Rewind the file to the starting point
0xFF - Terminate sequential I/O session and free up the memory space
259..261 RW: Argument #1
262..264 RW: Argument #2
265..267 RW: Argument #3
268..270 RW: Argument #4
Memory Space
0..1048575 RW: Buffer for the block transfer lane
IMPLEMENTATION RECOMMENDATION: split the memory space into two 512K blocks, and when the sequential
reading reaches the second space, prepare the next bytes in the first memory space, so that when the read
cursor reaches 1048576, it wraps to 0 and continues reading the content of the disk as if nothing happened.
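The recommendation above can be modelled in software as a ring buffer that is refilled one half at a time. This is a sketch under stated assumptions, not the adapter's implementation: `HalfSwapBuffer` is a hypothetical class, `source` is any object with a `read(n)` method, and end-of-disk handling is left out.

```python
import io


class HalfSwapBuffer:
    """Ring buffer refilled one half at a time (hypothetical helper)."""

    def __init__(self, source, size=1048576):
        self.source = source
        self.size = size
        self.half = size // 2
        self.buf = bytearray(size)
        self.cursor = 0
        self._fill(0)            # prime both halves before the first read
        self._fill(self.half)

    def _fill(self, start):
        chunk = self.source.read(self.half)
        self.buf[start:start + len(chunk)] = chunk

    def read(self, n):
        out = bytearray()
        while n > 0:
            # Stop at the next half boundary so refills happen exactly there.
            boundary = self.half if self.cursor < self.half else self.size
            take = min(n, boundary - self.cursor)
            out += self.buf[self.cursor:self.cursor + take]
            self.cursor += take
            n -= take
            if self.cursor == self.half:
                self._fill(0)            # reader entered 2nd half: refresh 1st
            elif self.cursor == self.size:
                self.cursor = 0          # wrap around
                self._fill(self.half)    # reader back in 1st half: refresh 2nd
        return bytes(out)


# Small demonstration with a 16-byte buffer instead of 1 MB.
demo_data = bytes(range(256)) * 4
demo = HalfSwapBuffer(io.BytesIO(demo_data), size=16)
```

The reader never sees the seam: consecutive `read` calls return the source bytes in order even though the cursor wraps many times.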