mirror of
https://github.com/curioustorvald/tsvm.git
synced 2026-06-06 13:38:30 +09:00
2582 lines
96 KiB
Plaintext
1 byte = 2 pixels
|
||
|
||
560x448@4bpp = 125 440 bytes
|
||
560x448@8bpp = 250 880 bytes
|
||
|
||
-> 262144 bytes (256 kB)
|
||
|
||
[USER AREA | HW AREA]
|
||
|
||
Number of peripherals = 8, of which the computer itself is considered as
|
||
a peripheral.
|
||
|
||
HW AREA = [Peripherals | MMIO | INTVEC]
|
||
|
||
User area: 8 MB, hardware area: 8 MB
|
||
|
||
8192 kB     User Space
1024 kB     Peripheral #7
1024 kB     Peripheral #6
...
1024 kB     MMIO and Interrupt Vectors (where Peripheral #0 would be)
    128 kB      MMIO for Peri #7
    128 kB      MMIO for Peri #6
    ...
    128 kB      MMIO for the computer (where Peripheral #0 would be)
|
||
|
||
Certain memory mappers may allow an extra 4 MB of User Space in exchange for Peripheral slots #4 through #7.
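
The sizes in the memory map above can be cross-checked with a short sketch. The constant and function names are illustrative only and are not part of TSVM; "kB" is taken to mean 1024 bytes, as the 262144-byte VRAM figure suggests.

```python
KIB = 1024

USER_AREA = 8192 * KIB         # "8192 kB User Space"
PERIPHERAL_SLOT = 1024 * KIB   # one peripheral slot in the hardware area
MMIO_PER_SLOT = 128 * KIB      # one MMIO window per peripheral
NUM_PERIPHERALS = 8            # the computer itself counts as peripheral #0


def hardware_area_size() -> int:
    """Total size of the hardware area (all eight slots)."""
    return NUM_PERIPHERALS * PERIPHERAL_SLOT


def mmio_region_size() -> int:
    """Total size of the eight MMIO windows."""
    return NUM_PERIPHERALS * MMIO_PER_SLOT


def total_address_space() -> int:
    """User area plus hardware area."""
    return USER_AREA + hardware_area_size()
```

Under these assumptions the eight 128 kB MMIO windows add up to exactly one 1024 kB slot, which is why they fit where Peripheral #0 would be.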
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
IO Device
|
||
|
||
Endianness: little
|
||
Note: Always takes up the peripheral slot of zero
|
||
|
||
Latching: latching "locks" fluctuating values at the moment you request them, so that a subsequent read
returns a consistent snapshot. This matters most for multibyte values, where one byte could otherwise
change after you have read another, e.g. System uptime in nanoseconds
|
||
|
||
MMIO
|
||
|
||
0..31 RO: Raw Keyboard Buffer read. Won't shift the key buffer
|
||
32..33 RO: Mouse X pos
|
||
34..35 RO: Mouse Y pos
|
||
36 RO: Mouse down? (1 for TRUE, 0 for FALSE)
|
||
37 RW: Read/Write single key input. Key buffer will be shifted. Manual writing is
|
||
usually unnecessary as such action must be automatically managed via LibGDX
|
||
input processing.
|
||
Stores ASCII code representing the character, plus:
|
||
(1..26: Ctrl+[alph])
|
||
3 : Ctrl+C
|
||
4 : Ctrl+D
|
||
8 : Backspace
|
||
(13: Return)
|
||
19: Up arrow
|
||
20: Down arrow
|
||
21: Left arrow
|
||
22: Right arrow
|
||
38 RW: Request keyboard input be read (TTY Function). Write nonzero value to enable, write zero to
|
||
close it. Keyboard buffer will be cleared whenever request is received, so
|
||
MAKE SURE YOU REQUEST THE KEY INPUT ONLY ONCE!
|
||
39 WO: Latch Key/Mouse Input (Raw Input function). Write nonzero value to latch.
|
||
Stores LibGDX Key code
|
||
40..47 RO: Key Press buffer
|
||
stores keys that are held down. Can accommodate 8-key rollover (in keyboard geeks' terms)
|
||
0x0 is written for the empty area; numbers are always sorted
|
||
48..51 RO: System flags
|
||
48: 0b rq00 000t
|
||
t: STOP button (should raise SIGTERM)
|
||
r: RESET button (hypervisor should reset the system)
|
||
q: SysRq button (hypervisor should respond to it)
|
||
49: set to 1 if a key has been pushed into the key buffer (or, if the system has a key press to pull) via MMIO 38; otherwise 0
|
||
|
||
64..67 RO: User area memory size in bytes
|
||
68 WO: Counter latch
|
||
0b 0000 00ba
|
||
a: System uptime
|
||
b: RTC
|
||
72..79 RO: System uptime in nanoseconds
|
||
80..87 RO: RTC in microseconds
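
A minimal sketch of a latched read of the uptime counter, assuming a hypothetical peek/poke interface. FakeIODevice is only a stand-in written for this example and is not a TSVM API; the addresses (68, 72..79) and the little-endian layout come from the listing above.

```python
import struct


class FakeIODevice:
    """Hypothetical stand-in for the IO device's MMIO window."""

    def __init__(self) -> None:
        self.mmio = bytearray(128)
        self.live_uptime_ns = 0  # the value that keeps changing behind the scenes

    def poke(self, addr: int, value: int) -> None:
        # Writing bit a (0b01) to MMIO 68 latches the system uptime into 72..79.
        if addr == 68 and value & 0b01:
            self.mmio[72:80] = struct.pack("<Q", self.live_uptime_ns)

    def peek(self, addr: int) -> int:
        return self.mmio[addr]


def read_uptime_ns(io: FakeIODevice) -> int:
    """Latch first, then read the eight bytes one at a time."""
    io.poke(68, 0b01)
    raw = bytes(io.peek(addr) for addr in range(72, 80))
    return struct.unpack("<Q", raw)[0]
```

Because the eight bytes are copied in one step by the latch, a byte-by-byte read cannot observe a half-updated value.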
|
||
|
||
88 RW: Rom mapping
|
||
write 0xFF to NOT map any rom
|
||
write 0x00 to map BIOS
|
||
write 0x01 to map first "extra ROM"
|
||
|
||
89 RW: BMS flags
|
||
0b P000 b0ca
|
||
a: 1 if charging (accepting power from the AC adapter)
|
||
c: 1 if battery is detected
|
||
b: 1 if the device is battery-operated
|
||
|
||
P: 1 if CPU halted (so that the "smart" power supply can shut itself down)
|
||
|
||
note: only the high nybbles are writable!
|
||
|
||
if the device is battery-operated but currently running off of an AC adapter and there is no battery inserted,
|
||
the flag would be 0000 1001
|
||
|
||
90 RO: BMS calculated battery percentage where 255 is 100%
|
||
91 RO: BMS battery voltage multiplied by 10 (127 = "12.7 V")
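
The three BMS registers (89, 90, 91) can be decoded as in the sketch below. The function name and the dictionary keys are assumptions made for illustration; the bit positions follow the 0b P000 b0ca layout given above.

```python
def decode_bms(flags: int, percent_raw: int, voltage_raw: int) -> dict:
    """Decode MMIO 89 (flags), 90 (percentage) and 91 (voltage)."""
    return {
        "charging": bool(flags & 0b0000_0001),          # a
        "battery_detected": bool(flags & 0b0000_0010),  # c
        "battery_operated": bool(flags & 0b0000_1000),  # b
        "cpu_halted": bool(flags & 0b1000_0000),        # P
        "percent": percent_raw * 100 / 255,             # 255 means 100%
        "volts": voltage_raw / 10,                      # stored multiplied by 10
    }
```

For the example flag value 0000 1001 this yields a battery-operated device that is accepting AC power with no battery detected.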
|
||
|
||
92 RW: Memory Mapping
|
||
0: 8 MB Core, 8 MB Hardware-reserved, 7 card slots
|
||
1: 12 MB Core, 4 MB Hardware-reserved, 3 card slots (HW addr 131072..1048575 cannot be reclaimed though)
|
||
|
||
1024..2047 RW: Reserved for integrated peripherals (e.g. built-in status display)
|
||
|
||
2048..4075 RW: Used by the hypervisor
|
||
2048..3071 RW: Interrupt vectors (0-255), 32-bit address. Used regardless of the existence of the hypervisor.
|
||
If hypervisor is installed, the interrupt calls are handled using the hypervisor
|
||
If no hypervisors are installed, the interrupt call is performed by the "hardware"
|
||
Interrupt Vector Table:
|
||
0x00 - Initial Stack Pointer (currently unused)
|
||
0x01 - Reset
|
||
0x02 - NMI
|
||
0x03 - Out of Memory
|
||
|
||
0x0C - IRQ_COM1
|
||
0x0D - IRQ_COM2
|
||
0x0E - IRQ_COM3
|
||
0x0F - IRQ_COM4
|
||
|
||
0x10 - Core Memory Access Violation
|
||
0x11 - Card 1 Access Violation
|
||
0x12 - Card 2 Access Violation
|
||
0x13 - Card 3 Access Violation
|
||
0x14 - Card 4 Access Violation
|
||
0x15 - Card 5 Access Violation
|
||
0x16 - Card 6 Access Violation
|
||
0x17 - Card 7 Access Violation
|
||
|
||
0x20 - IRQ_Core
|
||
0x21 - IRQ_CARD1
|
||
0x22 - IRQ_CARD2
|
||
0x23 - IRQ_CARD3
|
||
0x24 - IRQ_CARD4
|
||
0x25 - IRQ_CARD5
|
||
0x26 - IRQ_CARD6
|
||
0x27 - IRQ_CARD7
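
Since the table holds 256 vectors of 32 bits each in MMIO 2048..3071, the location of a vector follows directly from its number. A small sketch, with hypothetical helper names and the MMIO window modelled as a plain byte buffer:

```python
IVT_BASE = 2048       # first byte of the interrupt vector table
IVT_ENTRY_SIZE = 4    # 32-bit address per vector


def vector_offset(n: int) -> int:
    """MMIO offset of interrupt vector n (0..255)."""
    if not 0 <= n <= 255:
        raise ValueError("vector number out of range")
    return IVT_BASE + n * IVT_ENTRY_SIZE


def read_vector(mmio: bytes, n: int) -> int:
    """Read the little-endian 32-bit handler address for vector n."""
    off = vector_offset(n)
    return int.from_bytes(mmio[off:off + IVT_ENTRY_SIZE], "little")
```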
|
||
3072..3075 RW: Status flags
|
||
|
||
|
||
|
||
4076..4079 RW: 8-bit status code for the port
|
||
4080..4083 RO: 8-bit status code for connected device
|
||
|
||
4084..4091 RO: Block transfer status
|
||
0b nnnnnnnn a00z mmmm
|
||
|
||
n-read: size of the block from the other device, LSB (4096-full block size is zero)
|
||
m-read: size of the block from the other device, MSB (4096-full block size is zero)
|
||
a-read: if the other device hasNext (doYouHaveNext), false if device not present
|
||
z-read: set if the size is actually 0 instead of 4096 (overrides n and m parameters)
|
||
|
||
n-write: size of the block I'm sending, LSB (4096-full block size is zero)
|
||
m-write: size of the block I'm sending, MSB (4096-full block size is zero)
|
||
a-write: if there's more to send (hasNext)
|
||
z-write: set if the size is actually 0 instead of 4096 (overrides n and m parameters)
|
||
|
||
4092..4095 RW: Block transfer control for Port 1 through 4
|
||
0b 00ms abcd
|
||
|
||
m-readonly: device in master setup
|
||
s-readonly: device in slave setup
|
||
|
||
a: 1 for send, 0 for receive
|
||
|
||
b-write: 1 to start sending if a-bit is set; if a-bit is unset, make the other device start sending
|
||
b-read: if this bit is set, you're currently receiving something (aka busy)
|
||
|
||
c-write: I'm ready to receive
|
||
c-read: Are you ready to receive?
|
||
|
||
d-read: Are you there? (if the other device's recipient is myself)
|
||
|
||
NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port
|
||
|
||
4096..8191 RW: Buffer for block transfer lane #1
|
||
8192..12287 RW: Buffer for block transfer lane #2
|
||
12288..16383 RW: Buffer for block transfer lane #3
|
||
16384..20479 RW: Buffer for block transfer lane #4
|
||
|
||
65536..131071 RO: Mapped to ROM
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
VRAM Bank 0 (256 kB)
|
||
|
||
Endianness: little
|
||
|
||
|
||
Memory Space
|
||
|
||
250880 bytes
|
||
Framebuffer
|
||
3 bytes
|
||
Initial background (and the border) colour RGB, 8 bits per channel
|
||
1 byte
|
||
command (writing to this memory address changes the status)
|
||
1: reset palette to default
|
||
2: fill framebuffer with given colour (arg1)
|
||
3: do '1' then do '2' (with arg1) then do '4' (with arg2)
|
||
4: fill framebuffer2 with given colour (arg1)
|
||
|
||
16: copy Low Font ROM (char 0–127) to mapping area
|
||
17: copy High Font ROM (char 128–255) to mapping area
|
||
18: write contents of the font ROM mapping area to the Low Font ROM
|
||
19: write contents of the font ROM mapping area to the High Font ROM
|
||
20: reset Low Font ROM to default
|
||
21: reset High Font ROM to default
|
||
12 bytes
|
||
argument for "command" (arg1: Byte, arg2: Byte)
|
||
write to this address FIRST and then write to "command" to execute the command
|
||
1008 bytes
|
||
reserved
|
||
2046 bytes
|
||
unused
|
||
2 bytes
|
||
Cursor position in: (y*80 + x)
|
||
2560 bytes
|
||
Text foreground colours
|
||
2560 bytes
|
||
Text background colours
|
||
2560 bytes
|
||
Text buffer of 80x32 (7x14 character size, and yes: actual character data is on the bottom)
|
||
512 bytes
|
||
Palette stored in following pattern: 0b rrrr gggg, 0b bbbb aaaa, ....
|
||
Palette number 255 is always full transparent (bits being all zero)
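
Each palette entry takes two bytes (256 entries x 2 = 512 bytes), with four bits per channel. A minimal packing sketch; the function names are made up for this example.

```python
def pack_palette_entry(r: int, g: int, b: int, a: int) -> bytes:
    """Pack 4-bit channels into the 0b rrrr gggg, 0b bbbb aaaa pattern."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 15:
            raise ValueError("channels are 4 bits wide")
    return bytes([(r << 4) | g, (b << 4) | a])


def unpack_palette_entry(entry: bytes) -> tuple:
    """Inverse of pack_palette_entry: returns (r, g, b, a)."""
    return (entry[0] >> 4, entry[0] & 0x0F, entry[1] >> 4, entry[1] & 0x0F)
```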
|
||
|
||
MMIO
|
||
|
||
0..1 RO
|
||
Framebuffer width in pixels
|
||
2..3 RO
|
||
Framebuffer height in pixels
|
||
4 RO
|
||
Text mode columns
|
||
5 RO
|
||
Text mode rows
|
||
6 RW
|
||
Text-mode attributes
|
||
0b 0000 00rc (r: TTY Raw mode, c: Cursor blink)
|
||
7 RW
|
||
Graphics-mode attributes
|
||
0b 0000 rrrr (r: Resolution/colour depth)
|
||
8 RO
|
||
Last used colour (set by poking at the framebuffer)
|
||
9 RW
|
||
current TTY foreground colour (useful for print() function)
|
||
10 RW
|
||
current TTY background colour (useful for print() function)
|
||
11 RO
|
||
Number of Banks, or VRAM size (1 = 256 kB, max 4)
|
||
12 RW
|
||
Graphics Mode
|
||
0: 560x448, 256 Colours, 1 layer
|
||
1: 280x224, 256 Colours, 4 layers
|
||
2: 280x224, 4096 Colours, 2 layers
|
||
3: 560x448, 256 Colours, 2 layers (if bank 2 is not installed, mode change will not happen)
|
||
4: 560x448, 4096 Colours, 1 layer (if bank 2 is not installed, mode change will not happen)
|
||
5: 560x448, 15-bit colour, 1 layer (if bank 2 is not installed, mode change will not happen)
|
||
8: 560x448, 24-bit colour, 1 layer (if bank 3 and 4 are not installed, mode change will not happen)
|
||
4096 is also known as "direct colour mode" (4096 colours * 16 transparency -> 65536 colours)
|
||
Two layers are grouped to make a frame, "low layer" contains RG colours and "high layer" has BA colours,
|
||
Red and Blue occupies MSBs
|
||
13 RW
|
||
Layer Arrangement
|
||
If 4 layers are used:
|
||
Num LO<->HI
|
||
0 1234
|
||
1 1243
|
||
2 1324
|
||
3 1342
|
||
4 1423
|
||
5 1432
|
||
6 2134
|
||
7 2143
|
||
8 2314
|
||
9 2341
|
||
10 2413
|
||
11 2431
|
||
12 3124
|
||
13 3142
|
||
14 3214
|
||
15 3241
|
||
16 3412
|
||
17 3421
|
||
18 4123
|
||
19 4132
|
||
20 4213
|
||
21 4231
|
||
22 4312
|
||
23 4321
|
||
If 2 layers are used:
|
||
Num LO<->HI
|
||
0 12
|
||
1 12
|
||
2 12
|
||
3 12
|
||
4 12
|
||
5 12
|
||
6 12
|
||
7 21
|
||
8 21
|
||
9 21
|
||
10 21
|
||
11 21
|
||
12 12
|
||
13 12
|
||
14 21
|
||
15 21
|
||
16 12
|
||
17 21
|
||
18 12
|
||
19 12
|
||
20 21
|
||
21 21
|
||
22 12
|
||
23 21
|
||
If 1 layer is used, this field will do nothing and always fall back to 0
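
The four-layer table appears to list the permutations of 1234 in lexicographic order, so it can be generated rather than stored. The sketch below covers only the four-layer case and uses illustrative names.

```python
from itertools import permutations

# permutations() of a sorted input yields tuples in lexicographic order,
# which matches the Num -> LO<->HI table for four layers.
LAYER_ORDERS = list(permutations((1, 2, 3, 4)))


def layer_order(num: int) -> tuple:
    """Return the low-to-high layer order for arrangement number num (0..23)."""
    return LAYER_ORDERS[num]
```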
|
||
14..15 RW
|
||
framebuffer scroll X
|
||
16..17 RW
|
||
framebuffer scroll Y
|
||
18 RO
|
||
Busy flags
|
||
1: Codec in-use
|
||
2: Draw Instructions being decoded
|
||
19 WO
|
||
Write non-zero value to initiate the Draw Instruction decoding
|
||
20..21 RO
|
||
Program Counter for the Draw Instruction decoding
|
||
1024..2047 RW
|
||
horizontal scroll offset for scanlines
|
||
2048..4095 RW
|
||
!!NEW!! Font ROM Mapping Area
|
||
Format is always 8x16 pixels, 1bpp ROM format (so that it would be YY_CHR-Compatible)
|
||
(designer's note: it's still useful to divide the char rom to two halves, lower half being characters ROM and upper half being symbols ROM)
|
||
65536..131071 RW
|
||
Draw Instructions
|
||
|
||
Text-mode-font-ROM is immutable and does not belong to VRAM
|
||
Even in text mode, the framebuffer is still drawn onto the screen, and the text is drawn on top of it
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM MOV file format
|
||
|
||
Endianness: Little
|
||
|
||
\x1F T S V M M O V
|
||
[METADATA]
|
||
[PACKET 0]
|
||
[PACKET 1]
|
||
[PACKET 2]
|
||
...
|
||
|
||
|
||
where:
|
||
|
||
METADATA -
|
||
uint16 WIDTH
|
||
uint16 HEIGHT
|
||
uint16 FPS (0: play as fast as can)
|
||
uint32 NUMBER OF FRAMES
|
||
uint16 UNUSED (fill with 255,0)
|
||
uint16 AUDIO QUEUE INFO
|
||
when read as little endian:
|
||
0b nnnn bbbb bbbb bbbb
|
||
[byte 21] [byte 20]
|
||
n: size of the queue (number of entries). Allocate at least 1 more entry than the number specified!
|
||
b: size of each entry in bytes DIVIDED BY FOUR (all zero = 16384; always 0x240 for MP2 because MP2-VBR is not supported)
|
||
|
||
n=0 indicates the video audio must be decoded on-the-fly instead of being queued, or has no audio packets
|
||
byte[10] RESERVED
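
A sketch of reading the magic and METADATA block with Python's struct module. The function name and returned keys are assumptions; the field order, sizes and the AUDIO QUEUE INFO bit layout follow the description above.

```python
import struct

MOV_MAGIC = b"\x1fTSVMMOV"
# WIDTH, HEIGHT, FPS, NUMBER OF FRAMES, UNUSED, AUDIO QUEUE INFO, RESERVED
MOV_METADATA = struct.Struct("<HHHIHH10s")


def parse_mov_header(data: bytes) -> dict:
    if data[:8] != MOV_MAGIC:
        raise ValueError("not a TSVM MOV file")
    width, height, fps, frames, _unused, audio_q, _reserved = \
        MOV_METADATA.unpack_from(data, 8)
    b = audio_q & 0x0FFF                  # entry size divided by four
    return {
        "width": width,
        "height": height,
        "fps": fps,                       # 0 means play as fast as possible
        "frames": frames,
        "queue_entries": audio_q >> 12,   # n; 0 means no queue
        "entry_bytes": 16384 if b == 0 else b * 4,
    }
```

With b = 0x240, as used for MP2, each queue entry comes out at 2304 bytes.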
|
||
|
||
|
||
Packet Types -
|
||
<video>
|
||
0,0: 256-Colour frame
|
||
1,0: 256-Colour frame with palette data
|
||
2,0: 4096-Colour frame (stored as two byte-planes)
|
||
4,t: iPF no-alpha indicator (see iPF Type Numbers for details)
|
||
5,t: iPF with alpha indicator (see iPF Type Numbers for details)
|
||
16,0: Series of JPEGs
|
||
18,0: Series of PNGs
|
||
20,0: Series of TGAs
|
||
21,0: Series of TGA/GZs
|
||
<audio>
|
||
0,16: Raw PCM Stereo
|
||
1,16: Raw PCM Mono
|
||
p,17: MP2, 32 kHz (see MP2 Format Details section for p-value)
|
||
q,18: ADPCM, 32 kHz (q = 2 * log_2(frameSize) + (1 if mono, 0 if stereo))
|
||
<special>
|
||
255,255: sync packet (wait until the next frame)
|
||
254,255: background colour packet
|
||
31,84 : prohibited
|
||
|
||
Packet Type High Byte (iPF Type Numbers)
|
||
0..7: iPF Type 1..8
|
||
|
||
- MP2 Format Details
|
||
Rate | 2ch | 1ch
  32 |   0 |   1
  48 |   2 |   3
  56 |   4 |   5
  64 |   6 |   7 (libtwolame does not allow bitrate lower than this on 32 kHz stereo)
  80 |   8 |   9
  96 |  10 |  11
 112 |  12 |  13
 128 |  14 |  15
 160 |  16 |  17
 192 |  18 |  19
 224 |  20 |  21
 256 |  22 |  23
 320 |  24 |  25
 384 |  26 |  27
|
||
Add 128 to the resulting number if the frame has a padding bit (should not happen on 32kHz sampling rate)
|
||
Special value of 255 may indicate some errors
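
The p-value table follows a regular pattern (twice the row index, plus one for mono), so it can be computed. A small sketch with illustrative names:

```python
# Bitrates in kbps, in the order they appear in the table above.
MP2_RATES = [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384]


def mp2_p_value(kbps: int, mono: bool, padding: bool = False) -> int:
    """Compute the packet type low byte 'p' for an MP2 frame."""
    p = 2 * MP2_RATES.index(kbps) + (1 if mono else 0)
    if padding:
        p += 128  # padding bit present
    return p
```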
|
||
|
||
To encode an audio to compliant format, use ffmpeg: ffmpeg -i <your_music> -acodec libtwolame -psymodel 4 -b:a <rate>k -ar 32000 <output.mp2>
|
||
Rationale:
|
||
-acodec libtwolame : ffmpeg has two mp2 encoders, and libtwolame produces vastly higher quality audio
|
||
-psymodel 4 : use alternative psychoacoustic model -- the default model (3) tends to insert "clunk" sounds throughout the audio
|
||
-b:a : 256k is recommended for high quality audio (trust me, you don't need 384k)
|
||
-ar 32000 : resample the audio to 32kHz, the sampling rate of the TSVM soundcard
|
||
|
||
TYPE 0 Packet -
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA
|
||
* COMPRESSED FRAMEDATA
|
||
|
||
TYPE 1 Packet -
|
||
byte[512] Palette Data
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA
|
||
* COMPRESSED FRAMEDATA
|
||
|
||
TYPE 2 Packet -
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
|
||
* COMPRESSED FRAMEDATA
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 2
|
||
* COMPRESSED FRAMEDATA
|
||
|
||
iPF Packet -
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA
|
||
* COMPRESSED FRAMEDATA // only the actual gzip (and no UNCOMPRESSED SIZE) of the "Blocks.gz" is stored
|
||
|
||
TYPE 3 Packet (Patch-encoded iPF 1 Packet) -
|
||
uint32 SIZE OF COMPRESSED PATCHES
|
||
* COMPRESSED PATCHES
|
||
|
||
PATCHES are a bunch of PATCHes concatenated
|
||
|
||
where each PATCH is encoded as:
|
||
|
||
uint8 X-coord of the patch (pixel position divided by four)
|
||
uint8 Y-coord of the patch (pixel position divided by four)
|
||
uint8 width of the patch (size divided by four)
|
||
uint8 height of the patch (size divided by four)
|
||
(calculating uncompressed size)
|
||
(iPF1 no alpha: width * height * 12)
|
||
(iPF1 with alpha: width * height * 20)
|
||
(iPF2 no alpha: width * height * 16)
|
||
(iPF2 with alpha: width * height * 24)
|
||
* UN-COMPRESSED PATCHDATA
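
The size of a PATCH's uncompressed data follows from its stored width and height, which are already in units of 4x4 blocks. A sketch with hypothetical names:

```python
# Bytes per 4x4 block, keyed by (iPF type, has alpha).
BYTES_PER_BLOCK = {
    (1, False): 12,
    (1, True): 20,
    (2, False): 16,
    (2, True): 24,
}


def patch_data_size(width: int, height: int,
                    ipf_type: int = 1, alpha: bool = False) -> int:
    """width and height are the stored values (pixel size divided by four)."""
    return width * height * BYTES_PER_BLOCK[(ipf_type, alpha)]
```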
|
||
|
||
|
||
TYPE 16+ Packet -
|
||
uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
|
||
* FRAMEDATA (COMPRESSED for TGA/GZ)
|
||
|
||
MP2 Packet & ADPCM Packet -
|
||
uint16 TYPE OF PACKET // follows the Metadata Packet Type scheme
|
||
* MP2 FRAME/ADPCM BLOCK
|
||
|
||
Sync Packet (subset of GLOBAL TYPE 255 Packet) -
|
||
uint16 0xFFFF (type of packet for Global Type 255)
|
||
|
||
Background Colour Packet -
|
||
uint16 0xFEFF
|
||
uint8 Red (0-255)
|
||
uint8 Green (0-255)
|
||
uint8 Blue (0-255)
|
||
uint8 0x00 (pad byte)
|
||
|
||
|
||
Frame Timing
|
||
If the global type is not 255, each packet is interpreted as a single full frame, and then will wait for the next
|
||
frame time. For type 255, however, the assumption no longer holds: each frame can have multiple packets, and thus
|
||
needs explicit "sync" packet for proper frame timing.
|
||
|
||
|
||
Compression Method
|
||
Old standard used Gzip, new standard is Zstd.
|
||
tsvm will read the zip header and will use appropriate decompression method, so that the old Gzipped
|
||
files remain compatible.
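
One way to tell the two formats apart is by their leading magic bytes. The sketch below is an assumption about how such detection could work, not a description of tsvm's actual code; the magic numbers themselves are the standard ones for gzip (1F 8B) and Zstandard frames (28 B5 2F FD).

```python
GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(payload: bytes) -> str:
    """Return 'gzip', 'zstd' or 'unknown' based on the first bytes."""
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"
```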
|
||
|
||
|
||
NOTE FROM DEVELOPER
|
||
In the future, the global packet type will be deprecated.
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM Interchangeable Picture Format (aka iPF Type 1/2)
|
||
|
||
Image is divided into 4x4 blocks and each block is serialised, then the entire iPF blocks are Zstd-compressed
|
||
|
||
|
||
# File Structure
|
||
\x1F T S V M i P F
|
||
[HEADER]
|
||
[Blocks]
|
||
|
||
- Header
|
||
uint16 WIDTH
|
||
uint16 HEIGHT
|
||
uint8 Flags
|
||
0b p00z 000a
|
||
- a: has alpha
|
||
- z: Zstd-compressed (p flag always sets this flag)
|
||
- p: progressive ordering (Adam7)
|
||
uint8 iPF Type/Colour Mode
|
||
0: Type 1 (4:2:0 chroma subsampling; 2048 colours?)
|
||
1: Type 2 (4:2:2 chroma subsampling; 2048 colours?)
|
||
byte[10] RESERVED
|
||
uint32 UNCOMPRESSED SIZE (somewhat redundant but included for convenience)
|
||
|
||
- Chroma Subsampled Blocks
|
||
Zstd-compressed unless the z-flag is not set.
|
||
4x4 pixels are sampled, then divided into YCoCg planes.
|
||
CoCg planes are "chroma subsampled" by 4:2:0, then quantised to 4 bits (8 bits for CoCg combined)
|
||
Y plane is quantised to 4 bits
|
||
|
||
By doing so, CoCg planes will reduce to 4 pixels
|
||
For the description of packing, pixels in Y/Cx plane will be numbered as:
|
||
Y0 Y1 Y2 Y3 || Cx1 Cx2 | Cx1 Cx2
|
||
Y4 Y5 Y6 Y7 || (iPF 1) | Cx3 Cx4
|
||
Y8 Y9 YA YB || Cx3 Cx4 | Cx5 Cx6
|
||
YC YD YE YF || (iPF 1) | Cx7 Cx8
|
||
|
||
Bits are packed like so:
|
||
|
||
iPF1:
|
||
uint16 [Co4 | Co3 | Co2 | Co1]
|
||
uint16 [Cg4 | Cg3 | Cg2 | Cg1]
|
||
uint16 [Y1 | Y0 | Y5 | Y4]
|
||
uint16 [Y3 | Y2 | Y7 | Y6]
|
||
uint16 [Y9 | Y8 | YD | YC]
|
||
uint16 [YB | YA | YF | YE]
|
||
(total: 12 bytes)
|
||
|
||
iPF2:
|
||
uint32 [Co8 | Co7 | Co6 | Co5 | Co4 | Co3 | Co2 | Co1]
|
||
uint32 [Cg8 | Cg7 | Cg6 | Cg5 | Cg4 | Cg3 | Cg2 | Cg1]
|
||
uint16 [Y1 | Y0 | Y5 | Y4]
|
||
uint16 [Y3 | Y2 | Y7 | Y6]
|
||
uint16 [Y9 | Y8 | YD | YC]
|
||
uint16 [YB | YA | YF | YE]
|
||
(total: 16 bytes)
|
||
|
||
If has alpha, append following bytes for alpha values
|
||
uint16 [a1 | a0 | a5 | a4]
|
||
uint16 [a3 | a2 | a7 | a6]
|
||
uint16 [a9 | a8 | aD | aC]
|
||
uint16 [aB | aA | aF | aE]
|
||
(total: 20/24 bytes)
|
||
|
||
Subsampling mask:
|
||
|
||
Least significant byte for top-left, most significant for bottom-right
|
||
For example, this default pattern
|
||
|
||
00 00 01 01
|
||
00 00 01 01
|
||
10 10 11 11
|
||
10 10 11 11
|
||
|
||
turns into:
|
||
|
||
01010000 -> 0x50
01010000 -> 0x50
11111010 -> 0xFA
11111010 -> 0xFA

which packs into: [ 50 | 50 | FA | FA ] (because little endian)
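
A minimal sketch of the packing rule, assuming each cell is a 2-bit value and the leftmost cell of a row goes into the least significant bits of that row's byte. Under that assumption the row 00 00 01 01 evaluates to binary 01010000 (0x50) and 10 10 11 11 to 11111010 (0xFA). Function names are illustrative.

```python
def pack_mask_row(row: list) -> int:
    """Pack four 2-bit cells into one byte, leftmost cell in the low bits."""
    byte = 0
    for i, cell in enumerate(row):
        byte |= (cell & 0b11) << (2 * i)
    return byte


def pack_mask(rows: list) -> bytes:
    """Pack a 4x4 mask, top row first."""
    return bytes(pack_mask_row(row) for row in rows)


DEFAULT_MASK = [
    [0b00, 0b00, 0b01, 0b01],
    [0b00, 0b00, 0b01, 0b01],
    [0b10, 0b10, 0b11, 0b11],
    [0b10, 0b10, 0b11, 0b11],
]
```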
|
||
|
||
iPF1-delta (for video encoding):
|
||
|
||
Delta encoded frames contain "instructions" for patch-encoding the existing frame.
|
||
Or, a collection of [StateChangeCode] [Optional VarInts] [Payload...] pairs
|
||
|
||
States:
|
||
0x00 SKIP [varint skipCount]
|
||
0x01 PATCH [varint blockCount] [12x blockCount bytes]
|
||
0x02 REPEAT [varint repeatCount] [a block]
|
||
0xFF END
|
||
|
||
Sample stream:
|
||
[SKIP 10] [PATCH A] [REPEAT 3] [SKIP 5] [PATCH B] [END]
|
||
|
||
Delta block format:
|
||
|
||
Each PATCH delta payload is still:
|
||
8 bytes of Luma (4-bit deltas for 16 pixels)
|
||
2 bytes of Co deltas (4× 4-bit deltas)
|
||
2 bytes of Cg deltas (4× 4-bit deltas)
|
||
Total: 12 bytes per PATCH.
|
||
|
||
These are always relative to the same-position block in the previous frame.
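
The state stream can be walked with a small parser. The text does not specify the varint encoding or the size of the REPEAT block, so this sketch assumes unsigned LEB128 varints and a 12-byte block for REPEAT; both are assumptions, as are the function names.

```python
def read_varint(buf: bytes, pos: int) -> tuple:
    """Assumed encoding: unsigned LEB128. Returns (value, new position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_delta_stream(buf: bytes) -> list:
    """Split a delta frame into (state, count, payload) tuples."""
    ops = []
    pos = 0
    while True:
        code = buf[pos]
        pos += 1
        if code == 0xFF:                       # END
            ops.append(("END",))
            return ops
        count, pos = read_varint(buf, pos)
        if code == 0x00:                       # SKIP
            ops.append(("SKIP", count))
        elif code == 0x01:                     # PATCH: 12 bytes per block
            size = 12 * count
            ops.append(("PATCH", count, bytes(buf[pos:pos + size])))
            pos += size
        elif code == 0x02:                     # REPEAT: one block (assumed 12 bytes)
            ops.append(("REPEAT", count, bytes(buf[pos:pos + 12])))
            pos += 12
        else:
            raise ValueError(f"unknown state code {code:#04x}")
```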
|
||
|
||
|
||
|
||
- Progressive Blocks
|
||
Ordered string of words (word size varies by the colour mode) are stored here.
|
||
If progressive mode is enabled, words are stored in the order that accommodates it.
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM Enhanced Video (TEV) Format
|
||
Created by CuriousTorvald and Claude on 2025-08-17
|
||
|
||
TEV is a modern video codec optimized for TSVM's 4096-color hardware, featuring
|
||
DCT-based compression, optional motion compensation, and efficient temporal coding.
|
||
|
||
## Version History
|
||
- Version 2.0: YCoCg-R 4:2:0 with 16x16/8x8 DCT blocks
|
||
- Version 2.1: Added Rate Control Factor to all video packets (breaking change)
|
||
* Enables bitrate-constrained encoding alongside quality modes
|
||
* All video frames now include 4-byte rate control factor after payload size
|
||
- Version 3.0: Additional support of ICtCp Colour space
|
||
|
||
# File Structure
|
||
\x1F T S V M T E V (if video), \x1F T S V M T E P (if still picture)
|
||
[HEADER]
|
||
[PACKET 0]
|
||
[PACKET 1]
|
||
[PACKET 2]
|
||
...
|
||
|
||
## Header (24 bytes)
|
||
uint8 Magic[8]: "\x1FTSVMTEV" or "\x1FTSVMTEP"
|
||
uint8 Version: 2 (YCoCg-R) or 3 (ICtCp)
|
||
uint16 Width: video width in pixels
|
||
uint16 Height: video height in pixels
|
||
uint8 FPS: frames per second
|
||
uint32 Total Frames: number of video frames
|
||
uint8 Quality Index for Y channel (0-99; 100 denotes that all quantisers are 1)
uint8 Quality Index for Co channel (0-99; 100 denotes that all quantisers are 1)
uint8 Quality Index for Cg channel (0-99; 100 denotes that all quantisers are 1)
|
||
uint8 Extra Feature Flags
|
||
- bit 0 = has audio
|
||
- bit 1 = has subtitle
|
||
- bit 2 = infinite loop (must be ignored when File Role is 1)
|
||
- bit 7 = has no actual packets, this file is header-only without an Intro Movie
|
||
uint8 Video Flags
|
||
- bit 0 = is interlaced (should be default for most non-archival TEV videos)
|
||
- bit 1 = is NTSC framerate (repeat every 1000th frame)
|
||
uint8 File Role
|
||
- 0 = generic
|
||
- 1 = this file is header-only, and UCF payload will be followed (used by seekable movie file)
|
||
When a header-only file contains video packets, they should be presented as an Intro Movie
before the user-interactable selector (served by the UCF payload)
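
The 24-byte header maps onto a single struct layout. The sketch assumes little-endian multi-byte fields (as in the other TSVM formats; this section does not state it) and uses made-up key names.

```python
import struct

# Magic, Version, Width, Height, FPS, Total Frames,
# Quality Y/Co/Cg, Extra Feature Flags, Video Flags, File Role
TEV_HEADER = struct.Struct("<8sBHHBIBBBBBB")


def parse_tev_header(data: bytes) -> dict:
    (magic, version, width, height, fps, total_frames,
     q_y, q_co, q_cg, extra, video, role) = TEV_HEADER.unpack_from(data)
    if magic not in (b"\x1fTSVMTEV", b"\x1fTSVMTEP"):
        raise ValueError("not a TEV file")
    return {
        "still_picture": magic.endswith(b"P"),
        "version": version,
        "width": width,
        "height": height,
        "fps": fps,
        "total_frames": total_frames,
        "quality": (q_y, q_co, q_cg),
        "has_audio": bool(extra & 0x01),
        "has_subtitle": bool(extra & 0x02),
        "loop": bool(extra & 0x04),
        "header_only": bool(extra & 0x80),
        "interlaced": bool(video & 0x01),
        "ntsc": bool(video & 0x02),
        "file_role": role,
    }
```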
|
||
|
||
## Packet Types
|
||
0x10: I-frame (intra-coded frame)
|
||
0x11: P-frame (predicted frame)
|
||
0x1F: prohibited
|
||
0x20: MP2 audio packet
|
||
0x30: Subtitle in "Simple" format
|
||
0x31: Subtitle in "Karaoke" format
|
||
0xE0: EXIF packet
|
||
0xE1: ID3v1 packet
|
||
0xE2: ID3v2 packet
|
||
0xE3: Vorbis Comment packet
|
||
0xE4: CD-text packet
|
||
0xFF: sync packet
|
||
|
||
## Standard metadata payload packet structure
|
||
uint8 0xE0/0xE1/0xE2/.../0xEF (see Packet Types section)
|
||
uint32 Length of the payload
|
||
* Standard payload
|
||
|
||
note: metadata packets must precede any non-metadata packets
|
||
|
||
## Video Packet Structure
|
||
uint8 Packet Type
|
||
uint32 Compressed Size
|
||
* Zstd-compressed Block Data
|
||
|
||
## Block Data (per 16x16 block)
|
||
uint8 Mode: encoding mode
|
||
0x00 = SKIP (copy from previous frame)
|
||
0x01 = INTRA (DCT-coded, no prediction)
|
||
0x02 = INTER (DCT-coded with motion compensation) -- currently unused due to bugs
|
||
0x03 = MOTION (motion vector only)
|
||
int16 Motion Vector X ("capable of" 1/4 pixel precision, integer precision for now)
|
||
int16 Motion Vector Y ("capable of" 1/4 pixel precision, integer precision for now)
|
||
float32 Rate Control Factor (4 bytes, little-endian)
|
||
uint16 Coded Block Pattern (which 8x8 have non-zero coeffs)
|
||
int16[256] DCT Coefficients Y
|
||
int16[64] DCT Coefficients Co (subsampled by two)
|
||
int16[64] DCT Coefficients Cg (subsampled by two, aggressively quantised)
|
||
For SKIP and MOTION mode, DCT coefficients are filled with zero
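
The per-block record has a fixed size, which gives the uncompressed payload size of a frame. The sketch assumes little-endian fields throughout (only the float is stated as such) and that frame dimensions are rounded up to whole 16x16 blocks; names are illustrative.

```python
import struct

# Mode, MV X, MV Y, Rate Control Factor, Coded Block Pattern,
# then 256 Y, 64 Co and 64 Cg coefficients.
TEV_BLOCK = struct.Struct("<BhhfH256h64h64h")


def block_record_size() -> int:
    return TEV_BLOCK.size


def frame_payload_size(width: int, height: int) -> int:
    """Uncompressed block data size for one frame (assumes rounding up)."""
    blocks_x = (width + 15) // 16
    blocks_y = (height + 15) // 16
    return blocks_x * blocks_y * TEV_BLOCK.size
```

For a 560x448 frame this works out to 980 blocks before Zstd compression.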
|
||
|
||
## DCT Quantisation and Rate Control
|
||
TEV uses 5 quality levels (0=lowest, 4=highest) with progressive quantisation
|
||
tables optimized for perceptual quality. DC coefficients are encoded losslessly,
|
||
while AC coefficients are quantised according to quality tables.
|
||
|
||
### Rate Control Factor
|
||
Each block includes a Rate Control Factor that modifies quality level for that specific block.
|
||
This feature allows more efficient coding by allowing higher quality for complex blocks and lower quality for
|
||
flat blocks.
|
||
|
||
## Motion Compensation
|
||
- Search range: ±8 pixels
|
||
- Sub-pixel precision: 1/4 pixel (again, integer precision for now)
|
||
- Block size: 16x16 pixels
|
||
- Uses Sum of Absolute Differences (SAD) for motion estimation
|
||
- Bilinear interpolation for sub-pixel motion vectors
|
||
|
||
## Colour Space
|
||
TEV operates in 8-Bit colour mode, colour space conversion required
|
||
|
||
## Compression Features
|
||
- 16x16 DCT blocks (vs 4x4 in iPF)
|
||
- Temporal prediction with motion compensation
|
||
- Rate-distortion optimized mode selection
|
||
- Hardware-accelerated encoding/decoding functions
|
||
|
||
## Performance Comparison
|
||
TEV achieves 60-80% better compression than iPF formats while maintaining
|
||
equivalent visual quality, with significantly faster decode performance due
|
||
to larger block sizes and hardware acceleration.
|
||
|
||
## Audio Support
|
||
Reuses existing MP2 audio infrastructure from TSVM MOV format for seamless
|
||
compatibility with existing audio processing pipeline.
|
||
|
||
## NTSC Framerate handling
|
||
The encoder encodes the frames as-is. The decoder must duplicate every 1000th frame to keep the decoding
|
||
in-sync.
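
A sketch of the decoder-side duplication, assuming "every 1000th frame" means frames 1000, 2000, ... when counting from 1. The generator name is hypothetical.

```python
def presentation_order(num_frames: int):
    """Yield decoded frame indices, showing every 1000th frame twice."""
    for i in range(num_frames):
        yield i
        if (i + 1) % 1000 == 0:
            yield i  # duplicate to stretch 1000 frames over 1001 slots
```

With a nominal 30 fps this stretches 1000 encoded frames over 1001 display slots, i.e. an effective source rate of 30 x 1000/1001, about 29.97 fps.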
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
Simple Subtitle Format (SSF)
|
||
|
||
SSF is a simple subtitle format intended to display text using the text buffer.
|
||
The format is designed to be compatible with SubRip and SAMI (without markups) and interoperable with
|
||
TEV and TAV formats.
|
||
|
||
SSF-TC is an SSF with extra timecode so that subtitle packets can be desynchronised with video frames
|
||
on encoding.
|
||
|
||
When SSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.
|
||
|
||
## Packet Structure
|
||
uint8 0x30/0x31 (SSF/SSF-TC)
|
||
uint32 Packet Size
|
||
* SSF Payload (see below)
|
||
|
||
## SSF Packet Structure
|
||
uint24 Subtitle object ID (used to specify target subtitle object)
|
||
uint64 Timecode in nanoseconds (only present on SSF-TC format; regular SSF must not write these bytes)
|
||
uint8 opcode
|
||
0x00 = <argument terminator>, is NOP when used here
|
||
0x01 = show (arguments: UTF-8 text)
|
||
0x02 = hide (arguments: none)
|
||
0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)
|
||
0x10..0x2F = show in alternative languages (arguments: char[5] language code, UTF-8 text)
|
||
0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
|
||
0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
|
||
note: changing the font rom will change the appearance of every subtitle currently being displayed
|
||
* arguments separated AND terminated by 0x00
|
||
text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
|
||
leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.
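
A minimal sketch of reading an SSF payload, handling only the "show" opcode's text argument. It assumes little-endian integers (not stated in this section) and takes the SSF-TC distinction as a caller-supplied flag; the function name and keys are illustrative.

```python
def parse_ssf(payload: bytes, has_timecode: bool = False) -> dict:
    """Parse the object ID, optional timecode, opcode and 'show' text."""
    obj_id = int.from_bytes(payload[0:3], "little")   # uint24
    pos = 3
    timecode = None
    if has_timecode:                                  # SSF-TC only
        timecode = int.from_bytes(payload[pos:pos + 8], "little")
        pos += 8
    opcode = payload[pos]
    pos += 1
    text = None
    if opcode == 0x01:                                # show
        end = payload.find(b"\x00", pos)              # text ends at first 0x00
        if end < 0:
            end = len(payload)
        text = payload[pos:end].decode("utf-8")
    return {"id": obj_id, "timecode": timecode, "opcode": opcode, "text": text}
```

Stopping at the first 0x00 means any extra trailing zeros are simply ignored, which is one way to satisfy the requirement above.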
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
Karaoke Subtitle Format (KSF)
|
||
|
||
KSF is a frame-synced subtitle format intended for karaoke-style subtitles.
|
||
The format is designed to be interoperable with TEV and TAV formats.
|
||
For non-karaoke style synced lyrics, use SSF.
|
||
|
||
KSF-TC is a KSF with extra timecode so that subtitle packets can be desynchronised with video frames
|
||
on encoding.
|
||
|
||
When KSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.
|
||
|
||
## Packet Structure
|
||
uint8 0x32/0x33 (KSF/KSF-TC)
|
||
* KSF Payload (see below)
|
||
|
||
### KSF Packet Structure
|
||
KSF is line-based: you define an unrevealed line, then subsequent commands reveal words/syllables
|
||
on appropriate timings.
|
||
|
||
uint24 Subtitle object ID (used to specify target subtitle object)
|
||
uint64 Timecode in nanoseconds (only present on KSF-TC format; regular KSF must not write these bytes)
|
||
uint8 opcode
|
||
<definition opcodes>
|
||
0x00 = <argument terminator>, is NOP when used here
|
||
0x01 = define line (arguments: UTF-8 text. Players will also show it in grey)
|
||
0x02 = delete line (arguments: none)
|
||
0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)
|
||
|
||
<reveal opcodes>
|
||
0x30 = reveal text normally (arguments: UTF-8 text. The reveal text must contain spaces when required)
|
||
0x31 = reveal text slowly (arguments: UTF-8 text. The effect is implementation-dependent)
|
||
|
||
0x40 = reveal text normally with emphasis (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
|
||
0x41 = reveal text slowly with emphasis (arguments: UTF-8 text)
|
||
|
||
0x50 = reveal text normally with target colour (arguments: uint8 target colour; UTF-8 text)
|
||
0x51 = reveal text slowly with target colour (arguments: uint8 target colour; UTF-8 text)
|
||
|
||
<hardware control opcodes>
|
||
0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
|
||
0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
|
||
note: changing the font rom will change the appearance of every subtitle currently being displayed
|
||
* arguments separated AND terminated by 0x00
|
||
text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
|
||
leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.
|
||
|
||
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM Advanced Video (TAV) Format
|
||
Created by CuriousTorvald and Claude on 2025-09-13
|
||
|
||
TAV is a next-generation video codec for TSVM utilising Discrete Wavelet Transform (DWT)
|
||
similar to JPEG2000, providing superior compression efficiency and scalability compared
|
||
to DCT-based codecs like TEV. Features include multi-resolution encoding, progressive
|
||
transmission capability, and region-of-interest coding.
|
||
|
||
# File Structure
|
||
\x1F T S V M T A V (if video), \x1F T S V M T A P (if still picture)
|
||
[HEADER]
|
||
[PACKET 0]
|
||
[PACKET 1]
|
||
[PACKET 2]
|
||
...
|
||
|
||
## Header (32 bytes)
|
||
uint8 Magic[8]: "\x1FTSVMTAV" or "\x1FTSVMTAP"
|
||
uint8 Version:
|
||
Base version number:
|
||
- 1 = YCoCg-R multi-tile uniform
|
||
- 2 = ICtCp multi-tile uniform
|
||
- 3 = YCoCg-R monoblock uniform
|
||
- 4 = ICtCp monoblock uniform
|
||
- 5 = YCoCg-R monoblock perceptual
|
||
- 6 = ICtCp monoblock perceptual
|
||
- 7 = YCoCg-R multi-tile perceptual
|
||
- 8 = ICtCp multi-tile perceptual
|
||
When motion coder is Haar, take base version number.
|
||
When motion coder is CDF 5/3, add 8 to the base version number.
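
The Version byte therefore encodes four properties at once. A decoding sketch, with an illustrative function name and key names:

```python
# Base version number -> (colour space, tiling, quantisation)
TAV_BASE_VERSIONS = {
    1: ("YCoCg-R", "multi-tile", "uniform"),
    2: ("ICtCp", "multi-tile", "uniform"),
    3: ("YCoCg-R", "monoblock", "uniform"),
    4: ("ICtCp", "monoblock", "uniform"),
    5: ("YCoCg-R", "monoblock", "perceptual"),
    6: ("ICtCp", "monoblock", "perceptual"),
    7: ("YCoCg-R", "multi-tile", "perceptual"),
    8: ("ICtCp", "multi-tile", "perceptual"),
}


def decode_tav_version(version: int) -> dict:
    """Split the Version byte into base version and motion coder."""
    if not 1 <= version <= 16:
        raise ValueError("unknown TAV version")
    base = version if version <= 8 else version - 8
    colour_space, tiling, quantisation = TAV_BASE_VERSIONS[base]
    return {
        "colour_space": colour_space,
        "tiling": tiling,
        "quantisation": quantisation,
        "motion_coder": "Haar" if version <= 8 else "CDF 5/3",
    }
```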
|
||
uint16 Width: picture width in pixels. Columns count for Videotex-only file.
|
||
uint16 Height: picture height in pixels. Rows count for Videotex-only file.
|
||
If either width or height exceeds 65535 pixels, both fields above must be filled with zero and the dimensions must be sourced from the XDIM entry of the Extended Header
|
||
uint8 FPS: frames per second. Use 0x00 for still pictures
|
||
If FPS is greater than 254 or fractional (excl. NTSC), the value must be 0xFF and the true framerate must be sourced from the XFPS entry of the Extended Header
|
||
uint32 Total Frames: number of video frames
|
||
- use 0 to denote not-finalised video stream
|
||
- use 0xFFFFFFFF to denote still picture (.im3 file)
|
||
uint8 Wavelet Filter Type:
|
||
- 0 = 5/3 reversible (LGT 5/3, JPEG 2000 standard)
|
||
- 1 = 9/7 irreversible (CDF 9/7, slight modification of JPEG 2000, default choice)
|
||
- 2 = CDF 13/7 (experimental)
|
||
- 16 = DD-4 (Four-point interpolating Deslauriers-Dubuc; experimental)
|
||
- 255 = Haar (demonstration purpose only)
|
||
uint8 Decomposition Levels: number of DWT levels (1-6+; use 0 if it has no video or Videotex only)
|
||
uint8 Quantiser Index for Y channel (uses exponential numeric system; 0: lossless, 255: potato)
|
||
uint8 Quantiser Index for Co channel (uses exponential numeric system; 0: lossless, 255: potato)
|
||
uint8 Quantiser Index for Cg channel (uses exponential numeric system; 0: lossless, 255: potato)
|
||
uint8 Extra Feature Flags
|
||
- bit 0 = has audio (for still pictures: has background music)
|
||
- bit 1 = has subtitle (for still pictures: has timed captions)
|
||
- bit 2 = infinite loop (has no effect for still pictures)
|
||
- bit 7 = has no actual packets, this file is header-only without an Intro Movie
|
||
uint8 Video Flags
|
||
- bit 0 = interlaced
|
||
- bit 1 = is NTSC framerate
|
||
- bit 2 = is lossless mode
|
||
- bit 3 = has region-of-interest coding (for still pictures only)
|
||
- bit 4 = no Zstd compression
|
||
- bit 7 = has no video
|
||
uint8 Encoder quality level (stored with bias of 1 (q0=1); used to derive anisotropy value)
|
||
uint8 Channel layout (bit-field: bit 0=has alpha, bit 1=has chroma inverted, bit 2=has luma inverted)
|
||
* Luma-only videos must be decoded with fixed Chroma=0
|
||
* Chroma-only videos must be decoded with fixed Luma=127
|
||
* No-alpha videos must be decoded with fixed Alpha=255
|
||
- 0 = Y-Co-Cg/I-Ct-Cp (000: no alpha, has chroma, has luma)
|
||
- 1 = Y-Co-Cg-A/I-Ct-Cp-A (001: has alpha, has chroma, has luma)
|
||
- 2 = Y/I only (010: no alpha, no chroma, has luma)
|
||
- 3 = Y-A/I-A (011: has alpha, no chroma, has luma)
|
||
- 4 = Co-Cg/Ct-Cp (100: no alpha, has chroma, no luma)
|
||
- 5 = Co-Cg-A/Ct-Cp-A (101: has alpha, has chroma, no luma)
|
||
- 6-7 = Reserved/invalid (would indicate no luma and no chroma)
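The bit-field above can be unpacked as in the following minimal sketch. The function name and the returned dictionary keys are hypothetical; only the bit meanings and the fixed fallback values (Chroma=0, Luma=127, Alpha=255) come from the list above.

```python
def decode_channel_layout(layout):
    """Decode the Channel layout byte (hypothetical helper, not the reference code)."""
    if (layout & 0b110) == 0b110:
        # 6-7: neither luma nor chroma present, listed as reserved/invalid
        raise ValueError("reserved channel layout")
    return {
        "has_alpha": bool(layout & 0b001),
        "has_chroma": not (layout & 0b010),   # bit 1 is "chroma inverted"
        "has_luma": not (layout & 0b100),     # bit 2 is "luma inverted"
        # Fixed values a decoder substitutes for absent channels
        "fallback": {"luma": 127, "chroma": 0, "alpha": 255},
    }
```

For example, `decode_channel_layout(3)` describes a Y-A/I-A stream: alpha and luma present, chroma absent.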
|
||
uint8 Entropy Coder
|
||
- 0 = Twobit-plane significance map (deprecated)
|
||
- 1 = Embedded Zero Block Coding
|
||
- 2 = Raw coefficients (debugging purpose only)
|
||
uint8 Encoder Preset
|
||
- Bit 0 = use finer motion (finer temporal quantisation)
|
||
- Bit 1 = reduce grain synthesis
|
||
Preset "Default" -> 0x00
|
||
Preset "Sports" -> 0x01
|
||
Preset "Anime" -> 0x02
|
||
NOTE: not all presets have preset flags. See Preset section for details.
|
||
uint8 Reserved[1]: fill with zeros
|
||
uint8 Device Orientation
|
||
- 0 = No rotation
|
||
- 1 = Clockwise 90 deg
|
||
- 2 = 180 deg
|
||
- 3 = Clockwise 270 deg
|
||
- 4 = Mirrored, No rotation
|
||
- 5 = Mirrored, Clockwise 90 deg
|
||
- 6 = Mirrored, 180 deg
|
||
- 7 = Mirrored, Clockwise 270 deg
|
||
uint8 File Role
|
||
- 0 = generic
|
||
- 1 = this file is header-only, and UCF payload will be followed (used by movie file with chapters)
|
||
When a header-only file contains video packets, they should be presented as an Intro Movie
|
||
before the user-interactable selector (served by the UCF payload)
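As an illustration of the 32-byte header layout above, here is a minimal parsing sketch. It assumes the fields are packed without padding in the listed order and stored little-endian (the header section does not state byte order, so this is an assumption). The function name and dictionary keys are hypothetical.

```python
import struct

# 8-byte magic, version, width, height, fps, total frames, then 14 single-byte fields
TAV_HEADER = struct.Struct("<8sBHHBI14B")   # assumed little-endian; 32 bytes in total

def parse_tav_header(buf):
    """Parse a TAV/TAP header into a dict (illustrative sketch only)."""
    fields = TAV_HEADER.unpack_from(buf)
    magic, version, width, height, fps, total_frames = fields[:6]
    (wavelet, levels, q_y, q_co, q_cg, extra_flags, video_flags, quality,
     layout, entropy, preset, _reserved, orientation, role) = fields[6:]
    if magic not in (b"\x1fTSVMTAV", b"\x1fTSVMTAP"):
        raise ValueError("not a TAV/TAP file")
    if not 1 <= version <= 16:
        raise ValueError("unknown version")
    return {
        "still_picture": magic.endswith(b"P"),
        "base_version": (version - 1) % 8 + 1,             # 1..8 as tabulated above
        "motion_coder": "CDF 5/3" if version > 8 else "Haar",
        "width": width, "height": height, "fps": fps,
        "total_frames": total_frames,
        "wavelet": wavelet, "decomp_levels": levels,
        "quantiser_indices": (q_y, q_co, q_cg),
        "has_audio": bool(extra_flags & 0x01),
        "has_subtitle": bool(extra_flags & 0x02),
        "loop": bool(extra_flags & 0x04),
        "interlaced": bool(video_flags & 0x01),
        "ntsc_framerate": bool(video_flags & 0x02),
        "quality": quality, "channel_layout": layout,
        "entropy_coder": entropy, "preset": preset,
        "orientation": orientation, "file_role": role,
    }
```

A caller would still need to consult XDIM and XFPS in the Extended Header when width/height are zero or FPS is 0xFF, as described above.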
|
||
|
||
### Presets
|
||
The encoder supports the following presets:
|
||
- Sports: use finer temporal quantisation, resulting in better-preserved motion. Less effective as resolution goes up
|
||
- Anime: instructs the decoder to disable grain synthesis
|
||
|
||
2025-12-08 Addendum: TAV-DT should be its own encoder, not preset
|
||
- D1/D1PAL: encode to TAV-DT (NTSC/PAL) format. Any non-compliant setting will be ignored and replaced with a compliant value
|
||
- D1P/D1PALP: encode to TAV-DT Progressive (NTSC/PAL) format. Any non-compliant setting will be ignored and replaced with a compliant value
|
||
|
||
## Packet Structure (some special packets have no payload. See Packet Types for details)
|
||
uint8 Packet Type
|
||
uint32 Payload Size
|
||
* Payload
|
||
|
||
## Packet Types
|
||
<video packets>
|
||
0x10: I-frame (intra-coded frame)
|
||
0x11: P-frame (delta/skip frame)
|
||
0x12: GOP Unified (temporal 3D DWT with unified preprocessing)
|
||
0x1F: (prohibited)
|
||
<audio packets>
|
||
0x20: MP2 audio packet (32 KHz)
|
||
0x21: Zstd-compressed 8-bit PCM (32 KHz, audio hardware's native format)
|
||
0x22: Zstd-compressed 16-bit PCM (32 KHz, little endian)
|
||
0x23: Zstd-compressed ADPCM (32 KHz)
|
||
0x24: TAD (TSVM Advanced Audio)
|
||
<subtitles>
|
||
0x30: Subtitle in "Simple" format
|
||
0x31: Subtitle in "Simple" format with timecodes
|
||
0x32: Subtitle in "Karaoke" format
|
||
0x33: Subtitle in "Karaoke" format with timecodes
|
||
0x3F: Videotex (full-frame text buffer memory image)
|
||
<synchronised tracks>
|
||
0x40: MP2 audio track (32 KHz)
|
||
0x41: Zstd-compressed 8-bit PCM (32 KHz, audio hardware's native format)
|
||
0x42: Zstd-compressed 16-bit PCM (32 KHz, little endian)
|
||
0x43: Zstd-compressed ADPCM (32 KHz)
|
||
0x44: TAD (TSVM Advanced Audio)
|
||
<multiplexed video>
|
||
0x70..7F: Reserved for Future Version
|
||
<Standard metadata payloads>
|
||
(it's called "standard" because you're expected to just copy-paste the metadata bytes verbatim)
|
||
0xE0: EXIF packet
|
||
0xE1: ID3v1 packet
|
||
0xE2: ID3v2 packet
|
||
0xE3: Vorbis Comment packet
|
||
0xE4: CD-text packet
|
||
<Extensible>
|
||
0x01: Vendor-specific video packets
|
||
0x02: Vendor-specific audio frame
|
||
0x03: Vendor-specific subtitle
|
||
0x04: Vendor-specific audio file
|
||
0x0E: Vendor-specific metadata
|
||
<Special packets>
|
||
0x00: No-op (no payload)
|
||
0xEF: TAV Extended Header
|
||
0xF0: Loop point start (insert right AFTER the TC packet; no payload)
|
||
0xF1: Loop point end (insert right AFTER the TC packet; no payload)
|
||
0xF2: Screen masking info
|
||
0xFC: GOP Sync packet (indicates N frames decoded from GOP block)
|
||
0xFD: Timecode (TC) Packet [for frame 0, insert at the beginning; otherwise, insert right AFTER the sync]
|
||
0xFE: NTSC sync packet (used by player to calculate exact framerate-wise performance; no payload)
|
||
0xFF: Sync packet (no payload)
|
||
|
||
### Packet Precedence
|
||
|
||
Before the first frame group:
|
||
1. TAV Extended header (if any)
|
||
2. Standard metadata payloads (if any)
|
||
3. SSF-TC/KSF-TC packets (if any)
|
||
When time-coded subtitles are used, the entire subtitles must precede the first video frame.
|
||
Think of it as tacking the whole subtitle file before the actual video.
|
||
4. Screen Masking packets (if any)
|
||
|
||
Frame group:
|
||
1. Timecode Packet (0xFD) or Next TAV File (0x1F) [mutually exclusive!]
|
||
2. Loop point packet (if any)
|
||
3. Audio packets (if any)
|
||
4. Subtitle packets (if any) [mutually exclusive with SSF-TC/KSF-TC packets]
|
||
5. Main video packets (0x10-0x1E)
|
||
6. Multiplexed video packets (0x70-7F; if any)
|
||
|
||
After a frame group:
|
||
1. Sync packet (0xFC or 0xFF)
|
||
2. NTSC Sync packet (if required; it will instruct players to duplicate the current frame)
|
||
|
||
|
||
## TAV Extended Header Specification and Structure
|
||
uint8 Packet Type (0xEF)
|
||
uint16 Number of Key-Value pairs
|
||
* Key-Value pairs
|
||
|
||
### Key-Value Pair
|
||
uint8 Key[4]
|
||
uint8 Value Type
|
||
- 0x00: (U)Int16
|
||
- 0x01: (U)Int24
|
||
- 0x02: (U)Int32
|
||
- 0x03: (U)Int48
|
||
- 0x04: (U)Int64
|
||
- 0x10: Bytes
|
||
<if Value Type is Bytes>
|
||
uint16 Length of bytes
|
||
* Bytes
|
||
<else>
|
||
type_t Value
|
||
<fi>
|
||
|
||
### List of Keys
|
||
- Uint64 BGNT: Video begin time in nanoseconds (must be equal to the value of the first Timecode packet)
|
||
- Uint64 ENDT: Video end time in nanoseconds (must be equal to BGNT + playback time)
|
||
- Uint64 CDAT: Creation time in microseconds since UNIX Epoch (must be in UTC timezone)
|
||
- Bytes VNDR: Name and version of the encoder (for Reference encoder: "Encoder-TAV 20251014 (list,of,features)")
|
||
- Bytes FMPG: FFmpeg version (typically "ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers"; the first line of text FFmpeg emits)
|
||
- Bytes XDIM: Video dimension in '<width>,<height>' format. Mandatory if either width or height exceeds 65535
|
||
- Bytes XFPS: Framerate in '<numerator>/<denominator>' format. Mandatory if either:
|
||
1. FPS exceeds 254
|
||
2. denominator is not 1 or 1001
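The key-value layout of the Extended Header can be walked as in the sketch below. It starts at the byte following the 0xEF packet type. Little-endian byte order and unsigned interpretation of the "(U)Int" types are assumptions, and the function name is hypothetical.

```python
import struct

# Value Type code -> size in bytes for the fixed-width integer types
INT_SIZES = {0x00: 2, 0x01: 3, 0x02: 4, 0x03: 6, 0x04: 8}

def parse_extended_header(buf, pos=0):
    """Return ({key: value}, end_position). Integers are read as unsigned (assumption)."""
    (count,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    pairs = {}
    for _ in range(count):
        key = buf[pos:pos + 4].decode("ascii")
        vtype = buf[pos + 4]
        pos += 5
        if vtype == 0x10:                       # Bytes: uint16 length, then payload
            (length,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            value = bytes(buf[pos:pos + length])
            pos += length
        elif vtype in INT_SIZES:
            size = INT_SIZES[vtype]
            value = int.from_bytes(buf[pos:pos + size], "little")
            pos += size
        else:
            raise ValueError("unknown value type 0x%02X" % vtype)
        pairs[key] = value
    return pairs, pos
```

An XFPS value such as b"30000/1001" would then be split on "/" by the caller to recover the numerator and denominator.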
|
||
|
||
## Extensible Packet Structure
|
||
uint8 Packet Type
|
||
uint8 Flags
|
||
- 0x01: 64-bit size
|
||
uint8 Identifier[4]
|
||
<if 64-bit size>
|
||
uint64 Length of the payload
|
||
<else>
|
||
uint32 Length of the payload
|
||
<fi>
|
||
* Payload
|
||
|
||
## Standard Metadata Payload Packet Structure
|
||
uint8 Packet Type (0xE0/0xE1/0xE2/.../0xEE; see Packet Types section)
|
||
uint32 Length of the payload
|
||
* Standard payload
|
||
|
||
Notes:
|
||
- metadata packets must precede any non-metadata packets
|
||
- when multiple metadata packets are present (e.g. ID3v2 and Vorbis Comment both present),
|
||
which gets precedence is implementation-dependent. ONE EXCEPTION is ID3v1 and ID3v2 where ID3v2 gets
|
||
precedence.
|
||
|
||
## Timecode Packet Structure
|
||
uint8 Packet Type (0xFD)
|
||
uint64 Time since stream start in nanoseconds (this may NOT start from zero if the video is coming from a livestream)
|
||
|
||
## Screen Masking Packet Structure
|
||
When letterbox/pillarbox detection is active, the encoder will only encode pictures in the active area.
|
||
Decoders must use this value to derive the size of the active area for decoding, and fill the blank on playback.
|
||
Encoders only need to insert this packet at the start of the video (if necessary) and whenever a geometry change occurs.
|
||
|
||
uint8 Packet Type (0xF2)
|
||
uint32 Starting frame number
|
||
uint16 Mask size top in pixels
|
||
uint16 Mask size right in pixels
|
||
uint16 Mask size bottom in pixels
|
||
uint16 Mask size left in pixels
|
||
|
||
## Video Packet Structure
|
||
uint8 Packet Type (0x10/0x11)
|
||
uint32 Compressed Size
|
||
* Zstd-compressed Block Data
|
||
|
||
## TAD Packet Structure
|
||
uint8 Packet Type (0x24)
|
||
<header for decoding packet>
|
||
uint16 Sample Count
|
||
uint32 Compressed Size + 7
|
||
<header for decoding TAD chunk>
|
||
uint16 Sample Count
|
||
uint8 Quantiser Bits
|
||
uint32 Compressed Size
|
||
* Zstd-compressed TAD
|
||
|
||
## Videotex Packet Structure
|
||
uint8 Packet Type (0x3F)
|
||
uint32 Compressed Size
|
||
* Zstd-compressed payload, where:
|
||
uint8 Rows
|
||
uint8 Columns
|
||
* Foreground colours
|
||
* Background colours
|
||
* Characters
|
||
|
||
## GOP Unified Packet Structure (0x12)
|
||
Implemented on 2025-10-15 for temporal 3D DWT with unified preprocessing.
|
||
|
||
This packet contains multiple frames encoded as a single spacetime block for optimal
|
||
temporal compression.
|
||
|
||
uint8 Packet Type (0x12/0x13)
|
||
uint8 GOP Size (number of frames in this GOP)
|
||
<if packet type is 0x13>
|
||
uint32 Compressed Size
|
||
* Zstd-compressed Motion Data
|
||
<fi>
|
||
uint32 Compressed Size
|
||
* Zstd-compressed Unified Block Data
|
||
|
||
### Unified Block Data Format
|
||
The entire GOP (width×height×N_frames×3_channels) is preprocessed as a single block:
|
||
|
||
<if significance maps are used>
|
||
uint8 Y Significance Maps[(width*height + 7) / 8 * GOP Size] // All Y frames concatenated
|
||
uint8 Co Significance Maps[(width*height + 7) / 8 * GOP Size] // All Co frames concatenated
|
||
uint8 Cg Significance Maps[(width*height + 7) / 8 * GOP Size] // All Cg frames concatenated
|
||
int16 Y Non-zero Values[variable length] // All Y non-zero coefficients
|
||
int16 Co Non-zero Values[variable length] // All Co non-zero coefficients
|
||
int16 Cg Non-zero Values[variable length] // All Cg non-zero coefficients
|
||
<fi>
|
||
|
||
<if EZBC is used>
|
||
uint32 EZBC Size for Y
|
||
* EZBC Structure for Y
|
||
uint32 EZBC Size for Co
|
||
* EZBC Structure for Co
|
||
uint32 EZBC Size for Cg
|
||
* EZBC Structure for Cg
|
||
<fi>
|
||
|
||
This layout enables Zstd to find patterns across both spatial and temporal dimensions,
|
||
resulting in superior compression compared to per-frame encoding.
|
||
|
||
### Temporal 3D DWT Process
|
||
1. Detect where the scene change is happening on the first pass
|
||
2. Determine GOP slicing from the scene detection
|
||
3. Apply 1D DWT across temporal axis (GOP frames)
|
||
4. Apply 2D DWT on each spatial slice of temporal subbands
|
||
5. Perceptual quantisation with temporal-spatial awareness
|
||
6. Unified significance map preprocessing across all frames/channels
|
||
7. Single Zstd compression of entire GOP block
|
||
|
||
## GOP Sync Packet Structure (0xFC)
|
||
Indicates that N frames were decoded from a GOP Unified block.
|
||
Decoders must track this to maintain proper frame count and synchronization.
|
||
|
||
uint8 Packet Type (0xFC)
|
||
uint8 Frame Count (number of frames that were decoded from preceding GOP block)
|
||
|
||
Note: GOP Sync packets have no payload size field (fixed 2-byte packet).
|
||
|
||
## Block Data (per frame)
|
||
uint8 Mode: encoding mode
|
||
0x00 = SKIP (just use frame data from previous frame)
|
||
0x01 = INTRA (DWT-coded)
|
||
0x02 = DELTA (DWT delta)
|
||
- 0x02: DWT level 1
|
||
- 0x12: DWT level 2
|
||
- 0x22: DWT level 3
|
||
...
|
||
- 0xF2: DWT Level 16
|
||
uint8 Quantiser override Y (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
|
||
uint8 Quantiser override Co (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
|
||
uint8 Quantiser override Cg (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
|
||
- note: quantiser overrides are always present regardless of the channel layout
|
||
* Tile data (one compressed payload per tile)
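The Mode byte and the biased quantiser overrides above can be decoded as in this small sketch. Both helper names are hypothetical; the mapping follows the listed values (low nibble 2 marks DELTA, high nibble plus one gives the DWT level, and a stored override of 0 means "no override").

```python
def decode_block_mode(mode):
    """Return (mode_name, dwt_level_or_None) for a Block Data Mode byte."""
    if mode == 0x00:
        return ("SKIP", None)
    if mode == 0x01:
        return ("INTRA", None)
    if (mode & 0x0F) == 0x02:
        return ("DELTA", (mode >> 4) + 1)   # 0x02 -> level 1 ... 0xF2 -> level 16
    raise ValueError("unknown mode 0x%02X" % mode)

def quantiser_override_index(stored):
    """Undo the index bias of 1; None means the header quantiser is used."""
    return None if stored == 0 else stored - 1
```

The index returned by `quantiser_override_index` is then looked up in the Exponential Numeric System, so a stored value of 127 gives index 126, which maps to 252.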
|
||
|
||
### Coefficient Storage Format (Significance Map Compression)
|
||
|
||
Starting with encoder version 2025-09-29, DWT coefficients are stored using
|
||
significance map compression with concatenated maps layout for optimal efficiency:
|
||
|
||
#### Concatenated Maps Format
|
||
All channels are processed together to maximize Zstd compression:
|
||
|
||
uint8 Y Significance Map[(coeff_count + 7) / 8] // 1 bit per Y coefficient
|
||
uint8 Co Significance Map[(coeff_count + 7) / 8] // 1 bit per Co coefficient
|
||
uint8 Cg Significance Map[(coeff_count + 7) / 8] // 1 bit per Cg coefficient
|
||
uint8 A Significance Map[(coeff_count + 7) / 8] // 1 bit per A coefficient (if alpha present)
|
||
int16 Y Non-zero Values[variable length] // Only non-zero Y coefficients
|
||
int16 Co Non-zero Values[variable length] // Only non-zero Co coefficients
|
||
int16 Cg Non-zero Values[variable length] // Only non-zero Cg coefficients
|
||
int16 A Non-zero Values[variable length] // Only non-zero A coefficients (if alpha present)
|
||
|
||
#### Significance Map Encoding
|
||
Each significance map uses 1 bit per coefficient position:
|
||
- Bit = 1: coefficient is non-zero, read value from corresponding Non-zero Values array
|
||
- Bit = 0: coefficient is zero
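A minimal sketch of significance-map packing for a single channel follows. The bit order within each byte is not stated above; LSB-first is assumed here, and the function names are hypothetical.

```python
def pack_significance(coeffs):
    """Split coefficients into a 1-bit-per-position map and the non-zero values."""
    sig = bytearray((len(coeffs) + 7) // 8)
    values = []
    for i, c in enumerate(coeffs):
        if c != 0:
            sig[i >> 3] |= 1 << (i & 7)     # assumed LSB-first bit order
            values.append(c)
    return bytes(sig), values

def unpack_significance(sig, values, count):
    """Rebuild the dense coefficient list from the map and the non-zero values."""
    it = iter(values)
    return [next(it) if (sig[i >> 3] >> (i & 7)) & 1 else 0 for i in range(count)]
```

In the concatenated layout, the maps of all channels are written first and the value arrays afterwards, so a decoder unpacks each channel with its own map and its own slice of values.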
|
||
|
||
#### Compression Benefits
|
||
- **Sparsity exploitation**: Typically 85-95% zeros in quantised DWT coefficients
|
||
- **Cross-channel patterns**: Concatenated maps allow Zstd to find patterns across similar significance maps
|
||
- **Overall improvement**: 16-18% compression improvement before Zstd compression
|
||
|
||
## DWT Implementation Details
|
||
|
||
### Wavelet Filters
|
||
- 5/3 Reversible Filter (lossless capable):
|
||
* Analysis: Low-pass [-1/8, 1/4, 3/4, 1/4, -1/8], High-pass [-1/2, 1, -1/2]
* Synthesis: Low-pass [1/2, 1, 1/2], High-pass [-1/8, -1/4, 3/4, -1/4, -1/8]
|
||
|
||
- 9/7 Irreversible Filter (higher compression):
|
||
* Analysis: CDF 9/7 coefficients optimized for image compression
|
||
* Provides better energy compaction than 5/3 but lossy reconstruction
|
||
|
||
### Quantisation Strategy
|
||
|
||
#### Uniform Quantisation (Base Versions 1-4)
|
||
Traditional approach using same quantisation factor for all DWT subbands within each channel.
|
||
|
||
#### Perceptual Quantisation (Base Versions 5-8, Default)
|
||
Perceptual TAV versions (base versions 5-8) implement Human Visual System (HVS) optimised quantisation with
|
||
frequency-aware subband weighting for superior visual quality:
|
||
|
||
Anisotropic quantisation is applied for both Luma and Chroma channels to preserve horizontal details.
|
||
The anisotropic quantisation is the innovative upgrade to the traditional field-interlacing and
|
||
chroma subsampling.
|
||
|
||
This perceptual approach allocates more bits to visually important low-frequency
|
||
details while aggressively quantising high-frequency noise, resulting in superior
|
||
visual quality at equivalent bitrates.
|
||
|
||
#### Grain Synthesis
|
||
|
||
The decoder must synthesise film grain on non-LL subbands at an amplitude of half the quantisation level.
|
||
The encoder may synthesise the exact same grain with reversed sign during encoding (though this is not recommended for practical reasons).
|
||
|
||
The base noise function must be triangular noise in range [-1.0, 1.0].
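One common way to obtain triangular noise in [-1.0, 1.0] is to sum two uniform samples, as in the sketch below. The random number generator, its seeding, and the reading of "half of the quantisation level" as 0.5 times the quantiser step are all assumptions; the text above does not specify them, so this will not reproduce the grain of any particular decoder.

```python
import random

def triangular_noise(rng):
    """Sum of two uniform samples shifted to [-1.0, 1.0); triangular distribution."""
    return rng.random() + rng.random() - 1.0

def add_grain(coeffs, quant_step, seed=0):
    """Add grain to dequantised non-LL coefficients (caller excludes the LL subband)."""
    rng = random.Random(seed)       # hypothetical seeding scheme
    amplitude = 0.5 * quant_step    # assumed meaning of "half of the quantisation level"
    return [c + amplitude * triangular_noise(rng) for c in coeffs]
```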
|
||
|
||
## Colour Space
|
||
TAV supports two colour spaces:
|
||
|
||
**YCoCg-R (Base Versions 1, 3, 5, 7):**
|
||
- Y: Luma channel (full resolution)
|
||
- Co: Orange-Cyan chroma (full resolution)
|
||
- Cg: Green-Magenta chroma (full resolution)
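For reference, the widely used lifting formulation of YCoCg-R on integer RGB is sketched below. That TAV applies exactly these steps (and at which bit depth) is an assumption; the function names are hypothetical.

```python
def rgb_to_ycocg_r(r, g, b):
    """Reversible RGB -> (Y, Co, Cg) using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Exact inverse of rgb_to_ycocg_r."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step is undone exactly by its mirror image, the round trip is lossless for any integer input, which is what makes the transform suitable for the lossless mode.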
|
||
|
||
**ICtCp (Base Versions 2, 4, 6, 8):**
|
||
- I: Intensity (similar to luma)
|
||
- Ct: Chroma tritanopia
|
||
- Cp: Chroma protanopia
|
||
|
||
Perceptual versions (base 5-8) apply HVS-optimised quantisation weights per channel,
|
||
while uniform versions (base 1-4) use consistent quantisation across all subbands.
|
||
|
||
The encoder expects linear alpha.
|
||
|
||
## Compression Features
|
||
- Single DWT tiles vs 16x16 DCT blocks in TEV
|
||
- Multi-resolution representation enables scalable decoding
|
||
- Better frequency localisation than DCT
|
||
- Reduced blocking artifacts due to overlapping basis functions
|
||
|
||
## Audio Support
|
||
MP2 frames, raw PCMu8, and TAD formats are supported.
|
||
|
||
## Subtitle Support
|
||
Uses same Simple Subtitle Format (SSF) as TEV for text overlay functionality.
|
||
|
||
## NTSC Framerate handling
|
||
Unlike the TEV format, the TAV encoder emits an extra sync packet after every 1000th frame. The decoder can simply play the video without any special treatment.
|
||
|
||
## Exponential Numeric System
|
||
This system maps [0..255] to [1..4096]
|
||
|
||
Number|Index
|
||
------+-----
|
||
1|0
|
||
2|1
|
||
3|2
|
||
4|3
|
||
5|4
|
||
6|5
|
||
7|6
|
||
8|7
|
||
9|8
|
||
10|9
|
||
11|10
|
||
12|11
|
||
13|12
|
||
14|13
|
||
15|14
|
||
16|15
|
||
17|16
|
||
18|17
|
||
19|18
|
||
20|19
|
||
21|20
|
||
22|21
|
||
23|22
|
||
24|23
|
||
25|24
|
||
26|25
|
||
27|26
|
||
28|27
|
||
29|28
|
||
30|29
|
||
31|30
|
||
32|31
|
||
33|32
|
||
34|33
|
||
35|34
|
||
36|35
|
||
37|36
|
||
38|37
|
||
39|38
|
||
40|39
|
||
41|40
|
||
42|41
|
||
43|42
|
||
44|43
|
||
45|44
|
||
46|45
|
||
47|46
|
||
48|47
|
||
49|48
|
||
50|49
|
||
51|50
|
||
52|51
|
||
53|52
|
||
54|53
|
||
55|54
|
||
56|55
|
||
57|56
|
||
58|57
|
||
59|58
|
||
60|59
|
||
61|60
|
||
62|61
|
||
63|62
|
||
64|63
|
||
66|64
|
||
68|65
|
||
70|66
|
||
72|67
|
||
74|68
|
||
76|69
|
||
78|70
|
||
80|71
|
||
82|72
|
||
84|73
|
||
86|74
|
||
88|75
|
||
90|76
|
||
92|77
|
||
94|78
|
||
96|79
|
||
98|80
|
||
100|81
|
||
102|82
|
||
104|83
|
||
106|84
|
||
108|85
|
||
110|86
|
||
112|87
|
||
114|88
|
||
116|89
|
||
118|90
|
||
120|91
|
||
122|92
|
||
124|93
|
||
126|94
|
||
128|95
|
||
132|96
|
||
136|97
|
||
140|98
|
||
144|99
|
||
148|100
|
||
152|101
|
||
156|102
|
||
160|103
|
||
164|104
|
||
168|105
|
||
172|106
|
||
176|107
|
||
180|108
|
||
184|109
|
||
188|110
|
||
192|111
|
||
196|112
|
||
200|113
|
||
204|114
|
||
208|115
|
||
212|116
|
||
216|117
|
||
220|118
|
||
224|119
|
||
228|120
|
||
232|121
|
||
236|122
|
||
240|123
|
||
244|124
|
||
248|125
|
||
252|126
|
||
256|127
|
||
264|128
|
||
272|129
|
||
280|130
|
||
288|131
|
||
296|132
|
||
304|133
|
||
312|134
|
||
320|135
|
||
328|136
|
||
336|137
|
||
344|138
|
||
352|139
|
||
360|140
|
||
368|141
|
||
376|142
|
||
384|143
|
||
392|144
|
||
400|145
|
||
408|146
|
||
416|147
|
||
424|148
|
||
432|149
|
||
440|150
|
||
448|151
|
||
456|152
|
||
464|153
|
||
472|154
|
||
480|155
|
||
488|156
|
||
496|157
|
||
504|158
|
||
512|159
|
||
528|160
|
||
544|161
|
||
560|162
|
||
576|163
|
||
592|164
|
||
608|165
|
||
624|166
|
||
640|167
|
||
656|168
|
||
672|169
|
||
688|170
|
||
704|171
|
||
720|172
|
||
736|173
|
||
752|174
|
||
768|175
|
||
784|176
|
||
800|177
|
||
816|178
|
||
832|179
|
||
848|180
|
||
864|181
|
||
880|182
|
||
896|183
|
||
912|184
|
||
928|185
|
||
944|186
|
||
960|187
|
||
976|188
|
||
992|189
|
||
1008|190
|
||
1024|191
|
||
1056|192
|
||
1088|193
|
||
1120|194
|
||
1152|195
|
||
1184|196
|
||
1216|197
|
||
1248|198
|
||
1280|199
|
||
1312|200
|
||
1344|201
|
||
1376|202
|
||
1408|203
|
||
1440|204
|
||
1472|205
|
||
1504|206
|
||
1536|207
|
||
1568|208
|
||
1600|209
|
||
1632|210
|
||
1664|211
|
||
1696|212
|
||
1728|213
|
||
1760|214
|
||
1792|215
|
||
1824|216
|
||
1856|217
|
||
1888|218
|
||
1920|219
|
||
1952|220
|
||
1984|221
|
||
2016|222
|
||
2048|223
|
||
2112|224
|
||
2176|225
|
||
2240|226
|
||
2304|227
|
||
2368|228
|
||
2432|229
|
||
2496|230
|
||
2560|231
|
||
2624|232
|
||
2688|233
|
||
2752|234
|
||
2816|235
|
||
2880|236
|
||
2944|237
|
||
3008|238
|
||
3072|239
|
||
3136|240
|
||
3200|241
|
||
3264|242
|
||
3328|243
|
||
3392|244
|
||
3456|245
|
||
3520|246
|
||
3584|247
|
||
3648|248
|
||
3712|249
|
||
3776|250
|
||
3840|251
|
||
3904|252
|
||
3968|253
|
||
4032|254
|
||
4096|255
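The table above follows a regular pattern: indices 0-63 advance in steps of 1, and each following run of 32 indices doubles the step (2, 4, 8, 16, 32, 64). The sketch below computes the same mapping in closed form; the function name is hypothetical.

```python
def exp_index_to_number(index):
    """Map an index in [0, 255] to its number in [1, 4096] per the table above."""
    if not 0 <= index <= 255:
        raise ValueError("index out of range")
    if index < 64:
        return index + 1
    segment, pos = divmod(index - 64, 32)   # six segments of 32 indices each
    return (64 << segment) + (2 << segment) * (pos + 1)
```

A decoder could use this to build a 256-entry lookup table once at start-up instead of embedding the table literally.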
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM Advanced Video - Digital Tape (TAV-DT) Format
|
||
Created by CuriousTorvald on 2025-12-01
|
||
|
||
TAV-DT is an extension to the TAV format, intended as a filesystem-independent packetised video stream
|
||
with easy syncing: playback can start from an arbitrary position, and the decoder can easily sync up to the
|
||
start of the next packet
|
||
|
||
# Video Format
|
||
- Dimension: 720x480 for NTSC, 720x576 for PAL
|
||
- FPS: arbitrary (defined in packet header)
|
||
- Wavelet: 9/7 Spatial, Haar Temporal ("sport" preset always enabled)
|
||
- Decomposition levels: 4 spatial, 2 temporal
|
||
- Quantiser and encoder quality level: arbitrary (defined in packet header as quality index)
|
||
- Extra features:
|
||
- Audio is mandatory (TAD codec only)
|
||
- Everything else is unsupported
|
||
- Video flags: Interlaced/NTSC framerate (defined in packet header)
|
||
* interlaced is enabled by default
|
||
- Channel layout: Y-Co-Cg
|
||
- Entropy coder: EZBC
|
||
- Encoder preset: sports preset always enabled
|
||
- Tiles: monoblock
|
||
- GOP size: always 16 frames
|
||
|
||
# Packet Structure
|
||
uint32 Sync pattern (0xE3537A1F for NTSC Dimension, 0xD193A745 for PAL Dimension)
|
||
<packet header start>
|
||
uint8 Framerate
|
||
uint8 Flags
|
||
- bit 0 = interlaced
|
||
- bit 1 = is NTSC framerate
|
||
- bit 4-7 = quality index (0-5)
|
||
* Quality indices follow TSVM encoder's
|
||
int16 Reserved (zero-fill)
|
||
uint32 Total packet size (sum of TAD packet and TAV packet size)
|
||
uint64 Timecode in nanoseconds
|
||
uint32 Offset to video packet
|
||
uint32 Reserved (zero-fill)
|
||
uint32 CRC-32 of above
|
||
<packet header end; encoded with rate 1/2 LDPC> // NOTE: sync pattern must not be LDPC-coded
|
||
bytes TAD with forward error correction
|
||
<TAD header start>
|
||
uint16 Sample Count
|
||
uint8 Quantiser Bits
|
||
uint32 Compressed Size
|
||
uint24 Reed-Solomon Block Count
|
||
uint32 CRC-32 of above
|
||
<TAD header end; encoded with rate 1/2 LDPC>
|
||
<Reed-Solomon (255,223) block start>
|
||
bytes TAD (EZBC, no Zstd)
|
||
bytes Parity for TAD
|
||
<Reed-Solomon (255,223) block end>
|
||
bytes TAV with forward error correction
|
||
uint32 TAV header sync pattern (0xA3F7C91E)
|
||
<TAV header start>
|
||
uint8 GOP Size (number of frames in this GOP)
|
||
uint16 Reserved (zero-fill)
|
||
uint32 Compressed Size
|
||
uint24 Reed-Solomon Block Count
|
||
uint32 CRC-32 of above
|
||
<TAV header end; encoded with rate 1/2 LDPC> // NOTE: sync pattern must not be LDPC-coded
|
||
<Reed-Solomon (255,223) block start>
|
||
bytes TAV (EZBC, no Zstd)
|
||
bytes Parity for TAV
|
||
<Reed-Solomon (255,223) block end>
|
||
|
||
Q1. Why do headers have such a low code rate (n bytes of input -> 2n bytes of output)?
|
||
A1. Headers are crucial for the decoding and thus must be protected rigorously
|
||
|
||
Q2. What to do when payload is smaller than RS block capacity?
|
||
A2. Fill with zero. It shouldn't affect Zstd, and compressed size is already specified, so they complement each other.
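The block count and zero padding for Reed-Solomon (255,223) follow directly from its 223 data bytes per block. The sketch below shows only that arithmetic and does not compute parity; the helper names are hypothetical.

```python
RS_DATA_BYTES = 223     # payload bytes per RS(255,223) block
RS_PARITY_BYTES = 32    # 255 - 223 parity bytes per block

def rs_block_count(payload_size):
    """Number of RS blocks needed to carry payload_size bytes."""
    return (payload_size + RS_DATA_BYTES - 1) // RS_DATA_BYTES

def pad_for_rs(payload):
    """Zero-fill the payload up to a whole number of RS data blocks."""
    return payload + bytes(-len(payload) % RS_DATA_BYTES)
```

After decoding, the "Compressed Size" field tells the decoder how many of the padded bytes are real payload.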
|
||
|
||
When decoding, reserved areas must be filled with zero before the actual decoding.
|
||
|
||
# How to sync to the stream
|
||
1. Find a sync pattern
|
||
2. Read remaining 8 bytes -> concatenate sync with what has been read
|
||
3. Calculate CRC-32 of concatenated 12 bytes
|
||
4. Read 4 bytes (stored CRC)
|
||
5. Check calculated CRC against stored CRC
|
||
6. If they match, sync to the stream; if not, find a next sync pattern
|
||
7. "Offset to video packet" and the actual length of the TAD packet can be used together to recover video packet when stream is damaged, using the fact that in error-free stream, length of TAD packet is equal to "Offset to video packet", and the internal packet order is always audio-then-video
|
||
|
||
## Soft Sync Recovery
|
||
|
||
The decoder MAY try to sync to a sync pattern that appears damaged when its contents seem to be intact, using the following strategies.
|
||
|
||
### Stage 1
|
||
|
||
On the stream position where the sync pattern is supposed to be:
|
||
|
||
1. Substitute damaged sync pattern with known sync pattern (videos are not allowed to change NTSC/PAL mode mid-stream, so there's only one known value)
|
||
2. Zero-fill the reserved area if this has not been done already
|
||
3. Re-calculate CRC. If match, sync. If not, head to the next stage
|
||
|
||
### Stage 2
|
||
|
||
1. Further substitute the framerate, flags, timecode to the last known value (as these values rarely change mid-stream; timecode must be incremented appropriately. e.g. FPS=16, last known timecode=5.0, packets missed so far=4, then assumed timecode is 5.0 + 4 + 1 = 10.0)
|
||
2. Re-calculate CRC. If match, sync. If not, head to the next stage
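The timecode substitution in Stage 2 can be sketched as follows. It assumes every packet carries one 16-frame GOP (as stated under Video Format) and ignores the NTSC framerate flag, which would scale the packet duration slightly; the function name is hypothetical.

```python
def assumed_timecode_ns(last_known_ns, packets_missed, fps, gop_size=16):
    """Estimate the timecode of the current packet from the last good one."""
    packet_ns = gop_size * 1_000_000_000 // fps   # duration of one packet
    return last_known_ns + (packets_missed + 1) * packet_ns
```

With FPS=16, a last known timecode of 5.0 s and 4 missed packets, this gives 10.0 s, matching the example in step 1.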
|
||
|
||
### Stage 3 (mostly throwaway efforts)
|
||
|
||
1. Search for 0xA3F7C91E or next sync pattern
|
||
2. If 0xA3F7C91E is found, try to decode the subpacket by verifying header with CRC; if next sync pattern is found, sync to that packet.
|
||
3. If successful, sync. If not, soft sync recovery has failed; discard the packet
|
||
|
||
Note: If CRC is unmatched, the packet MUST be discarded, as the header content cannot be trusted if all soft recovery stages have failed
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
TSVM Advanced Audio (TAD) Format
|
||
Created by CuriousTorvald and Claude on 2025-10-23
|
||
Updated: 2025-10-30 (fixed non-power-of-2 sample count support)
|
||
|
||
TAD is a perceptual audio codec for TSVM utilising Discrete Wavelet Transform (DWT)
|
||
with CDF 9/7 biorthogonal wavelets, providing efficient compression through M/S stereo
|
||
decorrelation, frequency-dependent quantisation, and raw int8 coefficient storage.
|
||
Designed as an includable API for integration with TAV video encoder.
|
||
|
||
When used inside a video codec, only the Zstd-compressed payload is stored; the chunk length
|
||
is stored separately and quality index is shared with that of the video.
|
||
|
||
# Suggested File Structure
|
||
\x1F T S V M T A D
|
||
[HEADER]
|
||
[CHUNK 0]
|
||
[CHUNK 1]
|
||
[CHUNK 2]
|
||
...
|
||
|
||
## Header (16 bytes)
|
||
uint8 Magic[8]: "\x1FTSVMTAD"
|
||
uint8 Version: 1
|
||
uint8 Quality Level: 0-5 (0=lowest quality/smallest, 5=highest quality/largest)
|
||
uint8 Flags:
|
||
- bit 0: Zstd compression enabled (1=compressed, 0=uncompressed)
|
||
- bits 1-7: Reserved (must be 0)
|
||
uint32 Sample Rate: audio sample rate in Hz (always 32000 for TSVM)
|
||
uint8 Channels: number of audio channels (always 2 for stereo)
|
||
uint8 Reserved[2]: fill with zeros
|
||
|
||
## Audio Properties
|
||
- **Sample Rate**: 32000 Hz (TSVM audio hardware native format)
|
||
- **Channels**: 2 (stereo)
|
||
- **Input Format**: PCM32fLE (32-bit float little-endian PCM)
|
||
- **Preprocessing**: 16 Hz highpass filter applied during extraction
|
||
- **Internal Representation**: Float32 throughout encoding, PCM8 conversion only at decoder
|
||
- **Chunk Size**: Variable (1024-32768+ samples per channel, any size ≥1024 supported)
|
||
- Default: 32768 samples (1.024 seconds at 32 kHz) for standalone files
|
||
- TAV integration: Uses exact GOP sample count (e.g., 32016 for 1 second at 32 kHz)
|
||
- Minimum: 1024 samples (32 ms at 32 kHz)
|
||
- DWT levels: Fixed at 9 levels for all chunk sizes
|
||
- **Target Compression**: 2:1 against PCMu8 baseline
|
||
- **Wavelet**: CDF 9/7 biorthogonal
|
||
|
||
## Chunk Structure
|
||
Each chunk encodes a variable number of stereo samples (minimum 1024, any size supported).
|
||
Default is 32768 samples (65536 total samples, 1.024 seconds) for standalone files.
|
||
TAV integration uses exact GOP sample counts (e.g., 32016 samples for 1 second at 32 kHz).
|
||
|
||
uint16 Sample Count: number of samples per channel (min 1024, any size ≥1024)
|
||
uint8 Max quantisation index: this number * 2 + 1 is the total steps of quantisation
|
||
uint32 Chunk Payload Size: size of following payload in bytes
|
||
* Chunk Payload: encoded M/S stereo data (Zstd compressed if flag set)
|
||
|
||
### Chunk Payload Structure (before Zstd compression)
|
||
* Mid Channel EZBC Data (embedded zero block coded bitstream)
|
||
* Side Channel EZBC Data (embedded zero block coded bitstream)
|
||
|
||
Each EZBC channel structure:
|
||
uint8 MSB Bitplane: highest bitplane with significant coefficient
|
||
uint16 Coefficient Count: number of coefficients in this channel
|
||
* Binary Tree EZBC Bitstream: significance map + refinement bits
|
||
|
||
## Encoding Pipeline
|
||
|
||
### Step 1: Pre-emphasis Filter
|
||
Input stereo PCM32fLE undergoes first-order FIR pre-emphasis filtering (α=0.5):
|
||
|
||
H(z) = 1 - α·z⁻¹
|
||
|
||
This shifts quantisation noise toward lower frequencies where it's more maskable by
|
||
the psychoacoustic model. The filter has persistent state across chunks to prevent
|
||
discontinuities at chunk boundaries.
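In the time domain, H(z) = 1 - α·z⁻¹ is y[n] = x[n] - α·x[n-1]. A minimal sketch with state carried across chunks is given below; the class name is hypothetical, and using one filter instance per channel is an assumption.

```python
class PreEmphasis:
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] with persistent state."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.prev = 0.0     # x[n-1], kept between chunks

    def process(self, chunk):
        out = []
        for x in chunk:
            out.append(x - self.alpha * self.prev)
            self.prev = x
        return out
```

Because `prev` survives between calls, filtering a signal in several chunks gives the same output as filtering it in one pass.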
|
||
|
||
### Step 2: Dynamic Range Compression (Gamma Compression)
|
||
Pre-emphasised audio undergoes gamma compression for perceptual uniformity:
|
||
|
||
encode(x) = sign(x) * |x|^γ where γ=0.5
|
||
|
||
This compresses dynamic range before quantisation, improving perceptual quality.
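The gamma compression formula can be written directly as below. The decode function is the mathematical inverse and is included for illustration; that the decoder applies exactly this inverse is an assumption.

```python
import math

def gamma_encode(x, gamma=0.5):
    """sign(x) * |x|^gamma"""
    return math.copysign(abs(x) ** gamma, x)

def gamma_decode(y, gamma=0.5):
    """Inverse mapping: sign(y) * |y|^(1/gamma)"""
    return math.copysign(abs(y) ** (1.0 / gamma), y)
```

With γ=0.5 a sample of 0.25 becomes 0.5, so quiet samples occupy a larger share of the quantiser range.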
|
||
|
||
### Step 3: M/S Stereo Decorrelation
|
||
Mid-Side transformation exploits stereo correlation:
|
||
|
||
Mid = (Left + Right) / 2
|
||
Side = (Left - Right) / 2
|
||
|
||
This typically concentrates energy in the Mid channel while the Side channel
|
||
contains mostly small values, improving compression efficiency.
|
||
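The transform and its inverse (Left = Mid + Side, Right = Mid - Side) can be sketched per sample as follows. The function names are hypothetical.

```python
def ms_encode(left, right):
    # Mid carries the common part, Side the difference
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid, side):
    return mid + side, mid - side

mid, side = ms_encode(0.75, 0.25)
left, right = ms_decode(mid, side)
```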
|
||
### Step 4: 9-Level CDF 9/7 DWT
|
||
Each channel (Mid and Side) undergoes CDF 9/7 biorthogonal wavelet decomposition. The codec uses a fixed 9 decomposition levels for all chunk sizes:
|
||
|
||
DWT Levels = 9 (fixed)
|
||
|
||
For 32768-sample chunks:
|
||
- After 9 levels: 64 LL coefficients
|
||
- Frequency subbands: LL + 9 H bands (L9 to L1)
|
||
|
||
For 32016-sample chunks (TAV 1-second GOP):
|
||
- After 9 levels: 63 LL coefficients
|
||
- Supports non-power-of-2 sizes through proper length tracking (fixed 2025-10-30)
|
||
|
||
Sideband boundaries are calculated dynamically:
|
||
first_band_size = chunk_size >> dwt_levels
|
||
sideband[0] = 0
|
||
sideband[1] = first_band_size
|
||
sideband[i+1] = sideband[i] + (first_band_size << (i-1))
|
||
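A direct transcription of the boundary recurrence above, shown as a sketch. `sideband_boundaries` is a hypothetical name, and the loop range (i = 1 to dwt_levels) is an assumption chosen so that the last boundary equals the chunk size for power-of-two chunks.

```python
def sideband_boundaries(chunk_size, dwt_levels=9):
    first_band_size = chunk_size >> dwt_levels
    sideband = [0, first_band_size]
    # sideband[i+1] = sideband[i] + (first_band_size << (i-1))
    for i in range(1, dwt_levels + 1):
        sideband.append(sideband[i] + (first_band_size << (i - 1)))
    return sideband

bounds = sideband_boundaries(32768)
```

For a 32768-sample chunk this gives 11 boundaries delimiting 10 subbands (LL plus 9 H bands), matching the 10 entries in each base quantisation weight table.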
|
||
CDF 9/7 lifting coefficients:
|
||
α = -1.586134342
|
||
β = -0.052980118
|
||
γ = 0.882911076
|
||
δ = 0.443506852
|
||
K = 1.230174105
|
||
|
||
### Step 5: Frequency-Dependent Quantisation with Lambda Companding
|
||
DWT coefficients are quantised using:
|
||
1. **Lambda companding**: Maps normalised coefficients through Laplacian CDF with λ=6.0
|
||
2. **Perceptually-tuned weights**: Channel-specific (Mid/Side) frequency-dependent scaling
|
||
3. **Final quantisation**: base_weight[channel][subband] * quality_scale
|
||
|
||
The lambda companding provides perceptually uniform quantisation, allocating more bits
|
||
to perceptually important coefficient magnitudes.
|
||
|
||
Channel-specific base quantisation weights:
|
||
Mid (0): [4.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.0, 1.3, 2.0]
|
||
Side (1): [6.0, 5.0, 2.6, 2.4, 1.8, 1.3, 1.0, 1.0, 1.6, 3.2]
|
||
|
||
Output: Quantised int8 coefficients in range [-max_index, +max_index]
|
||
|
||
### Step 6: EZBC Encoding (Embedded Zero Block Coding)
|
||
Quantised int8 coefficients are compressed using binary tree EZBC, a 1D variant of
|
||
the embedded zero-block coding.
|
||
|
||
**EZBC Algorithm**:
|
||
1. Find MSB bitplane (highest bit position with significant coefficient)
|
||
2. Initialise root block covering all coefficients as insignificant
|
||
3. For each bitplane from MSB to LSB:
|
||
- **Insignificant Pass**: Test each insignificant block for significance
|
||
- If still zero at this bitplane: emit 0 bit, keep in insignificant queue
|
||
- If becomes significant: emit 1 bit, recursively subdivide using binary tree
|
||
- **Refinement Pass**: For already-significant coefficients, emit next bit
|
||
4. Binary tree subdivision continues until blocks of size 1 (single coefficients)
|
||
5. When coefficient becomes significant: emit sign bit and reconstruct value
|
||
|
||
**EZBC Output Structure** (per channel):
|
||
uint8 MSB Bitplane (8 bits)
|
||
uint16 Coefficient Count (16 bits)
|
||
* Bitstream: [significance_bits][sign_bits][refinement_bits]
|
||
|
||
**Compression Benefits**:
|
||
- Exploits coefficient sparsity through significance testing
|
||
- Progressive refinement enables quality scalability
|
||
- Binary tree exploits spatial clustering of significant coefficients
|
||
- Typical sparsity: 86.9% zeros (Mid), 97.8% zeros (Side)
|
||
|
||
### Step 7: Concatenation and Zstd Compression
|
||
The Mid and Side EZBC bitstreams are concatenated:
|
||
Payload = [Mid_EZBC_data][Side_EZBC_data]
|
||
|
||
Then compressed using Zstd level 7 for additional compression without significant
|
||
CPU overhead. Zstd exploits redundancy in the concatenated bitstreams.
|
||
|
||
## Decoding Pipeline
|
||
|
||
### Step 1: Chunk Extraction and Decompression
|
||
Read chunk header (sample_count, max_index, payload_size).
|
||
If compressed (default), decompress payload using Zstd.
|
||
|
||
### Step 2: EZBC Decoding
|
||
Decode Mid and Side channels from concatenated EZBC bitstreams using binary tree
|
||
embedded zero block decoder:
|
||
|
||
For each channel:
|
||
1. Read EZBC header: MSB bitplane (8 bits), coefficient count (16 bits)
|
||
2. Initialise root block as insignificant, track coefficient states
|
||
3. Process bitplanes from MSB to LSB:
|
||
- **Insignificant Pass**: Read significance bits, recursively decode significant blocks
|
||
- **Refinement Pass**: Read refinement bits for already-significant coefficients
|
||
4. Reconstruct quantised int8 coefficients from bitplane representation
|
||
|
||
Output: Quantised int8 coefficients for Mid and Side channels
|
||
|
||
### Step 3: Dequantisation with Lambda Decompanding
|
||
Convert quantised int8 values back to float coefficients using:
|
||
1. Lambda decompanding (inverse of Laplacian CDF compression)
|
||
2. Multiply by frequency-dependent quantisation steps
|
||
3. [Optional] Apply coefficient-domain dithering (TPDF, ~-60 dBFS)
|
||
|
||
### Step 4: 9-Level Inverse CDF 9/7 DWT
|
||
Reconstruct Float32 audio from DWT coefficients using inverse CDF 9/7 transform.
|
||
|
||
**Critical Implementation (Fixed 2025-10-30)**:
|
||
The multi-level inverse DWT must use the EXACT sequence of lengths from forward
|
||
transform, in reverse order. Using simple doubling (length *= 2) is INCORRECT
|
||
for non-power-of-2 sizes.
|
||
|
||
Correct approach:
|
||
1. Pre-calculate all forward transform lengths:
|
||
lengths[0] = chunk_size
|
||
lengths[i] = (lengths[i-1] + 1) / 2 for i=1..9
|
||
2. Apply inverse DWT in reverse order:
|
||
for level from 8 down to 0:
|
||
apply inverse_dwt(data, lengths[level])
|
||
|
||
This ensures correct reconstruction for all chunk sizes including non-power-of-2
|
||
values (e.g., 32016 samples for TAV 1-second GOPs).
|
||
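The length bookkeeping described above can be sketched as follows. Only the sequence of lengths is computed here; the inverse lifting steps themselves are omitted. `forward_lengths` is a hypothetical helper name.

```python
def forward_lengths(chunk_size, levels=9):
    # lengths[0] = chunk_size, lengths[i] = (lengths[i-1] + 1) / 2
    lengths = [chunk_size]
    for _ in range(levels):
        lengths.append((lengths[-1] + 1) // 2)
    return lengths

lengths = forward_lengths(32016)

# Lengths handed to the inverse DWT, for level 8 down to 0
inverse_order = [lengths[level] for level in range(8, -1, -1)]

# Naive doubling from the 63 LL coefficients, shown only for comparison
doubled = [lengths[-1] * 2 ** step for step in range(1, 10)]
```

For 32016 samples the correct second inverse length is 251, whereas doubling would give 252, which is where the reconstruction would go wrong.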
|
||
### Step 5: M/S to L/R Conversion
|
||
Convert Mid/Side back to Left/Right stereo:
|
||
|
||
Left = Mid + Side
|
||
Right = Mid - Side
|
||
|
||
### Step 6: Gamma Expansion
|
||
Expand dynamic range (inverse of encoder's gamma compression):
|
||
|
||
decode(y) = sign(y) * |y|^(1/γ) where γ=0.5, so 1/γ=2.0
|
||
|
||
### Step 7: De-emphasis Filter
|
||
Apply de-emphasis filter to reverse the pre-emphasis (α=0.5):
|
||
|
||
H(z) = 1 / (1 - α·z⁻¹)
|
||
|
||
This is a first-order IIR filter with persistent state across chunks to prevent
|
||
discontinuities at chunk boundaries. The de-emphasis must be applied AFTER gamma
|
||
expansion but BEFORE PCM8 conversion to correctly reconstruct the original audio.
|
||
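The de-emphasis transfer function corresponds to the recurrence y[n] = x[n] + α·y[n-1]. A small sketch for one channel, with the previous output kept between chunks; the class name `DeEmphasis` is hypothetical.

```python
class DeEmphasis:
    """y[n] = x[n] + alpha * y[n-1], realising H(z) = 1 / (1 - alpha * z^-1)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.prev_out = 0.0  # y[n-1], kept between chunks

    def process(self, chunk):
        out = []
        for x in chunk:
            self.prev_out = x + self.alpha * self.prev_out
            out.append(self.prev_out)
        return out

deemph = DeEmphasis()
# [1.0, 0.5, 0.5] is a constant 1.0 signal after pre-emphasis with alpha = 0.5
restored = deemph.process([1.0, 0.5]) + deemph.process([0.5])
```

Splitting the input into two chunks gives the same result as one long chunk because the filter state persists.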
|
||
### Step 8: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
|
||
Convert Float32 samples to unsigned PCM8 (PCMu8) using second-order error-diffusion
|
||
dithering with reduced amplitude (0.2× TPDF) to coordinate with coefficient-domain
|
||
dithering.
|
||
|
||
## Compression Performance
|
||
- **Target Ratio**: 2:1 against PCMu8
|
||
- **Achieved Ratio**: 2.51:1 against PCMu8 at quality level 3
|
||
- **Quality**: Perceptually transparent at Q3+, preserves full 0-16 kHz bandwidth
|
||
- **Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
|
||
|
||
## Integration with TAV Encoder
|
||
TAD is designed as an includable API for TAV video encoder integration.
|
||
The encoder can be invoked programmatically to compress audio tracks:
|
||
|
||
#include "tad_encoder.h"
|
||
|
||
size_t encoded_size = tad_encode_from_file(
|
||
input_audio_path,
|
||
output_tad_path,
|
||
quality_level,
|
||
use_zstd,
|
||
verbose
|
||
);
|
||
|
||
This allows TAV video files to embed TAD-compressed audio using packet type 0x24.
|
||
|
||
## Audio Extraction Command
|
||
TAD encoder uses two-pass FFmpeg extraction for optimal quality:
|
||
|
||
# Pass 1: Extract at original sample rate
|
||
ffmpeg -i input.mp4 -f f32le -ac 2 temp.pcm
|
||
|
||
# Pass 2: High-quality resample with SoXR and highpass filter
|
||
ffmpeg -f f32le -ar {original_rate} -ac 2 -i temp.pcm \
|
||
-ar 32000 -af "aresample=resampler=soxr:precision=28:cutoff=0.99,highpass=f=16" \
|
||
output.pcm
|
||
|
||
This ensures resampling happens after extraction with optimal quality parameters.
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
**TSVM Universal Cue format**
|
||
Created by CuriousTorvald on 2025-09-22
|
||
|
||
A universal, simple cue format designed to work both as a playlist that cues up external files and as a lookup table for internal byte offsets.
|
||
|
||
# File Structure
|
||
\x1F T S V M U C F
|
||
[HEADER]
|
||
[CUE ELEMENT 0]
|
||
[CUE ELEMENT 1]
|
||
[CUE ELEMENT 2]
|
||
...
|
||
|
||
## Header (16 bytes)
|
||
uint8 Magic[8]: "\x1FTSVMUCF"
|
||
uint8 Version: 1
|
||
uint16 Number of cue elements
|
||
uint32 (Optional) Size of the cue file, useful for allocating fixed length for future expansion; 0 when not used
|
||
uint8 Reserved
|
||
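A minimal sketch of reading the 16-byte header described above. Little-endian byte order is an assumption (the UCF section does not state it), and `parse_ucf_header` is a hypothetical name.

```python
import struct

UCF_MAGIC = b"\x1fTSVMUCF"
# magic[8], version, cue count, optional file size, reserved: 16 bytes
UCF_HEADER = struct.Struct("<8sBHIB")  # "<" (little-endian) is an assumption

def parse_ucf_header(buf):
    magic, version, cue_count, file_size, _reserved = UCF_HEADER.unpack_from(buf, 0)
    if magic != UCF_MAGIC:
        raise ValueError("not a TSVM UCF file")
    return {"version": version, "cue_count": cue_count, "file_size": file_size}

sample = UCF_MAGIC + struct.pack("<BHIB", 1, 3, 0, 0)
info = parse_ucf_header(sample)
```

A `file_size` of 0 means the optional size field is not used.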
|
||
## Cue Element
|
||
uint8 Addressing Mode (low nybble) and Role Flags (high nybble)
|
||
- 0x01: External
|
||
- 0x02: Internal
|
||
- 0x10: Intended for machine interaction (GOP indices, frame indices, etc.)
|
||
- 0x20: Intended for human interaction (playlist, chapter markers, etc.)
|
||
- 0x30: Intended for both machine and human interaction
|
||
Role flags must be unset to assign no roles
|
||
uint16 String Length for name
|
||
* Name of the element in UTF-8
|
||
|
||
<if external addressing mode>
|
||
uint16 String Length for relative path
|
||
* Relative path
|
||
<fi>
|
||
|
||
<if internal addressing mode>
|
||
uint48 Offset to the file
|
||
<fi>
|
||
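One way to read a single cue element, as a sketch. It assumes little-endian integers and a UTF-8 encoded relative path (the spec only states UTF-8 for the name); `parse_cue_element` is a hypothetical name.

```python
import struct

MODE_EXTERNAL = 0x01
MODE_INTERNAL = 0x02

def parse_cue_element(buf, pos=0):
    """Return (element dict, position of the next element)."""
    flags = buf[pos]
    pos += 1
    mode, roles = flags & 0x0F, flags & 0xF0  # low nybble / high nybble
    (name_len,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    name = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    element = {"mode": mode, "roles": roles, "name": name}
    if mode == MODE_EXTERNAL:
        (path_len,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        element["path"] = buf[pos:pos + path_len].decode("utf-8")  # encoding assumed
        pos += path_len
    elif mode == MODE_INTERNAL:
        element["offset"] = int.from_bytes(buf[pos:pos + 6], "little")  # uint48
        pos += 6
    else:
        raise ValueError("unknown addressing mode")
    return element, pos

raw = bytes([0x12]) + struct.pack("<H", 4) + b"GOP0" + (0x123456).to_bytes(6, "little")
element, next_pos = parse_cue_element(raw)
```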
|
||
--------------------------------------------------------------------------------
|
||
|
||
**Audio Adapter**
|
||
|
||
Endianness: little
|
||
|
||
|
||
The TSVM Audio Adapter consists of 4 playheads, each capable of playing one PCM or Tracker track.
|
||
|
||
Synchronisation between playheads is not guaranteed. Do not play music in multiple tracks.
|
||
|
||
|
||
Memory Space
|
||
|
||
0..737279 RW: Sample bin (720k)
|
||
737280..786431 RW: Instrument bin (256 instruments, 192 bytes each; instrument 0 does nothing; 48k)
|
||
786432..851967 RW: Play data 1 (currently exposed bank; 64k)
|
||
851968..917503 RW: Play data 2 (currently exposed bank; 64k)
|
||
917504..983039 RW: TAD Input Buffer (64k)
|
||
983040..1048575 RW: TAD Decode Output (64k)
|
||
|
||
Sample bin: just raw sample data thrown in there. You need to keep track of the starting point of each sample.
|
||
|
||
Instrument bin: Registry for 256 instruments, formatted as:
|
||
Uint32 Sample Pointer
|
||
Uint16 Sample length
|
||
Uint16 Sampling rate at C4 (note number 0x5000)
|
||
Uint16 Play Start (usually 0 but not always)
|
||
Uint16 Loop Start (can be smaller than Play Start)
|
||
Uint16 Loop End
|
||
Bit8 Sample Flags
|
||
0b 0000 0spp
|
||
pp: loop mode. 0-no loop, 1-loop, 2-backandforth, 3-oneshot (ignores note length unless overridden by other notes)
|
||
s: loop is sustain (key-off escapes the loop)
|
||
- IT: look for sample's SusLoop flag
|
||
Bit16 Volume envelope sustain/loops and flags
|
||
* Sustain is implemented by enabling 't' flag. FastTracker has no 'Sus Loop' but only 'Sus Point'; use same value for start and end index
|
||
0b 0ut sssss pcb eeeee
|
||
s: sustain/loop start index
|
||
e: sustain/loop end index
|
||
|
||
b: use envelope
|
||
c: envelope carry
|
||
p: (IT) fadeout is zero; (XM) fadeout is cut
|
||
|
||
t: the loop must sustain (key-off escapes the loop)
|
||
u: set to enable the sustain/loop
|
||
Bit16 Panning envelope sustain/loops and flags
|
||
* Sustain is implemented by enabling 't' flag
|
||
0b 0ut sssss pcb eeeee
|
||
s: sustain/loop start index
|
||
e: sustain/loop end index
|
||
|
||
b: use envelope
|
||
c: envelope carry
|
||
p: use default pan (see offset 176 "Default pan value" below)
|
||
|
||
t: the loop must sustain (key-off escapes the loop)
|
||
u: set to enable the sustain/loop
|
||
Bit16 Pitch/Filter envelope sustain/loops and flags
|
||
* Sustain is implemented by enabling 't' flag
|
||
0b 0ut sssss mcb eeeee
|
||
s: sustain/loop start index
|
||
e: sustain/loop end index
|
||
|
||
b: use envelope
|
||
c: envelope carry
|
||
m: mode (0: on pitch, 1: on filter)
|
||
|
||
t: the loop must sustain (key-off escapes the loop)
|
||
u: set to enable the sustain/loop
|
||
Bit16x25 Volume envelopes
|
||
Byte 1: Volume (00..3F)
|
||
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat). 0 = hold at this point indefinitely.
|
||
Bit16x25 Panning envelopes
|
||
Byte 1: Pan (00..FF)
|
||
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat). 0 = hold at this point indefinitely.
|
||
Bit16x25 Pitch/Filter envelopes
|
||
Byte 1: Value (00..FF)
|
||
Byte 2: Time until the next point, in seconds (3.5 Unsigned Minifloat). 0 = hold at this point indefinitely.
|
||
Uint8 Instrument Global Volume (0..255)
|
||
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
|
||
- ImpulseTracker also has samplewise default volume (0..64) and samplewise global volume (0..64), and they must be taken into account because Taud has no samplewise config, following the ImpulseTracker spec
|
||
* FastTracker2 has range of 0..64; multiply by (255/64) then round to int
|
||
Uint8 Volume Fadeout low bits (IT: 1..256; XM: 0..255)
|
||
Bit8 Fadeout and vibrato
|
||
0b 0000 ffff
|
||
f: Volume Fadeout high bits
|
||
Uint8 Volume swing (0..255 full range)
|
||
Uint8 Vibrato speed
|
||
* ImpulseTracker has samplewise vibrato speed (0..64), and they must be taken into account because Taud has no samplewise config
|
||
* FastTracker2 has instrumentwise config (0..255)
|
||
* The spec follows FastTracker2, and conversion must be performed when importing from ImpulseTracker
|
||
Uint8 Vibrato sweep
|
||
* FastTracker2 instrument config
|
||
Uint8 Default pan value (0..255 full range, see offset 17 for the enable flag)
|
||
* ImpulseTracker has samplewise default pan and instrumentwise default pan, and they must be taken into account because Taud has no samplewise config
|
||
Uint16 Pitch-pan centre (4096-TET note value)
|
||
Sint8 Pitch-pan separation (-128..127 full range)
|
||
Uint8 Pan swing (0..255 full range)
|
||
Uint8 Default cutoff (0..254 full range, 255 to off (-1 on IT). Effect range equals to that of ImpulseTracker -- 127 in IT is equal to 254 in Taud)
|
||
Uint8 Default resonance (0..254 full range, 255 to off (-1 on IT). Effect range equals to that of ImpulseTracker -- 127 in IT is equal to 254 in Taud)
|
||
Uint16 Sample detune (in 4096-TET unit) (XM finetune scale need to be rescaled accordingly)
|
||
Bit8 Instrument Flag
|
||
0b 000 www nn
|
||
n: New note action. 00: note off, 01: note cut, 10: continue, 11: note fade (arranged differently to IT)
|
||
www: Vibrato waveform (IT: sample config, FT2: instrument config). 000: sine, 001: ramp-down saw, 010: square, 011: random, 100: ramp-up saw (FT2 only)
|
||
Uint8 Vibrato Depth (0..255 full range)
|
||
* ImpulseTracker has range of 0..32 ON THE SAMPLE SETTINGS; multiply by (255/32) then round to int
|
||
* FastTracker2 has range of 0..16; multiply by (255/16) then round to int
|
||
Uint8 Vibrato Rate (0..255 full range)
|
||
* ImpulseTracker sample config. The spec follows ImpulseTracker precisely
|
||
Byte[4] Reserved
|
||
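The three 16-bit envelope flag words share the layout `0b 0ut sssss pcb eeeee`. A sketch of splitting one word into its fields, assuming the pattern is written most significant bit first; `decode_envelope_flags` is a hypothetical name, and the third flag of the middle group is returned as `p_or_m` because its meaning differs per envelope.

```python
def decode_envelope_flags(word):
    """Split a 16-bit word laid out as 0b 0ut sssss pcb eeeee (MSB first, assumed)."""
    return {
        "enable_loop": bool(word >> 14 & 1),   # u
        "sustain": bool(word >> 13 & 1),       # t
        "start": word >> 8 & 0x1F,             # sssss
        "p_or_m": bool(word >> 7 & 1),         # p (volume/panning) or m (pitch/filter)
        "carry": bool(word >> 6 & 1),          # c
        "use_envelope": bool(word >> 5 & 1),   # b
        "end": word & 0x1F,                    # eeeee
    }

flags = decode_envelope_flags(0b0_11_00010_001_00100)
```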
|
||
|
||
TODO:
|
||
[x] implement Instrument Flag, Vibrato Depth, Vibrato Rate, other samplewise/instrumentwise changes to it2taud and audio engine
|
||
[x] implement new note action on the audio engine (IT uses "background channels", maybe we can do the same but make "background channels" mixer-private)
|
||
[x] (same context as above) implement S7x command
|
||
[ ] on playback, panning changes randomly on Taud made by s3m2taud.py and mod2taud.py, but not by it2taud.py (maybe something's off with the instrument exports?)
|
||
[ ] implement S6x command
|
||
[ ] `S B000` and `S B100` not working as intended -- on first playback it jumps to the next cue same row, on subsequent playbacks the commands are completely ignored
|
||
[ ] implement Wxx command (global volume slide)
|
||
[ ] implement sample loop sustain
|
||
[ ] Amiga mode freq shift now "underdelivers" (pitch bend not "strong" enough) -- appear to be fixed (2nd_pm.taud is the only one behaves incorrectly)
|
||
[ ] cue and pattern compression of the Taud format (taud_common.py, taud.mjs)
|
||
[ ] figure out how IT (8 bits) and FT2 (12 bits) handles volume fadeout numbers, and come up with a compatible Taud spec, then implement
|
||
[ ] implement bitcrusher (eff sym '8')
|
||
|
||
|
||
Play Data: play data are series of tracker-like instructions, visualised as:
|
||
|
||
rr||NOTE|Ins|E.Vol|E.Pan|EE.ff|
|
||
63||FFFF|255|3 63|3 63|FF FFFF| (8 bytes per line, 512 bytes per pattern, 128 patterns on 64 kB bank, 32 banks available (pattern 0xFFF -- bank 31, pattern 127 is a sentinel value for no-pattern))
|
||
|
||
Notes are tuned in 4096-tone equal temperament (4096-TET). Tuning is set per sample through its sampling-rate value.
|
||
|
||
Special values:
|
||
|
||
note 0xFFFF: no-op
|
||
note 0xFFFE: note cut
|
||
note 0x0000: key-off
|
||
|
||
inst 0: no instrument change
|
||
|
||
|
||
Audio Adapter MMIO
|
||
|
||
0..1 RW: Play head #0 position
|
||
PCM mode: number of buffers uploaded and received by the adapter (writing does nothing)
|
||
Tracker mode: current position in the cuesheet (writing changes current position in the cuesheet and resets pattern cursor back to zero)
|
||
2..3 RW: Play head #0 length param
|
||
PCM mode: length of the samples to upload to the speaker
|
||
Tracker mode:
|
||
Byte 2: Play data 1 bank
|
||
Byte 3: Play data 2 bank
|
||
4 RW: Play head #0 master volume
|
||
5 RW: Play head #0 master pan
|
||
6..9 RW: Play head #0 flags (see below)
|
||
|
||
10..11 RW: Play head #1 position
|
||
12..13 RW: Play head #1 length param
|
||
14 RW: Play head #1 master volume
|
||
15 RW: Play head #1 master pan
|
||
16..19 RW: Play head #1 flags
|
||
|
||
... auto-fill to Play head #3
|
||
|
||
40 WO: MP2 Decoder Control
|
||
Write 16 to initialise the MP2 context (call this before the decoding of NEW music)
|
||
Write 1 to decode the frame as MP2
|
||
|
||
Calling with more than one bit set will result in UNDEFINED BEHAVIOUR
|
||
|
||
41 RO: MP2 Decoder Status
|
||
A non-zero value indicates the decoder is busy. Different values may indicate different decoder states.
|
||
42 WO: TAD Decoder Control
|
||
Write 1 to decode TAD data
|
||
43 RW: TAD Quality
|
||
Must be set to appropriate value before decoding
|
||
44 RW: TAD Decoder Status
|
||
A non-zero value indicates the decoder is busy. Different values may indicate different decoder states.
|
||
45 RW: Select PCM Bin for playhead (writing causes side effects)
|
||
|
||
64..2367 RW: MP2 Decoded Samples (unsigned 8-bit stereo)
|
||
2368..4095 RW: MP2 Frame to be decoded
|
||
4096..4097 RO: MP2 Frame guard bytes; always return 0 on read
|
||
|
||
Sound Hardware Info
|
||
- Sampling rate: 32000 Hz
|
||
- Bit depth: 8 bits/sample, unsigned
|
||
- Always operate in stereo (mono samples must be expanded to stereo before uploading)
|
||
|
||
Play Head Flags
|
||
Byte 1
|
||
- 0b mrqp ssss
|
||
m: mode (0 for Tracker, 1 for PCM)
|
||
r: reset parameters; always 0 when read
|
||
resetting will:
|
||
set position to 0,
|
||
set length param to 0,
|
||
set queue capacity to 8 samples,
|
||
unset play bit
|
||
q: purge queues (likely do nothing if not PCM); always 0 when read
|
||
p: play (0 if not -- mute all output)
|
||
|
||
ssss: PCM Mode set PCM Queue Size
|
||
0 - 4 samples
|
||
1 - 6 samples
|
||
2 - 8 samples (the default size)
|
||
3 - 12 samples
|
||
4 - 16 samples
|
||
5 - 24 samples
|
||
6 - 32 samples
|
||
7 - 48 samples
|
||
8 - 64 samples
|
||
9 - 96 samples
|
||
10 - 128 samples
|
||
11 - 192 samples
|
||
12 - 256 samples
|
||
13 - 384 samples
|
||
14 - 512 samples
|
||
15 - 768 samples
|
||
|
||
NOTE: changing from PCM mode to Tracker mode or vice versa will also reset the parameters as described above
|
||
Byte 2
|
||
- PCM Mode: Write non-zero value to start uploading; always 0 when read
|
||
- Tracker Mode: Global mixer flags. Maps directly to Taud effect symbol '1'
|
||
0b 0000 00fp
|
||
p: panning mode (0: linear, 1: equal-power)
|
||
f: pitchshift mode (0: tone-linear, 1: Amiga)
|
||
Tracker command may change the mixer state, but the changes WILL NOT BE REFLECTED BACK.
|
||
Starting a new song will use whatever written to this register. In other words, changes
|
||
made by songs will not persist.
|
||
Byte 3 (Tracker Mode)
|
||
- BPM (24 to 279. Play Data will change this register)
|
||
Byte 4 (Tracker Mode)
|
||
- Tick Rate (Play Data will change this register)
|
||
|
||
Uploaded PCM data will be stored onto the queue before being consumed by hardware.
|
||
If the queue is full, any more uploads will be silently discarded.
|
||
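The queue size table in Byte 1 alternates between 4·2^k and 6·2^k. A sketch that computes the size from the 4-bit index and packs the `0b mrqp ssss` byte, assuming the pattern is written most significant bit first; both function names are hypothetical.

```python
def pcm_queue_size(index):
    # Even indices give 4, 8, 16, ...; odd indices give 6, 12, 24, ...
    base = 4 if index % 2 == 0 else 6
    return base << (index // 2)

def pack_flags_byte1(pcm_mode, play, queue_index, reset=False, purge=False):
    # m = bit 7, r = bit 6, q = bit 5, p = bit 4, ssss = bits 3..0 (assumed order)
    return (pcm_mode << 7) | (reset << 6) | (purge << 5) | (play << 4) | (queue_index & 0x0F)

sizes = [pcm_queue_size(i) for i in range(16)]
byte1 = pack_flags_byte1(pcm_mode=True, play=True, queue_index=2)
```

Here `byte1` selects PCM mode, sets the play bit, and keeps the default queue size of 8 samples.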
|
||
|
||
32768..65535 RW: Cue Sheet (1024 cues)
|
||
Byte 1..10: Pattern number low nybble for voice 1..20
|
||
Byte 11..20: Pattern number middle nybble for voice 1..20
|
||
Byte 21..30: Pattern number high nybble for voice 1..20
|
||
To recap:
|
||
Byte 1..10: 0b loV1 loV2, 0b loV3 loV4, 0b loV5 loV6, ... 0b loV19 loV20
|
||
Byte 11..20: 0b miV1 miV2, 0b miV3 miV4, 0b miV5 miV6, ... 0b miV19 miV20
|
||
Byte 21..30: 0b hiV1 hiV2, 0b hiV3 hiV4, 0b hiV5 hiV6, ... 0b hiV19 hiV20
|
||
Byte 31..32: instruction
|
||
1000xxxx yyyyyyyy - Go back 0bxxxxyyyyyyyy patterns
|
||
1001xxxx yyyyyyyy - Skip forward 0bxxxxyyyyyyyy patterns
|
||
1111xxxx yyyyyyyy - Go to absolute pattern number 0bxxxxyyyyyyyy
|
||
00000001 - Halt
|
||
00000000 - No operation
|
||
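Each 32-byte cue spreads a 12-bit pattern number per voice over three nybble planes. A sketch of reassembling them, assuming that in `0b loV1 loV2` the odd-numbered voice occupies the high nybble; `cue_patterns` is a hypothetical name.

```python
def cue_patterns(cue):
    """Return the 20 pattern numbers (12 bits each) stored in one 32-byte cue."""
    patterns = []
    for voice in range(20):                   # voice 0 here is "voice 1" in the spec
        shift = 4 if voice % 2 == 0 else 0    # V1, V3, ... in the high nybble (assumed)
        idx = voice // 2
        lo = cue[idx] >> shift & 0x0F         # bytes 1..10
        mi = cue[10 + idx] >> shift & 0x0F    # bytes 11..20
        hi = cue[20 + idx] >> shift & 0x0F    # bytes 21..30
        patterns.append(hi << 8 | mi << 4 | lo)
    return patterns

cue = bytearray(32)
cue[0], cue[10], cue[20] = 0xCF, 0xBF, 0xAF   # voice 1 = 0xABC, voice 2 = 0xFFF
patterns = cue_patterns(cue)
```

Pattern 0xFFF is the no-pattern sentinel described in the Play Data section.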
|
||
65536..131071 RW: PCM Sample buffer
|
||
|
||
Table of 3.5 Minifloat values (CSV)
|
||
,000,001,010,011,100,101,110,111,MSB
|
||
00000,0,1,2,4,8,16,32,64
|
||
00001,0.03125,1.03125,2.0625,4.125,8.25,16.5,33,66
|
||
00010,0.0625,1.0625,2.125,4.25,8.5,17,34,68
|
||
00011,0.09375,1.09375,2.1875,4.375,8.75,17.5,35,70
|
||
00100,0.125,1.125,2.25,4.5,9,18,36,72
|
||
00101,0.15625,1.15625,2.3125,4.625,9.25,18.5,37,74
|
||
00110,0.1875,1.1875,2.375,4.75,9.5,19,38,76
|
||
00111,0.21875,1.21875,2.4375,4.875,9.75,19.5,39,78
|
||
01000,0.25,1.25,2.5,5,10,20,40,80
|
||
01001,0.28125,1.28125,2.5625,5.125,10.25,20.5,41,82
|
||
01010,0.3125,1.3125,2.625,5.25,10.5,21,42,84
|
||
01011,0.34375,1.34375,2.6875,5.375,10.75,21.5,43,86
|
||
01100,0.375,1.375,2.75,5.5,11,22,44,88
|
||
01101,0.40625,1.40625,2.8125,5.625,11.25,22.5,45,90
|
||
01110,0.4375,1.4375,2.875,5.75,11.5,23,46,92
|
||
01111,0.46875,1.46875,2.9375,5.875,11.75,23.5,47,94
|
||
10000,0.5,1.5,3,6,12,24,48,96
|
||
10001,0.53125,1.53125,3.0625,6.125,12.25,24.5,49,98
|
||
10010,0.5625,1.5625,3.125,6.25,12.5,25,50,100
|
||
10011,0.59375,1.59375,3.1875,6.375,12.75,25.5,51,102
|
||
10100,0.625,1.625,3.25,6.5,13,26,52,104
|
||
10101,0.65625,1.65625,3.3125,6.625,13.25,26.5,53,106
|
||
10110,0.6875,1.6875,3.375,6.75,13.5,27,54,108
|
||
10111,0.71875,1.71875,3.4375,6.875,13.75,27.5,55,110
|
||
11000,0.75,1.75,3.5,7,14,28,56,112
|
||
11001,0.78125,1.78125,3.5625,7.125,14.25,28.5,57,114
|
||
11010,0.8125,1.8125,3.625,7.25,14.5,29,58,116
|
||
11011,0.84375,1.84375,3.6875,7.375,14.75,29.5,59,118
|
||
11100,0.875,1.875,3.75,7.5,15,30,60,120
|
||
11101,0.90625,1.90625,3.8125,7.625,15.25,30.5,61,122
|
||
11110,0.9375,1.9375,3.875,7.75,15.5,31,62,124
|
||
11111,0.96875,1.96875,3.9375,7.875,15.75,31.5,63,126
|
||
LSB
|
||
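The table above is consistent with a 3-bit exponent in the high bits and a 5-bit mantissa in the low bits. A sketch of a decoder inferred from the table values; the bit placement follows the MSB/LSB labels of the table, and `minifloat_3_5` is a hypothetical name.

```python
def minifloat_3_5(byte):
    exponent = byte >> 5 & 0x07   # column of the table (MSB)
    mantissa = byte & 0x1F        # row of the table (LSB)
    if exponent == 0:
        return mantissa / 32      # first column: 0, 0.03125, ..., 0.96875
    return (1 << (exponent - 1)) * (1 + mantissa / 32)

values = [minifloat_3_5(b) for b in (0b000_00001, 0b001_00000, 0b101_10000, 0b111_11111)]
```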
|
||
## Tracker Note Effects
|
||
|
||
Tracker Note Effects have been moved to `TAUD_NOTE_EFFECTS.md`
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
**Taud serialisation format**
|
||
Created by CuriousTorvald on 2026-04-19
|
||
|
||
This is a file format for Taud tracker data. Taud can be extended with Microtone (taut.js) project data in a backward- and forward-compatible manner.
|
||
|
||
Endianness: Little
|
||
|
||
# File Structure
|
||
\x1F T S V M a u d
|
||
[HEADER]
|
||
[SAMPLE+INSTRUMENT BIN IMAGE (GZip or Zstd compressed. Read 4-byte magic to determine)]
|
||
[SONG TABLE]
|
||
[PATTERN BIN for SONG 0 (GZip or Zstd compressed)]
|
||
[CUE SHEET for SONG 0 (GZip or Zstd compressed)]
|
||
[PATTERN BIN for SONG 1 (GZip or Zstd compressed)]
|
||
[CUE SHEET for SONG 1 (GZip or Zstd compressed)]
|
||
[PATTERN BIN for SONG 2 (GZip or Zstd compressed)]
|
||
[CUE SHEET for SONG 2 (GZip or Zstd compressed)]
|
||
...
|
||
[PROJECT DATA] (optional)
|
||
[DATA BLOCKS WITH FOURCC HEADER (see Project Data section)]
|
||
|
||
## Header
|
||
Byte[8] Magic
|
||
Uint8 Format version (always 1)
|
||
Uint8 Number of songs in SONG TABLE
|
||
Uint32 Compressed size of SAMPLE+INST section (used to calculate offset to SONG TABLE)
|
||
Uint32 Offset to Project Data. Zero if Project Data is nonexistent
|
||
Byte[14]Tracker/Converter signature
|
||
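A sketch of reading the header fields listed above (32 bytes in total). It assumes the sample+instrument image follows the header directly, so the song table offset is the header size plus the compressed size; `parse_taud_header` is a hypothetical name.

```python
import struct

TAUD_MAGIC = b"\x1fTSVMaud"
TAUD_HEADER = struct.Struct("<8sBBII14s")  # little-endian, 32 bytes, no padding

def parse_taud_header(buf):
    magic, version, song_count, bin_size, project_offset, signature = \
        TAUD_HEADER.unpack_from(buf, 0)
    if magic != TAUD_MAGIC:
        raise ValueError("not a Taud file")
    return {
        "version": version,
        "song_count": song_count,
        # Assumption: the compressed bin image starts right after the header
        "song_table_offset": TAUD_HEADER.size + bin_size,
        "project_offset": project_offset or None,  # zero means no Project Data
        "signature": signature.rstrip(b"\x00"),
    }

blob = struct.pack("<8sBBII14s", TAUD_MAGIC, 1, 2, 1000, 0, b"s3m2taud")
taud = parse_taud_header(blob)
```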
|
||
## Song Table
|
||
* Rows of 32 bytes:
|
||
Uint32 Song offset
|
||
Uint8 Number of voices
|
||
Uint16 Number of patterns (0 is invalid. pattern bin length = numPats * 8 bytes)
|
||
Uint8 Initial BPM (bias of -24. 0x00=24, 0xFF=279)
|
||
Uint8 Initial Tickrate (0 is invalid)
|
||
Uint16 Current Tuning base note (1..65533). A4 (western default) is 0x5C00. C9 (tracker default) is 0xA000. If zero, assume the tracker default value
|
||
Float32 Frequency at the base note. Tracker default is 8363.0. If zero, assume the tracker default
|
||
Uint8 Flags for Global Behaviour (effect symbol '1')
|
||
Uint8 Song global volume
|
||
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
|
||
Uint8 Song mixing volume
|
||
* ImpulseTracker has range of 0..128; multiply by (255/128) then round to int
|
||
Byte[14] Reserved
|
||
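A sketch of decoding one 32-byte song table row, applying the BPM bias and the documented fallbacks for a zero tuning note or frequency. `parse_song_row` is a hypothetical name and `table` is assumed to start at the first row.

```python
import struct

SONG_ROW = struct.Struct("<IBHBBHfBBB14s")  # little-endian, 32 bytes

def parse_song_row(table, index=0):
    (offset, voices, patterns, bpm_raw, tickrate, base_note, base_freq,
     global_flags, global_vol, mixing_vol, _reserved) = \
        SONG_ROW.unpack_from(table, index * SONG_ROW.size)
    return {
        "offset": offset,
        "voices": voices,
        "patterns": patterns,
        "bpm": bpm_raw + 24,                 # stored with a bias of -24
        "tickrate": tickrate,
        "base_note": base_note or 0xA000,    # zero: tracker default (C9)
        "base_freq": base_freq or 8363.0,    # zero: tracker default
        "global_flags": global_flags,
        "global_volume": global_vol,
        "mixing_volume": mixing_vol,
    }

row = SONG_ROW.pack(64, 4, 10, 101, 6, 0, 0.0, 0, 255, 255, bytes(14))
song = parse_song_row(row)
```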
|
||
Taud device can queue up to 2 "playdata" in its buffer, which can be interpreted as a song.
|
||
|
||
* Known standard tunings:
|
||
A4 @ 440 Hz. ISO standard
|
||
A4 @ 435 Hz. Former French standard (year 1859)
|
||
A4 @ 452 Hz. Old Philharmonic pitch (19th century Britain)
|
||
C4 @ 256 Hz. Power of two
|
||
C4 @ 262 Hz. Modern Chinese a-ak tuning convention
|
||
C4 @ 311 Hz. Korean hyang-ak tuning standard (ROK National Gugak Center)
|
||
|
||
For your reference, tracker default tuning at A4 is 439.526 Hz (8363*2^(3/4) / 32)
|
||
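The 439.526 Hz figure can be reproduced from the note numbers, since one octave spans 4096 note units. A small sketch assuming a conventional octave (interval size 2.0); `note_to_frequency` is a hypothetical name.

```python
def note_to_frequency(note, base_note=0xA000, base_freq=8363.0):
    # Each 4096 note units doubles the frequency (octave system assumed)
    return base_freq * 2.0 ** ((note - base_note) / 4096)

a4_default = note_to_frequency(0x5C00)  # A4 under the tracker default tuning
a4_iso = note_to_frequency(0x5C00, base_note=0x5C00, base_freq=440.0)
```

A4 (0x5C00) lies 4.25 octaves below C9 (0xA000), so the default tuning gives 8363 / 2^4.25, which is the same value as 8363·2^(3/4) / 32.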
|
||
## Pattern Bin and Cue Sheet
|
||
RAM image of Pattern Bin/Cue Sheet
|
||
|
||
## Project Data
|
||
|
||
Project Data is just a concatenation of blocks identified by their FourCC.
|
||
|
||
Byte[8] Magic (\x1E T a u d P r J)
|
||
Byte[8] Reserved
|
||
* Repetition of
|
||
Byte[4] Title of the section (fourcc)
|
||
Uint32 Section length
|
||
Byte[*] Section payload
|
||
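Walking the Project Data blocks can be sketched as below. It assumes the section length counts only the payload bytes and that the blocks run to the end of the buffer; `iter_project_sections` is a hypothetical name.

```python
import struct

PROJECT_MAGIC = b"\x1eTaudPrJ"

def iter_project_sections(buf, offset=0):
    """Yield (fourcc, payload) pairs from a Project Data area starting at offset."""
    if buf[offset:offset + 8] != PROJECT_MAGIC:
        raise ValueError("Project Data magic not found")
    pos = offset + 16  # skip the magic and the 8 reserved bytes
    while pos + 8 <= len(buf):
        fourcc = buf[pos:pos + 4].decode("ascii")
        (length,) = struct.unpack_from("<I", buf, pos + 4)
        yield fourcc, buf[pos + 8:pos + 8 + length]
        pos += 8 + length  # assumption: length excludes the 8-byte block header

blob = (PROJECT_MAGIC + bytes(8)
        + b"PNam" + struct.pack("<I", 4) + b"Demo"
        + b"PCom" + struct.pack("<I", 2) + b"me")
sections = dict(iter_project_sections(blob))
```

Unknown FourCCs can simply be skipped by a reader, which is what keeps the container extensible.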
|
||
### Predefined sections
|
||
|
||
prefixes:
|
||
|
||
- P: Project
|
||
- I: Instrument
|
||
- p: Pattern
|
||
- S: Sample
|
||
- s: Song
|
||
|
||
* PCom. Project author. Encoding: UTF-8
|
||
* PCpr. Project copyright string. Encoding: UTF-8
|
||
* PNam. Project name. Encoding: UTF-8
|
||
|
||
* INam. Instrument name table. Strings separated by 0x1E
|
||
|
||
* pNam. Pattern name table. Strings separated by 0x1E
|
||
|
||
* SNam. Sample name table. Strings separated by 0x1E
|
||
|
||
* sMet. Song metadata table
|
||
* Repetition of:
|
||
Uint8 Song index
|
||
Uint32 Size of this table following this field
|
||
Uint16 Notation used for this song (takes notation index)
|
||
0: raw numbers
|
||
10*n: TET-number times 10 (12-TET = 120)
|
||
* Following systems have alternative notation conventions:
|
||
531: 53-TET Pythagorean Notation
|
||
* Following list defines ethnic notations in 12-tone scale
|
||
10121: Pythagorean Diminished Fifth
|
||
10122: Pythagorean Augmented Fourth
|
||
10123: Shi'er lü (East Asian traditional tuning)
|
||
Uint8 Primary beat division (default: 4 rows)
|
||
Uint8 Secondary beat division (default: 16 rows)
|
||
|
||
Byte[*] Song name, null terminated. Encoding: UTF-8
|
||
Byte[*] Song composer, null terminated. Encoding: UTF-8
|
||
Byte[*] Song copyright string, null terminated. Encoding: UTF-8
|
||
|
||
* nota. Custom notation definition (version 'a')
|
||
* Repetition of:
|
||
Uint8 Notation index (starting from zero) used by songs
|
||
Uint32 Size of this notation following this field
|
||
Uint16 Reserved for flags
|
||
Float32 Interval size (octave system = 2.0f). If you are not using an interval system (which means you are responsible for defining every note expressible), this must be NaN. 0f and Infinity are considered illegal
|
||
Uint16 Notes between interval MINUS ONE (or octave); 12-TET will have value 11
|
||
Byte[8] Reserved
|
||
Byte[*] Name, null terminated. Encoding: UTF-8
|
||
Byte[*] Notation table. 0xFF-separated and null-terminated. Encoding: Taud charset
|
||
Uint16[*] Frequency table. Size of the table is defined by "Notes between interval MINUS ONE". This is a lookup table of relative pitch offsets (against the base tuning note) in 4096-TET space. Index zero of this table will be 0x0 if you read the spec right
|
||
|
||
Note: custom notations will use internal index 65535 down to 65520 (index 0 = 65535, index 15 = 65520)
|
||
|
||
Note Tuning:
|
||
1. "Base Note at C4" will be derived using "Current Tuning Base Note" and "Frequency at the Base Note" from the song table. If the values are A4,440Hz, it will be converted to C4,261.6255653Hz
|
||
2. Frequency at C5 will be (Base Note at C4) × (Interval Size)
|
||
3. 4096 notes will be equidistance-distributed between (Frequency at C3) and (Frequency at C4), with logarithmic pitch progression; this builds the frequency-offset table
|
||
4. Frequency-Offset Table from the previous step will be applied against the "Base Note at C3" to construct the notes within the notation. Value at index zero of the Frequency Table must be 0
|
||
5. The progress will continue outside the "root interval" (C3..C4) to build a complete note-to-frequency table
|
||
|
||
Note: if your sample is pre-tuned for your system, keep the project setting as the defaults. If you are not working with the conventional octave system, you still need to specify the Interval Size
|
||
|
||
* Suggested notation serialisation format (for notation editor, etc.)
|
||
Byte[8] Magic (\x1E T a u d n o t)
|
||
Uint8 Version (Ascii 'a')
|
||
Bytes Notation definitions (see above)
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
**S3M (ScreamTracker 3) to Taud conversion notes**
|
||
(Implemented in s3m2taud.py)
|
||
Created by CuriousTorvald on 2026-04-20
|
||
|
||
## Instrument indexing
|
||
|
||
S3M instrument numbers are 1-based on disk and in pattern cells. Taud's cell instrument byte preserves this: 0 means "no instrument change, reuse whatever was last loaded on this channel"; 1..255 select an instrument slot. The converter passes the raw S3M instrument byte through unchanged (no subtract-1). The instrument bin is written at base = instrument_index * 64, with slot 0 left as an empty/silent entry.
|
||
|
||
## Effect encoding
|
||
|
||
Taud opcodes are base-36 digit values: digits 0..9 map to bytes 0x00..0x09; letters A..Z map to bytes 0x0A..0x23. Effects are encoded into a 1-byte opcode plus a 2-byte argument.
|
||
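The opcode mapping is plain base-36 digit conversion. A sketch that encodes one effect; the little-endian order of the 2-byte argument is an assumption, and both function names are hypothetical rather than taken from s3m2taud.py.

```python
def effect_opcode(symbol):
    """Map '0'..'9' to 0x00..0x09 and 'A'..'Z' to 0x0A..0x23."""
    if len(symbol) != 1 or not symbol.isalnum() or not symbol.isascii():
        raise ValueError("effect symbol must be a single base-36 digit")
    return int(symbol, 36)

def encode_effect(symbol, argument):
    # 1-byte opcode followed by a 2-byte argument (byte order assumed little-endian)
    return bytes([effect_opcode(symbol)]) + argument.to_bytes(2, "little")

opcodes = [effect_opcode(s) for s in "09AZ"]
encoded_h = encode_effect("H", 0x0000)
```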
|
||
## ST3 shared-memory recall (pre-pass)
|
||
|
||
ST3 backs effects D, E, F, I, J, K, L, Q, R, and S with a single per-channel memory slot. A $00 argument on any of these recalls the last non-zero argument. Taud uses narrower per-cohort memory, so the converter walks patterns in order-list order (per channel) and replaces every $00-arg recall with the current slot value before encoding. Patterns reused by multiple order entries are mutated once on their first visit; later visits may diverge from the ST3 original if cross-pattern memory state changed, but this is acceptable for typical usage.
|
||
|
||
## Cxx BCD decode
|
||
|
||
ST3 stores pattern-break row numbers as BCD on disk ($10 means decimal row 10, not hex row 16). The converter decodes: row = (byte >> 4) * 10 + (byte & 0xF). Values that decode to 64 or above clamp to row 0.
|
||
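The decode rule above, written out as a short sketch. `decode_pattern_break_row` is a hypothetical name.

```python
def decode_pattern_break_row(byte):
    row = (byte >> 4) * 10 + (byte & 0x0F)  # BCD: high nybble is the tens digit
    return row if row < 64 else 0           # 64 or above clamps to row 0

rows = [decode_pattern_break_row(b) for b in (0x10, 0x32, 0x63, 0x64)]
```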
|
||
## Pitch slide unit
|
||
|
||
ST3's coarse slide unit is 1/16 of a semitone. One semitone in Taud's 4096-TET grid is 4096/12 ≈ 341.33 units. One 1/16 semitone ≈ 21.33 units ≈ $0015. All E/F/G coarse arguments are therefore multiplied by $0015. Fine slide forms ($Fx, $Ex) are packed into Taud's $F0xx fine form after the same per-step scale.
|
||
|
||
## J arpeggio (12-TET to 4096-TET)
|
||
|
||
ST3 Jxy nibbles are 12-TET semitone offsets (0..15). Taud's J argument uses the high byte of a 16-bit pitch delta; one byte = 256 units ≈ 0.75 semitones.
|
||
|
||
Conversion: byte = round(semitones * 4 / 3).
|
||
|
||
The full lookup table:
|
||
|
||
Semitones 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
|
||
Taud byte $00 $01 $03 $04 $05 $07 $08 $09 $0B $0C $0D $0F $10 $11 $13 $14
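
The table can be regenerated from the conversion formula, as in the sketch below. The names are assumptions for illustration; how the two converted bytes are packed into the final Taud argument is not shown.

```python
def arpeggio_byte(semitones: int) -> int:
    """Convert one ST3 Jxy nibble (0..15 semitones) to a Taud arpeggio byte."""
    if not 0 <= semitones <= 15:
        raise ValueError("arpeggio nibble must be 0..15")
    # semitones * 4 / 3 never lands exactly on .5, so round() has no ties here.
    return round(semitones * 4 / 3)


# Lookup table for all sixteen nibble values.
ARPEGGIO_TABLE = [arpeggio_byte(n) for n in range(16)]


def convert_arpeggio(jxy: int) -> tuple:
    """Split a Jxy argument into nibbles and convert each one."""
    return ARPEGGIO_TABLE[jxy >> 4], ARPEGGIO_TABLE[jxy & 0x0F]
```

A major-chord arpeggio J47 (4 and 7 semitones) becomes the byte pair ($05, $09).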
|
||
|
||
## K and L effects
|
||
|
||
The engine treats K and L as no-ops. The converter splits each into two parts:
|
||
K → effect column H $0000 (recall vibrato from HU memory) plus a volume-column slide derived from K's argument
|
||
L → effect column G $0000 plus the same volume-column slide. If the S3M cell already carries an explicit volume-column byte, the slide half is dropped with a -v warning.
|
||
|
||
## M, N (channel volume), X, P (pan) folding
|
||
|
||
M (set channel volume) and N (channel-vol slide) fold into the volume column. X (set pan) and P (pan slide) fold into the pan column. These effects consume no space in the effect slot. W (global vol slide) and Y (panbrello) are dropped with a -v warning.
|
||
|
||
## Volume column defaults
|
||
|
||
When a note trigger is present in a cell with no explicit S3M volume byte, the converter emits SEL_SET (selector 0) with the instrument's default volume. This prevents the channel's prior volume state from persisting into a fresh note. Cells with no note trigger and no explicit volume emit SEL_FINE value 0 (fine slide of 0 = no-op), which leaves channel volume unchanged.
|
||
|
||
## Pan column defaults
|
||
|
||
Row 0 of every pattern emits SEL_SET with the channel's default pan (derived from the S3M channel-setting byte: channels 0-7 → left ($10), channels 8-15 → right ($2F), otherwise centre ($1F)). All other rows emit SEL_FINE value 0 (no-op) unless an X, P, or S$8x effect overrides.
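
A sketch of the default-pan lookup, assuming the channel-setting byte is used directly as the channel number described above. The function and constant names are hypothetical.

```python
PAN_LEFT = 0x10
PAN_CENTRE = 0x1F
PAN_RIGHT = 0x2F


def default_pan(channel_setting: int) -> int:
    """Derive the row-0 SEL_SET pan value from an S3M channel-setting byte."""
    if 0 <= channel_setting <= 7:
        return PAN_LEFT
    if 8 <= channel_setting <= 15:
        return PAN_RIGHT
    return PAN_CENTRE
```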
|
||
|
||
## Cue sheet halt placement
|
||
|
||
The halt instruction (byte value 0x01 at cue offset 30) is placed on the last active cue entry, not in a separate empty cue appended after it. This ensures playback stops immediately after the last pattern row completes, with no silent 64-row gap.
|
||
|
||
## Tempo mapping
|
||
|
||
S3M BPM is stored as a raw decimal value. Taud's initial BPM byte uses a bias of -24 (byte 0x00 = 24 BPM, 0xFF = 279 BPM). Conversion: taud_byte = bpm - 24. The converter also scans row 0 of the first pattern in the order list for A (set speed) and T (set tempo) effects and uses those values in preference to the S3M header defaults.
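
The bias can be sketched as a pair of helpers. The names are assumptions, and rejecting out-of-range values with an exception is a choice made for this example; the text above does not say how the converter treats a BPM below 24.

```python
BPM_BIAS = 24  # byte 0x00 = 24 BPM, byte 0xFF = 279 BPM


def bpm_to_taud_byte(bpm: int) -> int:
    """Convert a BPM value to Taud's biased initial-BPM byte."""
    if not BPM_BIAS <= bpm <= BPM_BIAS + 0xFF:
        raise ValueError(f"BPM {bpm} is outside 24..279")
    return bpm - BPM_BIAS


def taud_byte_to_bpm(value: int) -> int:
    """Inverse mapping, from the stored byte back to BPM."""
    return value + BPM_BIAS
```

The common S3M default of 125 BPM is stored as byte 101 ($65).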
|
||
|
||
## Global volume
|
||
|
||
ST3 global volume is 0..$40; Taud's is 0..$FF. Import scale: Taud_vol = ST3_vol × 4 (clamped to $FF).
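
The import scale, including the clamp, as a short sketch (the function name is an assumption):

```python
def import_global_volume(st3_vol: int) -> int:
    """Scale an ST3 global volume (0..$40) to Taud's 0..$FF range."""
    return min(st3_vol * 4, 0xFF)
```

The clamp only matters at the top of the range: $40 × 4 = $100, which is stored as $FF.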
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
RomBank / RamBank
|
||
|
||
Endianness: Little
|
||
|
||
MMIO
|
||
|
||
0 RW : Bank number for the first 512 kbytes
|
||
1 RW : Bank number for the last 512 kbytes
|
||
16..23 RW : DMA Control for Lane 1..8
|
||
Write 0x01: copy from Core to Peripheral
|
||
Write 0x02: copy from Peripheral to Core
|
||
* NOTE: after the transfer, the bank numbers revert to the values they held before the operation
|
||
24..31 RW : DMA Control reserved
|
||
32..34 RW : DMA Lane 1 -- Addr on the Core Memory
|
||
35..37 RW : DMA Lane 1 -- Addr on the Peripheral's Memory (addr can be across-the-bank)
|
||
38..40 RW : DMA Lane 1 -- Transfer Length
|
||
41..42 RW : DMA Lane 1 -- First/Last Bank Number
|
||
43 RW : DMA Lane 1 -- (reserved)
|
||
44..55 RW : DMA Lane 2 Props
|
||
56..67 RW : DMA Lane 3 Props
|
||
68..79 RW : DMA Lane 4 Props
|
||
80..91 RW : DMA Lane 5 Props
|
||
92..103 RW : DMA Lane 6 Props
|
||
104..115 RW : DMA Lane 7 Props
|
||
116..127 RW : DMA Lane 8 Props
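
The lane layout above is regular (12 bytes per lane, starting at offset 32), so the register offsets for any lane can be computed. The sketch below assumes lanes 2 through 8 use the same field order as Lane 1; the function and key names are hypothetical.

```python
def dma_lane_registers(lane: int) -> dict:
    """Return MMIO offsets for one DMA lane as (first_offset, size_in_bytes)."""
    if not 1 <= lane <= 8:
        raise ValueError("lane must be 1..8")
    base = 32 + (lane - 1) * 12          # Lane 1 props start at offset 32
    return {
        "control": (16 + (lane - 1), 1), # write 0x01 or 0x02 to start a copy
        "core_addr": (base, 3),          # address in core memory
        "peri_addr": (base + 3, 3),      # address in the peripheral's memory
        "length": (base + 6, 3),         # transfer length
        "banks": (base + 9, 2),          # first/last bank number
        "reserved": (base + 11, 1),
    }
```

For Lane 8 this gives a control byte at offset 23 and properties spanning 116..127, which agrees with the table.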
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
High Speed Disk Peripheral Adapter (HSDPA)
|
||
|
||
An interface card for sequentially reading from and writing to a single large disk that has no filesystem on it.
|
||
|
||
Endianness: Little
|
||
|
||
MMIO
|
||
|
||
0..2 RW: Block transfer status for Disk 1
|
||
0b nnnn nnnn, nnnn nnnn , a00z mmmm
|
||
|
||
n-read: size of the block from the other device, LSB (a full block of 1048576 bytes is encoded as zero)
|
||
m-read: size of the block from the other device, MSB (a full block of 1048576 bytes is encoded as zero)
|
||
a-read: set if the other device has more to send (hasNext / doYouHaveNext); false if the device is not present
|
||
z-read: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)
|
||
|
||
n-write: size of the block I'm sending, LSB (a full block of 1048576 bytes is encoded as zero)
|
||
m-write: size of the block I'm sending, MSB (a full block of 1048576 bytes is encoded as zero)
|
||
a-write: if there's more to send (hasNext)
|
||
z-write: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)
|
||
3..5 RW: Block transfer status for Disk 2
|
||
6..8 RW: Block transfer status for Disk 3
|
||
9..11 RW: Block transfer status for Disk 4
|
||
12..15 RW: Block transfer control for Disk 1 through 4
|
||
0b 0000 abcd
|
||
|
||
a: 1 for send, 0 for receive
|
||
|
||
b-write: 1 to start sending if a-bit is set; if a-bit is unset, makes the other device start sending
|
||
b-read: if this bit is set, you're currently receiving something (aka busy)
|
||
|
||
c-write: I'm ready to receive
|
||
c-read: Are you ready to receive?
|
||
|
||
d-read: Are you there? (if the other device's recipient is myself)
|
||
|
||
NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port
|
||
16..19 RW: 8-bit status code for the disk
|
||
20 RW: Currently active disk (0: deselect all disks, 1: select disk #1, ...)
|
||
|
||
Selecting a disk will automatically unset and hold down "I'm ready to receive" flags of the other disks,
|
||
however, the target disk will NOT have its "I'm ready to receive" flag automatically set.
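
As an illustration of the 3-byte block transfer status format described above, the sketch below packs and unpacks the size and hasNext fields. It assumes the first byte holds the low 8 bits of n, the second byte the high 8 bits of n, and the third byte the a00z mmmm flags with m as bits 16..19 of the size. That byte order is inferred from the little-endian note and should be treated as an assumption, as should the function names.

```python
FULL_BLOCK = 1048576  # a full block is encoded as size 0 with the z bit clear


def decode_block_status(b0: int, b1: int, b2: int) -> tuple:
    """Return (size_in_bytes, has_next) from the three status bytes."""
    has_next = bool(b2 & 0x80)            # a bit
    if b2 & 0x10:                         # z bit: size really is 0
        return 0, has_next
    size = b0 | (b1 << 8) | ((b2 & 0x0F) << 16)
    return (size or FULL_BLOCK), has_next


def encode_block_status(size: int, has_next: bool) -> tuple:
    """Return the three status bytes for a block of the given size."""
    if not 0 <= size <= FULL_BLOCK:
        raise ValueError("size must be 0..1048576")
    flags = 0x80 if has_next else 0x00
    if size == 0:
        return 0, 0, flags | 0x10         # set z, leave n and m at zero
    raw = size % FULL_BLOCK               # 1048576 wraps to 0
    return raw & 0xFF, (raw >> 8) & 0xFF, flags | (raw >> 16)
```

Keeping the z bit separate from the size field is what lets the 20-bit field describe both an empty block and a full 1 MiB block.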
|
||
|
||
-- SEQUENTIAL IO SUPPORT MODULE --
|
||
|
||
NOTE: Sequential I/O will clobber the peripheral memory space.
|
||
|
||
256..257 RW: Sequential I/O control flags
|
||
|
||
258 RW: Opcode. Writing a value to this memory will execute the operation
|
||
0x00 - No operation
|
||
0x01 - Skip (arg 1) bytes
|
||
0x02 - Read (arg 1) bytes and store to core memory pointer (arg 2)
|
||
0x03 - Write (arg 1) bytes, taking the data from core memory starting at pointer (arg 2)
|
||
0xF0 - Rewind the file to the starting point
|
||
0xFF - Terminate sequential I/O session and free up the memory space
|
||
259..261 RW: Argument #1
|
||
262..264 RW: Argument #2
|
||
265..267 RW: Argument #3
|
||
268..270 RW: Argument #4
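
Because writing the opcode byte triggers the operation, the arguments have to be in place first. The sketch below builds the ordered list of (MMIO offset, byte value) writes for one sequential I/O command, assuming the 3-byte arguments are stored little-endian as stated for this device. The function name is hypothetical, and actually issuing the writes is left to whatever poke routine the host environment provides.

```python
SEQ_OPCODE_OFFSET = 258
SEQ_ARG_OFFSETS = (259, 262, 265, 268)   # Argument #1..#4, 3 bytes each

OP_SKIP, OP_READ, OP_WRITE = 0x01, 0x02, 0x03
OP_REWIND, OP_TERMINATE = 0xF0, 0xFF


def seq_io_writes(opcode: int, *args: int) -> list:
    """Return (offset, byte) pairs: all argument bytes first, opcode last."""
    if len(args) > len(SEQ_ARG_OFFSETS):
        raise ValueError("at most four arguments are supported")
    writes = []
    for base, value in zip(SEQ_ARG_OFFSETS, args):
        if not 0 <= value <= 0xFFFFFF:
            raise ValueError("arguments are 24-bit values")
        for i in range(3):                # little-endian, low byte first
            writes.append((base + i, (value >> (8 * i)) & 0xFF))
    writes.append((SEQ_OPCODE_OFFSET, opcode))  # this write executes the op
    return writes
```

For instance, `seq_io_writes(OP_READ, 256, 0x123456)` describes a read of 256 bytes into core memory at pointer 0x123456.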
|
||
|
||
|
||
Memory Space
|
||
|
||
0..1048575 RW: Buffer for the block transfer lane
|
||
IMPLEMENTATION RECOMMENDATION: split the memory space into two 512K blocks, and when the sequential
|
||
reading reaches the second space, prepare the next bytes in the first memory space, so that the read
|
||
cursor, on reaching 1048576, wraps to 0 and continues reading the content of the disk as if nothing happened.
|
||
|