tsvm/terranmon.txt

1 byte = 2 pixels

560x448@4bpp = 125 440 bytes
560x448@8bpp = 250 880 bytes

-> 262144 bytes (256 kB)

[USER AREA | HW AREA]

    Number of peripherals = 8, of which the computer itself is considered
     a peripheral.

HW AREA = [Peripherals | MMIO | INTVEC]

User area: 8 MB, hardware area: 8 MB

8192 kB
    User Space
1024 kB
    Peripheral #7
1024 kB
    Peripheral #6
...
1024 kB  (where Peripheral #0 would be)
    MMIO and Interrupt Vectors
    128 kB
        MMIO for Peri #7
    128 kB
        MMIO for Peri #6
    ...
    128 kB  (where Peripheral #0 would be)
        MMIO for the computer

Certain memory mappers may allow an extra 4 MB of User Space in exchange for Peripheral slots #4 through #7.

--------------------------------------------------------------------------------

IO Device

Endianness: little
Note: Always takes up the peripheral slot of zero

Latching: latching "locks" a fluctuating value at the moment you request it, so that the reads that follow return
          a consistent value. This matters most for multibyte values, where the remaining bytes could change
          after you have read the first one, e.g. System uptime in nanoseconds

MMIO

0..31 RO: Raw Keyboard Buffer read. Won't shift the key buffer
32..33 RO: Mouse X pos
34..35 RO: Mouse Y pos
36 RO: Mouse down? (1 for TRUE, 0 for FALSE)
37 RW: Read/Write single key input. Key buffer will be shifted. Manual writing is
    usually unnecessary as such action must be automatically managed via LibGDX
    input processing.
    Stores ASCII code representing the character, plus:
   (1..26: Ctrl+[alph])
    3 : Ctrl+C
    4 : Ctrl+D
    8 : Backspace
   (13: Return)
    19: Up arrow
    20: Down arrow
    21: Left arrow
    22: Right arrow
38 RW: Request keyboard input be read (TTY Function). Write nonzero value to enable, write zero to
    close it. Keyboard buffer will be cleared whenever request is received, so
    MAKE SURE YOU REQUEST THE KEY INPUT ONLY ONCE!
39 WO: Latch Key/Mouse Input (Raw Input function). Write nonzero value to latch.
    Stores LibGDX Key code
40..47 RO: Key Press buffer
    stores keys that are held down. Can accommodate 8-key rollover (in keyboard geeks' terms)
    0x0 is written for the empty area; numbers are always sorted
48..51 RO: System flags
    48: 0b rq00 000t
        t: STOP button (should raise SIGTERM)
        r: RESET button (hypervisor should reset the system)
        q: SysRq button (hypervisor should respond to it)
    49: set to 1 if a key has been pushed into the key buffer (or, if the system has a key press to pull) via MMIO 38; otherwise 0

64..67 RO: User area memory size in bytes
68 WO: Counter latch
    0b 0000 00ba
    a: System uptime
    b: RTC
72..79 RO: System uptime in nanoseconds
80..87 RO: RTC in microseconds
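
The latch-then-read sequence can be sketched as follows. This is an illustration only: `FakeIO`, `read` and `write` are hypothetical stand-ins for however a program reaches the IO device's MMIO window, and the fake device merely mimics the latching behaviour described above.

```python
import struct

UPTIME_LATCH_BIT = 0b01  # bit a of MMIO 68
RTC_LATCH_BIT = 0b10     # bit b of MMIO 68


class FakeIO:
    """Hypothetical stand-in for the IO device MMIO window (not a TSVM API)."""

    def __init__(self, uptime_ns):
        self.live_uptime_ns = uptime_ns   # value that keeps changing
        self.regs = bytearray(128)        # latched, readable bytes

    def write(self, addr, value):
        # Writing the latch bit to MMIO 68 copies the live value into 72..79.
        if addr == 68 and value & UPTIME_LATCH_BIT:
            self.regs[72:80] = struct.pack("<Q", self.live_uptime_ns)

    def read(self, addr):
        return self.regs[addr]


def read_uptime_ns(io):
    io.write(68, UPTIME_LATCH_BIT)                   # freeze the counter first
    raw = bytes(io.read(72 + i) for i in range(8))   # then read all 8 bytes
    return struct.unpack("<Q", raw)[0]               # little-endian uint64
```

Reading the eight bytes without latching first risks mixing bytes from two different counter values.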

88 RW: Rom mapping
    write 0xFF to NOT map any rom
    write 0x00 to map BIOS
    write 0x01 to map first "extra ROM"

89 RW: BMS flags
    0b P000 b0ca
    a: 1 if charging (accepting power from the AC adapter)
    c: 1 if battery is detected
    b: 1 if the device is battery-operated

    P: 1 if CPU halted (so that the "smart" power supply can shut itself down)

    note: only the high nybble is writable!

    if the device is battery-operated but currently running off of an AC adapter and there is no battery inserted,
    the flag would be 0000 1001

90 RO: BMS calculated battery percentage where 255 is 100%
91 RO: BMS battery voltage multiplied by 10 (127 = "12.7 V")
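
A small sketch of decoding the three BMS registers (MMIO 89 to 91). The function names and dictionary keys are illustrative, not part of TSVM.

```python
def decode_bms_flags(value):
    """Split the MMIO 89 byte (0b P000 b0ca) into named booleans."""
    return {
        "charging": bool(value & 0b0000_0001),          # a
        "battery_detected": bool(value & 0b0000_0010),  # c
        "battery_operated": bool(value & 0b0000_1000),  # b
        "cpu_halted": bool(value & 0b1000_0000),        # P
    }


def battery_percent(raw):
    """MMIO 90: 255 means 100%."""
    return raw * 100.0 / 255.0


def battery_volts(raw):
    """MMIO 91: voltage multiplied by 10."""
    return raw / 10.0
```

Feeding the example value 0000 1001 from the text yields a battery-operated device that is accepting AC power with no battery detected.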

92 RW: Memory Mapping
    0: 8 MB Core, 8 MB Hardware-reserved, 7 card slots
    1: 12 MB Core, 4 MB Hardware-reserved, 3 card slots (HW addr 131072..1048575 cannot be reclaimed though)

1024..2047 RW: Reserved for integrated peripherals (e.g. built-in status display)

2048..4075 RW: Used by the hypervisor
    2048..3071 RW: Interrupt vectors (0-255), 32-bit address. Used regardless of the existence of the hypervisor.
        If hypervisor is installed, the interrupt calls are handled using the hypervisor
        If no hypervisors are installed, the interrupt call is performed by the "hardware"
    Interrupt Vector Table:
        0x00 - Initial Stack Pointer (currently unused)
        0x01 - Reset
        0x02 - NMI
        0x03 - Out of Memory

        0x0C - IRQ_COM1
        0x0D - IRQ_COM2
        0x0E - IRQ_COM3
        0x0F - IRQ_COM4

        0x10 - Core Memory Access Violation
        0x11 - Card 1 Access Violation
        0x12 - Card 2 Access Violation
        0x13 - Card 3 Access Violation
        0x14 - Card 4 Access Violation
        0x15 - Card 5 Access Violation
        0x16 - Card 6 Access Violation
        0x17 - Card 7 Access Violation

        0x20 - IRQ_Core
        0x21 - IRQ_CARD1
        0x22 - IRQ_CARD2
        0x23 - IRQ_CARD3
        0x24 - IRQ_CARD4
        0x25 - IRQ_CARD5
        0x26 - IRQ_CARD6
        0x27 - IRQ_CARD7
    3072..3075 RW: Status flags


4076..4079 RW: 8-bit status code for the port
4080..4083 RO: 8-bit status code for connected device

4084..4091 RO: Block transfer status
    0b nnnnnnnn a00z mmmm

    n-read: size of the block from the other device, LSB (4096-full block size is zero)
    m-read: size of the block from the other device, MSB (4096-full block size is zero)
    a-read: if the other device hasNext (doYouHaveNext), false if device not present
    z-read: set if the size is actually 0 instead of 4096 (overrides n and m parameters)

    n-write: size of the block I'm sending, LSB (4096-full block size is zero)
    m-write: size of the block I'm sending, MSB (4096-full block size is zero)
    a-write: if there's more to send (hasNext)
    z-write: set if the size is actually 0 instead of 4096 (overrides n and m parameters)
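
The size encoding above (zero means a full 4096-byte block unless the z-bit is set) can be sketched like this. It assumes the nnnnnnnn byte is the first of the two status bytes of a port and the a00z mmmm byte is the second; the function names are illustrative.

```python
def decode_block_status(lo, hi):
    """Decode the two status bytes of one port.

    Assumes `lo` is the nnnnnnnn byte and `hi` is the a00z mmmm byte.
    Returns (size_in_bytes, has_next).
    """
    has_next = bool(hi & 0x80)          # a
    is_empty = bool(hi & 0x10)          # z
    size = ((hi & 0x0F) << 8) | lo      # mmmm nnnnnnnn
    if is_empty:
        size = 0                        # z overrides n and m
    elif size == 0:
        size = 4096                     # a full block is encoded as zero
    return size, has_next


def encode_block_status(size, has_next):
    """Inverse of decode_block_status for sizes 0..4096."""
    if not 0 <= size <= 4096:
        raise ValueError("block size must be within 0..4096")
    hi = 0x80 if has_next else 0
    if size == 0:
        return 0, hi | 0x10             # truly empty: set z
    size &= 0x0FFF                      # 4096 wraps to zero
    return size & 0xFF, hi | (size >> 8)
```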

4092..4095 RW: Block transfer control for Port 1 through 4
    0b 00ms abcd

    m-readonly: device in master setup
    s-readonly: device in slave setup

    a: 1 for send, 0 for receive

    b-write: 1 to start sending if a-bit is set; if a-bit is unset, make the other device start sending
    b-read: if this bit is set, you're currently receiving something (aka busy)

    c-write: I'm ready to receive
    c-read: Are you ready to receive?

    d-read: Are you there? (if the other device's recipient is myself)

    NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port

4096..8191 RW: Buffer for block transfer lane #1
8192..12287 RW: Buffer for block transfer lane #2
12288..16383 RW: Buffer for block transfer lane #3
16384..20479 RW: Buffer for block transfer lane #4

65536..131071 RO: Mapped to ROM

--------------------------------------------------------------------------------

VRAM Bank 0 (256 kB)

Endianness: little


Memory Space

250880 bytes
    Framebuffer
3 bytes
    Initial background (and the border) colour RGB, 8 bits per channel
1 byte
    command (writing to this memory address changes the status)
        1: reset palette to default
        2: fill framebuffer with given colour (arg1)
        3: do '1' then do '2' (with arg1) then do '4' (with arg2)
        4: fill framebuffer2 with given colour (arg1)

        16: copy Low Font ROM (char 0–127) to mapping area
        17: copy High Font ROM (char 128–255) to mapping area
        18: write contents of the font ROM mapping area to the Low Font ROM
        19: write contents of the font ROM mapping area to the High Font ROM
        20: reset Low Font ROM to default
        21: reset High Font ROM to default
12 bytes
    argument for "command" (arg1: Byte, arg2: Byte)
    write to this address FIRST and then write to "command" to execute the command
1008 bytes
    reserved
2046 bytes
    unused
2 bytes
    Cursor position in: (y*80 + x)
2560 bytes
    Text foreground colours
2560 bytes
    Text background colours
2560 bytes
    Text buffer of 80x32 (7x14 character size, and yes: actual character data is on the bottom)
512 bytes
    Palette stored in following pattern: 0b rrrr gggg, 0b bbbb aaaa, ....
    Palette number 255 is always fully transparent (bits being all zero)
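
A minimal sketch of the palette entry packing (0b rrrr gggg, 0b bbbb aaaa). The helper names are illustrative; offsets are relative to the start of the 512-byte palette area.

```python
def pack_palette_entry(r, g, b, a):
    """Pack four 4-bit channels into the two palette bytes."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 15:
            raise ValueError("channels are 4 bits wide (0..15)")
    return bytes([(r << 4) | g, (b << 4) | a])


def unpack_palette_entry(two_bytes):
    """Return (r, g, b, a), each in 0..15."""
    first, second = two_bytes
    return first >> 4, first & 0x0F, second >> 4, second & 0x0F


def palette_offset(index):
    """Byte offset of a palette entry inside the 512-byte palette area."""
    if not 0 <= index <= 255:
        raise ValueError("palette index must be within 0..255")
    return index * 2
```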

(DRAFT) Optional Sprite Card (VRAM Bank 1 (256 kB))
250880 bytes
    One of:
        Secondary layer
        Other 8-bit of the primary framebuffer (4K colour mode)

SPRITE FORMAT DRAFT 1

    533 bytes: Sprite attribute table
        (41 sprites total, of which 1 is GUI cursor)
        12 bytes - signed fixed point
            X-position
            Y-position
            Transform matrix A..D
        1 byte
            0b 0000 00vp
            (p: 0 for above-all, 1 for below-text, v: show/hide)
    10496 bytes: Sprite table
        256 bytes
            16x16 texture for the sprite
    235 bytes:
        unused

SPRITE FORMAT DRAFT 2

    DMA Sprite Area - 18 bytes each, total of ??? sprites
    1 byte
        Sprite width
    1 byte
        Sprite height
    12 bytes - signed fixed point
        Affine transformation A,B,C,D,X,Y
    1 byte
        Attributes
        0b 0000 00vp
        (p: 0 for above-all, 1 for below-text, v: show/hide)
    3 bytes
        Pointer to raw pixmap data in Core Memory

MMIO

0..1 RO
    Framebuffer width in pixels
2..3 RO
    Framebuffer height in pixels
4 RO
    Text mode columns
5 RO
    Text mode rows
6 RW
    Text-mode attributes
    0b 0000 00rc (r: TTY Raw mode, c: Cursor blink)
7 RW
    Graphics-mode attributes
    0b 0000 rrrr (r: Resolution/colour depth)
8 RO
    Last used colour (set by poking at the framebuffer)
9 RW
    current TTY foreground colour (useful for print() function)
10 RW
    current TTY background colour (useful for print() function)
11 RO
    Number of Banks, or VRAM size (1 = 256 kB, max 4)
12 RW
    Graphics Mode
    0: 560x448,  256 Colours, 1 layer
    1: 280x224,  256 Colours, 4 layers
    2: 280x224, 4096 Colours, 2 layers
    3: 560x448,  256 Colours, 2 layers (if bank 2 is not installed, will fall back to mode 0)
    4: 560x448, 4096 Colours, 1 layer  (if bank 2 is not installed, will fall back to mode 0)
    4096 is also known as "direct colour mode" (4096 colours * 16 transparency -> 65536 colours)
        Two layers are grouped to make a frame: the "low layer" contains RG colours and the "high layer" has BA colours.
        Red and Blue occupy the MSBs
13 RW
    Layer Arrangement
    If 4 layers are used:
     Num  LO<->HI
        0	1234
        1	1243
        2	1324
        3	1342
        4	1423
        5	1432
        6	2134
        7	2143
        8	2314
        9	2341
        10	2413
        11	2431
        12	3124
        13	3142
        14	3214
        15	3241
        16	3412
        17	3421
        18	4123
        19	4132
        20	4213
        21	4231
        22	4312
        23	4321
    If 2 layers are used:
     Num  LO<->HI
        0	12
        1	12
        2	12
        3	12
        4	12
        5	12
        6	12
        7	21
        8	21
        9	21
        10	21
        11	21
        12	12
        13	12
        14	21
        15	21
        16	12
        17	21
        18	12
        19	12
        20	21
        21	21
        22	12
        23	21
    If 1 layer is used, this field will do nothing and always fall back to 0
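
The 4-layer table lists the 24 orderings of layers 1 to 4 in lexicographic order, so it can be regenerated rather than stored. A sketch (names are illustrative; the 2-layer table is not derived here):

```python
from itertools import permutations

# All 24 orderings of layers 1..4, in the same lexicographic order as the table.
FOUR_LAYER_ORDERS = ["".join(p) for p in permutations("1234")]


def four_layer_order(num):
    """Return the LO<->HI layer order string for arrangement number 0..23."""
    return FOUR_LAYER_ORDERS[num]
```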
14..15 RW
    framebuffer scroll X
16..17 RW
    framebuffer scroll Y
18 RO
    Busy flags
    1: Codec in-use
    2: Draw Instructions being decoded
19 WO
    Write non-zero value to initiate the Draw Instruction decoding
20..21 RO
    Program Counter for the Draw Instruction decoding
1024..2047 RW
    horizontal scroll offset for scanlines
2048..4095 RW
    !!NEW!! Font ROM Mapping Area
    Format is always 8x16 pixels, 1bpp ROM format (so that it would be YY_CHR-Compatible)
    (designer's note: it's still useful to divide the char rom to two halves, lower half being characters ROM and upper half being symbols ROM)
65536..131071 RW
    Draw Instructions

Text-mode-font-ROM is immutable and does not belong to VRAM
Even in text mode, the framebuffer is still drawn onto the screen, and the text is drawn on top of it

Copper Commands (suggestion withdrawn)

WAITFOR 3,32
    80·03 46 00 (0x004603: offset on the framebuffer)
SCROLLX 569
    A0·39 02 00
SCROLLY 321
    B0·41 01 00
SETPAL 5 (15 2 8 15)
    C0·05·F2 8F (0x05: Palette number, 0xF28F: RGBA colour)
SETBG (15 2 8 15)
    D0·00·F2 8F (0xF28F: RGBA colour)
END (pseudocommand of WAITFOR)
    80·FF FF FF

--------------------------------------------------------------------------------

TSVM MOV file format

Endianness: Little

\x1F T S V M M O V
[METADATA]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...


where:

METADATA -
    uint16 WIDTH
    uint16 HEIGHT
    uint16 FPS (0: play as fast as possible)
    uint32 NUMBER OF FRAMES
    uint16 UNUSED (fill with 255,0)
    uint16 AUDIO QUEUE INFO
           when read as little endian:
           0b nnnn bbbb bbbb bbbb
              [byte 21] [byte 20]
           n: size of the queue (number of entries). Allocate at least 1 more entry than the number specified!
           b: size of each entry in bytes DIVIDED BY FOUR (all zero = 16384; always 0x240 for MP2 because MP2-VBR is not supported)

           n=0 indicates the video audio must be decoded on-the-fly instead of being queued, or has no audio packets
    byte[10] RESERVED
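
A sketch of reading the magic and METADATA with Python's `struct` module, including the AUDIO QUEUE INFO split. The function name and dictionary keys are illustrative.

```python
import struct

MOV_MAGIC = b"\x1fTSVMMOV"
# magic, width, height, fps, frame count, unused, audio queue info, reserved
MOV_HEADER = struct.Struct("<8sHHHIHH10s")


def parse_mov_header(data):
    (magic, width, height, fps, frames,
     _unused, queue, _reserved) = MOV_HEADER.unpack_from(data, 0)
    if magic != MOV_MAGIC:
        raise ValueError("not a TSVM MOV file")
    queue_entries = queue >> 12       # nnnn
    entry_units = queue & 0x0FFF      # bbbb bbbb bbbb (bytes divided by four)
    entry_bytes = 16384 if entry_units == 0 else entry_units * 4
    return {
        "width": width,
        "height": height,
        "fps": fps,
        "frames": frames,
        "queue_entries": queue_entries,  # 0: decode on the fly, or no audio
        "entry_bytes": entry_bytes,
    }
```

With b = 0x240 (the MP2 case) each queue entry comes out as 2304 bytes.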


Packet Types -
    <video>
       0,0: 256-Colour frame
       1,0: 256-Colour frame with palette data
       2,0: 4096-Colour frame (stored as two byte-planes)
       4,t: iPF no-alpha indicator (see iPF Type Numbers for details)
       5,t: iPF with alpha indicator (see iPF Type Numbers for details)
      16,0: Series of JPEGs
      18,0: Series of PNGs
      20,0: Series of TGAs
      21,0: Series of TGA/GZs
    <audio>
      0,16: Raw PCM Stereo
      1,16: Raw PCM Mono
      p,17: MP2, 32 kHz (see MP2 Format Details section for p-value)
      q,18: ADPCM, 32 kHz (q = 2 * log_2(frameSize) + (1 if mono, 0 if stereo))
    <special>
    255,255: sync packet (wait until the next frame)
    254,255: background colour packet
     31,84 : prohibited

    Packet Type High Byte (iPF Type Numbers)
        0..7: iPF Type 1..8

    - MP2 Format Details
    Rate | 2ch | 1ch
      32 |   0 |   1
      48 |   2 |   3
      56 |   4 |   5
      64 |   6 |   7  (libtwolame does not allow bitrate lower than this on 32 kHz stereo)
      80 |   8 |   9
      96 |  10 |  11
     112 |  12 |  13
     128 |  14 |  15
     160 |  16 |  17
     192 |  18 |  19
     224 |  20 |  21
     256 |  22 |  23
     320 |  24 |  25
     384 |  26 |  27
    Add 128 to the resulting number if the frame has a padding bit (should not happen on 32kHz sampling rate)
    Special value of 255 may indicate some errors
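
    The table is regular (row index times two, plus one for mono), so the p-value can be computed. A sketch with an illustrative function name:

```python
MP2_RATES_KBPS = [32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384]


def mp2_packet_p(rate_kbps, mono=False, padded=False):
    """Return the p-value for an MP2 packet type (p,17)."""
    p = MP2_RATES_KBPS.index(rate_kbps) * 2  # ValueError for unlisted rates
    if mono:
        p += 1
    if padded:
        p += 128                             # frame has a padding bit
    return p
```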

    To encode an audio to compliant format, use ffmpeg: ffmpeg -i <your_music> -acodec libtwolame -psymodel 4 -b:a <rate>k -ar 32000 <output.mp2>
        Rationale:
        -acodec libtwolame : ffmpeg has two mp2 encoders, and libtwolame produces vastly higher quality audio
        -psymodel 4 : use alternative psychoacoustic model -- the default model (3) tends to insert "clunk" sounds throughout the audio
        -b:a : 256k is recommended for high quality audio (trust me, you don't need 384k)
        -ar 32000 : resample the audio to 32kHz, the sampling rate of the TSVM soundcard

TYPE 0 Packet -
    uint32 SIZE OF COMPRESSED FRAMEDATA
    *      COMPRESSED FRAMEDATA

TYPE 1 Packet -
    byte[512] Palette Data
    uint32 SIZE OF COMPRESSED FRAMEDATA
    *      COMPRESSED FRAMEDATA

TYPE 2 Packet -
    uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
    *      COMPRESSED FRAMEDATA
    uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 2
    *      COMPRESSED FRAMEDATA

iPF Packet -
    uint32 SIZE OF COMPRESSED FRAMEDATA
    *      COMPRESSED FRAMEDATA // only the actual gzip (and no UNCOMPRESSED SIZE) of the "Blocks.gz" is stored

TYPE 3 Packet (Patch-encoded iPF 1 Packet) -
    uint32 SIZE OF COMPRESSED PATCHES
    *      COMPRESSED PATCHES

    PATCHES is a series of PATCH records concatenated together

    where each PATCH is encoded as:

        uint8  X-coord of the patch (pixel position divided by four)
        uint8  Y-coord of the patch (pixel position divided by four)
        uint8  width of the patch (size divided by four)
        uint8  height of the patch (size divided by four)
        (calculating uncompressed size)
        (iPF1 no alpha: width * height * 12)
        (iPF1 with alpha: width * height * 20)
        (iPF2 no alpha: width * height * 16)
        (iPF2 with alpha: width * height * 24)
        *      UN-COMPRESSED PATCHDATA
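
    The size rule above can be sketched as follows. The four stored uint8 fields are in units of four pixels (one iPF block); the function names are illustrative.

```python
BYTES_PER_BLOCK = {
    ("iPF1", False): 12,
    ("iPF1", True): 20,
    ("iPF2", False): 16,
    ("iPF2", True): 24,
}


def patch_data_size(width_field, height_field, ipf="iPF1", alpha=False):
    """Size in bytes of the PATCHDATA that follows the four-byte PATCH head."""
    return width_field * height_field * BYTES_PER_BLOCK[(ipf, alpha)]


def patch_pixel_rect(x_field, y_field, width_field, height_field):
    """Convert the stored fields back to pixel units: (x, y, width, height)."""
    return x_field * 4, y_field * 4, width_field * 4, height_field * 4
```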


TYPE 16+ Packet -
    uint32 SIZE OF COMPRESSED FRAMEDATA BYTE-PLANE 1
    *      FRAMEDATA (COMPRESSED for TGA/GZ)

MP2 Packet & ADPCM Packet -
    uint16 TYPE OF PACKET // follows the Metadata Packet Type scheme
    *      MP2 FRAME/ADPCM BLOCK

Sync Packet (subset of GLOBAL TYPE 255 Packet) -
    uint16 0xFFFF (type of packet for Global Type 255)

Background Colour Packet -
    uint16 0xFEFF
    uint8  Red (0-255)
    uint8  Green (0-255)
    uint8  Blue (0-255)
    uint8  0x00 (pad byte)


Frame Timing
    If the global type is not 255, each packet is interpreted as a single full frame, and the player then waits for
    the next frame time. For type 255, however, this assumption no longer holds: a frame can span multiple packets,
    so an explicit "sync" packet is needed for proper frame timing.


Compression Method
    Old standard used Gzip, new standard is Zstd.
    tsvm reads the header of the compressed data and picks the appropriate decompression method, so that
    the old Gzipped files remain compatible.
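
    One way to tell the two apart is by their magic bytes: a Gzip stream starts with 1F 8B and a Zstd frame with 28 B5 2F FD. The sketch below is an assumption about how such detection could look, not a description of tsvm's actual code; Zstd decoding is left as a stub because it is not available in every Python standard library.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(payload):
    """Return "zstd", "gzip" or "unknown" based on the leading bytes."""
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


def decompress(payload):
    kind = detect_compression(payload)
    if kind == "gzip":
        return gzip.decompress(payload)
    if kind == "zstd":
        raise NotImplementedError("plug in a Zstd decoder of your choice here")
    raise ValueError("unrecognised compression header")
```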


NOTE FROM DEVELOPER
    In the future, the global packet type will be deprecated.

--------------------------------------------------------------------------------

TSVM Interchangeable Picture Format (aka iPF Type 1/2)

Image is divided into 4x4 blocks and each block is serialised, then the entire iPF blocks are gzipped


# File Structure
\x1F T S V M i P F
[HEADER]
[Blocks]

- Header
    uint16 WIDTH
    uint16 HEIGHT
    uint8 Flags
        0b p00z 000a
        - a: has alpha
        - z: gzipped (p flag always sets this flag)
        - p: progressive ordering (Adam7)
    uint8  iPF Type/Colour Mode
        0: Type 1 (4:2:0 chroma subsampling; 2048 colours?)
        1: Type 2 (4:2:2 chroma subsampling; 2048 colours?)
    byte[10] RESERVED
    uint32 UNCOMPRESSED SIZE (somewhat redundant but included for convenience)

- Chroma Subsampled Blocks
    Gzipped unless the z-flag is not set.
    4x4 pixels are sampled, then divided into YCoCg planes.
    CoCg planes are "chroma subsampled" by 4:2:0, then quantised to 4 bits (8 bits for CoCg combined)
    Y plane is quantised to 4 bits

    By doing so, CoCg planes will reduce to 4 pixels
    For the description of packing, pixels in Y/Cx plane will be numbered as:
        Y0 Y1 Y2 Y3 || Cx1   Cx2 | Cx1   Cx2
        Y4 Y5 Y6 Y7 ||  (iPF 1)  | Cx3   Cx4
        Y8 Y9 YA YB || Cx3   Cx4 | Cx5   Cx6
        YC YD YE YF ||  (iPF 1)  | Cx7   Cx8

    Bits are packed like so:

iPF1:
    uint16 [Co4 | Co3 | Co2 | Co1]
    uint16 [Cg4 | Cg3 | Cg2 | Cg1]
    uint16 [Y1 | Y0 | Y5 | Y4]
    uint16 [Y3 | Y2 | Y7 | Y6]
    uint16 [Y9 | Y8 | YD | YC]
    uint16 [YB | YA | YF | YE]
    (total: 12 bytes)

iPF2:
    uint32 [Co8 | Co7 | Co6 | Co5 | Co4 | Co3 | Co2 | Co1]
    uint32 [Cg8 | Cg7 | Cg6 | Cg5 | Cg4 | Cg3 | Cg2 | Cg1]
    uint16 [Y1 | Y0 | Y5 | Y4]
    uint16 [Y3 | Y2 | Y7 | Y6]
    uint16 [Y9 | Y8 | YD | YC]
    uint16 [YB | YA | YF | YE]
    (total: 16 bytes)

    If has alpha, append following bytes for alpha values
    uint16 [a1 | a0 | a5 | a4]
    uint16 [a3 | a2 | a7 | a6]
    uint16 [a9 | a8 | aD | aC]
    uint16 [aB | aA | aF | aE]
    (total: 20/24 bytes)

    Subsampling mask:

    Least significant byte for top-left, most significant for bottom-right
    For example, this default pattern

    00 00 01 01
    00 00 01 01
    10 10 11 11
    10 10 11 11

    turns into:

    01010000 -> 0x50
    01010000 -> 0x50
    11111010 -> 0xFA
    11111010 -> 0xFA

    which packs into: [ 50 | 50 | FA | FA ] (because little endian)

iPF1-delta (for video encoding):

Delta encoded frames contain "instructions" for patch-encoding the existing frame.
That is, each frame is a collection of [StateChangeCode] [Optional VarInts] [Payload...] records

States:
0x00 SKIP [varint skipCount]
0x01 PATCH [varint blockCount] [12x blockCount bytes]
0x02 REPEAT [varint repeatCount] [a block]
0xFF END

Sample stream:
    [SKIP 10] [PATCH A] [REPEAT 3] [SKIP 5] [PATCH B] [END]

Delta block format:

    Each PATCH delta payload is still:
        8 bytes of Luma (4-bit deltas for 16 pixels)
        2 bytes of Co deltas (4× 4-bit deltas)
        2 bytes of Cg deltas (4× 4-bit deltas)
    Total: 12 bytes per PATCH.

    These are always relative to the same-position block in the previous frame.
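
A sketch of walking such a delta stream. The varint encoding is not specified above, so unsigned LEB128 (7 bits per byte, high bit meaning "more bytes follow") is assumed here; the function names and the tuple layout are illustrative as well.

```python
def read_varint(buf, pos):
    """Unsigned LEB128 (assumed encoding). Returns (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_delta_stream(buf, block_size=12):
    """Split a delta frame into a list of (state, ...) tuples."""
    ops, pos = [], 0
    while True:
        code = buf[pos]
        pos += 1
        if code == 0xFF:                       # END
            ops.append(("END",))
            return ops
        count, pos = read_varint(buf, pos)
        if code == 0x00:                       # SKIP
            ops.append(("SKIP", count))
        elif code == 0x01:                     # PATCH: `count` blocks follow
            end = pos + count * block_size
            ops.append(("PATCH", count, bytes(buf[pos:end])))
            pos = end
        elif code == 0x02:                     # REPEAT: a single block follows
            end = pos + block_size
            ops.append(("REPEAT", count, bytes(buf[pos:end])))
            pos = end
        else:
            raise ValueError(f"unknown state code 0x{code:02X}")
```

A stream that ends without an END code raises `IndexError` in this sketch.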


- Progressive Blocks
    Ordered string of words (word size varies by the colour mode) are stored here.
    If progressive mode is enabled, words are stored in the order that accommodates it.

--------------------------------------------------------------------------------

TSVM Enhanced Video (TEV) Format
Created by CuriousTorvald and Claude on 2025-08-17

TEV is a modern video codec optimized for TSVM's 4096-color hardware, featuring
DCT-based compression, optional motion compensation, and efficient temporal coding.

## Version History
- Version 2.0: YCoCg-R 4:2:0 with 16x16/8x8 DCT blocks
- Version 2.1: Added Rate Control Factor to all video packets (breaking change)
  * Enables bitrate-constrained encoding alongside quality modes
  * All video frames now include 4-byte rate control factor after payload size
- Version 3.0: Additional support of ICtCp Colour space

# File Structure
\x1F T S V M T E V (if video), \x1F T S V M T E P (if still picture)
[HEADER]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...

## Header (24 bytes)
    uint8  Magic[8]: "\x1FTSVMTEV" or "\x1FTSVMTEP"
    uint8  Version: 2 (YCoCg-R) or 3 (ICtCp)
    uint16 Width: video width in pixels
    uint16 Height: video height in pixels
    uint8 FPS: frames per second
    uint32 Total Frames: number of video frames
    uint8  Quality Index for Y channel (0-99; 100 denotes that all quantisers are 1)
    uint8  Quality Index for Co channel (0-99; 100 denotes that all quantisers are 1)
    uint8  Quality Index for Cg channel (0-99; 100 denotes that all quantisers are 1)
    uint8  Extra Feature Flags
            - bit 0 = has audio
            - bit 1 = has subtitle
            - bit 2 = infinite loop (must be ignored when File Role is 1)
            - bit 7 = has no actual packets, this file is header-only without an Intro Movie
    uint8  Video Flags
            - bit 0 = is interlaced (should be default for most non-archival TEV videos)
            - bit 1 = is NTSC framerate (repeat every 1000th frame)
    uint8  File Role
            - 0 = generic
            - 1 = this file is header-only, and UCF payload will be followed (used by seekable movie file)
                   When a header-only file contains video packets, they should be presented as an Intro Movie
                   before the user-interactable selector (served by the UCF payload)

## Packet Types
    0x10: I-frame (intra-coded frame)
    0x11: P-frame (predicted frame)
    0x1F: prohibited
    0x20: MP2 audio packet
    0x30: Subtitle in "Simple" format
    0x31: Subtitle in "Karaoke" format
    0xE0: EXIF packet
    0xE1: ID3v1 packet
    0xE2: ID3v2 packet
    0xE3: Vorbis Comment packet
    0xE4: CD-text packet
    0xFF: sync packet

## Standard metadata payload packet structure
    uint8  0xE0/0xE1/0xE2/.../0xEF (see Packet Types section)
    uint32 Length of the payload
    *      Standard payload

note: metadata packets must precede any non-metadata packets

## Video Packet Structure
    uint8  Packet Type
    uint32 Compressed Size
    *      Zstd-compressed Block Data

## Block Data (per 16x16 block)
    uint8  Mode: encoding mode
           0x00 = SKIP (copy from previous frame)
           0x01 = INTRA (DCT-coded, no prediction)
           0x02 = INTER (DCT-coded with motion compensation) -- currently unused due to bugs
           0x03 = MOTION (motion vector only)
    int16  Motion Vector X ("capable of" 1/4 pixel precision, integer precision for now)
    int16  Motion Vector Y ("capable of" 1/4 pixel precision, integer precision for now)
    float32  Rate Control Factor (4 bytes, little-endian)
    uint16 Coded Block Pattern (which 8x8 have non-zero coeffs)
    int16[256]  DCT Coefficients Y
    int16[64]  DCT Coefficients Co (subsampled by two)
    int16[64]  DCT Coefficients Cg (subsampled by two, aggressively quantised)
        For SKIP and MOTION mode, DCT coefficients are filled with zero
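
Assuming the fields above are stored back to back in the listed order with no padding, and little-endian throughout, one decompressed block occupies 1 + 2 + 2 + 4 + 2 + (256 + 64 + 64) * 2 = 779 bytes. A sketch with illustrative names:

```python
import struct

BLOCK_HEAD = struct.Struct("<BhhfH")      # mode, mv_x, mv_y, rate factor, CBP
BLOCK_COEFFS = struct.Struct("<384h")     # 256 Y + 64 Co + 64 Cg coefficients
BLOCK_SIZE = BLOCK_HEAD.size + BLOCK_COEFFS.size

MODE_NAMES = {0x00: "SKIP", 0x01: "INTRA", 0x02: "INTER", 0x03: "MOTION"}


def parse_block(data, offset=0):
    """Parse one 16x16 block from decompressed block data (layout assumed)."""
    mode, mv_x, mv_y, rate_factor, cbp = BLOCK_HEAD.unpack_from(data, offset)
    coeffs = BLOCK_COEFFS.unpack_from(data, offset + BLOCK_HEAD.size)
    return {
        "mode": MODE_NAMES.get(mode, "UNKNOWN"),
        "motion_vector": (mv_x, mv_y),
        "rate_factor": rate_factor,
        "cbp": cbp,
        "y": coeffs[:256],
        "co": coeffs[256:320],
        "cg": coeffs[320:],
    }
```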

## DCT Quantisation and Rate Control
TEV uses 5 quality levels (0=lowest, 4=highest) with progressive quantisation
tables optimized for perceptual quality. DC coefficients are encoded losslessly,
while AC coefficients are quantised according to quality tables.

### Rate Control Factor
Each block includes a Rate Control Factor that modifies quality level for that specific block.
This makes coding more efficient by granting higher quality to complex blocks and lower quality to
flat blocks.

## Motion Compensation
- Search range: ±8 pixels
- Sub-pixel precision: 1/4 pixel (again, integer precision for now)
- Block size: 16x16 pixels
- Uses Sum of Absolute Differences (SAD) for motion estimation
- Bilinear interpolation for sub-pixel motion vectors

## Colour Space
TEV operates in 8-bit colour mode; colour space conversion is required

## Compression Features
- 16x16 DCT blocks (vs 4x4 in iPF)
- Temporal prediction with motion compensation
- Rate-distortion optimized mode selection
- Hardware-accelerated encoding/decoding functions

## Performance Comparison
TEV achieves 60-80% better compression than iPF formats while maintaining
equivalent visual quality, with significantly faster decode performance due
to larger block sizes and hardware acceleration.

## Audio Support
Reuses existing MP2 audio infrastructure from TSVM MOV format for seamless
compatibility with existing audio processing pipeline.

## NTSC Framerate handling
The encoder encodes the frames as-is. The decoder must duplicate every 1000th frame to keep the decoding
in-sync.
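
A sketch of the decoder-side duplication, assuming "every 1000th frame" means frames 1000, 2000, ... when counting from 1 (the function name is illustrative). With a nominal 30 fps display, 1000 coded frames then fill 1001 display slots, which gives the familiar 30 * 1000 / 1001 ≈ 29.97 coded frames per second.

```python
def ntsc_presentation_order(frames):
    """Yield frames for display, showing every 1000th frame twice."""
    for index, frame in enumerate(frames, start=1):
        yield frame
        if index % 1000 == 0:
            yield frame   # duplicate to absorb the 1000/1001 rate difference
```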

--------------------------------------------------------------------------------

Simple Subtitle Format (SSF)

SSF is a simple subtitle format that uses the text buffer to display text.
The format is designed to be compatible with SubRip and SAMI (without markups) and interoperable with
TEV and TAV formats.

SSF-TC is an SSF with extra timecode so that subtitle packets can be desynchronised with video frames
on encoding.

When SSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.

## Packet Structure
    uint8  0x30/0x31 (SSF/SSF-TC)
    uint32 Packet Size
    *      SSF Payload (see below)

## SSF Packet Structure
    uint24 Subtitle object ID (used to specify target subtitle object)
    uint64 Timecode in nanoseconds (only present on SSF-TC format; regular SSF must not write these bytes)
    uint8 opcode
          0x00 = <argument terminator>, is NOP when used here
          0x01 = show (arguments: UTF-8 text)
          0x02 = hide (arguments: none)
          0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)
          0x10..0x2F = show in alternative languages (arguments: char[5] language code, UTF-8 text)
          0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
          0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
            note: changing the font rom will change the appearance of every subtitle currently being displayed
    *    arguments separated AND terminated by 0x00
            text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
            leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.
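
A sketch of parsing one SSF payload, limited to the show (0x01) and hide (0x02) opcodes. Little-endian integers are assumed, and the function name and dictionary keys are illustrative. Trailing zero bytes after the text are simply ignored, as the note above requires.

```python
def parse_ssf_payload(payload, has_timecode=False):
    """Parse an SSF (or SSF-TC when has_timecode=True) payload."""
    object_id = int.from_bytes(payload[0:3], "little")   # uint24
    pos = 3
    timecode_ns = None
    if has_timecode:
        timecode_ns = int.from_bytes(payload[pos:pos + 8], "little")
        pos += 8
    opcode = payload[pos]
    pos += 1
    text = None
    if opcode == 0x01:                        # show: one UTF-8 text argument
        end = payload.find(b"\x00", pos)
        if end < 0:
            end = len(payload)                # tolerate a missing terminator
        text = payload[pos:end].decode("utf-8")
    return {
        "object_id": object_id,
        "timecode_ns": timecode_ns,
        "opcode": opcode,
        "text": text,
    }
```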

--------------------------------------------------------------------------------

Karaoke Subtitle Format (KSF)

KSF is a frame-synced subtitle format intended for Karaoke-style subtitles.
The format is designed to be interoperable with TEV and TAV formats.
For non-karaoke style synced lyrics, use SSF.

KSF-TC is a KSF with an extra timecode so that subtitle packets can be desynchronised from video frames
on encoding.

When KSF is interleaved with MP2 audio, the payload must be inserted in-between MP2 frames.

## Packet Structure
    uint8  0x32/0x33 (KSF/KSF-TC)
    *      KSF Payload (see below)

### KSF Packet Structure
    KSF is line-based: you define an unrevealed line, then subsequent commands reveal words/syllables
    on appropriate timings.

    uint24 Subtitle object ID (used to specify target subtitle object)
    uint64 Timecode in nanoseconds (only present on KSF-TC format; regular KSF must not write these bytes)
    uint8 opcode
          <definition opcodes>
          0x00 = <argument terminator>, is NOP when used here
          0x01 = define line (arguments: UTF-8 text. Players will also show it in grey)
          0x02 = delete line (arguments: none)
          0x03 = move to different nonant (arguments: 0x00-bottom centre; 0x01-bottom left; 0x02-centre left; 0x03-top left; 0x04-top centre; 0x05-top right; 0x06-centre right; 0x07-bottom right; 0x08-centre)

          <reveal opcodes>
          0x30 = reveal text normally (arguments: UTF-8 text. The reveal text must contain spaces when required)
          0x31 = reveal text slowly (arguments: UTF-8 text. The effect is implementation-dependent)

          0x40 = reveal text normally with emphasis (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
          0x41 = reveal text slowly with emphasis (arguments: UTF-8 text)

          0x50 = reveal text normally with target colour (arguments: uint8 target colour; UTF-8 text)
          0x51 = reveal text slowly with target colour (arguments: uint8 target colour; UTF-8 text)

          <hardware control opcodes>
          0x80 = upload to low font rom (arguments: uint16 payload length, var bytes)
          0x81 = upload to high font rom (arguments: uint16 payload length, var bytes)
            note: changing the font rom will change the appearance of every subtitle currently being displayed
    *    arguments separated AND terminated by 0x00
            text argument may be terminated by 0x00 BEFORE the entire arguments being terminated by 0x00,
            leaving extra 0x00 on the byte stream. A decoder must be able to handle the extra zeros.


--------------------------------------------------------------------------------

TSVM Advanced Video (TAV) Format
Created by CuriousTorvald and Claude on 2025-09-13

TAV is a next-generation video codec for TSVM utilising Discrete Wavelet Transform (DWT)
similar to JPEG2000, providing superior compression efficiency and scalability compared
to DCT-based codecs like TEV. Features include multi-resolution encoding, progressive
transmission capability, and region-of-interest coding.

# File Structure
\x1F T S V M T A V (if video), \x1F T S V M T A P (if still picture)
[HEADER]
[PACKET 0]
[PACKET 1]
[PACKET 2]
...

## Header (32 bytes)
    uint8  Magic[8]: "\x1FTSVMTAV" or "\x1FTSVMTAP"
    uint8  Version:
            Base version number:
            - 1 = YCoCg-R multi-tile uniform
            - 2 = ICtCp multi-tile uniform
            - 3 = YCoCg-R monoblock uniform
            - 4 = ICtCp monoblock uniform
            - 5 = YCoCg-R monoblock perceptual
            - 6 = ICtCp monoblock perceptual
            - 7 = YCoCg-R multi-tile perceptual
            - 8 = ICtCp multi-tile perceptual
            When motion coder is Haar, take base version number.
            When motion coder is CDF 5/3, add 8 to the base version number.
    uint16 Width: picture width in pixels. Columns count for Videotex-only file.
    uint16 Height: picture height in pixels. Rows count for Videotex-only file.
    uint8  FPS: frames per second. Use 0x00 for still pictures
    uint32 Total Frames: number of video frames
            - use 0 to denote not-finalised video stream
            - use 0xFFFFFFFF to denote still picture (.im3 file)
    uint8  Wavelet Filter Type:
            - 0 = 5/3 reversible (LGT 5/3, JPEG 2000 standard)
            - 1 = 9/7 irreversible (CDF 9/7, slight modification of JPEG 2000, default choice)
            - 2 = CDF 13/7 (experimental)
            - 16 = DD-4 (Four-point interpolating Deslauriers-Dubuc; experimental)
            - 255 = Haar (demonstration purpose only)
    uint8  Decomposition Levels: number of DWT levels (1-6+; use 0 if it has no video or Videotex only)
    uint8  Quantiser Index for Y channel (uses exponential numeric system; 0: lossless, 255: potato)
    uint8  Quantiser Index for Co channel (uses exponential numeric system; 0: lossless, 255: potato)
    uint8  Quantiser Index for Cg channel (uses exponential numeric system; 0: lossless, 255: potato)
    uint8  Extra Feature Flags
            - bit 0 = has audio (for still pictures: has background music)
            - bit 1 = has subtitle (for still pictures: has timed captions)
            - bit 2 = infinite loop (has no effect for still pictures)
            - bit 7 = has no actual packets, this file is header-only without an Intro Movie
    uint8  Video Flags
            - bit 0 = interlaced
            - bit 1 = is NTSC framerate
            - bit 2 = is lossless mode
                (shorthand for `-q 6 -Q0,0,0 -w 0 --intra-only --no-perceptual-tuning --arate 384`)
            - bit 3 = has region-of-interest coding (for still pictures only)
            - bit 4 = reserved (crop encoding?)
            - bit 7 = has no video
    uint8  Encoder quality level (stored with bias of 1 (q0=1); used to derive anisotropy value)
    uint8  Channel layout (bit-field: bit 0 = has alpha, bit 1 = chroma absent, bit 2 = luma absent; bits 1 and 2 are stored inverted)
            * Luma-only videos must be decoded with fixed Chroma=0
            * Chroma-only videos must be decoded with fixed Luma=127
            * No-alpha videos must be decoded with fixed Alpha=255
            - 0 = Y-Co-Cg/I-Ct-Cp (000: no alpha, has chroma, has luma)
            - 1 = Y-Co-Cg-A/I-Ct-Cp-A (001: has alpha, has chroma, has luma)
            - 2 = Y/I only (010: no alpha, no chroma, has luma)
            - 3 = Y-A/I-A (011: has alpha, no chroma, has luma)
            - 4 = Co-Cg/Ct-Cp (100: no alpha, has chroma, no luma)
            - 5 = Co-Cg-A/Ct-Cp-A (101: has alpha, has chroma, no luma)
            - 6-7 = Reserved/invalid (would indicate no luma and no chroma)
    uint8  Entropy Coder
            - 0 = Twobit-plane significance map (deprecated)
            - 1 = Embedded Zero Block Coding
            - 2 = Raw coefficients (debugging purpose only)
    uint8  Encoder Preset
            - Bit 0 = use finer motion (finer temporal quantisation)
            - Bit 1 = reduce grain synthesis
            Preset "Default" -> 0x00
            Preset "Sports" -> 0x01
            Preset "Anime" -> 0x02
            NOTE: not all presets have preset flags. See Preset section for details.
    uint8  Reserved[1]: fill with zeros
    uint8  Device Orientation
            - 0 = No rotation
            - 1 = Clockwise 90 deg
            - 2 = 180 deg
            - 3 = Clockwise 270 deg
            - 4 = Mirrored, No rotation
            - 5 = Mirrored, Clockwise 90 deg
            - 6 = Mirrored, 180 deg
            - 7 = Mirrored, Clockwise 270 deg
    uint8  File Role
            - 0 = generic
            - 1 = this file is header-only, and UCF payload will be followed (used by movie file with chapters)
                   When a header-only file contains video packets, they should be presented as an Intro Movie
                   before the user-interactable selector (served by the UCF payload)
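
The header layout above can be read with a single fixed-size unpack. The following is a minimal sketch, not the reference decoder: it assumes little-endian multi-byte fields (this section does not state the byte order), and the function name and dictionary keys are made up for illustration.

```python
import struct

# Assumption: little-endian fields; "<" also disables struct padding.
TAV_HEADER = struct.Struct("<8sBHHBI14B")  # 8+1+2+2+1+4+14 = 32 bytes

def parse_tav_header(buf: bytes) -> dict:
    """Decode the fixed 32-byte header (key names are illustrative)."""
    if len(buf) < TAV_HEADER.size:
        raise ValueError("header needs 32 bytes")
    (magic, version, width, height, fps, total_frames, wavelet, levels,
     q_y, q_co, q_cg, extra, video, quality, layout, entropy, preset,
     _reserved, orientation, role) = TAV_HEADER.unpack_from(buf)
    if magic not in (b"\x1fTSVMTAV", b"\x1fTSVMTAP"):
        raise ValueError("bad magic")
    if not 1 <= version <= 16:
        raise ValueError("unknown version")
    base = (version - 1) % 8 + 1           # strip the +8 motion-coder offset
    return {
        "still_picture": magic.endswith(b"TAP"),
        "base_version": base,
        "motion_coder": "CDF 5/3" if version > 8 else "Haar",
        "colour_space": "YCoCg-R" if base % 2 else "ICtCp",
        "monoblock": base in (3, 4, 5, 6),
        "perceptual": base >= 5,
        "width": width, "height": height, "fps": fps,
        "total_frames": total_frames,
        "wavelet": wavelet, "levels": levels,
        "quantiser_indices": (q_y, q_co, q_cg),
        "has_audio": bool(extra & 0x01),
        "has_subtitle": bool(extra & 0x02),
        "interlaced": bool(video & 0x01),
        "ntsc_framerate": bool(video & 0x02),
        "quality_level": quality - 1,       # stored with a bias of 1
        "has_alpha": bool(layout & 0x01),
        "has_chroma": not layout & 0x02,    # bit set means chroma is absent
        "has_luma": not layout & 0x04,      # bit set means luma is absent
        "entropy_coder": entropy, "preset": preset,
        "orientation": orientation, "file_role": role,
    }
```

The quantiser indices are returned raw; they still need to go through the Exponential Numeric System to obtain actual quantiser values.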

### Presets
The encoder supports the following presets:
- Sports: use finer temporal quantisation, resulting in better-preserved motion. Less effective as resolution goes up
- Anime: instructs the decoder to disable grain synthesis
- D1/D1PAL: encode to TAV-DT (NTSC/PAL) format. Any non-compliant setting is ignored and replaced with a compliant value
- D1P/D1PALP: encode to TAV-DT Progressive (NTSC/PAL) format. Any non-compliant setting is ignored and replaced with a compliant value


## Packet Structure (some special packets have no payload. See Packet Types for details)
    uint8  Packet Type
    uint32 Payload Size
    *      Payload

## Packet Types
    <video packets>
    0x10: I-frame (intra-coded frame)
    0x11: P-frame (delta/skip frame)
    0x12: GOP Unified (temporal 3D DWT with unified preprocessing)
    0x1F: (prohibited)
    <audio packets>
    0x20: MP2 audio packet (32 kHz)
    0x21: Zstd-compressed 8-bit PCM (32 kHz, audio hardware's native format)
    0x22: Zstd-compressed 16-bit PCM (32 kHz, little endian)
    0x23: Zstd-compressed ADPCM (32 kHz)
    0x24: TAD (TSVM Advanced Audio)
    <subtitles>
    0x30: Subtitle in "Simple" format
    0x31: Subtitle in "Simple" format with timecodes
    0x32: Subtitle in "Karaoke" format
    0x33: Subtitle in "Karaoke" format with timecodes
    0x3F: Videotex (full-frame text buffer memory image)
    <synchronised tracks>
    0x40: MP2 audio track (32 kHz)
    0x41: Zstd-compressed 8-bit PCM (32 kHz, audio hardware's native format)
    0x42: Zstd-compressed 16-bit PCM (32 kHz, little endian)
    0x43: Zstd-compressed ADPCM (32 kHz)
    0x44: TAD (TSVM Advanced Audio)
    <multiplexed video>
    0x70..7F: Reserved for Future Version
    <Standard metadata payloads>
    (it's called "standard" because you're expected to just copy-paste the metadata bytes verbatim)
    0xE0: EXIF packet
    0xE1: ID3v1 packet
    0xE2: ID3v2 packet
    0xE3: Vorbis Comment packet
    0xE4: CD-text packet
    <Extensible>
    0x01: Vendor-specific video packets
    0x02: Vendor-specific audio frame
    0x03: Vendor-specific subtitle
    0x04: Vendor-specific audio file
    0x0E: Vendor-specific metadata
    <Special packets>
    0x00: No-op (no payload)
    0xEF: TAV Extended Header
    0xF0: Loop point start (insert right AFTER the TC packet; no payload)
    0xF1: Loop point end (insert right AFTER the TC packet; no payload)
    0xF2: Screen masking info
    0xFC: GOP Sync packet (indicates N frames decoded from GOP block)
    0xFD: Timecode (TC) Packet [for frame 0, insert at the beginning; otherwise, insert right AFTER the sync]
    0xFE: NTSC sync packet (used by player to calculate exact framerate-wise performance; no payload)
    0xFF: Sync packet (no payload)

    ### Packet Precedence

        Before the first frame group:
        1. TAV Extended header (if any)
        2. Standard metadata payloads (if any)
        3. SSF-TC/KSF-TC packets (if any)
            When time-coded subtitles are used, all subtitle packets must precede the first video frame.
            Think of it as tacking the whole subtitle file before the actual video.
        4. Screen Masking packets (if any)

        Frame group:
        1. Timecode Packet (0xFD) or Next TAV File (0x1F) [mutually exclusive!]
        2. Loop point packet (if any)
        3. Audio packets (if any)
        4. Subtitle packets (if any) [mutually exclusive with SSF-TC/KSF-TC packets]
        5. Main video packets (0x10-0x1E)
        6. Multiplexed video packets (0x70-7F; if any)

        After a frame group:
        1. Sync packet (0xFC or 0xFF)
        2. NTSC Sync packet (if required; it will instruct players to duplicate the current frame)


## TAV Extended Header Specification and Structure
    uint8  Packet Type (0xEF)
    uint16 Number of Key-Value pairs
    *      Key-Value pairs

    ### Key-Value Pair
    uint8  Key[4]
    uint8  Value Type
           - 0x00: (U)Int16
           - 0x01: (U)Int24
           - 0x02: (U)Int32
           - 0x03: (U)Int48
           - 0x04: (U)Int64
           - 0x10: Bytes
    <if Value Type is Bytes>
    uint16 Length of bytes
    *      Bytes
    <otherwise>
    type_t Value

    ### List of Keys
    - Uint64 BGNT: Video begin time in nanoseconds (must be equal to the value of the first Timecode packet)
    - Uint64 ENDT: Video end time in nanoseconds (must be equal to BGNT + playback time)
    - Uint64 CDAT: Creation time in microseconds since UNIX Epoch (must be in UTC timezone)
    - Bytes  VNDR: Name and version of the encoder (for Reference encoder: "Encoder-TAV 20251014 (list,of,features)")
    - Bytes FMPG: FFmpeg version (typically "ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers"; the first line of text FFmpeg emits)

## Extensible Packet Structure
    uint8  Packet Type
    uint8  Flags
           - 0x01: 64-bit size
    uint8  Identifier[4]
    <if 64-bit size>
    uint64 Length of the payload
    <if not>
    uint32 Length of the payload
    *      Payload

## Standard Metadata Payload Packet Structure
    uint8  Packet Type (0xE0/0xE1/0xE2/.../0xEE; see Packet Types section)
    uint32 Length of the payload
    *      Standard payload

    Notes:
    - metadata packets must precede any non-metadata packets
    - when multiple metadata packets are present (e.g. ID3v2 and Vorbis Comment both present),
      which gets precedence is implementation-dependent. ONE EXCEPTION is ID3v1 and ID3v2 where ID3v2 gets
      precedence.

## Timecode Packet Structure
    uint8  Packet Type (0xFD)
    uint64 Time since stream start in nanoseconds (this may NOT start from zero if the video is coming from a livestream)

## Screen Masking Packet Structure
    When letterbox/pillarbox detection is active, the encoder only encodes the active area of the picture.
    Decoders must use these values to derive the size of the active area for decoding, and fill the masked area on playback.
    Encoders only need to insert this packet at the start of the video (if necessary) and whenever the geometry changes.

    uint8  Packet Type (0xF2)
    uint32 Starting frame number
    uint16 Mask size top in pixels
    uint16 Mask size right in pixels
    uint16 Mask size bottom in pixels
    uint16 Mask size left in pixels
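
As an illustration of deriving the active area, the sketch below parses one Screen Masking packet and subtracts the masks from the frame size. It assumes little-endian fields and that the header Width/Height describe the full (unmasked) frame; both are assumptions, and the function names are hypothetical.

```python
import struct

def parse_screen_mask(buf: bytes, pos: int = 0) -> dict:
    """Parse a 0xF2 packet: uint32 start frame, then top/right/bottom/left."""
    if buf[pos] != 0xF2:
        raise ValueError("not a Screen Masking packet")
    start, top, right, bottom, left = struct.unpack_from("<I4H", buf, pos + 1)
    return {"start_frame": start, "top": top, "right": right,
            "bottom": bottom, "left": left}

def active_area(width: int, height: int, mask: dict):
    """Return (x, y, w, h) of the region that actually carries picture data."""
    w = width - mask["left"] - mask["right"]
    h = height - mask["top"] - mask["bottom"]
    if w <= 0 or h <= 0:
        raise ValueError("mask leaves no active area")
    return mask["left"], mask["top"], w, h
```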

## Video Packet Structure
    uint8  Packet Type (0x10/0x11)
    uint32 Compressed Size
    *      Zstd-compressed Block Data

## TAD Packet Structure
    uint8  Packet Type (0x24)
    <header for decoding packet>
    uint16 Sample Count
    uint32 Compressed Size + 7
    <header for decoding TAD chunk>
    uint16 Sample Count
    uint8  Quantiser Bits
    uint32 Compressed Size
    *      Zstd-compressed TAD

## Videotex Packet Structure
    uint8  Packet Type (0x3F)
    uint32 Compressed Size
    *      Zstd-compressed payload, where:
        uint8  Rows
        uint8  Columns
        *      Foreground colours
        *      Background colours
        *      Characters


## GOP Unified Packet Structure (0x12)
Implemented on 2025-10-15 for temporal 3D DWT with unified preprocessing.

This packet contains multiple frames encoded as a single spacetime block for optimal
temporal compression.

    uint8  Packet Type (0x12/0x13)
    uint8  GOP Size (number of frames in this GOP)
    <if packet type is 0x13>
    uint32 Compressed Size
    *      Zstd-compressed Motion Data
    <endif>
    uint32 Compressed Size
    *      Zstd-compressed Unified Block Data

### Unified Block Data Format
The entire GOP (width×height×N_frames×3_channels) is preprocessed as a single block:

    <if significance maps are used>
    uint8  Y Significance Maps[(width*height + 7) / 8 * GOP Size]     // All Y frames concatenated
    uint8  Co Significance Maps[(width*height + 7) / 8 * GOP Size]    // All Co frames concatenated
    uint8  Cg Significance Maps[(width*height + 7) / 8 * GOP Size]    // All Cg frames concatenated
    int16  Y Non-zero Values[variable length]                          // All Y non-zero coefficients
    int16  Co Non-zero Values[variable length]                         // All Co non-zero coefficients
    int16  Cg Non-zero Values[variable length]                         // All Cg non-zero coefficients

    <if EZBC is used>
    uint32 EZBC Size for Y
    *      EZBC Structure for Y
    uint32 EZBC Size for Co
    *      EZBC Structure for Co
    uint32 EZBC Size for Cg
    *      EZBC Structure for Cg

This layout enables Zstd to find patterns across both spatial and temporal dimensions,
resulting in superior compression compared to per-frame encoding.

### Motion Vectors
- Stored in 1/4-pixel units (divide by 4.0 for pixel displacement)
- Computed using dense optical flow
- Cumulative relative to frame 0 (not frame-to-frame deltas)
- First frame (frame 0) always has motion vector (0, 0)

### Temporal 3D DWT Process
1. Detect scene changes on the first pass
2. Determine GOP slicing from the scene detection
3. Apply 1D DWT across temporal axis (GOP frames)
4. Apply 2D DWT on each spatial slice of temporal subbands
5. Perceptual quantisation with temporal-spatial awareness
6. Unified significance map preprocessing across all frames/channels
7. Single Zstd compression of entire GOP block

## GOP Sync Packet Structure (0xFC)
Indicates that N frames were decoded from a GOP Unified block.
Decoders must track this to maintain proper frame count and synchronization.

    uint8  Packet Type (0xFC)
    uint8  Frame Count (number of frames that were decoded from preceding GOP block)

Note: GOP Sync packets have no payload size field (fixed 2-byte packet).
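
Because several packet types deviate from the generic "type + uint32 size + payload" shape, a demuxer needs a small dispatch on the packet type. The sketch below covers only the layouts described above, assumes little-endian sizes, and deliberately refuses the Extended Header, Extensible and 0x13 packets; the names are hypothetical.

```python
import struct

NO_PAYLOAD = {0x00, 0xF0, 0xF1, 0xFE, 0xFF}      # type byte only
FIXED_BODY = {0xFC: 1, 0xFD: 8, 0xF2: 12}        # body bytes after the type byte
UNSUPPORTED = {0x01, 0x02, 0x03, 0x04, 0x0E, 0x13, 0xEF}

def iter_packets(buf: bytes, pos: int = 32):
    """Yield (packet_type, body_offset, body_length); pos=32 skips the header."""
    while pos < len(buf):
        ptype = buf[pos]
        pos += 1
        if ptype in UNSUPPORTED:
            raise NotImplementedError(f"packet 0x{ptype:02X} not handled here")
        if ptype in NO_PAYLOAD:
            size = 0
        elif ptype in FIXED_BODY:
            size = FIXED_BODY[ptype]
        elif ptype == 0x12:                      # uint8 GOP size, uint32 size, data
            (n,) = struct.unpack_from("<I", buf, pos + 1)
            size = 5 + n
        elif ptype == 0x24:                      # uint16 samples, uint32 (size + 7), chunk
            (n,) = struct.unpack_from("<I", buf, pos + 2)
            size = 6 + n
        else:                                    # generic: uint32 size, payload
            (n,) = struct.unpack_from("<I", buf, pos)
            size = 4 + n
        if pos + size > len(buf):
            raise ValueError("truncated packet")
        yield ptype, pos, size
        pos += size
```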

## Block Data (per frame)
    uint8  Mode: encoding mode
            0x00 = SKIP (just use frame data from previous frame)
            0x01 = INTRA (DWT-coded)
            0x02 = DELTA (DWT delta)
            - 0x02: DWT level 1
            - 0x12: DWT level 2
            - 0x22: DWT level 3
              ...
            - 0xF2: DWT Level 16
    uint8  Quantiser override Y  (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
    uint8  Quantiser override Co (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
    uint8  Quantiser override Cg (uses exponential numeric system; stored with index bias of 1 (127->252, 255->4032); use 0 to disable overriding)
        - note: quantiser overrides are always present regardless of the channel layout
    *       Tile data (one compressed payload per tile)

    ### Coefficient Storage Format (Significance Map Compression)

    Starting with encoder version 2025-09-29, DWT coefficients are stored using
    significance map compression with concatenated maps layout for optimal efficiency:

    #### Concatenated Maps Format
    All channels are processed together to maximize Zstd compression:

        uint8  Y Significance Map[(coeff_count + 7) / 8]    // 1 bit per Y coefficient
        uint8  Co Significance Map[(coeff_count + 7) / 8]   // 1 bit per Co coefficient
        uint8  Cg Significance Map[(coeff_count + 7) / 8]   // 1 bit per Cg coefficient
        uint8  A Significance Map[(coeff_count + 7) / 8]    // 1 bit per A coefficient (if alpha present)
        int16  Y Non-zero Values[variable length]           // Only non-zero Y coefficients
        int16  Co Non-zero Values[variable length]          // Only non-zero Co coefficients
        int16  Cg Non-zero Values[variable length]          // Only non-zero Cg coefficients
        int16  A Non-zero Values[variable length]           // Only non-zero A coefficients (if alpha present)

    #### Significance Map Encoding
    Each significance map uses 1 bit per coefficient position:
        - Bit = 1: coefficient is non-zero, read value from corresponding Non-zero Values array
        - Bit = 0: coefficient is zero
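
    A round trip through one significance map can be sketched as follows. The bit order within each byte is not specified here; this sketch assumes LSB-first, and the function names are hypothetical.

```python
def encode_sigmap(coeffs):
    """Split coefficients into a 1-bit-per-position map and the non-zero values."""
    sig = bytearray((len(coeffs) + 7) // 8)
    values = []
    for i, c in enumerate(coeffs):
        if c != 0:
            sig[i >> 3] |= 1 << (i & 7)   # assumption: bit 0 is the first coefficient
            values.append(c)
    return bytes(sig), values

def decode_sigmap(sig, values, count):
    """Rebuild the dense coefficient list from the map and the value array."""
    it = iter(values)
    return [next(it) if (sig[i >> 3] >> (i & 7)) & 1 else 0
            for i in range(count)]
```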

    #### Compression Benefits
    - **Sparsity exploitation**: Typically 85-95% zeros in quantised DWT coefficients
    - **Cross-channel patterns**: Concatenated maps allow Zstd to find patterns across similar significance maps
    - **Overall improvement**: 16-18% compression improvement before Zstd compression

## DWT Implementation Details

### Wavelet Filters
- 5/3 Reversible Filter (lossless capable):
  * Analysis: Low-pass [-1/8, 1/4, 3/4, 1/4, -1/8], High-pass [-1/2, 1, -1/2]
  * Synthesis: Low-pass [1/2, 1, 1/2], High-pass [-1/8, -1/4, 3/4, -1/4, -1/8]

- 9/7 Irreversible Filter (higher compression):
  * Analysis: CDF 9/7 coefficients optimized for image compression
  * Provides better energy compaction than 5/3 but lossy reconstruction

### Quantisation Strategy

#### Uniform Quantisation (Versions 3-4)
Traditional approach using same quantisation factor for all DWT subbands within each channel.

#### Perceptual Quantisation (Versions 5-6, Default)
TAV versions 5 and 6 implement Human Visual System (HVS) optimized quantisation with
frequency-aware subband weighting for superior visual quality:

Anisotropic quantisation is applied to both Luma and Chroma channels to preserve horizontal details.
It serves as an upgrade over traditional field interlacing and chroma subsampling.

This perceptual approach allocates more bits to visually important low-frequency
details while aggressively quantising high-frequency noise, resulting in superior
visual quality at equivalent bitrates.

#### Grain Synthesis

The decoder must synthesise film grain on non-LL subbands at an amplitude of half the quantisation level.
The encoder may synthesise the exact same grain with reversed sign when encoding (not recommended for practical reasons).

The base noise function must be triangular noise in range [-1.0, 1.0].
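
One way to obtain triangular noise is to sum two uniform draws. The sketch below reads "half of the quantisation level" as 0.5 × the quantiser step and uses Python's `random.Random` purely as a stand-in; the random number generator a compliant decoder must use is not specified in this section, and the function names are hypothetical.

```python
import random

def triangular_noise(rng: random.Random) -> float:
    """Sum of two uniform draws shifted to zero mean: triangular on [-1, 1)."""
    return rng.random() + rng.random() - 1.0

def add_grain(coeffs, quant_step: float, rng: random.Random):
    """Add grain to the coefficients of one non-LL subband."""
    amplitude = 0.5 * quant_step          # assumption: half the quantiser step
    return [c + amplitude * triangular_noise(rng) for c in coeffs]
```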

## Colour Space
TAV supports two colour spaces:

**YCoCg-R (Versions 3, 5):**
- Y: Luma channel (full resolution)
- Co: Orange-Cyan chroma (full resolution)
- Cg: Green-Magenta chroma (full resolution)

**ICtCp (Versions 4, 6):**
- I: Intensity (similar to luma)
- Ct: Chroma tritanopia
- Cp: Chroma protanopia

Perceptual versions (5-6) apply HVS-optimized quantisation weights per channel,
while uniform versions (3-4) use consistent quantisation across all subbands.

The encoder expects linear alpha.

## Compression Features
- Single DWT tiles vs 16x16 DCT blocks in TEV
- Multi-resolution representation enables scalable decoding
- Better frequency localisation than DCT
- Reduced blocking artifacts due to overlapping basis functions

## Hardware Acceleration Functions
TAV decoder requires new GraphicsJSR223Delegate functions:
- tavDecode(): Main DWT decoding function
- tavDWT2D(): 2D DWT/IDWT transforms
- tavQuantise(): Multi-band quantisation

## Audio Support
MP2 frames, raw PCMu8, and TAD formats are supported.

## Subtitle Support
Uses same Simple Subtitle Format (SSF) as TEV for text overlay functionality.

## NTSC Framerate handling
Unlike the TEV format, the TAV encoder emits an extra sync packet for every 1000th frame. A decoder can simply play the video without any special treatment.

## Exponential Numeric System
This system maps [0..255] to [1..4096]

Number|Index
------+-----
1|0
2|1
3|2
4|3
5|4
6|5
7|6
8|7
9|8
10|9
11|10
12|11
13|12
14|13
15|14
16|15
17|16
18|17
19|18
20|19
21|20
22|21
23|22
24|23
25|24
26|25
27|26
28|27
29|28
30|29
31|30
32|31
33|32
34|33
35|34
36|35
37|36
38|37
39|38
40|39
41|40
42|41
43|42
44|43
45|44
46|45
47|46
48|47
49|48
50|49
51|50
52|51
53|52
54|53
55|54
56|55
57|56
58|57
59|58
60|59
61|60
62|61
63|62
64|63
66|64
68|65
70|66
72|67
74|68
76|69
78|70
80|71
82|72
84|73
86|74
88|75
90|76
92|77
94|78
96|79
98|80
100|81
102|82
104|83
106|84
108|85
110|86
112|87
114|88
116|89
118|90
120|91
122|92
124|93
126|94
128|95
132|96
136|97
140|98
144|99
148|100
152|101
156|102
160|103
164|104
168|105
172|106
176|107
180|108
184|109
188|110
192|111
196|112
200|113
204|114
208|115
212|116
216|117
220|118
224|119
228|120
232|121
236|122
240|123
244|124
248|125
252|126
256|127
264|128
272|129
280|130
288|131
296|132
304|133
312|134
320|135
328|136
336|137
344|138
352|139
360|140
368|141
376|142
384|143
392|144
400|145
408|146
416|147
424|148
432|149
440|150
448|151
456|152
464|153
472|154
480|155
488|156
496|157
504|158
512|159
528|160
544|161
560|162
576|163
592|164
608|165
624|166
640|167
656|168
672|169
688|170
704|171
720|172
736|173
752|174
768|175
784|176
800|177
816|178
832|179
848|180
864|181
880|182
896|183
912|184
928|185
944|186
960|187
976|188
992|189
1008|190
1024|191
1056|192
1088|193
1120|194
1152|195
1184|196
1216|197
1248|198
1280|199
1312|200
1344|201
1376|202
1408|203
1440|204
1472|205
1504|206
1536|207
1568|208
1600|209
1632|210
1664|211
1696|212
1728|213
1760|214
1792|215
1824|216
1856|217
1888|218
1920|219
1952|220
1984|221
2016|222
2048|223
2112|224
2176|225
2240|226
2304|227
2368|228
2432|229
2496|230
2560|231
2624|232
2688|233
2752|234
2816|235
2880|236
2944|237
3008|238
3072|239
3136|240
3200|241
3264|242
3328|243
3392|244
3456|245
3520|246
3584|247
3648|248
3712|249
3776|250
3840|251
3904|252
3968|253
4032|254
4096|255
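
The table follows a regular pattern: indices 0..63 step by 1, and each following group of 32 indices doubles the step (2, 4, 8, 16, 32, 64). The closed form below is derived from the table rather than taken from the encoder source, so treat it as a convenience sketch; the function names are hypothetical.

```python
def exp_value(index: int) -> int:
    """Map an index in [0..255] to its number in [1..4096], as in the table."""
    if not 0 <= index <= 255:
        raise ValueError("index out of range")
    if index < 64:
        return index + 1                      # 1..64 in steps of 1
    seg = (index - 64) // 32                  # 0..5, one segment per 32 indices
    base = 64 << seg                          # value at the end of the previous segment
    step = 2 << seg                           # 2, 4, 8, 16, 32, 64
    return base + step * (index - 63 - 32 * seg)

def quantiser_override(stored: int):
    """Per-frame override byte: 0 disables, otherwise index is stored with a bias of 1."""
    return None if stored == 0 else exp_value(stored - 1)
```

For example, `quantiser_override(127)` gives 252 and `quantiser_override(255)` gives 4032, matching the Block Data notes.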

--------------------------------------------------------------------------------

TSVM Advanced Video - Digital Tape (TAV-DT) Format
Created by CuriousTorvald on 2025-12-01

TAV-DT is an extension to the TAV format intended as a filesystem-independent packetised video stream
with easy syncing: playback can start from an arbitrary position, and the decoder can easily sync up to the
start of the next packet.

# Video Format
    - Dimension: 720x480 for NTSC, 720x576 for PAL
    - FPS: arbitrary (defined in packet header)
    - Wavelet: 9/7 Spatial, 5/3 Temporal
    - Decomposition levels: 4 spatial, 2 temporal
    - Quantiser and encoder quality level: arbitrary (defined in packet header as quality index)
    - Extra features:
        - Audio is mandatory (TAD codec only)
        - Everything else is unsupported
    - Video flags: Interlaced/NTSC framerate (defined in packet header)
    - Channel layout: Y-Co-Cg
    - Entropy coder: EZBC
    - Encoder preset: default preset only
    - Tiles: monoblock

# Packet Structure
    uint32 Sync pattern (0xE3537A1F for NTSC Dimension, 0xD193A745 for PAL Dimension)
    uint8  Framerate
    uint8  Flags
        - bit 0 = interlaced
        - bit 1 = is NTSC framerate
        - bit 4-7 = quality index (0-5)
            * Quality indices follow those of the TSVM encoder
    int16  Reserved (zero-fill)
    uint32 Total packet size past header
    uint32 CRC-32 of 12-byte header
    uint64 Timecode (0xFD packet) without header byte
    *      TAD packet (full 0x24 packet)
    *      TAV packet (full 0x10 or 0x12 packet)

# How to sync to the stream
    1. Find a sync pattern
    2. Read remaining 8 bytes -> concatenate sync with what has been read
    3. Calculate CRC-32 of concatenated 12 bytes
    4. Read 4 bytes (stored CRC)
    5. Check calculated CRC against stored CRC
    6. If they match, sync to the stream; if not, search for the next sync pattern
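
The resync procedure can be sketched as a scan over a byte buffer. This sketch makes two assumptions not stated here: that the sync word and the stored CRC are little-endian, and that the CRC-32 is the common zlib/IEEE 802.3 variant. The function name is hypothetical.

```python
import zlib

SYNC_WORDS = (0xE3537A1F, 0xD193A745)   # NTSC dimension, PAL dimension

def find_packet(buf: bytes, start: int = 0, byteorder: str = "little"):
    """Return the offset of the first header whose CRC checks out, or None."""
    patterns = [w.to_bytes(4, byteorder) for w in SYNC_WORDS]
    pos = start
    while True:
        hits = [i for i in (buf.find(p, pos) for p in patterns) if i >= 0]
        if not hits:
            return None
        pos = min(hits)
        if pos + 16 > len(buf):
            return None                  # header + CRC not fully buffered yet
        stored = int.from_bytes(buf[pos + 12:pos + 16], byteorder)
        if zlib.crc32(buf[pos:pos + 12]) == stored:
            return pos                   # 12-byte header verified
        pos += 1                         # false positive; keep searching
```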


--------------------------------------------------------------------------------

TSVM Advanced Audio (TAD) Format
Created by CuriousTorvald and Claude on 2025-10-23
Updated: 2025-10-30 (fixed non-power-of-2 sample count support)

TAD is a perceptual audio codec for TSVM utilising Discrete Wavelet Transform (DWT)
with CDF 9/7 biorthogonal wavelets, providing efficient compression through M/S stereo
decorrelation, frequency-dependent quantisation, and raw int8 coefficient storage.
Designed as an includable API for integration with TAV video encoder.

When used inside a video codec, only the Zstd-compressed payload is stored; the chunk length
is stored separately and the quality index is shared with that of the video.

# Suggested File Structure
\x1F T S V M T A D
[HEADER]
[CHUNK 0]
[CHUNK 1]
[CHUNK 2]
...

## Header (16 bytes)
    uint8  Magic[8]: "\x1FTSVMTAD"
    uint8  Version: 1
    uint8  Quality Level: 0-5 (0=lowest quality/smallest, 5=highest quality/largest)
    uint8  Flags:
            - bit 0: Zstd compression enabled (1=compressed, 0=uncompressed)
            - bits 1-7: Reserved (must be 0)
    uint32 Sample Rate: audio sample rate in Hz (always 32000 for TSVM)
    uint8  Channels: number of audio channels (always 2 for stereo)
    uint8  Reserved[2]: fill with zeros

## Audio Properties
- **Sample Rate**: 32000 Hz (TSVM audio hardware native format)
- **Channels**: 2 (stereo)
- **Input Format**: PCM32fLE (32-bit float little-endian PCM)
- **Preprocessing**: 16 Hz highpass filter applied during extraction
- **Internal Representation**: Float32 throughout encoding, PCM8 conversion only at decoder
- **Chunk Size**: Variable (1024-32768+ samples per channel, any size ≥1024 supported)
  - Default: 32768 samples (1.024 seconds at 32 kHz) for standalone files
  - TAV integration: Uses exact GOP sample count (e.g., 32016 for 1 second at 32 kHz)
  - Minimum: 1024 samples (32 ms at 32 kHz)
  - DWT levels: Fixed at 9 levels for all chunk sizes
- **Target Compression**: 2:1 against PCMu8 baseline
- **Wavelet**: CDF 9/7 biorthogonal

## Chunk Structure
Each chunk encodes a variable number of stereo samples (minimum 1024, any size supported).
Default is 32768 samples (65536 total samples, 1.024 seconds) for standalone files.
TAV integration uses exact GOP sample counts (e.g., 32016 samples for 1 second at 32 kHz).

    uint16 Sample Count: number of samples per channel (min 1024, any size ≥1024)
    uint8  Max quantisation index: this number * 2 + 1 is the total steps of quantisation
    uint32 Chunk Payload Size: size of following payload in bytes
    *      Chunk Payload: encoded M/S stereo data (Zstd compressed if flag set)
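
The 7-byte chunk header can be read as in the sketch below, which assumes little-endian fields; the function name and dictionary keys are hypothetical.

```python
import struct

def parse_tad_chunk_header(buf: bytes, pos: int = 0) -> dict:
    """Read uint16 sample count, uint8 max index, uint32 payload size."""
    sample_count, max_index, payload_size = struct.unpack_from("<HBI", buf, pos)
    return {
        "sample_count": sample_count,
        "max_index": max_index,
        "quant_steps": 2 * max_index + 1,   # total quantisation steps
        "payload_offset": pos + 7,
        "payload_size": payload_size,
    }
```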

### Chunk Payload Structure (before Zstd compression)
    *      Mid Channel EZBC Data (embedded zero block coded bitstream)
    *      Side Channel EZBC Data (embedded zero block coded bitstream)

Each EZBC channel structure:
    uint8  MSB Bitplane: highest bitplane with significant coefficient
    uint16 Coefficient Count: number of coefficients in this channel
    *      Binary Tree EZBC Bitstream: significance map + refinement bits

## Encoding Pipeline

### Step 1: Pre-emphasis Filter
Input stereo PCM32fLE undergoes first-order FIR pre-emphasis filtering (α=0.5):

    H(z) = 1 - α·z⁻¹

This shifts quantisation noise toward lower frequencies where it's more maskable by
the psychoacoustic model. The filter has persistent state across chunks to prevent
discontinuities at chunk boundaries.
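
In the time domain the filter is y[n] = x[n] - α·x[n-1], and its inverse is x[n] = y[n] + α·x[n-1]. The sketch below keeps the previous sample as state so that chunked processing matches whole-stream processing. It handles one channel per instance (use one instance per stereo channel), and the class names are hypothetical.

```python
class PreEmphasis:
    """y[n] = x[n] - alpha * x[n-1], with state carried across chunks."""
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.prev = 0.0                 # last input sample of the previous chunk

    def process(self, chunk):
        out = []
        for x in chunk:
            out.append(x - self.alpha * self.prev)
            self.prev = x
        return out

class DeEmphasis:
    """x[n] = y[n] + alpha * x[n-1], the inverse of PreEmphasis."""
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.prev = 0.0                 # last output sample of the previous chunk

    def process(self, chunk):
        out = []
        for y in chunk:
            self.prev = y + self.alpha * self.prev
            out.append(self.prev)
        return out
```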

### Step 2: Dynamic Range Compression (Gamma Compression)
Pre-emphasised audio undergoes gamma compression for perceptual uniformity:

    encode(x) = sign(x) * |x|^γ  where γ=0.5

This compresses dynamic range before quantisation, improving perceptual quality.
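
As a small illustration, the compression and its inverse for a single sample (function names are hypothetical):

```python
import math

def gamma_encode(x: float, gamma: float = 0.5) -> float:
    """sign(x) * |x|^gamma"""
    return math.copysign(abs(x) ** gamma, x)

def gamma_decode(y: float, gamma: float = 0.5) -> float:
    """sign(y) * |y|^(1/gamma), the inverse of gamma_encode"""
    return math.copysign(abs(y) ** (1.0 / gamma), y)
```

With γ=0.5, a sample of 0.25 is stored as 0.5 and a sample of -0.25 as -0.5, so quiet samples occupy a larger share of the quantiser range.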

### Step 3: M/S Stereo Decorrelation
Mid-Side transformation exploits stereo correlation:

    Mid = (Left + Right) / 2
    Side = (Left - Right) / 2

This typically concentrates energy in the Mid channel while the Side channel
contains mostly small values, improving compression efficiency.
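
The transform and its inverse, per sample pair (function names are hypothetical):

```python
def ms_encode(left: float, right: float):
    """Return (mid, side)."""
    return (left + right) / 2, (left - right) / 2

def ms_decode(mid: float, side: float):
    """Return (left, right); inverse of ms_encode."""
    return mid + side, mid - side
```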

### Step 4: 9-Level CDF 9/7 DWT
Each channel (Mid and Side) undergoes CDF 9/7 biorthogonal wavelet decomposition. The codec uses a fixed 9 decomposition levels for all chunk sizes:

    DWT Levels = 9 (fixed)

For 32768-sample chunks:
    - After 9 levels: 64 LL coefficients
    - Frequency subbands: LL + 9 H bands (L9 to L1)

For 32016-sample chunks (TAV 1-second GOP):
    - After 9 levels: 63 LL coefficients
    - Supports non-power-of-2 sizes through proper length tracking (fixed 2025-10-30)

Sideband boundaries are calculated dynamically:
    first_band_size = chunk_size >> dwt_levels
    sideband[0] = 0
    sideband[1] = first_band_size
    sideband[i+1] = sideband[i] + (first_band_size << (i-1))
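
The recurrence above, written out as a function (the name is hypothetical):

```python
def sideband_boundaries(chunk_size: int, dwt_levels: int = 9):
    """Return the list of subband start offsets, ending with the total span."""
    first = chunk_size >> dwt_levels
    bounds = [0, first]                       # sideband[0], sideband[1]
    for i in range(1, dwt_levels + 1):
        bounds.append(bounds[i] + (first << (i - 1)))
    return bounds
```

For a 32768-sample chunk this yields 0, 64, 128, 256, ..., 32768: eleven boundaries delimiting the LL band and the nine H bands.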

CDF 9/7 lifting coefficients:
    α = -1.586134342
    β = -0.052980118
    γ = 0.882911076
    δ = 0.443506852
    K = 1.230174105
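
A single decomposition level using these coefficients can be sketched with the usual four lifting steps. This is not taken from the encoder source: the symmetric boundary handling and the scaling convention (low band divided by K, high band multiplied by K) are assumptions, and the function names are hypothetical. The input needs at least two samples.

```python
ALPHA, BETA, GAMMA, DELTA, K = (-1.586134342, -0.052980118,
                                0.882911076, 0.443506852, 1.230174105)

def _predict(s, d, c):
    """d[i] += c * (s[i] + s[i+1]), mirroring at the right edge."""
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += c * (s[i] + right)

def _update(s, d, c):
    """s[i] += c * (d[i-1] + d[i]), mirroring at both edges."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        right = d[i] if i < len(d) else d[-1]
        s[i] += c * (left + right)

def forward_97(x):
    """One level: returns [low band | high band], low has (n + 1) // 2 entries."""
    s, d = list(x[0::2]), list(x[1::2])
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v / K for v in s] + [v * K for v in d]

def inverse_97(y):
    """Undo forward_97 by running the lifting steps backwards."""
    half = (len(y) + 1) // 2
    s = [v * K for v in y[:half]]
    d = [v / K for v in y[half:]]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    out = [0.0] * len(y)
    out[0::2] = s
    out[1::2] = d
    return out
```

Because each lifting step is undone exactly by its negated counterpart, the round trip works for odd lengths as well as even ones.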

### Step 5: Frequency-Dependent Quantisation with Lambda Companding
DWT coefficients are quantized using:
1. **Lambda companding**: Maps normalised coefficients through Laplacian CDF with λ=6.0
2. **Perceptually-tuned weights**: Channel-specific (Mid/Side) frequency-dependent scaling
3. **Final quantisation**: base_weight[channel][subband] * quality_scale

The lambda companding provides perceptually uniform quantisation, allocating more bits
to perceptually important coefficient magnitudes.

Channel-specific base quantisation weights:
    Mid (0):  [4.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.0, 1.3, 2.0]
    Side (1): [6.0, 5.0, 2.6, 2.4, 1.8, 1.3, 1.0, 1.0, 1.6, 3.2]

Output: Quantized int8 coefficients in range [-max_index, +max_index]

### Step 6: EZBC Encoding (Embedded Zero Block Coding)
Quantized int8 coefficients are compressed using binary tree EZBC, a 1D variant of
the embedded zero-block coding.

**EZBC Algorithm**:
1. Find MSB bitplane (highest bit position with significant coefficient)
2. Initialise root block covering all coefficients as insignificant
3. For each bitplane from MSB to LSB:
   - **Insignificant Pass**: Test each insignificant block for significance
     - If still zero at this bitplane: emit 0 bit, keep in insignificant queue
     - If becomes significant: emit 1 bit, recursively subdivide using binary tree
   - **Refinement Pass**: For already-significant coefficients, emit next bit
4. Binary tree subdivision continues until blocks of size 1 (single coefficients)
5. When coefficient becomes significant: emit sign bit and reconstruct value

**EZBC Output Structure** (per channel):
    uint8  MSB Bitplane (8 bits)
    uint16 Coefficient Count (16 bits)
    *      Bitstream: [significance_bits][sign_bits][refinement_bits]

**Compression Benefits**:
- Exploits coefficient sparsity through significance testing
- Progressive refinement enables quality scalability
- Binary tree exploits spatial clustering of significant coefficients
- Typical sparsity: 86.9% zeros (Mid), 97.8% zeros (Side)

### Step 7: Concatenation and Zstd Compression
The Mid and Side EZBC bitstreams are concatenated:
    Payload = [Mid_EZBC_data][Side_EZBC_data]

Then compressed using Zstd level 7 for additional compression without significant
CPU overhead. Zstd exploits redundancy in the concatenated bitstreams.

## Decoding Pipeline

### Step 1: Chunk Extraction and Decompression
Read chunk header (sample_count, max_index, payload_size).
If compressed (default), decompress payload using Zstd.

### Step 2: EZBC Decoding
Decode Mid and Side channels from concatenated EZBC bitstreams using binary tree
embedded zero block decoder:

For each channel:
1. Read EZBC header: MSB bitplane (8 bits), coefficient count (16 bits)
2. Initialise root block as insignificant, track coefficient states
3. Process bitplanes from MSB to LSB:
   - **Insignificant Pass**: Read significance bits, recursively decode significant blocks
   - **Refinement Pass**: Read refinement bits for already-significant coefficients
4. Reconstruct quantized int8 coefficients from bitplane representation

Output: Quantized int8 coefficients for Mid and Side channels

### Step 3: Dequantisation with Lambda Decompanding
Convert quantized int8 values back to float coefficients using:
    1. Lambda decompanding (inverse of Laplacian CDF compression)
    2. Multiply by frequency-dependent quantisation steps
    3. [Optional] Apply coefficient-domain dithering (TPDF, ~-60 dBFS)

### Step 4: 9-Level Inverse CDF 9/7 DWT
Reconstruct Float32 audio from DWT coefficients using inverse CDF 9/7 transform.

**Critical Implementation (Fixed 2025-10-30)**:
The multi-level inverse DWT must use the EXACT sequence of lengths from forward
transform, in reverse order. Using simple doubling (length *= 2) is INCORRECT
for non-power-of-2 sizes.

Correct approach:
    1. Pre-calculate all forward transform lengths:
       lengths[0] = chunk_size
       lengths[i] = (lengths[i-1] + 1) / 2  for i=1..9
    2. Apply inverse DWT in reverse order:
       for level from 8 down to 0:
           apply inverse_dwt(data, lengths[level])

This ensures correct reconstruction for all chunk sizes including non-power-of-2
values (e.g., 32016 samples for TAV 1-second GOPs).
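
The length sequence can be precomputed as in the sketch below (the function name is hypothetical). It also shows why doubling fails: starting from the final length and doubling nine times does not return to the original chunk size.

```python
def dwt_lengths(chunk_size: int, levels: int = 9):
    """lengths[0] = chunk_size, lengths[i] = (lengths[i-1] + 1) // 2."""
    lengths = [chunk_size]
    for _ in range(levels):
        lengths.append((lengths[-1] + 1) // 2)
    return lengths

lengths = dwt_lengths(32016)
# [32016, 16008, 8004, 4002, 2001, 1001, 501, 251, 126, 63]
inverse_order = lengths[8::-1]      # lengths[8] down to lengths[0]
doubled = lengths[-1] << 9          # 63 * 512 = 32256, not 32016
```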

### Step 5: M/S to L/R Conversion
Convert Mid/Side back to Left/Right stereo:

    Left = Mid + Side
    Right = Mid - Side

### Step 6: Gamma Expansion
Expand dynamic range (inverse of encoder's gamma compression):

    decode(y) = sign(y) * |y|^(1/γ)  where γ=0.5, so 1/γ=2.0
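
Steps 5 and 6 are both per-sample operations and can be sketched together. The helper names below are hypothetical; only the two formulas come from this document.

```python
import math


def ms_to_lr(mid, side):
    """Step 5: Left = Mid + Side, Right = Mid - Side, sample by sample."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def gamma_expand(y, gamma=0.5):
    """Step 6: sign(y) * |y|^(1/gamma); with gamma=0.5 this squares the magnitude."""
    return math.copysign(abs(y) ** (1.0 / gamma), y)


left, right = ms_to_lr([0.5, 0.25], [0.25, -0.25])
expanded = [gamma_expand(v) for v in (-0.5, 0.0, 0.5)]
```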

### Step 7: De-emphasis Filter
Apply de-emphasis filter to reverse the pre-emphasis (α=0.5):

    H(z) = 1 / (1 - α·z⁻¹)

This is a first-order IIR filter with persistent state across chunks to prevent
discontinuities at chunk boundaries. The de-emphasis must be applied AFTER gamma
expansion but BEFORE PCM8 conversion to correctly reconstruct the original audio.
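
The transfer function H(z) = 1 / (1 - α·z⁻¹) corresponds to the difference equation y[n] = x[n] + α·y[n-1]. A minimal sketch of a filter that keeps its state between chunks follows. The class name is hypothetical, and using one instance per channel is an assumption.

```python
class DeEmphasis:
    """First-order IIR de-emphasis: y[n] = x[n] + alpha * y[n-1]."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.prev = 0.0  # y[n-1], deliberately kept across process() calls

    def process(self, chunk):
        out = []
        for x in chunk:
            self.prev = x + self.alpha * self.prev
            out.append(self.prev)
        return out


whole = DeEmphasis().process([1.0, 0.0, 0.0, 0.0])

chunked_filter = DeEmphasis()
chunked = chunked_filter.process([1.0, 0.0]) + chunked_filter.process([0.0, 0.0])
```

Because the state survives between calls, filtering two consecutive chunks gives the same samples as filtering them as one block, which is what prevents a discontinuity at the chunk boundary.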

### Step 8: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
Convert Float32 samples to unsigned PCM8 (PCMu8) using second-order error-diffusion
dithering with reduced amplitude (0.2× TPDF) to coordinate with coefficient-domain
dithering.

## Compression Performance
- **Target Ratio**: 2:1 against PCMu8
- **Achieved Ratio**: 2.51:1 against PCMu8 at quality level 3
- **Quality**: Perceptually transparent at Q3+, preserves full 0-16 kHz bandwidth
- **Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)

## Integration with TAV Encoder
TAD is designed as an includable API for TAV video encoder integration.
The encoder can be invoked programmatically to compress audio tracks:

    #include "tad_encoder.h"

    size_t encoded_size = tad_encode_from_file(
        input_audio_path,
        output_tad_path,
        quality_level,
        use_zstd,
        verbose
    );

This allows TAV video files to embed TAD-compressed audio using packet type 0x24.

## Audio Extraction Command
TAD encoder uses two-pass FFmpeg extraction for optimal quality:

    # Pass 1: Extract at original sample rate
    ffmpeg -i input.mp4 -f f32le -ac 2 temp.pcm

    # Pass 2: High-quality resample with SoXR and highpass filter
    ffmpeg -f f32le -ar {original_rate} -ac 2 -i temp.pcm \
           -ar 32000 -af "aresample=resampler=soxr:precision=28:cutoff=0.99,highpass=f=16" \
           output.pcm

This ensures resampling happens after extraction with optimal quality parameters.

## Hardware Acceleration API
TAD decoder is accelerated through AudioAdapter.kt peripheral (backend) and
AudioJSR223Delegate.kt (JavaScript API):

Backend (AudioAdapter.kt):
- decodeTad(): Main decoding function (chunk-based, reads from tadInputBin)
- dwt97Inverse1d(): Single-level inverse CDF 9/7 DWT
- dwt97InverseMultilevel(): 9-level inverse DWT with non-power-of-2 support

JavaScript API (audio.* functions):
- audio.tadDecode(): Trigger TAD decoding from peripheral input buffer
- audio.tadUploadDecoded(offset, count): Upload decoded PCMu8 to playback buffer
- audio.getMemAddr(): Get peripheral memory base address for buffer access

## Usage Examples
    # Encode with default quality (Q3)
    tad_encoder -i input.mp4 -o output.tad

    # Encode with highest quality
    tad_encoder -i input.mp4 -o output.tad -q 5

    # Encode without Zstd compression
    tad_encoder -i input.mp4 -o output.tad --no-zstd

    # Verbose output with statistics
    tad_encoder -i input.mp4 -o output.tad -v


--------------------------------------------------------------------------------

TSVM Universal Cue format
Created by CuriousTorvald on 2025-09-22

A simple, universal cue format designed to work both as a playlist that cues up external files and as a lookup table for byte offsets within a file.

# File Structure
\x1F T S V M U C F
[HEADER]
[CUE ELEMENT 0]
[CUE ELEMENT 1]
[CUE ELEMENT 2]
...

## Header (16 bytes)
    uint8  Magic[8]: "\x1FTSVMUCF"
    uint8  Version: 1
    uint16 Number of cue elements
    uint32 (Optional) Size of the cue file, useful for allocating fixed length for future expansion; 0 when not used
    uint8  Reserved
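
A minimal sketch of reading and writing the 16-byte header with Python's `struct` module. Byte order is not stated for this format, so little-endian is assumed here, matching the peripherals described elsewhere in this document. The function names are hypothetical.

```python
import struct

UCF_MAGIC = b"\x1fTSVMUCF"
# "<" selects little-endian (an assumption) and disables alignment padding.
# Fields: magic[8], version u8, cue count u16, file size u32, reserved u8.
HEADER = struct.Struct("<8sBHIB")


def pack_header(cue_count, file_size=0, version=1):
    """Build a header; file_size stays 0 when the optional field is unused."""
    return HEADER.pack(UCF_MAGIC, version, cue_count, file_size, 0)


def parse_header(data):
    """Validate the magic and return the header fields as a dict."""
    magic, version, cue_count, file_size, _reserved = HEADER.unpack_from(data, 0)
    if magic != UCF_MAGIC:
        raise ValueError("not a TSVM UCF file")
    return {"version": version, "cue_count": cue_count, "file_size": file_size}


header = pack_header(3)
fields = parse_header(header)
```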

## Cue Element
    uint8  Addressing Mode (low nybble) and Role Flags (high nybble)
            - 0x01: External
            - 0x02: Internal
            - 0x10: Intended for machine interaction (GOP indices, frame indices, etc.)
            - 0x20: Intended for human interaction (playlist, chapter markers, etc.)
            - 0x30: Intended for both machine and human interaction
            Leave the role flags unset (high nybble 0) to assign no role
    uint16 String Length for name
    *       Name of the element in UTF-8

    <if external addressing mode>
    uint16 String Length for relative path
    *       Relative path

    <if internal addressing mode>
    uint48 Offset into the file
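
A minimal sketch of parsing one cue element under the layout above. Little-endian integers and a UTF-8 encoded relative path are assumptions, since the format states neither. The function name is hypothetical.

```python
import struct

ADDR_EXTERNAL = 0x01
ADDR_INTERNAL = 0x02


def parse_cue_element(data, pos=0):
    """Parse one cue element starting at pos; return (element, next_pos)."""
    flags = data[pos]
    pos += 1
    mode = flags & 0x0F   # low nybble: addressing mode
    roles = flags & 0xF0  # high nybble: role flags (0x10 machine, 0x20 human)

    (name_len,) = struct.unpack_from("<H", data, pos)
    pos += 2
    name = data[pos:pos + name_len].decode("utf-8")
    pos += name_len

    element = {"mode": mode, "roles": roles, "name": name}
    if mode == ADDR_EXTERNAL:
        (path_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        element["path"] = data[pos:pos + path_len].decode("utf-8")
        pos += path_len
    elif mode == ADDR_INTERNAL:
        element["offset"] = int.from_bytes(data[pos:pos + 6], "little")  # uint48
        pos += 6
    else:
        raise ValueError(f"unknown addressing mode {mode:#x}")
    return element, pos


external = bytes([0x21]) + struct.pack("<H", 5) + b"Intro" + struct.pack("<H", 9) + b"intro.tav"
internal = bytes([0x12]) + struct.pack("<H", 4) + b"GOP0" + (4096).to_bytes(6, "little")
ext_element, ext_end = parse_cue_element(external)
int_element, int_end = parse_cue_element(internal)
```

Returning the next position lets a caller walk the elements back to back, since each element has a variable length.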

--------------------------------------------------------------------------------

Sound Adapter

Endianness: little


Memory Space

0..114687 RW: Sample bin
114688..131071 RW: Instrument bin (256 instruments, 64 bytes each)
131072..196607 RW: Play data 1
196608..262143 RW: Play data 2
262144..327679 RW: TAD Input Buffer
327680..393215 RW: TAD Decode Output

Sample bin: raw sample data stored back to back. You must keep track of the starting point of each sample yourself

Instrument bin: Registry for 256 instruments, formatted as:
    Uint16 Sample Pointer
    Uint16 Sample length
    Uint16 Sampling rate at C3
    Uint16 Play Start (usually 0 but not always)
    Uint16 Loop Start (can be smaller than Play Start)
    Uint16 Loop End
     Bit32 Flags
        0b h000 00pp
            h: sample pointer high bit
           pp: loop mode. 0-no loop, 1-loop, 2-backandforth, 3-oneshot (ignores note length unless overridden by other notes)
  Bit16x24 Volume envelopes
       Byte 1: Volume
       Byte 2: Offset in seconds from the previous point, in 3.5 Unsigned Minifloat

Play Data: play data are series of tracker-like instructions, visualised as:

rr||NOTE|Ins|E.Vol|E.Pan|EE.ff|
63||FFFF|255|3+ 64|3+ 64|16 FF| (8 bytes per line, 512 bytes per pattern, 256 patterns on 128 kB block)

Notes are tuned in 4096-tone equal temperament. Tuning is set per sample using its Sampling rate value.


Sound Adapter MMIO

0..1 RW:  Play head #1 position
2..3 RW:  Play head #1 length param
4 RW:     Play head #1 master volume
5 RW:     Play head #1 master pan
6..9 RW:  Play head #1 flags

10..11 RW:Play head #2 position
12..13 RW:Play head #2 length param
14 RW:    Play head #2 master volume
15 RW:    Play head #2 master pan
16..19 RW:Play head #2 flags

... and so on through Play head #4 (10 bytes per play head: #3 at 20..29, #4 at 30..39)

40 WO: MP2 Decoder Control
    Write 16 to initialise the MP2 context (call this before the decoding of NEW music)
    Write 1 to decode the frame as MP2

    When 17 (16 + 1) is written, initialisation is performed before the decoding

41 RO: Media Decoder Status
    Non-zero value indicates the decoder is busy

42 WO: TAD Decoder Control
    Write 1 to decode TAD data
43 RW: TAD Quality
    Must be set to appropriate value before decoding

64..2367 RW: MP2 Decoded Samples (unsigned 8-bit stereo)
2368..4095 RW: MP2 Frame to be decoded
4096..4097 RO: MP2 Frame guard bytes; always return 0 on read

Sound Hardware Info
    - Sampling rate: 32000 Hz
    - Bit depth: 8 bits/sample, unsigned
    - Always operates in stereo (mono samples must be expanded to stereo before uploading)

Play Head Position
    - Tracker mode: Cuesheet Counter
    - PCM mode: Number of buffers uploaded and received by the adapter

Length Param
    PCM Mode: length of the samples to upload to the speaker
    Tracker mode: unused

Play Head Flags
    Byte 1
        - 0b mrqp ssss
          m: mode (0 for Tracker, 1 for PCM)
          r: reset parameters; always 0 when read
            resetting will:
                set position to 0,
                set length param to 0,
                set queue capacity to 8 samples,
                unset play bit
          q: purge queues (likely does nothing outside PCM mode); always 0 when read
          p: play (0 if not -- mute all output)

          ssss: PCM Mode set PCM Queue Size
            0 - 4 samples
            1 - 6 samples
            2 - 8 samples  (the default size)
            3 - 12 samples
            4 - 16 samples
            5 - 24 samples
            6 - 32 samples
            7 - 48 samples
            8 - 64 samples
            9 - 96 samples
           10 - 128 samples
           11 - 192 samples
           12 - 256 samples
           13 - 384 samples
           14 - 512 samples
           15 - 768 samples

          NOTE: changing from PCM mode to Tracker mode or vice versa will also reset the parameters as described above
    Byte 2
        - PCM Mode: Write non-zero value to start uploading; always 0 when read

    Byte 3 (Tracker Mode)
        - BPM (24 to 280. Play Data will change this register)
    Byte 4 (Tracker Mode)
        - Tick Rate (Play Data will change this register)

    Uploaded PCM data will be stored onto the queue before being consumed by hardware.
    If the queue is full, any more uploads will be silently discarded.
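
A minimal sketch of locating a play head's registers and composing flags byte 1 (0b mrqp ssss). It assumes that "Byte 1" is the lowest-addressed of the four flag bytes. The helper names are hypothetical.

```python
# Queue sizes indexed by the ssss field, copied from the table above.
QUEUE_SIZES = [4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512, 768]


def playhead_mmio_base(playhead):
    """MMIO offset of the first register of play head 1..4 (10 bytes each)."""
    if not 1 <= playhead <= 4:
        raise ValueError("play head must be 1..4")
    return (playhead - 1) * 10


def flags_byte1(pcm_mode, play, queue_size=8, reset=False, purge=False):
    """Compose flags byte 1: m=mode, r=reset, q=purge, p=play, ssss=queue size."""
    ssss = QUEUE_SIZES.index(queue_size)  # raises ValueError for unlisted sizes
    return (
        (int(bool(pcm_mode)) << 7)
        | (int(bool(reset)) << 6)
        | (int(bool(purge)) << 5)
        | (int(bool(play)) << 4)
        | ssss
    )


pcm_playing = flags_byte1(pcm_mode=True, play=True)  # default queue of 8 samples
flags_offset_head4 = playhead_mmio_base(4) + 6       # flags start 6 bytes into a head
```

The queue sizes alternate between 4·2^k and 6·2^k, so the table can also be generated rather than stored if that is preferred.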


32768..65535 RW: Cue Sheet (2048 cues)
    Byte 1..15: pattern number for voice 1..15
    Byte 16: instruction
        1 xxxxxxx - Go back (128, 1-127) patterns to form a loop
        01 xxxxxx -
        001 xxxxx -
        0001 xxxx - Skip (16, 1-15) patterns
        00001 xxx -
        000001 xx -
        0000001 x -
        0000000 1 -
        0000000 0 - No operation
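
A minimal sketch of decoding the instruction byte. It reads "(128, 1-127)" as "an operand of 0 means 128, otherwise 1-127", and likewise for "(16, 1-15)". That reading is an assumption, as are the function names. Prefixes the table leaves blank are reported as unassigned.

```python
CUE_SHEET_BASE = 32768
CUE_SIZE = 16  # 15 pattern bytes plus 1 instruction byte


def cue_address(index):
    """Memory address of cue 0..2047 inside the cue sheet."""
    if not 0 <= index < 2048:
        raise ValueError("cue index must be 0..2047")
    return CUE_SHEET_BASE + index * CUE_SIZE


def decode_cue_instruction(byte):
    """Return (operation, pattern_count) for byte 16 of a cue."""
    if byte & 0x80:                 # 1xxxxxxx: go back
        return "go_back", (byte & 0x7F) or 128
    if (byte & 0xF0) == 0x10:       # 0001xxxx: skip
        return "skip", (byte & 0x0F) or 16
    if byte == 0:
        return "nop", 0
    return "unassigned", 0          # prefixes with no meaning listed above
```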

65536..131071 RW: PCM Sample buffer

Table of 3.5 Minifloat values (CSV)
     ,000,001,010,011,100,101,110,111,MSB
00000,0,1,2,4,8,16,32,64
00001,0.03125,1.03125,2.0625,4.125,8.25,16.5,33,66
00010,0.0625,1.0625,2.125,4.25,8.5,17,34,68
00011,0.09375,1.09375,2.1875,4.375,8.75,17.5,35,70
00100,0.125,1.125,2.25,4.5,9,18,36,72
00101,0.15625,1.15625,2.3125,4.625,9.25,18.5,37,74
00110,0.1875,1.1875,2.375,4.75,9.5,19,38,76
00111,0.21875,1.21875,2.4375,4.875,9.75,19.5,39,78
01000,0.25,1.25,2.5,5,10,20,40,80
01001,0.28125,1.28125,2.5625,5.125,10.25,20.5,41,82
01010,0.3125,1.3125,2.625,5.25,10.5,21,42,84
01011,0.34375,1.34375,2.6875,5.375,10.75,21.5,43,86
01100,0.375,1.375,2.75,5.5,11,22,44,88
01101,0.40625,1.40625,2.8125,5.625,11.25,22.5,45,90
01110,0.4375,1.4375,2.875,5.75,11.5,23,46,92
01111,0.46875,1.46875,2.9375,5.875,11.75,23.5,47,94
10000,0.5,1.5,3,6,12,24,48,96
10001,0.53125,1.53125,3.0625,6.125,12.25,24.5,49,98
10010,0.5625,1.5625,3.125,6.25,12.5,25,50,100
10011,0.59375,1.59375,3.1875,6.375,12.75,25.5,51,102
10100,0.625,1.625,3.25,6.5,13,26,52,104
10101,0.65625,1.65625,3.3125,6.625,13.25,26.5,53,106
10110,0.6875,1.6875,3.375,6.75,13.5,27,54,108
10111,0.71875,1.71875,3.4375,6.875,13.75,27.5,55,110
11000,0.75,1.75,3.5,7,14,28,56,112
11001,0.78125,1.78125,3.5625,7.125,14.25,28.5,57,114
11010,0.8125,1.8125,3.625,7.25,14.5,29,58,116
11011,0.84375,1.84375,3.6875,7.375,14.75,29.5,59,118
11100,0.875,1.875,3.75,7.5,15,30,60,120
11101,0.90625,1.90625,3.8125,7.625,15.25,30.5,61,122
11110,0.9375,1.9375,3.875,7.75,15.5,31,62,124
11111,0.96875,1.96875,3.9375,7.875,15.75,31.5,63,126
LSB
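
The table can be reproduced with a short decoder. This sketch assumes the three exponent bits (the columns, marked MSB) occupy the top of the byte and the five mantissa bits (the rows, marked LSB) the bottom. The function name is hypothetical.

```python
def minifloat35_to_float(byte):
    """Decode a 3.5 unsigned minifloat: 3 exponent bits, 5 mantissa bits."""
    exponent = (byte >> 5) & 0x07  # table column
    mantissa = byte & 0x1F         # table row
    if exponent == 0:
        return mantissa / 32.0     # subnormal range: 0 to 0.96875
    return (1.0 + mantissa / 32.0) * 2.0 ** (exponent - 1)


samples = [minifloat35_to_float(b) for b in (0x00, 0b000_11111, 0b001_00000, 0b010_00001, 0xFF)]
```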


--------------------------------------------------------------------------------

RomBank / RamBank

Endianness: Little

MMIO

0 RW : Bank number for the first 512 kbytes
1 RW : Bank number for the last 512 kbytes
16..23 RW : DMA Control for Lane 1..8
    Write 0x01: copy from Core to Peripheral
    Write 0x02: copy from Peripheral to Core
    * NOTE: after the transfer, the bank numbers will revert to the values they held before the operation
24..31 RW : DMA Control reserved
32..34 RW : DMA Lane 1 -- Addr on the Core Memory
35..37 RW : DMA Lane 1 -- Addr on the Peripheral's Memory (addr can be across-the-bank)
38..40 RW : DMA Lane 1 -- Transfer Length
41..42 RW : DMA Lane 1 -- First/Last Bank Number
43 RW : DMA Lane 1 -- (reserved)
44..55 RW : DMA Lane 2 Props
56..67 RW : DMA Lane 3 Props
68..79 RW : DMA Lane 4 Props
80..91 RW : DMA Lane 5 Props
92..103 RW : DMA Lane 6 Props
104..115 RW : DMA Lane 7 Props
116..127 RW : DMA Lane 8 Props
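
A minimal sketch of computing the register offsets for any lane. It assumes lanes 2 to 8 repeat Lane 1's 12-byte layout, and that "First/Last Bank Number" means the first bank at the lower address. The names are hypothetical.

```python
def dma_lane_registers(lane):
    """MMIO offsets for DMA lane 1..8; ranges are inclusive (first, last)."""
    if not 1 <= lane <= 8:
        raise ValueError("lane must be 1..8")
    base = 32 + (lane - 1) * 12
    return {
        "control": 16 + (lane - 1),
        "core_addr": (base, base + 2),
        "peripheral_addr": (base + 3, base + 5),
        "length": (base + 6, base + 8),
        "first_bank": base + 9,
        "last_bank": base + 10,
        "reserved": base + 11,
    }


def encode_u24(value):
    """Encode a 3-byte little-endian address or transfer length."""
    return value.to_bytes(3, "little")


lane1 = dma_lane_registers(1)
lane8 = dma_lane_registers(8)
```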

--------------------------------------------------------------------------------

High Speed Disk Peripheral Adapter (HSDPA)

An interface card for sequential reads and writes on a single large disk that has no filesystem on it.

Endianness: Little

MMIO

0..2 RW: Block transfer status for Disk 1
    0b nnnn nnnn, nnnn nnnn , a00z mmmm

    n-read: size of the block from the other device, LSB (a full block of 1048576 bytes is encoded as zero)
    m-read: size of the block from the other device, MSB (a full block of 1048576 bytes is encoded as zero)
    a-read: set if the other device hasNext (doYouHaveNext); false if the device is not present
    z-read: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)

    n-write: size of the block I'm sending, LSB (a full block of 1048576 bytes is encoded as zero)
    m-write: size of the block I'm sending, MSB (a full block of 1048576 bytes is encoded as zero)
    a-write: set if there's more to send (hasNext)
    z-write: set if the size is actually 0 instead of 1048576 (overrides n and m parameters)
3..5 RW: Block transfer status for Disk 2
6..8 RW: Block transfer status for Disk 3
9..11 RW: Block transfer status for Disk 4
12..15 RW: Block transfer control for Disk 1 through 4
    0b 0000 abcd

    a: 1 for send, 0 for receive

    b-write: 1 to start sending if a-bit is set; if a-bit is unset, make the other device start sending
    b-read: if this bit is set, you're currently receiving something (aka busy)

    c-write: I'm ready to receive
    c-read: Are you ready to receive?

    d-read: Are you there? (if the other device's recipient is myself)

    NOTE: not ready AND not busy (bits b and d set when read) means the device is not connected to the port
16..19 RW: 8-bit status code for the disk
20 RW: Currently active disk (0: deselect all disks, 1: select disk #1, ...)

    Selecting a disk will automatically unset and hold down "I'm ready to receive" flags of the other disks,
    however, the target disk will NOT have its "I'm ready to receive" flag automatically set.
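
A minimal sketch of packing and unpacking the three-byte block transfer status. It assumes the bytes appear in the order drawn (low byte of n first, then high byte of n, then a00z mmmm). The function names are hypothetical.

```python
FULL_BLOCK = 1048576


def encode_status(size, has_next):
    """Pack a block size (0..1048576) and the hasNext flag into 3 bytes."""
    if not 0 <= size <= FULL_BLOCK:
        raise ValueError("size must be 0..1048576")
    z = 1 if size == 0 else 0
    raw = size % FULL_BLOCK          # a full block is stored as zero
    n = raw & 0xFFFF                 # low 16 bits
    m = (raw >> 16) & 0x0F           # high 4 bits
    return bytes([n & 0xFF, n >> 8, (int(bool(has_next)) << 7) | (z << 4) | m])


def decode_status(status):
    """Unpack 3 status bytes into (size, has_next)."""
    n = status[0] | (status[1] << 8)
    m = status[2] & 0x0F
    has_next = bool(status[2] & 0x80)
    if status[2] & 0x10:             # z bit: size really is zero
        return 0, has_next
    return ((m << 16) | n) or FULL_BLOCK, has_next
```

The z bit is what distinguishes an empty block from a full one, since both leave the n and m fields at zero.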

-- SEQUENTIAL IO SUPPORT MODULE --

NOTE: Sequential I/O will clobber the peripheral memory space.

256..257 RW: Sequential I/O control flags

258 RW: Opcode. Writing a value to this memory will execute the operation
    0x00 - No operation
    0x01 - Skip (arg 1) bytes
    0x02 - Read (arg 1) bytes and store to core memory pointer (arg 2)
    0x03 - Write (arg 1) bytes using data from the core memory from pointer (arg 2)
    0xF0 - Rewind the file to the starting point
    0xFF - Terminate sequential I/O session and free up the memory space
259..261 RW: Argument #1
262..264 RW: Argument #2
265..267 RW: Argument #3
268..270 RW: Argument #4
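
Because writing the opcode executes the operation, the arguments have to be in place first. The sketch below shows that ordering against a plain `bytearray` standing in for the MMIO window. The `issue` helper is hypothetical, and 3-byte little-endian arguments are assumed from the register widths.

```python
OP_SKIP, OP_READ, OP_WRITE, OP_REWIND, OP_TERMINATE = 0x01, 0x02, 0x03, 0xF0, 0xFF
REG_OPCODE, REG_ARG1, REG_ARG2 = 258, 259, 262


def issue(mmio, opcode, arg1=0, arg2=0):
    """Stage both arguments, then write the opcode last to trigger execution."""
    mmio[REG_ARG1:REG_ARG1 + 3] = arg1.to_bytes(3, "little")
    mmio[REG_ARG2:REG_ARG2 + 3] = arg2.to_bytes(3, "little")
    mmio[REG_OPCODE] = opcode


mmio = bytearray(512)  # stand-in for the real MMIO region
issue(mmio, OP_READ, arg1=4096, arg2=0x010000)  # read 4096 bytes to core address 0x010000
```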


Memory Space

0..1048575 RW: Buffer for the block transfer lane
    IMPLEMENTATION RECOMMENDATION: split the memory space into two 512K blocks, and when sequential
    reading reaches the second block, prepare the next bytes in the first block, so that when the read
    cursor reaches 1048576, it wraps to 0 and continues reading the content of the disk as if nothing happened.
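
A minimal sketch of the two-half scheme, with a `bytes` object standing in for the disk. The class name is hypothetical, and refilling the second half at the moment the cursor wraps is an assumption added for symmetry. The buffer size is a parameter so the behaviour can be checked with small values.

```python
class RingReader:
    """Streams a disk image through a two-half buffer, refilling the idle half."""

    def __init__(self, disk, size=1048576):
        self.disk = disk
        self.size = size
        self.half = size // 2
        self.buf = bytearray(size)
        self.cursor = 0        # read cursor inside the buffer
        self.disk_pos = 0      # next disk byte not yet loaded
        self._fill(0)          # preload both halves
        self._fill(self.half)

    def _fill(self, start):
        """Copy up to half a buffer of fresh disk data to buf[start:]."""
        chunk = self.disk[self.disk_pos:self.disk_pos + self.half]
        self.buf[start:start + len(chunk)] = chunk
        self.disk_pos += len(chunk)

    def read_byte(self):
        value = self.buf[self.cursor]
        self.cursor += 1
        if self.cursor == self.half:
            self._fill(0)          # cursor entered the second half: refresh the first
        elif self.cursor == self.size:
            self.cursor = 0        # wrap; the first half already holds the next bytes
            self._fill(self.half)  # refresh the half that was just consumed
        return value


disk = bytes(i % 251 for i in range(100))
reader = RingReader(disk, size=16)
streamed = bytes(reader.read_byte() for _ in range(len(disk)))
```

The sketch does not track end-of-disk; a real reader would stop based on the transfer status rather than read stale buffer contents.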