more TAV/TAD documentation update

2026-03-07 19:51:51 +09:00 · 2025-11-11 21:08:48 +09:00
parent 6add391d07
commit 9247471bf2
3 changed files with 613 additions and 3 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -173,10 +173,10 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
  - **Multiple Wavelet Types**: 5/3 reversible, 9/7 irreversible, CDF 13/7, DD-4, Haar
  - **Single-tile encoding**: One large DWT tile for optimal quality (no blocking artifacts)
  - **Perceptual quantisation**: HVS-optimized coefficient scaling
-  - **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
+  - **YCoCg-R colour space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
  - **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size)
  - **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update)
-  - **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced)
+  - **Concatenated Maps Layout**: Cross-channel compression optimisation for additional 1.6% improvement (2025-09-29 enhanced)
 - **Usage Examples**:
  ```bash
  # Different wavelets
@@ -259,7 +259,7 @@ Concatenated Maps Layout:
 - Significance map: 1 bit per coefficient (0=zero, 1=non-zero)
 - Value arrays: Only non-zero coefficients in sequence per channel
- Cross-channel optimization: Zstd finds patterns across similar significance maps
+- Cross-channel optimisation: Zstd finds patterns across similar significance maps
 - Result: 16-18% compression improvement + 1.6% additional from concatenation
 ```
--- a/video_encoder/TAD_README.md
+++ b/video_encoder/TAD_README.md
@@ -0,0 +1,350 @@
 # TAD - TSVM Advanced Audio Codec
 A perceptually-optimised wavelet-based audio codec designed for resource-constrained systems, featuring CDF 9/7 wavelets, EZBC sparse coding, and sophisticated perceptual quantisation.
 ## Overview
 TAD (TSVM Advanced Audio) is a modern audio codec built on discrete wavelet transform (DWT) using Cohen-Daubechies-Feauveau (CDF) 9/7 biorthogonal wavelets. It combines perceptual quantisation, advanced entropy coding, and careful optimisation for resource-constrained systems.
 ### Key Advantages
 - **Perceptual optimisation**: HVS-aware quantisation preserves audio quality where it matters
 - **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity (86.9% zeros in typical content)
 - **Variable chunk sizes**: Supports any chunk size ≥1024 samples, including non-power-of-2
 - **Stereo decorrelation**: Mid/Side encoding exploits stereo correlation for better compression
 - **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms
 ## Features
 ### Compression Technology
 - **CDF 9/7 Biorthogonal Wavelets**
  - 9-level fixed decomposition for all chunk sizes
  - Lifting scheme implementation for efficient computation
  - Optimal frequency discrimination for audio signals
 - **Pre-processing**
  - First-order IIR pre-emphasis filter (α=0.5) shifts quantisation noise to lower frequencies, where they are less objectionable to listeners
  - Gamma compression (γ=0.5) for dynamic range compression before quantisation
  - Mid/Side stereo transformation exploits stereo correlation
  - Lambda companding (λ=6.0) with Laplacian CDF mapping for full bit utilisation
 - **Perceptual Quantisation**
  - Channel-specific (Mid/Side) frequency-dependent weights
  - Subband-aware quantisation preserves perceptually important frequencies
 - **EZBC Encoding**
  - Binary tree embedded zero block coding
  - Exploits coefficient sparsity (86.9% Mid, 97.8% Side typical)
  - Progressive refinement structure
  - Spatial clustering of non-zero coefficients
 - **Entropy Coding**
  - Zstandard compression (level 7) on concatenated EZBC bitstreams
  - Cross-channel compression optimisation
  - Optional Zstd bypass for debugging
 ### Audio Format
 - **Sample Rate**: 32 KHz (TSVM audio hardware native format)
 - **Channels**: Stereo (L/R input, Mid/Side internal representation)
 - **Chunk Sizes**: Variable, any size ≥1024 samples (including non-power-of-2)
 - **Bit Depth**: 32-bit float internal, 8-bit unsigned PCM output with noise-shaped dithering
 - **Bandwidth**: Full 0-16 KHz frequency range preserved
 ### Quality Levels
 Six quality levels (0-5) provide a wide range of compression/quality trade-offs:
 - **Level 0**: Lowest quality, smallest file size
 - **Level 3**: Default, balanced quality/compression (2.51:1 vs PCMu8)
 - **Level 5**: Highest quality, largest file size
 Quality levels are designed to be synchronised with TAV video codec for unified encoding.
 ## Building
 ### Prerequisites
 - C compiler (GCC/Clang)
 - Zstandard library (libzstd)
 - Math library (libm)
 ### Compilation
 ```bash
 # Build TAD encoder/decoder
 make tad
 # Build all tools
 make all
 # Clean build artifacts
 make clean
 ```
 ### Build Targets
 - `encoder_tad` - Standalone audio encoder with FFmpeg calls
 - `decoder_tad` - Standalone audio decoder
 ## Usage
 ### Basic Encoding
 Encoding requires FFmpeg executable installed in your system.
 ```bash
 # Default encoding (quality level 3)
 ./encoder_tad -i input.mp3 -o output.tad
 # Specify quality level (0-5)
 ./encoder_tad -i input.m4a -o output.tad -q 0    # Lowest quality
 ./encoder_tad -i input.ogg -o output.tad -q 5    # Highest quality
 # Disable Zstd compression (for debugging)
 ./encoder_tad -i input.opus -o output.tad --no-zstd
 # Verbose output with statistics
 ./encoder_tad -i input.flac -o output.tad -v
 ```
 ### Decoding
 ```bash
 # Decode to PCMu8
 ./decoder_tad -i input.tad -o output.pcm --raw-pcm
 # Decode to WAV
 ./decoder_tad -i input.tad -o output.wav
 ```
 ### Input Formats
 TAD encoder accepts any audio format supported by FFmpeg:
 - Audio files: WAV, MP3, FLAC, OGG, AAC, etc.
 - Video files with audio streams: MP4, MKV, AVI, etc.
 - Raw PCM formats
 Audio is automatically resampled to 32 KHz stereo if necessary.
 ## Technical Architecture
 ### Encoder Pipeline
 1. **Input Processing**
   - FFmpeg demuxing and audio stream extraction
   - Resampling to 32 KHz stereo
   - Conversion to PCM32f
 2. **Pre-emphasis Filter**
   - First-order IIR filter with α=0.5
   - Shifts quantisation noise toward lower frequencies
   - Improves perceptual quality
 3. **Gamma Compression**
   - Dynamic range compression with γ=0.5
   - Applied independently to each sample
   - Reduces quantisation error for low-amplitude signals
 4. **Stereo Decorrelation**
   - Left/Right to Mid/Side transformation
   - Mid = (L + R) / 2
   - Side = (L - R) / 2
   - Exploits stereo correlation for better compression
 5. **9-Level CDF 9/7 DWT**
   - Fixed 9 decomposition levels for all chunk sizes
   - Forward lifting scheme implementation
   - Correct length tracking for non-power-of-2 sizes
 6. **Perceptual Quantisation**
   - Channel-specific (Mid/Side) subband weights
   - Lambda companding with λ=6.0
   - Laplacian CDF mapping: `sign(x) * floor(λ * log(1 + |x|/λ))`
   - Quantised to int8 coefficients
 7. **EZBC Encoding**
   - Binary tree structure per channel
   - Progressive refinement by bitplanes
   - Zero block coding exploits sparsity
   - Independent bitstreams for Mid and Side
 8. **Zstd Compression**
   - Level 7 compression on concatenated `[Mid_bitstream][Side_bitstream]`
   - Cross-channel optimisation opportunities
   - Adaptive compression based on content
 ### Decoder Pipeline
 1. **Container Parsing**
   - TAD packet identification (type 0x24)
   - Chunk size extraction
   - Compressed data boundaries
 2. **Zstd Decompression**
   - Decompress concatenated bitstreams
   - Split into Mid and Side EZBC streams
 3. **EZBC Decoding**
   - Binary tree decoder per channel
   - Reconstruct quantised int8 coefficients
   - Progressive refinement reconstruction
 4. **Lambda Decompanding**
   - Inverse Laplacian CDF with channel-specific weights
   - Reconstruct float32 DWT coefficients
   - Apply subband-specific perceptual weights
 5. **9-Level Inverse CDF 9/7 DWT**
   - Inverse lifting scheme implementation
   - Correct length tracking for non-power-of-2 chunk sizes
   - Pre-calculated length sequence from forward transform
 6. **Mid/Side to Left/Right**
   - L = Mid + Side
   - R = Mid - Side
   - Reconstruct stereo channels
 7. **Gamma Expansion**
   - Inverse gamma with γ⁻¹=2.0
   - Restore original dynamic range
 8. **De-emphasis Filter**
   - Reverse pre-emphasis with α=0.5
   - Remove frequency shaping
   - Restore flat frequency response
 9. **PCM32f to PCM8u Conversion**
   - Noise-shaped dithering for 8-bit output
   - Clamping to valid range
   - Final output format
 ### Wavelet Implementation
 CDF 9/7 wavelet follows a **two-stage lifting scheme**:
 ```c
 // Forward Transform: Predict → Update
 // Predict step (generate high-pass)
 temp[half + i] = data[odd] - α * (data[even_left] + data[even_right]);
 // Update step (generate low-pass)
 temp[i] = data[even] + β * (temp[half + i - 1] + temp[half + i]);
 // Normalization (K factor)
 temp[i] *= K;
 temp[half + i] /= K;
 // Inverse Transform: Denormalize → Undo Update → Undo Predict (reversed order)
 temp[i] /= K;
 temp[half + i] *= K;
 temp[i] -= β * (temp[half + i - 1] + temp[half + i]);
 data[odd] = temp[half + i] + α * (temp[i] + temp[i + 1]);
 data[even] = temp[i];
 ```
 **CDF 9/7 Coefficients**:
 - α = -1.586134342
 - β = -0.052980118
 - γ = +0.882911075
 - δ = +0.443506852
 - K = 1.230174105
 ### Non-Power-of-2 Chunk Size Handling
 Critical implementation detail for variable chunk sizes:
 ```c
 // Pre-calculate exact length sequence from forward transform
 int lengths[MAX_LEVELS + 1];
 lengths[0] = chunk_size;
 for (int i = 1; i <= levels; i++) {
    lengths[i] = (lengths[i - 1] + 1) / 2;
 }
 // Apply inverse DWT using lengths[level] for each level
 // NEVER use simple doubling (length *= 2) - incorrect for non-power-of-2!
 ```
 Incorrect length tracking causes mirrored subband artefacts in decoded audio.
 ### Perceptual Quantisation Weights
 Channel-specific weights for Mid (channel 0) and Side (channel 1):
 ```c
 // Base quantiser weights per subband (9 levels + approximation)
 float BASE_QUANTISER_WEIGHTS[2][10] = {
    // Mid channel (0)
    {4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f},
    // Side channel (1)
    {6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f}
 };
 // During dequantisation:
 float weight = BASE_QUANTISER_WEIGHTS[channel][subband] * quantiser_scale;
 coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[subband] * weight;
 ```
 Different weights for Mid and Side channels reflect perceptual importance of frequency bands in each channel. DC frequency has highest weight (4.0 Mid, 6.0 Side) due to energy concentration.
 ## Performance Characteristics
 ### Compression Efficiency
 - **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
 - **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
 - **Audio Quality**: Preserves full 0-16 KHz bandwidth
 - **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
 - **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering
 ### Computational Complexity
 - **Encoding**: O(n log n) per chunk for DWT, O(n) for EZBC encoding
 - **Decoding**: O(n log n) per chunk for inverse DWT, O(n) for EZBC decoding
 - **Memory**: O(n) working memory for chunk processing
 ### Quality Characteristics
 - **Frequency Response**: Flat 0-16 KHz within perceptual limits
 - **Dynamic Range**: Preserved through gamma compression/expansion
 - **Stereo Imaging**: Maintained through Mid/Side decorrelation
 - **Perceptual Quality**: Optimised for human auditory system characteristics
 ## Integration with TAV
 TAD is designed as an includable API for TAV video encoder integration:
 - **Variable Chunk Sizes**: Audio chunks can match video GOP boundaries (e.g., 32016 samples for 1-second TAV GOP)
 - **Unified Quality Levels**: TAD quality 0-5 synchronised with TAV quality 0-5
 - **Embedded Packets**: TAV embeds TAD-compressed audio using packet type 0x24
 - **Shared Container**: Single .tav file contains both video and audio streams
 ### TAV Integration Example
 ```c
 // TAD handles non-power-of-2 chunk size correctly
 tad_encode_chunk(audio_buffer, audio_samples_per_gop, output_buffer, &output_size);
 // TAV embeds TAD packet
 tav_write_packet(TAV_PACKET_AUDIO, output_buffer, output_size);
 ```
 ## Format Specification
 For complete packet structure and bitstream format details, refer to `format documentation.txt`.
 ### Key Packet Types
 - `0x24`: TAD audio packet (used in standalone .tad files and embedded in .tav files)
 ## Related Projects
 - **TAV** (TSVM Advanced Video): Wavelet-based video codec with integrated TAD audio
 - **TSVM**: Target virtual machine platform for TAD playback
 ## Licence
 MIT.
--- a/video_encoder/TAV_README.md
+++ b/video_encoder/TAV_README.md
@@ -0,0 +1,260 @@
 # TAV - TSVM Advanced Video Codec
 A perceptually-optimised wavelet-based video codec designed for resource-constrained systems, featuring multiple wavelet types, temporal 3D DWT, and sophisticated compression techniques.
 ## Overview
 TAV (TSVM Advanced Video) is a modern video codec built on discrete wavelet transformation (DWT). It combines cutting-edge compression techniques with careful optimisation for resource-constrained systems.
 ### Key Advantages
 - **No blocking artefacts**: Large-tile DWT encoding with padding eliminates DCT block boundaries
 - **Perceptual optimisation**: HVS-aware quantisation preserves visual quality where it matters
 - **Temporal coherence**: 3D DWT with GOP encoding exploits inter-frame similarity
 - **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity for 16-18% additional compression
 - **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms
 ## Features
 ### Compression Technology
 - **Wavelet Types**
  - **5/3 Reversible** (JPEG 2000 standard): Lossless-capable, good for archival
  - **9/7 Irreversible** (default): Best overall compression, CDF 9/7 variant
 - **Spatial Encoding**
  - Large-tile encoding with padding, with optional single-tile mode (no blocking artefacts)
  - 6-level DWT decomposition for deep frequency analysis
  - Perceptual quantisation with HVS-optimised coefficient scaling
  - YCoCg-R colour space with anisotropic chroma quantisation
 - **Temporal Encoding** (3D DWT Mode)
  - Group-of-pictures (GOP) encoding with adaptive size (typically 20 frames)
  - Unified EZBC encoding across temporal dimension
  - Adaptive GOP boundaries with scene change detection
 - **EZBC Encoding**
  - Binary tree embedded zero block coding exploits coefficient sparsity
  - Progressive refinement structure with bitplane encoding
  - Concatenated channel layout for cross-channel compression optimisation
  - Typical sparsity: 86.9% (Y), 97.8% (Co), 99.5% (Cg)
  - 16-18% compression improvement over naive coefficient encoding
 ### Audio Integration
 TAV seamlessly integrates with the TAD (TSVM Advanced Audio) codec for synchronised audio/video encoding:
 - Variable chunk sizes match video GOP boundaries
 - Embedded TAD packets (type 0x24) with Zstd compression
 - Unified container format
 ## Building
 ### Prerequisites
 - C compiler (GCC/Clang)
 - Zstandard library
 - OpenCV 4 library (only used by experimental motion estimation feature)
 ### Compilation
 ```bash
 # Build TAV encoder/decoder
 make tav
 # Build all tools including TAD audio codec
 make all
 # Clean build artefacts
 make clean
 ```
 ### Build Targets
 - `encoder_tav` - Main video encoder
 - `decoder_tav` - Standalone video decoder
 - `tav_inspector` - Packet analysis and debugging tool
 ## Usage
 ### Basic Encoding
 Encoding requires FFmpeg executable installed in your system.
 ```bash
 # Default encoding (CDF 9/7 wavelet, quality level 3)
 ./encoder_tav -i input.mp4 -o output.tav
 # Quality levels (0-5)
 ./encoder_tav -i input.avi -q 0 -o output.tav    # Lowest quality, smallest file
 ./encoder_tav -i input.mkv -q 5 -o output.tav    # Highest quality, largest file
 ```
 ### Intra-only Encoding
 ```bash
 # Enable Intra-only encoding
 ./encoder_tav -i input.mp4 --intra-only -o output.tav
 ```
 ### Decoding and Inspection
 ```bash
 # Decode TAV to raw video
 ./decoder_tav -i input.tav -o output.mkv
 # Inspect packet structure (debugging)
 ./tav_inspector input.tav -v
 ```
 ### Frame Limiting
 ```bash
 # Encode only first N frames (useful for testing)
 ./encoder_tav -i input.mp4 -o output.tav --encode-limit 100
 ```
 ## Technical Architecture
 ### Encoder Pipeline
 1. **Input Processing**
   - FFmpeg demuxing and frame extraction
   - RGB to YCoCg-R colour space conversion
   - Resolution validation and padding
 2. **DWT Transform**
   - Spatial: 6-level decomposition per frame
   - Temporal: 1D DWT across GOP frames (3D DWT mode)
   - Lifting scheme implementation for all wavelets
 3. **Perceptual Quantisation**
   - HVS-based subband weights
   - Anisotropic chroma quantisation (YCoCg-R specific)
   - Quality-dependent quantisation matrices
 4. **EZBC Encoding**
   - Binary tree embedded zero block coding per channel
   - Progressive refinement by bitplanes
   - Concatenated bitstream layout: `[Y_bitstream][Co_bitstream][Cg_bitstream]`
   - Cross-channel compression optimisation
 5. **Entropy Coding**
   - Zstandard compression (level 7) on concatenated EZBC bitstreams
   - Cross-channel compression opportunities
   - Adaptive compression based on GOP structure
 ### Decoder Pipeline
 1. **Container Parsing**
   - Packet type identification (0x00-0xFF)
   - Timecode synchronisation
   - GOP boundary detection
 2. **Entropy Decoding**
   - Zstd decompression of concatenated bitstreams
   - EZBC binary tree decoding per channel
   - Progressive coefficient reconstruction
 3. **Inverse Quantisation**
   - Perceptual weight application
   - Subband-specific scaling
   - Coefficient reconstruction from sparse representation
 4. **Inverse DWT**
   - Temporal: 1D inverse DWT across frames (3D DWT mode)
   - Spatial: 6-level inverse wavelet reconstruction
 5. **Output Conversion**
   - YCoCg-R to RGB colour space
   - Clamping and dithering
   - Frame buffering for display
 ### Wavelet Implementation
 All wavelets follow a **lifting scheme** pattern with symmetric boundary extension:
 ```c
 // Forward Transform: Predict → Update
 temp[half + i] = data[odd] - predict(data[even]);  // High-pass
 temp[i] = data[even] + update(temp[half]);         // Low-pass
 // Inverse Transform: Undo Update → Undo Predict (reversed order)
 data[even] = temp[i] - update(temp[half]);         // Undo low-pass
 data[odd] = temp[half + i] + predict(data[even]);  // Undo high-pass
 ```
 **Critical**: Forward and inverse transforms must use identical coefficient indexing and exactly reverse operations to avoid grid artefacts.
 ### Coefficient Layout
 TAV uses **2D Spatial Layout** in memory for each decomposition level:
 ```
 [LL] [LH] [HL] [HH] [LH] [HL] [HH] ...
 └── Level 0 ──┘ └─── Level 1 ───┘
 ```
 - `LL`: Low-pass (approximation) - progressively smaller with each level
 - `LH`, `HL`, `HH`: High-pass subbands (horizontal, vertical, diagonal detail)
 ## Performance Characteristics
 ### Compression Efficiency
 - **Sparsity Exploitation**: Typical quantised coefficient sparsity
  - Y channel: 86.9% zeros
  - Co channel: 97.8% zeros
  - Cg channel: 99.5% zeros
 - **EZBC Benefits**: 16-18% compression improvement over naive coefficient encoding through sparsity exploitation
 - **Temporal Coherence**: Additional 15-25% improvement with 3D DWT (content-dependent)
 ### Computational Complexity
 - **Encoding**: O(n log n) per frame for spatial DWT
 - **Decoding**: O(n log n) per frame, optimised lifting scheme implementation
 - **Memory**: Single-tile encoding requires O(w × h) working memory
 ### Quality Characteristics
 - **No blocking artefacts**: Wavelet-based encoding is inherently smooth
 - **Perceptual optimisation**: Better subjective quality than bitrate-equivalent DCT codecs
 - **Scalability**: 6 quality levels (0-5) provide wide range of bitrate/quality trade-offs
 - **Temporal stability**: 3D DWT mode reduces flickering and temporal artefacts
 ## Format Specification
 For complete packet structure and bitstream format details, refer to `format documentation.txt`.
 ### Key Packet Types
 - `0x00`: Metadata and initialisation
 - `0x01`: I-frame (intra-coded frame)
 - `0x12`: GOP unified packet (3D DWT mode)
 - `0x24`: Embedded TAD audio
 - `0xFC`: GOP synchronisation
 - `0xFD`: Timecode
 ## Debugging Tools
 ### TAV Inspector
 Analyse TAV packet structure and decode individual frames:
 ```bash
 # Verbose packet analysis
 ./tav_inspector input.tav -v
 # Extract specific frame ranges
 ./tav_inspector input.tav --frame-range 100-200
 ```
 ## Related Projects
 - **TAD** (TSVM Advanced Audio): Perceptual audio codec using CDF 9/7 wavelets
 - **TSVM**: Target virtual machine platform for TAV playback
 ## Licence
 MIT.