TAD documentation update

2026-06-15 00:44:05 +09:00 · 2025-10-30 22:13:29 +09:00
parent c61bf7750f
commit 46ad919407
2 changed files with 131 additions and 66 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -316,20 +316,23 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
 - **Adaptive GOPs**: Scene change detection ensures optimal GOP boundaries

 #### TAD Format (TSVM Advanced Audio)
- **Perceptual audio codec** for TSVM using DWT with 4-tap interpolating Deslauriers-Dubuc wavelets
+- **Perceptual audio codec** for TSVM using CDF 9/7 biorthogonal wavelets
 - **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
  - How to build: `make tad`
  - **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
 - **C Decoder**: `video_encoder/decoder_tad.c` - Standalone decoder for TAD format
+- **Kotlin Decoder**: `AudioAdapter.kt` - Hardware-accelerated TAD decoder for TSVM runtime
 - **Features**:
  - **32 KHz stereo**: TSVM audio hardware native format
-  - **Variable chunk sizes**: 1024-32768+ samples, enables flexible TAV integration
+  - **Variable chunk sizes**: Any size ≥1024 samples, including non-power-of-2 (e.g., 32016 for TAV 1-second GOPs)
  - **M/S stereo decorrelation**: Exploits stereo correlation for better compression
-  - **PCM16→PCM8 conversion**: Error-diffusion dithering to minimize quantization noise
-  - **Variable-level DD-4 DWT**: Dynamic levels (log2(chunk_size) - 2) for frequency domain analysis
-  - **Perceptual quantization**: Frequency-dependent weights preserving critical 2-4 KHz range
-  - **2-bit twobitmap significance map**: Efficient encoding of sparse coefficients
-  - **Optional Zstd compression**: Level 7 for additional compression
+  - **Gamma compression**: Dynamic range compression (γ=0.707) before quantization
+  - **9-level CDF 9/7 DWT**: Fixed 9 decomposition levels for all chunk sizes
+  - **Perceptual quantization**: Frequency-dependent weights with lambda companding
+  - **Raw int8 storage**: Direct coefficient storage (no significance map, better Zstd compression)
+  - **Coefficient-domain dithering**: Light TPDF dithering to reduce banding
+  - **Zstd compression**: Level 7 for additional compression
+  - **Non-power-of-2 support**: Fixed 2025-10-30 to handle arbitrary chunk sizes correctly
 - **Usage Examples**:
  ```bash
  # Encode with default quality (Q3)
@@ -348,7 +351,7 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
  decoder_tad -i input.tad -o output.pcm
  ```
 - **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
- **Version**: 1 (2-bit twobitmap significance map)
+- **Version**: 1.1 (raw int8 storage with non-power-of-2 support, updated 2025-10-30)

 **TAD Compression Performance**:
 - **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
@@ -358,15 +361,16 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic

 **TAD Encoding Pipeline**:
 1. **FFmpeg Two-Pass Extraction**: High-quality SoXR resampling to 32 KHz with 16 Hz highpass filter
-2. **PCM16→PCM8 with Dithering**: Error-diffusion dithering minimizes quantization noise
+2. **Gamma Compression**: Dynamic range compression (γ=0.707) for perceptual uniformity
 3. **M/S Stereo Decorrelation**: Transforms Left/Right to Mid/Side for better compression
-4. **Variable-Level DD-4 DWT**: Deslauriers-Dubuc 4-tap interpolating wavelets with dynamic levels
-   - Default 32768 samples → 13 DWT levels
-   - Minimum 1024 samples → 8 DWT levels
-5. **Frequency-Dependent Quantization**: Perceptual weights favor 2-4 KHz (speech intelligibility)
+4. **9-Level CDF 9/7 DWT**: Biorthogonal wavelets with a fixed 9 decomposition levels
+   - All chunk sizes use 9 levels (9 halvings need at least 512 samples, so the 1024-sample minimum always qualifies)
+   - Supports non-power-of-2 sizes through proper length tracking
+5. **Frequency-Dependent Quantization**: Perceptual weights with lambda companding
 6. **Dead Zone Quantization**: Zeros high-frequency noise (highest band)
-7. **2-bit Twobitmap Encoding**: Maps coefficients to 00=0, 01=+1, 10=-1, 11=other
-8. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data
+7. **Coefficient-Domain Dithering**: Light TPDF dithering (±0.5 quantization steps)
+8. **Raw Int8 Storage**: Direct coefficient storage as signed int8 values
+9. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data

 **TAD Integration with TAV**:
 TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
@@ -375,7 +379,20 @@ TAV embeds TAD-compressed audio using packet type 0x24 with Zstd compression.

 **TAD Hardware Acceleration**:
 TSVM accelerates TAD decoding with AudioAdapter.kt (backend) and AudioJSR223Delegate.kt (API):
- Backend decoder in AudioAdapter.kt with variable chunk size support
+- Backend decoder in AudioAdapter.kt with non-power-of-2 chunk size support (fixed 2025-10-30)
 - API functions in AudioJSR223Delegate.kt for JavaScript access
- Supports chunk sizes from 1024 to 32768+ samples
- Dynamic DWT level calculation for optimal performance
+- Supports any chunk size ≥1024 samples, including sizes above 32768
+- Fixed 9-level CDF 9/7 inverse DWT with correct length tracking for non-power-of-2 sizes
+
+**Critical Implementation Note (Fixed 2025-10-30)**:
+The multi-level inverse DWT must pre-calculate the exact sequence of subband lengths that the forward transform produced:
+```kotlin
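+// lengths[i] is the low-band length after i forward DWT levels
+val lengths = IntArray(levels + 1)
+lengths[0] = chunk_size
+for (i in 1..levels) {
+    // Round up so odd lengths keep their extra low-band sample
+    lengths[i] = (lengths[i - 1] + 1) / 2
+}
+// Apply inverse DWT using lengths[level] for each level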
+```
+Using simple doubling (`length *= 2`) is incorrect for non-power-of-2 sizes and causes
+mirrored subband artifacts.
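
The length-tracking rule in the note above can be checked with a small standalone sketch in C, the language of the encoder and reference decoder. This is an illustration only, not code from `encoder_tad.c` or `decoder_tad.c`: the names `TAD_DWT_LEVELS`, `tad_dwt_lengths`, `tad_highband_length`, and `naive_doubled_length` are hypothetical, and the only behaviour taken from the commit is the `(n + 1) / 2` recurrence and the fixed 9 levels.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constant: the fixed level count stated in the notes above. */
#define TAD_DWT_LEVELS 9

/* Hypothetical helper. Fills lengths[0..levels], where lengths[0] is the
 * chunk size and lengths[i] is the low-band length after i forward levels.
 * Each step rounds up, matching the (n + 1) / 2 recurrence in the note. */
void tad_dwt_lengths(size_t chunk_size, int levels, size_t *lengths)
{
    assert(levels >= 0);
    lengths[0] = chunk_size;
    for (int i = 1; i <= levels; i++) {
        lengths[i] = (lengths[i - 1] + 1) / 2;
    }
}

/* Hypothetical helper. High-band length produced at forward level i
 * (1-based): whatever remains of the previous low band after the
 * rounded-up low half is taken. */
size_t tad_highband_length(const size_t *lengths, int i)
{
    return lengths[i - 1] - lengths[i];
}

/* The incorrect approach named in the note: start from the coarsest
 * low-band length and double once per level. */
size_t naive_doubled_length(size_t coarsest, int levels)
{
    size_t len = coarsest;
    for (int i = 0; i < levels; i++) {
        len *= 2;
    }
    return len;
}
```

For the 32016-sample chunk mentioned earlier, the recurrence gives 32016, 16008, 8004, 4002, 2001, 1001, 501, 251, 126, 63. Doubling back up from 63 yields 32256 rather than 32016, so a decoder that doubles would read subbands at the wrong offsets. For a power-of-2 chunk such as 1024 the two approaches agree (2 × 512 = 1024), which is presumably why the problem only appeared with non-power-of-2 sizes.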