TAD: Terrarum Advanced Audio to use with video compression

2026-06-11 15:24:05 +09:00 · 2025-10-23 18:56:57 +09:00
parent 6f669f4fd9
commit a9319fd812
10 changed files with 1887 additions and 22 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -314,3 +314,68 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
 - **Unified Compression**: Zstd compresses entire GOP as single block, finding patterns across time
 - **Motion Compensation**: FFT-based phase correlation provides accurate global motion estimation
 - **Adaptive GOPs**: Scene change detection ensures optimal GOP boundaries
+
+#### TAD Format (TSVM Advanced Audio)
+- **Perceptual audio codec** for TSVM using DWT with 4-tap interpolating Deslauriers-Dubuc wavelets
+- **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
+  - How to build: `make tad`
+  - **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
+- **C Decoder**: `video_encoder/decoder_tad.c` - Standalone decoder for TAD format
+- **Features**:
+  - **32 KHz stereo**: TSVM audio hardware native format
+  - **Variable chunk sizes**: 1024-32768+ samples, enables flexible TAV integration
+  - **M/S stereo decorrelation**: Exploits stereo correlation for better compression
+  - **PCM16→PCM8 conversion**: Error-diffusion dithering to minimize quantization noise
+  - **Variable-level DD-4 DWT**: Dynamic levels (log2(chunk_size) - 2) for frequency domain analysis
+  - **Perceptual quantization**: Frequency-dependent weights preserving critical 2-4 KHz range
+  - **2-bit twobitmap significance map**: Efficient encoding of sparse coefficients
+  - **Optional Zstd compression**: Level 7 for additional compression
+- **Usage Examples**:
+  ```bash
+  # Encode with default quality (Q3)
+  encoder_tad -i input.mp4 -o output.tad
+
+  # Encode with highest quality
+  encoder_tad -i input.mp4 -o output.tad -q 5
+
+  # Encode without Zstd compression
+  encoder_tad -i input.mp4 -o output.tad --no-zstd
+
+  # Verbose output with statistics
+  encoder_tad -i input.mp4 -o output.tad -v
+
+  # Decode back to PCM16
+  decoder_tad -i input.tad -o output.pcm
+  ```
+- **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
+- **Version**: 1 (2-bit twobitmap significance map)
+
+**TAD Compression Performance**:
+- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
+- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
+- **Audio Quality**: Preserves full 0-16 KHz bandwidth
+- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
+
+**TAD Encoding Pipeline**:
+1. **FFmpeg Two-Pass Extraction**: High-quality SoXR resampling to 32 KHz with 16 Hz highpass filter
+2. **PCM16→PCM8 with Dithering**: Error-diffusion dithering minimizes quantization noise
+3. **M/S Stereo Decorrelation**: Transforms Left/Right to Mid/Side for better compression
+4. **Variable-Level DD-4 DWT**: Deslauriers-Dubuc 4-tap interpolating wavelets with dynamic levels
+   - Default 32768 samples → 13 DWT levels
+   - Minimum 1024 samples → 8 DWT levels
+5. **Frequency-Dependent Quantization**: Perceptual weights favor 2-4 KHz (speech intelligibility)
+6. **Dead Zone Quantization**: Zeros high-frequency noise (highest band)
+7. **2-bit Twobitmap Encoding**: Maps coefficients to 00=0, 01=+1, 10=-1, 11=other
+8. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data
+
+**TAD Integration with TAV**:
+TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
+support enables synchronized audio/video encoding where audio chunks can match video GOP boundaries.
+TAV embeds TAD-compressed audio using packet type 0x24 with Zstd compression.
+
+**TAD Hardware Acceleration**:
+TSVM accelerates TAD decoding with AudioAdapter.kt (backend) and AudioJSR223Delegate.kt (API):
+- Backend decoder in AudioAdapter.kt with variable chunk size support
+- API functions in AudioJSR223Delegate.kt for JavaScript access
+- Supports chunk sizes from 1024 to 32768+ samples
+- Dynamic DWT level calculation for optimal performance