TAD: Terrarum Advanced Audio to use with video compression

This commit is contained in:
minjaesong
2025-10-23 18:56:57 +09:00
parent 6f669f4fd9
commit a9319fd812
10 changed files with 1887 additions and 22 deletions

View File

@@ -314,3 +314,68 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
- **Unified Compression**: Zstd compresses entire GOP as single block, finding patterns across time
- **Motion Compensation**: FFT-based phase correlation provides accurate global motion estimation
- **Adaptive GOPs**: Scene change detection ensures optimal GOP boundaries
#### TAD Format (TSVM Advanced Audio)
- **Perceptual audio codec** for TSVM using DWT with 4-tap interpolating Deslauriers-Dubuc wavelets
- **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
- How to build: `make tad`
- **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
- **C Decoder**: `video_encoder/decoder_tad.c` - Standalone decoder for TAD format
- **Features**:
- **32 KHz stereo**: TSVM audio hardware native format
- **Variable chunk sizes**: 1024-32768+ samples, enables flexible TAV integration
- **M/S stereo decorrelation**: Exploits stereo correlation for better compression
- **PCM16→PCM8 conversion**: Error-diffusion dithering to minimize quantization noise
- **Variable-level DD-4 DWT**: Dynamic levels (log2(chunk_size) - 2) for frequency domain analysis
- **Perceptual quantization**: Frequency-dependent weights preserving critical 2-4 KHz range
- **2-bit twobitmap significance map**: Efficient encoding of sparse coefficients
- **Optional Zstd compression**: Level 7 for additional compression
- **Usage Examples**:
```bash
# Encode with default quality (Q3)
encoder_tad -i input.mp4 -o output.tad
# Encode with highest quality
encoder_tad -i input.mp4 -o output.tad -q 5
# Encode without Zstd compression
encoder_tad -i input.mp4 -o output.tad --no-zstd
# Verbose output with statistics
encoder_tad -i input.mp4 -o output.tad -v
# Decode back to PCM16
decoder_tad -i input.tad -o output.pcm
```
- **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
- **Version**: 1 (2-bit twobitmap significance map)
**TAD Compression Performance**:
- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
- **Audio Quality**: Preserves full 0-16 KHz bandwidth
- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
**TAD Encoding Pipeline**:
1. **FFmpeg Two-Pass Extraction**: High-quality SoXR resampling to 32 KHz with 16 Hz highpass filter
2. **PCM16→PCM8 with Dithering**: Error-diffusion dithering minimizes quantization noise
3. **M/S Stereo Decorrelation**: Transforms Left/Right to Mid/Side for better compression
4. **Variable-Level DD-4 DWT**: Deslauriers-Dubuc 4-tap interpolating wavelets with dynamic levels
- Default 32768 samples → 13 DWT levels
- Minimum 1024 samples → 8 DWT levels
5. **Frequency-Dependent Quantization**: Perceptual weights favor 2-4 KHz (speech intelligibility)
6. **Dead Zone Quantization**: Zeros high-frequency noise (highest band)
7. **2-bit Twobitmap Encoding**: Maps coefficients to 00=0, 01=+1, 10=-1, 11=other
8. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data
**TAD Integration with TAV**:
TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
support enables synchronized audio/video encoding where audio chunks can match video GOP boundaries.
TAV embeds TAD-compressed audio using packet type 0x24 with Zstd compression.
**TAD Hardware Acceleration**:
TSVM accelerates TAD decoding with AudioAdapter.kt (backend) and AudioJSR223Delegate.kt (API):
- Backend decoder in AudioAdapter.kt with variable chunk size support
- API functions in AudioJSR223Delegate.kt for JavaScript access
- Supports chunk sizes from 1024 to 32768+ samples
- Dynamic DWT level calculation for optimal performance