mirror of
https://github.com/curioustorvald/tsvm.git
synced 2026-03-07 11:51:49 +09:00
TAD: Terrarum Advanced Audio to use with video compression
This commit is contained in:
65
CLAUDE.md
65
CLAUDE.md
@@ -314,3 +314,68 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
|
||||
- **Unified Compression**: Zstd compresses entire GOP as single block, finding patterns across time
|
||||
- **Motion Compensation**: FFT-based phase correlation provides accurate global motion estimation
|
||||
- **Adaptive GOPs**: Scene change detection ensures optimal GOP boundaries
|
||||
|
||||
#### TAD Format (TSVM Advanced Audio)
|
||||
- **Perceptual audio codec** for TSVM using DWT with 4-tap interpolating Deslauriers-Dubuc wavelets
|
||||
- **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
|
||||
- How to build: `make tad`
|
||||
- **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
|
||||
- **C Decoder**: `video_encoder/decoder_tad.c` - Standalone decoder for TAD format
|
||||
- **Features**:
|
||||
- **32 KHz stereo**: TSVM audio hardware native format
|
||||
- **Variable chunk sizes**: 1024-32768+ samples, enables flexible TAV integration
|
||||
- **M/S stereo decorrelation**: Exploits stereo correlation for better compression
|
||||
- **PCM16→PCM8 conversion**: Error-diffusion dithering to minimize quantization noise
|
||||
- **Variable-level DD-4 DWT**: Dynamic levels (log2(chunk_size) - 2) for frequency domain analysis
|
||||
- **Perceptual quantization**: Frequency-dependent weights preserving critical 2-4 KHz range
|
||||
- **2-bit twobitmap significance map**: Efficient encoding of sparse coefficients
|
||||
- **Optional Zstd compression**: Level 7 for additional compression
|
||||
- **Usage Examples**:
|
||||
```bash
|
||||
# Encode with default quality (Q3)
|
||||
encoder_tad -i input.mp4 -o output.tad
|
||||
|
||||
# Encode with highest quality
|
||||
encoder_tad -i input.mp4 -o output.tad -q 5
|
||||
|
||||
# Encode without Zstd compression
|
||||
encoder_tad -i input.mp4 -o output.tad --no-zstd
|
||||
|
||||
# Verbose output with statistics
|
||||
encoder_tad -i input.mp4 -o output.tad -v
|
||||
|
||||
# Decode back to PCM16
|
||||
decoder_tad -i input.tad -o output.pcm
|
||||
```
|
||||
- **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
|
||||
- **Version**: 1 (2-bit twobitmap significance map)
|
||||
|
||||
**TAD Compression Performance**:
|
||||
- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
|
||||
- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
|
||||
- **Audio Quality**: Preserves full 0-16 KHz bandwidth
|
||||
- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
|
||||
|
||||
**TAD Encoding Pipeline**:
|
||||
1. **FFmpeg Two-Pass Extraction**: High-quality SoXR resampling to 32 KHz with 16 Hz highpass filter
|
||||
2. **PCM16→PCM8 with Dithering**: Error-diffusion dithering minimizes quantization noise
|
||||
3. **M/S Stereo Decorrelation**: Transforms Left/Right to Mid/Side for better compression
|
||||
4. **Variable-Level DD-4 DWT**: Deslauriers-Dubuc 4-tap interpolating wavelets with dynamic levels
|
||||
- Default 32768 samples → 13 DWT levels
|
||||
- Minimum 1024 samples → 8 DWT levels
|
||||
5. **Frequency-Dependent Quantization**: Perceptual weights favor 2-4 KHz (speech intelligibility)
|
||||
6. **Dead Zone Quantization**: Zeros high-frequency noise (highest band)
|
||||
7. **2-bit Twobitmap Encoding**: Maps coefficients to 00=0, 01=+1, 10=-1, 11=other
|
||||
8. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data
|
||||
|
||||
**TAD Integration with TAV**:
|
||||
TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
|
||||
support enables synchronized audio/video encoding where audio chunks can match video GOP boundaries.
|
||||
TAV embeds TAD-compressed audio using packet type 0x24 with Zstd compression.
|
||||
|
||||
**TAD Hardware Acceleration**:
|
||||
TSVM accelerates TAD decoding with AudioAdapter.kt (backend) and AudioJSR223Delegate.kt (API):
|
||||
- Backend decoder in AudioAdapter.kt with variable chunk size support
|
||||
- API functions in AudioJSR223Delegate.kt for JavaScript access
|
||||
- Supports chunk sizes from 1024 to 32768+ samples
|
||||
- Dynamic DWT level calculation for optimal performance
|
||||
|
||||
Reference in New Issue
Block a user