# TAD - TSVM Advanced Audio Codec A perceptually-optimised wavelet-based audio codec designed for resource-constrained systems, featuring CDF 9/7 wavelets, EZBC sparse coding, and sophisticated perceptual quantisation. ## Overview TAD (TSVM Advanced Audio) is a modern audio codec built on discrete wavelet transform (DWT) using Cohen-Daubechies-Feauveau (CDF) 9/7 biorthogonal wavelets. It combines perceptual quantisation, advanced entropy coding, and careful optimisation for resource-constrained systems. ### Key Advantages - **Perceptual optimisation**: HVS-aware quantisation preserves audio quality where it matters - **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity (86.9% zeros in typical content) - **Variable chunk sizes**: Supports any chunk size ≥1024 samples, including non-power-of-2 - **Stereo decorrelation**: Mid/Side encoding exploits stereo correlation for better compression - **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms ## Features ### Compression Technology - **CDF 9/7 Biorthogonal Wavelets** - 9-level fixed decomposition for all chunk sizes - Lifting scheme implementation for efficient computation - Optimal frequency discrimination for audio signals - **Pre-processing** - First-order IIR pre-emphasis filter (α=0.5) shifts quantisation noise to lower frequencies, where they are less objectionable to listeners - Gamma compression (γ=0.5) for dynamic range compression before quantisation - Mid/Side stereo transformation exploits stereo correlation - Lambda companding (λ=6.0) with Laplacian CDF mapping for full bit utilisation - **Perceptual Quantisation** - Channel-specific (Mid/Side) frequency-dependent weights - Subband-aware quantisation preserves perceptually important frequencies - **EZBC Encoding** - Binary tree embedded zero block coding - Exploits coefficient sparsity (86.9% Mid, 97.8% Side typical) - Progressive refinement structure - Spatial clustering of non-zero coefficients - **Entropy Coding** - Zstandard compression (level 7) on concatenated EZBC bitstreams - Cross-channel compression optimisation - Optional Zstd bypass for debugging ### Audio Format - **Sample Rate**: 32 KHz (TSVM audio hardware native format) - **Channels**: Stereo (L/R input, Mid/Side internal representation) - **Chunk Sizes**: Variable, any size ≥1024 samples (including non-power-of-2) - **Bit Depth**: 32-bit float internal, 8-bit unsigned PCM output with noise-shaped dithering - **Bandwidth**: Full 0-16 KHz frequency range preserved ### Quality Levels Six quality levels (0-5) provide a wide range of compression/quality trade-offs: - **Level 0**: Lowest quality, smallest file size - **Level 3**: Default, balanced quality/compression (2.51:1 vs PCMu8) - **Level 5**: Highest quality, largest file size Quality levels are designed to be synchronised with TAV video codec for unified encoding. ## Building ### Prerequisites - C compiler (GCC/Clang) - Zstandard library (libzstd) - Math library (libm) ### Compilation ```bash # Build TAD encoder/decoder make tad # Build all tools make all # Clean build artifacts make clean ``` ### Build Targets - `encoder_tad` - Standalone audio encoder with FFmpeg calls - `decoder_tad` - Standalone audio decoder ## Usage ### Basic Encoding Encoding requires FFmpeg executable installed in your system. ```bash # Default encoding (quality level 3) ./encoder_tad -i input.mp3 -o output.tad # Specify quality level (0-5) ./encoder_tad -i input.m4a -o output.tad -q 0 # Lowest quality ./encoder_tad -i input.ogg -o output.tad -q 5 # Highest quality # Disable Zstd compression (for debugging) ./encoder_tad -i input.opus -o output.tad --no-zstd # Verbose output with statistics ./encoder_tad -i input.flac -o output.tad -v ``` ### Decoding ```bash # Decode to PCMu8 ./decoder_tad -i input.tad -o output.pcm --raw-pcm # Decode to WAV ./decoder_tad -i input.tad -o output.wav ``` ### Input Formats TAD encoder accepts any audio format supported by FFmpeg: - Audio files: WAV, MP3, FLAC, OGG, AAC, etc. - Video files with audio streams: MP4, MKV, AVI, etc. - Raw PCM formats Audio is automatically resampled to 32 KHz stereo if necessary. ## Technical Architecture ### Encoder Pipeline 1. **Input Processing** - FFmpeg demuxing and audio stream extraction - Resampling to 32 KHz stereo - Conversion to PCM32f 2. **Pre-emphasis Filter** - First-order IIR filter with α=0.5 - Shifts quantisation noise toward lower frequencies - Improves perceptual quality 3. **Gamma Compression** - Dynamic range compression with γ=0.5 - Applied independently to each sample - Reduces quantisation error for low-amplitude signals 4. **Stereo Decorrelation** - Left/Right to Mid/Side transformation - Mid = (L + R) / 2 - Side = (L - R) / 2 - Exploits stereo correlation for better compression 5. **9-Level CDF 9/7 DWT** - Fixed 9 decomposition levels for all chunk sizes - Forward lifting scheme implementation - Correct length tracking for non-power-of-2 sizes 6. **Perceptual Quantisation** - Channel-specific (Mid/Side) subband weights - Lambda companding with λ=6.0 - Laplacian CDF mapping: `sign(x) * floor(λ * log(1 + |x|/λ))` - Quantised to int8 coefficients 7. **EZBC Encoding** - Binary tree structure per channel - Progressive refinement by bitplanes - Zero block coding exploits sparsity - Independent bitstreams for Mid and Side 8. **Zstd Compression** - Level 7 compression on concatenated `[Mid_bitstream][Side_bitstream]` - Cross-channel optimisation opportunities - Adaptive compression based on content ### Decoder Pipeline 1. **Container Parsing** - TAD packet identification (type 0x24) - Chunk size extraction - Compressed data boundaries 2. **Zstd Decompression** - Decompress concatenated bitstreams - Split into Mid and Side EZBC streams 3. **EZBC Decoding** - Binary tree decoder per channel - Reconstruct quantised int8 coefficients - Progressive refinement reconstruction 4. **Lambda Decompanding** - Inverse Laplacian CDF with channel-specific weights - Reconstruct float32 DWT coefficients - Apply subband-specific perceptual weights 5. **9-Level Inverse CDF 9/7 DWT** - Inverse lifting scheme implementation - Correct length tracking for non-power-of-2 chunk sizes - Pre-calculated length sequence from forward transform 6. **Mid/Side to Left/Right** - L = Mid + Side - R = Mid - Side - Reconstruct stereo channels 7. **Gamma Expansion** - Inverse gamma with γ⁻¹=2.0 - Restore original dynamic range 8. **De-emphasis Filter** - Reverse pre-emphasis with α=0.5 - Remove frequency shaping - Restore flat frequency response 9. **PCM32f to PCM8u Conversion** - Noise-shaped dithering for 8-bit output - Clamping to valid range - Final output format ### Wavelet Implementation CDF 9/7 wavelet follows a **two-stage lifting scheme**: ```c // Forward Transform: Predict → Update // Predict step (generate high-pass) temp[half + i] = data[odd] - α * (data[even_left] + data[even_right]); // Update step (generate low-pass) temp[i] = data[even] + β * (temp[half + i - 1] + temp[half + i]); // Normalization (K factor) temp[i] *= K; temp[half + i] /= K; // Inverse Transform: Denormalize → Undo Update → Undo Predict (reversed order) temp[i] /= K; temp[half + i] *= K; temp[i] -= β * (temp[half + i - 1] + temp[half + i]); data[odd] = temp[half + i] + α * (temp[i] + temp[i + 1]); data[even] = temp[i]; ``` **CDF 9/7 Coefficients**: - α = -1.586134342 - β = -0.052980118 - γ = +0.882911075 - δ = +0.443506852 - K = 1.230174105 ### Non-Power-of-2 Chunk Size Handling Critical implementation detail for variable chunk sizes: ```c // Pre-calculate exact length sequence from forward transform int lengths[MAX_LEVELS + 1]; lengths[0] = chunk_size; for (int i = 1; i <= levels; i++) { lengths[i] = (lengths[i - 1] + 1) / 2; } // Apply inverse DWT using lengths[level] for each level // NEVER use simple doubling (length *= 2) - incorrect for non-power-of-2! ``` Incorrect length tracking causes mirrored subband artefacts in decoded audio. ### Perceptual Quantisation Weights Channel-specific weights for Mid (channel 0) and Side (channel 1): ```c // Base quantiser weights per subband (9 levels + approximation) float BASE_QUANTISER_WEIGHTS[2][10] = { // Mid channel (0) {4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f}, // Side channel (1) {6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f} }; // During dequantisation: float weight = BASE_QUANTISER_WEIGHTS[channel][subband] * quantiser_scale; coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[subband] * weight; ``` Different weights for Mid and Side channels reflect perceptual importance of frequency bands in each channel. DC frequency has highest weight (4.0 Mid, 6.0 Side) due to energy concentration. ## Performance Characteristics ### Compression Efficiency - **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input) - **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3 - **Audio Quality**: Preserves full 0-16 KHz bandwidth - **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical) - **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering ### Computational Complexity - **Encoding**: O(n log n) per chunk for DWT, O(n) for EZBC encoding - **Decoding**: O(n log n) per chunk for inverse DWT, O(n) for EZBC decoding - **Memory**: O(n) working memory for chunk processing ### Quality Characteristics - **Frequency Response**: Flat 0-16 KHz within perceptual limits - **Dynamic Range**: Preserved through gamma compression/expansion - **Stereo Imaging**: Maintained through Mid/Side decorrelation - **Perceptual Quality**: Optimised for human auditory system characteristics ## Integration with TAV TAD is designed as an includable API for TAV video encoder integration: - **Variable Chunk Sizes**: Audio chunks can match video GOP boundaries (e.g., 32016 samples for 1-second TAV GOP) - **Unified Quality Levels**: TAD quality 0-5 synchronised with TAV quality 0-5 - **Embedded Packets**: TAV embeds TAD-compressed audio using packet type 0x24 - **Shared Container**: Single .tav file contains both video and audio streams ### TAV Integration Example ```c // TAD handles non-power-of-2 chunk size correctly tad_encode_chunk(audio_buffer, audio_samples_per_gop, output_buffer, &output_size); // TAV embeds TAD packet tav_write_packet(TAV_PACKET_AUDIO, output_buffer, output_size); ``` ## Format Specification For complete packet structure and bitstream format details, refer to `format documentation.txt`. ### Key Packet Types - `0x24`: TAD audio packet (used in standalone .tad files and embedded in .tav files) ## Related Projects - **TAV** (TSVM Advanced Video): Wavelet-based video codec with integrated TAD audio - **TSVM**: Target virtual machine platform for TAD playback ## Licence MIT.