diff --git a/CLAUDE.md b/CLAUDE.md index 623df7c..ec3c8f7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -173,10 +173,10 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions, - **Multiple Wavelet Types**: 5/3 reversible, 9/7 irreversible, CDF 13/7, DD-4, Haar - **Single-tile encoding**: One large DWT tile for optimal quality (no blocking artifacts) - **Perceptual quantisation**: HVS-optimized coefficient scaling - - **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder) + - **YCoCg-R colour space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder) - **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size) - **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update) - - **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced) + - **Concatenated Maps Layout**: Cross-channel compression optimisation for additional 1.6% improvement (2025-09-29 enhanced) - **Usage Examples**: ```bash # Different wavelets @@ -259,7 +259,7 @@ Concatenated Maps Layout: - Significance map: 1 bit per coefficient (0=zero, 1=non-zero) - Value arrays: Only non-zero coefficients in sequence per channel -- Cross-channel optimization: Zstd finds patterns across similar significance maps +- Cross-channel optimisation: Zstd finds patterns across similar significance maps - Result: 16-18% compression improvement + 1.6% additional from concatenation ``` diff --git a/video_encoder/TAD_README.md b/video_encoder/TAD_README.md new file mode 100644 index 0000000..dc71d12 --- /dev/null +++ b/video_encoder/TAD_README.md @@ -0,0 +1,350 @@ +# TAD - TSVM Advanced Audio Codec + +A perceptually-optimised wavelet-based audio codec designed for resource-constrained systems, featuring CDF 9/7 wavelets, EZBC sparse coding, and sophisticated perceptual quantisation. + +## Overview + +TAD (TSVM Advanced Audio) is a modern audio codec built on discrete wavelet transform (DWT) using Cohen-Daubechies-Feauveau (CDF) 9/7 biorthogonal wavelets. It combines perceptual quantisation, advanced entropy coding, and careful optimisation for resource-constrained systems. + +### Key Advantages + +- **Perceptual optimisation**: HVS-aware quantisation preserves audio quality where it matters +- **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity (86.9% zeros in typical content) +- **Variable chunk sizes**: Supports any chunk size ≥1024 samples, including non-power-of-2 +- **Stereo decorrelation**: Mid/Side encoding exploits stereo correlation for better compression +- **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms + +## Features + +### Compression Technology + +- **CDF 9/7 Biorthogonal Wavelets** + - 9-level fixed decomposition for all chunk sizes + - Lifting scheme implementation for efficient computation + - Optimal frequency discrimination for audio signals + +- **Pre-processing** + - First-order IIR pre-emphasis filter (α=0.5) shifts quantisation noise to lower frequencies, where they are less objectionable to listeners + - Gamma compression (γ=0.5) for dynamic range compression before quantisation + - Mid/Side stereo transformation exploits stereo correlation + - Lambda companding (λ=6.0) with Laplacian CDF mapping for full bit utilisation + +- **Perceptual Quantisation** + - Channel-specific (Mid/Side) frequency-dependent weights + - Subband-aware quantisation preserves perceptually important frequencies + +- **EZBC Encoding** + - Binary tree embedded zero block coding + - Exploits coefficient sparsity (86.9% Mid, 97.8% Side typical) + - Progressive refinement structure + - Spatial clustering of non-zero coefficients + +- **Entropy Coding** + - Zstandard compression (level 7) on concatenated EZBC bitstreams + - Cross-channel compression optimisation + - Optional Zstd bypass for debugging + +### Audio Format + +- **Sample Rate**: 32 KHz (TSVM audio hardware native format) +- **Channels**: Stereo (L/R input, Mid/Side internal representation) +- **Chunk Sizes**: Variable, any size ≥1024 samples (including non-power-of-2) +- **Bit Depth**: 32-bit float internal, 8-bit unsigned PCM output with noise-shaped dithering +- **Bandwidth**: Full 0-16 KHz frequency range preserved + +### Quality Levels + +Six quality levels (0-5) provide a wide range of compression/quality trade-offs: +- **Level 0**: Lowest quality, smallest file size +- **Level 3**: Default, balanced quality/compression (2.51:1 vs PCMu8) +- **Level 5**: Highest quality, largest file size + +Quality levels are designed to be synchronised with TAV video codec for unified encoding. + +## Building + +### Prerequisites + +- C compiler (GCC/Clang) +- Zstandard library (libzstd) +- Math library (libm) + +### Compilation + +```bash +# Build TAD encoder/decoder +make tad + +# Build all tools +make all + +# Clean build artifacts +make clean +``` + +### Build Targets + +- `encoder_tad` - Standalone audio encoder with FFmpeg calls +- `decoder_tad` - Standalone audio decoder + +## Usage + +### Basic Encoding + +Encoding requires FFmpeg executable installed in your system. + +```bash +# Default encoding (quality level 3) +./encoder_tad -i input.mp3 -o output.tad + +# Specify quality level (0-5) +./encoder_tad -i input.m4a -o output.tad -q 0 # Lowest quality +./encoder_tad -i input.ogg -o output.tad -q 5 # Highest quality + +# Disable Zstd compression (for debugging) +./encoder_tad -i input.opus -o output.tad --no-zstd + +# Verbose output with statistics +./encoder_tad -i input.flac -o output.tad -v +``` + +### Decoding + +```bash +# Decode to PCMu8 +./decoder_tad -i input.tad -o output.pcm --raw-pcm + +# Decode to WAV +./decoder_tad -i input.tad -o output.wav +``` + +### Input Formats + +TAD encoder accepts any audio format supported by FFmpeg: +- Audio files: WAV, MP3, FLAC, OGG, AAC, etc. +- Video files with audio streams: MP4, MKV, AVI, etc. +- Raw PCM formats + +Audio is automatically resampled to 32 KHz stereo if necessary. + +## Technical Architecture + +### Encoder Pipeline + +1. **Input Processing** + - FFmpeg demuxing and audio stream extraction + - Resampling to 32 KHz stereo + - Conversion to PCM32f + +2. **Pre-emphasis Filter** + - First-order IIR filter with α=0.5 + - Shifts quantisation noise toward lower frequencies + - Improves perceptual quality + +3. **Gamma Compression** + - Dynamic range compression with γ=0.5 + - Applied independently to each sample + - Reduces quantisation error for low-amplitude signals + +4. **Stereo Decorrelation** + - Left/Right to Mid/Side transformation + - Mid = (L + R) / 2 + - Side = (L - R) / 2 + - Exploits stereo correlation for better compression + +5. **9-Level CDF 9/7 DWT** + - Fixed 9 decomposition levels for all chunk sizes + - Forward lifting scheme implementation + - Correct length tracking for non-power-of-2 sizes + +6. **Perceptual Quantisation** + - Channel-specific (Mid/Side) subband weights + - Lambda companding with λ=6.0 + - Laplacian CDF mapping: `sign(x) * floor(λ * log(1 + |x|/λ))` + - Quantised to int8 coefficients + +7. **EZBC Encoding** + - Binary tree structure per channel + - Progressive refinement by bitplanes + - Zero block coding exploits sparsity + - Independent bitstreams for Mid and Side + +8. **Zstd Compression** + - Level 7 compression on concatenated `[Mid_bitstream][Side_bitstream]` + - Cross-channel optimisation opportunities + - Adaptive compression based on content + +### Decoder Pipeline + +1. **Container Parsing** + - TAD packet identification (type 0x24) + - Chunk size extraction + - Compressed data boundaries + +2. **Zstd Decompression** + - Decompress concatenated bitstreams + - Split into Mid and Side EZBC streams + +3. **EZBC Decoding** + - Binary tree decoder per channel + - Reconstruct quantised int8 coefficients + - Progressive refinement reconstruction + +4. **Lambda Decompanding** + - Inverse Laplacian CDF with channel-specific weights + - Reconstruct float32 DWT coefficients + - Apply subband-specific perceptual weights + +5. **9-Level Inverse CDF 9/7 DWT** + - Inverse lifting scheme implementation + - Correct length tracking for non-power-of-2 chunk sizes + - Pre-calculated length sequence from forward transform + +6. **Mid/Side to Left/Right** + - L = Mid + Side + - R = Mid - Side + - Reconstruct stereo channels + +7. **Gamma Expansion** + - Inverse gamma with γ⁻¹=2.0 + - Restore original dynamic range + +8. **De-emphasis Filter** + - Reverse pre-emphasis with α=0.5 + - Remove frequency shaping + - Restore flat frequency response + +9. **PCM32f to PCM8u Conversion** + - Noise-shaped dithering for 8-bit output + - Clamping to valid range + - Final output format + +### Wavelet Implementation + +CDF 9/7 wavelet follows a **two-stage lifting scheme**: + +```c +// Forward Transform: Predict → Update +// Predict step (generate high-pass) +temp[half + i] = data[odd] - α * (data[even_left] + data[even_right]); + +// Update step (generate low-pass) +temp[i] = data[even] + β * (temp[half + i - 1] + temp[half + i]); + +// Normalization (K factor) +temp[i] *= K; +temp[half + i] /= K; + +// Inverse Transform: Denormalize → Undo Update → Undo Predict (reversed order) +temp[i] /= K; +temp[half + i] *= K; + +temp[i] -= β * (temp[half + i - 1] + temp[half + i]); +data[odd] = temp[half + i] + α * (temp[i] + temp[i + 1]); +data[even] = temp[i]; +``` + +**CDF 9/7 Coefficients**: +- α = -1.586134342 +- β = -0.052980118 +- γ = +0.882911075 +- δ = +0.443506852 +- K = 1.230174105 + +### Non-Power-of-2 Chunk Size Handling + +Critical implementation detail for variable chunk sizes: + +```c +// Pre-calculate exact length sequence from forward transform +int lengths[MAX_LEVELS + 1]; +lengths[0] = chunk_size; +for (int i = 1; i <= levels; i++) { + lengths[i] = (lengths[i - 1] + 1) / 2; +} + +// Apply inverse DWT using lengths[level] for each level +// NEVER use simple doubling (length *= 2) - incorrect for non-power-of-2! +``` + +Incorrect length tracking causes mirrored subband artefacts in decoded audio. + +### Perceptual Quantisation Weights + +Channel-specific weights for Mid (channel 0) and Side (channel 1): + +```c +// Base quantiser weights per subband (9 levels + approximation) +float BASE_QUANTISER_WEIGHTS[2][10] = { + // Mid channel (0) + {4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f}, + + // Side channel (1) + {6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f} +}; + +// During dequantisation: +float weight = BASE_QUANTISER_WEIGHTS[channel][subband] * quantiser_scale; +coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[subband] * weight; +``` + +Different weights for Mid and Side channels reflect perceptual importance of frequency bands in each channel. DC frequency has highest weight (4.0 Mid, 6.0 Side) due to energy concentration. + +## Performance Characteristics + +### Compression Efficiency + +- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input) +- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3 +- **Audio Quality**: Preserves full 0-16 KHz bandwidth +- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical) +- **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering + +### Computational Complexity + +- **Encoding**: O(n log n) per chunk for DWT, O(n) for EZBC encoding +- **Decoding**: O(n log n) per chunk for inverse DWT, O(n) for EZBC decoding +- **Memory**: O(n) working memory for chunk processing + +### Quality Characteristics + +- **Frequency Response**: Flat 0-16 KHz within perceptual limits +- **Dynamic Range**: Preserved through gamma compression/expansion +- **Stereo Imaging**: Maintained through Mid/Side decorrelation +- **Perceptual Quality**: Optimised for human auditory system characteristics + +## Integration with TAV + +TAD is designed as an includable API for TAV video encoder integration: + +- **Variable Chunk Sizes**: Audio chunks can match video GOP boundaries (e.g., 32016 samples for 1-second TAV GOP) +- **Unified Quality Levels**: TAD quality 0-5 synchronised with TAV quality 0-5 +- **Embedded Packets**: TAV embeds TAD-compressed audio using packet type 0x24 +- **Shared Container**: Single .tav file contains both video and audio streams + +### TAV Integration Example + +```c +// TAD handles non-power-of-2 chunk size correctly +tad_encode_chunk(audio_buffer, audio_samples_per_gop, output_buffer, &output_size); + +// TAV embeds TAD packet +tav_write_packet(TAV_PACKET_AUDIO, output_buffer, output_size); +``` + +## Format Specification + +For complete packet structure and bitstream format details, refer to `format documentation.txt`. + +### Key Packet Types + +- `0x24`: TAD audio packet (used in standalone .tad files and embedded in .tav files) + +## Related Projects + +- **TAV** (TSVM Advanced Video): Wavelet-based video codec with integrated TAD audio +- **TSVM**: Target virtual machine platform for TAD playback + +## Licence + +MIT. diff --git a/video_encoder/TAV_README.md b/video_encoder/TAV_README.md new file mode 100644 index 0000000..8a1758a --- /dev/null +++ b/video_encoder/TAV_README.md @@ -0,0 +1,260 @@ +# TAV - TSVM Advanced Video Codec + +A perceptually-optimised wavelet-based video codec designed for resource-constrained systems, featuring multiple wavelet types, temporal 3D DWT, and sophisticated compression techniques. + +## Overview + +TAV (TSVM Advanced Video) is a modern video codec built on discrete wavelet transformation (DWT). It combines cutting-edge compression techniques with careful optimisation for resource-constrained systems. + +### Key Advantages + +- **No blocking artefacts**: Large-tile DWT encoding with padding eliminates DCT block boundaries +- **Perceptual optimisation**: HVS-aware quantisation preserves visual quality where it matters +- **Temporal coherence**: 3D DWT with GOP encoding exploits inter-frame similarity +- **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity for 16-18% additional compression +- **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms + +## Features + +### Compression Technology + +- **Wavelet Types** + - **5/3 Reversible** (JPEG 2000 standard): Lossless-capable, good for archival + - **9/7 Irreversible** (default): Best overall compression, CDF 9/7 variant + +- **Spatial Encoding** + - Large-tile encoding with padding, with optional single-tile mode (no blocking artefacts) + - 6-level DWT decomposition for deep frequency analysis + - Perceptual quantisation with HVS-optimised coefficient scaling + - YCoCg-R colour space with anisotropic chroma quantisation + +- **Temporal Encoding** (3D DWT Mode) + - Group-of-pictures (GOP) encoding with adaptive size (typically 20 frames) + - Unified EZBC encoding across temporal dimension + - Adaptive GOP boundaries with scene change detection + +- **EZBC Encoding** + - Binary tree embedded zero block coding exploits coefficient sparsity + - Progressive refinement structure with bitplane encoding + - Concatenated channel layout for cross-channel compression optimisation + - Typical sparsity: 86.9% (Y), 97.8% (Co), 99.5% (Cg) + - 16-18% compression improvement over naive coefficient encoding + +### Audio Integration + +TAV seamlessly integrates with the TAD (TSVM Advanced Audio) codec for synchronised audio/video encoding: +- Variable chunk sizes match video GOP boundaries +- Embedded TAD packets (type 0x24) with Zstd compression +- Unified container format + +## Building + +### Prerequisites + +- C compiler (GCC/Clang) +- Zstandard library +- OpenCV 4 library (only used by experimental motion estimation feature) + +### Compilation + +```bash +# Build TAV encoder/decoder +make tav + +# Build all tools including TAD audio codec +make all + +# Clean build artefacts +make clean +``` + +### Build Targets + +- `encoder_tav` - Main video encoder +- `decoder_tav` - Standalone video decoder +- `tav_inspector` - Packet analysis and debugging tool + +## Usage + +### Basic Encoding + +Encoding requires FFmpeg executable installed in your system. + +```bash +# Default encoding (CDF 9/7 wavelet, quality level 3) +./encoder_tav -i input.mp4 -o output.tav + +# Quality levels (0-5) +./encoder_tav -i input.avi -q 0 -o output.tav # Lowest quality, smallest file +./encoder_tav -i input.mkv -q 5 -o output.tav # Highest quality, largest file +``` + +### Intra-only Encoding + +```bash +# Enable Intra-only encoding +./encoder_tav -i input.mp4 --intra-only -o output.tav +``` + +### Decoding and Inspection + +```bash +# Decode TAV to raw video +./decoder_tav -i input.tav -o output.mkv + +# Inspect packet structure (debugging) +./tav_inspector input.tav -v +``` + +### Frame Limiting + +```bash +# Encode only first N frames (useful for testing) +./encoder_tav -i input.mp4 -o output.tav --encode-limit 100 +``` + +## Technical Architecture + +### Encoder Pipeline + +1. **Input Processing** + - FFmpeg demuxing and frame extraction + - RGB to YCoCg-R colour space conversion + - Resolution validation and padding + +2. **DWT Transform** + - Spatial: 6-level decomposition per frame + - Temporal: 1D DWT across GOP frames (3D DWT mode) + - Lifting scheme implementation for all wavelets + +3. **Perceptual Quantisation** + - HVS-based subband weights + - Anisotropic chroma quantisation (YCoCg-R specific) + - Quality-dependent quantisation matrices + +4. **EZBC Encoding** + - Binary tree embedded zero block coding per channel + - Progressive refinement by bitplanes + - Concatenated bitstream layout: `[Y_bitstream][Co_bitstream][Cg_bitstream]` + - Cross-channel compression optimisation + +5. **Entropy Coding** + - Zstandard compression (level 7) on concatenated EZBC bitstreams + - Cross-channel compression opportunities + - Adaptive compression based on GOP structure + +### Decoder Pipeline + +1. **Container Parsing** + - Packet type identification (0x00-0xFF) + - Timecode synchronisation + - GOP boundary detection + +2. **Entropy Decoding** + - Zstd decompression of concatenated bitstreams + - EZBC binary tree decoding per channel + - Progressive coefficient reconstruction + +3. **Inverse Quantisation** + - Perceptual weight application + - Subband-specific scaling + - Coefficient reconstruction from sparse representation + +4. **Inverse DWT** + - Temporal: 1D inverse DWT across frames (3D DWT mode) + - Spatial: 6-level inverse wavelet reconstruction + +5. **Output Conversion** + - YCoCg-R to RGB colour space + - Clamping and dithering + - Frame buffering for display + +### Wavelet Implementation + +All wavelets follow a **lifting scheme** pattern with symmetric boundary extension: + +```c +// Forward Transform: Predict → Update +temp[half + i] = data[odd] - predict(data[even]); // High-pass +temp[i] = data[even] + update(temp[half]); // Low-pass + +// Inverse Transform: Undo Update → Undo Predict (reversed order) +data[even] = temp[i] - update(temp[half]); // Undo low-pass +data[odd] = temp[half + i] + predict(data[even]); // Undo high-pass +``` + +**Critical**: Forward and inverse transforms must use identical coefficient indexing and exactly reverse operations to avoid grid artefacts. + +### Coefficient Layout + +TAV uses **2D Spatial Layout** in memory for each decomposition level: + +``` +[LL] [LH] [HL] [HH] [LH] [HL] [HH] ... + └── Level 0 ──┘ └─── Level 1 ───┘ +``` + +- `LL`: Low-pass (approximation) - progressively smaller with each level +- `LH`, `HL`, `HH`: High-pass subbands (horizontal, vertical, diagonal detail) + +## Performance Characteristics + +### Compression Efficiency + +- **Sparsity Exploitation**: Typical quantised coefficient sparsity + - Y channel: 86.9% zeros + - Co channel: 97.8% zeros + - Cg channel: 99.5% zeros + +- **EZBC Benefits**: 16-18% compression improvement over naive coefficient encoding through sparsity exploitation + +- **Temporal Coherence**: Additional 15-25% improvement with 3D DWT (content-dependent) + +### Computational Complexity + +- **Encoding**: O(n log n) per frame for spatial DWT +- **Decoding**: O(n log n) per frame, optimised lifting scheme implementation +- **Memory**: Single-tile encoding requires O(w × h) working memory + +### Quality Characteristics + +- **No blocking artefacts**: Wavelet-based encoding is inherently smooth +- **Perceptual optimisation**: Better subjective quality than bitrate-equivalent DCT codecs +- **Scalability**: 6 quality levels (0-5) provide wide range of bitrate/quality trade-offs +- **Temporal stability**: 3D DWT mode reduces flickering and temporal artefacts + +## Format Specification + +For complete packet structure and bitstream format details, refer to `format documentation.txt`. + +### Key Packet Types + +- `0x00`: Metadata and initialisation +- `0x01`: I-frame (intra-coded frame) +- `0x12`: GOP unified packet (3D DWT mode) +- `0x24`: Embedded TAD audio +- `0xFC`: GOP synchronisation +- `0xFD`: Timecode + +## Debugging Tools + +### TAV Inspector + +Analyse TAV packet structure and decode individual frames: + +```bash +# Verbose packet analysis +./tav_inspector input.tav -v + +# Extract specific frame ranges +./tav_inspector input.tav --frame-range 100-200 +``` + +## Related Projects + +- **TAD** (TSVM Advanced Audio): Perceptual audio codec using CDF 9/7 wavelets +- **TSVM**: Target virtual machine platform for TAV playback + +## Licence + +MIT.