# TAV - TSVM Advanced Video Codec A perceptually-optimised wavelet-based video codec designed for resource-constrained systems, featuring multiple wavelet types, temporal 3D DWT, and sophisticated compression techniques. ## Overview TAV (TSVM Advanced Video) is a modern video codec built on discrete wavelet transformation (DWT). It combines cutting-edge compression techniques with careful optimisation for resource-constrained systems. ### Key Advantages - **No blocking artefacts**: Large-tile DWT encoding with padding eliminates DCT block boundaries - **No colour banding**: Wavelets spreads gradients across scales, preventing banding in the first place - **Perceptual optimisation**: HVS-aware quantisation preserves visual quality where it matters - **Temporal coherence**: 3D DWT with GOP encoding exploits inter-frame similarity - **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity for 16-18% additional compression - **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms ## Features ### Compression Technology - **Wavelet Types** - **5/3 Reversible** (JPEG 2000 standard): Lossless-capable, good for archival - **9/7 Irreversible** (default): Best overall compression, CDF 9/7 variant - **Spatial Encoding** - Large-tile encoding with padding, with optional single-tile mode (no blocking artefacts) - 6-level DWT decomposition for deep frequency analysis - Perceptual quantisation with HVS-optimised coefficient scaling - YCoCg-R colour space with anisotropic chroma quantisation - **Temporal Encoding** (3D DWT Mode) - Group-of-pictures (GOP) encoding with adaptive size (typically 20 frames) - Unified EZBC encoding across temporal dimension - Adaptive GOP boundaries with scene change detection - **EZBC Encoding** - Binary tree embedded zero block coding exploits coefficient sparsity - Progressive refinement structure with bitplane encoding - Concatenated channel layout for cross-channel compression optimisation - Typical sparsity: 86.9% (Y), 97.8% (Co), 99.5% (Cg) - 16-18% compression improvement over naive coefficient encoding ### Audio Integration TAV seamlessly integrates with the TAD (TSVM Advanced Audio) codec for synchronised audio/video encoding: - Variable chunk sizes match video GOP boundaries - Embedded TAD packets (type 0x24) with Zstd compression - Unified container format ## Building ### Prerequisites - C compiler (GCC/Clang) - Zstandard library - OpenCV 4 library (only used by experimental motion estimation feature) ### Compilation ```bash # Build TAV encoder/decoder make tav # Build all tools including TAD audio codec make all # Clean build artefacts make clean ``` ### Build Targets - `encoder_tav` - Main video encoder - `decoder_tav` - Standalone video decoder - `tav_inspector` - Packet analysis and debugging tool ## Usage ### Basic Encoding Encoding requires FFmpeg executable installed in your system. ```bash # Default encoding (CDF 9/7 wavelet, quality level 3) ./encoder_tav -i input.mp4 -o output.tav # Quality levels (0-5) ./encoder_tav -i input.avi -q 0 -o output.tav # Lowest quality, smallest file ./encoder_tav -i input.mkv -q 5 -o output.tav # Highest quality, largest file ``` ### Intra-only Encoding ```bash # Enable Intra-only encoding ./encoder_tav -i input.mp4 --intra-only -o output.tav ``` ### Decoding and Inspection ```bash # Decode TAV to raw video ./decoder_tav -i input.tav -o output.mkv # Inspect packet structure (debugging) ./tav_inspector input.tav -v ``` ### Frame Limiting ```bash # Encode only first N frames (useful for testing) ./encoder_tav -i input.mp4 -o output.tav --encode-limit 100 ``` ## Technical Architecture ### Encoder Pipeline 1. **Input Processing** - FFmpeg demuxing and frame extraction - RGB to YCoCg-R colour space conversion - Resolution validation and padding 2. **DWT Transform** - Spatial: 6-level decomposition per frame - Temporal: 1D DWT across GOP frames (3D DWT mode) - Lifting scheme implementation for all wavelets 3. **Perceptual Quantisation** - HVS-based subband weights - Anisotropic chroma quantisation (YCoCg-R specific) - Quality-dependent quantisation matrices 4. **EZBC Encoding** - Binary tree embedded zero block coding per channel - Progressive refinement by bitplanes - Concatenated bitstream layout: `[Y_bitstream][Co_bitstream][Cg_bitstream]` - Cross-channel compression optimisation 5. **Entropy Coding** - Zstandard compression (level 7) on concatenated EZBC bitstreams - Cross-channel compression opportunities - Adaptive compression based on GOP structure ### Decoder Pipeline 1. **Container Parsing** - Packet type identification (0x00-0xFF) - Timecode synchronisation - GOP boundary detection 2. **Entropy Decoding** - Zstd decompression of concatenated bitstreams - EZBC binary tree decoding per channel - Progressive coefficient reconstruction 3. **Inverse Quantisation** - Perceptual weight application - Subband-specific scaling - Coefficient reconstruction from sparse representation 4. **Inverse DWT** - Temporal: 1D inverse DWT across frames (3D DWT mode) - Spatial: 6-level inverse wavelet reconstruction 5. **Output Conversion** - YCoCg-R to RGB colour space - Clamping and dithering - Frame buffering for display ### Wavelet Implementation All wavelets follow a **lifting scheme** pattern with symmetric boundary extension: ```c // Forward Transform: Predict → Update temp[half + i] = data[odd] - predict(data[even]); // High-pass temp[i] = data[even] + update(temp[half]); // Low-pass // Inverse Transform: Undo Update → Undo Predict (reversed order) data[even] = temp[i] - update(temp[half]); // Undo low-pass data[odd] = temp[half + i] + predict(data[even]); // Undo high-pass ``` **Critical**: Forward and inverse transforms must use identical coefficient indexing and exactly reverse operations to avoid grid artefacts. ### Coefficient Layout TAV uses **2D Spatial Layout** in memory for each decomposition level: ``` [LL] [LH] [HL] [HH] [LH] [HL] [HH] ... └── Level 0 ──┘ └─── Level 1 ───┘ ``` - `LL`: Low-pass (approximation) - progressively smaller with each level - `LH`, `HL`, `HH`: High-pass subbands (horizontal, vertical, diagonal detail) ## Performance Characteristics ### Compression Efficiency - **Sparsity Exploitation**: Typical quantised coefficient sparsity - Y channel: 86.9% zeros - Co channel: 97.8% zeros - Cg channel: 99.5% zeros - **EZBC Benefits**: 16-18% compression improvement over naive coefficient encoding through sparsity exploitation - **Temporal Coherence**: Additional 15-25% improvement with 3D DWT (content-dependent) ### Computational Complexity - **Encoding**: O(n log n) per frame for spatial DWT - **Decoding**: O(n log n) per frame, optimised lifting scheme implementation - **Memory**: Single-tile encoding requires O(w × h) working memory ### Quality Characteristics - **No blocking artefacts**: Wavelet-based encoding is inherently smooth - **Perceptual optimisation**: Better subjective quality than bitrate-equivalent DCT codecs - **Scalability**: 6 quality levels (0-5) provide wide range of bitrate/quality trade-offs - **Temporal stability**: 3D DWT mode reduces flickering and temporal artefacts ## Format Specification For complete packet structure and bitstream format details, refer to `format documentation.txt`. ### Key Packet Types - `0x00`: Metadata and initialisation - `0x01`: I-frame (intra-coded frame) - `0x12`: GOP unified packet (3D DWT mode) - `0x24`: Embedded TAD audio - `0xFC`: GOP synchronisation - `0xFD`: Timecode ## Debugging Tools ### TAV Inspector Analyse TAV packet structure and decode individual frames: ```bash # Verbose packet analysis ./tav_inspector input.tav -v # Extract specific frame ranges ./tav_inspector input.tav --frame-range 100-200 ``` ## Related Projects - **TAD** (TSVM Advanced Audio): Perceptual audio codec using CDF 9/7 wavelets - **TSVM**: Target virtual machine platform for TAV playback ## Licence MIT.