mirror of
https://github.com/curioustorvald/tsvm.git
synced 2026-03-07 19:51:51 +09:00
more TAV/TAD documentation update
This commit is contained in:
@@ -173,10 +173,10 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
|
||||
- **Multiple Wavelet Types**: 5/3 reversible, 9/7 irreversible, CDF 13/7, DD-4, Haar
|
||||
- **Single-tile encoding**: One large DWT tile for optimal quality (no blocking artifacts)
|
||||
- **Perceptual quantisation**: HVS-optimized coefficient scaling
|
||||
- **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
|
||||
- **YCoCg-R colour space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
|
||||
- **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size)
|
||||
- **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update)
|
||||
- **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced)
|
||||
- **Concatenated Maps Layout**: Cross-channel compression optimisation for additional 1.6% improvement (2025-09-29 enhanced)
|
||||
- **Usage Examples**:
|
||||
```bash
|
||||
# Different wavelets
|
||||
@@ -259,7 +259,7 @@ Concatenated Maps Layout:
|
||||
|
||||
- Significance map: 1 bit per coefficient (0=zero, 1=non-zero)
|
||||
- Value arrays: Only non-zero coefficients in sequence per channel
|
||||
- Cross-channel optimization: Zstd finds patterns across similar significance maps
|
||||
- Cross-channel optimisation: Zstd finds patterns across similar significance maps
|
||||
- Result: 16-18% compression improvement + 1.6% additional from concatenation
|
||||
```
|
||||
|
||||
|
||||
350
video_encoder/TAD_README.md
Normal file
350
video_encoder/TAD_README.md
Normal file
@@ -0,0 +1,350 @@
|
||||
# TAD - TSVM Advanced Audio Codec
|
||||
|
||||
A perceptually-optimised wavelet-based audio codec designed for resource-constrained systems, featuring CDF 9/7 wavelets, EZBC sparse coding, and sophisticated perceptual quantisation.
|
||||
|
||||
## Overview
|
||||
|
||||
TAD (TSVM Advanced Audio) is a modern audio codec built on discrete wavelet transform (DWT) using Cohen-Daubechies-Feauveau (CDF) 9/7 biorthogonal wavelets. It combines perceptual quantisation, advanced entropy coding, and careful optimisation for resource-constrained systems.
|
||||
|
||||
### Key Advantages
|
||||
|
||||
- **Perceptual optimisation**: HVS-aware quantisation preserves audio quality where it matters
|
||||
- **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity (86.9% zeros in typical content)
|
||||
- **Variable chunk sizes**: Supports any chunk size ≥1024 samples, including non-power-of-2
|
||||
- **Stereo decorrelation**: Mid/Side encoding exploits stereo correlation for better compression
|
||||
- **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms
|
||||
|
||||
## Features
|
||||
|
||||
### Compression Technology
|
||||
|
||||
- **CDF 9/7 Biorthogonal Wavelets**
|
||||
- 9-level fixed decomposition for all chunk sizes
|
||||
- Lifting scheme implementation for efficient computation
|
||||
- Optimal frequency discrimination for audio signals
|
||||
|
||||
- **Pre-processing**
|
||||
- First-order IIR pre-emphasis filter (α=0.5) shifts quantisation noise to lower frequencies, where they are less objectionable to listeners
|
||||
- Gamma compression (γ=0.5) for dynamic range compression before quantisation
|
||||
- Mid/Side stereo transformation exploits stereo correlation
|
||||
- Lambda companding (λ=6.0) with Laplacian CDF mapping for full bit utilisation
|
||||
|
||||
- **Perceptual Quantisation**
|
||||
- Channel-specific (Mid/Side) frequency-dependent weights
|
||||
- Subband-aware quantisation preserves perceptually important frequencies
|
||||
|
||||
- **EZBC Encoding**
|
||||
- Binary tree embedded zero block coding
|
||||
- Exploits coefficient sparsity (86.9% Mid, 97.8% Side typical)
|
||||
- Progressive refinement structure
|
||||
- Spatial clustering of non-zero coefficients
|
||||
|
||||
- **Entropy Coding**
|
||||
- Zstandard compression (level 7) on concatenated EZBC bitstreams
|
||||
- Cross-channel compression optimisation
|
||||
- Optional Zstd bypass for debugging
|
||||
|
||||
### Audio Format
|
||||
|
||||
- **Sample Rate**: 32 KHz (TSVM audio hardware native format)
|
||||
- **Channels**: Stereo (L/R input, Mid/Side internal representation)
|
||||
- **Chunk Sizes**: Variable, any size ≥1024 samples (including non-power-of-2)
|
||||
- **Bit Depth**: 32-bit float internal, 8-bit unsigned PCM output with noise-shaped dithering
|
||||
- **Bandwidth**: Full 0-16 KHz frequency range preserved
|
||||
|
||||
### Quality Levels
|
||||
|
||||
Six quality levels (0-5) provide a wide range of compression/quality trade-offs:
|
||||
- **Level 0**: Lowest quality, smallest file size
|
||||
- **Level 3**: Default, balanced quality/compression (2.51:1 vs PCMu8)
|
||||
- **Level 5**: Highest quality, largest file size
|
||||
|
||||
Quality levels are designed to be synchronised with TAV video codec for unified encoding.
|
||||
|
||||
## Building
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- C compiler (GCC/Clang)
|
||||
- Zstandard library (libzstd)
|
||||
- Math library (libm)
|
||||
|
||||
### Compilation
|
||||
|
||||
```bash
|
||||
# Build TAD encoder/decoder
|
||||
make tad
|
||||
|
||||
# Build all tools
|
||||
make all
|
||||
|
||||
# Clean build artifacts
|
||||
make clean
|
||||
```
|
||||
|
||||
### Build Targets
|
||||
|
||||
- `encoder_tad` - Standalone audio encoder with FFmpeg calls
|
||||
- `decoder_tad` - Standalone audio decoder
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Encoding
|
||||
|
||||
Encoding requires FFmpeg executable installed in your system.
|
||||
|
||||
```bash
|
||||
# Default encoding (quality level 3)
|
||||
./encoder_tad -i input.mp3 -o output.tad
|
||||
|
||||
# Specify quality level (0-5)
|
||||
./encoder_tad -i input.m4a -o output.tad -q 0 # Lowest quality
|
||||
./encoder_tad -i input.ogg -o output.tad -q 5 # Highest quality
|
||||
|
||||
# Disable Zstd compression (for debugging)
|
||||
./encoder_tad -i input.opus -o output.tad --no-zstd
|
||||
|
||||
# Verbose output with statistics
|
||||
./encoder_tad -i input.flac -o output.tad -v
|
||||
```
|
||||
|
||||
### Decoding
|
||||
|
||||
```bash
|
||||
# Decode to PCMu8
|
||||
./decoder_tad -i input.tad -o output.pcm --raw-pcm
|
||||
|
||||
# Decode to WAV
|
||||
./decoder_tad -i input.tad -o output.wav
|
||||
```
|
||||
|
||||
### Input Formats
|
||||
|
||||
TAD encoder accepts any audio format supported by FFmpeg:
|
||||
- Audio files: WAV, MP3, FLAC, OGG, AAC, etc.
|
||||
- Video files with audio streams: MP4, MKV, AVI, etc.
|
||||
- Raw PCM formats
|
||||
|
||||
Audio is automatically resampled to 32 KHz stereo if necessary.
|
||||
|
||||
## Technical Architecture
|
||||
|
||||
### Encoder Pipeline
|
||||
|
||||
1. **Input Processing**
|
||||
- FFmpeg demuxing and audio stream extraction
|
||||
- Resampling to 32 KHz stereo
|
||||
- Conversion to PCM32f
|
||||
|
||||
2. **Pre-emphasis Filter**
|
||||
- First-order IIR filter with α=0.5
|
||||
- Shifts quantisation noise toward lower frequencies
|
||||
- Improves perceptual quality
|
||||
|
||||
3. **Gamma Compression**
|
||||
- Dynamic range compression with γ=0.5
|
||||
- Applied independently to each sample
|
||||
- Reduces quantisation error for low-amplitude signals
|
||||
|
||||
4. **Stereo Decorrelation**
|
||||
- Left/Right to Mid/Side transformation
|
||||
- Mid = (L + R) / 2
|
||||
- Side = (L - R) / 2
|
||||
- Exploits stereo correlation for better compression
|
||||
|
||||
5. **9-Level CDF 9/7 DWT**
|
||||
- Fixed 9 decomposition levels for all chunk sizes
|
||||
- Forward lifting scheme implementation
|
||||
- Correct length tracking for non-power-of-2 sizes
|
||||
|
||||
6. **Perceptual Quantisation**
|
||||
- Channel-specific (Mid/Side) subband weights
|
||||
- Lambda companding with λ=6.0
|
||||
- Laplacian CDF mapping: `sign(x) * floor(λ * log(1 + |x|/λ))`
|
||||
- Quantised to int8 coefficients
|
||||
|
||||
7. **EZBC Encoding**
|
||||
- Binary tree structure per channel
|
||||
- Progressive refinement by bitplanes
|
||||
- Zero block coding exploits sparsity
|
||||
- Independent bitstreams for Mid and Side
|
||||
|
||||
8. **Zstd Compression**
|
||||
- Level 7 compression on concatenated `[Mid_bitstream][Side_bitstream]`
|
||||
- Cross-channel optimisation opportunities
|
||||
- Adaptive compression based on content
|
||||
|
||||
### Decoder Pipeline
|
||||
|
||||
1. **Container Parsing**
|
||||
- TAD packet identification (type 0x24)
|
||||
- Chunk size extraction
|
||||
- Compressed data boundaries
|
||||
|
||||
2. **Zstd Decompression**
|
||||
- Decompress concatenated bitstreams
|
||||
- Split into Mid and Side EZBC streams
|
||||
|
||||
3. **EZBC Decoding**
|
||||
- Binary tree decoder per channel
|
||||
- Reconstruct quantised int8 coefficients
|
||||
- Progressive refinement reconstruction
|
||||
|
||||
4. **Lambda Decompanding**
|
||||
- Inverse Laplacian CDF with channel-specific weights
|
||||
- Reconstruct float32 DWT coefficients
|
||||
- Apply subband-specific perceptual weights
|
||||
|
||||
5. **9-Level Inverse CDF 9/7 DWT**
|
||||
- Inverse lifting scheme implementation
|
||||
- Correct length tracking for non-power-of-2 chunk sizes
|
||||
- Pre-calculated length sequence from forward transform
|
||||
|
||||
6. **Mid/Side to Left/Right**
|
||||
- L = Mid + Side
|
||||
- R = Mid - Side
|
||||
- Reconstruct stereo channels
|
||||
|
||||
7. **Gamma Expansion**
|
||||
- Inverse gamma with γ⁻¹=2.0
|
||||
- Restore original dynamic range
|
||||
|
||||
8. **De-emphasis Filter**
|
||||
- Reverse pre-emphasis with α=0.5
|
||||
- Remove frequency shaping
|
||||
- Restore flat frequency response
|
||||
|
||||
9. **PCM32f to PCM8u Conversion**
|
||||
- Noise-shaped dithering for 8-bit output
|
||||
- Clamping to valid range
|
||||
- Final output format
|
||||
|
||||
### Wavelet Implementation
|
||||
|
||||
CDF 9/7 wavelet follows a **two-stage lifting scheme**:
|
||||
|
||||
```c
|
||||
// Forward Transform: Predict → Update
|
||||
// Predict step (generate high-pass)
|
||||
temp[half + i] = data[odd] - α * (data[even_left] + data[even_right]);
|
||||
|
||||
// Update step (generate low-pass)
|
||||
temp[i] = data[even] + β * (temp[half + i - 1] + temp[half + i]);
|
||||
|
||||
// Normalization (K factor)
|
||||
temp[i] *= K;
|
||||
temp[half + i] /= K;
|
||||
|
||||
// Inverse Transform: Denormalize → Undo Update → Undo Predict (reversed order)
|
||||
temp[i] /= K;
|
||||
temp[half + i] *= K;
|
||||
|
||||
temp[i] -= β * (temp[half + i - 1] + temp[half + i]);
|
||||
data[odd] = temp[half + i] + α * (temp[i] + temp[i + 1]);
|
||||
data[even] = temp[i];
|
||||
```
|
||||
|
||||
**CDF 9/7 Coefficients**:
|
||||
- α = -1.586134342
|
||||
- β = -0.052980118
|
||||
- γ = +0.882911075
|
||||
- δ = +0.443506852
|
||||
- K = 1.230174105
|
||||
|
||||
### Non-Power-of-2 Chunk Size Handling
|
||||
|
||||
Critical implementation detail for variable chunk sizes:
|
||||
|
||||
```c
|
||||
// Pre-calculate exact length sequence from forward transform
|
||||
int lengths[MAX_LEVELS + 1];
|
||||
lengths[0] = chunk_size;
|
||||
for (int i = 1; i <= levels; i++) {
|
||||
lengths[i] = (lengths[i - 1] + 1) / 2;
|
||||
}
|
||||
|
||||
// Apply inverse DWT using lengths[level] for each level
|
||||
// NEVER use simple doubling (length *= 2) - incorrect for non-power-of-2!
|
||||
```
|
||||
|
||||
Incorrect length tracking causes mirrored subband artefacts in decoded audio.
|
||||
|
||||
### Perceptual Quantisation Weights
|
||||
|
||||
Channel-specific weights for Mid (channel 0) and Side (channel 1):
|
||||
|
||||
```c
|
||||
// Base quantiser weights per subband (9 levels + approximation)
|
||||
float BASE_QUANTISER_WEIGHTS[2][10] = {
|
||||
// Mid channel (0)
|
||||
{4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f},
|
||||
|
||||
// Side channel (1)
|
||||
{6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f}
|
||||
};
|
||||
|
||||
// During dequantisation:
|
||||
float weight = BASE_QUANTISER_WEIGHTS[channel][subband] * quantiser_scale;
|
||||
coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[subband] * weight;
|
||||
```
|
||||
|
||||
Different weights for Mid and Side channels reflect perceptual importance of frequency bands in each channel. DC frequency has highest weight (4.0 Mid, 6.0 Side) due to energy concentration.
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Compression Efficiency
|
||||
|
||||
- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
|
||||
- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
|
||||
- **Audio Quality**: Preserves full 0-16 KHz bandwidth
|
||||
- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
|
||||
- **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering
|
||||
|
||||
### Computational Complexity
|
||||
|
||||
- **Encoding**: O(n log n) per chunk for DWT, O(n) for EZBC encoding
|
||||
- **Decoding**: O(n log n) per chunk for inverse DWT, O(n) for EZBC decoding
|
||||
- **Memory**: O(n) working memory for chunk processing
|
||||
|
||||
### Quality Characteristics
|
||||
|
||||
- **Frequency Response**: Flat 0-16 KHz within perceptual limits
|
||||
- **Dynamic Range**: Preserved through gamma compression/expansion
|
||||
- **Stereo Imaging**: Maintained through Mid/Side decorrelation
|
||||
- **Perceptual Quality**: Optimised for human auditory system characteristics
|
||||
|
||||
## Integration with TAV
|
||||
|
||||
TAD is designed as an includable API for TAV video encoder integration:
|
||||
|
||||
- **Variable Chunk Sizes**: Audio chunks can match video GOP boundaries (e.g., 32016 samples for 1-second TAV GOP)
|
||||
- **Unified Quality Levels**: TAD quality 0-5 synchronised with TAV quality 0-5
|
||||
- **Embedded Packets**: TAV embeds TAD-compressed audio using packet type 0x24
|
||||
- **Shared Container**: Single .tav file contains both video and audio streams
|
||||
|
||||
### TAV Integration Example
|
||||
|
||||
```c
|
||||
// TAD handles non-power-of-2 chunk size correctly
|
||||
tad_encode_chunk(audio_buffer, audio_samples_per_gop, output_buffer, &output_size);
|
||||
|
||||
// TAV embeds TAD packet
|
||||
tav_write_packet(TAV_PACKET_AUDIO, output_buffer, output_size);
|
||||
```
|
||||
|
||||
## Format Specification
|
||||
|
||||
For complete packet structure and bitstream format details, refer to `format documentation.txt`.
|
||||
|
||||
### Key Packet Types
|
||||
|
||||
- `0x24`: TAD audio packet (used in standalone .tad files and embedded in .tav files)
|
||||
|
||||
## Related Projects
|
||||
|
||||
- **TAV** (TSVM Advanced Video): Wavelet-based video codec with integrated TAD audio
|
||||
- **TSVM**: Target virtual machine platform for TAD playback
|
||||
|
||||
## Licence
|
||||
|
||||
MIT.
|
||||
260
video_encoder/TAV_README.md
Normal file
260
video_encoder/TAV_README.md
Normal file
@@ -0,0 +1,260 @@
|
||||
# TAV - TSVM Advanced Video Codec
|
||||
|
||||
A perceptually-optimised wavelet-based video codec designed for resource-constrained systems, featuring multiple wavelet types, temporal 3D DWT, and sophisticated compression techniques.
|
||||
|
||||
## Overview
|
||||
|
||||
TAV (TSVM Advanced Video) is a modern video codec built on discrete wavelet transformation (DWT). It combines cutting-edge compression techniques with careful optimisation for resource-constrained systems.
|
||||
|
||||
### Key Advantages
|
||||
|
||||
- **No blocking artefacts**: Large-tile DWT encoding with padding eliminates DCT block boundaries
|
||||
- **Perceptual optimisation**: HVS-aware quantisation preserves visual quality where it matters
|
||||
- **Temporal coherence**: 3D DWT with GOP encoding exploits inter-frame similarity
|
||||
- **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity for 16-18% additional compression
|
||||
- **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms
|
||||
|
||||
## Features
|
||||
|
||||
### Compression Technology
|
||||
|
||||
- **Wavelet Types**
|
||||
- **5/3 Reversible** (JPEG 2000 standard): Lossless-capable, good for archival
|
||||
- **9/7 Irreversible** (default): Best overall compression, CDF 9/7 variant
|
||||
|
||||
- **Spatial Encoding**
|
||||
- Large-tile encoding with padding, with optional single-tile mode (no blocking artefacts)
|
||||
- 6-level DWT decomposition for deep frequency analysis
|
||||
- Perceptual quantisation with HVS-optimised coefficient scaling
|
||||
- YCoCg-R colour space with anisotropic chroma quantisation
|
||||
|
||||
- **Temporal Encoding** (3D DWT Mode)
|
||||
- Group-of-pictures (GOP) encoding with adaptive size (typically 20 frames)
|
||||
- Unified EZBC encoding across temporal dimension
|
||||
- Adaptive GOP boundaries with scene change detection
|
||||
|
||||
- **EZBC Encoding**
|
||||
- Binary tree embedded zero block coding exploits coefficient sparsity
|
||||
- Progressive refinement structure with bitplane encoding
|
||||
- Concatenated channel layout for cross-channel compression optimisation
|
||||
- Typical sparsity: 86.9% (Y), 97.8% (Co), 99.5% (Cg)
|
||||
- 16-18% compression improvement over naive coefficient encoding
|
||||
|
||||
### Audio Integration
|
||||
|
||||
TAV seamlessly integrates with the TAD (TSVM Advanced Audio) codec for synchronised audio/video encoding:
|
||||
- Variable chunk sizes match video GOP boundaries
|
||||
- Embedded TAD packets (type 0x24) with Zstd compression
|
||||
- Unified container format
|
||||
|
||||
## Building
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- C compiler (GCC/Clang)
|
||||
- Zstandard library
|
||||
- OpenCV 4 library (only used by experimental motion estimation feature)
|
||||
|
||||
### Compilation
|
||||
|
||||
```bash
|
||||
# Build TAV encoder/decoder
|
||||
make tav
|
||||
|
||||
# Build all tools including TAD audio codec
|
||||
make all
|
||||
|
||||
# Clean build artefacts
|
||||
make clean
|
||||
```
|
||||
|
||||
### Build Targets
|
||||
|
||||
- `encoder_tav` - Main video encoder
|
||||
- `decoder_tav` - Standalone video decoder
|
||||
- `tav_inspector` - Packet analysis and debugging tool
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Encoding
|
||||
|
||||
Encoding requires FFmpeg executable installed in your system.
|
||||
|
||||
```bash
|
||||
# Default encoding (CDF 9/7 wavelet, quality level 3)
|
||||
./encoder_tav -i input.mp4 -o output.tav
|
||||
|
||||
# Quality levels (0-5)
|
||||
./encoder_tav -i input.avi -q 0 -o output.tav # Lowest quality, smallest file
|
||||
./encoder_tav -i input.mkv -q 5 -o output.tav # Highest quality, largest file
|
||||
```
|
||||
|
||||
### Intra-only Encoding
|
||||
|
||||
```bash
|
||||
# Enable Intra-only encoding
|
||||
./encoder_tav -i input.mp4 --intra-only -o output.tav
|
||||
```
|
||||
|
||||
### Decoding and Inspection
|
||||
|
||||
```bash
|
||||
# Decode TAV to raw video
|
||||
./decoder_tav -i input.tav -o output.mkv
|
||||
|
||||
# Inspect packet structure (debugging)
|
||||
./tav_inspector input.tav -v
|
||||
```
|
||||
|
||||
### Frame Limiting
|
||||
|
||||
```bash
|
||||
# Encode only first N frames (useful for testing)
|
||||
./encoder_tav -i input.mp4 -o output.tav --encode-limit 100
|
||||
```
|
||||
|
||||
## Technical Architecture
|
||||
|
||||
### Encoder Pipeline
|
||||
|
||||
1. **Input Processing**
|
||||
- FFmpeg demuxing and frame extraction
|
||||
- RGB to YCoCg-R colour space conversion
|
||||
- Resolution validation and padding
|
||||
|
||||
2. **DWT Transform**
|
||||
- Spatial: 6-level decomposition per frame
|
||||
- Temporal: 1D DWT across GOP frames (3D DWT mode)
|
||||
- Lifting scheme implementation for all wavelets
|
||||
|
||||
3. **Perceptual Quantisation**
|
||||
- HVS-based subband weights
|
||||
- Anisotropic chroma quantisation (YCoCg-R specific)
|
||||
- Quality-dependent quantisation matrices
|
||||
|
||||
4. **EZBC Encoding**
|
||||
- Binary tree embedded zero block coding per channel
|
||||
- Progressive refinement by bitplanes
|
||||
- Concatenated bitstream layout: `[Y_bitstream][Co_bitstream][Cg_bitstream]`
|
||||
- Cross-channel compression optimisation
|
||||
|
||||
5. **Entropy Coding**
|
||||
- Zstandard compression (level 7) on concatenated EZBC bitstreams
|
||||
- Cross-channel compression opportunities
|
||||
- Adaptive compression based on GOP structure
|
||||
|
||||
### Decoder Pipeline
|
||||
|
||||
1. **Container Parsing**
|
||||
- Packet type identification (0x00-0xFF)
|
||||
- Timecode synchronisation
|
||||
- GOP boundary detection
|
||||
|
||||
2. **Entropy Decoding**
|
||||
- Zstd decompression of concatenated bitstreams
|
||||
- EZBC binary tree decoding per channel
|
||||
- Progressive coefficient reconstruction
|
||||
|
||||
3. **Inverse Quantisation**
|
||||
- Perceptual weight application
|
||||
- Subband-specific scaling
|
||||
- Coefficient reconstruction from sparse representation
|
||||
|
||||
4. **Inverse DWT**
|
||||
- Temporal: 1D inverse DWT across frames (3D DWT mode)
|
||||
- Spatial: 6-level inverse wavelet reconstruction
|
||||
|
||||
5. **Output Conversion**
|
||||
- YCoCg-R to RGB colour space
|
||||
- Clamping and dithering
|
||||
- Frame buffering for display
|
||||
|
||||
### Wavelet Implementation
|
||||
|
||||
All wavelets follow a **lifting scheme** pattern with symmetric boundary extension:
|
||||
|
||||
```c
|
||||
// Forward Transform: Predict → Update
|
||||
temp[half + i] = data[odd] - predict(data[even]); // High-pass
|
||||
temp[i] = data[even] + update(temp[half]); // Low-pass
|
||||
|
||||
// Inverse Transform: Undo Update → Undo Predict (reversed order)
|
||||
data[even] = temp[i] - update(temp[half]); // Undo low-pass
|
||||
data[odd] = temp[half + i] + predict(data[even]); // Undo high-pass
|
||||
```
|
||||
|
||||
**Critical**: Forward and inverse transforms must use identical coefficient indexing and exactly reverse operations to avoid grid artefacts.
|
||||
|
||||
### Coefficient Layout
|
||||
|
||||
TAV uses **2D Spatial Layout** in memory for each decomposition level:
|
||||
|
||||
```
|
||||
[LL] [LH] [HL] [HH] [LH] [HL] [HH] ...
|
||||
└── Level 0 ──┘ └─── Level 1 ───┘
|
||||
```
|
||||
|
||||
- `LL`: Low-pass (approximation) - progressively smaller with each level
|
||||
- `LH`, `HL`, `HH`: High-pass subbands (horizontal, vertical, diagonal detail)
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Compression Efficiency
|
||||
|
||||
- **Sparsity Exploitation**: Typical quantised coefficient sparsity
|
||||
- Y channel: 86.9% zeros
|
||||
- Co channel: 97.8% zeros
|
||||
- Cg channel: 99.5% zeros
|
||||
|
||||
- **EZBC Benefits**: 16-18% compression improvement over naive coefficient encoding through sparsity exploitation
|
||||
|
||||
- **Temporal Coherence**: Additional 15-25% improvement with 3D DWT (content-dependent)
|
||||
|
||||
### Computational Complexity
|
||||
|
||||
- **Encoding**: O(n log n) per frame for spatial DWT
|
||||
- **Decoding**: O(n log n) per frame, optimised lifting scheme implementation
|
||||
- **Memory**: Single-tile encoding requires O(w × h) working memory
|
||||
|
||||
### Quality Characteristics
|
||||
|
||||
- **No blocking artefacts**: Wavelet-based encoding is inherently smooth
|
||||
- **Perceptual optimisation**: Better subjective quality than bitrate-equivalent DCT codecs
|
||||
- **Scalability**: 6 quality levels (0-5) provide wide range of bitrate/quality trade-offs
|
||||
- **Temporal stability**: 3D DWT mode reduces flickering and temporal artefacts
|
||||
|
||||
## Format Specification
|
||||
|
||||
For complete packet structure and bitstream format details, refer to `format documentation.txt`.
|
||||
|
||||
### Key Packet Types
|
||||
|
||||
- `0x00`: Metadata and initialisation
|
||||
- `0x01`: I-frame (intra-coded frame)
|
||||
- `0x12`: GOP unified packet (3D DWT mode)
|
||||
- `0x24`: Embedded TAD audio
|
||||
- `0xFC`: GOP synchronisation
|
||||
- `0xFD`: Timecode
|
||||
|
||||
## Debugging Tools
|
||||
|
||||
### TAV Inspector
|
||||
|
||||
Analyse TAV packet structure and decode individual frames:
|
||||
|
||||
```bash
|
||||
# Verbose packet analysis
|
||||
./tav_inspector input.tav -v
|
||||
|
||||
# Extract specific frame ranges
|
||||
./tav_inspector input.tav --frame-range 100-200
|
||||
```
|
||||
|
||||
## Related Projects
|
||||
|
||||
- **TAD** (TSVM Advanced Audio): Perceptual audio codec using CDF 9/7 wavelets
|
||||
- **TSVM**: Target virtual machine platform for TAV playback
|
||||
|
||||
## Licence
|
||||
|
||||
MIT.
|
||||
Reference in New Issue
Block a user