mirror of
https://github.com/curioustorvald/tsvm.git
synced 2026-03-07 19:51:51 +09:00
TAV/TAD doc update
96 CLAUDE.md
@@ -83,11 +83,11 @@ Use the build scripts in `buildapp/`:
 - `assets/disk0/`: Virtual disk content including TVDOS system files
 - `assets/bios/`: BIOS ROM files and implementations
 - `My_BASIC_Programs/`: Example BASIC programs for testing
-- TVDOS filesystem uses custom format with specialized drivers
+- TVDOS filesystem uses custom format with specialised drivers
 
 ## Videotron2K
 
-The Videotron2K is a specialized video display controller with:
+The Videotron2K is a specialised video display controller with:
 - Assembly-like programming language
 - 6 general registers (r1-r6) and special registers (tmr, frm, px, py, c1-c6)
 - Scene-based programming model
@@ -148,7 +148,7 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
 - **Features**:
 - 16×16 DCT blocks (vs 4×4 in iPF) for better compression
 - Motion compensation with ±8 pixel search range
-- YCoCg-R 4:2:0 Chroma subsampling (more aggressive quantization on Cg channel)
+- YCoCg-R 4:2:0 Chroma subsampling (more aggressive quantisation on Cg channel)
 - Full 8-Bit RGB colour for increased visual fidelity, rendered down to TSVM-compliant 4-Bit RGB with dithering upon playback
 - **Usage Examples**:
 ```bash
@@ -163,7 +163,7 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
 
 #### TAV Format (TSVM Advanced Video)
 - **Successor to TEV**: DWT-based video codec using wavelet transforms instead of DCT
-- **C Encoder**: `video_encoder/encoder_tav.c` - Multi-wavelet encoder with perceptual quantization
+- **C Encoder**: `video_encoder/encoder_tav.c` - Multi-wavelet encoder with perceptual quantisation
 - How to build: `make tav`
 - **Wavelet Support**: Multiple wavelet types for different compression characteristics
 - **JS Decoder**: `assets/disk0/tvdos/bin/playtav.js` - Native decoder for TAV format playback
@@ -172,8 +172,8 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
 - **Features**:
 - **Multiple Wavelet Types**: 5/3 reversible, 9/7 irreversible, CDF 13/7, DD-4, Haar
 - **Single-tile encoding**: One large DWT tile for optimal quality (no blocking artifacts)
-- **Perceptual quantization**: HVS-optimized coefficient scaling
-- **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantization (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
+- **Perceptual quantisation**: HVS-optimized coefficient scaling
+- **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
 - **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size)
 - **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update)
 - **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced)
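YCoCg-R is the exactly reversible, lifting-based form of YCoCg. The C sketch below shows the standard per-pixel forward and inverse transform for reference. It is not taken from `encoder_tav.c`, and the function names are hypothetical. Because each lifting step is undone by subtracting the same `>> 1` term, the round trip is lossless for any integer RGB input.

```c
/* Forward YCoCg-R: three lifting steps on integer RGB (hypothetical helper). */
static void rgb_to_ycocg_r(int r, int g, int b, int *y, int *co, int *cg) {
    *co = r - b;
    int t = b + (*co >> 1);
    *cg = g - t;
    *y = t + (*cg >> 1);
}

/* Inverse YCoCg-R: undo the lifting steps in reverse order. */
static void ycocg_r_to_rgb(int y, int co, int cg, int *r, int *g, int *b) {
    int t = y - (cg >> 1);
    *g = cg + t;
    *b = t - (co >> 1);
    *r = *b + co;
}
```

For 8-bit input, Y stays in 0..255 while Co and Cg need one extra bit (-255..255). This extra range is one reason the chroma planes can tolerate coarser quantisation.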
@@ -225,18 +225,18 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
 - **Solution**: Ensure forward and inverse transforms use identical coefficient indexing and reverse operations exactly
 
 **Supported Wavelets**:
-- **0**: 5/3 reversible (lossless when unquantized, JPEG 2000 standard)
+- **0**: 5/3 reversible (lossless when unquantised, JPEG 2000 standard)
 - **1**: 9/7 irreversible (best compression, CDF 9/7 variant, default choice)
 - **2**: CDF 13/7 (experimental, simplified implementation)
 - **16**: DD-4 (four-point interpolating Deslauriers-Dubuc, for still images)
 - **255**: Haar (demonstration only, simplest possible wavelet)
 
 - **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Video (TAV) Format")
-- **Version**: Current (perceptual quantization, multi-wavelet support, significance map compression)
+- **Version**: Current (perceptual quantisation, multi-wavelet support, significance map compression)
 
 #### TAV Significance Map Compression (Technical Details)
 
-The significance map compression technique implemented on 2025-09-29 provides substantial compression improvements by exploiting the sparsity of quantized DWT coefficients:
+The significance map compression technique implemented on 2025-09-29 provides substantial compression improvements by exploiting the sparsity of quantised DWT coefficients:
 
 **Implementation Files**:
 - **C Encoder**: `video_encoder/encoder_tav.c` - `preprocess_coefficients()` function (lines 960-991)
@@ -264,7 +264,7 @@ Concatenated Maps Layout:
 ```
 
 **Performance**:
-- **Sparsity exploitation**: Tested on quantized DWT coefficients with 86.9% sparsity (Y), 97.8% (Co), 99.5% (Cg)
+- **Sparsity exploitation**: Tested on quantised DWT coefficients with 86.9% sparsity (Y), 97.8% (Co), 99.5% (Cg)
 - **Compression improvement**: 16.4% from significance maps + 1.6% from concatenated layout
 - **Real-world impact**: 559 bytes saved per frame (5.59 MB per 10k frames)
 - **Cross-channel benefit**: Concatenated maps allow Zstd to exploit similarity between significance patterns
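A significance map stores *where* the non-zero coefficients are separately from *what* they are. The minimal C sketch below shows that split for a single channel. The LSB-first bit packing, the int16 value type and both function names are assumptions for illustration. It does not reproduce the exact layout produced by `preprocess_coefficients()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split quantised coefficients into a 1-bit-per-coefficient significance map
 * and a packed array holding only the non-zero values (hypothetical helper).
 * `map` must hold (n + 7) / 8 bytes; `values` must hold up to n entries.
 * Returns the number of non-zero values written. */
size_t split_significance(const int16_t *coeffs, size_t n,
                          uint8_t *map, int16_t *values) {
    size_t nnz = 0;
    memset(map, 0, (n + 7) / 8);
    for (size_t i = 0; i < n; i++) {
        if (coeffs[i] != 0) {
            map[i >> 3] |= (uint8_t)(1u << (i & 7));
            values[nnz++] = coeffs[i];
        }
    }
    return nnz;
}

/* Inverse: rebuild the dense coefficient array from map + packed values. */
void merge_significance(const uint8_t *map, const int16_t *values,
                        size_t n, int16_t *coeffs) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        coeffs[i] = ((map[i >> 3] >> (i & 7)) & 1) ? values[k++] : 0;
}
```

In a multi-channel layout, writing the Y, Co and Cg maps back to back (before the value arrays) would place similar bit patterns next to each other for Zstd. That is the intuition behind the concatenated-maps gain quoted above.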
@@ -320,18 +320,23 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
 - **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
 - How to build: `make tad`
 - **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
-- **C Decoder**: `video_encoder/decoder_tad.c` - Standalone decoder for TAD format
+- **C Decoders**:
+- `video_encoder/decoder_tad.c` - Shared decoder library with `tad32_decode_chunk()` function
+- `video_encoder/decoder_tad.h` - Exports shared decoder API
+- `video_encoder/decoder_tav.c` - TAV decoder that uses shared TAD decoder for audio packets
+- **Shared Architecture** (Fixed 2025-11-10): Both standalone TAD and TAV decoders now use the same `tad32_decode_chunk()` implementation, eliminating code duplication and ensuring identical output
 - **Kotlin Decoder**: `AudioAdapter.kt` - Hardware-accelerated TAD decoder for TSVM runtime
+- **Quantisation Fix** (2025-11-10): Fixed BASE_QUANTISER_WEIGHTS to use channel-specific 2D array (Mid/Side) instead of single 1D array, resolving severe audio distortion
 - **Features**:
 - **32 KHz stereo**: TSVM audio hardware native format
 - **Variable chunk sizes**: Any size ≥1024 samples, including non-power-of-2 (e.g., 32016 for TAV 1-second GOPs)
+- **Pre-emphasis filter**: First-order IIR filter (α=0.5) shifts quantisation noise to lower frequencies
+- **Gamma compression**: Dynamic range compression (γ=0.5) before quantisation
 - **M/S stereo decorrelation**: Exploits stereo correlation for better compression
-- **Gamma compression**: Dynamic range compression (γ=0.707) before quantization
 - **9-level CDF 9/7 DWT**: Fixed 9 decomposition levels for all chunk sizes
-- **Perceptual quantization**: Frequency-dependent weights with lambda companding
-- **Raw int8 storage**: Direct coefficient storage (no significance map, better Zstd compression)
-- **Coefficient-domain dithering**: Light TPDF dithering to reduce banding
-- **Zstd compression**: Level 7 for additional compression
+- **Perceptual quantisation**: Channel-specific (Mid/Side) frequency-dependent weights with lambda companding (λ=6.0)
+- **EZBC encoding**: Binary tree embedded zero block coding exploits coefficient sparsity (86.9% Mid, 97.8% Side)
+- **Zstd compression**: Level 7 on concatenated EZBC bitstreams for additional compression
 - **Non-power-of-2 support**: Fixed 2025-10-30 to handle arbitrary chunk sizes correctly
 - **Usage Examples**:
 ```bash
@@ -351,26 +356,23 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
 decoder_tad -i input.tad -o output.pcm
 ```
 - **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
-- **Version**: 1.1 (raw int8 storage with non-power-of-2 support, updated 2025-10-30)
+- **Version**: 1.1 (EZBC encoding with non-power-of-2 support, updated 2025-10-30; decoder architecture and Kotlin quantisation weights fixed 2025-11-10; documentation updated 2025-11-10 to reflect pre-emphasis and EZBC)
 
+**TAD Encoding Pipeline**:
+1. **Pre-emphasis filter** (α=0.5) - Shifts quantisation noise toward lower frequencies
+2. **Gamma compression** (γ=0.5) - Dynamic range compression
+3. **M/S decorrelation** - Transforms L/R to Mid/Side
+4. **9-level CDF 9/7 DWT** - Wavelet decomposition (fixed 9 levels)
+5. **Perceptual quantisation** - Lambda companding (λ=6.0) with channel-specific weights
+6. **EZBC encoding** - Binary tree embedded zero block coding per channel
+7. **Zstd compression** (level 7) - Additional compression on concatenated EZBC bitstreams
 
 **TAD Compression Performance**:
 - **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
 - **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
 - **Audio Quality**: Preserves full 0-16 KHz bandwidth
 - **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
-**TAD Encoding Pipeline**:
-1. **FFmpeg Two-Pass Extraction**: High-quality SoXR resampling to 32 KHz with 16 Hz highpass filter
-2. **Gamma Compression**: Dynamic range compression (γ=0.707) for perceptual uniformity
-3. **M/S Stereo Decorrelation**: Transforms Left/Right to Mid/Side for better compression
-4. **9-Level CDF 9/7 DWT**: biorthogonal wavelets with fixed 9 levels
-- All chunk sizes use 9 levels (sufficient for ≥512 samples after 9 halvings)
-- Supports non-power-of-2 sizes through proper length tracking
-5. **Frequency-Dependent Quantization**: Perceptual weights with lambda companding
-6. **Dead Zone Quantization**: Zeros high-frequency noise (highest band)
-7. **Coefficient-Domain Dithering**: Light TPDF dithering (±0.5 quantization steps)
-8. **Raw Int8 Storage**: Direct coefficient storage as signed int8 values
-9. **Optional Zstd Compression**: Level 7 compression on concatenated Mid+Side data
+- **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering
 
 **TAD Integration with TAV**:
 TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
@@ -396,3 +398,37 @@ for (i in 1..levels) {
 ```
 Using simple doubling (`length *= 2`) is incorrect for non-power-of-2 sizes and causes
 mirrored subband artifacts.
+
+**TAD Decoding Pipeline**:
+1. **Zstd decompression** - Decompress concatenated EZBC bitstreams
+2. **EZBC decoding** - Binary tree decoder reconstructs quantised int8 coefficients per channel
+3. **Lambda decompanding** - Inverse Laplacian CDF mapping with channel-specific weights
+4. **9-level inverse CDF 9/7 DWT** - Wavelet reconstruction with proper non-power-of-2 length tracking
+5. **M/S to L/R conversion** - Transform Mid/Side back to Left/Right
+6. **Gamma expansion** (γ⁻¹=2.0) - Restore dynamic range
+7. **De-emphasis filter** (α=0.5) - Reverse pre-emphasis, remove frequency shaping
+8. **PCM32f to PCM8** - Noise-shaped dithering for final 8-bit output
+
+**Critical Quantisation Weights Note (Fixed 2025-11-10)**:
+The TAD decoder MUST use channel-specific quantisation weights for Mid (channel 0) and Side (channel 1) channels. The Kotlin decoder (AudioAdapter.kt) originally used a single 1D weight array, which caused severe audio distortion. The correct implementation uses a 2D array:
+
+```kotlin
+// CORRECT (Fixed 2025-11-10)
+private val BASE_QUANTISER_WEIGHTS = arrayOf(
+    floatArrayOf( // Mid channel (0)
+        4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f
+    ),
+    floatArrayOf( // Side channel (1)
+        6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f
+    )
+)
+
+// During dequantisation:
+val weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiserScale
+coeffs[i] = normalisedVal * TAD32_COEFF_SCALARS[sideband] * weight
+```
+
+The different weights for Mid and Side channels reflect the perceptual importance of different frequency bands in each channel. Using incorrect weights causes:
+- DC frequency underamplification (using 1.0 instead of 4.0/6.0)
+- Incorrect stereo imaging and extreme side channel distortion
+- Severe frequency response errors that manifest as "clipping-like" distortion
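For comparison with the Kotlin fix in the hunk above, the same channel-indexed lookup is sketched in C below. The two weight tables are copied from the diff. `coeff_scalars` stands in for `TAD32_COEFF_SCALARS`, whose values are not shown here, and `tad_dequantise` is a hypothetical name. Index 0 is taken to be the DC (approximation) band, which matches the note about DC underamplification.

```c
#define TAD_CHANNELS 2   /* 0 = Mid, 1 = Side */
#define TAD_SIDEBANDS 10 /* 1 approximation band + 9 detail bands (9-level DWT) */

static const float BASE_QUANTISER_WEIGHTS[TAD_CHANNELS][TAD_SIDEBANDS] = {
    {4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f}, /* Mid */
    {6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f}, /* Side */
};

/* Dequantise one normalised coefficient (hypothetical helper).
 * The weight is looked up by BOTH channel and sideband; a single 1D table
 * would apply Mid weights to Side (or vice versa) and lose the DC boost. */
float tad_dequantise(float normalised_val, int channel, int sideband,
                     float quantiser_scale, const float *coeff_scalars) {
    float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale;
    return normalised_val * coeff_scalars[sideband] * weight;
}
```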
141 terranmon.txt
@@ -866,8 +866,8 @@ When KSF is interleaved with MP2 audio, the payload must be inserted in-between
 0x30 = reveal text normally (arguments: UTF-8 text. The reveal text must contain spaces when required)
 0x31 = reveal text slowly (arguments: UTF-8 text. The effect is implementation-dependent)
 
-0x40 = reveal text normally with emphasize (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
-0x41 = reveal text slowly with emphasize (arguments: UTF-8 text)
+0x40 = reveal text normally with emphasise (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
+0x41 = reveal text slowly with emphasise (arguments: UTF-8 text)
 
 0x50 = reveal text normally with target colour (arguments: uint8 target colour; UTF-8 text)
 0x51 = reveal text slowly with target colour (arguments: uint8 target colour; UTF-8 text)
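The reveal-text opcodes in this hunk follow a regular pattern:

- The high nibble selects the styling: 0x3 is plain, 0x4 is emphasis, 0x5 is target colour.
- The low nibble selects normal (0) or slow (1) reveal.

The C sketch below is a dispatcher built on that observed pattern. It handles only the six opcodes shown here. The type and function names are hypothetical, and the caller is assumed to already know the argument length.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { STYLE_PLAIN, STYLE_EMPHASIS, STYLE_COLOUR } RevealStyle;

typedef struct {
    RevealStyle style;
    int slow;             /* 1 for the "reveal slowly" variants */
    uint8_t colour;       /* only meaningful for STYLE_COLOUR */
    const char *text;     /* UTF-8 bytes, not NUL-terminated */
    size_t text_len;
} RevealCmd;

/* Decode a reveal-text opcode and its argument bytes (hypothetical helper).
 * Returns 0 on success, -1 for an unhandled opcode or truncated arguments. */
int parse_reveal(uint8_t opcode, const uint8_t *args, size_t len, RevealCmd *out) {
    uint8_t kind = opcode >> 4, speed = opcode & 0x0F;
    if (speed > 1 || kind < 0x3 || kind > 0x5) return -1;
    out->slow = speed;
    out->colour = 0;
    if (kind == 0x5) {            /* 0x50/0x51: uint8 target colour precedes the text */
        if (len < 1) return -1;
        out->colour = args[0];
        args++;
        len--;
    }
    out->style = (kind == 0x3) ? STYLE_PLAIN : (kind == 0x4) ? STYLE_EMPHASIS : STYLE_COLOUR;
    out->text = (const char *)args;
    out->text_len = len;
    return 0;
}
```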
@@ -887,7 +887,7 @@ When KSF is interleaved with MP2 audio, the payload must be inserted in-between
 TSVM Advanced Video (TAV) Format
 Created by CuriousTorvald and Claude on 2025-09-13
 
-TAV is a next-generation video codec for TSVM utilizing Discrete Wavelet Transform (DWT)
+TAV is a next-generation video codec for TSVM utilising Discrete Wavelet Transform (DWT)
 similar to JPEG2000, providing superior compression efficiency and scalability compared
 to DCT-based codecs like TEV. Features include multi-resolution encoding, progressive
 transmission capability, and region-of-interest coding.
@@ -1134,7 +1134,7 @@ resulting in superior compression compared to per-frame encoding.
 2. Determine GOP slicing from the scene detection
 3. Apply 1D DWT across temporal axis (GOP frames)
 4. Apply 2D DWT on each spatial slice of temporal subbands
-5. Perceptual quantization with temporal-spatial awareness
+5. Perceptual quantisation with temporal-spatial awareness
 6. Unified significance map preprocessing across all frames/channels
 7. Single Zstd compression of entire GOP block
 
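Step 3 transforms along time rather than space: each pixel position is treated as a 1D signal running across the GOP's frames. The C sketch below shows one temporal level. It uses the Haar lifting pair only because it is the simplest to illustrate; this hunk does not state which wavelet the TAV encoder applies on the temporal axis. The frame-pointer layout and the function names are assumptions.

```c
#include <stddef.h>

/* One temporal Haar lifting level over a GOP stored as frames[f][p].
 * For every pixel p, frame pair (2k, 2k+1) becomes a low band written to
 * frame k and a high band written to frame n_frames/2 + k.
 * n_frames must be even; tmp must hold n_frames floats. */
void temporal_haar_forward(float **frames, size_t n_frames, size_t n_pixels, float *tmp) {
    size_t half = n_frames / 2;
    for (size_t p = 0; p < n_pixels; p++) {
        for (size_t k = 0; k < half; k++) {
            float even = frames[2 * k][p], odd = frames[2 * k + 1][p];
            float high = odd - even;        /* predict step */
            float low = even + high / 2;    /* update step: (even + odd) / 2 */
            tmp[k] = low;
            tmp[half + k] = high;
        }
        for (size_t f = 0; f < n_frames; f++) frames[f][p] = tmp[f];
    }
}

/* Inverse of temporal_haar_forward: undo update, then predict. */
void temporal_haar_inverse(float **frames, size_t n_frames, size_t n_pixels, float *tmp) {
    size_t half = n_frames / 2;
    for (size_t p = 0; p < n_pixels; p++) {
        for (size_t k = 0; k < half; k++) {
            float low = frames[k][p], high = frames[half + k][p];
            float even = low - high / 2;
            tmp[2 * k] = even;
            tmp[2 * k + 1] = even + high;
        }
        for (size_t f = 0; f < n_frames; f++) frames[f][p] = tmp[f];
    }
}
```

Static content produces near-zero high bands, which is why a temporal transform followed by step 4's spatial DWT can beat per-frame coding.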
@@ -1246,7 +1246,7 @@ The encoder expects linear alpha.
 ## Compression Features
 - Single DWT tiles vs 16x16 DCT blocks in TEV
 - Multi-resolution representation enables scalable decoding
-- Better frequency localization than DCT
+- Better frequency localisation than DCT
 - Reduced blocking artifacts due to overlapping basis functions
 
 ## Hardware Acceleration Functions
@@ -1533,9 +1533,9 @@ TSVM Advanced Audio (TAD) Format
 Created by CuriousTorvald and Claude on 2025-10-23
 Updated: 2025-10-30 (fixed non-power-of-2 sample count support)
 
-TAD is a perceptual audio codec for TSVM utilizing Discrete Wavelet Transform (DWT)
+TAD is a perceptual audio codec for TSVM utilising Discrete Wavelet Transform (DWT)
 with CDF 9/7 biorthogonal wavelets, providing efficient compression through M/S stereo
-decorrelation, frequency-dependent quantization, and raw int8 coefficient storage.
+decorrelation, frequency-dependent quantisation, and raw int8 coefficient storage.
 Designed as an includable API for integration with TAV video encoder.
 
 When used inside of a video codec, only zstd-compressed payload is stored, chunk length
@@ -1584,20 +1584,34 @@ TAV integration uses exact GOP sample counts (e.g., 32016 samples for 1 second a
 uint32 Chunk Payload Size: size of following payload in bytes
 * Chunk Payload: encoded M/S stereo data (Zstd compressed if flag set)
 
-### Chunk Payload Structure (before optional Zstd compression)
-* Mid Channel Encoded Data (raw int8 values)
-* Side Channel Encoded Data (raw int8 values)
+### Chunk Payload Structure (before Zstd compression)
+* Mid Channel EZBC Data (embedded zero block coded bitstream)
+* Side Channel EZBC Data (embedded zero block coded bitstream)
 
+Each EZBC channel structure:
+uint8 MSB Bitplane: highest bitplane with significant coefficient
+uint16 Coefficient Count: number of coefficients in this channel
+* Binary Tree EZBC Bitstream: significance map + refinement bits
+
 ## Encoding Pipeline
 
-### Step 1: Dynamic Range Compression (Gamma Compression)
-Input stereo PCM32fLE undergoes gamma compression for perceptual uniformity:
+### Step 1: Pre-emphasis Filter
+Input stereo PCM32fLE undergoes first-order IIR pre-emphasis filtering (α=0.5):
 
-encode(x) = sign(x) * |x|^γ where γ=0.707 (1/√2)
+H(z) = 1 - α·z⁻¹
 
-This compresses dynamic range before quantization, improving perceptual quality.
+This shifts quantisation noise toward lower frequencies where it's more maskable by
+the psychoacoustic model. The filter has persistent state across chunks to prevent
+discontinuities at chunk boundaries.
 
-### Step 2: M/S Stereo Decorrelation
+### Step 2: Dynamic Range Compression (Gamma Compression)
+Pre-emphasised audio undergoes gamma compression for perceptual uniformity:
+
+encode(x) = sign(x) * |x|^γ where γ=0.5
+
+This compresses dynamic range before quantisation, improving perceptual quality.
+
+### Step 3: M/S Stereo Decorrelation
 Mid-Side transformation exploits stereo correlation:
 
 Mid = (Left + Right) / 2
@@ -1606,7 +1620,7 @@ Mid-Side transformation exploits stereo correlation:
 This typically concentrates energy in the Mid channel while the Side channel
 contains mostly small values, improving compression efficiency.
 
-### Step 3: 9-Level CDF 9/7 DWT
+### Step 4: 9-Level CDF 9/7 DWT
 Each channel (Mid and Side) undergoes CDF 9/7 biorthogonal wavelet decomposition. The codec uses a fixed 9 decomposition levels for all chunk sizes:
 
 DWT Levels = 9 (fixed)
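Chunk sizes need not be powers of two (e.g. 32016 samples), so each level's length must be stored rather than recovered by doubling. The minimal C sketch below rests on three assumptions:

- A level of length n yields ceil(n/2) low-pass and floor(n/2) high-pass coefficients, the usual lifting convention.
- The output is laid out approximation-first.
- Sidebands are numbered 0 = final approximation up to 9 = finest detail.

The function names are hypothetical.

```c
#include <stddef.h>

#define TAD_DWT_LEVELS 9

/* lengths[0] = chunk size; lengths[i] = low-pass length after i levels.
 * The inverse transform should walk these stored values back up instead of
 * doubling, because 2 * ceil(n / 2) != n whenever n is odd. */
void dwt_level_lengths(size_t n, size_t lengths[TAD_DWT_LEVELS + 1]) {
    lengths[0] = n;
    for (int i = 1; i <= TAD_DWT_LEVELS; i++)
        lengths[i] = (lengths[i - 1] + 1) / 2;
}

/* Map a coefficient index to its sideband: 0 = final approximation,
 * 1 = coarsest detail band, ..., TAD_DWT_LEVELS = finest detail band.
 * Returns -1 if the index lies outside the chunk. */
int dwt_sideband(size_t index, const size_t lengths[TAD_DWT_LEVELS + 1]) {
    if (index < lengths[TAD_DWT_LEVELS]) return 0;
    for (int level = TAD_DWT_LEVELS; level >= 1; level--)
        if (index < lengths[level - 1]) return TAD_DWT_LEVELS - level + 1;
    return -1;
}
```

For 32016 samples the lengths run 16008, 8004, 4002, 2001, 1001, 501, 251, 126, 63. Doubling from 126 would give 252 instead of 251, which is the off-by-one that produces the mirrored subband artifacts described in `CLAUDE.md`.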
@@ -1632,32 +1646,53 @@ CDF 9/7 lifting coefficients:
 δ = 0.443506852
 K = 1.230174105
 
-### Step 4: Frequency-Dependent Quantization
-DWT coefficients are quantized using perceptually-tuned frequency-dependent weights.
+### Step 5: Frequency-Dependent Quantisation with Lambda Companding
+DWT coefficients are quantized using:
+1. **Lambda companding**: Maps normalised coefficients through Laplacian CDF with λ=6.0
+2. **Perceptually-tuned weights**: Channel-specific (Mid/Side) frequency-dependent scaling
+3. **Final quantisation**: base_weight[channel][subband] * quality_scale
 
-Final quantization step: base_weight * quality_scale
+The lambda companding provides perceptually uniform quantisation, allocating more bits
+to perceptually important coefficient magnitudes.
 
-#### Dead Zone Quantization
-High-frequency coefficients (Level 0: 8-16 KHz) use dead zone quantization
-where coefficients smaller than half the quantization step are zeroed:
+Channel-specific base quantisation weights:
+Mid (0): [4.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.0, 1.3, 2.0]
+Side (1): [6.0, 5.0, 2.6, 2.4, 1.8, 1.3, 1.0, 1.0, 1.6, 3.2]
 
-if (abs(coefficient) < quantization_step / 2)
-    coefficient = 0
+Output: Quantized int8 coefficients in range [-max_index, +max_index]
 
-This aggressively removes high-frequency noise while preserving important
-mid-frequency content (2-4 KHz critical for speech intelligibility).
+### Step 6: EZBC Encoding (Embedded Zero Block Coding)
+Quantized int8 coefficients are compressed using binary tree EZBC, a 1D variant of
+the embedded zero-block coding.
 
-### Step 5: Raw Int8 Coefficient Storage
-Quantized coefficients are stored directly as signed int8 values (no significance map, better Zstd compression).
-Concatenated format: [Mid_channel_data][Side_channel_data]
+**EZBC Algorithm**:
+1. Find MSB bitplane (highest bit position with significant coefficient)
+2. Initialise root block covering all coefficients as insignificant
+3. For each bitplane from MSB to LSB:
+- **Insignificant Pass**: Test each insignificant block for significance
+- If still zero at this bitplane: emit 0 bit, keep in insignificant queue
+- If becomes significant: emit 1 bit, recursively subdivide using binary tree
+- **Refinement Pass**: For already-significant coefficients, emit next bit
+4. Binary tree subdivision continues until blocks of size 1 (single coefficients)
+5. When coefficient becomes significant: emit sign bit and reconstruct value
 
-### Step 6: Coefficient-Domain Dithering (Encoder)
-Light triangular dithering (±0.5 quantization steps) added to coefficients before
-quantization to reduce banding artifacts.
+**EZBC Output Structure** (per channel):
+uint8 MSB Bitplane (8 bits)
+uint16 Coefficient Count (16 bits)
+* Bitstream: [significance_bits][sign_bits][refinement_bits]
 
-### Step 7: Zstd Compression
-The concatenated Mid+Side encoded data is compressed
-using Zstd level 7 for additional compression without significant CPU overhead.
+**Compression Benefits**:
+- Exploits coefficient sparsity through significance testing
+- Progressive refinement enables quality scalability
+- Binary tree exploits spatial clustering of significant coefficients
+- Typical sparsity: 86.9% zeros (Mid), 97.8% zeros (Side)
 
+### Step 7: Concatenation and Zstd Compression
+The Mid and Side EZBC bitstreams are concatenated:
+Payload = [Mid_EZBC_data][Side_EZBC_data]
+
+Then compressed using Zstd level 7 for additional compression without significant
+CPU overhead. Zstd exploits redundancy in the concatenated bitstreams.
+
 ## Decoding Pipeline
 
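The EZBC algorithm in this hunk can be sketched compactly by running the encoder and decoder through one shared traversal, so the two cannot drift apart. This is a C sketch under several assumptions:

- It re-walks the binary tree at every bitplane instead of keeping explicit insignificant-block queues.
- A block's significance is tested only while none of its coefficients is significant yet.
- A newly significant coefficient emits a sign bit.
- A separate refinement pass emits one magnitude bit for coefficients that became significant at an earlier bitplane.

Bits are packed LSB-first, blocks split at the midpoint, and all names are hypothetical. The sketch omits the MSB-bitplane/count header and does not skip redundant bits, so it will not reproduce the TAD bitstream.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int decoding;     /* 0 = encode, 1 = decode */
    uint8_t *buf;     /* bitstream; must be zero-filled before encoding */
    size_t pos;       /* next bit position */
    int *mag;         /* |coefficient|: input when encoding, rebuilt when decoding */
    int *neg;         /* sign flags */
    int *sig_plane;   /* bitplane where a coefficient became significant, -1 if not yet */
} Ezbc;

/* The encoder writes `bit`; the decoder ignores it and returns the bit it reads. */
static int code_bit(Ezbc *z, int bit) {
    if (z->decoding) bit = (z->buf[z->pos >> 3] >> (z->pos & 7)) & 1;
    else if (bit) z->buf[z->pos >> 3] |= (uint8_t)(1u << (z->pos & 7));
    z->pos++;
    return bit;
}

static int any_significant(const Ezbc *z, int start, int len) {
    for (int i = start; i < start + len; i++) if (z->sig_plane[i] >= 0) return 1;
    return 0;
}

static int any_at_least(const Ezbc *z, int start, int len, int threshold) {
    for (int i = start; i < start + len; i++) if (z->mag[i] >= threshold) return 1;
    return 0;
}

/* Insignificant (sorting) pass over one binary-tree block. */
static void code_block(Ezbc *z, int start, int len, int plane) {
    if (!any_significant(z, start, len)) {
        int bit = z->decoding ? 0 : any_at_least(z, start, len, 1 << plane);
        if (!code_bit(z, bit)) return;                 /* block still zero at this plane */
    }
    if (len == 1) {
        if (z->sig_plane[start] < 0) {                 /* newly significant coefficient */
            z->neg[start] = code_bit(z, z->neg[start]);
            z->sig_plane[start] = plane;
            if (z->decoding) z->mag[start] = 1 << plane;
        }
        return;
    }
    code_block(z, start, len / 2, plane);              /* binary subdivision */
    code_block(z, start + len / 2, len - len / 2, plane);
}

static size_t ezbc_run(Ezbc *z, int n, int msb) {
    for (int plane = msb; plane >= 0; plane--) {
        code_block(z, 0, n, plane);
        for (int i = 0; i < n; i++)                    /* refinement pass */
            if (z->sig_plane[i] > plane) {
                int bit = code_bit(z, (z->mag[i] >> plane) & 1);
                if (z->decoding && bit) z->mag[i] |= 1 << plane;
            }
    }
    return z->pos;
}

/* Highest bitplane holding a set magnitude bit, or -1 if all coefficients are zero. */
int ezbc_msb(const int8_t *c, int n) {
    int m = 0, b = -1;
    for (int i = 0; i < n; i++) if (abs(c[i]) > m) m = abs(c[i]);
    while (m) { b++; m >>= 1; }
    return b;
}

/* Returns the number of bits written to `out`, which the caller zero-fills. */
size_t ezbc_encode(const int8_t *c, int n, int msb, uint8_t *out) {
    int *s = malloc(3 * (size_t)n * sizeof *s);
    if (!s) return 0;
    Ezbc z = {0, out, 0, s, s + n, s + 2 * n};
    for (int i = 0; i < n; i++) { z.mag[i] = abs(c[i]); z.neg[i] = c[i] < 0; z.sig_plane[i] = -1; }
    size_t bits = ezbc_run(&z, n, msb);
    free(s);
    return bits;
}

void ezbc_decode(const uint8_t *in, int n, int msb, int8_t *c) {
    int *s = malloc(3 * (size_t)n * sizeof *s);
    if (!s) return;
    Ezbc z = {1, (uint8_t *)in, 0, s, s + n, s + 2 * n};  /* buffer is only read */
    for (int i = 0; i < n; i++) { z.mag[i] = 0; z.neg[i] = 0; z.sig_plane[i] = -1; }
    ezbc_run(&z, n, msb);
    for (int i = 0; i < n; i++) c[i] = (int8_t)(z.neg[i] ? -z.mag[i] : z.mag[i]);
    free(s);
}
```

In a real channel payload, the MSB bitplane and coefficient count from the per-channel header would supply `msb` and `n` to the decoder.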
@@ -1665,16 +1700,25 @@ using Zstd level 7 for additional compression without significant CPU overhead.
 Read chunk header (sample_count, max_index, payload_size).
 If compressed (default), decompress payload using Zstd.
 
-### Step 2: Coefficient Extraction
-Extract Mid and Side channel int8 data from concatenated payload:
-- Mid channel: bytes [0..sample_count-1]
-- Side channel: bytes [sample_count..2*sample_count-1]
+### Step 2: EZBC Decoding
+Decode Mid and Side channels from concatenated EZBC bitstreams using binary tree
+embedded zero block decoder:
 
-### Step 3: Dequantization with Lambda Decompanding
+For each channel:
+1. Read EZBC header: MSB bitplane (8 bits), coefficient count (16 bits)
+2. Initialise root block as insignificant, track coefficient states
+3. Process bitplanes from MSB to LSB:
+- **Insignificant Pass**: Read significance bits, recursively decode significant blocks
+- **Refinement Pass**: Read refinement bits for already-significant coefficients
+4. Reconstruct quantized int8 coefficients from bitplane representation
+
+Output: Quantized int8 coefficients for Mid and Side channels
+
+### Step 3: Dequantisation with Lambda Decompanding
 Convert quantized int8 values back to float coefficients using:
 1. Lambda decompanding (inverse of Laplacian CDF compression)
-2. Multiply by frequency-dependent quantization steps
-3. Apply coefficient-domain dithering (TPDF, ~-60 dBFS)
+2. Multiply by frequency-dependent quantisation steps
+3. [Optional] Apply coefficient-domain dithering (TPDF, ~-60 dBFS)
 
 ### Step 4: 9-Level Inverse CDF 9/7 DWT
 Reconstruct Float32 audio from DWT coefficients using inverse CDF 9/7 transform.
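The new Step 2 text describes EZBC decoding only in outline. To make the significance and refinement passes concrete, here is a simplified C sketch: it reads the per-channel header (8-bit MSB bitplane, 16-bit coefficient count) and then walks bitplanes from MSB to LSB over a flat list of coefficients. It deliberately leaves out the binary-tree block partitioning that gives EZBC its compression. It also assumes MSB-first bit packing and a sign bit sent right after each significance bit. `BitReader`, `read_bits` and `decode_bitplanes_flat` are hypothetical names, not the decoder's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical MSB-first bit reader; the real bit order is an assumption. */
typedef struct {
    const uint8_t *data;
    size_t size;  /* length in bytes */
    size_t pos;   /* current bit position */
} BitReader;

static uint32_t read_bits(BitReader *br, int n) {
    uint32_t v = 0;
    for (int i = 0; i < n; i++) {
        uint32_t bit = 0;
        if (br->pos / 8 < br->size)  /* reads past the end yield 0 */
            bit = (br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Flat bitplane decoder (no binary-tree block partitioning).
 * Header: 8-bit MSB bitplane, 16-bit coefficient count.
 * Returns the count, or -1 on a bad header or allocation failure. */
static int decode_bitplanes_flat(const uint8_t *in, size_t in_size,
                                 int8_t *out, size_t out_cap) {
    BitReader br = { in, in_size, 0 };
    int msb = (int)read_bits(&br, 8);
    uint32_t count = read_bits(&br, 16);
    if (count > out_cap || msb > 7) return -1;  /* int8 magnitudes need <= 8 planes */

    int *mag = calloc(count + 1, sizeof *mag);
    int *neg = calloc(count + 1, sizeof *neg);
    int *sig_at = malloc((count + 1) * sizeof *sig_at);  /* plane where it became significant */
    if (!mag || !neg || !sig_at) { free(mag); free(neg); free(sig_at); return -1; }
    for (uint32_t i = 0; i < count; i++) sig_at[i] = -1;

    for (int bp = msb; bp >= 0; bp--) {
        /* Insignificant pass: significance bit (plus sign bit if set) per zero coefficient */
        for (uint32_t i = 0; i < count; i++) {
            if (sig_at[i] >= 0) continue;
            if (read_bits(&br, 1)) {
                sig_at[i] = bp;
                mag[i] = 1 << bp;
                neg[i] = (int)read_bits(&br, 1);
            }
        }
        /* Refinement pass: one magnitude bit per coefficient significant at a higher plane */
        for (uint32_t i = 0; i < count; i++)
            if (sig_at[i] > bp) mag[i] |= (int)read_bits(&br, 1) << bp;
    }

    for (uint32_t i = 0; i < count; i++)
        out[i] = (int8_t)(neg[i] ? -mag[i] : mag[i]);
    free(mag); free(neg); free(sig_at);
    return (int)count;
}
```

For example, the bytes `01 00 03 9A` encode MSB plane 1, three coefficients, and the significance, sign and refinement bits for the values 3, 0 and -2. Real EZBC can spend a single significance bit on a whole insignificant block. That is where zero fractions like the quoted 86.9% (Mid) and 97.8% (Side) pay off. The flat version above spends one bit per zero coefficient on every bitplane.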
@@ -1704,9 +1748,18 @@ Convert Mid/Side back to Left/Right stereo:
 ### Step 6: Gamma Expansion
 Expand dynamic range (inverse of encoder's gamma compression):
 
-decode(y) = sign(y) * |y|^(1/γ) where γ=0.707, so 1/γ=√2≈1.414
+decode(y) = sign(y) * |y|^(1/γ) where γ=0.5, so 1/γ=2.0
 
-### Step 7: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
+### Step 7: De-emphasis Filter
+Apply de-emphasis filter to reverse the pre-emphasis (α=0.5):
+
+H(z) = 1 / (1 - α·z⁻¹)
+
+This is a first-order IIR filter with persistent state across chunks to prevent
+discontinuities at chunk boundaries. The de-emphasis must be applied AFTER gamma
+expansion but BEFORE PCM8 conversion to correctly reconstruct the original audio.
+
+### Step 8: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
 Convert Float32 samples to unsigned PCM8 (PCMu8) using second-order error-diffusion
 dithering with reduced amplitude (0.2× TPDF) to coordinate with coefficient-domain
 dithering.
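Steps 6 to 8 form the tail of the per-channel decode chain. The sketch below runs them in the order the updated text requires: gamma expansion, then de-emphasis, then PCM8. With γ=0.5, gamma expansion reduces to squaring the magnitude. De-emphasis is the one-pole IIR y[n] = x[n] + α·y[n-1] implied by H(z) = 1/(1 - α·z⁻¹), and its state lives in a struct so it carries across chunks. The PCM8 stage is a stand-in built on assumptions. The float-to-byte mapping ((x + 1) · 127.5), the error-feedback taps (2e[n-1] - e[n-2], a noise transfer of (1 - z⁻¹)²) and the xorshift32 TPDF source are illustrative choices, not taken from the TAD decoder. Only the second-order shaping and the 0.2× TPDF amplitude come from the text. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 6: decode(y) = sign(y) * |y|^(1/γ); with γ = 0.5 the exponent is 2. */
static float gamma_expand(float y) {
    return (y < 0.0f) ? -(y * y) : (y * y);
}

/* Step 7: de-emphasis H(z) = 1 / (1 - α z^-1), i.e. y[n] = x[n] + α y[n-1].
 * Keep one state per output channel and reuse it for every chunk. */
typedef struct { float prev_out; } DeemphState;

static void deemphasis(DeemphState *st, float *x, size_t n, float alpha) {
    for (size_t i = 0; i < n; i++) {
        st->prev_out = x[i] + alpha * st->prev_out;
        x[i] = st->prev_out;
    }
}

/* Assumed TPDF source: xorshift32, difference of two uniforms -> (-1, 1). */
static float tpdf_xs32(uint32_t *s) {
    float u[2];
    for (int k = 0; k < 2; k++) {
        *s ^= *s << 13;
        *s ^= *s >> 17;
        *s ^= *s << 5;
        u[k] = (float)(*s >> 8) / 16777216.0f;  /* 24-bit uniform in [0, 1) */
    }
    return u[0] - u[1];
}

/* Step 8 (sketch): [-1, 1] float -> PCMu8 with 0.2x TPDF and second-order
 * error feedback. Mapping and taps are assumptions; seed rng non-zero. */
typedef struct { float e1, e2; uint32_t rng; } Pcm8State;

static void to_pcmu8(Pcm8State *st, const float *x, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float target = (x[i] + 1.0f) * 127.5f;
        float u = target - (2.0f * st->e1 - st->e2);  /* subtract shaped past error */
        float v = u + 0.2f * tpdf_xs32(&st->rng) + 0.5f;
        int q = (int)v;
        if ((float)q > v) q--;                         /* floor without libm */
        st->e2 = st->e1;
        st->e1 = (float)q - u;                         /* error, dither included */
        if (q < 0) q = 0;                              /* clamp; clipped error is not re-fed */
        if (q > 255) q = 255;
        out[i] = (uint8_t)q;
    }
}
```

Per channel, the call order is `gamma_expand` on each sample, then `deemphasis` with α = 0.5, then `to_pcmu8`. Because the shaped error terms telescope, a constant input averages out to its exact target level. The dither and shaped noise only redistribute error across frequency.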
@@ -419,7 +419,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 }
 
 // Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
-// Converts quantized index back to normalized float in [-1, 1]
+// Converts quantised index back to normalised float in [-1, 1]
 private fun lambdaDecompanding(quantVal: Byte, maxIndex: Int): Float {
 // Handle zero
 if (quantVal == 0.toByte()) {
@@ -432,11 +432,11 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 // Clamp to valid range
 if (absIndex > maxIndex) absIndex = maxIndex
 
-// Map index back to normalized CDF [0, 1]
-val normalizedCdf = absIndex.toFloat() / maxIndex
+// Map index back to normalised CDF [0, 1]
+val normalisedCdf = absIndex.toFloat() / maxIndex
 
 // Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
-val cdf = 0.5f + normalizedCdf * 0.5f
+val cdf = 0.5f + normalisedCdf * 0.5f
 
 // Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
 // For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -698,13 +698,13 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 val msbBitplane = bs.readBits(8)
 val count = bs.readBits(16)
 
-// Initialize coefficient array to zero
+// Initialise coefficient array to zero
 coeffs.fill(0)
 
 // Track coefficient significance
 val states = Array(count) { TadCoeffState() }
 
-// Initialize queues
+// Initialise queues
 val insignificantQueue = TadBlockQueue()
 val nextInsignificant = TadBlockQueue()
 val significantQueue = TadBlockQueue()
@@ -822,11 +822,11 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 // Calculate DWT levels from sample count
 val dwtLevels = calculateDwtLevels(sampleCount)
 
-// Dequantize to Float32
+// Dequantise to Float32
 val dwtMid = FloatArray(sampleCount)
 val dwtSide = FloatArray(sampleCount)
-dequantizeDwtCoefficients(0, quantMid, dwtMid, sampleCount, maxIndex, dwtLevels)
-dequantizeDwtCoefficients(1, quantSide, dwtSide, sampleCount, maxIndex, dwtLevels)
+dequantiseDwtCoefficients(0, quantMid, dwtMid, sampleCount, maxIndex, dwtLevels)
+dequantiseDwtCoefficients(1, quantSide, dwtSide, sampleCount, maxIndex, dwtLevels)
 
 // Inverse DWT using CDF 9/7 wavelet (produces Float32 samples in range [-1.0, 1.0])
 dwt97InverseMultilevel(dwtMid, sampleCount, dwtLevels)
@@ -891,20 +891,20 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 }
 
 // Simplified spectral reconstruction for wavelet coefficients
-// Conservative approach: only add light dither to reduce quantization grain
+// Conservative approach: only add light dither to reduce quantisation grain
 private fun spectralInterpolateBand(c: FloatArray, start: Int, len: Int, Q: Float, lowerBandRms: Float) {
 if (len < 4) return
 
 xorshift32State = 0x9E3779B9u xor len.toUInt() xor (Q * 65536.0f).toUInt()
 val ditherAmp = 0.05f * Q // Very light dither (~-60 dBFS)
 
-// Just add ultra-light TPDF dither to reduce quantization grain
+// Just add ultra-light TPDF dither to reduce quantisation grain
 for (i in 0 until len) {
 c[start + i] += tpdf() * ditherAmp
 }
 }
 
-private fun dequantizeDwtCoefficients(channel: Int, quantized: ByteArray, coeffs: FloatArray, count: Int,
+private fun dequantiseDwtCoefficients(channel: Int, quantised: ByteArray, coeffs: FloatArray, count: Int,
 maxIndex: Int, dwtLevels: Int) {
 // Calculate sideband boundaries dynamically
 val firstBandSize = count shr dwtLevels
@@ -915,7 +915,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 sidebandStarts[i] = sidebandStarts[i - 1] + (firstBandSize shl (i - 2))
 }
 
-// Dequantize all coefficients with stochastic reconstruction for deadzoned values
+// Dequantise all coefficients with stochastic reconstruction for deadzoned values
 val quantiserScale = 1.0f
 for (i in 0 until count) {
 var sideband = dwtLevels
@@ -927,7 +927,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 }
 
 // Check for deadzone marker
-/*if (quantized[i] == DEADZONE_MARKER_QUANT) {
+/*if (quantised[i] == DEADZONE_MARKER_QUANT) {
 // Stochastic reconstruction: generate Laplacian noise in deadband range
 val deadbandThreshold = DEADBANDS[channel][sideband]
 
@@ -942,13 +942,13 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
 // Apply scalar (but not quantiser weight - noise is already in correct range)
 coeffs[i] = noise * TAD32_COEFF_SCALARS[sideband]
 } else {*/
-// Normal dequantization using lambda decompanding
-val normalizedVal = lambdaDecompanding(quantized[i], maxIndex)
+// Normal dequantisation using lambda decompanding
+val normalisedVal = lambdaDecompanding(quantised[i], maxIndex)
 
-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
 // CRITICAL: Use channel-specific weights (Mid=0, Side=1)
 val weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiserScale
-coeffs[i] = normalizedVal * TAD32_COEFF_SCALARS[sideband] * weight
+coeffs[i] = normalisedVal * TAD32_COEFF_SCALARS[sideband] * weight
 // }
 }
 
@@ -82,7 +82,7 @@ static void write_tav_header_only(FILE *out) {
 // Channel layout: 0 (Y-Co-Cg)
 header[26] = 0;
 
-// Reserved[4]: zeros (27-30 already initialized to 0)
+// Reserved[4]: zeros (27-30 already initialised to 0)
 
 // File Role: 1 (header-only, UCF payload follows)
 header[31] = 1;
@@ -20,7 +20,7 @@
 static const float TAD32_COEFF_SCALARS[] = {64.0f, 45.255f, 32.0f, 22.627f, 16.0f, 11.314f, 8.0f, 5.657f, 4.0f, 2.828f};
 
 // Base quantiser weight table (10 subbands: LL + 9 H bands)
-// These weights are multiplied by quantiser_scale during quantization
+// These weights are multiplied by quantiser_scale during quantisation
 static const float BASE_QUANTISER_WEIGHTS[2][10] = {
 { // mid channel
 4.0f, // LL (L9) DC
@@ -47,7 +47,7 @@ static const float BASE_QUANTISER_WEIGHTS[2][10] = {
 3.2f // H (L1) 8 khz
 }};
 
-#define TAD_DEFAULT_CHUNK_SIZE 31991
+#define TAD_DEFAULT_CHUNK_SIZE 32768
 #define TAD_MIN_CHUNK_SIZE 1024
 #define TAD_SAMPLE_RATE 32000
 #define TAD_CHANNELS 2
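The change of TAD_DEFAULT_CHUNK_SIZE from 31991 to 32768 interacts with the sideband boundary recurrence in dequantise_dwt_coefficients. That code computes `first_band_size = chunk_size >> dwt_levels` and has each later start add `first_band_size << (i-2)`. The sketch below reproduces that recurrence for 9 levels, together with a plausible sideband lookup. Some parts are not shown in this diff and are assumed here: the initialisation `starts[0] = 0`, `starts[1] = first_band_size`, the loop bound `i <= levels + 1`, and the body of the lookup loop. The helper names are hypothetical.

```c
#include <stddef.h>

#define MAX_LEVELS 16

/* starts[s] .. starts[s+1] is sideband s (0 = LL, levels = finest high band). */
static void sideband_starts(int chunk_size, int levels, int starts[MAX_LEVELS + 2]) {
    int first = chunk_size >> levels;
    starts[0] = 0;
    starts[1] = first;
    for (int i = 2; i <= levels + 1; i++)
        starts[i] = starts[i - 1] + (first << (i - 2));
}

/* Mirrors the visible default: anything past the last boundary stays in
 * sideband `levels`. */
static int sideband_of(size_t idx, int levels, const int starts[MAX_LEVELS + 2]) {
    int sideband = levels;
    for (int s = 0; s <= levels; s++) {
        if (idx >= (size_t)starts[s] && idx < (size_t)starts[s + 1]) {
            sideband = s;
            break;
        }
    }
    return sideband;
}
```

With 32768 samples and 9 levels, the LL band and the coarsest high band hold 64 coefficients each, every finer band doubles, and the last boundary lands exactly on 32768. With 31991, `first_band_size` becomes 62 and the last boundary stops at 31744. Under this reading, the trailing 247 coefficients only reach sideband 9 through the default branch.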
@@ -105,7 +105,7 @@ static void spectral_interpolate_band(float *c, size_t len, float Q, float lower
 uint32_t seed = 0x9E3779B9u ^ (uint32_t)len ^ (uint32_t)(Q * 65536.0f);
 const float dither_amp = 0.02f * Q; // Very light dither
 
-// Just add ultra-light TPDF dither to reduce quantization grain
+// Just add ultra-light TPDF dither to reduce quantisation grain
 // No aggressive hole filling or AR prediction that might create artifacts
 for (size_t i = 0; i < len; i++) {
 c[i] += tpdf(&seed) * dither_amp;
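spectral_interpolate_band seeds its dither generator from the band length and Q, so a given band always receives the same noise. It then adds `tpdf(&seed) * dither_amp` with `dither_amp = 0.02f * Q`. The bodies of `tpdf` and its generator fall outside this hunk. The sketch below assumes a standard xorshift32 (13/17/5 shifts) and forms TPDF as the difference of two uniform draws. It shows the shape of the technique, not the decoder's exact noise. The Kotlin spectralInterpolateBand in this same commit uses `0.05f * Q` rather than `0.02f * Q`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generator: xorshift32 (13/17/5). State must be non-zero. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Triangular PDF on (-1, 1): difference of two uniforms in [0, 1). */
static float tpdf_sketch(uint32_t *s) {
    float a = (float)(xorshift32(s) >> 8) / 16777216.0f;
    float b = (float)(xorshift32(s) >> 8) / 16777216.0f;
    return a - b;
}

/* Mirrors the visible seeding and amplitude of spectral_interpolate_band. */
static void add_band_dither(float *c, size_t len, float Q) {
    uint32_t seed = 0x9E3779B9u ^ (uint32_t)len ^ (uint32_t)(Q * 65536.0f);
    const float dither_amp = 0.02f * Q;
    for (size_t i = 0; i < len; i++)
        c[i] += tpdf_sketch(&seed) * dither_amp;
}
```

Deterministic seeding means decoding the same chunk twice yields bit-identical output. TPDF keeps the added noise's power independent of the signal, which is why it is preferred over plain uniform dither for masking quantisation grain.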
@@ -539,14 +539,14 @@ static void pcm32f_to_pcm8(const float *fleft, const float *fright, uint8_t *lef
 }
 
 //=============================================================================
-// Dequantization (inverse of quantization)
+// Dequantisation (inverse of quantisation)
 //=============================================================================
 
 
 #define LAMBDA_FIXED 6.0f
 
 // Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
-// Converts quantized index back to normalized float in [-1, 1]
+// Converts quantised index back to normalised float in [-1, 1]
 static float lambda_decompanding(int8_t quant_val, int max_index) {
 // Handle zero
 if (quant_val == 0) {
@@ -559,11 +559,11 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
 // Clamp to valid range
 if (abs_index > max_index) abs_index = max_index;
 
-// Map index back to normalized CDF [0, 1]
-float normalized_cdf = (float)abs_index / max_index;
+// Map index back to normalised CDF [0, 1]
+float normalised_cdf = (float)abs_index / max_index;
 
 // Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
-float cdf = 0.5f + normalized_cdf * 0.5f;
+float cdf = 0.5f + normalised_cdf * 0.5f;
 
 // Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
 // For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -576,7 +576,7 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
 return sign * abs_val;
 }
 
-static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
+static void dequantise_dwt_coefficients(int channel, const int8_t *quantised, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
 
 // Calculate sideband boundaries dynamically
 int first_band_size = chunk_size >> dwt_levels;
@@ -588,7 +588,7 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
 sideband_starts[i] = sideband_starts[i-1] + (first_band_size << (i-2));
 }
 
-// Dequantize all coefficients with stochastic reconstruction for deadzoned values
+// Dequantise all coefficients with stochastic reconstruction for deadzoned values
 for (size_t i = 0; i < count; i++) {
 int sideband = dwt_levels;
 for (int s = 0; s <= dwt_levels; s++) {
@@ -599,7 +599,7 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
 }
 
 // Check for deadzone marker
-/*if (quantized[i] == (int8_t)0) {//DEADZONE_MARKER_QUANT) {
+/*if (quantised[i] == (int8_t)0) {//DEADZONE_MARKER_QUANT) {
 // Stochastic reconstruction: generate Laplacian noise in deadband range
 float deadband_threshold = DEADBANDS[channel][sideband];
 
@@ -614,12 +614,12 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
 // Apply scalar (but not quantiser weight - noise is already in correct range)
 coeffs[i] = noise * TAD32_COEFF_SCALARS[sideband];
 } else {*/
-// Normal dequantization using lambda decompanding
-float normalized_val = lambda_decompanding(quantized[i], max_index);
+// Normal dequantisation using lambda decompanding
+float normalised_val = lambda_decompanding(quantised[i], max_index);
 
-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
 float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale;
-coeffs[i] = normalized_val * TAD32_COEFF_SCALARS[sideband] * weight;
+coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[sideband] * weight;
 // }
 }
 
@@ -777,13 +777,13 @@ static int tad_decode_channel_ezbc(const uint8_t *input, size_t input_size, int8
 int msb_bitplane = tad_bitstream_read_bits(&bs, 8);
 uint32_t count = tad_bitstream_read_bits(&bs, 16);
 
-// Initialize coefficient array to zero
+// Initialise coefficient array to zero
 memset(coeffs, 0, count * sizeof(int8_t));
 
 // Track coefficient significance
 tad_decode_state_t *states = calloc(count, sizeof(tad_decode_state_t));
 
-// Initialize queues
+// Initialise queues
 tad_decode_queue_t insignificant_queue, next_insignificant;
 tad_decode_queue_t significant_queue, next_significant;
 
@@ -890,7 +890,7 @@ int tad32_decode_chunk(const uint8_t *input, size_t input_size, uint8_t *pcmu8_s
 return -1;
 }
 
-// Decompress if needed
+// Decompress Zstd
 const uint8_t *payload;
 uint8_t *decompressed = NULL;
 
@@ -946,11 +946,11 @@ int tad32_decode_chunk(const uint8_t *input, size_t input_size, uint8_t *pcmu8_s
 return -1;
 }
 
-// Dequantize with quantiser scaling and spectral interpolation
+// Dequantise with quantiser scaling and spectral interpolation
 // Use quantiser_scale = 1.0f for baseline (must match encoder)
 float quantiser_scale = 1.0f;
-dequantize_dwt_coefficients(0, quant_mid, dwt_mid, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
-dequantize_dwt_coefficients(1, quant_side, dwt_side, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
+dequantise_dwt_coefficients(0, quant_mid, dwt_mid, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
+dequantise_dwt_coefficients(1, quant_side, dwt_side, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
 
 // Inverse DWT
 dwt_inverse_multilevel(dwt_mid, sample_count, dwt_levels);
@@ -11,7 +11,7 @@
 // Constants (must match encoder)
 #define TAD32_SAMPLE_RATE 32000
 #define TAD32_CHANNELS 2 // Stereo
-#define TAD_DEFAULT_CHUNK_SIZE 31991 // Default chunk size for standalone TAD files
+#define TAD_DEFAULT_CHUNK_SIZE 32768 // Default chunk size for standalone TAD files
 
 /**
 * Decode audio chunk with TAD32 codec
@@ -25,7 +25,7 @@
 *
 * Input format:
 * uint16 sample_count (samples per channel)
-* uint8 max_index (maximum quantization index)
+* uint8 max_index (maximum quantisation index)
 * uint32 payload_size (bytes in payload)
 * * payload (encoded M/S data, Zstd-compressed with EZBC)
 *
@@ -97,12 +97,12 @@ typedef struct {
 } __attribute__((packed)) tav_header_t;
 
 //=============================================================================
-// Quantization Lookup Table (matches TSVM exactly)
+// Quantisation Lookup Table (matches TSVM exactly)
 //=============================================================================
 
 static const int QLUT[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,132,136,140,144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,204,208,212,216,220,224,228,232,236,240,244,248,252,256,264,272,280,288,296,304,312,320,328,336,344,352,360,368,376,384,392,400,408,416,424,432,440,448,456,464,472,480,488,496,504,512,528,544,560,576,592,608,624,640,656,672,688,704,720,736,752,768,784,800,816,832,848,864,880,896,912,928,944,960,976,992,1008,1024,1056,1088,1120,1152,1184,1216,1248,1280,1312,1344,1376,1408,1440,1472,1504,1536,1568,1600,1632,1664,1696,1728,1760,1792,1824,1856,1888,1920,1952,1984,2016,2048,2112,2176,2240,2304,2368,2432,2496,2560,2624,2688,2752,2816,2880,2944,3008,3072,3136,3200,3264,3328,3392,3456,3520,3584,3648,3712,3776,3840,3904,3968,4032,4096};
 
-// Perceptual quantization constants (match TSVM)
+// Perceptual quantisation constants (match TSVM)
 static const float ANISOTROPY_MULT[] = {2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f};
 static const float ANISOTROPY_BIAS[] = {0.4f, 0.2f, 0.1f, 0.0f, 0.0f, 0.0f};
 static const float ANISOTROPY_MULT_CHROMA[] = {6.6f, 5.5f, 4.4f, 3.3f, 2.2f, 1.1f};
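QLUT maps an 8-bit quantiser index to a step size. Reading the table, it is piecewise linear. It runs from 1 to 64 in steps of 1, then continues with six segments of 32 entries each, doubling the step every segment (2, 4, 8, 16, 32, 64) until it reaches 4096, for 256 entries in total. The hypothetical helper below regenerates the table from that pattern. It can serve as a cross-check for a hand-edited copy against the TSVM one; it is not part of the decoder.

```c
/* Rebuilds the 256-entry QLUT: 1..64 step 1, then 6 x 32 entries with doubling steps. */
static void build_qlut(int out[256]) {
    int n = 0;
    for (int v = 1; v <= 64; v++) out[n++] = v;
    int value = 64;
    for (int step = 2; step <= 64; step *= 2) {
        for (int k = 0; k < 32; k++) {
            value += step;
            out[n++] = value;
        }
    }
}
```

The layout keeps fine control over small step sizes while still reaching coarse quantisers with one byte of index. Each 32-entry segment doubles the step value, giving roughly logarithmic spacing above 64.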
@@ -153,7 +153,7 @@ static int calculate_subband_layout(int width, int height, int decomp_levels, dw
 }
 
 //=============================================================================
-// Perceptual Quantization Model (matches TSVM exactly)
+// Perceptual Quantisation Model (matches TSVM exactly)
 //=============================================================================
 
 static int tav_derive_encoder_qindex(int q_index, int q_y_global) {
@@ -248,18 +248,18 @@ static float get_perceptual_weight(int q_index, int q_y_global, int level0, int
 }
 }
 
-static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, const int16_t *quantized,
-float *dequantized, int width, int height, int decomp_levels,
-float base_quantizer, int is_chroma, int frame_num) {
+static void dequantise_dwt_subbands_perceptual(int q_index, int q_y_global, const int16_t *quantised,
+float *dequantised, int width, int height, int decomp_levels,
+float base_quantiser, int is_chroma, int frame_num) {
 dwt_subband_info_t subbands[32]; // Max possible subbands
 const int subband_count = calculate_subband_layout(width, height, decomp_levels, subbands);
 
 const int coeff_count = width * height;
-memset(dequantized, 0, coeff_count * sizeof(float));
+memset(dequantised, 0, coeff_count * sizeof(float));
 
 int is_debug = 0;//(frame_num == 32);
 // if (frame_num == 32) {
-// fprintf(stderr, "DEBUG: dequantize called for frame %d, is_chroma=%d\n", frame_num, is_chroma);
+// fprintf(stderr, "DEBUG: dequantise called for frame %d, is_chroma=%d\n", frame_num, is_chroma);
 // }
 
 // Apply perceptual weighting to each subband
@@ -267,30 +267,30 @@ static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, cons
 const dwt_subband_info_t *subband = &subbands[s];
 const float weight = get_perceptual_weight(q_index, q_y_global, subband->level,
 subband->subband_type, is_chroma, decomp_levels);
-const float effective_quantizer = base_quantizer * weight;
+const float effective_quantiser = base_quantiser * weight;
 
 if (is_debug && !is_chroma) {
 if (subband->subband_type == 0) { // LL band
 fprintf(stderr, " Subband level %d (LL): weight=%.6f, base_q=%.1f, effective_q=%.1f, count=%d\n",
-subband->level, weight, base_quantizer, effective_quantizer, subband->coeff_count);
+subband->level, weight, base_quantiser, effective_quantiser, subband->coeff_count);
 
-// Print first 5 quantized LL coefficients
-fprintf(stderr, " First 5 quantized LL: ");
+// Print first 5 quantised LL coefficients
+fprintf(stderr, " First 5 quantised LL: ");
 for (int k = 0; k < 5 && k < subband->coeff_count; k++) {
 int idx = subband->coeff_start + k;
-fprintf(stderr, "%d ", quantized[idx]);
+fprintf(stderr, "%d ", quantised[idx]);
 }
 fprintf(stderr, "\n");
 
-// Find max quantized LL coefficient
+// Find max quantised LL coefficient
 int max_quant_ll = 0;
 for (int k = 0; k < subband->coeff_count; k++) {
 int idx = subband->coeff_start + k;
-int abs_val = quantized[idx] < 0 ? -quantized[idx] : quantized[idx];
+int abs_val = quantised[idx] < 0 ? -quantised[idx] : quantised[idx];
 if (abs_val > max_quant_ll) max_quant_ll = abs_val;
 }
-fprintf(stderr, " Max quantized LL coefficient: %d (dequantizes to %.1f)\n",
-max_quant_ll, max_quant_ll * effective_quantizer);
+fprintf(stderr, " Max quantised LL coefficient: %d (dequantises to %.1f)\n",
+max_quant_ll, max_quant_ll * effective_quantiser);
 }
 }
 
@@ -299,33 +299,33 @@ static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, cons
 if (idx < coeff_count) {
 // CRITICAL: Must ROUND to match EZBC encoder's roundf() behavior
 // Without rounding, truncation limits brightness range (e.g., Y maxes at 227 instead of 255)
-const float untruncated = quantized[idx] * effective_quantizer;
-dequantized[idx] = roundf(untruncated);
+const float untruncated = quantised[idx] * effective_quantiser;
+dequantised[idx] = roundf(untruncated);
 }
 }
 }
 
-// Debug: Verify LL band was dequantized correctly
+// Debug: Verify LL band was dequantised correctly
 if (is_debug && !is_chroma) {
 // Find LL band again to verify
 for (int s = 0; s < subband_count; s++) {
 const dwt_subband_info_t *subband = &subbands[s];
 if (subband->level == decomp_levels && subband->subband_type == 0) {
-fprintf(stderr, " AFTER all subbands processed - First 5 dequantized LL: ");
+fprintf(stderr, " AFTER all subbands processed - First 5 dequantised LL: ");
 for (int k = 0; k < 5 && k < subband->coeff_count; k++) {
 int idx = subband->coeff_start + k;
-fprintf(stderr, "%.1f ", dequantized[idx]);
+fprintf(stderr, "%.1f ", dequantised[idx]);
 }
 fprintf(stderr, "\n");
 
-// Find max dequantized LL
+// Find max dequantised LL
 float max_dequant_ll = -999.0f;
 for (int k = 0; k < subband->coeff_count; k++) {
 int idx = subband->coeff_start + k;
-float abs_val = dequantized[idx] < 0 ? -dequantized[idx] : dequantized[idx];
+float abs_val = dequantised[idx] < 0 ? -dequantised[idx] : dequantised[idx];
 if (abs_val > max_dequant_ll) max_dequant_ll = abs_val;
 }
-fprintf(stderr, " AFTER all subbands - Max dequantized LL: %.1f\n", max_dequant_ll);
+fprintf(stderr, " AFTER all subbands - Max dequantised LL: %.1f\n", max_dequant_ll);
 break;
 }
 }
@@ -360,7 +360,7 @@ static inline float tav_grain_triangular_noise(uint32_t rng_val) {
 }

 // Remove grain synthesis from DWT coefficients (decoder subtracts noise)
-// This must be called AFTER dequantization but BEFORE inverse DWT
+// This must be called AFTER dequantisation but BEFORE inverse DWT
 static void remove_grain_synthesis_decoder(float *coeffs, int width, int height,
 int decomp_levels, int frame_num, int q_y_global) {
 dwt_subband_info_t subbands[32];
@@ -647,14 +647,14 @@ static void spectral_interpolate_band(float *c, size_t len, float Q, float lower
 }

 //=============================================================================
-// Dequantization (inverse of quantization)
+// Dequantisation (inverse of quantisation)
 //=============================================================================


 #define LAMBDA_FIXED 6.0f

 // Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
-// Converts quantized index back to normalized float in [-1, 1]
+// Converts quantised index back to normalised float in [-1, 1]
 static float lambda_decompanding(int8_t quant_val, int max_index) {
 // Handle zero
 if (quant_val == 0) {
@@ -667,11 +667,11 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
 // Clamp to valid range
 if (abs_index > max_index) abs_index = max_index;

-// Map index back to normalized CDF [0, 1]
-float normalized_cdf = (float)abs_index / max_index;
+// Map index back to normalised CDF [0, 1]
+float normalised_cdf = (float)abs_index / max_index;

 // Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
-float cdf = 0.5f + normalized_cdf * 0.5f;
+float cdf = 0.5f + normalised_cdf * 0.5f;

 // Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
 // For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -684,7 +684,7 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
 return sign * abs_val;
 }

-static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
+static void dequantise_dwt_coefficients(const int8_t *quantised, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {

 // Calculate sideband boundaries dynamically
 int first_band_size = chunk_size >> dwt_levels;
@@ -696,7 +696,7 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
 sideband_starts[i] = sideband_starts[i-1] + (first_band_size << (i-2));
 }

-// Step 1: Dequantize all coefficients (no dithering yet)
+// Step 1: Dequantise all coefficients (no dithering yet)
 for (size_t i = 0; i < count; i++) {
 int sideband = dwt_levels;
 for (int s = 0; s <= dwt_levels; s++) {
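The boundary recurrence splits a chunk of `chunk_size` coefficients into `dwt_levels + 1` sidebands. The first, lowest-frequency band holds `chunk_size >> dwt_levels` coefficients, and each later band is twice as long as the one before. The last boundary therefore lands on `chunk_size` whenever it is divisible by 2^`dwt_levels`.

The sketch below computes the boundaries and looks up a coefficient's band. The diff shows only the recurrence for `i >= 2`, so the initial values `starts[0] = 0` and `starts[1] = first_band_size`, the loop bounds and the early `break` are assumptions. `compute_sideband_starts` and `sideband_of` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DWT_LEVELS 16   /* hypothetical bound for this sketch */

/* Fill starts[0..dwt_levels+1] with the first index of each sideband.
 * starts[0] and starts[1] are assumed; the recurrence is from the diff. */
void compute_sideband_starts(int chunk_size, int dwt_levels, int *starts) {
    assert(chunk_size > 0 && dwt_levels >= 1 && dwt_levels <= MAX_DWT_LEVELS);
    const int first_band_size = chunk_size >> dwt_levels;
    starts[0] = 0;
    starts[1] = first_band_size;
    for (int i = 2; i <= dwt_levels + 1; i++) {
        /* each detail band doubles in length */
        starts[i] = starts[i - 1] + (first_band_size << (i - 2));
    }
}

/* Return the sideband containing coefficient index i (defaults to the last). */
int sideband_of(size_t i, const int *starts, int dwt_levels) {
    int sideband = dwt_levels;
    for (int s = 0; s <= dwt_levels; s++) {
        if (i >= (size_t)starts[s] && i < (size_t)starts[s + 1]) {
            sideband = s;
            break;
        }
    }
    return sideband;
}
```

For `chunk_size = 1024` and three levels, this gives bands of 128, 128, 256 and 512 coefficients, and the final boundary lands exactly on 1024.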
@@ -707,11 +707,11 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
 }

 // Decode using lambda companding
-float normalized_val = lambda_decompanding(quantized[i], max_index);
+float normalised_val = lambda_decompanding(quantised[i], max_index);

-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
 float weight = BASE_QUANTISER_WEIGHTS[sideband] * quantiser_scale;
-coeffs[i] = normalized_val * TAD32_COEFF_SCALARS[sideband] * weight;
+coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[sideband] * weight;
 }

 // Step 2: Apply spectral interpolation per band
@@ -724,7 +724,7 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
 size_t band_end = sideband_starts[band + 1];
 size_t band_len = band_end - band_start;

-// Calculate quantization step Q for this band
+// Calculate quantisation step Q for this band
 float weight = BASE_QUANTISER_WEIGHTS[band] * quantiser_scale;
 float scalar = TAD32_COEFF_SCALARS[band] * weight;
 float Q = scalar / max_index;
@@ -1005,12 +1005,12 @@ static void decode_channel_ezbc(const uint8_t *ezbc_data, size_t offset, size_t
 return;
 }

-// Initialize output and state tracking
+// Initialise output and state tracking
 memset(output, 0, expected_count * sizeof(int16_t));
 int8_t *significant = calloc(expected_count, sizeof(int8_t));
 int *first_bitplane = calloc(expected_count, sizeof(int));

-// Initialize queues
+// Initialise queues
 ezbc_block_queue_t insignificant, next_insignificant, significant_queue, next_significant;
 ezbc_queue_init(&insignificant);
 ezbc_queue_init(&next_insignificant);
@@ -1398,8 +1398,8 @@ static int get_temporal_subband_level(int frame_idx, int num_frames, int tempora
 }
 }

-// Calculate temporal quantizer scale for a given temporal subband level
-static float get_temporal_quantizer_scale(int temporal_level) {
+// Calculate temporal quantiser scale for a given temporal subband level
+static float get_temporal_quantiser_scale(int temporal_level) {
 // Uses exponential scaling: 2^(BETA × level^KAPPA)
 // With BETA=0.6, KAPPA=1.14:
 // - Level 0 (tLL): 2^0.0 = 1.00
@@ -2097,7 +2097,7 @@ static int extract_audio_to_wav(const char *input_file, const char *wav_file, in
 }

 //=============================================================================
-// Decoder Initialization and Cleanup
+// Decoder Initialisation and Cleanup
 //=============================================================================

 static tav_decoder_t* tav_decoder_init(const char *input_file, const char *output_file, const char *audio_file) {
@@ -2270,9 +2270,9 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // Variable declarations for cleanup
 uint8_t *compressed_data = NULL;
 uint8_t *decompressed_data = NULL;
-int16_t *quantized_y = NULL;
-int16_t *quantized_co = NULL;
-int16_t *quantized_cg = NULL;
+int16_t *quantised_y = NULL;
+int16_t *quantised_co = NULL;
+int16_t *quantised_cg = NULL;
 int decode_success = 1; // Assume success, set to 0 on error

 // Read and decompress frame data
@@ -2357,11 +2357,11 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 } else {
 // Decode coefficients (use function-level variables for proper cleanup)
 int coeff_count = decoder->frame_size;
-quantized_y = calloc(coeff_count, sizeof(int16_t));
-quantized_co = calloc(coeff_count, sizeof(int16_t));
-quantized_cg = calloc(coeff_count, sizeof(int16_t));
+quantised_y = calloc(coeff_count, sizeof(int16_t));
+quantised_co = calloc(coeff_count, sizeof(int16_t));
+quantised_cg = calloc(coeff_count, sizeof(int16_t));

-if (!quantized_y || !quantized_co || !quantized_cg) {
+if (!quantised_y || !quantised_co || !quantised_cg) {
 fprintf(stderr, "Error: Failed to allocate coefficient buffers\n");
 decode_success = 0;
 goto write_frame;
@@ -2370,69 +2370,69 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // Postprocess coefficients based on entropy_coder value
 if (decoder->header.entropy_coder == 1) {
 // EZBC format (stub implementation)
-postprocess_coefficients_ezbc(ptr, coeff_count, quantized_y, quantized_co, quantized_cg,
+postprocess_coefficients_ezbc(ptr, coeff_count, quantised_y, quantised_co, quantised_cg,
 decoder->header.channel_layout);
 } else {
 // Default: Twobitmap format (entropy_coder=0)
-postprocess_coefficients_twobit(ptr, coeff_count, quantized_y, quantized_co, quantized_cg);
+postprocess_coefficients_twobit(ptr, coeff_count, quantised_y, quantised_co, quantised_cg);
 }

 // Debug: Check first few coefficients
 // if (decoder->frame_count == 32) {
-// fprintf(stderr, " First 10 quantized Y coeffs: ");
+// fprintf(stderr, " First 10 quantised Y coeffs: ");
 // for (int i = 0; i < 10 && i < coeff_count; i++) {
-// fprintf(stderr, "%d ", quantized_y[i]);
+// fprintf(stderr, "%d ", quantised_y[i]);
 // }
 // fprintf(stderr, "\n");
 //
-// Check for any large quantized values that should produce bright pixels
+// Check for any large quantised values that should produce bright pixels
 // int max_quant_y = 0;
 // for (int i = 0; i < coeff_count; i++) {
-// int abs_val = quantized_y[i] < 0 ? -quantized_y[i] : quantized_y[i];
+// int abs_val = quantised_y[i] < 0 ? -quantised_y[i] : quantised_y[i];
 // if (abs_val > max_quant_y) max_quant_y = abs_val;
 // }
-// fprintf(stderr, " Max quantized Y coefficient: %d\n", max_quant_y);
+// fprintf(stderr, " Max quantised Y coefficient: %d\n", max_quant_y);
 // }

-// Dequantize (perceptual for versions 5-8, uniform for 1-4)
+// Dequantise (perceptual for versions 5-8, uniform for 1-4)
 const int is_perceptual = (decoder->header.version >= 5 && decoder->header.version <= 8);
 const int is_ezbc = (decoder->header.entropy_coder == 1);

 if (is_ezbc) {
-// EZBC mode: coefficients are already denormalized by encoder
-// Just convert int16 to float without multiplying by quantizer
+// EZBC mode: coefficients are already denormalised by encoder
+// Just convert int16 to float without multiplying by quantiser
 for (int i = 0; i < coeff_count; i++) {
-decoder->dwt_buffer_y[i] = (float)quantized_y[i];
-decoder->dwt_buffer_co[i] = (float)quantized_co[i];
-decoder->dwt_buffer_cg[i] = (float)quantized_cg[i];
+decoder->dwt_buffer_y[i] = (float)quantised_y[i];
+decoder->dwt_buffer_co[i] = (float)quantised_co[i];
+decoder->dwt_buffer_cg[i] = (float)quantised_cg[i];
 }
 } else if (is_perceptual) {
-dequantize_dwt_subbands_perceptual(0, qy, quantized_y, decoder->dwt_buffer_y,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_y, decoder->dwt_buffer_y,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qy, 0, decoder->frame_count);

 // Debug: Check if values survived the function call
 // if (decoder->frame_count == 32) {
-// fprintf(stderr, " RIGHT AFTER dequantize_Y returns: first 5 values: %.1f %.1f %.1f %.1f %.1f\n",
+// fprintf(stderr, " RIGHT AFTER dequantise_Y returns: first 5 values: %.1f %.1f %.1f %.1f %.1f\n",
 // decoder->dwt_buffer_y[0], decoder->dwt_buffer_y[1], decoder->dwt_buffer_y[2],
 // decoder->dwt_buffer_y[3], decoder->dwt_buffer_y[4]);
 // }

-dequantize_dwt_subbands_perceptual(0, qy, quantized_co, decoder->dwt_buffer_co,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_co, decoder->dwt_buffer_co,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qco, 1, decoder->frame_count);
-dequantize_dwt_subbands_perceptual(0, qy, quantized_cg, decoder->dwt_buffer_cg,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_cg, decoder->dwt_buffer_cg,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qcg, 1, decoder->frame_count);
 } else {
 for (int i = 0; i < coeff_count; i++) {
-decoder->dwt_buffer_y[i] = quantized_y[i] * qy;
-decoder->dwt_buffer_co[i] = quantized_co[i] * qco;
-decoder->dwt_buffer_cg[i] = quantized_cg[i] * qcg;
+decoder->dwt_buffer_y[i] = quantised_y[i] * qy;
+decoder->dwt_buffer_co[i] = quantised_co[i] * qco;
+decoder->dwt_buffer_cg[i] = quantised_cg[i] * qcg;
 }
 }

-// Debug: Check dequantized values using correct subband layout
+// Debug: Check dequantised values using correct subband layout
 // if (decoder->frame_count == 32) {
 // dwt_subband_info_t subbands[32];
 // const int subband_count = calculate_subband_layout(decoder->header.width, decoder->header.height,
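`decode_i_or_p_frame` picks one of three reconstructions from two flags, and the GOP path in `main` uses the same two flags:

- EZBC data (`entropy_coder == 1`) is already denormalised and is only cast to float.
- Otherwise, versions 5-8 use perceptual dequantisation.
- Everything else multiplies by the channel quantiser.

The sketch below reproduces only that selection logic. The enum and function names are hypothetical, and the perceptual branch itself is out of scope.

```c
#include <assert.h>

/* Hypothetical names for the three branches in decode_i_or_p_frame(). */
typedef enum {
    DEQUANT_EZBC_PASSTHROUGH,  /* EZBC: already denormalised, cast to float */
    DEQUANT_PERCEPTUAL,        /* versions 5-8: per-subband perceptual weights */
    DEQUANT_UNIFORM            /* otherwise: multiply by channel quantiser */
} dequant_mode_t;

dequant_mode_t select_dequant_mode(int version, int entropy_coder) {
    assert(version >= 0);
    const int is_perceptual = (version >= 5 && version <= 8);
    const int is_ezbc = (entropy_coder == 1);
    if (is_ezbc) return DEQUANT_EZBC_PASSTHROUGH;   /* tested first, as in the diff */
    if (is_perceptual) return DEQUANT_PERCEPTUAL;
    return DEQUANT_UNIFORM;
}
```

The EZBC test takes precedence over the version test. Because the condition is `version >= 5 && version <= 8`, a version above 8 also falls through to uniform dequantisation, not only versions 1-4.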
@@ -2459,7 +2459,7 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // }
 // }

-// Remove grain synthesis from Y channel (must happen after dequantization, before inverse DWT)
+// Remove grain synthesis from Y channel (must happen after dequantisation, before inverse DWT)
 remove_grain_synthesis_decoder(decoder->dwt_buffer_y, decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->frame_count, decoder->header.quantiser_y);

@@ -2479,7 +2479,7 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // }

 // Apply inverse DWT with correct non-power-of-2 dimension handling
-// Note: quantized arrays freed at write_frame label
+// Note: quantised arrays freed at write_frame label
 apply_inverse_dwt_multilevel(decoder->dwt_buffer_y, decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->header.wavelet_filter);
 apply_inverse_dwt_multilevel(decoder->dwt_buffer_co, decoder->header.width, decoder->header.height,
@@ -2580,9 +2580,9 @@ write_frame:
 // Clean up temporary allocations
 if (compressed_data) free(compressed_data);
 if (decompressed_data) free(decompressed_data);
-if (quantized_y) free(quantized_y);
-if (quantized_co) free(quantized_co);
-if (quantized_cg) free(quantized_cg);
+if (quantised_y) free(quantised_y);
+if (quantised_co) free(quantised_co);
+if (quantised_cg) free(quantised_cg);

 // If decoding failed, fill frame with black to maintain stream alignment
 if (!decode_success) {
@@ -2646,7 +2646,7 @@ static void print_usage(const char *prog) {
 printf(" - TAD audio (decoded to PCMu8)\n");
 printf(" - MP2 audio (passed through)\n");
 printf(" - All wavelet types (5/3, 9/7, CDF 13/7, DD-4, Haar)\n");
-printf(" - Perceptual quantization (versions 5-8)\n");
+printf(" - Perceptual quantisation (versions 5-8)\n");
 printf(" - YCoCg-R and ICtCp color spaces\n\n");
 printf("Unsupported features (not in TSVM decoder):\n");
 printf(" - MC-EZBC motion compensation\n");
@@ -2708,7 +2708,7 @@ int main(int argc, char *argv[]) {
 // Pass 2: Decode video with audio file
 tav_decoder_t *decoder = tav_decoder_init(input_file, output_file, temp_audio_file);
 if (!decoder) {
-fprintf(stderr, "Failed to initialize decoder\n");
+fprintf(stderr, "Failed to initialise decoder\n");
 unlink(temp_audio_file); // Clean up temp file
 return 1;
 }
@@ -2853,34 +2853,34 @@ int main(int argc, char *argv[]) {

 // Postprocess coefficients based on entropy_coder value
 const int num_pixels = decoder->header.width * decoder->header.height;
-int16_t ***quantized_gop;
+int16_t ***quantised_gop;

 if (decoder->header.entropy_coder == 2) {
 // RAW format: simple concatenated int16 arrays
 if (verbose) {
 fprintf(stderr, " Using RAW postprocessing (entropy_coder=2)\n");
 }
-quantized_gop = postprocess_gop_raw(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_raw(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 } else if (decoder->header.entropy_coder == 1) {
 // EZBC format: embedded zero-block coding
 if (verbose) {
 fprintf(stderr, " Using EZBC postprocessing (entropy_coder=1)\n");
 }
-quantized_gop = postprocess_gop_ezbc(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_ezbc(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 } else {
 // Default: Twobitmap format (entropy_coder=0)
 if (verbose) {
 fprintf(stderr, " Using Twobitmap postprocessing (entropy_coder=0)\n");
 }
-quantized_gop = postprocess_gop_unified(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_unified(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 }

 free(decompressed_data);

-if (!quantized_gop) {
+if (!quantised_gop) {
 fprintf(stderr, "Error: Failed to postprocess GOP data\n");
 result = -1;
 break;
@@ -2897,78 +2897,78 @@ int main(int argc, char *argv[]) {
 gop_cg[t] = calloc(num_pixels, sizeof(float));
 }

-// Dequantize with temporal scaling (perceptual quantization for versions 5-8)
+// Dequantise with temporal scaling (perceptual quantisation for versions 5-8)
 const int is_perceptual = (decoder->header.version >= 5 && decoder->header.version <= 8);
 const int is_ezbc = (decoder->header.entropy_coder == 1);
 const int temporal_levels = 2; // Fixed for TAV GOP encoding

 for (int t = 0; t < gop_size; t++) {
 if (is_ezbc) {
-// EZBC mode: coefficients are already denormalized by encoder
-// Just convert int16 to float without multiplying by quantizer
+// EZBC mode: coefficients are already denormalised by encoder
+// Just convert int16 to float without multiplying by quantiser
 for (int i = 0; i < num_pixels; i++) {
-gop_y[t][i] = (float)quantized_gop[t][0][i];
-gop_co[t][i] = (float)quantized_gop[t][1][i];
-gop_cg[t][i] = (float)quantized_gop[t][2][i];
+gop_y[t][i] = (float)quantised_gop[t][0][i];
+gop_co[t][i] = (float)quantised_gop[t][1][i];
+gop_cg[t][i] = (float)quantised_gop[t][2][i];
 }

 if (t == 0) {
 // Debug first frame
 int16_t max_y = 0, min_y = 0;
 for (int i = 0; i < num_pixels; i++) {
-if (quantized_gop[t][0][i] > max_y) max_y = quantized_gop[t][0][i];
-if (quantized_gop[t][0][i] < min_y) min_y = quantized_gop[t][0][i];
+if (quantised_gop[t][0][i] > max_y) max_y = quantised_gop[t][0][i];
+if (quantised_gop[t][0][i] < min_y) min_y = quantised_gop[t][0][i];
 }
 fprintf(stderr, "[GOP-EZBC] Frame 0 Y coeffs range: [%d, %d], first 5: %d %d %d %d %d\n",
 min_y, max_y,
-quantized_gop[t][0][0], quantized_gop[t][0][1], quantized_gop[t][0][2],
-quantized_gop[t][0][3], quantized_gop[t][0][4]);
+quantised_gop[t][0][0], quantised_gop[t][0][1], quantised_gop[t][0][2],
+quantised_gop[t][0][3], quantised_gop[t][0][4]);
 }
 } else {
-// Normal mode: multiply by quantizer
+// Normal mode: multiply by quantiser
 const int temporal_level = get_temporal_subband_level(t, gop_size, temporal_levels);
-const float temporal_scale = get_temporal_quantizer_scale(temporal_level);
+const float temporal_scale = get_temporal_quantiser_scale(temporal_level);

-// CRITICAL: Must ROUND temporal quantizer to match encoder's roundf() behavior
+// CRITICAL: Must ROUND temporal quantiser to match encoder's roundf() behavior
 const float base_q_y = roundf(decoder->header.quantiser_y * temporal_scale);
 const float base_q_co = roundf(decoder->header.quantiser_co * temporal_scale);
 const float base_q_cg = roundf(decoder->header.quantiser_cg * temporal_scale);

 if (is_perceptual) {
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][0], gop_y[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][0], gop_y[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_y, 0, decoder->frame_count + t);
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][1], gop_co[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][1], gop_co[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_co, 1, decoder->frame_count + t);
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][2], gop_cg[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][2], gop_cg[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_cg, 1, decoder->frame_count + t);
 } else {
-// Uniform quantization for older versions
+// Uniform quantisation for older versions
 for (int i = 0; i < num_pixels; i++) {
-gop_y[t][i] = quantized_gop[t][0][i] * base_q_y;
-gop_co[t][i] = quantized_gop[t][1][i] * base_q_co;
-gop_cg[t][i] = quantized_gop[t][2][i] * base_q_cg;
+gop_y[t][i] = quantised_gop[t][0][i] * base_q_y;
+gop_co[t][i] = quantised_gop[t][1][i] * base_q_co;
+gop_cg[t][i] = quantised_gop[t][2][i] * base_q_cg;
 }
 }
 }
 }

-// Free quantized coefficients
+// Free quantised coefficients
 for (int t = 0; t < gop_size; t++) {
-free(quantized_gop[t][0]);
-free(quantized_gop[t][1]);
-free(quantized_gop[t][2]);
-free(quantized_gop[t]);
+free(quantised_gop[t][0]);
+free(quantised_gop[t][1]);
+free(quantised_gop[t][2]);
+free(quantised_gop[t]);
 }
-free(quantized_gop);
+free(quantised_gop);

 // Remove grain synthesis from Y channel for each GOP frame
-// This must happen after dequantization but before inverse DWT
+// This must happen after dequantisation but before inverse DWT
 for (int t = 0; t < gop_size; t++) {
 remove_grain_synthesis_decoder(gop_y[t], decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->frame_count + t,
@@ -100,8 +100,8 @@ static ycocg_t rgb_to_ycocg_correct(uint8_t r, uint8_t g, uint8_t b, float dithe
 return result;
 }

-static int quantize_4bit_y(float value) {
-// Y quantization: round(y * 15)
+static int quantise_4bit_y(float value) {
+// Y quantisation: round(y * 15)
 return (int)round(fmaxf(0.0f, fminf(15.0f, value * 15.0f)));
 }

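The hunk above only renames `quantize_4bit_y` to `quantise_4bit_y`; the body still scales luma by 15, clamps to [0, 15] and rounds. A minimal sketch of that mapping, assuming luma normalised to [0, 1]. `dequantise_4bit_y` is a hypothetical inverse added only for illustration and is not part of the diff.

```c
/* Same mapping as quantise_4bit_y in the diff: scale [0,1] luma to [0,15],
 * clamp, then round. After clamping the value is non-negative, so adding
 * 0.5 and truncating gives the same result as round(). */
static int quantise_4bit_y(float value) {
    float scaled = value * 15.0f;
    if (scaled < 0.0f) scaled = 0.0f;
    if (scaled > 15.0f) scaled = 15.0f;
    return (int)(scaled + 0.5f);
}

/* Hypothetical inverse (not in the diff): map a 4-bit level back to [0,1]. */
static float dequantise_4bit_y(int level) {
    return (float)level / 15.0f;
}
```

Because the clamp happens before rounding, out-of-range inputs saturate at 0 or 15 instead of wrapping.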
@@ -360,7 +360,7 @@ static void encode_ipf1_block_correct(uint8_t *rgb_data, int width, int height,
 pixels[idx] = (ycocg_t){0.0f, 0.0f, 0.0f};
 }

-y_values[idx] = quantize_4bit_y(pixels[idx].y);
+y_values[idx] = quantise_4bit_y(pixels[idx].y);
 co_values[idx] = pixels[idx].co;
 cg_values[idx] = pixels[idx].cg;
 }
@@ -567,7 +567,7 @@ static int process_audio(encoder_config_t *config, int frame_num, FILE *output)
 return 1;
 }

-// Initialize packet size on first frame
+// Initialise packet size on first frame
 if (config->mp2_packet_size == 0) {
 uint8_t header[4];
 if (fread(header, 1, 4, config->mp2_file) != 4) return 1;
@@ -589,7 +589,7 @@ static int process_audio(encoder_config_t *config, int frame_num, FILE *output)
 double packets_per_frame = frame_audio_time / packet_audio_time;

 // Only insert audio when buffer would go below 2 frames
-// Initialize with 2 packets on first frame to prime the buffer
+// Initialise with 2 packets on first frame to prime the buffer
 int packets_to_insert = 0;
 if (frame_num == 1) {
 packets_to_insert = 2;
@@ -654,7 +654,7 @@ static void write_tvdos_header(encoder_config_t *config, FILE *output) {
 fwrite(reserved, 1, 10, output);
 }

-// Initialize encoder configuration
+// Initialise encoder configuration
 static encoder_config_t *init_encoder_config() {
 encoder_config_t *config = calloc(1, sizeof(encoder_config_t));
 if (!config) return NULL;
@@ -807,7 +807,7 @@ static void print_usage(const char *program_name) {
 int main(int argc, char *argv[]) {
 encoder_config_t *config = init_encoder_config();
 if (!config) {
-fprintf(stderr, "Failed to initialize encoder\n");
+fprintf(stderr, "Failed to initialise encoder\n");
 return 1;
 }

@@ -904,7 +904,7 @@ int main(int argc, char *argv[]) {
 // Write TVDOS header
 write_tvdos_header(config, output);

-// Initialize progress tracking
+// Initialise progress tracking
 gettimeofday(&config->start_time, NULL);
 config->last_progress_time = config->start_time;
 config->total_output_bytes = 8 + 2 + 2 + 2 + 4 + 2 + 2 + 10; // TVDOS header size
@@ -19,7 +19,7 @@
 static const float TAD32_COEFF_SCALARS[] = {64.0f, 45.255f, 32.0f, 22.627f, 16.0f, 11.314f, 8.0f, 5.657f, 4.0f, 2.828f};

 // Base quantiser weight table (10 subbands: LL + 9 H bands)
-// These weights are multiplied by quantiser_scale during quantization
+// These weights are multiplied by quantiser_scale during quantisation
 static const float BASE_QUANTISER_WEIGHTS[2][10] = {
 { // mid channel
 4.0f, // LL (L9) DC
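`TAD32_COEFF_SCALARS` follows a regular pattern: each entry is the previous one divided by √2 (64, 64/√2 ≈ 45.255, 32, 32/√2 ≈ 22.627, ...). The sketch below regenerates the table from that rule. Reading the pattern as compensation for a √2 gain per DWT level is an inference, not something the diff states, and `build_coeff_scalars` is a hypothetical helper.

```c
#include <stddef.h>

#define NUM_SUBBANDS 10

/* Fill out[0..n-1] with 64 / sqrt(2)^k by repeatedly multiplying by
 * 1/sqrt(2). For n = 10 this reproduces TAD32_COEFF_SCALARS to within
 * the three decimals used in the table. */
static void build_coeff_scalars(float *out, size_t n) {
    const float inv_sqrt2 = 0.70710678f;
    float s = 64.0f;
    for (size_t k = 0; k < n; k++) {
        out[k] = s;
        s *= inv_sqrt2;
    }
}
```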
@@ -104,7 +104,7 @@ static int calculate_dwt_levels(int chunk_size) {

 // Special marker for deadzoned coefficients (will be reconstructed with noise on decode)
 #define DEADZONE_MARKER_FLOAT (-999.0f) // Unmistakable marker in float domain
-#define DEADZONE_MARKER_QUANT (-128) // Maps to this in quantized domain (int8 minimum)
+#define DEADZONE_MARKER_QUANT (-128) // Maps to this in quantised domain (int8 minimum)

 // Perceptual epsilon - coefficients below this are truly zero (inaudible)
 #define EPSILON_PERCEPTUAL 0.001f
@@ -296,7 +296,7 @@ static void calculate_preemphasis_coeffs(float *b0, float *b1, float *a1) {

 *b0 = 1.0f;
 *b1 = -alpha;
-*a1 = 0.0f; // No feedback (FIR filter)
+*a1 = 0.0f; // No feedback
 }

 // emphasis at alpha=0.5 shifts quantisation crackles to lower frequency which MIGHT be more preferable
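Per the hunk, `calculate_preemphasis_coeffs` sets `b0 = 1`, `b1 = -alpha` and `a1 = 0`, which is the first-order FIR filter y[n] = x[n] - alpha * x[n-1]. A sketch of that filter and of a one-pole de-emphasis filter that inverts it. The function names, the `state` argument carried across chunks, and the assumption that the decoder uses this exact inverse are all illustrative, not taken from the diff.

```c
#include <stddef.h>

/* Pre-emphasis with b0 = 1, b1 = -alpha, a1 = 0:
 * y[n] = x[n] - alpha * x[n-1]. 'state' holds the last input sample so
 * consecutive chunks are filtered without a seam. In-place. */
static void preemphasis(float *buf, size_t n, float alpha, float *state) {
    float prev = *state;
    for (size_t i = 0; i < n; i++) {
        float x = buf[i];
        buf[i] = x - alpha * prev;
        prev = x;
    }
    *state = prev;
}

/* De-emphasis: x[n] = y[n] + alpha * x[n-1]. This one-pole IIR exactly
 * undoes the FIR above when no quantisation happens in between. */
static void deemphasis(float *buf, size_t n, float alpha, float *state) {
    float prev = *state;
    for (size_t i = 0; i < n; i++) {
        float x = buf[i] + alpha * prev;
        buf[i] = x;
        prev = x;
    }
    *state = prev;
}
```

With alpha = 0.5, as in the comment, a constant input of 1 becomes 1, 0.5, 0.5, ... after pre-emphasis, and the de-emphasis pass restores the ones.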
@@ -372,14 +372,14 @@ static void compress_mu_law(float *left, float *right, size_t count) {
 }

 //=============================================================================
-// Quantization with Frequency-Dependent Weighting
+// Quantisation with Frequency-Dependent Weighting
 //=============================================================================

 #define LAMBDA_FIXED 6.0f

 // Lambda-based companding encoder (based on Laplacian distribution CDF)
 // val must be normalised to [-1,1]
-// Returns quantized index in range [-127, +127]
+// Returns quantised index in range [-127, +127]
 static int8_t lambda_companding(float val, int max_index) {
 // Handle zero
 if (fabsf(val) < 1e-9f) {
@@ -398,10 +398,10 @@ static int8_t lambda_companding(float val, int max_index) {
 float cdf = 1.0f - 0.5f * expf(-LAMBDA_FIXED * abs_val);

 // Map CDF from [0.5, 1.0] to [0, 1] for positive half
-float normalized_cdf = (cdf - 0.5f) * 2.0f;
+float normalised_cdf = (cdf - 0.5f) * 2.0f;

-// Quantize to index
-int index = (int)roundf(normalized_cdf * max_index);
+// Quantise to index
+int index = (int)roundf(normalised_cdf * max_index);

 // Clamp index to valid range [0, max_index]
 if (index < 0) index = 0;
@@ -410,7 +410,7 @@ static int8_t lambda_companding(float val, int max_index) {
 return (int8_t)(sign * index);
 }

-static void quantize_dwt_coefficients(int channel, const float *coeffs, int8_t *quantized, size_t count, int apply_deadzone, int chunk_size, int dwt_levels, int max_index, int *current_subband_index, float quantiser_scale) {
+static void quantise_dwt_coefficients(int channel, const float *coeffs, int8_t *quantised, size_t count, int apply_deadzone, int chunk_size, int dwt_levels, int max_index, int *current_subband_index, float quantiser_scale) {
 int first_band_size = chunk_size >> dwt_levels;

 int *sideband_starts = malloc((dwt_levels + 2) * sizeof(int));
@@ -436,14 +436,14 @@ static void quantize_dwt_coefficients(int channel, const float *coeffs, int8_t *

 // Check for deadzone marker (special handling)
 /*if (coeffs[i] == DEADZONE_MARKER_FLOAT) {
-// Map to special quantized marker for stochastic reconstruction
-quantized[i] = (int8_t)DEADZONE_MARKER_QUANT;
+// Map to special quantised marker for stochastic reconstruction
+quantised[i] = (int8_t)DEADZONE_MARKER_QUANT;
 } else {*/
-// Normal quantization
+// Normal quantisation
 float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale;
 float val = (coeffs[i] / (TAD32_COEFF_SCALARS[sideband] * weight)); // val is normalised to [-1,1]
 int8_t quant_val = lambda_companding(val, max_index);
-quantized[i] = quant_val;
+quantised[i] = quant_val;
 // }
 }

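`quantise_dwt_coefficients` finds each coefficient's subband through `sideband_starts` (built from `first_band_size = chunk_size >> dwt_levels`) and divides it by `TAD32_COEFF_SCALARS[sideband] * BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale` before companding. The lines that fill `sideband_starts` are not in the diff; the sketch assumes the usual dyadic 1D layout (LL band, then detail bands doubling in length) and a `chunk_size` divisible by 2^dwt_levels. All three helpers are hypothetical.

```c
/* Assumed dyadic layout: band 0 is LL, band k (k >= 1) has length
 * first_band_size << (k - 1). starts must hold dwt_levels + 2 entries;
 * the last entry equals chunk_size when it divides evenly. */
static void subband_starts(int chunk_size, int dwt_levels, int *starts) {
    int first_band_size = chunk_size >> dwt_levels;
    starts[0] = 0;
    starts[1] = first_band_size;
    for (int k = 2; k <= dwt_levels + 1; k++)
        starts[k] = starts[k - 1] + (first_band_size << (k - 2));
}

/* Subband index of coefficient i (0 = LL, dwt_levels = finest band). */
static int subband_of(int i, int dwt_levels, const int *starts) {
    for (int k = 0; k <= dwt_levels; k++)
        if (i < starts[k + 1]) return k;
    return dwt_levels;
}

/* Normalisation from the diff: per-band scalar times per-band weight
 * times the global quantiser_scale. */
static float normalise_coeff(float coeff, float coeff_scalar, float base_weight,
                             float quantiser_scale) {
    return coeff / (coeff_scalar * base_weight * quantiser_scale);
}
```

For a 1024-sample chunk with three levels this gives bands of 128, 128, 256 and 512 coefficients. Chunk sizes that are not a multiple of 2^dwt_levels would need extra handling that this sketch does not attempt.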
@@ -489,11 +489,11 @@ static CoeffAccumulator *side_accumulators = NULL;
 static QuantAccumulator *mid_quant_accumulators = NULL;
 static QuantAccumulator *side_quant_accumulators = NULL;
 static int num_subbands = 0;
-static int stats_initialized = 0;
+static int stats_initialised = 0;
 static int stats_dwt_levels = 0;

 static void init_statistics(int dwt_levels) {
-if (stats_initialized) return;
+if (stats_initialised) return;

 num_subbands = dwt_levels + 1;
 stats_dwt_levels = dwt_levels;
@@ -521,7 +521,7 @@ static void init_statistics(int dwt_levels) {
 side_quant_accumulators[i].count = 0;
 }

-stats_initialized = 1;
+stats_initialised = 1;
 }

 static void accumulate_coefficients(const float *coeffs, int dwt_levels, int chunk_size, CoeffAccumulator *accumulators) {
@@ -555,7 +555,7 @@ static void accumulate_coefficients(const float *coeffs, int dwt_levels, int chu
 free(sideband_starts);
 }

-static void accumulate_quantized(const int8_t *quant, int dwt_levels, int chunk_size, QuantAccumulator *accumulators) {
+static void accumulate_quantised(const int8_t *quant, int dwt_levels, int chunk_size, QuantAccumulator *accumulators) {
 int first_band_size = chunk_size >> dwt_levels;

 int *sideband_starts = malloc((dwt_levels + 2) * sizeof(int));
@@ -690,7 +690,7 @@ static int compare_value_frequency(const void *a, const void *b) {
 return 0;
 }

-static void print_top5_quantized_values(const int8_t *quant, size_t count, const char *title) {
+static void print_top5_quantised_values(const int8_t *quant, size_t count, const char *title) {
 if (count == 0) {
 fprintf(stderr, " %s: No data\n", title);
 return;
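`print_top5_quantised_values` reports the most frequent quantised values in each subband and sorts with a comparator named `compare_value_frequency`, whose body the diff elides. Because the values are `int8_t`, a 256-bin histogram followed by a sort is enough. A sketch with a hypothetical `value_freq_t` pair type and helper names; ties are broken by ascending value here, which may differ from the original.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical (value, count) pair. */
typedef struct { int value; size_t count; } value_freq_t;

/* Descending count; ties broken by ascending value. */
static int cmp_value_freq(const void *a, const void *b) {
    const value_freq_t *x = a, *y = b;
    if (x->count != y->count) return x->count < y->count ? 1 : -1;
    return (x->value > y->value) - (x->value < y->value);
}

/* Copy the k most frequent int8 values of quant into top.
 * Returns how many distinct values were found (at most k). */
static size_t top_k_int8(const int8_t *quant, size_t count, value_freq_t *top, size_t k) {
    value_freq_t bins[256];
    for (int v = 0; v < 256; v++) { bins[v].value = v - 128; bins[v].count = 0; }
    for (size_t i = 0; i < count; i++) bins[quant[i] + 128].count++;
    qsort(bins, 256, sizeof bins[0], cmp_value_freq);
    size_t n = 0;
    while (n < k && bins[n].count > 0) { top[n] = bins[n]; n++; }
    return n;
}
```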
@@ -731,9 +731,9 @@ static void print_top5_quantized_values(const int8_t *quant, size_t count, const
 }

 void tad32_print_statistics(void) {
-if (!stats_initialized) return;
+if (!stats_initialised) return;

-fprintf(stderr, "\n=== TAD Coefficient Statistics (before quantization) ===\n");
+fprintf(stderr, "\n=== TAD Coefficient Statistics (before quantisation) ===\n");

 // Print Mid channel statistics
 fprintf(stderr, "\nMid Channel:\n");
@@ -803,11 +803,11 @@ void tad32_print_statistics(void) {
 print_histogram(side_accumulators[s].data, side_accumulators[s].count, band_name);
 }

-// Print quantized values statistics
-fprintf(stderr, "\n=== TAD Quantized Values Statistics (after quantization) ===\n");
+// Print quantised values statistics
+fprintf(stderr, "\n=== TAD Quantised Values Statistics (after quantisation) ===\n");

-// Print Mid channel quantized values
-fprintf(stderr, "\nMid Channel Quantized Values:\n");
+// Print Mid channel quantised values
+fprintf(stderr, "\nMid Channel Quantised Values:\n");
 for (int s = 0; s < num_subbands; s++) {
 char band_name[32];
 if (s == 0) {
@@ -815,11 +815,11 @@ void tad32_print_statistics(void) {
 } else {
 snprintf(band_name, sizeof(band_name), "H (L%d)", stats_dwt_levels - s + 1);
 }
-print_top5_quantized_values(mid_quant_accumulators[s].data, mid_quant_accumulators[s].count, band_name);
+print_top5_quantised_values(mid_quant_accumulators[s].data, mid_quant_accumulators[s].count, band_name);
 }

-// Print Side channel quantized values
-fprintf(stderr, "\nSide Channel Quantized Values:\n");
+// Print Side channel quantised values
+fprintf(stderr, "\nSide Channel Quantised Values:\n");
 for (int s = 0; s < num_subbands; s++) {
 char band_name[32];
 if (s == 0) {
@@ -827,14 +827,14 @@ void tad32_print_statistics(void) {
 } else {
 snprintf(band_name, sizeof(band_name), "H (L%d)", stats_dwt_levels - s + 1);
 }
-print_top5_quantized_values(side_quant_accumulators[s].data, side_quant_accumulators[s].count, band_name);
+print_top5_quantised_values(side_quant_accumulators[s].data, side_quant_accumulators[s].count, band_name);
 }

 fprintf(stderr, "\n");
 }

 void tad32_free_statistics(void) {
-if (!stats_initialized) return;
+if (!stats_initialised) return;

 for (int i = 0; i < num_subbands; i++) {
 free(mid_accumulators[i].data);
@@ -851,7 +851,7 @@ void tad32_free_statistics(void) {
 side_accumulators = NULL;
 mid_quant_accumulators = NULL;
 side_quant_accumulators = NULL;
-stats_initialized = 0;
+stats_initialised = 0;
 }

 //=============================================================================
@@ -1051,7 +1051,7 @@ size_t tad_encode_channel_ezbc(int8_t *coeffs, size_t count, uint8_t **output) {
 tad_bitstream_write_bits(&bs, msb_bitplane, 8);
 tad_bitstream_write_bits(&bs, (uint32_t)count, 16);

-// Initialize queues
+// Initialise queues
 tad_block_queue_t insignificant_queue, next_insignificant;
 tad_block_queue_t significant_queue, next_significant;

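`tad_encode_channel_ezbc` begins its stream with the most significant bitplane (8 bits) and the coefficient count (16 bits); in bitplane coders of this kind the significance passes then start from that plane. A sketch of finding the bitplane for `int8_t` coefficients, plus a minimal MSB-first bit writer. The writer is a hypothetical stand-in for `tad_bitstream_write_bits`, whose bit order the diff does not show, and the `-1` result for an all-zero input is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Highest bitplane holding a set bit in any |coefficient|.
 * Returns -1 when every coefficient is zero (assumed convention). */
static int msb_bitplane(const int8_t *coeffs, size_t count) {
    int max_abs = 0;
    for (size_t i = 0; i < count; i++) {
        int a = coeffs[i] < 0 ? -coeffs[i] : coeffs[i];  /* int promotion: -(-128) is 128 */
        if (a > max_abs) max_abs = a;
    }
    int plane = -1;
    while (max_abs > 0) { plane++; max_abs >>= 1; }
    return plane;
}

/* Hypothetical MSB-first bit writer; buf must be zeroed and large enough. */
typedef struct { uint8_t *buf; size_t bit_pos; } bit_writer_t;

static void write_bits(bit_writer_t *bw, uint32_t value, int nbits) {
    for (int b = nbits - 1; b >= 0; b--) {
        if ((value >> b) & 1u)
            bw->buf[bw->bit_pos >> 3] |= (uint8_t)(0x80u >> (bw->bit_pos & 7));
        bw->bit_pos++;
    }
}
```

A 16-bit count field limits a channel to 65535 coefficients per chunk, which the 32768-sample default chunk size fits within.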
@@ -1206,14 +1206,14 @@ size_t tad32_encode_chunk(const float *pcm32_stereo, size_t num_samples,
 // apply_coeff_deadzone(0, dwt_mid, num_samples);
 // apply_coeff_deadzone(1, dwt_side, num_samples);

-// Step 4: Quantize with frequency-dependent weights and quantiser scaling
-quantize_dwt_coefficients(0, dwt_mid, quant_mid, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
-quantize_dwt_coefficients(1, dwt_side, quant_side, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
+// Step 4: Quantise with frequency-dependent weights and quantiser scaling
+quantise_dwt_coefficients(0, dwt_mid, quant_mid, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
+quantise_dwt_coefficients(1, dwt_side, quant_side, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);

-// Step 4.5: Accumulate quantized coefficient statistics if enabled
+// Step 4.5: Accumulate quantised coefficient statistics if enabled
 if (stats_enabled) {
-accumulate_quantized(quant_mid, dwt_levels, num_samples, mid_quant_accumulators);
-accumulate_quantized(quant_side, dwt_levels, num_samples, side_quant_accumulators);
+accumulate_quantised(quant_mid, dwt_levels, num_samples, mid_quant_accumulators);
+accumulate_quantised(quant_side, dwt_levels, num_samples, side_quant_accumulators);
 }

 // Step 5: Encode with binary tree EZBC (1D variant) - FIXED!
@@ -1232,7 +1232,7 @@ size_t tad32_encode_chunk(const float *pcm32_stereo, size_t num_samples,
 free(mid_ezbc);
 free(side_ezbc);

-// Step 6: Optional Zstd compression
+// Step 6: Zstd compression
 uint8_t *write_ptr = output;

 // Write chunk header
@@ -30,15 +30,15 @@ static inline int tad32_quality_to_max_index(int quality) {
 *
 * @param pcm32_stereo Input PCM32fLE stereo samples (interleaved L,R)
 * @param num_samples Number of samples per channel (min 1024)
-* @param max_index Maximum quantization index (7=3bit, 15=4bit, 31=5bit, 63=6bit, 127=7bit)
-* @param quantiser_scale Quantiser scaling factor (1.0=baseline, 2.0=2x coarser quantization)
-* Higher values = more aggressive quantization = smaller files
+* @param max_index Maximum quantisation index (7=3bit, 15=4bit, 31=5bit, 63=6bit, 127=7bit)
+* @param quantiser_scale Quantiser scaling factor (1.0=baseline, 2.0=2x coarser quantisation)
+* Higher values = more aggressive quantisation = smaller files
 * @param output Output buffer (must be large enough)
 * @return Number of bytes written to output, or 0 on error
 *
 * Output format:
 * uint16 sample_count (samples per channel)
-* uint8 max_index (maximum quantization index)
+* uint8 max_index (maximum quantisation index)
 * uint32 payload_size (bytes in payload)
 * * payload (encoded M/S data, Zstd-compressed with 2-bit twobitmap)
 */
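The `tad32_encode_chunk` comment lists the chunk header as `uint16 sample_count`, `uint8 max_index` and `uint32 payload_size`, followed by the payload: 7 header bytes in total. A sketch of writing that header. Little-endian byte order is an assumption borrowed from the `LE` in PCM32fLE; the comment does not state the header's byte order. `bits_for_max_index` simply checks the documented `max_index` to bit-depth pairs (7 = 3 bit through 127 = 7 bit). Both helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TAD_CHUNK_HEADER_SIZE 7  /* 2 + 1 + 4 bytes */

/* Serialise the documented chunk header (little-endian assumed).
 * Returns the number of bytes written. */
static size_t write_chunk_header(uint8_t *out, uint16_t sample_count,
                                 uint8_t max_index, uint32_t payload_size) {
    out[0] = (uint8_t)(sample_count & 0xFF);
    out[1] = (uint8_t)(sample_count >> 8);
    out[2] = max_index;
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(payload_size >> (8 * i));
    return TAD_CHUNK_HEADER_SIZE;
}

/* Bit depth listed for each max_index: the smallest b with 2^b >= max_index + 1. */
static int bits_for_max_index(int max_index) {
    int bits = 0;
    while ((1 << bits) < max_index + 1) bits++;
    return bits;
}
```

Because `sample_count` is a `uint16`, a single chunk can carry at most 65535 samples per channel.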
@@ -15,7 +15,7 @@
 #define ENCODER_VENDOR_STRING "Encoder-TAD32 (PCM32f version) 20251107"

 // TAD32 format constants
-#define TAD32_DEFAULT_CHUNK_SIZE 31991 // Using a prime number to force the worst condition
+#define TAD32_DEFAULT_CHUNK_SIZE 32768 // Using a prime number to force the worst condition

 // Temporary file for FFmpeg PCM extraction
 char TEMP_PCM_FILE[42];
@@ -119,7 +119,7 @@ int main(int argc, char *argv[]) {
 return 1;
 }

-// Convert quality (0-5) to max_index for quantization
+// Convert quality (0-5) to max_index for quantisation
 int max_index = tad32_quality_to_max_index(quality);

 // Generate output filename if not provided
@@ -516,7 +516,7 @@ static size_t encode_channel_ezbc(int16_t *coeffs, size_t count, int width, int
 bs.data[5], bs.data[6], bs.data[7], bs.data[8]);
 }

-// Initialize two queues: insignificant blocks and significant 1x1 blocks
+// Initialise two queues: insignificant blocks and significant 1x1 blocks
 block_queue_t insignificant_queue, next_insignificant;
 block_queue_t significant_queue, next_significant;

@@ -718,7 +718,7 @@ static void refine_motion_vector(
 }

 if (valid_pixels > 0) {
-sad /= valid_pixels; // Normalize by valid pixels
+sad /= valid_pixels; // Normalise by valid pixels
 }

 if (sad < best_sad) {
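The `refine_motion_vector` hunk divides the SAD by `valid_pixels`, so a candidate vector that pushes part of the block outside the frame is compared per pixel rather than rewarded for summing fewer terms. A sketch of that normalised cost for 8-bit planes. The function name, signature, pixel type, and the `FLT_MAX` result when nothing overlaps are assumptions; the original's types are not shown.

```c
#include <float.h>
#include <stdint.h>
#include <stdlib.h>

/* Mean absolute difference between the block at (bx, by) in cur and the
 * block displaced by (dx, dy) in ref. Pixels whose current or reference
 * position lies outside the frame are skipped, and the SAD is divided by
 * the number of pixels actually compared. Returns FLT_MAX if none are. */
static float block_mad(const uint8_t *cur, const uint8_t *ref, int width, int height,
                       int bx, int by, int block_size, int dx, int dy) {
    long sad = 0;
    int valid_pixels = 0;
    for (int y = 0; y < block_size; y++) {
        for (int x = 0; x < block_size; x++) {
            int cx = bx + x, cy = by + y;
            int rx = cx + dx, ry = cy + dy;
            if (cx >= width || cy >= height) continue;
            if (rx < 0 || ry < 0 || rx >= width || ry >= height) continue;
            sad += abs((int)cur[cy * width + cx] - (int)ref[ry * width + rx]);
            valid_pixels++;
        }
    }
    return valid_pixels > 0 ? (float)sad / (float)valid_pixels : FLT_MAX;
}
```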
@@ -1272,7 +1272,7 @@ static void free_quad_tree(quad_tree_node_t *node) {
 free(node);
 }

-// Count total nodes in quad-tree (for serialization buffer sizing)
+// Count total nodes in quad-tree (for serialisation buffer sizing)
 static int count_quad_tree_nodes(quad_tree_node_t *node) {
 if (!node) return 0;

@@ -1389,7 +1389,7 @@ static void build_mv_map_from_forest(
 ) {
 int blocks_x = (width + residual_coding_min_block_size - 1) / residual_coding_min_block_size;

-// Initialize map with zeros
+// Initialise map with zeros
 int total_blocks = blocks_x * ((height + residual_coding_min_block_size - 1) / residual_coding_min_block_size);
 memset(mv_map_x, 0, total_blocks * sizeof(int16_t));
 memset(mv_map_y, 0, total_blocks * sizeof(int16_t));
@@ -1496,12 +1496,12 @@ static void apply_spatial_mv_prediction_to_tree(
 }
 }

-// Serialize quad-tree to compact binary format
+// Serialise quad-tree to compact binary format
 // Format: [split_flags_bitstream][leaf_mv_data]
 // - split_flags: 1 bit per node (breadth-first), 1=split, 0=leaf
 // - leaf_mv_data: For each leaf in order: [skip_flag:1bit][mvd_x:15bits][mvd_y:16bits]
 // Note: MVs are now DIFFERENTIAL (predicted from spatial neighbors)
-static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
+static size_t serialise_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
 if (!root) return 0;

 // First pass: Count nodes and leaves
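The `serialise_quad_tree` comment describes the layout: one split flag per node in breadth-first order (1 = split, 0 = leaf), then per-leaf data `[skip_flag:1bit][mvd_x:15bits][mvd_y:16bits]`. A sketch of the split-flag pass and of packing one leaf into 32 bits. The node type, helper names, MSB-first bit order within each byte, and the field packing order are assumptions, not taken from the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; the real quad_tree_node_t is not shown in full. */
typedef struct qt_node {
    int is_split;
    struct qt_node *child[4];
} qt_node_t;

/* Write one split flag per node in breadth-first order (1 = split, 0 = leaf),
 * MSB-first within each byte. queue needs room for every node and flags must
 * be zeroed. Returns the number of flags written. */
static size_t write_split_flags(qt_node_t *root, qt_node_t **queue, uint8_t *flags) {
    size_t head = 0, tail = 0, bit = 0;
    if (!root) return 0;
    queue[tail++] = root;
    while (head < tail) {
        qt_node_t *n = queue[head++];
        if (n->is_split) {
            flags[bit >> 3] |= (uint8_t)(0x80u >> (bit & 7));
            for (int c = 0; c < 4; c++) queue[tail++] = n->child[c];
        }
        bit++;
    }
    return bit;
}

/* Pack [skip:1][mvd_x:15][mvd_y:16] into one word (order assumed). */
static uint32_t pack_leaf(int skip, int16_t mvd_x, int16_t mvd_y) {
    return ((uint32_t)(skip & 1) << 31) |
           ((uint32_t)((uint16_t)mvd_x & 0x7FFFu) << 16) |
           (uint32_t)(uint16_t)mvd_y;
}
```

A root split into four leaves, whose first child is split again, yields nine flags: 1 1 0 0 0 0 0 0 0.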
@@ -1512,11 +1512,11 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
 quad_tree_node_t **queue = (quad_tree_node_t**)malloc(total_nodes * sizeof(quad_tree_node_t*));
 int queue_start = 0, queue_end = 0;

-// Initialize split flags buffer
+// Initialise split flags buffer
 uint8_t *split_flags = (uint8_t*)calloc(split_bytes, 1);
 int split_bit_pos = 0;

-// Start serialization
+// Start serialisation
 queue[queue_end++] = root;
 size_t write_pos = split_bytes; // Leave space for split flags

@@ -1551,7 +1551,7 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
 if (!node->is_split) {
 // Leaf node - write skip flag + motion vectors
 if (write_pos + 5 > buffer_size) {
-fprintf(stderr, "ERROR: Quad-tree serialization buffer overflow\n");
+fprintf(stderr, "ERROR: Quad-tree serialisation buffer overflow\n");
 free(queue);
 free(split_flags);
 return 0;
@@ -1588,9 +1588,9 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
 return write_pos;
 }

-// Serialize quad-tree with bidirectional motion vectors for B-frames (64-bit leaf nodes)
+// Serialise quad-tree with bidirectional motion vectors for B-frames (64-bit leaf nodes)
 // Format: [split_flags] [leaf_data: skip(1) + fwd_mv_x(15) + fwd_mv_y(16) + bwd_mv_x(16) + bwd_mv_y(16) = 64 bits]
-static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
+static size_t serialise_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
 if (!root) return 0;

 // First pass: Count nodes and leaves
@@ -1601,11 +1601,11 @@ static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t
 quad_tree_node_t **queue = (quad_tree_node_t**)malloc(total_nodes * sizeof(quad_tree_node_t*));
 int queue_start = 0, queue_end = 0;

-// Initialize split flags buffer
+// Initialise split flags buffer
 uint8_t *split_flags = (uint8_t*)calloc(split_bytes, 1);
 int split_bit_pos = 0;

-// Start serialization
+// Start serialisation
 queue[queue_end++] = root;
 size_t write_pos = split_bytes; // Leave space for split flags

@@ -1640,7 +1640,7 @@ static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t
 if (!node->is_split) {
 // Leaf node - write skip flag + dual motion vectors
 if (write_pos + 8 > buffer_size) {
-fprintf(stderr, "ERROR: Bidirectional quad-tree serialization buffer overflow\n");
+fprintf(stderr, "ERROR: Bidirectional quad-tree serialisation buffer overflow\n");
 free(queue);
 free(split_flags);
 return 0;
@@ -2457,7 +2457,7 @@ static tav_encoder_t* create_encoder(void) {
 enc->residual_coding_min_block_size = 4; // Minimum block size
 enc->residual_coding_block_tree_root = NULL;

-// Initialize residual coding buffers (allocated in initialise_encoder)
+// Initialise residual coding buffers (allocated in initialise_encoder)
 enc->residual_coding_reference_frame_y = NULL;
 enc->residual_coding_reference_frame_co = NULL;
 enc->residual_coding_reference_frame_cg = NULL;
@@ -2493,7 +2493,7 @@ static tav_encoder_t* create_encoder(void) {
|
|||||||
enc->residual_coding_lookahead_buffer_cg = NULL;
|
enc->residual_coding_lookahead_buffer_cg = NULL;
|
||||||
enc->residual_coding_lookahead_buffer_display_index = NULL;
|
enc->residual_coding_lookahead_buffer_display_index = NULL;
|
||||||
|
|
||||||
// Two-pass mode initialization
|
// Two-pass mode initialisation
|
||||||
enc->two_pass_mode = 1; // enable by default
|
enc->two_pass_mode = 1; // enable by default
|
||||||
enc->frame_analyses = NULL;
|
enc->frame_analyses = NULL;
|
||||||
enc->frame_analyses_capacity = 0;
|
enc->frame_analyses_capacity = 0;
|
||||||
@@ -2687,7 +2687,7 @@ static int initialise_encoder(tav_encoder_t *enc) {
|
|||||||
return -1;
|
return -1;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Initialize translation vectors to zero
|
// Initialise translation vectors to zero
|
||||||
memset(enc->temporal_gop_translation_x, 0, enc->temporal_gop_capacity * sizeof(int16_t));
|
memset(enc->temporal_gop_translation_x, 0, enc->temporal_gop_capacity * sizeof(int16_t));
|
||||||
memset(enc->temporal_gop_translation_y, 0, enc->temporal_gop_capacity * sizeof(int16_t));
|
memset(enc->temporal_gop_translation_y, 0, enc->temporal_gop_capacity * sizeof(int16_t));
|
||||||
|
|
||||||
@@ -2726,7 +2726,7 @@ static int initialise_encoder(tav_encoder_t *enc) {
|
|||||||
return -1;
|
return -1;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Initialize to zero
|
// Initialise to zero
|
||||||
memset(enc->temporal_gop_mvs_fwd_x[i], 0, num_blocks * sizeof(int16_t));
|
memset(enc->temporal_gop_mvs_fwd_x[i], 0, num_blocks * sizeof(int16_t));
|
||||||
memset(enc->temporal_gop_mvs_fwd_y[i], 0, num_blocks * sizeof(int16_t));
|
memset(enc->temporal_gop_mvs_fwd_y[i], 0, num_blocks * sizeof(int16_t));
|
||||||
memset(enc->temporal_gop_mvs_bwd_x[i], 0, num_blocks * sizeof(int16_t));
|
memset(enc->temporal_gop_mvs_bwd_x[i], 0, num_blocks * sizeof(int16_t));
|
||||||
@@ -3115,7 +3115,7 @@ static void dwt_53_inverse_1d(float *data, int length) {
|
|||||||
// and estimate_motion_optical_flow are implemented in encoder_tav_opencv.cpp
|
// and estimate_motion_optical_flow are implemented in encoder_tav_opencv.cpp
|
||||||
|
|
||||||
// =============================================================================
|
// =============================================================================
|
||||||
// Temporal Subband Quantization
|
// Temporal Subband Quantisation
|
||||||
// =============================================================================
|
// =============================================================================
|
||||||
|
|
||||||
// Determine which temporal decomposition level a frame belongs to after 3D DWT
|
// Determine which temporal decomposition level a frame belongs to after 3D DWT
|
||||||
@@ -3125,7 +3125,7 @@ static void dwt_53_inverse_1d(float *data, int length) {
|
|||||||
// - Level 2 (tLH): frames 6-11 (6 frames, high-pass from 2nd decomposition)
|
// - Level 2 (tLH): frames 6-11 (6 frames, high-pass from 2nd decomposition)
|
||||||
// - Level 3 (tH): frames 12-23 (12 frames, high-pass from 1st decomposition)
|
// - Level 3 (tH): frames 12-23 (12 frames, high-pass from 1st decomposition)
|
||||||
static int get_temporal_subband_level(int frame_idx, int num_frames, int temporal_levels) {
|
static int get_temporal_subband_level(int frame_idx, int num_frames, int temporal_levels) {
|
||||||
// After temporal DWT with N levels, frames are organized as:
|
// After temporal DWT with N levels, frames are organised as:
|
||||||
// Frames 0...num_frames/(2^N) = tL...L (N low-passes, coarsest)
|
// Frames 0...num_frames/(2^N) = tL...L (N low-passes, coarsest)
|
||||||
// Remaining frames are temporal high-pass subbands at various levels
|
// Remaining frames are temporal high-pass subbands at various levels
|
||||||
|
|
||||||
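The header comment on `get_temporal_subband_level` describes the band layout for a 24-frame GOP with three temporal levels. However, most of the function body is elided from this diff. The following is a minimal re-derivation from that comment alone. The function name is hypothetical, and it assumes `num_frames` is divisible by `2^temporal_levels`.

```c
#include <assert.h>

/* Hypothetical re-derivation from the comment: the coarsest low-pass band
 * holds num_frames >> N frames (level 0). Each following high-pass band
 * doubles in size, up to the last num_frames / 2 frames (level N). */
static int temporal_level_sketch(int frame_idx, int num_frames, int temporal_levels) {
    assert(frame_idx >= 0 && frame_idx < num_frames);
    if (frame_idx < (num_frames >> temporal_levels)) return 0;
    for (int level = 1; level < temporal_levels; level++) {
        /* Band `level` ends where the next finer band begins. */
        if (frame_idx < (num_frames >> (temporal_levels - level))) return level;
    }
    return temporal_levels;  /* finest high-pass band (tH) */
}
```

With 24 frames and 3 levels, the band boundaries fall at frames 3, 6 and 12. This reproduces the comment's mapping of frames 0-2, 3-5, 6-11 and 12-23.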
@@ -3141,21 +3141,21 @@ static int get_temporal_subband_level(int frame_idx, int num_frames, int tempora
|
|||||||
return temporal_levels;
|
return temporal_levels;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Quantize 3D DWT coefficients with SEPARABLE temporal-spatial quantization
|
// Quantise 3D DWT coefficients with SEPARABLE temporal-spatial quantisation
|
||||||
//
|
//
|
||||||
// IMPORTANT: This implements a separable quantization approach (temporal × spatial)
|
// IMPORTANT: This implements a separable quantisation approach (temporal × spatial)
|
||||||
// After dwt_3d_forward(), the GOP coefficients have this structure:
|
// After dwt_3d_forward(), the GOP coefficients have this structure:
|
||||||
// - Temporal DWT applied first (24 frames → 3 levels)
|
// - Temporal DWT applied first (24 frames → 3 levels)
|
||||||
// → Results in temporal subbands: tLLL (frames 0-2), tLLH (3-5), tLH (6-11), tH (12-23)
|
// → Results in temporal subbands: tLLL (frames 0-2), tLLH (3-5), tLH (6-11), tH (12-23)
|
||||||
// - Then spatial DWT applied to each temporal subband
|
// - Then spatial DWT applied to each temporal subband
|
||||||
// → Each frame now contains 2D spatial coefficients (LL, LH, HL, HH subbands)
|
// → Each frame now contains 2D spatial coefficients (LL, LH, HL, HH subbands)
|
||||||
//
|
//
|
||||||
// Quantization strategy:
|
// Quantisation strategy:
|
||||||
// 1. Compute temporal base quantizer: tH_base(level) = Qbase_t * 2^(beta*level)
|
// 1. Compute temporal base quantiser: tH_base(level) = Qbase_t * 2^(beta*level)
|
||||||
// - tLL (level 0): coarsest temporal, most important → smallest quantizer
|
// - tLL (level 0): coarsest temporal, most important → smallest quantiser
|
||||||
// - tHH (level 2): finest temporal, less important → largest quantizer
|
// - tHH (level 2): finest temporal, less important → largest quantiser
|
||||||
// 2. Apply spatial perceptual weighting to tH_base (LL: 1.0x, LH/HL: 1.5-2.0x, HH: 2.0-3.0x)
|
// 2. Apply spatial perceptual weighting to tH_base (LL: 1.0x, LH/HL: 1.5-2.0x, HH: 2.0-3.0x)
|
||||||
// 3. Final quantizer: Q_effective = tH_base × spatial_weight
|
// 3. Final quantiser: Q_effective = tH_base × spatial_weight
|
||||||
//
|
//
|
||||||
// This separable approach is efficient and what most 3D wavelet codecs use.
|
// This separable approach is efficient and what most 3D wavelet codecs use.
|
||||||
static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
|
static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
|
||||||
@@ -3176,7 +3176,7 @@ static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
|
|||||||
// - Frames 4-7, 8-11, 12-15: tLH, tHL, tHH (levels 1-2) - temporal high-pass
|
// - Frames 4-7, 8-11, 12-15: tLH, tHL, tHH (levels 1-2) - temporal high-pass
|
||||||
int temporal_level = get_temporal_subband_level(t, num_frames, enc->temporal_decomp_levels);
|
int temporal_level = get_temporal_subband_level(t, num_frames, enc->temporal_decomp_levels);
|
||||||
|
|
||||||
// Step 2: Compute temporal base quantizer using exponential scaling
|
// Step 2: Compute temporal base quantiser using exponential scaling
|
||||||
// Formula: tH_base = Qbase_t * 1.0 * 2^(2.0 * level)
|
// Formula: tH_base = Qbase_t * 1.0 * 2^(2.0 * level)
|
||||||
// Example with Qbase_t=16:
|
// Example with Qbase_t=16:
|
||||||
// - Level 0 (tLL): 16 * 1.0 * 2^0 = 16 (same as intra-only)
|
// - Level 0 (tLL): 16 * 1.0 * 2^0 = 16 (same as intra-only)
|
||||||
@@ -3185,43 +3185,43 @@ static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
|
|||||||
float temporal_scale = powf(2.0f, BETA * powf(temporal_level, KAPPA));
|
float temporal_scale = powf(2.0f, BETA * powf(temporal_level, KAPPA));
|
||||||
float temporal_quantiser = base_quantiser * temporal_scale;
|
float temporal_quantiser = base_quantiser * temporal_scale;
|
||||||
|
|
||||||
// Convert to integer for quantization
|
// Convert to integer for quantisation
|
||||||
int temporal_base_quantiser = (int)roundf(temporal_quantiser);
|
int temporal_base_quantiser = (int)roundf(temporal_quantiser);
|
||||||
temporal_base_quantiser = CLAMP(temporal_base_quantiser, 1, 255);
|
temporal_base_quantiser = CLAMP(temporal_base_quantiser, 1, 255);
|
||||||
|
|
||||||
// Step 3: Apply spatial quantization within this temporal subband
|
// Step 3: Apply spatial quantisation within this temporal subband
|
||||||
// The existing function applies spatial perceptual weighting:
|
// The existing function applies spatial perceptual weighting:
|
||||||
// Q_effective = tH_base × spatial_weight
|
// Q_effective = tH_base × spatial_weight
|
||||||
// Where spatial_weight depends on spatial frequency (LL, LH, HL, HH subbands)
|
// Where spatial_weight depends on spatial frequency (LL, LH, HL, HH subbands)
|
||||||
// This reuses all existing perceptual weighting and dead-zone logic
|
// This reuses all existing perceptual weighting and dead-zone logic
|
||||||
//
|
//
|
||||||
// CRITICAL: Use no_normalisation variant when EZBC is enabled
|
// CRITICAL: Use no_normalisation variant when EZBC is enabled
|
||||||
// - EZBC mode: coefficients must be denormalized (quantize + multiply back)
|
// - EZBC mode: coefficients must be denormalised (quantise + multiply back)
|
||||||
// - Twobit-map/raw mode: coefficients stay normalized (quantize only)
|
// - Twobit-map/raw mode: coefficients stay normalised (quantise only)
|
||||||
if (enc->preprocess_mode == PREPROCESS_EZBC) {
|
if (enc->preprocess_mode == PREPROCESS_EZBC) {
|
||||||
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(
|
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(
|
||||||
enc,
|
enc,
|
||||||
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
|
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
|
||||||
quantised[t], // Output: quantised spatial coefficients (denormalized for EZBC)
|
quantised[t], // Output: quantised spatial coefficients (denormalised for EZBC)
|
||||||
spatial_size, // Number of spatial coefficients
|
spatial_size, // Number of spatial coefficients
|
||||||
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
|
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
|
||||||
enc->width, // Frame width
|
enc->width, // Frame width
|
||||||
enc->height, // Frame height
|
enc->height, // Frame height
|
||||||
enc->decomp_levels, // Spatial decomposition levels (typically 6)
|
enc->decomp_levels, // Spatial decomposition levels (typically 6)
|
||||||
is_chroma, // Is chroma channel (gets additional quantization)
|
is_chroma, // Is chroma channel (gets additional quantisation)
|
||||||
enc->frame_count + t // Frame number (for any frame-dependent logic)
|
enc->frame_count + t // Frame number (for any frame-dependent logic)
|
||||||
);
|
);
|
||||||
} else {
|
} else {
|
||||||
quantise_dwt_coefficients_perceptual_per_coeff(
|
quantise_dwt_coefficients_perceptual_per_coeff(
|
||||||
enc,
|
enc,
|
||||||
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
|
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
|
||||||
quantised[t], // Output: quantised spatial coefficients (normalized for twobit-map)
|
quantised[t], // Output: quantised spatial coefficients (normalised for twobit-map)
|
||||||
spatial_size, // Number of spatial coefficients
|
spatial_size, // Number of spatial coefficients
|
||||||
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
|
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
|
||||||
enc->width, // Frame width
|
enc->width, // Frame width
|
||||||
enc->height, // Frame height
|
enc->height, // Frame height
|
||||||
enc->decomp_levels, // Spatial decomposition levels (typically 6)
|
enc->decomp_levels, // Spatial decomposition levels (typically 6)
|
||||||
is_chroma, // Is chroma channel (gets additional quantization)
|
is_chroma, // Is chroma channel (gets additional quantisation)
|
||||||
enc->frame_count + t // Frame number (for any frame-dependent logic)
|
enc->frame_count + t // Frame number (for any frame-dependent logic)
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
@@ -3889,15 +3889,15 @@ static size_t encode_pframe_residual(tav_encoder_t *enc, int qY) {
|
|||||||
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
|
|
||||||
// Step 5: Quantize residual coefficients (skip for EZBC - it handles quantization implicitly)
|
// Step 5: Quantise residual coefficients (skip for EZBC - it handles quantisation implicitly)
|
||||||
int16_t *quantised_y = enc->reusable_quantised_y;
|
int16_t *quantised_y = enc->reusable_quantised_y;
|
||||||
int16_t *quantised_co = enc->reusable_quantised_co;
|
int16_t *quantised_co = enc->reusable_quantised_co;
|
||||||
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
||||||
|
|
||||||
if (enc->preprocess_mode == PREPROCESS_EZBC) {
|
if (enc->preprocess_mode == PREPROCESS_EZBC) {
|
||||||
// EZBC mode: Quantize with perceptual weighting but no normalization (division by quantizer)
|
// EZBC mode: Quantise with perceptual weighting but no normalisation (division by quantiser)
|
||||||
// EZBC will compress by encoding only significant bitplanes
|
// EZBC will compress by encoding only significant bitplanes
|
||||||
// fprintf(stderr, "[EZBC-QUANT-PFRAME] Using perceptual quantization without normalization\n");
|
// fprintf(stderr, "[EZBC-QUANT-PFRAME] Using perceptual quantisation without normalisation\n");
|
||||||
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, residual_y_dwt, quantised_y, frame_size,
|
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, residual_y_dwt, quantised_y, frame_size,
|
||||||
qY, enc->width, enc->height,
|
qY, enc->width, enc->height,
|
||||||
enc->decomp_levels, 0, 0);
|
enc->decomp_levels, 0, 0);
|
||||||
@@ -3915,9 +3915,9 @@ static size_t encode_pframe_residual(tav_encoder_t *enc, int qY) {
|
|||||||
if (abs(quantised_co[i]) > max_co) max_co = abs(quantised_co[i]);
|
if (abs(quantised_co[i]) > max_co) max_co = abs(quantised_co[i]);
|
||||||
if (abs(quantised_cg[i]) > max_cg) max_cg = abs(quantised_cg[i]);
|
if (abs(quantised_cg[i]) > max_cg) max_cg = abs(quantised_cg[i]);
|
||||||
}
|
}
|
||||||
// fprintf(stderr, "[EZBC-QUANT-PFRAME] Quantized coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
|
// fprintf(stderr, "[EZBC-QUANT-PFRAME] Quantised coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
|
||||||
} else {
|
} else {
|
||||||
// Twobit-map mode: Use traditional quantization
|
// Twobit-map mode: Use traditional quantisation
|
||||||
quantise_dwt_coefficients_perceptual_per_coeff(enc, residual_y_dwt, quantised_y, frame_size,
|
quantise_dwt_coefficients_perceptual_per_coeff(enc, residual_y_dwt, quantised_y, frame_size,
|
||||||
qY, enc->width, enc->height,
|
qY, enc->width, enc->height,
|
||||||
enc->decomp_levels, 0, 0);
|
enc->decomp_levels, 0, 0);
|
||||||
@@ -4159,17 +4159,17 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(mv_map_y);
|
free(mv_map_y);
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
// Step 5: Serialize all quad-trees (now with differential MVs)
|
// Step 5: Serialise all quad-trees (now with differential MVs)
|
||||||
// Estimate buffer size: worst case is all leaf nodes at min size
|
// Estimate buffer size: worst case is all leaf nodes at min size
|
||||||
size_t max_serialized_size = total_trees * 10000; // Conservative estimate
|
size_t max_serialised_size = total_trees * 10000; // Conservative estimate
|
||||||
uint8_t *serialized_trees = malloc(max_serialized_size);
|
uint8_t *serialised_trees = malloc(max_serialised_size);
|
||||||
size_t total_serialized = 0;
|
size_t total_serialised = 0;
|
||||||
|
|
||||||
for (int i = 0; i < total_trees; i++) {
|
for (int i = 0; i < total_trees; i++) {
|
||||||
size_t tree_size = serialize_quad_tree(tree_forest[i], serialized_trees + total_serialized,
|
size_t tree_size = serialise_quad_tree(tree_forest[i], serialised_trees + total_serialised,
|
||||||
max_serialized_size - total_serialized);
|
max_serialised_size - total_serialised);
|
||||||
if (tree_size == 0) {
|
if (tree_size == 0) {
|
||||||
fprintf(stderr, "Error: Failed to serialize quad-tree %d\n", i);
|
fprintf(stderr, "Error: Failed to serialise quad-tree %d\n", i);
|
||||||
// Cleanup and return error
|
// Cleanup and return error
|
||||||
for (int j = 0; j < total_trees; j++) {
|
for (int j = 0; j < total_trees; j++) {
|
||||||
free_quad_tree(tree_forest[j]);
|
free_quad_tree(tree_forest[j]);
|
||||||
@@ -4182,7 +4182,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(temp_mv_x);
|
free(temp_mv_x);
|
||||||
free(temp_mv_y);
|
free(temp_mv_y);
|
||||||
#endif
|
#endif
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
enc->residual_coding_block_size = saved_block_size;
|
enc->residual_coding_block_size = saved_block_size;
|
||||||
enc->residual_coding_motion_vectors_x = orig_mv_x;
|
enc->residual_coding_motion_vectors_x = orig_mv_x;
|
||||||
enc->residual_coding_motion_vectors_y = orig_mv_y;
|
enc->residual_coding_motion_vectors_y = orig_mv_y;
|
||||||
@@ -4190,7 +4190,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
enc->residual_coding_num_blocks_y = orig_blocks_y;
|
enc->residual_coding_num_blocks_y = orig_blocks_y;
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
total_serialized += tree_size;
|
total_serialised += tree_size;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Step 6: Apply DWT to residual (same as fixed blocks)
|
// Step 6: Apply DWT to residual (same as fixed blocks)
|
||||||
@@ -4208,7 +4208,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
|
|
||||||
// Step 7: Quantize residual coefficients
|
// Step 7: Quantise residual coefficients
|
||||||
int16_t *quantised_y = enc->reusable_quantised_y;
|
int16_t *quantised_y = enc->reusable_quantised_y;
|
||||||
int16_t *quantised_co = enc->reusable_quantised_co;
|
int16_t *quantised_co = enc->reusable_quantised_co;
|
||||||
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
||||||
@@ -4251,7 +4251,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(temp_mv_x);
|
free(temp_mv_x);
|
||||||
free(temp_mv_y);
|
free(temp_mv_y);
|
||||||
#endif
|
#endif
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
free(residual_y_dwt);
|
free(residual_y_dwt);
|
||||||
free(residual_co_dwt);
|
free(residual_co_dwt);
|
||||||
free(residual_cg_dwt);
|
free(residual_cg_dwt);
|
||||||
@@ -4270,17 +4270,17 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
|
|
||||||
uint8_t packet_type = TAV_PACKET_PFRAME_ADAPTIVE;
|
uint8_t packet_type = TAV_PACKET_PFRAME_ADAPTIVE;
|
||||||
uint16_t num_trees_u16 = (uint16_t)total_trees;
|
uint16_t num_trees_u16 = (uint16_t)total_trees;
|
||||||
uint32_t tree_data_size = (uint32_t)total_serialized;
|
uint32_t tree_data_size = (uint32_t)total_serialised;
|
||||||
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
|
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
|
||||||
|
|
||||||
fwrite(&packet_type, 1, 1, enc->output_fp);
|
fwrite(&packet_type, 1, 1, enc->output_fp);
|
||||||
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
|
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
|
||||||
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
|
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
|
||||||
fwrite(serialized_trees, 1, total_serialized, enc->output_fp);
|
fwrite(serialised_trees, 1, total_serialised, enc->output_fp);
|
||||||
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
|
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
|
||||||
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
|
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
|
||||||
|
|
||||||
size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialized +
|
size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialised +
|
||||||
sizeof(uint32_t) + compressed_size;
|
sizeof(uint32_t) + compressed_size;
|
||||||
|
|
||||||
// Cleanup
|
// Cleanup
|
||||||
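The adaptive P-frame packet is written as a fixed sequence of `fwrite` calls, and `packet_size` is the sum of those same fields:

- packet type (1 byte);
- tree count (`uint16_t`);
- tree data size (`uint32_t`);
- the serialised trees;
- compressed size (`uint32_t`);
- the compressed coefficients.

The buffer-based sketch below follows that order. It assumes little-endian byte order, since the encoder writes host-order integers with `fwrite`. The function name and the packet type value used in testing are hypothetical. The B-frame writer in `encode_bframe_adaptive` uses the same layout with `TAV_PACKET_BFRAME_ADAPTIVE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes v as nbytes little-endian bytes; returns nbytes. */
static size_t put_le(uint8_t *p, uint32_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) p[i] = (uint8_t)(v >> (8 * i));
    return (size_t)nbytes;
}

/* Returns the packet size, or 0 if out_size is too small. */
static size_t write_adaptive_packet_sketch(uint8_t *out, size_t out_size,
                                           uint8_t packet_type, uint16_t num_trees,
                                           const uint8_t *tree_data, uint32_t tree_data_size,
                                           const uint8_t *coeffs, uint32_t coeffs_size) {
    size_t need = 1 + 2 + 4 + (size_t)tree_data_size + 4 + (size_t)coeffs_size;
    if (need > out_size) return 0;
    size_t pos = 0;
    out[pos++] = packet_type;
    pos += put_le(out + pos, num_trees, 2);
    pos += put_le(out + pos, tree_data_size, 4);
    memcpy(out + pos, tree_data, tree_data_size);
    pos += tree_data_size;
    pos += put_le(out + pos, coeffs_size, 4);
    memcpy(out + pos, coeffs, coeffs_size);
    pos += coeffs_size;
    assert(pos == need);  /* matches the packet_size sum in the diff */
    return pos;
}
```

With 3 bytes of tree data and 2 bytes of coefficients, the packet is 1 + 2 + 4 + 3 + 4 + 2 = 16 bytes.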
@@ -4295,7 +4295,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(temp_mv_x);
|
free(temp_mv_x);
|
||||||
free(temp_mv_y);
|
free(temp_mv_y);
|
||||||
#endif
|
#endif
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
free(residual_y_dwt);
|
free(residual_y_dwt);
|
||||||
free(residual_co_dwt);
|
free(residual_co_dwt);
|
||||||
free(residual_cg_dwt);
|
free(residual_cg_dwt);
|
||||||
@@ -4311,7 +4311,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
|
|
||||||
if (enc->verbose) {
|
if (enc->verbose) {
|
||||||
printf(" P-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
|
printf(" P-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
|
||||||
total_trees, total_serialized, preprocessed_size, compressed_size,
|
total_trees, total_serialised, preprocessed_size, compressed_size,
|
||||||
(compressed_size * 100.0f) / preprocessed_size);
|
(compressed_size * 100.0f) / preprocessed_size);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -4404,16 +4404,16 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
|
|
||||||
// Note: For B-frames, we don't recompute residuals because dual predictions are already optimal
|
// Note: For B-frames, we don't recompute residuals because dual predictions are already optimal
|
||||||
|
|
||||||
// Step 5: Serialize all quad-trees with 64-bit leaf nodes
|
// Step 5: Serialise all quad-trees with 64-bit leaf nodes
|
||||||
size_t max_serialized_size = total_trees * 20000; // Conservative (2× P-frame size due to dual MVs)
|
size_t max_serialised_size = total_trees * 20000; // Conservative (2× P-frame size due to dual MVs)
|
||||||
uint8_t *serialized_trees = malloc(max_serialized_size);
|
uint8_t *serialised_trees = malloc(max_serialised_size);
|
||||||
size_t total_serialized = 0;
|
size_t total_serialised = 0;
|
||||||
|
|
||||||
for (int i = 0; i < total_trees; i++) {
|
for (int i = 0; i < total_trees; i++) {
|
||||||
size_t tree_size = serialize_quad_tree_bidirectional(tree_forest[i], serialized_trees + total_serialized,
|
size_t tree_size = serialise_quad_tree_bidirectional(tree_forest[i], serialised_trees + total_serialised,
|
||||||
max_serialized_size - total_serialized);
|
max_serialised_size - total_serialised);
|
||||||
if (tree_size == 0) {
|
if (tree_size == 0) {
|
||||||
fprintf(stderr, "Error: Failed to serialize bidirectional quad-tree %d\n", i);
|
fprintf(stderr, "Error: Failed to serialise bidirectional quad-tree %d\n", i);
|
||||||
// Cleanup and return error
|
// Cleanup and return error
|
||||||
for (int j = 0; j < total_trees; j++) {
|
for (int j = 0; j < total_trees; j++) {
|
||||||
free_quad_tree(tree_forest[j]);
|
free_quad_tree(tree_forest[j]);
|
||||||
@@ -4421,11 +4421,11 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(tree_forest);
|
free(tree_forest);
|
||||||
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
||||||
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
enc->residual_coding_block_size = saved_block_size;
|
enc->residual_coding_block_size = saved_block_size;
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
total_serialized += tree_size;
|
total_serialised += tree_size;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Step 6: Apply DWT to residual
|
// Step 6: Apply DWT to residual
|
||||||
@@ -4441,7 +4441,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
|
||||||
|
|
||||||
// Step 7: Quantize residual coefficients
|
// Step 7: Quantise residual coefficients
|
||||||
int16_t *quantised_y = enc->reusable_quantised_y;
|
int16_t *quantised_y = enc->reusable_quantised_y;
|
||||||
int16_t *quantised_co = enc->reusable_quantised_co;
|
int16_t *quantised_co = enc->reusable_quantised_co;
|
||||||
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
int16_t *quantised_cg = enc->reusable_quantised_cg;
|
||||||
@@ -4479,7 +4479,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(tree_forest);
|
free(tree_forest);
|
||||||
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
||||||
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
free(residual_y_dwt);
|
free(residual_y_dwt);
|
||||||
free(residual_co_dwt);
|
free(residual_co_dwt);
|
||||||
free(residual_cg_dwt);
|
free(residual_cg_dwt);
|
||||||
@@ -4494,17 +4494,17 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
|
|
||||||
uint8_t packet_type = TAV_PACKET_BFRAME_ADAPTIVE;
|
uint8_t packet_type = TAV_PACKET_BFRAME_ADAPTIVE;
|
||||||
uint16_t num_trees_u16 = (uint16_t)total_trees;
|
uint16_t num_trees_u16 = (uint16_t)total_trees;
|
||||||
uint32_t tree_data_size = (uint32_t)total_serialized;
|
uint32_t tree_data_size = (uint32_t)total_serialised;
|
||||||
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
|
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
|
||||||
|
|
||||||
fwrite(&packet_type, 1, 1, enc->output_fp);
|
fwrite(&packet_type, 1, 1, enc->output_fp);
|
||||||
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
|
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
|
||||||
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
|
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
|
||||||
fwrite(serialized_trees, 1, total_serialized, enc->output_fp);
|
fwrite(serialised_trees, 1, total_serialised, enc->output_fp);
|
||||||
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
|
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
|
||||||
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
|
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
|
||||||
|
|
||||||
size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialized +
|
size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialised +
|
||||||
sizeof(uint32_t) + compressed_size;
|
sizeof(uint32_t) + compressed_size;
|
||||||
|
|
||||||
// Cleanup
|
// Cleanup
|
||||||
@@ -4514,7 +4514,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
free(tree_forest);
|
free(tree_forest);
|
||||||
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
|
||||||
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
|
||||||
free(serialized_trees);
|
free(serialised_trees);
|
||||||
free(residual_y_dwt);
|
free(residual_y_dwt);
|
||||||
free(residual_co_dwt);
|
free(residual_co_dwt);
|
||||||
free(residual_cg_dwt);
|
free(residual_cg_dwt);
|
||||||
@@ -4526,7 +4526,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
|
|||||||
|
|
||||||
if (enc->verbose) {
|
if (enc->verbose) {
|
||||||
printf(" B-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
|
printf(" B-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
|
||||||
total_trees, total_serialized, preprocessed_size, compressed_size,
|
total_trees, total_serialised, preprocessed_size, compressed_size,
|
||||||
(compressed_size * 100.0f) / preprocessed_size);
|
(compressed_size * 100.0f) / preprocessed_size);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -4671,7 +4671,7 @@ static int gop_should_flush_twopass(tav_encoder_t *enc, int current_frame_number
|
|||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Flush GOP: apply 3D DWT, quantize, serialise, and write to output
|
// Flush GOP: apply 3D DWT, quantise, serialise, and write to output
|
||||||
// Returns number of bytes written, or 0 on error
|
// Returns number of bytes written, or 0 on error
|
||||||
// This function processes the entire GOP and writes all frames with temporal 3D DWT
|
// This function processes the entire GOP and writes all frames with temporal 3D DWT
|
||||||
static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
|
static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
|
||||||
@@ -4808,7 +4808,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
|
|||||||
float **canvas_cg_coeffs = malloc(actual_gop_size * sizeof(float*));
|
float **canvas_cg_coeffs = malloc(actual_gop_size * sizeof(float*));
|
||||||
|
|
||||||
for (int i = 0; i < actual_gop_size; i++) {
|
for (int i = 0; i < actual_gop_size; i++) {
|
||||||
canvas_y_coeffs[i] = calloc(canvas_pixels, sizeof(float)); // Zero-initialized
|
canvas_y_coeffs[i] = calloc(canvas_pixels, sizeof(float)); // Zero-initialised
|
||||||
canvas_co_coeffs[i] = calloc(canvas_pixels, sizeof(float));
|
canvas_co_coeffs[i] = calloc(canvas_pixels, sizeof(float));
|
||||||
canvas_cg_coeffs[i] = calloc(canvas_pixels, sizeof(float));
|
canvas_cg_coeffs[i] = calloc(canvas_pixels, sizeof(float));
|
||||||
|
|
||||||
@@ -4924,7 +4924,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Step 2: Allocate quantized coefficient buffers
|
// Step 2: Allocate quantised coefficient buffers
|
||||||
int16_t **quant_y = malloc(actual_gop_size * sizeof(int16_t*));
|
int16_t **quant_y = malloc(actual_gop_size * sizeof(int16_t*));
|
||||||
int16_t **quant_co = malloc(actual_gop_size * sizeof(int16_t*));
|
int16_t **quant_co = malloc(actual_gop_size * sizeof(int16_t*));
|
||||||
int16_t **quant_cg = malloc(actual_gop_size * sizeof(int16_t*));
|
int16_t **quant_cg = malloc(actual_gop_size * sizeof(int16_t*));
|
||||||
@@ -4935,11 +4935,11 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
 quant_cg[i] = malloc(num_pixels * sizeof(int16_t));
 }
 
-// Step 3: Quantize 3D DWT coefficients with temporal-spatial quantization
-// Use channel-specific quantizers from encoder settings
-int qY = base_quantiser; // Y quantizer passed as parameter
-int qCo = QLUT[enc->quantiser_co]; // Co quantizer from encoder
-int qCg = QLUT[enc->quantiser_cg]; // Cg quantizer from encoder
+// Step 3: Quantise 3D DWT coefficients with temporal-spatial quantisation
+// Use channel-specific quantisers from encoder settings
+int qY = base_quantiser; // Y quantiser passed as parameter
+int qCo = QLUT[enc->quantiser_co]; // Co quantiser from encoder
+int qCg = QLUT[enc->quantiser_cg]; // Cg quantiser from encoder
 
 quantise_3d_dwt_coefficients(enc, gop_y_coeffs, quant_y, actual_gop_size,
 num_pixels, qY, 0); // Luma
@@ -4983,7 +4983,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
 const size_t max_tile_size = 4 + (num_pixels * 3 * sizeof(int16_t));
 uint8_t *uncompressed_buffer = malloc(max_tile_size);
 
-// Use serialise_tile_data with DWT-transformed float coefficients (before quantization)
+// Use serialise_tile_data with DWT-transformed float coefficients (before quantisation)
 // This matches the traditional I-frame path in compress_and_write_frame
 size_t tile_size = serialise_tile_data(enc, 0, 0,
 gop_y_coeffs[0], gop_co_coeffs[0], gop_cg_coeffs[0],
@@ -5640,7 +5640,7 @@ static void dwt_3d_forward_mc(
 }
 
 // Apply 3D DWT: temporal DWT across frames, then spatial DWT on each temporal subband
-// gop_data[frame][y * width + x] - GOP buffer organized as frame-major
+// gop_data[frame][y * width + x] - GOP buffer organised as frame-major
 // Modifies gop_data in-place
 // NOTE: This is the OLD version without MC-lifting (kept for non-mesh mode)
 static void dwt_3d_forward(float **gop_data, int width, int height, int num_frames,
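The comment on `dwt_3d_forward` describes a temporal DWT across frames followed by a spatial DWT on each temporal subband, over the frame-major `gop_data[frame][y * width + x]` buffer. The sketch below covers only one level of the temporal step, applied per pixel. It uses Haar lifting purely for brevity: the filter the encoder actually applies is not visible in this hunk, and `temporal_haar_forward`/`temporal_haar_inverse` are hypothetical names.

```c
#include <stddef.h>

/* One temporal Haar lifting level across a frame-major GOP buffer.
 * After the call, frames[0 .. half-1] hold low-pass (average) subbands and
 * frames[half .. 2*half-1] hold high-pass (difference) subbands.
 * An odd trailing frame is left untouched. tmp must hold num_frames floats. */
static void temporal_haar_forward(float **frames, int num_frames,
                                  size_t num_pixels, float *tmp) {
    int half = num_frames / 2;
    for (size_t p = 0; p < num_pixels; p++) {
        for (int k = 0; k < half; k++) {
            float a = frames[2 * k][p];
            float b = frames[2 * k + 1][p];
            float d = b - a;        /* predict step: temporal detail */
            float s = a + d / 2.0f; /* update step: temporal average */
            tmp[k] = s;
            tmp[half + k] = d;
        }
        for (int f = 0; f < 2 * half; f++) frames[f][p] = tmp[f];
    }
}

/* Inverse: undo the update, then the predict, in reverse order. */
static void temporal_haar_inverse(float **frames, int num_frames,
                                  size_t num_pixels, float *tmp) {
    int half = num_frames / 2;
    for (size_t p = 0; p < num_pixels; p++) {
        for (int k = 0; k < half; k++) {
            float s = frames[k][p];
            float d = frames[half + k][p];
            float a = s - d / 2.0f; /* undo update */
            float b = a + d;        /* undo predict */
            tmp[2 * k] = a;
            tmp[2 * k + 1] = b;
        }
        for (int f = 0; f < 2 * half; f++) frames[f][p] = tmp[f];
    }
}
```

Because the lifting steps are undone in reverse order, reconstruction is exact up to floating-point rounding. A full 3D transform would then run a 2D spatial DWT on each resulting frame.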
@@ -6666,7 +6666,7 @@ static void quantise_dwt_coefficients_perceptual_per_coeff(tav_encoder_t *enc,
 }
 }
 
-// Quantization for EZBC mode: quantizes to discrete levels but doesn't normalize (shrink) values
+// Quantisation for EZBC mode: quantises to discrete levels but doesn't normalise (shrink) values
 // This reduces coefficient precision while preserving magnitude for EZBC's bitplane encoding
 static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_encoder_t *enc,
 float *coeffs, int16_t *quantised, int size,
@@ -6682,10 +6682,10 @@ static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_
 float weight = get_perceptual_weight_for_position(enc, i, width, height, decomp_levels, is_chroma);
 float effective_q = effective_base_q * weight;
 
-// Step 1: Quantize - divide by quantizer to get normalized value
+// Step 1: Quantise - divide by quantiser to get normalised value
 float quantised_val = coeffs[i] / effective_q;
 
-// Step 2: Apply dead-zone quantization to normalized value
+// Step 2: Apply dead-zone quantisation to normalised value
 if (enc->dead_zone_threshold > 0.0f && !is_chroma) {
 int level = get_subband_level(i, width, height, decomp_levels);
 int subband_type = get_subband_type(i, width, height, decomp_levels);
@@ -6715,16 +6715,16 @@ static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_
 }
 }
 
-// Step 3: Round to discrete quantization levels
+// Step 3: Round to discrete quantisation levels
 quantised_val = roundf(quantised_val); // file size explodes without rounding
 
-// Step 4: Denormalize - multiply back by quantizer to restore magnitude
-// This gives us quantized values at original scale (not shrunken to 0-10 range)
-float denormalized = quantised_val * effective_q;
+// Step 4: Denormalise - multiply back by quantiser to restore magnitude
+// This gives us quantised values at original scale (not shrunken to 0-10 range)
+float denormalised = quantised_val * effective_q;
 
 // CRITICAL FIX: Must round (not truncate) to match decoder behavior
 // With odd baseQ values and fractional weights, truncation causes mismatch with Sigmap mode
-quantised[i] = (int16_t)CLAMP((int)roundf(denormalized), -32768, 32767);
+quantised[i] = (int16_t)CLAMP((int)roundf(denormalised), -32768, 32767);
 }
 }
 
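The EZBC path above quantises in steps but keeps each result at its original scale, so EZBC's bitplane coder still sees full magnitudes. Below is a minimal sketch of steps 1, 3 and 4 for a single coefficient, assuming `effective_q > 0` stands for the base quantiser times the perceptual weight. The dead-zone step is omitted, and `quantise_keep_scale` is a hypothetical name. It rounds with the same ties-away-from-zero rule as `roundf`, but avoids a libm dependency.

```c
#include <stdint.h>

/* Round to nearest, ties away from zero (the tie rule roundf uses).
 * Assumes |v| < 2^31 so the conversion to long cannot overflow. */
static double round_half_away(double v) {
    return v >= 0.0 ? (double)(long)(v + 0.5) : -(double)(long)(-v + 0.5);
}

/* Snap coeff to the nearest multiple of effective_q, keeping original scale. */
static int16_t quantise_keep_scale(float coeff, float effective_q) {
    double level = round_half_away((double)coeff / effective_q); /* steps 1 and 3 */
    double restored = level * effective_q;                        /* step 4: denormalise */
    /* Clamp to the int16_t range, then round (not truncate) the result. */
    if (restored < -32768.0) restored = -32768.0;
    if (restored > 32767.0) restored = 32767.0;
    return (int16_t)round_half_away(restored);
}
```

With `effective_q = 7.5`, a coefficient of 10 lands on level 1 and denormalises to 7.5. Rounding stores 8, whereas truncation would store 7, which is the fractional-quantiser case the CRITICAL FIX comment warns about.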
@@ -6836,8 +6836,8 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y,
 if (mode == TAV_MODE_INTRA) {
 // INTRA mode: quantise coefficients directly and store for future reference
 if (enc->preprocess_mode == PREPROCESS_EZBC) {
-// EZBC mode: Quantize with perceptual weighting but no normalization (division by quantizer)
-// fprintf(stderr, "[EZBC-QUANT-INTRA] Using perceptual quantization without normalization\n");
+// EZBC mode: Quantise with perceptual weighting but no normalisation (division by quantiser)
+// fprintf(stderr, "[EZBC-QUANT-INTRA] Using perceptual quantisation without normalisation\n");
 quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_y_data, quantised_y, tile_size, this_frame_qY, enc->width, enc->height, enc->decomp_levels, 0, enc->frame_count);
 quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_co_data, quantised_co, tile_size, this_frame_qCo, enc->width, enc->height, enc->decomp_levels, 1, enc->frame_count);
 quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_cg_data, quantised_cg, tile_size, this_frame_qCg, enc->width, enc->height, enc->decomp_levels, 1, enc->frame_count);
@@ -6849,7 +6849,7 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y,
 if (abs(quantised_co[i]) > max_co) max_co = abs(quantised_co[i]);
 if (abs(quantised_cg[i]) > max_cg) max_cg = abs(quantised_cg[i]);
 }
-// fprintf(stderr, "[EZBC-QUANT-INTRA] Quantized coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
+// fprintf(stderr, "[EZBC-QUANT-INTRA] Quantised coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
 } else if (enc->perceptual_tuning) {
 // Perceptual quantisation: EXACTLY like uniform but with per-coefficient weights
 quantise_dwt_coefficients_perceptual_per_coeff(enc, (float*)tile_y_data, quantised_y, tile_size, this_frame_qY, enc->width, enc->height, enc->decomp_levels, 0, enc->frame_count);
@@ -7798,7 +7798,7 @@ static int start_audio_conversion(tav_encoder_t *enc) {
 // Calculate samples per frame: ceil(sample_rate / fps)
 enc->samples_per_frame = (TSVM_AUDIO_SAMPLE_RATE + enc->output_fps - 1) / enc->output_fps;
 
-// Initialize 2nd-order noise shaping error history
+// Initialise 2nd-order noise shaping error history
 enc->dither_error[0][0] = 0.0f;
 enc->dither_error[0][1] = 0.0f;
 enc->dither_error[1][0] = 0.0f;
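The `(a + b - 1) / b` idiom used for `samples_per_frame` computes ceil(a / b) in integer arithmetic for positive operands. The sketch below uses an illustrative 32000 Hz rate; the actual value of `TSVM_AUDIO_SAMPLE_RATE` is not shown in this hunk, and `samples_per_frame` here is a hypothetical free function.

```c
/* Integer ceil(sample_rate / fps); both arguments must be positive. */
static int samples_per_frame(int sample_rate, int fps) {
    return (sample_rate + fps - 1) / fps;
}
```

Because every frame rounds up, a second at 30 fps spans 30 × 1067 = 32010 sample slots, slightly more than the 32000 samples the rate supplies.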
@@ -8510,7 +8510,7 @@ static void convert_pcm32_to_pcm8_dithered(tav_encoder_t *enc, const float *pcm3
 if (shaped < -1.0f) shaped = -1.0f;
 if (shaped > 1.0f) shaped = 1.0f;
 
-// Quantize to signed 8-bit range [-128, 127]
+// Quantise to signed 8-bit range [-128, 127]
 int q = (int)lrintf(shaped * scale);
 if (q < -128) q = -128;
 else if (q > 127) q = 127;
@@ -8518,7 +8518,7 @@ static void convert_pcm32_to_pcm8_dithered(tav_encoder_t *enc, const float *pcm3
 // Convert to unsigned 8-bit [0, 255]
 pcm8[idx] = (uint8_t)(q + (int)bias);
 
-// Calculate quantization error for feedback
+// Calculate quantisation error for feedback
 float qerr = shaped - (float)q / scale;
 
 // Update error history (shift and store)
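The two hunks from `convert_pcm32_to_pcm8_dithered` clamp the shaped sample, quantise it to signed 8-bit, bias it to unsigned, and feed the quantisation error back. The following is a minimal sketch of that error-feedback loop for one channel. Several details are assumptions, not taken from the encoder: the feedback coefficients (2, -1), which give a (1 - z^-1)^2 high-pass noise transfer; `scale = 127` and `bias = 128`; and the omission of any random dither. Ties are also rounded away from zero here, whereas `lrintf` in the default rounding mode rounds ties to even. `pcm_float_to_u8_shaped` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest, ties away from zero (lrintf would round ties to even). */
static int round_nearest(float v) {
    double d = (double)v;
    return d >= 0.0 ? (int)(d + 0.5) : -(int)(-d + 0.5);
}

/* Mono float PCM in [-1, 1] to unsigned 8-bit with 2nd-order error feedback.
 * err[0] is the previous sample's quantisation error, err[1] the one before;
 * keep one err[2] history per channel across calls.
 * Assumed: feedback (2, -1), scale 127, bias 128, no random dither. */
static void pcm_float_to_u8_shaped(const float *in, uint8_t *out, size_t n,
                                   float err[2]) {
    const float scale = 127.0f;
    const int bias = 128;
    for (size_t i = 0; i < n; i++) {
        /* Add filtered past error so output noise is (1 - z^-1)^2 shaped. */
        float shaped = in[i] + 2.0f * err[0] - err[1];
        if (shaped < -1.0f) shaped = -1.0f;
        if (shaped > 1.0f) shaped = 1.0f;

        /* Quantise to signed 8-bit range [-128, 127]. */
        int q = round_nearest(shaped * scale);
        if (q < -128) q = -128;
        else if (q > 127) q = 127;

        /* Convert to unsigned 8-bit [0, 255]. */
        out[i] = (uint8_t)(q + bias);

        /* Quantisation error for feedback; shift the history. */
        float qerr = shaped - (float)q / scale;
        err[1] = err[0];
        err[0] = qerr;
    }
}
```

A small constant input shows the feedback at work: the first sample rounds down to silence, and its carried-over error pushes the second sample up by one step. Clipping at plus or minus 1.0 breaks the linear noise model for loud samples.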
@@ -8623,9 +8623,9 @@ static int write_tad_packet_samples(tav_encoder_t *enc, FILE *output, int sample
 if (tad_quality > TAD32_QUALITY_MAX) tad_quality = TAD32_QUALITY_MAX;
 if (tad_quality < TAD32_QUALITY_MIN) tad_quality = TAD32_QUALITY_MIN;
 
-// Convert quality (0-5) to max_index for quantization
+// Convert quality (0-5) to max_index for quantisation
 int max_index = tad32_quality_to_max_index(tad_quality);
-float quantiser_scale = 1.0f; // Baseline quantizer scaling
+float quantiser_scale = 1.0f; // Baseline quantiser scaling
 
 // Allocate output buffer (generous size for TAD chunk)
 size_t max_output_size = samples_to_read * 4 * sizeof(int16_t) + 1024;
@@ -8963,7 +8963,7 @@ static int process_audio_for_gop(tav_encoder_t *enc, int *frame_numbers, int num
 return 1;
 }
 
-// Handle first frame initialization (same as process_audio)
+// Handle first frame initialisation (same as process_audio)
 int first_frame_in_gop = frame_numbers[0];
 if (first_frame_in_gop == 0) {
 uint8_t header[4];
@@ -9255,7 +9255,7 @@ static double calculate_shannon_entropy(const float *coeffs, int count) {
 #define HIST_BINS 256
 int histogram[HIST_BINS] = {0};
 
-// Find min/max for normalization
+// Find min/max for normalisation
 float min_val = FLT_MAX, max_val = -FLT_MAX;
 for (int i = 0; i < count; i++) {
 float abs_val = fabsf(coeffs[i]);
@@ -9325,7 +9325,7 @@ static void compute_frame_metrics(const float *dwt_current, const float *dwt_pre
 frame_analysis_t *metrics) {
 int num_pixels = width * height;
 
-// Initialize metrics
+// Initialise metrics
 memset(metrics, 0, sizeof(frame_analysis_t));
 
 // Extract LL band (approximation coefficients)
@@ -9438,16 +9438,16 @@ static int detect_scene_change_wavelet(int frame_number,
 }
 
 // Detection rule 1: Hard cut or fast fade (LL_diff spike)
-// Improvement: Normalize LL_diff by LL_mean to handle exposure/lighting changes
-double normalized_ll_diff = current_metrics->ll_mean > 1.0 ?
+// Improvement: Normalise LL_diff by LL_mean to handle exposure/lighting changes
+double normalised_ll_diff = current_metrics->ll_mean > 1.0 ?
 current_metrics->ll_diff / current_metrics->ll_mean : current_metrics->ll_diff;
-double normalized_threshold = current_metrics->ll_mean > 1.0 ?
+double normalised_threshold = current_metrics->ll_mean > 1.0 ?
 ll_diff_threshold / current_metrics->ll_mean : ll_diff_threshold;
 
-if (normalized_ll_diff > normalized_threshold) {
+if (normalised_ll_diff > normalised_threshold) {
 if (verbose) {
-printf(" Scene change detected frame %d: Normalized LL_diff=%.4f > threshold=%.4f (raw: %.2f > %.2f)\n",
-frame_number + 1, normalized_ll_diff, normalized_threshold,
+printf(" Scene change detected frame %d: Normalised LL_diff=%.4f > threshold=%.4f (raw: %.2f > %.2f)\n",
+frame_number + 1, normalised_ll_diff, normalised_threshold,
 current_metrics->ll_diff, ll_diff_threshold);
 }
 return 1;
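Detection rule 1, as written in this hunk, divides both `ll_diff` and `ll_diff_threshold` by the same `ll_mean` whenever `ll_mean > 1.0`. Dividing both sides by the same positive number cannot change their ordering (apart from floating-point rounding when the two values are nearly equal). The decision is therefore the same as comparing the raw values, and the normalisation only changes what the verbose log prints. A minimal sketch that makes this checkable; `ll_rule_fires` is a hypothetical name.

```c
/* Detection rule 1 as written in the hunk: both sides are divided by
 * ll_mean when ll_mean > 1.0, then compared. */
static int ll_rule_fires(double ll_diff, double ll_mean, double ll_diff_threshold) {
    double norm_diff = ll_mean > 1.0 ? ll_diff / ll_mean : ll_diff;
    double norm_threshold = ll_mean > 1.0 ? ll_diff_threshold / ll_mean
                                          : ll_diff_threshold;
    /* Same ordering as ll_diff > ll_diff_threshold, up to rounding in near-ties. */
    return norm_diff > norm_threshold;
}
```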
@@ -9457,7 +9457,7 @@ static int detect_scene_change_wavelet(int frame_number,
 // Improvement: Require temporal persistence only for borderline detections
 double hb_ratio_threshold = ANALYSIS_HB_RATIO_THRESHOLD;
 
-// Calculate average highband energy from history (normalized by total energy for RMS-like measure)
+// Calculate average highband energy from history (normalised by total energy for RMS-like measure)
 double hb_energy_sum = 0.0;
 for (int i = start_idx; i < history_count; i++) {
 hb_energy_sum += metrics_history[i].highband_energy;
@@ -9884,7 +9884,7 @@ int main(int argc, char *argv[]) {
 {"dimension", required_argument, 0, 's'},
 {"fps", required_argument, 0, 'f'},
 {"quality", required_argument, 0, 'q'},
-{"quantizer", required_argument, 0, 'Q'},
+{"quantiser", required_argument, 0, 'Q'},
 {"quantiser", required_argument, 0, 'Q'},
 {"wavelet", required_argument, 0, 1010},
 {"channel-layout", required_argument, 0, 'c'},
@@ -10371,7 +10371,7 @@ int main(int argc, char *argv[]) {
 return 1;
 }
 
-// Initialize GOP boundary iterator for second pass
+// Initialise GOP boundary iterator for second pass
 enc->current_gop_boundary = enc->gop_boundaries;
 enc->two_pass_current_frame = 0;
 
|
@@ -458,11 +458,11 @@ static void colour_space_to_rgb(tev_encoder_t *enc, double c1, double c2, double
 // Pre-calculated cosine tables
 static float dct_table_16[16][16]; // For 16x16 DCT
 static float dct_table_8[8][8]; // For 8x8 DCT
-static int tables_initialized = 0;
+static int tables_initialised = 0;
 
-// Initialize the pre-calculated tables
+// Initialise the pre-calculated tables
 static void init_dct_tables(void) {
-if (tables_initialized) return;
+if (tables_initialised) return;
 
 // Pre-calculate cosine values for 16x16 DCT
 for (int u = 0; u < 16; u++) {
@@ -478,7 +478,7 @@ static void init_dct_tables(void) {
 }
 }
 
-tables_initialized = 1;
+tables_initialised = 1;
 }
 
 // 16x16 2D DCT
@@ -486,7 +486,7 @@ static void init_dct_tables(void) {
 static float temp_dct_16[BLOCK_SIZE_SQR]; // Reusable temporary buffer
 
 static void dct_16x16_fast(float *input, float *output) {
-init_dct_tables(); // Ensure tables are initialized
+init_dct_tables(); // Ensure tables are initialised
 
 // First pass: Process rows (16 1D DCTs)
 for (int row = 0; row < 16; row++) {
@@ -521,7 +521,7 @@ static void dct_16x16_fast(float *input, float *output) {
 static float temp_dct_8[HALF_BLOCK_SIZE_SQR]; // Reusable temporary buffer
 
 static void dct_8x8_fast(float *input, float *output) {
-init_dct_tables(); // Ensure tables are initialized
+init_dct_tables(); // Ensure tables are initialised
 
 // First pass: Process rows (8 1D DCTs)
 for (int row = 0; row < 8; row++) {
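Both `dct_16x16_fast` and `dct_8x8_fast` start with a row pass ("First pass: Process rows"), which indicates a separable 2D DCT. Assuming the tables hold the orthonormal DCT-II basis (the table formula itself lies outside these hunks), the two passes compute the following for an N × N block X, with N = 16 or 8:

```latex
% Assumed orthonormal DCT-II basis stored in dct_table_N
T_N[u][x] = \alpha(u)\,\cos\!\left(\frac{(2x+1)\,u\,\pi}{2N}\right),
\qquad \alpha(0) = \sqrt{1/N}, \quad \alpha(u) = \sqrt{2/N}\ \ (u > 0)

% Row pass (one 1D DCT per row), then column pass
Y = T_N \,\bigl(X\, T_N^{\mathsf T}\bigr)
```

The row pass produces X T_N^T and the column pass left-multiplies by T_N. This costs 2N^3 multiply-adds per block instead of N^4 for a direct 2D sum: 8192 versus 65536 for N = 16.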
@@ -770,11 +770,11 @@ static float complexity_to_rate_factor(float complexity) {
 float log_median = logf(median_complexity + 1.0f);
 float log_high = logf(high_complexity + 1.0f);
 
-// Normalize: 0 = median complexity, 1 = high complexity threshold
-float normalized = (log_complexity - log_median) / (log_high - log_median);
+// Normalise: 0 = median complexity, 1 = high complexity threshold
+float normalised = (log_complexity - log_median) / (log_high - log_median);
 
 // Sigmoid centered at median: f(0) ≈ 1.0, f(1) ≈ 1.6, f(-∞) ≈ 0.7
-float sigmoid = 1.0f / (1.0f + expf(-4.0f * normalized));
+float sigmoid = 1.0f / (1.0f + expf(-4.0f * normalised));
 float rate_factor = 0.7f + 0.9f * sigmoid; // Range: 0.7 to 1.6
 
 // Clamp to prevent extreme coefficient amplification/reduction
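Written out, `complexity_to_rate_factor` maps complexity to a rate factor as follows, with m = `median_complexity`, h = `high_complexity` and ℓ_c = `log_complexity` (whose definition lies outside this hunk):

```latex
t = \frac{\ell_c - \ln(m + 1)}{\ln(h + 1) - \ln(m + 1)},
\qquad
r(t) = 0.7 + \frac{0.9}{1 + e^{-4t}}

r(0) = 0.7 + 0.9 \cdot 0.5 = 1.15,
\qquad
r(1) = 0.7 + \frac{0.9}{1 + e^{-4}} \approx 1.584,
\qquad
\lim_{t \to -\infty} r(t) = 0.7,
\qquad
\lim_{t \to +\infty} r(t) = 1.6
```

These values agree with the comments' f(1) ≈ 1.6 and the 0.7 to 1.6 range. At the median itself, however, the sigmoid equals 0.5, so the factor comes out at 1.15 rather than the ≈ 1.0 that the comment states.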
@@ -787,7 +787,7 @@ static float complexity_to_rate_factor(float complexity) {
 static void add_complexity_value(tev_encoder_t *enc, float complexity) {
 if (!enc->stats_mode) return;
 
-// Initialize array if needed
+// Initialise array if needed
 if (!enc->complexity_values) {
 enc->complexity_capacity = 10000; // Initial capacity
 enc->complexity_values = malloc(enc->complexity_capacity * sizeof(float));
@@ -1416,7 +1416,7 @@ static subtitle_entry_t* parse_srt_file(const char *filename, int fps) {
 continue;
 }
 
-// Initialize text buffer
+// Initialise text buffer
 text_buffer_size = 256;
 text_buffer = malloc(text_buffer_size);
 if (!text_buffer) {
@@ -1917,7 +1917,7 @@ static int write_all_subtitles_tc(tev_encoder_t *enc, FILE *output) {
 return bytes_written;
 }
 
-// Initialize encoder
+// Initialise encoder
 static tev_encoder_t* init_encoder(void) {
 tev_encoder_t *enc = calloc(1, sizeof(tev_encoder_t));
 if (!enc) return NULL;
@@ -1997,10 +1997,10 @@ static int alloc_encoder_buffers(tev_encoder_t *enc) {
 return -1;
 }
 
-// Initialize Zstd compression context
+// Initialise Zstd compression context
 enc->zstd_context = ZSTD_createCCtx();
 if (!enc->zstd_context) {
-fprintf(stderr, "Failed to initialize Zstd compression\n");
+fprintf(stderr, "Failed to initialise Zstd compression\n");
 return 0;
 }
 
@@ -2009,7 +2009,7 @@ static int alloc_encoder_buffers(tev_encoder_t *enc) {
 ZSTD_CCtx_setParameter(enc->zstd_context, ZSTD_c_windowLog, 24); // 16MB window (should be plenty to hold an entire frame; interframe compression is unavailable)
 ZSTD_CCtx_setParameter(enc->zstd_context, ZSTD_c_hashLog, 16);
 
-// Initialize previous frame to black
+// Initialise previous frame to black
 memset(enc->previous_rgb, 0, encoding_pixels * 3);
 memset(enc->previous_even_field, 0, encoding_pixels * 3);
 
@@ -2467,7 +2467,7 @@ static int process_audio(tev_encoder_t *enc, int frame_num, FILE *output) {
 return 1;
 }
 
-// Initialize packet size on first frame
+// Initialise packet size on first frame
 if (enc->mp2_packet_size == 0) {
 uint8_t header[4];
 if (fread(header, 1, 4, enc->mp2_file) != 4) return 1;
@@ -2665,7 +2665,7 @@ int main(int argc, char *argv[]) {
 {"fps", required_argument, 0, 'f'},
 {"quality", required_argument, 0, 'q'},
 {"quantiser", required_argument, 0, 'Q'},
-{"quantizer", required_argument, 0, 'Q'},
+{"quantiser", required_argument, 0, 'Q'},
 {"bitrate", required_argument, 0, 'b'},
 {"arate", required_argument, 0, 1400},
 {"progressive", no_argument, 0, 'p'},
@@ -2793,7 +2793,7 @@ int main(int argc, char *argv[]) {
 
 if (enc->ictcp_mode) {
 // ICtCp: Ct and Cp have different characteristics than YCoCg Co/Cg
-// Cp channel now uses specialized quantisation table, so moderate quality is fine
+// Cp channel now uses specialised quantisation table, so moderate quality is fine
 int base_chroma_quality = enc->qualityCo;
 enc->qualityCo = base_chroma_quality; // Ct channel: keep original Co quantisation
 enc->qualityCg = base_chroma_quality; // Cp channel: same quality since Q_Cp_8 handles detail preservation
|
@@ -21,7 +21,7 @@ static inline uint8_t range_decoder_get_byte(RangeDecoder *dec) {
 return 0;
 }
 
-static void range_encoder_renormalize(RangeEncoder *enc) {
+static void range_encoder_renormalise(RangeEncoder *enc) {
 while (enc->range <= BOTTOM_VALUE) {
 range_encoder_put_byte(enc, (enc->low >> 24) & 0xFF);
 enc->low <<= 8;
@@ -29,7 +29,7 @@ static void range_encoder_renormalize(RangeEncoder *enc) {
 }
 }
 
-static void range_decoder_renormalize(RangeDecoder *dec) {
+static void range_decoder_renormalise(RangeDecoder *dec) {
 while (dec->range <= BOTTOM_VALUE) {
 dec->code = (dec->code << 8) | range_decoder_get_byte(dec);
 dec->low <<= 8;
@@ -66,7 +66,7 @@ void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_
 double cdf_low = (value == -max_abs_value) ? 0.0 : laplacian_cdf(value - 1, lambda);
 double cdf_high = laplacian_cdf(value, lambda);
 
-// Normalize to get cumulative counts in range [0, SCALE]
+// Normalise to get cumulative counts in range [0, SCALE]
 const uint32_t SCALE = 0x10000; // 65536 for precision
 uint32_t cum_low = (uint32_t)(cdf_low * SCALE);
 uint32_t cum_high = (uint32_t)(cdf_high * SCALE);
@@ -80,7 +80,7 @@ void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_
 enc->low += (uint32_t)((range_64 * cum_low) / SCALE);
 enc->range = (uint32_t)((range_64 * (cum_high - cum_low)) / SCALE);
 
-range_encoder_renormalize(enc);
+range_encoder_renormalise(enc);
 }
 
 size_t range_encoder_finish(RangeEncoder *enc) {
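The range-coder hunks narrow the coding interval with 64-bit intermediates, then call `range_encoder_renormalise`, which emits the top byte of `low` while `range <= BOTTOM_VALUE`. The following is a sketch of just those two steps. It takes the symbol's cumulative counts, already scaled to 65536 as in the Laplacian hunk, as inputs. Several details are assumptions: `BOTTOM_VALUE = 1 << 16`; that renormalisation also shifts `range` left by 8, since the hunk is cut off after `low <<= 8`; and the absence of carry propagation, so this shows the interval arithmetic rather than a complete coder. `SketchEncoder` and the `sketch_*` functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SCALE 0x10000u    /* cumulative counts lie in [0, SKETCH_SCALE] */
#define SKETCH_BOTTOM (1u << 16) /* assumed BOTTOM_VALUE */

typedef struct {
    uint32_t low;   /* bottom of the current interval */
    uint32_t range; /* width of the current interval */
    uint8_t *buf;
    size_t pos, cap;
} SketchEncoder;

static void sketch_put_byte(SketchEncoder *e, uint8_t b) {
    if (e->pos < e->cap) e->buf[e->pos++] = b;
}

/* Emit the top byte of low while the interval is too narrow to subdivide. */
static void sketch_renormalise(SketchEncoder *e) {
    while (e->range <= SKETCH_BOTTOM) {
        sketch_put_byte(e, (uint8_t)((e->low >> 24) & 0xFF));
        e->low <<= 8;
        e->range <<= 8; /* assumed: not visible in the hunk */
    }
}

/* Narrow [low, low + range) to the symbol's slice [cum_low, cum_high) of
 * SKETCH_SCALE using a 64-bit product, as range_64 does in the encoder.
 * Requires cum_low < cum_high <= SKETCH_SCALE. No carry propagation. */
static void sketch_encode(SketchEncoder *e, uint32_t cum_low, uint32_t cum_high) {
    uint64_t range_64 = e->range;
    e->low += (uint32_t)((range_64 * cum_low) / SKETCH_SCALE);
    e->range = (uint32_t)((range_64 * (cum_high - cum_low)) / SKETCH_SCALE);
    sketch_renormalise(e);
}
```

The 64-bit product matters: `range * cum_low` can reach about 2^48, which would overflow a 32-bit multiply.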
@@ -137,7 +137,7 @@ int16_t range_decode_int16_laplacian(RangeDecoder *dec, int16_t max_abs_value, f
 dec->low += (uint32_t)((range_64 * cum_low) / SCALE);
 dec->range = (uint32_t)((range_64 * (cum_high - cum_low)) / SCALE);
 
-range_decoder_renormalize(dec);
+range_decoder_renormalise(dec);
 return value;
 } else if (cum_freq < cum_low) {
 high = mid - 1;
@@ -147,6 +147,6 @@ int16_t range_decode_int16_laplacian(RangeDecoder *dec, int16_t max_abs_value, f
 }
 
 // Fallback: shouldn't happen with correct encoding
-range_decoder_renormalize(dec);
+range_decoder_renormalise(dec);
 return value;
 }
|
@@ -24,16 +24,16 @@ typedef struct {
 size_t buffer_size;
 } RangeDecoder;
 
-// Initialize encoder
+// Initialise encoder
 void range_encoder_init(RangeEncoder *enc, uint8_t *buffer, size_t capacity);
 
 // Encode a signed 16-bit value with Laplacian distribution (λ=5.0, μ=0)
 void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_abs_value, float lambda);
 
-// Finalize encoding and return bytes written
+// Finalise encoding and return bytes written
 size_t range_encoder_finish(RangeEncoder *enc);
 
-// Initialize decoder
+// Initialise decoder
 void range_decoder_init(RangeDecoder *dec, const uint8_t *buffer, size_t size);
 
 // Decode a signed 16-bit value with Laplacian distribution (λ=5.0, μ=0)
|
|||||||
@@ -531,7 +531,7 @@ static const char* VERDESC[] = {"null", "YCoCg tiled, uniform", "ICtCp tiled, un
 if (wavelet == 255) printf(" (Haar)");
 printf("\n");
 printf(" Decomp levels: %d\n", decomp_levels);
-printf(" Quantizers: Y=%d, Co=%d, Cg=%d (Index=%d,%d,%d)\n", QLUT[quant_y], QLUT[quant_co], QLUT[quant_cg], quant_y, quant_co, quant_cg);
+printf(" Quantisers: Y=%d, Co=%d, Cg=%d (Index=%d,%d,%d)\n", QLUT[quant_y], QLUT[quant_co], QLUT[quant_cg], quant_y, quant_co, quant_cg);
 if (quality > 0)
 printf(" Quality: %d\n", quality - 1);
 else
@@ -270,7 +270,7 @@ int main(int argc, char** argv) {
 avg_motion /= (mesh_w * mesh_h);
 printf(" Motion: avg=%.2f px, max=%.2f px\n\n", avg_motion, max_motion);

-// Save visualization for worst case
+// Save visualisation for worst case
 if (test == 0 || roundtrip_psnr < 30.0) {
 char filename[256];
 sprintf(filename, "roundtrip_%04d_original.png", frame_num);
@@ -293,7 +293,7 @@ int main(int argc, char** argv) {
 }
 sprintf(filename, "roundtrip_%04d_diff.png", frame_num);
 cv::imwrite(filename, diff_roundtrip);
-printf(" Saved visualization: roundtrip_%04d_*.png\n\n", frame_num);
+printf(" Saved visualisation: roundtrip_%04d_*.png\n\n", frame_num);
 }

 free(flow_x);
@@ -158,7 +158,7 @@ static void apply_mesh_warp_rgb(
 }
 }

-// Create visualization overlay showing affine cells
+// Create visualisation overlay showing affine cells
 static void create_affine_overlay(
 cv::Mat &img,
 const uint8_t *affine_mask,
@@ -334,7 +334,7 @@ int main(int argc, char** argv) {
 affine_mask, affine_a11, affine_a12, affine_a21, affine_a22,
 mesh_w, mesh_h);

-// Create visualization with affine overlay
+// Create visualisation with affine overlay
 cv::Mat warped_viz = warped.clone();
 create_affine_overlay(warped_viz, affine_mask, mesh_w, mesh_h);
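`create_affine_overlay` marks the mesh cells flagged in `affine_mask` on a copy of the warped frame. The OpenCV version is not shown in the diff, so the following is only a plain-C sketch of the same idea, assuming a row-major `mesh_h x mesh_w` mask and a packed 8-bit, 3-channel buffer (channel order is whatever the buffer uses; `cv::Mat` images are usually BGR). `tint_affine_cells` and its tint parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend a tint into every pixel of each mesh cell whose affine_mask entry is
   non-zero. img is packed 3-channel 8-bit, row-major, width*height*3 bytes.
   Cell bounds use integer division, so the cells tile the whole image even
   when width/height are not multiples of mesh_w/mesh_h. */
static void tint_affine_cells(uint8_t *img, int width, int height,
                              const uint8_t *affine_mask, int mesh_w, int mesh_h,
                              uint8_t tint0, uint8_t tint1, uint8_t tint2)
{
    assert(width > 0 && height > 0 && mesh_w > 0 && mesh_h > 0);
    for (int cy = 0; cy < mesh_h; cy++) {
        int y0 = cy * height / mesh_h, y1 = (cy + 1) * height / mesh_h;
        for (int cx = 0; cx < mesh_w; cx++) {
            if (!affine_mask[cy * mesh_w + cx]) continue;  /* cell not affine */
            int x0 = cx * width / mesh_w, x1 = (cx + 1) * width / mesh_w;
            for (int y = y0; y < y1; y++) {
                for (int x = x0; x < x1; x++) {
                    uint8_t *p = img + ((size_t)y * width + x) * 3;
                    /* 50/50 blend between the original pixel and the tint */
                    p[0] = (uint8_t)((p[0] + tint0) / 2);
                    p[1] = (uint8_t)((p[1] + tint1) / 2);
                    p[2] = (uint8_t)((p[2] + tint2) / 2);
                }
            }
        }
    }
}
```

Blending rather than overwriting keeps the warped content visible underneath, which is what makes such an overlay useful for judging where affine cells were chosen.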