TAV/TAD doc update

This commit is contained in:
minjaesong
2025-11-10 17:01:44 +09:00
parent edb951fb1a
commit c1d6a959f5
18 changed files with 512 additions and 423 deletions


@@ -83,11 +83,11 @@ Use the build scripts in `buildapp/`:
- `assets/disk0/`: Virtual disk content including TVDOS system files
- `assets/bios/`: BIOS ROM files and implementations
- `My_BASIC_Programs/`: Example BASIC programs for testing
- TVDOS filesystem uses custom format with specialised drivers
## Videotron2K
The Videotron2K is a specialised video display controller with:
- Assembly-like programming language
- 6 general registers (r1-r6) and special registers (tmr, frm, px, py, c1-c6)
- Scene-based programming model
@@ -148,7 +148,7 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
- **Features**:
- 16×16 DCT blocks (vs 4×4 in iPF) for better compression
- Motion compensation with ±8 pixel search range
- YCoCg-R 4:2:0 Chroma subsampling (more aggressive quantisation on Cg channel)
- Full 8-Bit RGB colour for increased visual fidelity, rendered down to TSVM-compliant 4-Bit RGB with dithering upon playback
- **Usage Examples**:
```bash
@@ -163,7 +163,7 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
#### TAV Format (TSVM Advanced Video)
- **Successor to TEV**: DWT-based video codec using wavelet transforms instead of DCT
- **C Encoder**: `video_encoder/encoder_tav.c` - Multi-wavelet encoder with perceptual quantisation
- How to build: `make tav`
- **Wavelet Support**: Multiple wavelet types for different compression characteristics
- **JS Decoder**: `assets/disk0/tvdos/bin/playtav.js` - Native decoder for TAV format playback
@@ -172,8 +172,8 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
- **Features**:
- **Multiple Wavelet Types**: 5/3 reversible, 9/7 irreversible, CDF 13/7, DD-4, Haar
- **Single-tile encoding**: One large DWT tile for optimal quality (no blocking artifacts)
- **Perceptual quantisation**: HVS-optimized coefficient scaling
- **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantisation (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
- **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size)
- **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update)
- **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced)
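To make the YCoCg-R step concrete, here is a minimal sketch of the reversible transform pair. The integer-lifting form below is the standard YCoCg-R construction; the function names are illustrative and not taken from `encoder_tav.c`.

```kotlin
// Reversible YCoCg-R lifting transform (sketch; names are illustrative).
// Arithmetic right shifts implement floor division, so the inverse is bit-exact.
fun rgbToYCoCgR(r: Int, g: Int, b: Int): Triple<Int, Int, Int> {
    val co = r - b
    val tmp = b + (co shr 1)
    val cg = g - tmp
    val y = tmp + (cg shr 1)
    return Triple(y, co, cg)
}

fun yCoCgRToRgb(y: Int, co: Int, cg: Int): Triple<Int, Int, Int> {
    val tmp = y - (cg shr 1)
    val g = cg + tmp
    val b = tmp - (co shr 1)
    val r = b + co
    return Triple(r, g, b)
}
```

Because every lifting step is exactly invertible in integer arithmetic, lossless round trips are possible; the "simulated subsampling" then comes purely from quantising Co/Cg more aggressively.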
@@ -225,18 +225,18 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
- **Solution**: Ensure forward and inverse transforms use identical coefficient indexing and reverse operations exactly
**Supported Wavelets**:
- **0**: 5/3 reversible (lossless when unquantised, JPEG 2000 standard)
- **1**: 9/7 irreversible (best compression, CDF 9/7 variant, default choice)
- **2**: CDF 13/7 (experimental, simplified implementation)
- **16**: DD-4 (four-point interpolating Deslauriers-Dubuc, for still images)
- **255**: Haar (demonstration only, simplest possible wavelet)
- **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Video (TAV) Format")
- **Version**: Current (perceptual quantisation, multi-wavelet support, significance map compression)
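Wavelet 255 (Haar) is simple enough to sketch in a few lines, which makes the "simplest possible wavelet" claim concrete. This is an illustrative one-level average/difference pair; the codec's actual normalisation may differ.

```kotlin
// One-level Haar transform sketch: low-pass averages and high-pass differences.
fun haarForward(x: FloatArray): Pair<FloatArray, FloatArray> {
    val n = x.size / 2
    val approx = FloatArray(n) { (x[2 * it] + x[2 * it + 1]) / 2f }  // averages
    val detail = FloatArray(n) { (x[2 * it] - x[2 * it + 1]) / 2f }  // differences
    return Pair(approx, detail)
}

fun haarInverse(approx: FloatArray, detail: FloatArray): FloatArray {
    val out = FloatArray(approx.size * 2)
    for (i in approx.indices) {
        out[2 * i] = approx[i] + detail[i]
        out[2 * i + 1] = approx[i] - detail[i]
    }
    return out
}
```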
#### TAV Significance Map Compression (Technical Details)
The significance map compression technique implemented on 2025-09-29 provides substantial compression improvements by exploiting the sparsity of quantised DWT coefficients:
**Implementation Files**:
- **C Encoder**: `video_encoder/encoder_tav.c` - `preprocess_coefficients()` function (lines 960-991)
@@ -264,7 +264,7 @@ Concatenated Maps Layout:
```
**Performance**:
- **Sparsity exploitation**: Tested on quantised DWT coefficients with 86.9% sparsity (Y), 97.8% (Co), 99.5% (Cg)
- **Compression improvement**: 16.4% from significance maps + 1.6% from concatenated layout
- **Real-world impact**: 559 bytes saved per frame (5.59 MB per 10k frames)
- **Cross-channel benefit**: Concatenated maps allow Zstd to exploit similarity between significance patterns
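The core idea behind the significance map can be sketched as splitting quantised coefficients into a 1-bit-per-coefficient presence map plus a dense list of non-zero values. The helper below is hypothetical; the real layout lives in `preprocess_coefficients()` in C.

```kotlin
// Illustrative significance-map packing sketch (not the encoder's actual layout):
// a bitmap marks which coefficients are non-zero, and only those values are kept.
fun packSignificance(coeffs: ByteArray): Pair<ByteArray, ByteArray> {
    val map = ByteArray((coeffs.size + 7) / 8)  // 1 bit per coefficient
    val values = ArrayList<Byte>()
    for (i in coeffs.indices) {
        if (coeffs[i] != 0.toByte()) {
            map[i / 8] = (map[i / 8].toInt() or (1 shl (i % 8))).toByte()
            values.add(coeffs[i])
        }
    }
    return Pair(map, values.toByteArray())
}
```

At 99.5% sparsity (Cg channel), this trades 8 bits per zero coefficient for 1 map bit, which is why the gains are largest on chroma.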
@@ -320,18 +320,23 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
- **C Encoder**: `video_encoder/encoder_tad.c` - Core Encoder library; `video_encoder/encoder_tad_standalone.c` - Standalone encoder with FFmpeg integration
- How to build: `make tad`
- **Quality Levels**: 0-5 (0=lowest quality/smallest, 5=highest quality/largest; designed to be in sync with TAV encoder)
- **C Decoders**:
- `video_encoder/decoder_tad.c` - Shared decoder library with `tad32_decode_chunk()` function
- `video_encoder/decoder_tad.h` - Exports shared decoder API
- `video_encoder/decoder_tav.c` - TAV decoder that uses shared TAD decoder for audio packets
- **Shared Architecture** (Fixed 2025-11-10): Both standalone TAD and TAV decoders now use the same `tad32_decode_chunk()` implementation, eliminating code duplication and ensuring identical output
- **Kotlin Decoder**: `AudioAdapter.kt` - Hardware-accelerated TAD decoder for TSVM runtime
- **Quantisation Fix** (2025-11-10): Fixed BASE_QUANTISER_WEIGHTS to use channel-specific 2D array (Mid/Side) instead of single 1D array, resolving severe audio distortion
- **Features**:
- **32 KHz stereo**: TSVM audio hardware native format
- **Variable chunk sizes**: Any size ≥1024 samples, including non-power-of-2 (e.g., 32016 for TAV 1-second GOPs)
- **Pre-emphasis filter**: First-order IIR filter (α=0.5) shifts quantisation noise to lower frequencies
- **Gamma compression**: Dynamic range compression (γ=0.5) before quantisation
- **M/S stereo decorrelation**: Exploits stereo correlation for better compression
- **9-level CDF 9/7 DWT**: Fixed 9 decomposition levels for all chunk sizes
- **Perceptual quantisation**: Channel-specific (Mid/Side) frequency-dependent weights with lambda companding (λ=6.0)
- **EZBC encoding**: Binary tree embedded zero block coding exploits coefficient sparsity (86.9% Mid, 97.8% Side)
- **Zstd compression**: Level 7 on concatenated EZBC bitstreams for additional compression
- **Non-power-of-2 support**: Fixed 2025-10-30 to handle arbitrary chunk sizes correctly
- **Usage Examples**:
```bash
@@ -351,26 +356,23 @@ Implemented on 2025-10-15 for improved temporal compression through group-of-pic
decoder_tad -i input.tad -o output.pcm
```
- **Format documentation**: `terranmon.txt` (search for "TSVM Advanced Audio (TAD) Format")
- **Version**: 1.1 (EZBC encoding with non-power-of-2 support, updated 2025-10-30; decoder architecture and Kotlin quantisation weights fixed 2025-11-10; documentation updated 2025-11-10 to reflect pre-emphasis and EZBC)
**TAD Encoding Pipeline**:
1. **Pre-emphasis filter** (α=0.5) - Shifts quantisation noise toward lower frequencies
2. **Gamma compression** (γ=0.5) - Dynamic range compression
3. **M/S decorrelation** - Transforms L/R to Mid/Side
4. **9-level CDF 9/7 DWT** - Wavelet decomposition (fixed 9 levels)
5. **Perceptual quantisation** - Lambda companding (λ=6.0) with channel-specific weights
6. **EZBC encoding** - Binary tree embedded zero block coding per channel
7. **Zstd compression** (level 7) - Additional compression on concatenated EZBC bitstreams
**TAD Compression Performance**:
- **Target Compression**: 2:1 against PCMu8 baseline (4:1 against PCM16LE input)
- **Achieved Compression**: 2.51:1 against PCMu8 at quality level 3
- **Audio Quality**: Preserves full 0-16 KHz bandwidth
- **Coefficient Sparsity**: 86.9% zeros in Mid channel, 97.8% in Side channel (typical)
- **EZBC Benefits**: Exploits sparsity, progressive refinement, spatial clustering
**TAD Integration with TAV**:
TAD is designed as an includable API for TAV video encoder integration. The variable chunk size
@@ -396,3 +398,37 @@ for (i in 1..levels) {
```
Using simple doubling (`length *= 2`) is incorrect for non-power-of-2 sizes and causes
mirrored subband artifacts.
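Assuming the same ceiling-halving convention, the per-level subband lengths can be precomputed once top-down and replayed in reverse for the inverse transform. The helper name below is hypothetical:

```kotlin
// Subband length bookkeeping for non-power-of-2 chunk sizes (sketch).
// Each level keeps ceil(n/2) approximation samples; naive doubling on the
// inverse path would desynchronise these lengths and mirror the subbands.
fun subbandLengths(sampleCount: Int, levels: Int): IntArray {
    val lengths = IntArray(levels + 1)
    lengths[0] = sampleCount
    for (i in 1..levels) lengths[i] = (lengths[i - 1] + 1) / 2  // ceil(n/2)
    return lengths
}
```

For the TAV 1-second GOP size of 32016 samples and 9 levels, this yields 32016 → 16008 → 8004 → 4002 → 2001 → 1001 → 501 → 251 → 126 → 63; the inverse DWT walks the same array backwards.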
**TAD Decoding Pipeline**:
1. **Zstd decompression** - Decompress concatenated EZBC bitstreams
2. **EZBC decoding** - Binary tree decoder reconstructs quantised int8 coefficients per channel
3. **Lambda decompanding** - Inverse Laplacian CDF mapping with channel-specific weights
4. **9-level inverse CDF 9/7 DWT** - Wavelet reconstruction with proper non-power-of-2 length tracking
5. **M/S to L/R conversion** - Transform Mid/Side back to Left/Right
6. **Gamma expansion** (γ⁻¹=2.0) - Restore dynamic range
7. **De-emphasis filter** (α=0.5) - Reverse pre-emphasis, remove frequency shaping
8. **PCM32f to PCM8** - Noise-shaped dithering for final 8-bit output
**Critical Quantisation Weights Note (Fixed 2025-11-10)**:
The TAD decoder MUST use channel-specific quantisation weights for Mid (channel 0) and Side (channel 1) channels. The Kotlin decoder (AudioAdapter.kt) originally used a single 1D weight array, which caused severe audio distortion. The correct implementation uses a 2D array:
```kotlin
// CORRECT (Fixed 2025-11-10)
private val BASE_QUANTISER_WEIGHTS = arrayOf(
floatArrayOf( // Mid channel (0)
4.0f, 2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f, 1.0f, 1.3f, 2.0f
),
floatArrayOf( // Side channel (1)
6.0f, 5.0f, 2.6f, 2.4f, 1.8f, 1.3f, 1.0f, 1.0f, 1.6f, 3.2f
)
)
// During dequantisation:
val weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiserScale
coeffs[i] = normalisedVal * TAD32_COEFF_SCALARS[sideband] * weight
```
The different weights for Mid and Side channels reflect the perceptual importance of different frequency bands in each channel. Using incorrect weights causes:
- DC frequency underamplification (using 1.0 instead of 4.0/6.0)
- Incorrect stereo imaging and extreme side channel distortion
- Severe frequency response errors that manifest as "clipping-like" distortion


@@ -866,8 +866,8 @@ When KSF is interleaved with MP2 audio, the payload must be inserted in-between
0x30 = reveal text normally (arguments: UTF-8 text. The reveal text must contain spaces when required)
0x31 = reveal text slowly (arguments: UTF-8 text. The effect is implementation-dependent)
0x40 = reveal text normally with emphasis (arguments: UTF-8 text. On TEV/TAV player, the text will be white; otherwise, implementation-dependent)
0x41 = reveal text slowly with emphasis (arguments: UTF-8 text)
0x50 = reveal text normally with target colour (arguments: uint8 target colour; UTF-8 text)
0x51 = reveal text slowly with target colour (arguments: uint8 target colour; UTF-8 text)
@@ -887,7 +887,7 @@ When KSF is interleaved with MP2 audio, the payload must be inserted in-between
TSVM Advanced Video (TAV) Format
Created by CuriousTorvald and Claude on 2025-09-13
TAV is a next-generation video codec for TSVM utilising Discrete Wavelet Transform (DWT)
similar to JPEG2000, providing superior compression efficiency and scalability compared
to DCT-based codecs like TEV. Features include multi-resolution encoding, progressive
transmission capability, and region-of-interest coding.
@@ -1134,7 +1134,7 @@ resulting in superior compression compared to per-frame encoding.
2. Determine GOP slicing from the scene detection
3. Apply 1D DWT across temporal axis (GOP frames)
4. Apply 2D DWT on each spatial slice of temporal subbands
5. Perceptual quantisation with temporal-spatial awareness
6. Unified significance map preprocessing across all frames/channels
7. Single Zstd compression of entire GOP block
@@ -1246,7 +1246,7 @@ The encoder expects linear alpha.
## Compression Features
- Single DWT tiles vs 16x16 DCT blocks in TEV
- Multi-resolution representation enables scalable decoding
- Better frequency localisation than DCT
- Reduced blocking artifacts due to overlapping basis functions
## Hardware Acceleration Functions
@@ -1533,9 +1533,9 @@ TSVM Advanced Audio (TAD) Format
Created by CuriousTorvald and Claude on 2025-10-23
Updated: 2025-10-30 (fixed non-power-of-2 sample count support)
TAD is a perceptual audio codec for TSVM utilising Discrete Wavelet Transform (DWT)
with CDF 9/7 biorthogonal wavelets, providing efficient compression through M/S stereo
decorrelation, frequency-dependent quantisation, and EZBC coefficient coding.
Designed as an includable API for integration with TAV video encoder.
When used inside of a video codec, only zstd-compressed payload is stored, chunk length
@@ -1584,20 +1584,34 @@ TAV integration uses exact GOP sample counts (e.g., 32016 samples for 1 second a
uint32 Chunk Payload Size: size of following payload in bytes
* Chunk Payload: encoded M/S stereo data (Zstd compressed if flag set)
### Chunk Payload Structure (before Zstd compression)
* Mid Channel EZBC Data (embedded zero block coded bitstream)
* Side Channel EZBC Data (embedded zero block coded bitstream)
Each EZBC channel structure:
uint8 MSB Bitplane: highest bitplane with significant coefficient
uint16 Coefficient Count: number of coefficients in this channel
* Binary Tree EZBC Bitstream: significance map + refinement bits
## Encoding Pipeline
### Step 1: Pre-emphasis Filter
Input stereo PCM32fLE undergoes first-order IIR pre-emphasis filtering (α=0.5):
H(z) = 1 - α·z⁻¹
This shifts quantisation noise toward lower frequencies where it's more maskable by
the psychoacoustic model. The filter has persistent state across chunks to prevent
discontinuities at chunk boundaries.
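A minimal sketch of the α=0.5 filter pair (the matching de-emphasis belongs to the decoding pipeline). The returned value is the state carried into the next chunk; function names are illustrative.

```kotlin
val ALPHA = 0.5f

// Pre-emphasis H(z) = 1 - α·z⁻¹, applied in place; returns the carry-over state.
fun preEmphasis(x: FloatArray, prevIn: Float): Float {
    var prev = prevIn
    for (i in x.indices) {
        val cur = x[i]
        x[i] = cur - ALPHA * prev
        prev = cur
    }
    return prev  // persist across chunks to avoid boundary discontinuities
}

// De-emphasis H(z) = 1 / (1 - α·z⁻¹): the exact inverse IIR, also stateful.
fun deEmphasis(y: FloatArray, prevOut: Float): Float {
    var prev = prevOut
    for (i in y.indices) {
        y[i] = y[i] + ALPHA * prev
        prev = y[i]
    }
    return prev
}
```

Running pre-emphasis then de-emphasis with matching states reconstructs the input exactly, which is why the persistent state matters: resetting it mid-stream breaks the inverse at chunk boundaries.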
### Step 2: Dynamic Range Compression (Gamma Compression)
Pre-emphasised audio undergoes gamma compression for perceptual uniformity:
encode(x) = sign(x) * |x|^γ where γ=0.5
This compresses dynamic range before quantisation, improving perceptual quality.
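The γ=0.5 companding pair is a direct transcription of the formula above (sketch, not the encoder's exact code):

```kotlin
import kotlin.math.abs
import kotlin.math.pow
import kotlin.math.sign

// encode(x) = sign(x)·|x|^γ with γ=0.5, plus its inverse used by the decoder.
fun gammaCompress(x: Float, gamma: Float = 0.5f): Float = sign(x) * abs(x).pow(gamma)
fun gammaExpand(y: Float, gamma: Float = 0.5f): Float = sign(y) * abs(y).pow(1f / gamma)
```

With γ=0.5 this is a signed square root, so quiet samples occupy a larger share of the quantiser's range than loud ones.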
### Step 3: M/S Stereo Decorrelation
Mid-Side transformation exploits stereo correlation:
Mid = (Left + Right) / 2
@@ -1606,7 +1620,7 @@ Mid-Side transformation exploits stereo correlation:
This typically concentrates energy in the Mid channel while the Side channel
contains mostly small values, improving compression efficiency.
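The transform and its inverse are one-liners; a direct transcription of the formulas above:

```kotlin
// Mid/Side forward transform and its exact inverse (sketch).
fun msEncode(left: Float, right: Float): Pair<Float, Float> =
    Pair((left + right) / 2f, (left - right) / 2f)  // (Mid, Side)

fun msDecode(mid: Float, side: Float): Pair<Float, Float> =
    Pair(mid + side, mid - side)                    // (Left, Right)
```

For highly correlated channels (Left ≈ Right), Side ≈ 0, which is exactly the sparsity the later EZBC stage exploits.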
### Step 4: 9-Level CDF 9/7 DWT
Each channel (Mid and Side) undergoes CDF 9/7 biorthogonal wavelet decomposition. The codec uses a fixed 9-level decomposition for all chunk sizes:
DWT Levels = 9 (fixed)
@@ -1632,32 +1646,53 @@ CDF 9/7 lifting coefficients:
δ = 0.443506852
K = 1.230174105
### Step 5: Frequency-Dependent Quantisation with Lambda Companding
DWT coefficients are quantised using:
1. **Lambda companding**: Maps normalised coefficients through Laplacian CDF with λ=6.0
2. **Perceptually-tuned weights**: Channel-specific (Mid/Side) frequency-dependent scaling
3. **Final quantisation**: base_weight[channel][subband] * quality_scale
The lambda companding provides perceptually uniform quantisation, allocating more bits
to perceptually important coefficient magnitudes.
Channel-specific base quantisation weights:
Mid (0):  [4.0, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 1.0, 1.3, 2.0]
Side (1): [6.0, 5.0, 2.6, 2.4, 1.8, 1.3, 1.0, 1.0, 1.6, 3.2]
Output: Quantised int8 coefficients in range [-max_index, +max_index]
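The λ=6.0 companding can be sketched as follows. The decoder-side inverse matches the formula quoted from `AudioAdapter.kt` later in this document; the forward mapping is inferred from that inverse and is an assumption, as is the clamp that keeps ln() finite at the top index.

```kotlin
import kotlin.math.exp
import kotlin.math.ln
import kotlin.math.roundToInt

val LAMBDA = 6.0f

// Forward companding (assumed inverse of the documented decoder): push a
// normalised magnitude in [0, 1] through the Laplacian CDF, then quantise.
fun lambdaCompand(mag: Float, maxIndex: Int): Int {
    val cdf = 1f - 0.5f * exp(-LAMBDA * mag)   // Laplacian CDF for x >= 0
    val normalised = (cdf - 0.5f) * 2f         // rescale [0.5, 1.0] -> [0, 1]
    return (normalised * maxIndex).roundToInt()
}

// Decompanding per terranmon.txt: x = -(1/λ)·ln(2·(1 - F)), F in [0.5, 1.0].
fun lambdaDecompand(index: Int, maxIndex: Int): Float {
    if (index == 0) return 0f
    // clamp keeps ln() finite at index == maxIndex; the shipped decoder may differ
    val cdf = (0.5f + index.toFloat() / maxIndex * 0.5f).coerceAtMost(0.999999f)
    return -(1f / LAMBDA) * ln(2f * (1f - cdf))
}
```

The steep CDF near zero spends most index values on small magnitudes, which is where Laplacian-distributed DWT coefficients concentrate.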
### Step 6: EZBC Encoding (Embedded Zero Block Coding)
Quantised int8 coefficients are compressed using binary tree EZBC, a 1D variant of
embedded zero-block coding.
**EZBC Algorithm**:
1. Find MSB bitplane (highest bit position with significant coefficient)
2. Initialise root block covering all coefficients as insignificant
3. For each bitplane from MSB to LSB:
- **Insignificant Pass**: Test each insignificant block for significance
- If still zero at this bitplane: emit 0 bit, keep in insignificant queue
- If becomes significant: emit 1 bit, recursively subdivide using binary tree
- **Refinement Pass**: For already-significant coefficients, emit next bit
4. Binary tree subdivision continues until blocks of size 1 (single coefficients)
5. When coefficient becomes significant: emit sign bit and reconstruct value
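Step 1 of the algorithm reduces to a bit scan over the largest magnitude; an illustrative helper (not the shipped code), where -1 flags an all-zero channel:

```kotlin
// MSB bitplane = highest set bit of the largest coefficient magnitude (sketch).
fun msbBitplane(coeffs: ByteArray): Int {
    var maxMag = 0
    for (c in coeffs) maxMag = maxOf(maxMag, Math.abs(c.toInt()))
    return if (maxMag == 0) -1 else 31 - Integer.numberOfLeadingZeros(maxMag)
}
```

All significance passes then iterate from this bitplane down to 0, so an all-zero channel costs almost nothing to code.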
**EZBC Output Structure** (per channel):
uint8 MSB Bitplane (8 bits)
uint16 Coefficient Count (16 bits)
* Bitstream: [significance_bits][sign_bits][refinement_bits]
**Compression Benefits**:
- Exploits coefficient sparsity through significance testing
- Progressive refinement enables quality scalability
- Binary tree exploits spatial clustering of significant coefficients
- Typical sparsity: 86.9% zeros (Mid), 97.8% zeros (Side)
### Step 7: Concatenation and Zstd Compression
The Mid and Side EZBC bitstreams are concatenated:
Payload = [Mid_EZBC_data][Side_EZBC_data]
Then compressed using Zstd level 7 for additional compression without significant
CPU overhead. Zstd exploits redundancy in the concatenated bitstreams.
## Decoding Pipeline
@@ -1665,16 +1700,25 @@ using Zstd level 7 for additional compression without significant CPU overhead.
Read chunk header (sample_count, max_index, payload_size).
If compressed (default), decompress payload using Zstd.
### Step 2: EZBC Decoding
Decode Mid and Side channels from concatenated EZBC bitstreams using the binary tree
embedded zero block decoder:
For each channel:
1. Read EZBC header: MSB bitplane (8 bits), coefficient count (16 bits)
2. Initialise root block as insignificant, track coefficient states
3. Process bitplanes from MSB to LSB:
- **Insignificant Pass**: Read significance bits, recursively decode significant blocks
- **Refinement Pass**: Read refinement bits for already-significant coefficients
4. Reconstruct quantized int8 coefficients from bitplane representation
Output: Quantized int8 coefficients for Mid and Side channels
### Step 3: Dequantisation with Lambda Decompanding
Convert quantised int8 values back to float coefficients using:
1. Lambda decompanding (inverse of Laplacian CDF compression)
2. Multiply by frequency-dependent quantisation steps
3. [Optional] Apply coefficient-domain dithering (TPDF, ~-60 dBFS)
### Step 4: 9-Level Inverse CDF 9/7 DWT
Reconstruct Float32 audio from DWT coefficients using inverse CDF 9/7 transform.
@@ -1704,9 +1748,18 @@ Convert Mid/Side back to Left/Right stereo:
### Step 6: Gamma Expansion
Expand dynamic range (inverse of encoder's gamma compression):
decode(y) = sign(y) * |y|^(1/γ) where γ=0.5, so 1/γ=2.0
### Step 7: De-emphasis Filter
Apply de-emphasis filter to reverse the pre-emphasis (α=0.5):
H(z) = 1 / (1 - α·z⁻¹)
This is a first-order IIR filter with persistent state across chunks to prevent
discontinuities at chunk boundaries. The de-emphasis must be applied AFTER gamma
expansion but BEFORE PCM8 conversion to correctly reconstruct the original audio.
### Step 8: PCM32f to PCM8 Conversion with Noise-Shaped Dithering
Convert Float32 samples to unsigned PCM8 (PCMu8) using second-order error-diffusion
dithering with reduced amplitude (0.2× TPDF) to coordinate with coefficient-domain
dithering.
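The scaling half of that conversion can be sketched as below. This simplified version applies only the reduced-amplitude TPDF dither and omits the second-order error-diffusion stage; names are illustrative.

```kotlin
import kotlin.math.roundToInt
import kotlin.random.Random

// Float32 [-1, 1] -> PCMu8 [0, 255] with 0.2x TPDF dither (sketch; the shipped
// decoder additionally runs second-order error diffusion, omitted here).
fun pcm32fToPcmU8(x: FloatArray, rng: Random = Random(0)): ByteArray =
    ByteArray(x.size) { i ->
        val tpdf = (rng.nextFloat() - rng.nextFloat()) * 0.2f  // triangular, ±0.2 LSB
        val scaled = x[i].coerceIn(-1f, 1f) * 127.5f + 127.5f + tpdf
        scaled.roundToInt().coerceIn(0, 255).toByte()
    }
```

Silence (0.0f) lands on the 127/128 midpoint, and the reduced dither amplitude keeps it from spreading further than one code.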


@@ -419,7 +419,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
}
// Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
// Converts quantised index back to normalised float in [-1, 1]
private fun lambdaDecompanding(quantVal: Byte, maxIndex: Int): Float {
// Handle zero
if (quantVal == 0.toByte()) {
@@ -432,11 +432,11 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
// Clamp to valid range
if (absIndex > maxIndex) absIndex = maxIndex
// Map index back to normalised CDF [0, 1]
val normalisedCdf = absIndex.toFloat() / maxIndex
// Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
val cdf = 0.5f + normalisedCdf * 0.5f
// Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
// For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -698,13 +698,13 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
val msbBitplane = bs.readBits(8)
val count = bs.readBits(16)
-// Initialize coefficient array to zero
+// Initialise coefficient array to zero
coeffs.fill(0)
// Track coefficient significance
val states = Array(count) { TadCoeffState() }
-// Initialize queues
+// Initialise queues
val insignificantQueue = TadBlockQueue()
val nextInsignificant = TadBlockQueue()
val significantQueue = TadBlockQueue()
@@ -822,11 +822,11 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
// Calculate DWT levels from sample count
val dwtLevels = calculateDwtLevels(sampleCount)
-// Dequantize to Float32
+// Dequantise to Float32
val dwtMid = FloatArray(sampleCount)
val dwtSide = FloatArray(sampleCount)
-dequantizeDwtCoefficients(0, quantMid, dwtMid, sampleCount, maxIndex, dwtLevels)
-dequantizeDwtCoefficients(1, quantSide, dwtSide, sampleCount, maxIndex, dwtLevels)
+dequantiseDwtCoefficients(0, quantMid, dwtMid, sampleCount, maxIndex, dwtLevels)
+dequantiseDwtCoefficients(1, quantSide, dwtSide, sampleCount, maxIndex, dwtLevels)
// Inverse DWT using CDF 9/7 wavelet (produces Float32 samples in range [-1.0, 1.0])
dwt97InverseMultilevel(dwtMid, sampleCount, dwtLevels)
@@ -891,20 +891,20 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
}
// Simplified spectral reconstruction for wavelet coefficients
-// Conservative approach: only add light dither to reduce quantization grain
+// Conservative approach: only add light dither to reduce quantisation grain
private fun spectralInterpolateBand(c: FloatArray, start: Int, len: Int, Q: Float, lowerBandRms: Float) {
if (len < 4) return
xorshift32State = 0x9E3779B9u xor len.toUInt() xor (Q * 65536.0f).toUInt()
val ditherAmp = 0.05f * Q // Very light dither (~-60 dBFS)
-// Just add ultra-light TPDF dither to reduce quantization grain
+// Just add ultra-light TPDF dither to reduce quantisation grain
for (i in 0 until len) {
c[start + i] += tpdf() * ditherAmp
}
}
-private fun dequantizeDwtCoefficients(channel: Int, quantized: ByteArray, coeffs: FloatArray, count: Int,
+private fun dequantiseDwtCoefficients(channel: Int, quantised: ByteArray, coeffs: FloatArray, count: Int,
maxIndex: Int, dwtLevels: Int) {
// Calculate sideband boundaries dynamically
val firstBandSize = count shr dwtLevels
@@ -915,7 +915,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
sidebandStarts[i] = sidebandStarts[i - 1] + (firstBandSize shl (i - 2))
}
-// Dequantize all coefficients with stochastic reconstruction for deadzoned values
+// Dequantise all coefficients with stochastic reconstruction for deadzoned values
val quantiserScale = 1.0f
for (i in 0 until count) {
var sideband = dwtLevels
@@ -927,7 +927,7 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
}
// Check for deadzone marker
-/*if (quantized[i] == DEADZONE_MARKER_QUANT) {
+/*if (quantised[i] == DEADZONE_MARKER_QUANT) {
// Stochastic reconstruction: generate Laplacian noise in deadband range
val deadbandThreshold = DEADBANDS[channel][sideband]
@@ -942,13 +942,13 @@ class AudioAdapter(val vm: VM) : PeriBase(VM.PERITYPE_SOUND) {
// Apply scalar (but not quantiser weight - noise is already in correct range)
coeffs[i] = noise * TAD32_COEFF_SCALARS[sideband]
} else {*/
-// Normal dequantization using lambda decompanding
-val normalizedVal = lambdaDecompanding(quantized[i], maxIndex)
-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+// Normal dequantisation using lambda decompanding
+val normalisedVal = lambdaDecompanding(quantised[i], maxIndex)
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
// CRITICAL: Use channel-specific weights (Mid=0, Side=1)
val weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiserScale
-coeffs[i] = normalizedVal * TAD32_COEFF_SCALARS[sideband] * weight
+coeffs[i] = normalisedVal * TAD32_COEFF_SCALARS[sideband] * weight
// }
}
View File
@@ -82,7 +82,7 @@ static void write_tav_header_only(FILE *out) {
// Channel layout: 0 (Y-Co-Cg)
header[26] = 0;
-// Reserved[4]: zeros (27-30 already initialized to 0)
+// Reserved[4]: zeros (27-30 already initialised to 0)
// File Role: 1 (header-only, UCF payload follows)
header[31] = 1;
View File
@@ -20,7 +20,7 @@
static const float TAD32_COEFF_SCALARS[] = {64.0f, 45.255f, 32.0f, 22.627f, 16.0f, 11.314f, 8.0f, 5.657f, 4.0f, 2.828f};
// Base quantiser weight table (10 subbands: LL + 9 H bands)
-// These weights are multiplied by quantiser_scale during quantization
+// These weights are multiplied by quantiser_scale during quantisation
static const float BASE_QUANTISER_WEIGHTS[2][10] = {
{ // mid channel
4.0f, // LL (L9) DC
@@ -47,7 +47,7 @@ static const float BASE_QUANTISER_WEIGHTS[2][10] = {
3.2f // H (L1) 8 khz
}};
-#define TAD_DEFAULT_CHUNK_SIZE 31991
+#define TAD_DEFAULT_CHUNK_SIZE 32768
#define TAD_MIN_CHUNK_SIZE 1024
#define TAD_SAMPLE_RATE 32000
#define TAD_CHANNELS 2
@@ -105,7 +105,7 @@ static void spectral_interpolate_band(float *c, size_t len, float Q, float lower
uint32_t seed = 0x9E3779B9u ^ (uint32_t)len ^ (uint32_t)(Q * 65536.0f);
const float dither_amp = 0.02f * Q; // Very light dither
-// Just add ultra-light TPDF dither to reduce quantization grain
+// Just add ultra-light TPDF dither to reduce quantisation grain
// No aggressive hole filling or AR prediction that might create artifacts
for (size_t i = 0; i < len; i++) {
c[i] += tpdf(&seed) * dither_amp;
@@ -539,14 +539,14 @@ static void pcm32f_to_pcm8(const float *fleft, const float *fright, uint8_t *lef
}
//=============================================================================
-// Dequantization (inverse of quantization)
+// Dequantisation (inverse of quantisation)
//=============================================================================
#define LAMBDA_FIXED 6.0f
// Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
-// Converts quantized index back to normalized float in [-1, 1]
+// Converts quantised index back to normalised float in [-1, 1]
static float lambda_decompanding(int8_t quant_val, int max_index) {
// Handle zero
if (quant_val == 0) {
@@ -559,11 +559,11 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
// Clamp to valid range
if (abs_index > max_index) abs_index = max_index;
-// Map index back to normalized CDF [0, 1]
-float normalized_cdf = (float)abs_index / max_index;
+// Map index back to normalised CDF [0, 1]
+float normalised_cdf = (float)abs_index / max_index;
// Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
-float cdf = 0.5f + normalized_cdf * 0.5f;
+float cdf = 0.5f + normalised_cdf * 0.5f;
// Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
// For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -576,7 +576,7 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
return sign * abs_val;
}
-static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
+static void dequantise_dwt_coefficients(int channel, const int8_t *quantised, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
// Calculate sideband boundaries dynamically
int first_band_size = chunk_size >> dwt_levels;
@@ -588,7 +588,7 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
sideband_starts[i] = sideband_starts[i-1] + (first_band_size << (i-2));
}
-// Dequantize all coefficients with stochastic reconstruction for deadzoned values
+// Dequantise all coefficients with stochastic reconstruction for deadzoned values
for (size_t i = 0; i < count; i++) {
int sideband = dwt_levels;
for (int s = 0; s <= dwt_levels; s++) {
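The sideband boundary arithmetic above is easy to check by hand. The values `chunk_size = 32768` and `dwt_levels = 9` below are illustrative (the default chunk size is 32768, but the level count for it is not stated in this diff): the LL band gets `chunk_size >> dwt_levels` coefficients and each detail band doubles in length.

```c
/* Worked example of the sideband_starts recurrence from the diff.
 * dwt_levels = 9 for a 32768-sample chunk is an assumption. */
static void sideband_layout(int chunk_size, int dwt_levels, int *starts) {
    int first_band_size = chunk_size >> dwt_levels;  /* LL band length */
    starts[0] = 0;                                   /* LL */
    starts[1] = first_band_size;                     /* deepest H band */
    for (int i = 2; i <= dwt_levels; i++)
        starts[i] = starts[i - 1] + (first_band_size << (i - 2));
}
```

For 32768 samples and 9 levels the starts come out as 0, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, with the last band running to 32768.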
@@ -599,7 +599,7 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
}
// Check for deadzone marker
-/*if (quantized[i] == (int8_t)0) {//DEADZONE_MARKER_QUANT) {
+/*if (quantised[i] == (int8_t)0) {//DEADZONE_MARKER_QUANT) {
// Stochastic reconstruction: generate Laplacian noise in deadband range
float deadband_threshold = DEADBANDS[channel][sideband];
@@ -614,12 +614,12 @@ static void dequantize_dwt_coefficients(int channel, const int8_t *quantized, fl
// Apply scalar (but not quantiser weight - noise is already in correct range)
coeffs[i] = noise * TAD32_COEFF_SCALARS[sideband];
} else {*/
-// Normal dequantization using lambda decompanding
-float normalized_val = lambda_decompanding(quantized[i], max_index);
-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+// Normal dequantisation using lambda decompanding
+float normalised_val = lambda_decompanding(quantised[i], max_index);
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale;
-coeffs[i] = normalized_val * TAD32_COEFF_SCALARS[sideband] * weight;
+coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[sideband] * weight;
// }
}
@@ -777,13 +777,13 @@ static int tad_decode_channel_ezbc(const uint8_t *input, size_t input_size, int8
int msb_bitplane = tad_bitstream_read_bits(&bs, 8);
uint32_t count = tad_bitstream_read_bits(&bs, 16);
-// Initialize coefficient array to zero
+// Initialise coefficient array to zero
memset(coeffs, 0, count * sizeof(int8_t));
// Track coefficient significance
tad_decode_state_t *states = calloc(count, sizeof(tad_decode_state_t));
-// Initialize queues
+// Initialise queues
tad_decode_queue_t insignificant_queue, next_insignificant;
tad_decode_queue_t significant_queue, next_significant;
@@ -890,7 +890,7 @@ int tad32_decode_chunk(const uint8_t *input, size_t input_size, uint8_t *pcmu8_s
return -1;
}
-// Decompress if needed
+// Decompress Zstd
const uint8_t *payload;
uint8_t *decompressed = NULL;
@@ -946,11 +946,11 @@ int tad32_decode_chunk(const uint8_t *input, size_t input_size, uint8_t *pcmu8_s
return -1;
}
-// Dequantize with quantiser scaling and spectral interpolation
+// Dequantise with quantiser scaling and spectral interpolation
// Use quantiser_scale = 1.0f for baseline (must match encoder)
float quantiser_scale = 1.0f;
-dequantize_dwt_coefficients(0, quant_mid, dwt_mid, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
-dequantize_dwt_coefficients(1, quant_side, dwt_side, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
+dequantise_dwt_coefficients(0, quant_mid, dwt_mid, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
+dequantise_dwt_coefficients(1, quant_side, dwt_side, sample_count, sample_count, dwt_levels, max_index, quantiser_scale);
// Inverse DWT
dwt_inverse_multilevel(dwt_mid, sample_count, dwt_levels);
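The `dwt_mid`/`dwt_side` buffers hold mid/side channels; after the inverse DWT they must be folded back to left/right before PCM conversion. A conventional M/S reconstruction looks like the sketch below; whether TAD applies exactly this scaling (rather than, say, a /2 normalisation) is an assumption, since the diff only names the buffers.

```c
#include <stddef.h>

/* Conventional mid/side -> left/right fold. The unscaled sum/difference
 * form is an assumption about TAD's M/S convention. */
static void ms_to_lr(const float *mid, const float *side,
                     float *left, float *right, size_t n) {
    for (size_t i = 0; i < n; i++) {
        left[i]  = mid[i] + side[i];
        right[i] = mid[i] - side[i];
    }
}
```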
View File
@@ -11,7 +11,7 @@
// Constants (must match encoder)
#define TAD32_SAMPLE_RATE 32000
#define TAD32_CHANNELS 2 // Stereo
-#define TAD_DEFAULT_CHUNK_SIZE 31991 // Default chunk size for standalone TAD files
+#define TAD_DEFAULT_CHUNK_SIZE 32768 // Default chunk size for standalone TAD files
/**
* Decode audio chunk with TAD32 codec
@@ -25,7 +25,7 @@
*
* Input format:
* uint16 sample_count (samples per channel)
-* uint8 max_index (maximum quantization index)
+* uint8 max_index (maximum quantisation index)
* uint32 payload_size (bytes in payload)
* * payload (encoded M/S data, Zstd-compressed with EZBC)
*
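The chunk prologue documented above (uint16 sample count, uint8 max index, uint32 payload size, then the payload) can be parsed as below. Little-endian byte order is an assumption; the comment block does not state it.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of reading the TAD chunk prologue. Little-endian layout is an
 * assumption. Returns bytes consumed, or 0 if the buffer is too short. */
typedef struct {
    uint16_t sample_count;   /* samples per channel */
    uint8_t  max_index;      /* maximum quantisation index */
    uint32_t payload_size;   /* bytes in payload */
    const uint8_t *payload;  /* Zstd-compressed EZBC data */
} tad_chunk_view_t;

static size_t tad_parse_chunk(const uint8_t *p, size_t len, tad_chunk_view_t *v) {
    if (len < 7) return 0;
    v->sample_count = (uint16_t)(p[0] | (p[1] << 8));
    v->max_index    = p[2];
    v->payload_size = (uint32_t)p[3] | ((uint32_t)p[4] << 8) |
                      ((uint32_t)p[5] << 16) | ((uint32_t)p[6] << 24);
    if (len < 7 + (size_t)v->payload_size) return 0;
    v->payload = p + 7;
    return 7 + (size_t)v->payload_size;
}
```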
View File
@@ -97,12 +97,12 @@ typedef struct {
} __attribute__((packed)) tav_header_t;
//=============================================================================
-// Quantization Lookup Table (matches TSVM exactly)
+// Quantisation Lookup Table (matches TSVM exactly)
//=============================================================================
static const int QLUT[] =
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126,128,132,136,140,144,148,152,156,160,164,168,172,176,180,184,188,192,196,200,204,208,212,216,220,224,228,232,236,240,244,248,252,256,264,272,280,288,296,304,312,320,328,336,344,352,360,368,376,384,392,400,408,416,424,432,440,448,456,464,472,480,488,496,504,512,528,544,560,576,592,608,624,640,656,672,688,704,720,736,752,768,784,800,816,832,848,864,880,896,912,928,944,960,976,992,1008,1024,1056,1088,1120,1152,1184,1216,1248,1280,1312,1344,1376,1408,1440,1472,1504,1536,1568,1600,1632,1664,1696,1728,1760,1792,1824,1856,1888,1920,1952,1984,2016,2048,2112,2176,2240,2304,2368,2432,2496,2560,2624,2688,2752,2816,2880,2944,3008,3072,3136,3200,3264,3328,3392,3456,3520,3584,3648,3712,3776,3840,3904,3968,4032,4096};
-// Perceptual quantization constants (match TSVM)
+// Perceptual quantisation constants (match TSVM)
static const float ANISOTROPY_MULT[] = {2.0f, 1.8f, 1.6f, 1.4f, 1.2f, 1.0f};
static const float ANISOTROPY_BIAS[] = {0.4f, 0.2f, 0.1f, 0.0f, 0.0f, 0.0f};
static const float ANISOTROPY_MULT_CHROMA[] = {6.6f, 5.5f, 4.4f, 3.3f, 2.2f, 1.1f};
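The QLUT above is not arbitrary: it is piecewise-linear with a step size that doubles each octave (1..64 in steps of 1, 66..128 in steps of 2, 132..256 in steps of 4, and so on up to 4096), giving 256 entries. A generator for cross-checking the table, written from the visible pattern:

```c
/* Regenerates the 256-entry QLUT from its doubling-step structure,
 * inferred from the table printed in the source. */
static int gen_qlut(int *q) {
    int n = 0;
    for (int v = 1; v <= 64; v++) q[n++] = v;        /* step 1 up to 64 */
    for (int step = 2, top = 128; top <= 4096; step *= 2, top *= 2)
        for (int v = (top / 2) + step; v <= top; v += step)
            q[n++] = v;                              /* step doubles per octave */
    return n;                                        /* 64 + 6*32 = 256 */
}
```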
@@ -153,7 +153,7 @@ static int calculate_subband_layout(int width, int height, int decomp_levels, dw
}
//=============================================================================
-// Perceptual Quantization Model (matches TSVM exactly)
+// Perceptual Quantisation Model (matches TSVM exactly)
//=============================================================================
static int tav_derive_encoder_qindex(int q_index, int q_y_global) {
@@ -248,18 +248,18 @@ static float get_perceptual_weight(int q_index, int q_y_global, int level0, int
}
}
-static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, const int16_t *quantized,
-float *dequantized, int width, int height, int decomp_levels,
-float base_quantizer, int is_chroma, int frame_num) {
+static void dequantise_dwt_subbands_perceptual(int q_index, int q_y_global, const int16_t *quantised,
+float *dequantised, int width, int height, int decomp_levels,
+float base_quantiser, int is_chroma, int frame_num) {
dwt_subband_info_t subbands[32]; // Max possible subbands
const int subband_count = calculate_subband_layout(width, height, decomp_levels, subbands);
const int coeff_count = width * height;
-memset(dequantized, 0, coeff_count * sizeof(float));
+memset(dequantised, 0, coeff_count * sizeof(float));
int is_debug = 0;//(frame_num == 32);
// if (frame_num == 32) {
-// fprintf(stderr, "DEBUG: dequantize called for frame %d, is_chroma=%d\n", frame_num, is_chroma);
+// fprintf(stderr, "DEBUG: dequantise called for frame %d, is_chroma=%d\n", frame_num, is_chroma);
// }
// Apply perceptual weighting to each subband
@@ -267,30 +267,30 @@ static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, cons
const dwt_subband_info_t *subband = &subbands[s];
const float weight = get_perceptual_weight(q_index, q_y_global, subband->level,
subband->subband_type, is_chroma, decomp_levels);
-const float effective_quantizer = base_quantizer * weight;
+const float effective_quantiser = base_quantiser * weight;
if (is_debug && !is_chroma) {
if (subband->subband_type == 0) { // LL band
fprintf(stderr, " Subband level %d (LL): weight=%.6f, base_q=%.1f, effective_q=%.1f, count=%d\n",
-subband->level, weight, base_quantizer, effective_quantizer, subband->coeff_count);
-// Print first 5 quantized LL coefficients
-fprintf(stderr, " First 5 quantized LL: ");
+subband->level, weight, base_quantiser, effective_quantiser, subband->coeff_count);
+// Print first 5 quantised LL coefficients
+fprintf(stderr, " First 5 quantised LL: ");
for (int k = 0; k < 5 && k < subband->coeff_count; k++) {
int idx = subband->coeff_start + k;
-fprintf(stderr, "%d ", quantized[idx]);
+fprintf(stderr, "%d ", quantised[idx]);
}
fprintf(stderr, "\n");
-// Find max quantized LL coefficient
+// Find max quantised LL coefficient
int max_quant_ll = 0;
for (int k = 0; k < subband->coeff_count; k++) {
int idx = subband->coeff_start + k;
-int abs_val = quantized[idx] < 0 ? -quantized[idx] : quantized[idx];
+int abs_val = quantised[idx] < 0 ? -quantised[idx] : quantised[idx];
if (abs_val > max_quant_ll) max_quant_ll = abs_val;
}
-fprintf(stderr, " Max quantized LL coefficient: %d (dequantizes to %.1f)\n",
-max_quant_ll, max_quant_ll * effective_quantizer);
+fprintf(stderr, " Max quantised LL coefficient: %d (dequantises to %.1f)\n",
+max_quant_ll, max_quant_ll * effective_quantiser);
}
}
@@ -299,33 +299,33 @@ static void dequantize_dwt_subbands_perceptual(int q_index, int q_y_global, cons
if (idx < coeff_count) {
// CRITICAL: Must ROUND to match EZBC encoder's roundf() behavior
// Without rounding, truncation limits brightness range (e.g., Y maxes at 227 instead of 255)
-const float untruncated = quantized[idx] * effective_quantizer;
-dequantized[idx] = roundf(untruncated);
+const float untruncated = quantised[idx] * effective_quantiser;
+dequantised[idx] = roundf(untruncated);
}
}
}
-// Debug: Verify LL band was dequantized correctly
+// Debug: Verify LL band was dequantised correctly
if (is_debug && !is_chroma) {
// Find LL band again to verify
for (int s = 0; s < subband_count; s++) {
const dwt_subband_info_t *subband = &subbands[s];
if (subband->level == decomp_levels && subband->subband_type == 0) {
-fprintf(stderr, " AFTER all subbands processed - First 5 dequantized LL: ");
+fprintf(stderr, " AFTER all subbands processed - First 5 dequantised LL: ");
for (int k = 0; k < 5 && k < subband->coeff_count; k++) {
int idx = subband->coeff_start + k;
-fprintf(stderr, "%.1f ", dequantized[idx]);
+fprintf(stderr, "%.1f ", dequantised[idx]);
}
fprintf(stderr, "\n");
-// Find max dequantized LL
+// Find max dequantised LL
float max_dequant_ll = -999.0f;
for (int k = 0; k < subband->coeff_count; k++) {
int idx = subband->coeff_start + k;
-float abs_val = dequantized[idx] < 0 ? -dequantized[idx] : dequantized[idx];
+float abs_val = dequantised[idx] < 0 ? -dequantised[idx] : dequantised[idx];
if (abs_val > max_dequant_ll) max_dequant_ll = abs_val;
}
-fprintf(stderr, " AFTER all subbands - Max dequantized LL: %.1f\n", max_dequant_ll);
+fprintf(stderr, " AFTER all subbands - Max dequantised LL: %.1f\n", max_dequant_ll);
break;
}
}
@@ -360,7 +360,7 @@ static inline float tav_grain_triangular_noise(uint32_t rng_val) {
}
// Remove grain synthesis from DWT coefficients (decoder subtracts noise)
-// This must be called AFTER dequantization but BEFORE inverse DWT
+// This must be called AFTER dequantisation but BEFORE inverse DWT
static void remove_grain_synthesis_decoder(float *coeffs, int width, int height,
int decomp_levels, int frame_num, int q_y_global) {
dwt_subband_info_t subbands[32];
@@ -647,14 +647,14 @@ static void spectral_interpolate_band(float *c, size_t len, float Q, float lower
}
//=============================================================================
-// Dequantization (inverse of quantization)
+// Dequantisation (inverse of quantisation)
//=============================================================================
#define LAMBDA_FIXED 6.0f
// Lambda-based decompanding decoder (inverse of Laplacian CDF-based encoder)
-// Converts quantized index back to normalized float in [-1, 1]
+// Converts quantised index back to normalised float in [-1, 1]
static float lambda_decompanding(int8_t quant_val, int max_index) {
// Handle zero
if (quant_val == 0) {
@@ -667,11 +667,11 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
// Clamp to valid range
if (abs_index > max_index) abs_index = max_index;
-// Map index back to normalized CDF [0, 1]
-float normalized_cdf = (float)abs_index / max_index;
+// Map index back to normalised CDF [0, 1]
+float normalised_cdf = (float)abs_index / max_index;
// Map from [0, 1] back to [0.5, 1.0] (CDF range for positive half)
-float cdf = 0.5f + normalized_cdf * 0.5f;
+float cdf = 0.5f + normalised_cdf * 0.5f;
// Inverse Laplacian CDF for x >= 0: x = -(1/λ) * ln(2*(1-F))
// For F in [0.5, 1.0]: x = -(1/λ) * ln(2*(1-F))
@@ -684,7 +684,7 @@ static float lambda_decompanding(int8_t quant_val, int max_index) {
return sign * abs_val;
}
-static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
+static void dequantise_dwt_coefficients(const int8_t *quantised, float *coeffs, size_t count, int chunk_size, int dwt_levels, int max_index, float quantiser_scale) {
// Calculate sideband boundaries dynamically
int first_band_size = chunk_size >> dwt_levels;
@@ -696,7 +696,7 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
sideband_starts[i] = sideband_starts[i-1] + (first_band_size << (i-2));
}
-// Step 1: Dequantize all coefficients (no dithering yet)
+// Step 1: Dequantise all coefficients (no dithering yet)
for (size_t i = 0; i < count; i++) {
int sideband = dwt_levels;
for (int s = 0; s <= dwt_levels; s++) {
@@ -707,11 +707,11 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
}
// Decode using lambda companding
-float normalized_val = lambda_decompanding(quantized[i], max_index);
-// Denormalize using the subband scalar and apply base weight + quantiser scaling
+float normalised_val = lambda_decompanding(quantised[i], max_index);
+// Denormalise using the subband scalar and apply base weight + quantiser scaling
float weight = BASE_QUANTISER_WEIGHTS[sideband] * quantiser_scale;
-coeffs[i] = normalized_val * TAD32_COEFF_SCALARS[sideband] * weight;
+coeffs[i] = normalised_val * TAD32_COEFF_SCALARS[sideband] * weight;
}
// Step 2: Apply spectral interpolation per band
@@ -724,7 +724,7 @@ static void dequantize_dwt_coefficients(const int8_t *quantized, float *coeffs,
size_t band_end = sideband_starts[band + 1];
size_t band_len = band_end - band_start;
-// Calculate quantization step Q for this band
+// Calculate quantisation step Q for this band
float weight = BASE_QUANTISER_WEIGHTS[band] * quantiser_scale;
float scalar = TAD32_COEFF_SCALARS[band] * weight;
float Q = scalar / max_index;
@@ -1005,12 +1005,12 @@ static void decode_channel_ezbc(const uint8_t *ezbc_data, size_t offset, size_t
return;
}
-// Initialize output and state tracking
+// Initialise output and state tracking
memset(output, 0, expected_count * sizeof(int16_t));
int8_t *significant = calloc(expected_count, sizeof(int8_t));
int *first_bitplane = calloc(expected_count, sizeof(int));
-// Initialize queues
+// Initialise queues
ezbc_block_queue_t insignificant, next_insignificant, significant_queue, next_significant;
ezbc_queue_init(&insignificant);
ezbc_queue_init(&next_insignificant);
@@ -1398,8 +1398,8 @@ static int get_temporal_subband_level(int frame_idx, int num_frames, int tempora
} }
} }
// Calculate temporal quantizer scale for a given temporal subband level // Calculate temporal quantiser scale for a given temporal subband level
static float get_temporal_quantizer_scale(int temporal_level) { static float get_temporal_quantiser_scale(int temporal_level) {
// Uses exponential scaling: 2^(BETA × level^KAPPA) // Uses exponential scaling: 2^(BETA × level^KAPPA)
// With BETA=0.6, KAPPA=1.14: // With BETA=0.6, KAPPA=1.14:
// - Level 0 (tLL): 2^0.0 = 1.00 // - Level 0 (tLL): 2^0.0 = 1.00
@@ -2097,7 +2097,7 @@ static int extract_audio_to_wav(const char *input_file, const char *wav_file, in
 }
 //=============================================================================
-// Decoder Initialization and Cleanup
+// Decoder Initialisation and Cleanup
 //=============================================================================
 static tav_decoder_t* tav_decoder_init(const char *input_file, const char *output_file, const char *audio_file) {
@@ -2270,9 +2270,9 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // Variable declarations for cleanup
 uint8_t *compressed_data = NULL;
 uint8_t *decompressed_data = NULL;
-int16_t *quantized_y = NULL;
-int16_t *quantized_co = NULL;
-int16_t *quantized_cg = NULL;
+int16_t *quantised_y = NULL;
+int16_t *quantised_co = NULL;
+int16_t *quantised_cg = NULL;
 int decode_success = 1; // Assume success, set to 0 on error
 // Read and decompress frame data
@@ -2357,11 +2357,11 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 } else {
 // Decode coefficients (use function-level variables for proper cleanup)
 int coeff_count = decoder->frame_size;
-quantized_y = calloc(coeff_count, sizeof(int16_t));
-quantized_co = calloc(coeff_count, sizeof(int16_t));
-quantized_cg = calloc(coeff_count, sizeof(int16_t));
-if (!quantized_y || !quantized_co || !quantized_cg) {
+quantised_y = calloc(coeff_count, sizeof(int16_t));
+quantised_co = calloc(coeff_count, sizeof(int16_t));
+quantised_cg = calloc(coeff_count, sizeof(int16_t));
+if (!quantised_y || !quantised_co || !quantised_cg) {
 fprintf(stderr, "Error: Failed to allocate coefficient buffers\n");
 decode_success = 0;
 goto write_frame;
@@ -2370,69 +2370,69 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // Postprocess coefficients based on entropy_coder value
 if (decoder->header.entropy_coder == 1) {
 // EZBC format (stub implementation)
-postprocess_coefficients_ezbc(ptr, coeff_count, quantized_y, quantized_co, quantized_cg,
+postprocess_coefficients_ezbc(ptr, coeff_count, quantised_y, quantised_co, quantised_cg,
 decoder->header.channel_layout);
 } else {
 // Default: Twobitmap format (entropy_coder=0)
-postprocess_coefficients_twobit(ptr, coeff_count, quantized_y, quantized_co, quantized_cg);
+postprocess_coefficients_twobit(ptr, coeff_count, quantised_y, quantised_co, quantised_cg);
 }
 // Debug: Check first few coefficients
 // if (decoder->frame_count == 32) {
-// fprintf(stderr, " First 10 quantized Y coeffs: ");
+// fprintf(stderr, " First 10 quantised Y coeffs: ");
 // for (int i = 0; i < 10 && i < coeff_count; i++) {
-// fprintf(stderr, "%d ", quantized_y[i]);
+// fprintf(stderr, "%d ", quantised_y[i]);
 // }
 // fprintf(stderr, "\n");
 //
-// Check for any large quantized values that should produce bright pixels
+// Check for any large quantised values that should produce bright pixels
 // int max_quant_y = 0;
 // for (int i = 0; i < coeff_count; i++) {
-// int abs_val = quantized_y[i] < 0 ? -quantized_y[i] : quantized_y[i];
+// int abs_val = quantised_y[i] < 0 ? -quantised_y[i] : quantised_y[i];
 // if (abs_val > max_quant_y) max_quant_y = abs_val;
 // }
-// fprintf(stderr, " Max quantized Y coefficient: %d\n", max_quant_y);
+// fprintf(stderr, " Max quantised Y coefficient: %d\n", max_quant_y);
 // }
-// Dequantize (perceptual for versions 5-8, uniform for 1-4)
+// Dequantise (perceptual for versions 5-8, uniform for 1-4)
 const int is_perceptual = (decoder->header.version >= 5 && decoder->header.version <= 8);
 const int is_ezbc = (decoder->header.entropy_coder == 1);
 if (is_ezbc) {
-// EZBC mode: coefficients are already denormalized by encoder
-// Just convert int16 to float without multiplying by quantizer
+// EZBC mode: coefficients are already denormalised by encoder
+// Just convert int16 to float without multiplying by quantiser
 for (int i = 0; i < coeff_count; i++) {
-decoder->dwt_buffer_y[i] = (float)quantized_y[i];
-decoder->dwt_buffer_co[i] = (float)quantized_co[i];
-decoder->dwt_buffer_cg[i] = (float)quantized_cg[i];
+decoder->dwt_buffer_y[i] = (float)quantised_y[i];
+decoder->dwt_buffer_co[i] = (float)quantised_co[i];
+decoder->dwt_buffer_cg[i] = (float)quantised_cg[i];
 }
 } else if (is_perceptual) {
-dequantize_dwt_subbands_perceptual(0, qy, quantized_y, decoder->dwt_buffer_y,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_y, decoder->dwt_buffer_y,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qy, 0, decoder->frame_count);
 // Debug: Check if values survived the function call
 // if (decoder->frame_count == 32) {
-// fprintf(stderr, " RIGHT AFTER dequantize_Y returns: first 5 values: %.1f %.1f %.1f %.1f %.1f\n",
+// fprintf(stderr, " RIGHT AFTER dequantise_Y returns: first 5 values: %.1f %.1f %.1f %.1f %.1f\n",
 // decoder->dwt_buffer_y[0], decoder->dwt_buffer_y[1], decoder->dwt_buffer_y[2],
 // decoder->dwt_buffer_y[3], decoder->dwt_buffer_y[4]);
 // }
-dequantize_dwt_subbands_perceptual(0, qy, quantized_co, decoder->dwt_buffer_co,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_co, decoder->dwt_buffer_co,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qco, 1, decoder->frame_count);
-dequantize_dwt_subbands_perceptual(0, qy, quantized_cg, decoder->dwt_buffer_cg,
+dequantise_dwt_subbands_perceptual(0, qy, quantised_cg, decoder->dwt_buffer_cg,
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, qcg, 1, decoder->frame_count);
 } else {
 for (int i = 0; i < coeff_count; i++) {
-decoder->dwt_buffer_y[i] = quantized_y[i] * qy;
-decoder->dwt_buffer_co[i] = quantized_co[i] * qco;
-decoder->dwt_buffer_cg[i] = quantized_cg[i] * qcg;
+decoder->dwt_buffer_y[i] = quantised_y[i] * qy;
+decoder->dwt_buffer_co[i] = quantised_co[i] * qco;
+decoder->dwt_buffer_cg[i] = quantised_cg[i] * qcg;
 }
 }
-// Debug: Check dequantized values using correct subband layout
+// Debug: Check dequantised values using correct subband layout
 // if (decoder->frame_count == 32) {
 // dwt_subband_info_t subbands[32];
 // const int subband_count = calculate_subband_layout(decoder->header.width, decoder->header.height,
@@ -2459,7 +2459,7 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // }
 // }
-// Remove grain synthesis from Y channel (must happen after dequantization, before inverse DWT)
+// Remove grain synthesis from Y channel (must happen after dequantisation, before inverse DWT)
 remove_grain_synthesis_decoder(decoder->dwt_buffer_y, decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->frame_count, decoder->header.quantiser_y);
@@ -2479,7 +2479,7 @@ static int decode_i_or_p_frame(tav_decoder_t *decoder, uint8_t packet_type, uint
 // }
 // Apply inverse DWT with correct non-power-of-2 dimension handling
-// Note: quantized arrays freed at write_frame label
+// Note: quantised arrays freed at write_frame label
 apply_inverse_dwt_multilevel(decoder->dwt_buffer_y, decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->header.wavelet_filter);
 apply_inverse_dwt_multilevel(decoder->dwt_buffer_co, decoder->header.width, decoder->header.height,
@@ -2580,9 +2580,9 @@ write_frame:
 // Clean up temporary allocations
 if (compressed_data) free(compressed_data);
 if (decompressed_data) free(decompressed_data);
-if (quantized_y) free(quantized_y);
-if (quantized_co) free(quantized_co);
-if (quantized_cg) free(quantized_cg);
+if (quantised_y) free(quantised_y);
+if (quantised_co) free(quantised_co);
+if (quantised_cg) free(quantised_cg);
 // If decoding failed, fill frame with black to maintain stream alignment
 if (!decode_success) {
@@ -2646,7 +2646,7 @@ static void print_usage(const char *prog) {
 printf(" - TAD audio (decoded to PCMu8)\n");
 printf(" - MP2 audio (passed through)\n");
 printf(" - All wavelet types (5/3, 9/7, CDF 13/7, DD-4, Haar)\n");
-printf(" - Perceptual quantization (versions 5-8)\n");
+printf(" - Perceptual quantisation (versions 5-8)\n");
 printf(" - YCoCg-R and ICtCp color spaces\n\n");
 printf("Unsupported features (not in TSVM decoder):\n");
 printf(" - MC-EZBC motion compensation\n");
@@ -2708,7 +2708,7 @@ int main(int argc, char *argv[]) {
 // Pass 2: Decode video with audio file
 tav_decoder_t *decoder = tav_decoder_init(input_file, output_file, temp_audio_file);
 if (!decoder) {
-fprintf(stderr, "Failed to initialize decoder\n");
+fprintf(stderr, "Failed to initialise decoder\n");
 unlink(temp_audio_file); // Clean up temp file
 return 1;
 }
@@ -2853,34 +2853,34 @@ int main(int argc, char *argv[]) {
 // Postprocess coefficients based on entropy_coder value
 const int num_pixels = decoder->header.width * decoder->header.height;
-int16_t ***quantized_gop;
+int16_t ***quantised_gop;
 if (decoder->header.entropy_coder == 2) {
 // RAW format: simple concatenated int16 arrays
 if (verbose) {
 fprintf(stderr, " Using RAW postprocessing (entropy_coder=2)\n");
 }
-quantized_gop = postprocess_gop_raw(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_raw(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 } else if (decoder->header.entropy_coder == 1) {
 // EZBC format: embedded zero-block coding
 if (verbose) {
 fprintf(stderr, " Using EZBC postprocessing (entropy_coder=1)\n");
 }
-quantized_gop = postprocess_gop_ezbc(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_ezbc(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 } else {
 // Default: Twobitmap format (entropy_coder=0)
 if (verbose) {
 fprintf(stderr, " Using Twobitmap postprocessing (entropy_coder=0)\n");
 }
-quantized_gop = postprocess_gop_unified(decompressed_data, decompressed_size,
+quantised_gop = postprocess_gop_unified(decompressed_data, decompressed_size,
 gop_size, num_pixels, decoder->header.channel_layout);
 }
 free(decompressed_data);
-if (!quantized_gop) {
+if (!quantised_gop) {
 fprintf(stderr, "Error: Failed to postprocess GOP data\n");
 result = -1;
 break;
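For reference, the dispatch above keys off the `entropy_coder` header field: 0 selects Twobitmap (the default), 1 EZBC, 2 RAW concatenated int16 arrays. A tiny helper (hypothetical, used only for this sketch; the header field itself is a plain integer in the codec) makes the mapping explicit:

```c
#include <string.h>

/* Assumed names for the three coder values seen in the diff above. */
typedef enum {
    TAV_EC_TWOBITMAP = 0,  /* default */
    TAV_EC_EZBC      = 1,  /* embedded zero-block coding */
    TAV_EC_RAW       = 2   /* concatenated int16 arrays */
} tav_entropy_coder_t;

static const char *entropy_coder_name(int entropy_coder) {
    switch (entropy_coder) {
        case TAV_EC_RAW:  return "RAW";
        case TAV_EC_EZBC: return "EZBC";
        default:          return "Twobitmap";
    }
}
```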
@@ -2897,78 +2897,78 @@ int main(int argc, char *argv[]) {
 gop_cg[t] = calloc(num_pixels, sizeof(float));
 }
-// Dequantize with temporal scaling (perceptual quantization for versions 5-8)
+// Dequantise with temporal scaling (perceptual quantisation for versions 5-8)
 const int is_perceptual = (decoder->header.version >= 5 && decoder->header.version <= 8);
 const int is_ezbc = (decoder->header.entropy_coder == 1);
 const int temporal_levels = 2; // Fixed for TAV GOP encoding
 for (int t = 0; t < gop_size; t++) {
 if (is_ezbc) {
-// EZBC mode: coefficients are already denormalized by encoder
-// Just convert int16 to float without multiplying by quantizer
+// EZBC mode: coefficients are already denormalised by encoder
+// Just convert int16 to float without multiplying by quantiser
 for (int i = 0; i < num_pixels; i++) {
-gop_y[t][i] = (float)quantized_gop[t][0][i];
-gop_co[t][i] = (float)quantized_gop[t][1][i];
-gop_cg[t][i] = (float)quantized_gop[t][2][i];
+gop_y[t][i] = (float)quantised_gop[t][0][i];
+gop_co[t][i] = (float)quantised_gop[t][1][i];
+gop_cg[t][i] = (float)quantised_gop[t][2][i];
 }
 if (t == 0) {
 // Debug first frame
 int16_t max_y = 0, min_y = 0;
 for (int i = 0; i < num_pixels; i++) {
-if (quantized_gop[t][0][i] > max_y) max_y = quantized_gop[t][0][i];
-if (quantized_gop[t][0][i] < min_y) min_y = quantized_gop[t][0][i];
+if (quantised_gop[t][0][i] > max_y) max_y = quantised_gop[t][0][i];
+if (quantised_gop[t][0][i] < min_y) min_y = quantised_gop[t][0][i];
 }
 fprintf(stderr, "[GOP-EZBC] Frame 0 Y coeffs range: [%d, %d], first 5: %d %d %d %d %d\n",
 min_y, max_y,
-quantized_gop[t][0][0], quantized_gop[t][0][1], quantized_gop[t][0][2],
-quantized_gop[t][0][3], quantized_gop[t][0][4]);
+quantised_gop[t][0][0], quantised_gop[t][0][1], quantised_gop[t][0][2],
+quantised_gop[t][0][3], quantised_gop[t][0][4]);
 }
 } else {
-// Normal mode: multiply by quantizer
+// Normal mode: multiply by quantiser
 const int temporal_level = get_temporal_subband_level(t, gop_size, temporal_levels);
-const float temporal_scale = get_temporal_quantizer_scale(temporal_level);
-// CRITICAL: Must ROUND temporal quantizer to match encoder's roundf() behavior
+const float temporal_scale = get_temporal_quantiser_scale(temporal_level);
+// CRITICAL: Must ROUND temporal quantiser to match encoder's roundf() behavior
 const float base_q_y = roundf(decoder->header.quantiser_y * temporal_scale);
 const float base_q_co = roundf(decoder->header.quantiser_co * temporal_scale);
 const float base_q_cg = roundf(decoder->header.quantiser_cg * temporal_scale);
 if (is_perceptual) {
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][0], gop_y[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][0], gop_y[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_y, 0, decoder->frame_count + t);
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][1], gop_co[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][1], gop_co[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_co, 1, decoder->frame_count + t);
-dequantize_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
-quantized_gop[t][2], gop_cg[t],
+dequantise_dwt_subbands_perceptual(0, decoder->header.quantiser_y,
+quantised_gop[t][2], gop_cg[t],
 decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, base_q_cg, 1, decoder->frame_count + t);
 } else {
-// Uniform quantization for older versions
+// Uniform quantisation for older versions
 for (int i = 0; i < num_pixels; i++) {
-gop_y[t][i] = quantized_gop[t][0][i] * base_q_y;
-gop_co[t][i] = quantized_gop[t][1][i] * base_q_co;
-gop_cg[t][i] = quantized_gop[t][2][i] * base_q_cg;
+gop_y[t][i] = quantised_gop[t][0][i] * base_q_y;
+gop_co[t][i] = quantised_gop[t][1][i] * base_q_co;
+gop_cg[t][i] = quantised_gop[t][2][i] * base_q_cg;
 }
 }
 }
 }
-// Free quantized coefficients
+// Free quantised coefficients
 for (int t = 0; t < gop_size; t++) {
-free(quantized_gop[t][0]);
-free(quantized_gop[t][1]);
-free(quantized_gop[t][2]);
-free(quantized_gop[t]);
+free(quantised_gop[t][0]);
+free(quantised_gop[t][1]);
+free(quantised_gop[t][2]);
+free(quantised_gop[t]);
 }
-free(quantized_gop);
+free(quantised_gop);
 // Remove grain synthesis from Y channel for each GOP frame
-// This must happen after dequantization but before inverse DWT
+// This must happen after dequantisation but before inverse DWT
 for (int t = 0; t < gop_size; t++) {
 remove_grain_synthesis_decoder(gop_y[t], decoder->header.width, decoder->header.height,
 decoder->header.decomp_levels, decoder->frame_count + t,

View File

@@ -100,8 +100,8 @@ static ycocg_t rgb_to_ycocg_correct(uint8_t r, uint8_t g, uint8_t b, float dithe
 return result;
 }
-static int quantize_4bit_y(float value) {
-// Y quantization: round(y * 15)
+static int quantise_4bit_y(float value) {
+// Y quantisation: round(y * 15)
 return (int)round(fmaxf(0.0f, fminf(15.0f, value * 15.0f)));
 }
@@ -360,7 +360,7 @@ static void encode_ipf1_block_correct(uint8_t *rgb_data, int width, int height,
 pixels[idx] = (ycocg_t){0.0f, 0.0f, 0.0f};
 }
-y_values[idx] = quantize_4bit_y(pixels[idx].y);
+y_values[idx] = quantise_4bit_y(pixels[idx].y);
 co_values[idx] = pixels[idx].co;
 cg_values[idx] = pixels[idx].cg;
 }
@@ -567,7 +567,7 @@ static int process_audio(encoder_config_t *config, int frame_num, FILE *output)
 return 1;
 }
-// Initialize packet size on first frame
+// Initialise packet size on first frame
 if (config->mp2_packet_size == 0) {
 uint8_t header[4];
 if (fread(header, 1, 4, config->mp2_file) != 4) return 1;
@@ -589,7 +589,7 @@ static int process_audio(encoder_config_t *config, int frame_num, FILE *output)
 double packets_per_frame = frame_audio_time / packet_audio_time;
 // Only insert audio when buffer would go below 2 frames
-// Initialize with 2 packets on first frame to prime the buffer
+// Initialise with 2 packets on first frame to prime the buffer
 int packets_to_insert = 0;
 if (frame_num == 1) {
 packets_to_insert = 2;
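One plausible shape of the pacing described above (prime with 2 packets on the first frame, then emit whole packets as the fractional `packets_per_frame` budget accumulates). This is a sketch with invented names, not the encoder's actual buffering logic:

```c
/* Hypothetical accumulator-based pacing sketch. */
static int packets_for_frame(int frame_num, double packets_per_frame,
                             double *accumulator) {
    if (frame_num == 1) {          /* prime the buffer */
        *accumulator = 0.0;
        return 2;
    }
    *accumulator += packets_per_frame;
    int whole = (int)*accumulator; /* emit only whole packets */
    *accumulator -= whole;
    return whole;
}
```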
@@ -654,7 +654,7 @@ static void write_tvdos_header(encoder_config_t *config, FILE *output) {
 fwrite(reserved, 1, 10, output);
 }
-// Initialize encoder configuration
+// Initialise encoder configuration
 static encoder_config_t *init_encoder_config() {
 encoder_config_t *config = calloc(1, sizeof(encoder_config_t));
 if (!config) return NULL;
@@ -807,7 +807,7 @@ static void print_usage(const char *program_name) {
 int main(int argc, char *argv[]) {
 encoder_config_t *config = init_encoder_config();
 if (!config) {
-fprintf(stderr, "Failed to initialize encoder\n");
+fprintf(stderr, "Failed to initialise encoder\n");
 return 1;
 }
@@ -904,7 +904,7 @@ int main(int argc, char *argv[]) {
 // Write TVDOS header
 write_tvdos_header(config, output);
-// Initialize progress tracking
+// Initialise progress tracking
 gettimeofday(&config->start_time, NULL);
 config->last_progress_time = config->start_time;
 config->total_output_bytes = 8 + 2 + 2 + 2 + 4 + 2 + 2 + 10; // TVDOS header size

View File

@@ -19,7 +19,7 @@
 static const float TAD32_COEFF_SCALARS[] = {64.0f, 45.255f, 32.0f, 22.627f, 16.0f, 11.314f, 8.0f, 5.657f, 4.0f, 2.828f};
 // Base quantiser weight table (10 subbands: LL + 9 H bands)
-// These weights are multiplied by quantiser_scale during quantization
+// These weights are multiplied by quantiser_scale during quantisation
 static const float BASE_QUANTISER_WEIGHTS[2][10] = {
 { // mid channel
 4.0f, // LL (L9) DC
@@ -104,7 +104,7 @@ static int calculate_dwt_levels(int chunk_size) {
 // Special marker for deadzoned coefficients (will be reconstructed with noise on decode)
 #define DEADZONE_MARKER_FLOAT (-999.0f) // Unmistakable marker in float domain
-#define DEADZONE_MARKER_QUANT (-128) // Maps to this in quantized domain (int8 minimum)
+#define DEADZONE_MARKER_QUANT (-128) // Maps to this in quantised domain (int8 minimum)
 // Perceptual epsilon - coefficients below this are truly zero (inaudible)
 #define EPSILON_PERCEPTUAL 0.001f
@@ -296,7 +296,7 @@ static void calculate_preemphasis_coeffs(float *b0, float *b1, float *a1) {
 *b0 = 1.0f;
 *b1 = -alpha;
-*a1 = 0.0f; // No feedback (FIR filter)
+*a1 = 0.0f; // No feedback
 }
 // emphasis at alpha=0.5 shifts quantisation crackles to lower frequency which MIGHT be more preferable
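The coefficients above describe a first-order FIR pre-emphasis, y[n] = x[n] - alpha * x[n-1]. A minimal in-place sketch (the routine name is invented; the codec's own filter loop is not part of this hunk):

```c
#include <math.h>
#include <stddef.h>

/* Minimal in-place pre-emphasis: y[n] = x[n] - alpha * x[n-1]. */
static void preemphasis_sketch(float *x, size_t n, float alpha) {
    float prev = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float cur = x[i];
        x[i] = cur - alpha * prev;  /* b0 = 1, b1 = -alpha, no feedback */
        prev = cur;
    }
}
```

On a constant signal this attenuates everything after the first sample by (1 - alpha), which is the expected high-pass behaviour.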
@@ -372,14 +372,14 @@ static void compress_mu_law(float *left, float *right, size_t count) {
 }
 //=============================================================================
-// Quantization with Frequency-Dependent Weighting
+// Quantisation with Frequency-Dependent Weighting
 //=============================================================================
 #define LAMBDA_FIXED 6.0f
 // Lambda-based companding encoder (based on Laplacian distribution CDF)
 // val must be normalised to [-1,1]
-// Returns quantized index in range [-127, +127]
+// Returns quantised index in range [-127, +127]
 static int8_t lambda_companding(float val, int max_index) {
 // Handle zero
 if (fabsf(val) < 1e-9f) {
@@ -398,10 +398,10 @@ static int8_t lambda_companding(float val, int max_index) {
 float cdf = 1.0f - 0.5f * expf(-LAMBDA_FIXED * abs_val);
 // Map CDF from [0.5, 1.0] to [0, 1] for positive half
-float normalized_cdf = (cdf - 0.5f) * 2.0f;
-// Quantize to index
-int index = (int)roundf(normalized_cdf * max_index);
+float normalised_cdf = (cdf - 0.5f) * 2.0f;
+// Quantise to index
+int index = (int)roundf(normalised_cdf * max_index);
 // Clamp index to valid range [0, max_index]
 if (index < 0) index = 0;
@@ -410,7 +410,7 @@ static int8_t lambda_companding(float val, int max_index) {
 return (int8_t)(sign * index);
 }
-static void quantize_dwt_coefficients(int channel, const float *coeffs, int8_t *quantized, size_t count, int apply_deadzone, int chunk_size, int dwt_levels, int max_index, int *current_subband_index, float quantiser_scale) {
+static void quantise_dwt_coefficients(int channel, const float *coeffs, int8_t *quantised, size_t count, int apply_deadzone, int chunk_size, int dwt_levels, int max_index, int *current_subband_index, float quantiser_scale) {
 int first_band_size = chunk_size >> dwt_levels;
 int *sideband_starts = malloc((dwt_levels + 2) * sizeof(int));
@@ -436,14 +436,14 @@ static void quantize_dwt_coefficients(int channel, const float *coeffs, int8_t *
// Check for deadzone marker (special handling) // Check for deadzone marker (special handling)
/*if (coeffs[i] == DEADZONE_MARKER_FLOAT) { /*if (coeffs[i] == DEADZONE_MARKER_FLOAT) {
// Map to special quantized marker for stochastic reconstruction // Map to special quantised marker for stochastic reconstruction
quantized[i] = (int8_t)DEADZONE_MARKER_QUANT; quantised[i] = (int8_t)DEADZONE_MARKER_QUANT;
} else {*/ } else {*/
// Normal quantization // Normal quantisation
float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale; float weight = BASE_QUANTISER_WEIGHTS[channel][sideband] * quantiser_scale;
float val = (coeffs[i] / (TAD32_COEFF_SCALARS[sideband] * weight)); // val is normalised to [-1,1] float val = (coeffs[i] / (TAD32_COEFF_SCALARS[sideband] * weight)); // val is normalised to [-1,1]
int8_t quant_val = lambda_companding(val, max_index); int8_t quant_val = lambda_companding(val, max_index);
quantized[i] = quant_val; quantised[i] = quant_val;
// } // }
} }
@@ -489,11 +489,11 @@ static CoeffAccumulator *side_accumulators = NULL;
static QuantAccumulator *mid_quant_accumulators = NULL; static QuantAccumulator *mid_quant_accumulators = NULL;
static QuantAccumulator *side_quant_accumulators = NULL; static QuantAccumulator *side_quant_accumulators = NULL;
static int num_subbands = 0; static int num_subbands = 0;
static int stats_initialized = 0; static int stats_initialised = 0;
static int stats_dwt_levels = 0; static int stats_dwt_levels = 0;
static void init_statistics(int dwt_levels) { static void init_statistics(int dwt_levels) {
if (stats_initialized) return; if (stats_initialised) return;
num_subbands = dwt_levels + 1; num_subbands = dwt_levels + 1;
stats_dwt_levels = dwt_levels; stats_dwt_levels = dwt_levels;
@@ -521,7 +521,7 @@ static void init_statistics(int dwt_levels) {
side_quant_accumulators[i].count = 0; side_quant_accumulators[i].count = 0;
} }
stats_initialized = 1; stats_initialised = 1;
} }
static void accumulate_coefficients(const float *coeffs, int dwt_levels, int chunk_size, CoeffAccumulator *accumulators) { static void accumulate_coefficients(const float *coeffs, int dwt_levels, int chunk_size, CoeffAccumulator *accumulators) {
@@ -555,7 +555,7 @@ static void accumulate_coefficients(const float *coeffs, int dwt_levels, int chu
free(sideband_starts);
}
-static void accumulate_quantized(const int8_t *quant, int dwt_levels, int chunk_size, QuantAccumulator *accumulators) {
+static void accumulate_quantised(const int8_t *quant, int dwt_levels, int chunk_size, QuantAccumulator *accumulators) {
int first_band_size = chunk_size >> dwt_levels;
int *sideband_starts = malloc((dwt_levels + 2) * sizeof(int));
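The accumulators above walk the 1-D DWT coefficient layout: the first `chunk_size >> dwt_levels` entries are the coarsest low-pass band, followed by high-pass bands. The exact helper that fills `sideband_starts` is not shown in this diff; this sketch infers the layout from `first_band_size` (each high-pass band after the first doubling in size):

```c
/* Compute start offsets of each subband in a 1-D DWT coefficient array.
 * starts[] needs dwt_levels + 2 entries; starts[s+1] - starts[s] is the
 * size of band s. Layout inferred from first_band_size = chunk_size >> levels. */
static void compute_sideband_starts(int chunk_size, int dwt_levels, int *starts) {
    int pos = 0;
    int size = chunk_size >> dwt_levels;  /* coarsest low-pass band */
    starts[0] = 0;
    for (int s = 1; s <= dwt_levels + 1; s++) {
        pos += size;
        starts[s] = pos;
        if (s > 1) size <<= 1;  /* first high-pass band matches the low-pass size */
    }
}
```

For a 1024-sample chunk with 3 levels this yields bands starting at 0, 128, 256, 512, ending at 1024 — i.e. `dwt_levels + 1` subbands, matching `num_subbands` above.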
@@ -690,7 +690,7 @@ static int compare_value_frequency(const void *a, const void *b) {
return 0;
}
-static void print_top5_quantized_values(const int8_t *quant, size_t count, const char *title) {
+static void print_top5_quantised_values(const int8_t *quant, size_t count, const char *title) {
if (count == 0) {
fprintf(stderr, " %s: No data\n", title);
return;
@@ -731,9 +731,9 @@ static void print_top5_quantized_values(const int8_t *quant, size_t count, const
}
void tad32_print_statistics(void) {
-if (!stats_initialized) return;
-fprintf(stderr, "\n=== TAD Coefficient Statistics (before quantization) ===\n");
+if (!stats_initialised) return;
+fprintf(stderr, "\n=== TAD Coefficient Statistics (before quantisation) ===\n");
// Print Mid channel statistics
fprintf(stderr, "\nMid Channel:\n");
@@ -803,11 +803,11 @@ void tad32_print_statistics(void) {
print_histogram(side_accumulators[s].data, side_accumulators[s].count, band_name);
}
-// Print quantized values statistics
-fprintf(stderr, "\n=== TAD Quantized Values Statistics (after quantization) ===\n");
-// Print Mid channel quantized values
-fprintf(stderr, "\nMid Channel Quantized Values:\n");
+// Print quantised values statistics
+fprintf(stderr, "\n=== TAD Quantised Values Statistics (after quantisation) ===\n");
+// Print Mid channel quantised values
+fprintf(stderr, "\nMid Channel Quantised Values:\n");
for (int s = 0; s < num_subbands; s++) {
char band_name[32];
if (s == 0) {
@@ -815,11 +815,11 @@ void tad32_print_statistics(void) {
} else {
snprintf(band_name, sizeof(band_name), "H (L%d)", stats_dwt_levels - s + 1);
}
-print_top5_quantized_values(mid_quant_accumulators[s].data, mid_quant_accumulators[s].count, band_name);
+print_top5_quantised_values(mid_quant_accumulators[s].data, mid_quant_accumulators[s].count, band_name);
}
-// Print Side channel quantized values
-fprintf(stderr, "\nSide Channel Quantized Values:\n");
+// Print Side channel quantised values
+fprintf(stderr, "\nSide Channel Quantised Values:\n");
for (int s = 0; s < num_subbands; s++) {
char band_name[32];
if (s == 0) {
@@ -827,14 +827,14 @@ void tad32_print_statistics(void) {
} else {
snprintf(band_name, sizeof(band_name), "H (L%d)", stats_dwt_levels - s + 1);
}
-print_top5_quantized_values(side_quant_accumulators[s].data, side_quant_accumulators[s].count, band_name);
+print_top5_quantised_values(side_quant_accumulators[s].data, side_quant_accumulators[s].count, band_name);
}
fprintf(stderr, "\n");
}
void tad32_free_statistics(void) {
-if (!stats_initialized) return;
+if (!stats_initialised) return;
for (int i = 0; i < num_subbands; i++) {
free(mid_accumulators[i].data);
@@ -851,7 +851,7 @@ void tad32_free_statistics(void) {
side_accumulators = NULL;
mid_quant_accumulators = NULL;
side_quant_accumulators = NULL;
-stats_initialized = 0;
+stats_initialised = 0;
}
//=============================================================================
@@ -1051,7 +1051,7 @@ size_t tad_encode_channel_ezbc(int8_t *coeffs, size_t count, uint8_t **output) {
tad_bitstream_write_bits(&bs, msb_bitplane, 8);
tad_bitstream_write_bits(&bs, (uint32_t)count, 16);
-// Initialize queues
+// Initialise queues
tad_block_queue_t insignificant_queue, next_insignificant;
tad_block_queue_t significant_queue, next_significant;
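The EZBC header above stores the most significant bitplane in 8 bits and the coefficient count in 16; bitplane coding then proceeds from that plane downward. The bitplane is simply the highest bit set in any coefficient magnitude — a minimal sketch (the real encoder's scan is not shown in this hunk):

```c
#include <stdint.h>
#include <stddef.h>

/* Highest bit position set in any |coefficient|, or -1 if all are zero.
 * EZBC encodes bitplanes from this plane down to plane 0. */
static int find_msb_bitplane(const int8_t *coeffs, size_t count) {
    int max_abs = 0;
    for (size_t i = 0; i < count; i++) {
        int a = coeffs[i] < 0 ? -(int)coeffs[i] : coeffs[i];
        if (a > max_abs) max_abs = a;
    }
    int msb = -1;
    while (max_abs) { msb++; max_abs >>= 1; }
    return msb;
}
```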
@@ -1206,14 +1206,14 @@ size_t tad32_encode_chunk(const float *pcm32_stereo, size_t num_samples,
// apply_coeff_deadzone(0, dwt_mid, num_samples);
// apply_coeff_deadzone(1, dwt_side, num_samples);
-// Step 4: Quantize with frequency-dependent weights and quantiser scaling
-quantize_dwt_coefficients(0, dwt_mid, quant_mid, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
-quantize_dwt_coefficients(1, dwt_side, quant_side, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
-// Step 4.5: Accumulate quantized coefficient statistics if enabled
+// Step 4: Quantise with frequency-dependent weights and quantiser scaling
+quantise_dwt_coefficients(0, dwt_mid, quant_mid, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
+quantise_dwt_coefficients(1, dwt_side, quant_side, num_samples, 1, num_samples, dwt_levels, max_index, NULL, quantiser_scale);
+// Step 4.5: Accumulate quantised coefficient statistics if enabled
if (stats_enabled) {
-accumulate_quantized(quant_mid, dwt_levels, num_samples, mid_quant_accumulators);
-accumulate_quantized(quant_side, dwt_levels, num_samples, side_quant_accumulators);
+accumulate_quantised(quant_mid, dwt_levels, num_samples, mid_quant_accumulators);
+accumulate_quantised(quant_side, dwt_levels, num_samples, side_quant_accumulators);
}
// Step 5: Encode with binary tree EZBC (1D variant) - FIXED!
@@ -1232,7 +1232,7 @@ size_t tad32_encode_chunk(const float *pcm32_stereo, size_t num_samples,
free(mid_ezbc);
free(side_ezbc);
-// Step 6: Optional Zstd compression
+// Step 6: Zstd compression
uint8_t *write_ptr = output;
// Write chunk header

View File

@@ -30,15 +30,15 @@ static inline int tad32_quality_to_max_index(int quality) {
*
* @param pcm32_stereo Input PCM32fLE stereo samples (interleaved L,R)
* @param num_samples Number of samples per channel (min 1024)
-* @param max_index Maximum quantization index (7=3bit, 15=4bit, 31=5bit, 63=6bit, 127=7bit)
-* @param quantiser_scale Quantiser scaling factor (1.0=baseline, 2.0=2x coarser quantization)
-* Higher values = more aggressive quantization = smaller files
+* @param max_index Maximum quantisation index (7=3bit, 15=4bit, 31=5bit, 63=6bit, 127=7bit)
+* @param quantiser_scale Quantiser scaling factor (1.0=baseline, 2.0=2x coarser quantisation)
+* Higher values = more aggressive quantisation = smaller files
* @param output Output buffer (must be large enough)
* @return Number of bytes written to output, or 0 on error
*
* Output format:
* uint16 sample_count (samples per channel)
-* uint8 max_index (maximum quantization index)
+* uint8 max_index (maximum quantisation index)
* uint32 payload_size (bytes in payload)
* * payload (encoded M/S data, Zstd-compressed with 2-bit twobitmap)
*/
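The chunk layout documented above can be serialised with explicit byte writes. Little-endian byte order is an assumption here (consistent with the PCM32fLE naming); the field order follows the comment:

```c
#include <stdint.h>
#include <stddef.h>

/* Write the 7-byte TAD32 chunk header described above:
 * uint16 sample_count, uint8 max_index, uint32 payload_size (little-endian).
 * The encoded payload follows immediately after. Returns the header size. */
static size_t write_chunk_header(uint8_t *out, uint16_t sample_count,
                                 uint8_t max_index, uint32_t payload_size) {
    out[0] = (uint8_t)(sample_count & 0xFF);
    out[1] = (uint8_t)(sample_count >> 8);
    out[2] = max_index;
    out[3] = (uint8_t)(payload_size & 0xFF);
    out[4] = (uint8_t)((payload_size >> 8) & 0xFF);
    out[5] = (uint8_t)((payload_size >> 16) & 0xFF);
    out[6] = (uint8_t)((payload_size >> 24) & 0xFF);
    return 7;
}
```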

View File

@@ -15,7 +15,7 @@
#define ENCODER_VENDOR_STRING "Encoder-TAD32 (PCM32f version) 20251107"
// TAD32 format constants
-#define TAD32_DEFAULT_CHUNK_SIZE 31991 // Using a prime number to force the worst condition
+#define TAD32_DEFAULT_CHUNK_SIZE 32768 // Default power-of-two chunk size (31991 was a prime chosen to force the worst condition)
// Temporary file for FFmpeg PCM extraction
char TEMP_PCM_FILE[42];
@@ -119,7 +119,7 @@ int main(int argc, char *argv[]) {
return 1;
}
-// Convert quality (0-5) to max_index for quantization
+// Convert quality (0-5) to max_index for quantisation
int max_index = tad32_quality_to_max_index(quality);
// Generate output filename if not provided
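The body of `tad32_quality_to_max_index` is not part of this diff. Given the documented indices (7 = 3-bit up to 127 = 7-bit), one plausible sketch — purely hypothetical, the real table may differ — is a power-of-two ladder capped at 7 bits:

```c
/* Hypothetical quality (0-5) to max_index mapping: quality q selects a
 * (q+3)-bit quantiser, capped at 7 bits. Illustration only. */
static int quality_to_max_index(int quality) {
    int bits = quality + 3;
    if (bits > 7) bits = 7;
    return (1 << bits) - 1;  /* 7, 15, 31, 63, 127, 127 */
}
```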

View File

@@ -516,7 +516,7 @@ static size_t encode_channel_ezbc(int16_t *coeffs, size_t count, int width, int
bs.data[5], bs.data[6], bs.data[7], bs.data[8]);
}
-// Initialize two queues: insignificant blocks and significant 1x1 blocks
+// Initialise two queues: insignificant blocks and significant 1x1 blocks
block_queue_t insignificant_queue, next_insignificant;
block_queue_t significant_queue, next_significant;
@@ -718,7 +718,7 @@ static void refine_motion_vector(
}
if (valid_pixels > 0) {
-sad /= valid_pixels; // Normalize by valid pixels
+sad /= valid_pixels; // Normalise by valid pixels
}
if (sad < best_sad) {
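Dividing the SAD by `valid_pixels`, as above, keeps candidate blocks comparable when part of a block falls outside the frame. A stripped-down sketch of that cost (the surrounding search loop and pixel-validity test are omitted here):

```c
#include <stdint.h>
#include <limits.h>

/* Mean absolute difference over the valid pixels of two blocks; an empty
 * block gets an effectively infinite cost so it can never win the search. */
static long sad_per_valid_pixel(const uint8_t *cur, const uint8_t *ref,
                                int valid_pixels) {
    if (valid_pixels <= 0) return LONG_MAX;
    long sad = 0;
    for (int i = 0; i < valid_pixels; i++) {
        int d = (int)cur[i] - (int)ref[i];
        sad += d < 0 ? -d : d;
    }
    return sad / valid_pixels;
}
```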
@@ -1272,7 +1272,7 @@ static void free_quad_tree(quad_tree_node_t *node) {
free(node);
}
-// Count total nodes in quad-tree (for serialization buffer sizing)
+// Count total nodes in quad-tree (for serialisation buffer sizing)
static int count_quad_tree_nodes(quad_tree_node_t *node) {
if (!node) return 0;
@@ -1389,7 +1389,7 @@ static void build_mv_map_from_forest(
) {
int blocks_x = (width + residual_coding_min_block_size - 1) / residual_coding_min_block_size;
-// Initialize map with zeros
+// Initialise map with zeros
int total_blocks = blocks_x * ((height + residual_coding_min_block_size - 1) / residual_coding_min_block_size);
memset(mv_map_x, 0, total_blocks * sizeof(int16_t));
memset(mv_map_y, 0, total_blocks * sizeof(int16_t));
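The `blocks_x` and `total_blocks` expressions above rely on the integer ceiling-division idiom, rounding the frame size up to a whole number of minimum-size blocks:

```c
/* ceil(a / b) for positive integers: (a + b - 1) / b */
static int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}
```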
@@ -1496,12 +1496,12 @@ static void apply_spatial_mv_prediction_to_tree(
}
}
-// Serialize quad-tree to compact binary format
+// Serialise quad-tree to compact binary format
// Format: [split_flags_bitstream][leaf_mv_data]
// - split_flags: 1 bit per node (breadth-first), 1=split, 0=leaf
// - leaf_mv_data: For each leaf in order: [skip_flag:1bit][mvd_x:15bits][mvd_y:16bits]
// Note: MVs are now DIFFERENTIAL (predicted from spatial neighbors)
-static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
+static size_t serialise_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
if (!root) return 0;
// First pass: Count nodes and leaves
@@ -1512,11 +1512,11 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
quad_tree_node_t **queue = (quad_tree_node_t**)malloc(total_nodes * sizeof(quad_tree_node_t*));
int queue_start = 0, queue_end = 0;
-// Initialize split flags buffer
+// Initialise split flags buffer
uint8_t *split_flags = (uint8_t*)calloc(split_bytes, 1);
int split_bit_pos = 0;
-// Start serialization
+// Start serialisation
queue[queue_end++] = root;
size_t write_pos = split_bytes; // Leave space for split flags
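The split-flag bitstream described above carries one bit per node in breadth-first order. A minimal sketch of the packing — LSB-first bit order within each byte is an assumption, the encoder's bit order may differ:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Pack per-node split flags (1 = split, 0 = leaf), breadth-first,
 * into a byte buffer of (num_nodes + 7) / 8 bytes. */
static void pack_split_flags(const int *is_split, int num_nodes, uint8_t *out) {
    memset(out, 0, (size_t)((num_nodes + 7) / 8));
    for (int i = 0; i < num_nodes; i++)
        if (is_split[i])
            out[i >> 3] |= (uint8_t)(1u << (i & 7));
}
```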
@@ -1551,7 +1551,7 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
if (!node->is_split) {
// Leaf node - write skip flag + motion vectors
if (write_pos + 5 > buffer_size) {
-fprintf(stderr, "ERROR: Quad-tree serialization buffer overflow\n");
+fprintf(stderr, "ERROR: Quad-tree serialisation buffer overflow\n");
free(queue);
free(split_flags);
return 0;
@@ -1588,9 +1588,9 @@ static size_t serialize_quad_tree(quad_tree_node_t *root, uint8_t *buffer, size_
return write_pos;
}
-// Serialize quad-tree with bidirectional motion vectors for B-frames (64-bit leaf nodes)
+// Serialise quad-tree with bidirectional motion vectors for B-frames (64-bit leaf nodes)
// Format: [split_flags] [leaf_data: skip(1) + fwd_mv_x(15) + fwd_mv_y(16) + bwd_mv_x(16) + bwd_mv_y(16) = 64 bits]
-static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
+static size_t serialise_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t *buffer, size_t buffer_size) {
if (!root) return 0;
// First pass: Count nodes and leaves
@@ -1601,11 +1601,11 @@ static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t
quad_tree_node_t **queue = (quad_tree_node_t**)malloc(total_nodes * sizeof(quad_tree_node_t*));
int queue_start = 0, queue_end = 0;
-// Initialize split flags buffer
+// Initialise split flags buffer
uint8_t *split_flags = (uint8_t*)calloc(split_bytes, 1);
int split_bit_pos = 0;
-// Start serialization
+// Start serialisation
queue[queue_end++] = root;
size_t write_pos = split_bytes; // Leave space for split flags
@@ -1640,7 +1640,7 @@ static size_t serialize_quad_tree_bidirectional(quad_tree_node_t *root, uint8_t
if (!node->is_split) {
// Leaf node - write skip flag + dual motion vectors
if (write_pos + 8 > buffer_size) {
-fprintf(stderr, "ERROR: Bidirectional quad-tree serialization buffer overflow\n");
+fprintf(stderr, "ERROR: Bidirectional quad-tree serialisation buffer overflow\n");
free(queue);
free(split_flags);
return 0;
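The 64-bit B-frame leaf record concatenates skip(1) + fwd_mv_x(15) + fwd_mv_y(16) + bwd_mv_x(16) + bwd_mv_y(16). Packed into a single word it might look like this — MSB-first field order is an assumption, the real byte-level layout is not shown in this hunk:

```c
#include <stdint.h>

/* Pack a bidirectional leaf: skip flag, forward MV (15+16 bits),
 * backward MV (16+16 bits); 64 bits total. */
static uint64_t pack_bidir_leaf(int skip, int16_t fwd_x, int16_t fwd_y,
                                int16_t bwd_x, int16_t bwd_y) {
    return ((uint64_t)(skip & 1) << 63)
         | ((uint64_t)((uint16_t)fwd_x & 0x7FFF) << 48)
         | ((uint64_t)(uint16_t)fwd_y << 32)
         | ((uint64_t)(uint16_t)bwd_x << 16)
         |  (uint64_t)(uint16_t)bwd_y;
}
```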
@@ -2457,7 +2457,7 @@ static tav_encoder_t* create_encoder(void) {
enc->residual_coding_min_block_size = 4; // Minimum block size
enc->residual_coding_block_tree_root = NULL;
-// Initialize residual coding buffers (allocated in initialise_encoder)
+// Initialise residual coding buffers (allocated in initialise_encoder)
enc->residual_coding_reference_frame_y = NULL;
enc->residual_coding_reference_frame_co = NULL;
enc->residual_coding_reference_frame_cg = NULL;
@@ -2493,7 +2493,7 @@ static tav_encoder_t* create_encoder(void) {
enc->residual_coding_lookahead_buffer_cg = NULL;
enc->residual_coding_lookahead_buffer_display_index = NULL;
-// Two-pass mode initialization
+// Two-pass mode initialisation
enc->two_pass_mode = 1; // enable by default
enc->frame_analyses = NULL;
enc->frame_analyses_capacity = 0;
@@ -2687,7 +2687,7 @@ static int initialise_encoder(tav_encoder_t *enc) {
return -1;
}
-// Initialize translation vectors to zero
+// Initialise translation vectors to zero
memset(enc->temporal_gop_translation_x, 0, enc->temporal_gop_capacity * sizeof(int16_t));
memset(enc->temporal_gop_translation_y, 0, enc->temporal_gop_capacity * sizeof(int16_t));
@@ -2726,7 +2726,7 @@ static int initialise_encoder(tav_encoder_t *enc) {
return -1;
}
-// Initialize to zero
+// Initialise to zero
memset(enc->temporal_gop_mvs_fwd_x[i], 0, num_blocks * sizeof(int16_t));
memset(enc->temporal_gop_mvs_fwd_y[i], 0, num_blocks * sizeof(int16_t));
memset(enc->temporal_gop_mvs_bwd_x[i], 0, num_blocks * sizeof(int16_t));
@@ -3115,7 +3115,7 @@ static void dwt_53_inverse_1d(float *data, int length) {
// and estimate_motion_optical_flow are implemented in encoder_tav_opencv.cpp
// =============================================================================
-// Temporal Subband Quantization
+// Temporal Subband Quantisation
// =============================================================================
// Determine which temporal decomposition level a frame belongs to after 3D DWT
@@ -3125,7 +3125,7 @@ static void dwt_53_inverse_1d(float *data, int length) {
// - Level 2 (tLH): frames 6-11 (6 frames, high-pass from 2nd decomposition)
// - Level 3 (tH): frames 12-23 (12 frames, high-pass from 1st decomposition)
static int get_temporal_subband_level(int frame_idx, int num_frames, int temporal_levels) {
-// After temporal DWT with N levels, frames are organized as:
+// After temporal DWT with N levels, frames are organised as:
// Frames 0...num_frames/(2^N) = tL...L (N low-passes, coarsest)
// Remaining frames are temporal high-pass subbands at various levels
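The mapping described above (for a 24-frame GOP with 3 temporal levels: frames 0-2 → tLLL, 3-5 → tLLH, 6-11 → tLH, 12-23 → tH) can be computed directly. This sketch reproduces the documented layout, not necessarily the encoder's exact implementation:

```c
/* Temporal decomposition level of a frame after an N-level temporal DWT:
 * the first num_frames >> N frames are the low-pass band (level 0); each
 * following high-pass band doubles in size (levels 1..N). */
static int temporal_subband_level(int frame_idx, int num_frames, int levels) {
    int band_end = num_frames >> levels;      /* end of the tL...L band */
    if (frame_idx < band_end) return 0;
    for (int level = 1; level <= levels; level++) {
        band_end += num_frames >> (levels - level + 1);
        if (frame_idx < band_end) return level;
    }
    return levels;
}
```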
@@ -3141,21 +3141,21 @@ static int get_temporal_subband_level(int frame_idx, int num_frames, int tempora
return temporal_levels;
}
-// Quantize 3D DWT coefficients with SEPARABLE temporal-spatial quantization
+// Quantise 3D DWT coefficients with SEPARABLE temporal-spatial quantisation
//
-// IMPORTANT: This implements a separable quantization approach (temporal × spatial)
+// IMPORTANT: This implements a separable quantisation approach (temporal × spatial)
// After dwt_3d_forward(), the GOP coefficients have this structure:
// - Temporal DWT applied first (24 frames → 3 levels)
// → Results in temporal subbands: tLLL (frames 0-2), tLLH (3-5), tLH (6-11), tH (12-23)
// - Then spatial DWT applied to each temporal subband
// → Each frame now contains 2D spatial coefficients (LL, LH, HL, HH subbands)
//
-// Quantization strategy:
-// 1. Compute temporal base quantizer: tH_base(level) = Qbase_t * 2^(beta*level)
-// - tLL (level 0): coarsest temporal, most important → smallest quantizer
-// - tHH (level 2): finest temporal, less important → largest quantizer
+// Quantisation strategy:
+// 1. Compute temporal base quantiser: tH_base(level) = Qbase_t * 2^(beta*level)
+// - tLL (level 0): coarsest temporal, most important → smallest quantiser
+// - tHH (level 2): finest temporal, less important → largest quantiser
// 2. Apply spatial perceptual weighting to tH_base (LL: 1.0x, LH/HL: 1.5-2.0x, HH: 2.0-3.0x)
-// 3. Final quantizer: Q_effective = tH_base × spatial_weight
+// 3. Final quantiser: Q_effective = tH_base × spatial_weight
//
// This separable approach is efficient and what most 3D wavelet codecs use.
static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
@@ -3176,7 +3176,7 @@ static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
// - Frames 4-7, 8-11, 12-15: tLH, tHL, tHH (levels 1-2) - temporal high-pass
int temporal_level = get_temporal_subband_level(t, num_frames, enc->temporal_decomp_levels);
-// Step 2: Compute temporal base quantizer using exponential scaling
+// Step 2: Compute temporal base quantiser using exponential scaling
// Formula: tH_base = Qbase_t * 1.0 * 2^(2.0 * level)
// Example with Qbase_t=16:
// - Level 0 (tLL): 16 * 1.0 * 2^0 = 16 (same as intra-only)
@@ -3185,43 +3185,43 @@ static void quantise_3d_dwt_coefficients(tav_encoder_t *enc,
float temporal_scale = powf(2.0f, BETA * powf(temporal_level, KAPPA));
float temporal_quantiser = base_quantiser * temporal_scale;
-// Convert to integer for quantization
+// Convert to integer for quantisation
int temporal_base_quantiser = (int)roundf(temporal_quantiser);
temporal_base_quantiser = CLAMP(temporal_base_quantiser, 1, 255);
-// Step 3: Apply spatial quantization within this temporal subband
+// Step 3: Apply spatial quantisation within this temporal subband
// The existing function applies spatial perceptual weighting:
// Q_effective = tH_base × spatial_weight
// Where spatial_weight depends on spatial frequency (LL, LH, HL, HH subbands)
// This reuses all existing perceptual weighting and dead-zone logic
//
// CRITICAL: Use no_normalisation variant when EZBC is enabled
-// - EZBC mode: coefficients must be denormalized (quantize + multiply back)
-// - Twobit-map/raw mode: coefficients stay normalized (quantize only)
+// - EZBC mode: coefficients must be denormalised (quantise + multiply back)
+// - Twobit-map/raw mode: coefficients stay normalised (quantise only)
if (enc->preprocess_mode == PREPROCESS_EZBC) {
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(
enc,
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
-quantised[t], // Output: quantised spatial coefficients (denormalized for EZBC)
+quantised[t], // Output: quantised spatial coefficients (denormalised for EZBC)
spatial_size, // Number of spatial coefficients
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
enc->width, // Frame width
enc->height, // Frame height
enc->decomp_levels, // Spatial decomposition levels (typically 6)
-is_chroma, // Is chroma channel (gets additional quantization)
+is_chroma, // Is chroma channel (gets additional quantisation)
enc->frame_count + t // Frame number (for any frame-dependent logic)
);
} else {
quantise_dwt_coefficients_perceptual_per_coeff(
enc,
gop_coeffs[t], // Input: spatial coefficients for this temporal subband
-quantised[t], // Output: quantised spatial coefficients (normalized for twobit-map)
+quantised[t], // Output: quantised spatial coefficients (normalised for twobit-map)
spatial_size, // Number of spatial coefficients
temporal_base_quantiser, // Temporally-scaled base quantiser (tH_base)
enc->width, // Frame width
enc->height, // Frame height
enc->decomp_levels, // Spatial decomposition levels (typically 6)
-is_chroma, // Is chroma channel (gets additional quantization)
+is_chroma, // Is chroma channel (gets additional quantisation)
enc->frame_count + t // Frame number (for any frame-dependent logic)
);
}
@@ -3889,15 +3889,15 @@ static size_t encode_pframe_residual(tav_encoder_t *enc, int qY) {
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
-// Step 5: Quantize residual coefficients (skip for EZBC - it handles quantization implicitly)
+// Step 5: Quantise residual coefficients (skip for EZBC - it handles quantisation implicitly)
int16_t *quantised_y = enc->reusable_quantised_y;
int16_t *quantised_co = enc->reusable_quantised_co;
int16_t *quantised_cg = enc->reusable_quantised_cg;
if (enc->preprocess_mode == PREPROCESS_EZBC) {
-// EZBC mode: Quantize with perceptual weighting but no normalization (division by quantizer)
+// EZBC mode: Quantise with perceptual weighting but no normalisation (division by quantiser)
// EZBC will compress by encoding only significant bitplanes
-// fprintf(stderr, "[EZBC-QUANT-PFRAME] Using perceptual quantization without normalization\n");
+// fprintf(stderr, "[EZBC-QUANT-PFRAME] Using perceptual quantisation without normalisation\n");
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, residual_y_dwt, quantised_y, frame_size,
qY, enc->width, enc->height,
enc->decomp_levels, 0, 0);
@@ -3915,9 +3915,9 @@ static size_t encode_pframe_residual(tav_encoder_t *enc, int qY) {
if (abs(quantised_co[i]) > max_co) max_co = abs(quantised_co[i]);
if (abs(quantised_cg[i]) > max_cg) max_cg = abs(quantised_cg[i]);
}
-// fprintf(stderr, "[EZBC-QUANT-PFRAME] Quantized coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
+// fprintf(stderr, "[EZBC-QUANT-PFRAME] Quantised coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
} else {
-// Twobit-map mode: Use traditional quantization
+// Twobit-map mode: Use traditional quantisation
quantise_dwt_coefficients_perceptual_per_coeff(enc, residual_y_dwt, quantised_y, frame_size,
qY, enc->width, enc->height,
enc->decomp_levels, 0, 0);
@@ -4159,17 +4159,17 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
free(mv_map_y);
#endif
-// Step 5: Serialize all quad-trees (now with differential MVs)
+// Step 5: Serialise all quad-trees (now with differential MVs)
// Estimate buffer size: worst case is all leaf nodes at min size
-size_t max_serialized_size = total_trees * 10000; // Conservative estimate
-uint8_t *serialized_trees = malloc(max_serialized_size);
-size_t total_serialized = 0;
+size_t max_serialised_size = total_trees * 10000; // Conservative estimate
+uint8_t *serialised_trees = malloc(max_serialised_size);
+size_t total_serialised = 0;
for (int i = 0; i < total_trees; i++) {
-size_t tree_size = serialize_quad_tree(tree_forest[i], serialized_trees + total_serialized,
-max_serialized_size - total_serialized);
+size_t tree_size = serialise_quad_tree(tree_forest[i], serialised_trees + total_serialised,
+max_serialised_size - total_serialised);
if (tree_size == 0) {
-fprintf(stderr, "Error: Failed to serialize quad-tree %d\n", i);
+fprintf(stderr, "Error: Failed to serialise quad-tree %d\n", i);
// Cleanup and return error
for (int j = 0; j < total_trees; j++) {
free_quad_tree(tree_forest[j]);
@@ -4182,7 +4182,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
free(temp_mv_x);
free(temp_mv_y);
#endif
-free(serialized_trees);
+free(serialised_trees);
enc->residual_coding_block_size = saved_block_size;
enc->residual_coding_motion_vectors_x = orig_mv_x;
enc->residual_coding_motion_vectors_y = orig_mv_y;
@@ -4190,7 +4190,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
enc->residual_coding_num_blocks_y = orig_blocks_y;
return 0;
}
-total_serialized += tree_size;
+total_serialised += tree_size;
}
// Step 6: Apply DWT to residual (same as fixed blocks)
@@ -4208,7 +4208,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
-// Step 7: Quantize residual coefficients
+// Step 7: Quantise residual coefficients
int16_t *quantised_y = enc->reusable_quantised_y;
int16_t *quantised_co = enc->reusable_quantised_co;
int16_t *quantised_cg = enc->reusable_quantised_cg;
@@ -4251,7 +4251,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
free(temp_mv_x);
free(temp_mv_y);
#endif
-free(serialized_trees);
+free(serialised_trees);
free(residual_y_dwt);
free(residual_co_dwt);
free(residual_cg_dwt);
@@ -4270,17 +4270,17 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
uint8_t packet_type = TAV_PACKET_PFRAME_ADAPTIVE;
uint16_t num_trees_u16 = (uint16_t)total_trees;
-uint32_t tree_data_size = (uint32_t)total_serialized;
+uint32_t tree_data_size = (uint32_t)total_serialised;
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
fwrite(&packet_type, 1, 1, enc->output_fp);
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
-fwrite(serialized_trees, 1, total_serialized, enc->output_fp);
+fwrite(serialised_trees, 1, total_serialised, enc->output_fp);
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
-size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialized +
+size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialised +
sizeof(uint32_t) + compressed_size;
// Cleanup
@@ -4295,7 +4295,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
free(temp_mv_x);
free(temp_mv_y);
#endif
-free(serialized_trees);
+free(serialised_trees);
free(residual_y_dwt);
free(residual_co_dwt);
free(residual_cg_dwt);
@@ -4311,7 +4311,7 @@ static size_t encode_pframe_adaptive(tav_encoder_t *enc, int qY) {
if (enc->verbose) {
printf(" P-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
-total_trees, total_serialized, preprocessed_size, compressed_size,
+total_trees, total_serialised, preprocessed_size, compressed_size,
(compressed_size * 100.0f) / preprocessed_size);
}
@@ -4404,16 +4404,16 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
// Note: For B-frames, we don't recompute residuals because dual predictions are already optimal
-// Step 5: Serialize all quad-trees with 64-bit leaf nodes
+// Step 5: Serialise all quad-trees with 64-bit leaf nodes
-size_t max_serialized_size = total_trees * 20000; // Conservative (2× P-frame size due to dual MVs)
+size_t max_serialised_size = total_trees * 20000; // Conservative (2× P-frame size due to dual MVs)
-uint8_t *serialized_trees = malloc(max_serialized_size);
+uint8_t *serialised_trees = malloc(max_serialised_size);
-size_t total_serialized = 0;
+size_t total_serialised = 0;
for (int i = 0; i < total_trees; i++) {
-size_t tree_size = serialize_quad_tree_bidirectional(tree_forest[i], serialized_trees + total_serialized,
+size_t tree_size = serialise_quad_tree_bidirectional(tree_forest[i], serialised_trees + total_serialised,
-max_serialized_size - total_serialized);
+max_serialised_size - total_serialised);
if (tree_size == 0) {
-fprintf(stderr, "Error: Failed to serialize bidirectional quad-tree %d\n", i);
+fprintf(stderr, "Error: Failed to serialise bidirectional quad-tree %d\n", i);
// Cleanup and return error
for (int j = 0; j < total_trees; j++) {
free_quad_tree(tree_forest[j]);
@@ -4421,11 +4421,11 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
free(tree_forest);
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
-free(serialized_trees);
+free(serialised_trees);
enc->residual_coding_block_size = saved_block_size;
return 0;
}
-total_serialized += tree_size;
+total_serialised += tree_size;
}
// Step 6: Apply DWT to residual
@@ -4441,7 +4441,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
dwt_2d_forward_flexible(residual_co_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
dwt_2d_forward_flexible(residual_cg_dwt, enc->width, enc->height, enc->decomp_levels, enc->wavelet_filter);
-// Step 7: Quantize residual coefficients
+// Step 7: Quantise residual coefficients
int16_t *quantised_y = enc->reusable_quantised_y;
int16_t *quantised_co = enc->reusable_quantised_co;
int16_t *quantised_cg = enc->reusable_quantised_cg;
@@ -4479,7 +4479,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
free(tree_forest);
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
-free(serialized_trees);
+free(serialised_trees);
free(residual_y_dwt);
free(residual_co_dwt);
free(residual_cg_dwt);
@@ -4494,17 +4494,17 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
uint8_t packet_type = TAV_PACKET_BFRAME_ADAPTIVE;
uint16_t num_trees_u16 = (uint16_t)total_trees;
-uint32_t tree_data_size = (uint32_t)total_serialized;
+uint32_t tree_data_size = (uint32_t)total_serialised;
uint32_t compressed_size_u32 = (uint32_t)compressed_size;
fwrite(&packet_type, 1, 1, enc->output_fp);
fwrite(&num_trees_u16, sizeof(uint16_t), 1, enc->output_fp);
fwrite(&tree_data_size, sizeof(uint32_t), 1, enc->output_fp);
-fwrite(serialized_trees, 1, total_serialized, enc->output_fp);
+fwrite(serialised_trees, 1, total_serialised, enc->output_fp);
fwrite(&compressed_size_u32, sizeof(uint32_t), 1, enc->output_fp);
fwrite(compressed_coeffs, 1, compressed_size, enc->output_fp);
-size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialized +
+size_t packet_size = 1 + sizeof(uint16_t) + sizeof(uint32_t) + total_serialised +
sizeof(uint32_t) + compressed_size;
// Cleanup
@@ -4514,7 +4514,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
free(tree_forest);
free(fine_fwd_mv_x); free(fine_fwd_mv_y);
free(fine_bwd_mv_x); free(fine_bwd_mv_y);
-free(serialized_trees);
+free(serialised_trees);
free(residual_y_dwt);
free(residual_co_dwt);
free(residual_cg_dwt);
@@ -4526,7 +4526,7 @@ static size_t encode_bframe_adaptive(tav_encoder_t *enc, int qY) {
if (enc->verbose) {
printf(" B-frame (adaptive): %d trees, tree_data: %zu bytes, residual: %zu → %zu bytes (%.1f%%)\n",
-total_trees, total_serialized, preprocessed_size, compressed_size,
+total_trees, total_serialised, preprocessed_size, compressed_size,
(compressed_size * 100.0f) / preprocessed_size);
}
@@ -4671,7 +4671,7 @@ static int gop_should_flush_twopass(tav_encoder_t *enc, int current_frame_number
return 0;
}
-// Flush GOP: apply 3D DWT, quantize, serialise, and write to output
+// Flush GOP: apply 3D DWT, quantise, serialise, and write to output
// Returns number of bytes written, or 0 on error
// This function processes the entire GOP and writes all frames with temporal 3D DWT
static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
@@ -4808,7 +4808,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
float **canvas_cg_coeffs = malloc(actual_gop_size * sizeof(float*));
for (int i = 0; i < actual_gop_size; i++) {
-canvas_y_coeffs[i] = calloc(canvas_pixels, sizeof(float)); // Zero-initialized
+canvas_y_coeffs[i] = calloc(canvas_pixels, sizeof(float)); // Zero-initialised
canvas_co_coeffs[i] = calloc(canvas_pixels, sizeof(float));
canvas_cg_coeffs[i] = calloc(canvas_pixels, sizeof(float));
@@ -4924,7 +4924,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
}
}
-// Step 2: Allocate quantized coefficient buffers
+// Step 2: Allocate quantised coefficient buffers
int16_t **quant_y = malloc(actual_gop_size * sizeof(int16_t*));
int16_t **quant_co = malloc(actual_gop_size * sizeof(int16_t*));
int16_t **quant_cg = malloc(actual_gop_size * sizeof(int16_t*));
@@ -4935,11 +4935,11 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
quant_cg[i] = malloc(num_pixels * sizeof(int16_t));
}
-// Step 3: Quantize 3D DWT coefficients with temporal-spatial quantization
+// Step 3: Quantise 3D DWT coefficients with temporal-spatial quantisation
-// Use channel-specific quantizers from encoder settings
+// Use channel-specific quantisers from encoder settings
-int qY = base_quantiser; // Y quantizer passed as parameter
+int qY = base_quantiser; // Y quantiser passed as parameter
-int qCo = QLUT[enc->quantiser_co]; // Co quantizer from encoder
+int qCo = QLUT[enc->quantiser_co]; // Co quantiser from encoder
-int qCg = QLUT[enc->quantiser_cg]; // Cg quantizer from encoder
+int qCg = QLUT[enc->quantiser_cg]; // Cg quantiser from encoder
quantise_3d_dwt_coefficients(enc, gop_y_coeffs, quant_y, actual_gop_size,
num_pixels, qY, 0); // Luma
@@ -4983,7 +4983,7 @@ static size_t gop_flush(tav_encoder_t *enc, FILE *output, int base_quantiser,
const size_t max_tile_size = 4 + (num_pixels * 3 * sizeof(int16_t));
uint8_t *uncompressed_buffer = malloc(max_tile_size);
-// Use serialise_tile_data with DWT-transformed float coefficients (before quantization)
+// Use serialise_tile_data with DWT-transformed float coefficients (before quantisation)
// This matches the traditional I-frame path in compress_and_write_frame
size_t tile_size = serialise_tile_data(enc, 0, 0,
gop_y_coeffs[0], gop_co_coeffs[0], gop_cg_coeffs[0],
@@ -5640,7 +5640,7 @@ static void dwt_3d_forward_mc(
}
// Apply 3D DWT: temporal DWT across frames, then spatial DWT on each temporal subband
-// gop_data[frame][y * width + x] - GOP buffer organized as frame-major
+// gop_data[frame][y * width + x] - GOP buffer organised as frame-major
// Modifies gop_data in-place
// NOTE: This is the OLD version without MC-lifting (kept for non-mesh mode)
static void dwt_3d_forward(float **gop_data, int width, int height, int num_frames,
@@ -6666,7 +6666,7 @@ static void quantise_dwt_coefficients_perceptual_per_coeff(tav_encoder_t *enc,
}
}
-// Quantization for EZBC mode: quantizes to discrete levels but doesn't normalize (shrink) values
+// Quantisation for EZBC mode: quantises to discrete levels but doesn't normalise (shrink) values
// This reduces coefficient precision while preserving magnitude for EZBC's bitplane encoding
static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_encoder_t *enc,
float *coeffs, int16_t *quantised, int size,
@@ -6682,10 +6682,10 @@ static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_
float weight = get_perceptual_weight_for_position(enc, i, width, height, decomp_levels, is_chroma);
float effective_q = effective_base_q * weight;
-// Step 1: Quantize - divide by quantizer to get normalized value
+// Step 1: Quantise - divide by quantiser to get normalised value
float quantised_val = coeffs[i] / effective_q;
-// Step 2: Apply dead-zone quantization to normalized value
+// Step 2: Apply dead-zone quantisation to normalised value
if (enc->dead_zone_threshold > 0.0f && !is_chroma) {
int level = get_subband_level(i, width, height, decomp_levels);
int subband_type = get_subband_type(i, width, height, decomp_levels);
@@ -6715,16 +6715,16 @@ static void quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(tav_
}
}
-// Step 3: Round to discrete quantization levels
+// Step 3: Round to discrete quantisation levels
quantised_val = roundf(quantised_val); // file size explodes without rounding
-// Step 4: Denormalize - multiply back by quantizer to restore magnitude
+// Step 4: Denormalise - multiply back by quantiser to restore magnitude
-// This gives us quantized values at original scale (not shrunken to 0-10 range)
+// This gives us quantised values at original scale (not shrunken to 0-10 range)
-float denormalized = quantised_val * effective_q;
+float denormalised = quantised_val * effective_q;
// CRITICAL FIX: Must round (not truncate) to match decoder behavior
// With odd baseQ values and fractional weights, truncation causes mismatch with Sigmap mode
-quantised[i] = (int16_t)CLAMP((int)roundf(denormalized), -32768, 32767);
+quantised[i] = (int16_t)CLAMP((int)roundf(denormalised), -32768, 32767);
}
}
@@ -6836,8 +6836,8 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y,
if (mode == TAV_MODE_INTRA) {
// INTRA mode: quantise coefficients directly and store for future reference
if (enc->preprocess_mode == PREPROCESS_EZBC) {
-// EZBC mode: Quantize with perceptual weighting but no normalization (division by quantizer)
+// EZBC mode: Quantise with perceptual weighting but no normalisation (division by quantiser)
-// fprintf(stderr, "[EZBC-QUANT-INTRA] Using perceptual quantization without normalization\n");
+// fprintf(stderr, "[EZBC-QUANT-INTRA] Using perceptual quantisation without normalisation\n");
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_y_data, quantised_y, tile_size, this_frame_qY, enc->width, enc->height, enc->decomp_levels, 0, enc->frame_count);
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_co_data, quantised_co, tile_size, this_frame_qCo, enc->width, enc->height, enc->decomp_levels, 1, enc->frame_count);
quantise_dwt_coefficients_perceptual_per_coeff_no_normalisation(enc, (float*)tile_cg_data, quantised_cg, tile_size, this_frame_qCg, enc->width, enc->height, enc->decomp_levels, 1, enc->frame_count);
@@ -6849,7 +6849,7 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y,
if (abs(quantised_co[i]) > max_co) max_co = abs(quantised_co[i]);
if (abs(quantised_cg[i]) > max_cg) max_cg = abs(quantised_cg[i]);
}
-// fprintf(stderr, "[EZBC-QUANT-INTRA] Quantized coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
+// fprintf(stderr, "[EZBC-QUANT-INTRA] Quantised coeff max: Y=%d, Co=%d, Cg=%d\n", max_y, max_co, max_cg);
} else if (enc->perceptual_tuning) {
// Perceptual quantisation: EXACTLY like uniform but with per-coefficient weights
quantise_dwt_coefficients_perceptual_per_coeff(enc, (float*)tile_y_data, quantised_y, tile_size, this_frame_qY, enc->width, enc->height, enc->decomp_levels, 0, enc->frame_count);
@@ -7798,7 +7798,7 @@ static int start_audio_conversion(tav_encoder_t *enc) {
// Calculate samples per frame: ceil(sample_rate / fps)
enc->samples_per_frame = (TSVM_AUDIO_SAMPLE_RATE + enc->output_fps - 1) / enc->output_fps;
-// Initialize 2nd-order noise shaping error history
+// Initialise 2nd-order noise shaping error history
enc->dither_error[0][0] = 0.0f;
enc->dither_error[0][1] = 0.0f;
enc->dither_error[1][0] = 0.0f;
@@ -8510,7 +8510,7 @@ static void convert_pcm32_to_pcm8_dithered(tav_encoder_t *enc, const float *pcm3
if (shaped < -1.0f) shaped = -1.0f;
if (shaped > 1.0f) shaped = 1.0f;
-// Quantize to signed 8-bit range [-128, 127]
+// Quantise to signed 8-bit range [-128, 127]
int q = (int)lrintf(shaped * scale);
if (q < -128) q = -128;
else if (q > 127) q = 127;
@@ -8518,7 +8518,7 @@ static void convert_pcm32_to_pcm8_dithered(tav_encoder_t *enc, const float *pcm3
// Convert to unsigned 8-bit [0, 255]
pcm8[idx] = (uint8_t)(q + (int)bias);
-// Calculate quantization error for feedback
+// Calculate quantisation error for feedback
float qerr = shaped - (float)q / scale;
// Update error history (shift and store)
@@ -8623,9 +8623,9 @@ static int write_tad_packet_samples(tav_encoder_t *enc, FILE *output, int sample
if (tad_quality > TAD32_QUALITY_MAX) tad_quality = TAD32_QUALITY_MAX;
if (tad_quality < TAD32_QUALITY_MIN) tad_quality = TAD32_QUALITY_MIN;
-// Convert quality (0-5) to max_index for quantization
+// Convert quality (0-5) to max_index for quantisation
int max_index = tad32_quality_to_max_index(tad_quality);
-float quantiser_scale = 1.0f; // Baseline quantizer scaling
+float quantiser_scale = 1.0f; // Baseline quantiser scaling
// Allocate output buffer (generous size for TAD chunk)
size_t max_output_size = samples_to_read * 4 * sizeof(int16_t) + 1024;
@@ -8963,7 +8963,7 @@ static int process_audio_for_gop(tav_encoder_t *enc, int *frame_numbers, int num
return 1;
}
-// Handle first frame initialization (same as process_audio)
+// Handle first frame initialisation (same as process_audio)
int first_frame_in_gop = frame_numbers[0];
if (first_frame_in_gop == 0) {
uint8_t header[4];
@@ -9255,7 +9255,7 @@ static double calculate_shannon_entropy(const float *coeffs, int count) {
#define HIST_BINS 256
int histogram[HIST_BINS] = {0};
-// Find min/max for normalization
+// Find min/max for normalisation
float min_val = FLT_MAX, max_val = -FLT_MAX;
for (int i = 0; i < count; i++) {
float abs_val = fabsf(coeffs[i]);
@@ -9325,7 +9325,7 @@ static void compute_frame_metrics(const float *dwt_current, const float *dwt_pre
frame_analysis_t *metrics) {
int num_pixels = width * height;
-// Initialize metrics
+// Initialise metrics
memset(metrics, 0, sizeof(frame_analysis_t));
// Extract LL band (approximation coefficients)
@@ -9438,16 +9438,16 @@ static int detect_scene_change_wavelet(int frame_number,
}
// Detection rule 1: Hard cut or fast fade (LL_diff spike)
-// Improvement: Normalize LL_diff by LL_mean to handle exposure/lighting changes
+// Improvement: Normalise LL_diff by LL_mean to handle exposure/lighting changes
-double normalized_ll_diff = current_metrics->ll_mean > 1.0 ?
+double normalised_ll_diff = current_metrics->ll_mean > 1.0 ?
current_metrics->ll_diff / current_metrics->ll_mean : current_metrics->ll_diff;
-double normalized_threshold = current_metrics->ll_mean > 1.0 ?
+double normalised_threshold = current_metrics->ll_mean > 1.0 ?
ll_diff_threshold / current_metrics->ll_mean : ll_diff_threshold;
-if (normalized_ll_diff > normalized_threshold) {
+if (normalised_ll_diff > normalised_threshold) {
if (verbose) {
-printf(" Scene change detected frame %d: Normalized LL_diff=%.4f > threshold=%.4f (raw: %.2f > %.2f)\n",
+printf(" Scene change detected frame %d: Normalised LL_diff=%.4f > threshold=%.4f (raw: %.2f > %.2f)\n",
-frame_number + 1, normalized_ll_diff, normalized_threshold,
+frame_number + 1, normalised_ll_diff, normalised_threshold,
current_metrics->ll_diff, ll_diff_threshold);
}
return 1;
@@ -9457,7 +9457,7 @@ static int detect_scene_change_wavelet(int frame_number,
// Improvement: Require temporal persistence only for borderline detections
double hb_ratio_threshold = ANALYSIS_HB_RATIO_THRESHOLD;
-// Calculate average highband energy from history (normalized by total energy for RMS-like measure)
+// Calculate average highband energy from history (normalised by total energy for RMS-like measure)
double hb_energy_sum = 0.0;
for (int i = start_idx; i < history_count; i++) {
hb_energy_sum += metrics_history[i].highband_energy;
@@ -9884,7 +9884,7 @@ int main(int argc, char *argv[]) {
{"dimension", required_argument, 0, 's'},
{"fps", required_argument, 0, 'f'},
{"quality", required_argument, 0, 'q'},
-{"quantizer", required_argument, 0, 'Q'},
+{"quantiser", required_argument, 0, 'Q'},
{"quantiser", required_argument, 0, 'Q'},
{"wavelet", required_argument, 0, 1010},
{"channel-layout", required_argument, 0, 'c'},
@@ -10371,7 +10371,7 @@ int main(int argc, char *argv[]) {
return 1;
}
-// Initialize GOP boundary iterator for second pass
+// Initialise GOP boundary iterator for second pass
enc->current_gop_boundary = enc->gop_boundaries;
enc->two_pass_current_frame = 0;
@@ -458,11 +458,11 @@ static void colour_space_to_rgb(tev_encoder_t *enc, double c1, double c2, double
// Pre-calculated cosine tables
static float dct_table_16[16][16]; // For 16x16 DCT
static float dct_table_8[8][8]; // For 8x8 DCT
-static int tables_initialized = 0;
+static int tables_initialised = 0;
-// Initialize the pre-calculated tables
+// Initialise the pre-calculated tables
static void init_dct_tables(void) {
-if (tables_initialized) return;
+if (tables_initialised) return;
// Pre-calculate cosine values for 16x16 DCT
for (int u = 0; u < 16; u++) {
@@ -478,7 +478,7 @@ static void init_dct_tables(void) {
}
}
-tables_initialized = 1;
+tables_initialised = 1;
}
// 16x16 2D DCT
@@ -486,7 +486,7 @@ static void init_dct_tables(void) {
static float temp_dct_16[BLOCK_SIZE_SQR]; // Reusable temporary buffer
static void dct_16x16_fast(float *input, float *output) {
-init_dct_tables(); // Ensure tables are initialized
+init_dct_tables(); // Ensure tables are initialised
// First pass: Process rows (16 1D DCTs)
for (int row = 0; row < 16; row++) {
@@ -521,7 +521,7 @@ static void dct_16x16_fast(float *input, float *output) {
static float temp_dct_8[HALF_BLOCK_SIZE_SQR]; // Reusable temporary buffer
static void dct_8x8_fast(float *input, float *output) {
-init_dct_tables(); // Ensure tables are initialized
+init_dct_tables(); // Ensure tables are initialised
// First pass: Process rows (8 1D DCTs)
for (int row = 0; row < 8; row++) {
@@ -770,11 +770,11 @@ static float complexity_to_rate_factor(float complexity) {
float log_median = logf(median_complexity + 1.0f);
float log_high = logf(high_complexity + 1.0f);
-// Normalize: 0 = median complexity, 1 = high complexity threshold
+// Normalise: 0 = median complexity, 1 = high complexity threshold
-float normalized = (log_complexity - log_median) / (log_high - log_median);
+float normalised = (log_complexity - log_median) / (log_high - log_median);
// Sigmoid centered at median: f(0) ≈ 1.0, f(1) ≈ 1.6, f(-∞) ≈ 0.7
-float sigmoid = 1.0f / (1.0f + expf(-4.0f * normalized));
+float sigmoid = 1.0f / (1.0f + expf(-4.0f * normalised));
float rate_factor = 0.7f + 0.9f * sigmoid; // Range: 0.7 to 1.6
// Clamp to prevent extreme coefficient amplification/reduction
@@ -787,7 +787,7 @@ static float complexity_to_rate_factor(float complexity) {
static void add_complexity_value(tev_encoder_t *enc, float complexity) {
if (!enc->stats_mode) return;
-// Initialize array if needed
+// Initialise array if needed
if (!enc->complexity_values) {
enc->complexity_capacity = 10000; // Initial capacity
enc->complexity_values = malloc(enc->complexity_capacity * sizeof(float));
@@ -1416,7 +1416,7 @@ static subtitle_entry_t* parse_srt_file(const char *filename, int fps) {
continue;
}
-// Initialize text buffer
+// Initialise text buffer
text_buffer_size = 256;
text_buffer = malloc(text_buffer_size);
if (!text_buffer) {
@@ -1917,7 +1917,7 @@ static int write_all_subtitles_tc(tev_encoder_t *enc, FILE *output) {
return bytes_written;
}
-// Initialize encoder
+// Initialise encoder
static tev_encoder_t* init_encoder(void) {
tev_encoder_t *enc = calloc(1, sizeof(tev_encoder_t));
if (!enc) return NULL;
@@ -1997,10 +1997,10 @@ static int alloc_encoder_buffers(tev_encoder_t *enc) {
return -1;
}
-// Initialize Zstd compression context
+// Initialise Zstd compression context
enc->zstd_context = ZSTD_createCCtx();
if (!enc->zstd_context) {
-fprintf(stderr, "Failed to initialize Zstd compression\n");
+fprintf(stderr, "Failed to initialise Zstd compression\n");
return 0;
}
@@ -2009,7 +2009,7 @@ static int alloc_encoder_buffers(tev_encoder_t *enc) {
ZSTD_CCtx_setParameter(enc->zstd_context, ZSTD_c_windowLog, 24); // 16MB window (should be plenty to hold an entire frame; interframe compression is unavailable)
ZSTD_CCtx_setParameter(enc->zstd_context, ZSTD_c_hashLog, 16);
-// Initialize previous frame to black
+// Initialise previous frame to black
memset(enc->previous_rgb, 0, encoding_pixels * 3);
memset(enc->previous_even_field, 0, encoding_pixels * 3);
@@ -2467,7 +2467,7 @@ static int process_audio(tev_encoder_t *enc, int frame_num, FILE *output) {
return 1;
}
-// Initialize packet size on first frame
+// Initialise packet size on first frame
if (enc->mp2_packet_size == 0) {
uint8_t header[4];
if (fread(header, 1, 4, enc->mp2_file) != 4) return 1;
@@ -2665,7 +2665,7 @@ int main(int argc, char *argv[]) {
{"fps", required_argument, 0, 'f'},
{"quality", required_argument, 0, 'q'},
{"quantiser", required_argument, 0, 'Q'},
-{"quantizer", required_argument, 0, 'Q'},
+{"quantiser", required_argument, 0, 'Q'},
{"bitrate", required_argument, 0, 'b'},
{"arate", required_argument, 0, 1400},
{"progressive", no_argument, 0, 'p'},
@@ -2793,7 +2793,7 @@ int main(int argc, char *argv[]) {
if (enc->ictcp_mode) {
// ICtCp: Ct and Cp have different characteristics than YCoCg Co/Cg
-// Cp channel now uses specialized quantisation table, so moderate quality is fine
+// Cp channel now uses specialised quantisation table, so moderate quality is fine
int base_chroma_quality = enc->qualityCo;
enc->qualityCo = base_chroma_quality; // Ct channel: keep original Co quantisation
enc->qualityCg = base_chroma_quality; // Cp channel: same quality since Q_Cp_8 handles detail preservation


@@ -21,7 +21,7 @@ static inline uint8_t range_decoder_get_byte(RangeDecoder *dec) {
return 0;
}
-static void range_encoder_renormalize(RangeEncoder *enc) {
+static void range_encoder_renormalise(RangeEncoder *enc) {
while (enc->range <= BOTTOM_VALUE) {
range_encoder_put_byte(enc, (enc->low >> 24) & 0xFF);
enc->low <<= 8;
@@ -29,7 +29,7 @@ static void range_encoder_renormalize(RangeEncoder *enc) {
}
}
-static void range_decoder_renormalize(RangeDecoder *dec) {
+static void range_decoder_renormalise(RangeDecoder *dec) {
while (dec->range <= BOTTOM_VALUE) {
dec->code = (dec->code << 8) | range_decoder_get_byte(dec);
dec->low <<= 8;
@@ -66,7 +66,7 @@ void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_
double cdf_low = (value == -max_abs_value) ? 0.0 : laplacian_cdf(value - 1, lambda);
double cdf_high = laplacian_cdf(value, lambda);
-// Normalize to get cumulative counts in range [0, SCALE]
+// Normalise to get cumulative counts in range [0, SCALE]
const uint32_t SCALE = 0x10000; // 65536 for precision
uint32_t cum_low = (uint32_t)(cdf_low * SCALE);
uint32_t cum_high = (uint32_t)(cdf_high * SCALE);
@@ -80,7 +80,7 @@ void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_
enc->low += (uint32_t)((range_64 * cum_low) / SCALE);
enc->range = (uint32_t)((range_64 * (cum_high - cum_low)) / SCALE);
-range_encoder_renormalize(enc);
+range_encoder_renormalise(enc);
}
size_t range_encoder_finish(RangeEncoder *enc) {
@@ -137,7 +137,7 @@ int16_t range_decode_int16_laplacian(RangeDecoder *dec, int16_t max_abs_value, f
dec->low += (uint32_t)((range_64 * cum_low) / SCALE);
dec->range = (uint32_t)((range_64 * (cum_high - cum_low)) / SCALE);
-range_decoder_renormalize(dec);
+range_decoder_renormalise(dec);
return value;
} else if (cum_freq < cum_low) {
high = mid - 1;
@@ -147,6 +147,6 @@ int16_t range_decode_int16_laplacian(RangeDecoder *dec, int16_t max_abs_value, f
}
// Fallback: shouldn't happen with correct encoding
-range_decoder_renormalize(dec);
+range_decoder_renormalise(dec);
return value;
}


@@ -24,16 +24,16 @@ typedef struct {
size_t buffer_size;
} RangeDecoder;
-// Initialize encoder
+// Initialise encoder
void range_encoder_init(RangeEncoder *enc, uint8_t *buffer, size_t capacity);
// Encode a signed 16-bit value with Laplacian distribution (λ=5.0, μ=0)
void range_encode_int16_laplacian(RangeEncoder *enc, int16_t value, int16_t max_abs_value, float lambda);
-// Finalize encoding and return bytes written
+// Finalise encoding and return bytes written
size_t range_encoder_finish(RangeEncoder *enc);
-// Initialize decoder
+// Initialise decoder
void range_decoder_init(RangeDecoder *dec, const uint8_t *buffer, size_t size);
// Decode a signed 16-bit value with Laplacian distribution (λ=5.0, μ=0)


@@ -531,7 +531,7 @@ static const char* VERDESC[] = {"null", "YCoCg tiled, uniform", "ICtCp tiled, un
if (wavelet == 255) printf(" (Haar)");
printf("\n");
printf(" Decomp levels: %d\n", decomp_levels);
-printf(" Quantizers: Y=%d, Co=%d, Cg=%d (Index=%d,%d,%d)\n", QLUT[quant_y], QLUT[quant_co], QLUT[quant_cg], quant_y, quant_co, quant_cg);
+printf(" Quantisers: Y=%d, Co=%d, Cg=%d (Index=%d,%d,%d)\n", QLUT[quant_y], QLUT[quant_co], QLUT[quant_cg], quant_y, quant_co, quant_cg);
if (quality > 0)
printf(" Quality: %d\n", quality - 1);
else


@@ -270,7 +270,7 @@ int main(int argc, char** argv) {
avg_motion /= (mesh_w * mesh_h);
printf(" Motion: avg=%.2f px, max=%.2f px\n\n", avg_motion, max_motion);
-// Save visualization for worst case
+// Save visualisation for worst case
if (test == 0 || roundtrip_psnr < 30.0) {
char filename[256];
sprintf(filename, "roundtrip_%04d_original.png", frame_num);
@@ -293,7 +293,7 @@ int main(int argc, char** argv) {
}
sprintf(filename, "roundtrip_%04d_diff.png", frame_num);
cv::imwrite(filename, diff_roundtrip);
-printf(" Saved visualization: roundtrip_%04d_*.png\n\n", frame_num);
+printf(" Saved visualisation: roundtrip_%04d_*.png\n\n", frame_num);
}
free(flow_x);


@@ -158,7 +158,7 @@ static void apply_mesh_warp_rgb(
}
}
-// Create visualization overlay showing affine cells
+// Create visualisation overlay showing affine cells
static void create_affine_overlay(
cv::Mat &img,
const uint8_t *affine_mask,
@@ -334,7 +334,7 @@ int main(int argc, char** argv) {
affine_mask, affine_a11, affine_a12, affine_a21, affine_a22,
mesh_w, mesh_h);
-// Create visualization with affine overlay
+// Create visualisation with affine overlay
cv::Mat warped_viz = warped.clone();
create_affine_overlay(warped_viz, affine_mask, mesh_w, mesh_h);