112x112 blocks for TAV, which greatly improves the encoding speed

2026-06-11 15:24:05 +09:00 · 2025-09-15 19:08:46 +09:00
parent 1343dd10cf
commit 113c01b851
4 changed files with 816 additions and 83 deletions
--- a/terranmon.txt
+++ b/terranmon.txt
@@ -807,6 +807,7 @@ transmission capability, and region-of-interest coding.
 - Version 1.0: Initial DWT-based implementation with 5/3 reversible filter
 - Version 1.1: Added 9/7 irreversible filter for higher compression
 - Version 1.2: Multi-resolution pyramid encoding with up to 4 decomposition levels
+- Version 1.3: Optimized 112x112 tiles for TSVM resolution with up to 6 decomposition levels

 # File Structure
 \x1F T S V M T A V
@@ -852,7 +853,7 @@ transmission capability, and region-of-interest coding.
    uint32 Compressed Size
    *      Zstd-compressed Block Data

-## Block Data (per 64x64 tile)
+## Block Data (per 112x112 tile)
    uint8  Mode: encoding mode
           0x00 = SKIP (copy from previous frame)
           0x01 = INTRA (DWT-coded, no prediction)
@@ -885,10 +886,12 @@ transmission capability, and region-of-interest coding.
  * Provides better energy compaction than 5/3 but lossy reconstruction

 ### Decomposition Levels
- Level 1: 64x64 → 32x32 (LL) + 3×32x32 subbands (LH,HL,HH)
- Level 2: 32x32 → 16x16 (LL) + 3×16x16 subbands  
- Level 3: 16x16 → 8x8 (LL) + 3×8x8 subbands
- Level 4: 8x8 → 4x4 (LL) + 3×4x4 subbands
+- Level 1: 112x112 → 56x56 (LL) + 3×56x56 subbands (LH,HL,HH)
+- Level 2: 56x56 → 28x28 (LL) + 3×28x28 subbands  
+- Level 3: 28x28 → 14x14 (LL) + 3×14x14 subbands
+- Level 4: 14x14 → 7x7 (LL) + 3×7x7 subbands
+- Level 5: 7x7 → 3x3 (LL) + 3×3x3 subbands
+- Level 6: 3x3 → 1x1 (LL) + 3×1x1 subbands (maximum)

 ### Quantization Strategy
 TAV uses different quantization steps for each subband based on human visual
@@ -904,9 +907,11 @@ When enabled, coefficients are transmitted in order of visual importance:
 3. Higher frequency subbands for refinement

 ## Motion Compensation
- Search range: ±16 pixels (larger than TEV due to 64x64 tiles)
+- Search range: ±28 pixels (optimized for 112x112 tiles)
 - Sub-pixel precision: 1/4 pixel with bilinear interpolation
- Tile size: 64x64 pixels (4x larger than TEV blocks)
+- Tile size: 112x112 pixels (perfect fit for TSVM 560x448 resolution)
+  * Exactly 5×4 = 20 tiles per frame (560÷112 = 5, 448÷112 = 4)
+  * No partial tiles needed - optimal for processing efficiency
 - Uses Sum of Absolute Differences (SAD) for motion estimation
 - Overlapped block motion compensation (OBMC) for smooth boundaries

@@ -917,7 +922,7 @@ TAV operates in YCoCg-R colour space with full resolution channels:
 - Cg: Green-Magenta chroma (full resolution, very aggressive quantization by default)

 ## Compression Features
- 64x64 DWT tiles vs 16x16 DCT blocks in TEV
+- 112x112 DWT tiles vs 16x16 DCT blocks in TEV
 - Multi-resolution representation enables scalable decoding
 - Better frequency localization than DCT
 - Reduced blocking artifacts due to overlapping basis functions