TAV: some more mocomp shit

This commit is contained in:
minjaesong
2025-10-18 05:47:17 +09:00
parent 3b9e02b17f
commit 120058be6d
8 changed files with 2526 additions and 161 deletions

View File

@@ -1054,6 +1054,8 @@ transmission capability, and region-of-interest coding.
## GOP Unified Packet Structure (0x12)
Implemented on 2025-10-15 for temporal 3D DWT with unified preprocessing.
Updated on 2025-10-17 to include canvas expansion margins.
This packet contains multiple frames encoded as a single spacetime block for optimal
temporal compression.
@@ -1077,18 +1079,40 @@ The entire GOP (width×height×N_frames×3_channels) is preprocessed as a single
This layout enables Zstd to find patterns across both spatial and temporal dimensions,
resulting in superior compression compared to per-frame encoding.
### Canvas Expansion for Motion Compensation
When frames in a GOP have camera motion, they must be aligned before temporal DWT.
However, alignment creates "gaps" at frame edges. To preserve ALL original pixels:
1. **Calculate motion range**: Determine the total shift range across all GOP frames
- Example: If frames shift by ±3 pixels horizontally, total range = 6 pixels
2. **Expand canvas**: Create a larger canvas = original_size + margin
- Canvas width = header.width + margin_left + margin_right
- Canvas height = header.height + margin_top + margin_bottom
3. **Place aligned frames**: Each frame is positioned on the expanded canvas
- All original pixels from all frames are preserved
- No artificial padding or cropping occurs
4. **Encode expanded canvas**: Apply 3D DWT to the larger canvas dimensions
5. **Store margins**: 4 bytes (L/R/T/B) tell decoder the canvas expansion
6. **Decoder extraction**: Decoder extracts display region for each frame based on
motion vectors and margins
This approach ensures lossless preservation of original video content during GOP encoding.
### Motion Vectors
- Stored in quarter-pixel units (divide by 4.0 for pixel displacement)
- Stored in 1/16-pixel units (divide by 16.0 for pixel displacement)
- Used for global motion compensation (camera movement, scene translation)
- Computed using FFT-based phase correlation for accurate frame alignment
- First frame (frame 0) typically has motion vector (0, 0)
- Cumulative relative to frame 0 (not frame-to-frame deltas)
- First frame (frame 0) always has motion vector (0, 0)
### Temporal 3D DWT Process
1. Apply 1D DWT across temporal axis (GOP frames)
2. Apply 2D DWT on each spatial slice of temporal subbands
3. Perceptual quantization with temporal-spatial awareness
4. Unified significance map preprocessing across all frames/channels
5. Single Zstd compression of entire GOP block
1. Detect inter-frame motion using phase correlation
2. Align frames and expand canvas to preserve all original pixels
3. Apply 1D DWT across temporal axis (GOP frames) on expanded canvas
4. Apply 2D DWT on each spatial slice of temporal subbands
5. Perceptual quantization with temporal-spatial awareness
6. Unified significance map preprocessing across all frames/channels
7. Single Zstd compression of entire GOP block
## GOP Sync Packet Structure (0xFC)
Indicates that N frames were decoded from a GOP Unified block.