
# TAV - TSVM Advanced Video Codec

A perceptually optimised, wavelet-based video codec designed for resource-constrained systems, featuring multiple wavelet types, temporal 3D DWT, and sparse EZBC coefficient coding.

## Overview

TAV (TSVM Advanced Video) is a modern video codec built on the discrete wavelet transform (DWT). It combines current compression techniques with careful optimisation for low-resource decoding.

### Key Advantages

- **No blocking artefacts**: Large-tile DWT encoding with padding avoids the block boundaries that DCT codecs introduce
- **No colour banding**: Wavelets spread gradients across scales, preventing banding in the first place
- **Perceptual optimisation**: HVS-aware quantisation preserves visual quality where it matters
- **Temporal coherence**: 3D DWT with GOP encoding exploits inter-frame similarity
- **Efficient sparse coding**: EZBC encoding exploits coefficient sparsity for 16-18% additional compression
- **Hardware-friendly**: Designed for efficient decoding on resource-constrained platforms

## Features

### Compression Technology

- **Wavelet Types**
  - **5/3 Reversible** (JPEG 2000 standard): Lossless-capable, good for archival use
  - **9/7 Irreversible** (default): Best overall compression, CDF 9/7 variant

- **Spatial Encoding**
  - Large-tile encoding with padding, with optional single-tile mode (no blocking artefacts)
  - 6-level DWT decomposition for deep frequency analysis
  - Perceptual quantisation with HVS-optimised coefficient scaling
  - YCoCg-R colour space with anisotropic chroma quantisation

- **Temporal Encoding** (3D DWT Mode)
  - Group-of-pictures (GOP) encoding with adaptive size (typically 20 frames)
  - Unified EZBC encoding across the temporal dimension
  - Adaptive GOP boundaries with scene change detection

- **EZBC Encoding**
  - Binary tree embedded zero block coding exploits coefficient sparsity
  - Progressive refinement structure with bitplane encoding
  - Concatenated channel layout for cross-channel compression optimisation
  - Typical sparsity: 86.9% (Y), 97.8% (Co), 99.5% (Cg)
  - 16-18% compression improvement over naive coefficient encoding
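To show why high sparsity pays off, here is a minimal sketch of the binary-tree significance test at the heart of zero block coding. It makes simplifying assumptions: a 1D coefficient run instead of a 2D subband, a single bitplane threshold, and bits counted rather than written. The function names are hypothetical and this is not TAV's EZBC implementation.

```c
#include <stdlib.h>

/* Hypothetical helper: 1 if any |coef| in [lo, hi) reaches the bitplane threshold. */
static int block_significant(const int *coef, size_t lo, size_t hi, int threshold)
{
    for (size_t i = lo; i < hi; i++)
        if (abs(coef[i]) >= threshold)
            return 1;
    return 0;
}

/* Count the significance bits one binary-tree pass would emit for a bitplane:
 * one bit per visited block. An insignificant block stops the recursion, so a
 * long run of zeros costs a single bit. */
static size_t count_significance_bits(const int *coef, size_t lo, size_t hi, int threshold)
{
    if (hi <= lo)
        return 0;
    size_t bits = 1;                                  /* "is this block significant?" */
    if (!block_significant(coef, lo, hi, threshold) || hi - lo == 1)
        return bits;                                  /* zero block, or a single coefficient */
    size_t mid = lo + (hi - lo) / 2;                  /* binary split */
    return bits + count_significance_bits(coef, lo, mid, threshold)
                + count_significance_bits(coef, mid, hi, threshold);
}
```

For a 16-coefficient run holding a single non-zero value at index 0, the pass visits 9 blocks (9 bits), while an all-zero run costs just 1 bit. Real EZBC additionally codes signs and refinement bits for significant coefficients.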

### Audio Integration

TAV integrates with the TAD (TSVM Advanced Audio) codec for synchronised audio/video encoding:
- Variable chunk sizes match video GOP boundaries
- Embedded TAD packets (type 0x24) with Zstd compression
- Unified container format
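One way to make audio chunk sizes track GOP boundaries without accumulating rounding drift is to derive every boundary from the absolute frame index using rational arithmetic. This is a sketch under stated assumptions: the audio sample rate and the frame rate (as a fraction such as 30000/1001) are parameters, and `audio_chunk_samples` is a hypothetical helper, not part of the TAV or TAD sources.

```c
#include <stdint.h>

/* Hypothetical helper: first audio sample belonging to video frame `frame`
 * at fps_num/fps_den frames per second. Computing from the absolute frame
 * index avoids drift from rounding each chunk size independently. */
static uint64_t frame_to_sample(uint64_t frame, uint32_t sample_rate,
                                uint32_t fps_num, uint32_t fps_den)
{
    return frame * sample_rate * fps_den / fps_num;
}

/* Number of audio samples in the chunk covering frames [first, first + count). */
static uint64_t audio_chunk_samples(uint64_t first, uint64_t count, uint32_t sample_rate,
                                    uint32_t fps_num, uint32_t fps_den)
{
    return frame_to_sample(first + count, sample_rate, fps_num, fps_den)
         - frame_to_sample(first, sample_rate, fps_num, fps_den);
}
```

With 20-frame GOPs at 30 fps and 32 kHz audio, successive chunks hold 21333, 21333 and 21334 samples, which sum to exactly 64000 samples for 2 seconds of video.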

## Building

### Prerequisites

- C compiler (GCC/Clang)
- Zstandard library
- OpenCV 4 library (used only by the experimental motion-estimation feature)

### Compilation

```bash
# Build TAV encoder/decoder
make tav

# Build all tools including TAD audio codec
make all

# Clean build artefacts
make clean
```

### Build Targets

- `encoder_tav` - Main video encoder
- `decoder_tav` - Standalone video decoder
- `tav_inspector` - Packet analysis and debugging tool

## Usage

### Basic Encoding

Encoding requires the FFmpeg executable to be installed on your system.

```bash
# Default encoding (CDF 9/7 wavelet, quality level 3)
./encoder_tav -i input.mp4 -o output.tav

# Quality levels (0-5)
./encoder_tav -i input.avi -q 0 -o output.tav    # Lowest quality, smallest file
./encoder_tav -i input.mkv -q 5 -o output.tav    # Highest quality, largest file
```

### Intra-only Encoding

```bash
# Enable intra-only encoding
./encoder_tav -i input.mp4 --intra-only -o output.tav
```

### Decoding and Inspection

```bash
# Decode TAV to raw video
./decoder_tav -i input.tav -o output.mkv

# Inspect packet structure (debugging)
./tav_inspector input.tav -v
```

### Frame Limiting

```bash
# Encode only first N frames (useful for testing)
./encoder_tav -i input.mp4 -o output.tav --encode-limit 100
```

## Technical Architecture

### Encoder Pipeline

1. **Input Processing**
   - FFmpeg demuxing and frame extraction
   - RGB to YCoCg-R colour space conversion
   - Resolution validation and padding

2. **DWT Transform**
   - Spatial: 6-level decomposition per frame
   - Temporal: 1D DWT across GOP frames (3D DWT mode)
   - Lifting scheme implementation for all wavelets

3. **Perceptual Quantisation**
   - HVS-based subband weights
   - Anisotropic chroma quantisation (YCoCg-R specific)
   - Quality-dependent quantisation matrices

4. **EZBC Encoding**
   - Binary tree embedded zero block coding per channel
   - Progressive refinement by bitplanes
   - Concatenated bitstream layout: `[Y_bitstream][Co_bitstream][Cg_bitstream]`
   - Cross-channel compression optimisation

5. **Entropy Coding**
   - Zstandard compression (level 7) on concatenated EZBC bitstreams
   - Cross-channel compression opportunities
   - Adaptive compression based on GOP structure
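The perceptual quantisation in step 3 can be pictured as dividing each coefficient by a base step scaled by a per-subband weight. The sketch below assumes a plain rounding quantiser; the weight values, `base_step`, and the function names are hypothetical placeholders, not TAV's HVS tables.

```c
/* Hypothetical per-subband weights: a larger weight means coarser quantisation.
 * Real HVS-derived weights depend on level, orientation and channel. */
enum { SB_LL, SB_LH, SB_HL, SB_HH, SB_COUNT };
static const float subband_weight[SB_COUNT] = { 1.0f, 1.5f, 1.5f, 2.0f };

/* Round half away from zero without needing libm. */
static int round_half_away(float x)
{
    return x >= 0.0f ? (int)(x + 0.5f) : -(int)(-x + 0.5f);
}

/* Encoder side: index = round(coef / (base_step * weight)). */
static int quantise(float coef, float base_step, int subband)
{
    return round_half_away(coef / (base_step * subband_weight[subband]));
}

/* Decoder side: reconstruct the coefficient from its quantised index. */
static float dequantise(int q, float base_step, int subband)
{
    return (float)q * base_step * subband_weight[subband];
}
```

Small high-frequency coefficients round to zero (for example, 3.0 in the HH band with a base step of 4 gives index 0), which is where the coefficient sparsity that EZBC exploits comes from.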

### Decoder Pipeline

1. **Container Parsing**
   - Packet type identification (0x00-0xFF)
   - Timecode synchronisation
   - GOP boundary detection

2. **Entropy Decoding**
   - Zstd decompression of concatenated bitstreams
   - EZBC binary tree decoding per channel
   - Progressive coefficient reconstruction

3. **Inverse Quantisation**
   - Perceptual weight application
   - Subband-specific scaling
   - Coefficient reconstruction from sparse representation

4. **Inverse DWT**
   - Temporal: 1D inverse DWT across frames (3D DWT mode)
   - Spatial: 6-level inverse wavelet reconstruction

5. **Output Conversion**
   - YCoCg-R to RGB colour space
   - Clamping and dithering
   - Frame buffering for display
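Both the encoder's RGB to YCoCg-R conversion and the decoder's output conversion can be written as integer lifting steps, which is what makes YCoCg-R exactly reversible. A minimal sketch using the standard YCoCg-R equations, assuming 8-bit RGB and an arithmetic right shift for negative values (as on GCC and Clang); the type and function names are illustrative, and TAV's own range handling may differ.

```c
#include <stdint.h>

typedef struct { int y, co, cg; } ycocg_t;   /* Co and Cg are signed */

/* Forward YCoCg-R: each line is a lifting step, so it is exactly invertible. */
static ycocg_t rgb_to_ycocg_r(int r, int g, int b)
{
    ycocg_t out;
    out.co = r - b;
    int t  = b + (out.co >> 1);
    out.cg = g - t;
    out.y  = t + (out.cg >> 1);
    return out;
}

/* Clamp to the 8-bit output range (needed after lossy decoding). */
static uint8_t clamp_u8(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Inverse YCoCg-R: undo the lifting steps in reverse order, then clamp. */
static void ycocg_r_to_rgb(ycocg_t c, uint8_t *r, uint8_t *g, uint8_t *b)
{
    int t  = c.y - (c.cg >> 1);
    int gg = c.cg + t;
    int bb = t - (c.co >> 1);
    int rr = bb + c.co;
    *r = clamp_u8(rr);
    *g = clamp_u8(gg);
    *b = clamp_u8(bb);
}
```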

### Wavelet Implementation

All wavelets follow a **lifting scheme** pattern with symmetric boundary extension:

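```c
// predict() and update() stand for the wavelet-specific lifting filters; they read
// neighbouring samples with symmetric boundary extension (half = n / 2).

// Forward transform: predict, then update
for (int i = 0; i < half; i++)   // High-pass: odd sample minus prediction from even neighbours
    temp[half + i] = data[2 * i + 1] - predict(data, i);
for (int i = 0; i < half; i++)   // Low-pass: even sample plus update from high-pass neighbours
    temp[i] = data[2 * i] + update(temp + half, i);

// Inverse transform: undo update, then undo predict (reverse order)
for (int i = 0; i < half; i++)   // Undo low-pass: recover all even samples first
    data[2 * i] = temp[i] - update(temp + half, i);
for (int i = 0; i < half; i++)   // Undo high-pass: uses the recovered even samples
    data[2 * i + 1] = temp[half + i] + predict(data, i);
```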

**Critical**: Forward and inverse transforms must use identical coefficient indexing, and the inverse must undo each operation exactly, in reverse order, to avoid grid artefacts.
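For the 5/3 reversible wavelet, the predict and update steps have a well-known integer form (the JPEG 2000 reversible filter). The sketch below is a self-contained 1D version with symmetric boundary extension, handy for checking perfect reconstruction. It assumes an even length n ≥ 2 and an arithmetic right shift for negative values; the function names are illustrative, not TAV's source.

```c
#include <stddef.h>

/* Forward 5/3 lifting on x[0..n), n even and >= 2. Low-pass goes to
 * out[0..half), high-pass to out[half..n). `out` must not alias `x`. */
static void dwt53_forward(const int *x, int *out, size_t n)
{
    size_t half = n / 2;
    int *lo = out, *hi = out + half;
    for (size_t i = 0; i < half; i++) {                      /* predict */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];  /* mirror x[n] -> x[n-2] */
        hi[i] = x[2 * i + 1] - ((x[2 * i] + right) >> 1);
    }
    for (size_t i = 0; i < half; i++) {                      /* update */
        int left = (i > 0) ? hi[i - 1] : hi[0];              /* mirror d[-1] -> d[0] */
        lo[i] = x[2 * i] + ((left + hi[i] + 2) >> 2);
    }
}

/* Inverse 5/3 lifting: undo update, then undo predict. `x` must not alias `in`. */
static void dwt53_inverse(const int *in, int *x, size_t n)
{
    size_t half = n / 2;
    const int *lo = in, *hi = in + half;
    for (size_t i = 0; i < half; i++) {                      /* undo update */
        int left = (i > 0) ? hi[i - 1] : hi[0];
        x[2 * i] = lo[i] - ((left + hi[i] + 2) >> 2);
    }
    for (size_t i = 0; i < half; i++) {                      /* undo predict */
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = hi[i] + ((x[2 * i] + right) >> 1);
    }
}
```

Because the inverse recomputes exactly the same rounded predict and update terms and subtracts them in reverse order, reconstruction is bit-exact even though every step rounds.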

### Coefficient Layout

TAV stores coefficients in a **2D spatial layout** in memory for each decomposition level:

```
[LL] [LH] [HL] [HH] [LH] [HL] [HH] ...
 └── Level 0 ──┘ └─── Level 1 ───┘
```

- `LL`: Low-pass (approximation) - progressively smaller with each level
- `LH`, `HL`, `HH`: High-pass subbands (horizontal, vertical, diagonal detail)
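To make the shrinking `LL` band concrete, the sketch below computes subband sizes for a multi-level decomposition, assuming each level keeps ceil(dim / 2) low-pass samples and floor(dim / 2) high-pass samples per axis. The helper names are hypothetical, and the sketch models neither TAV's padding nor its exact memory ordering.

```c
typedef struct { int w, h; } dims_t;

/* Hypothetical helper: size of the LL band after `levels` decompositions of a
 * w x h image, each level keeping ceil(dim / 2) low-pass samples per axis. */
static dims_t ll_size(int w, int h, int levels)
{
    dims_t d = { w, h };
    for (int l = 0; l < levels; l++) {
        d.w = (d.w + 1) / 2;
        d.h = (d.h + 1) / 2;
    }
    return d;
}

/* Size of the HH (diagonal) band produced at decomposition `level`
 * (1 = finest): high-pass halves get floor(dim / 2) samples. */
static dims_t hh_size(int w, int h, int level)
{
    dims_t parent = ll_size(w, h, level - 1);
    dims_t d = { parent.w / 2, parent.h / 2 };
    return d;
}
```

For a 640×480 frame under these assumptions, the LL band after 6 levels is 10×8, the finest HH band is 320×240, and the coarsest HH band is 10×7.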

## Performance Characteristics

### Compression Efficiency

- **Sparsity Exploitation**: Typical quantised coefficient sparsity
  - Y channel: 86.9% zeros
  - Co channel: 97.8% zeros
  - Cg channel: 99.5% zeros

- **EZBC Benefits**: 16-18% compression improvement over naive coefficient encoding through sparsity exploitation

- **Temporal Coherence**: Additional 15-25% improvement with 3D DWT (content-dependent)
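The sparsity percentages above are the fraction of quantised coefficients equal to zero in each channel. A minimal sketch of how such a figure could be measured (the function name is hypothetical):

```c
#include <stddef.h>

/* Fraction of quantised coefficients equal to zero, in [0, 1];
 * returns 0.0 for an empty buffer. */
static double zero_fraction(const int *coef, size_t count)
{
    if (count == 0)
        return 0.0;
    size_t zeros = 0;
    for (size_t i = 0; i < count; i++)
        zeros += (coef[i] == 0);
    return (double)zeros / (double)count;
}
```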

### Computational Complexity

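- **Encoding**: O(n) per frame for the spatial DWT, where n is the number of pixels; each decomposition level processes a quarter of the samples of the previous one, so the total stays below 4/3 of n samples
- **Decoding**: O(n) per frame for the inverse spatial DWT, using an optimised lifting scheme implementation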
- **Memory**: Single-tile encoding requires O(w × h) working memory

### Quality Characteristics

- **No blocking artefacts**: Wavelet-based encoding is inherently smooth
- **Perceptual optimisation**: Better subjective quality than bitrate-equivalent DCT codecs
- **Scalability**: 6 quality levels (0-5) provide a wide range of bitrate/quality trade-offs
- **Temporal stability**: 3D DWT mode reduces flickering and temporal artefacts

## Format Specification

For complete packet structure and bitstream format details, refer to `format documentation.txt`.

### Key Packet Types

- `0x00`: Metadata and initialisation
- `0x01`: I-frame (intra-coded frame)
- `0x12`: GOP unified packet (3D DWT mode)
- `0x24`: Embedded TAD audio
- `0xFC`: GOP synchronisation
- `0xFD`: Timecode
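A parser typically dispatches on the packet type byte. The sketch below covers only the types listed above; the enum and function names are illustrative, and payload layouts are defined in `format documentation.txt`, not here.

```c
#include <stdint.h>

/* Packet type bytes listed in this README (identifier names are illustrative). */
enum tav_packet_type {
    TAV_PKT_METADATA  = 0x00,
    TAV_PKT_IFRAME    = 0x01,
    TAV_PKT_GOP       = 0x12,
    TAV_PKT_TAD_AUDIO = 0x24,
    TAV_PKT_GOP_SYNC  = 0xFC,
    TAV_PKT_TIMECODE  = 0xFD,
};

/* Human-readable name for a packet type byte, e.g. for an inspector tool. */
static const char *tav_packet_name(uint8_t type)
{
    switch (type) {
    case TAV_PKT_METADATA:  return "metadata";
    case TAV_PKT_IFRAME:    return "I-frame";
    case TAV_PKT_GOP:       return "GOP unified (3D DWT)";
    case TAV_PKT_TAD_AUDIO: return "embedded TAD audio";
    case TAV_PKT_GOP_SYNC:  return "GOP sync";
    case TAV_PKT_TIMECODE:  return "timecode";
    default:                return "unknown";
    }
}
```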

## Debugging Tools

### TAV Inspector

Analyse TAV packet structure and decode individual frames:

```bash
# Verbose packet analysis
./tav_inspector input.tav -v

# Extract specific frame ranges
./tav_inspector input.tav --frame-range 100-200
```

## Related Projects

- **TAD** (TSVM Advanced Audio): Perceptual audio codec using CDF 9/7 wavelets
- **TSVM**: Target virtual machine platform for TAV playback

## Licence

MIT.