Files
tsvm/video_encoder/lib/libtavenc/README.md
2025-12-05 03:39:32 +09:00

10 KiB
Raw Blame History

libtavenc - TAV Video Encoder Library

libtavenc is a high-performance video encoding library implementing the TSVM Advanced Video (TAV) codec. It provides a clean C API for encoding RGB24 video frames using discrete wavelet transform (DWT) with perceptual quantization and GOP-based temporal compression.

Features

  • Multiple Wavelet Types: CDF 5/3, CDF 9/7, CDF 13/7, DD-4, Haar
  • 3D DWT GOP Encoding: Temporal + spatial wavelet compression
  • Perceptual Quantization: HVS-optimized coefficient scaling
  • EZBC Entropy Coding: Efficient coefficient compression with Zstd
  • Multi-threading: Internal thread pool for optimal performance
  • Color Spaces: YCoCg-R (default) and ICtCp (for HDR)
  • Quality Levels: 0-5 (0=lowest/smallest, 5=highest/largest)

Building

# Build static library
make lib/libtavenc.a

# Build with encoder CLI
make encoder_tav

# Install library and headers
make install-libs PREFIX=/usr/local

Quick Start

Basic Encoding

#include "tav_encoder_lib.h"
#include <stdio.h>

int main() {
    // Initialize encoder parameters
    tav_encoder_params_t params;
    tav_encoder_params_init(&params, 1920, 1080);

    // Configure encoding options
    params.fps_num = 60;
    params.fps_den = 1;
    params.wavelet_type = 1;        // CDF 9/7 (default)
    params.quality_y = 3;            // Quality level 3
    params.quality_co = 3;
    params.quality_cg = 3;
    params.enable_temporal_dwt = 1;  // Enable 3D GOP encoding
    params.gop_size = 0;             // Auto-calculate (typically 16-24)
    params.num_threads = 4;          // 4 worker threads

    // Create encoder context
    tav_encoder_context_t *ctx = tav_encoder_create(&params);
    if (!ctx) {
        fprintf(stderr, "Failed to create encoder\n");
        return -1;
    }

    // Get actual parameters (with auto-calculated values)
    tav_encoder_get_params(ctx, &params);
    printf("GOP size: %d frames\n", params.gop_size);

    // Encode frames
    uint8_t *rgb_frame = /* ... load RGB24 frame ... */;
    tav_encoder_packet_t *packet;

    for (int i = 0; i < num_frames; i++) {
        int result = tav_encoder_encode_frame(ctx, rgb_frame, i, &packet);

        if (result == 1) {
            // Packet ready (GOP completed)
            fwrite(packet->data, 1, packet->size, outfile);
            tav_encoder_free_packet(packet);
        }
        else if (result == 0) {
            // Frame buffered, waiting for GOP to fill
        }
        else {
            // Error
            fprintf(stderr, "Encoding error: %s\n", tav_encoder_get_error(ctx));
            break;
        }
    }

    // Flush remaining frames
    while (tav_encoder_flush(ctx, &packet) == 1) {
        fwrite(packet->data, 1, packet->size, outfile);
        tav_encoder_free_packet(packet);
    }

    // Cleanup
    tav_encoder_free(ctx);
    return 0;
}

Stateless GOP Encoding (Multi-threaded)

The library provides tav_encoder_encode_gop() for stateless GOP encoding, perfect for multi-threaded applications:

#include "tav_encoder_lib.h"
#include <pthread.h>

typedef struct {
    tav_encoder_params_t params;
    uint8_t **rgb_frames;
    int num_frames;
    int *frame_numbers;
    tav_encoder_packet_t *output_packet;
} gop_encode_job_t;

void *encode_gop_thread(void *arg) {
    gop_encode_job_t *job = (gop_encode_job_t *)arg;

    // Create thread-local encoder context
    tav_encoder_context_t *ctx = tav_encoder_create(&job->params);
    if (!ctx) {
        return NULL;
    }

    // Encode entire GOP at once (stateless, thread-safe)
    tav_encoder_encode_gop(ctx,
                           (const uint8_t **)job->rgb_frames,
                           job->num_frames,
                           job->frame_numbers,
                           &job->output_packet);

    tav_encoder_free(ctx);
    return NULL;
}

int main() {
    // Setup parameters
    tav_encoder_params_t params;
    tav_encoder_params_init(&params, 1920, 1080);
    params.enable_temporal_dwt = 1;
    params.gop_size = 24;

    // Create worker threads
    pthread_t threads[4];
    gop_encode_job_t jobs[4];

    for (int i = 0; i < 4; i++) {
        jobs[i].params = params;
        jobs[i].rgb_frames = /* ... load GOP frames ... */;
        jobs[i].num_frames = 24;
        jobs[i].frame_numbers = /* ... frame indices ... */;

        pthread_create(&threads[i], NULL, encode_gop_thread, &jobs[i]);
    }

    // Wait for completion
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);

        // Write output packet
        if (jobs[i].output_packet) {
            fwrite(jobs[i].output_packet->data, 1,
                   jobs[i].output_packet->size, outfile);
            tav_encoder_free_packet(jobs[i].output_packet);
        }
    }

    return 0;
}

API Reference

Context Management

tav_encoder_create()

Creates encoder context with specified parameters. Allocates internal buffers and initializes thread pool if multi-threading enabled.

Returns: Encoder context or NULL on failure

tav_encoder_free()

Frees encoder context and all resources. Any unflushed GOP frames are lost.

tav_encoder_get_error()

Returns last error message string.

tav_encoder_get_params()

Gets encoder parameters with calculated values (e.g., auto-calculated GOP size, decomposition levels).

Frame Encoding

tav_encoder_encode_frame()

Encodes single RGB24 frame. Frames are buffered until GOP is full.

Parameters:

  • rgb_frame: RGB24 planar format [R...][G...][B...], width×height×3 bytes
  • frame_pts: Presentation timestamp (frame number or time)
  • packet: Output packet pointer (NULL if GOP not ready)

Returns:

  • 1: Packet ready (GOP completed)
  • 0: Frame buffered, waiting for more frames
  • -1: Error

tav_encoder_flush()

Flushes remaining buffered frames and encodes final GOP. Call at end of stream.

Returns:

  • 1: Packet ready
  • 0: No more packets
  • -1: Error

tav_encoder_encode_gop()

Stateless GOP encoding. Thread-safe with separate contexts.

Parameters:

  • rgb_frames: Array of RGB24 frames [frame][width×height×3]
  • num_frames: Number of frames in GOP (1-24)
  • frame_numbers: Frame indices for timecodes (can be NULL)
  • packet: Output packet pointer

Returns: 1 on success, -1 on error

Packet Management

tav_encoder_free_packet()

Frees packet returned by encoding functions.

Encoder Parameters

Video Dimensions

  • width, height: Frame dimensions (must be even)
  • fps_num, fps_den: Framerate (e.g., 60/1 for 60fps)

Wavelet Configuration

  • wavelet_type: Spatial wavelet
    • 0: CDF 5/3 (reversible, lossless-capable)
    • 1: CDF 9/7 (default, best compression)
    • 2: CDF 13/7 (experimental)
    • 16: DD-4 (four-point interpolating)
    • 255: Haar (demonstration)
  • temporal_wavelet: Temporal wavelet for 3D DWT
    • 0: Haar (default for sports/high motion)
    • 1: CDF 5/3 (smooth motion)
  • decomp_levels: Spatial DWT levels (0=auto, typically 6)
  • temporal_levels: Temporal DWT levels (0=auto, typically 2 for 8-frame GOPs)

Color Space

  • channel_layout:
    • 0: YCoCg-R (default, efficient chroma)
    • 1: ICtCp (for HDR/BT.2100 sources)
  • perceptual_tuning: 1=enable HVS perceptual quantization (default), 0=uniform

GOP Configuration

  • enable_temporal_dwt: 1=enable 3D DWT GOP encoding (default), 0=intra-only I-frames
  • gop_size: Frames per GOP (8, 16, or 24; 0=auto based on framerate)
  • enable_two_pass: 1=enable two-pass with scene change detection (default), 0=single-pass

Quality Control

  • quality_y: Luma quality (0-5, default: 3)
  • quality_co: Orange chrominance quality (0-5, default: 3)
  • quality_cg: Green chrominance quality (0-5, default: 3)
  • dead_zone_threshold: Dead-zone quantization (0=disabled, 1-10 typical)

Entropy Coding

  • entropy_coder:
    • 0: Twobitmap (default, fast)
    • 1: EZBC (better compression for high-quality)
  • zstd_level: Zstd compression level (3-22, default: 7)

Multi-threading

  • num_threads: Worker threads
    • 0: Single-threaded (default for CLI)
    • -1: Auto-detect CPU cores
    • 1-16: Explicit thread count

Encoder Presets

  • encoder_preset: Preset flags
    • 0x01: Sports mode (finer temporal quantization)
    • 0x02: Anime mode (disable grain)

TAV Packet Types

Output packets have type field indicating content:

  • 0x10: I-frame (intra-only, single frame)
  • 0x11: P-frame (delta from previous)
  • 0x12: GOP unified (3D DWT, multiple frames)
  • 0x24: TAD audio (DWT-based audio codec)
  • 0xF0: Loop point start
  • 0xFC: GOP sync (frame count marker)
  • 0xFD: Timecode metadata

Performance Notes

Threading Model

  • Library manages internal thread pool when num_threads > 0
  • GOP encoding is parallelized across worker threads
  • For CLI tools: use num_threads=0 (single-threaded) to avoid double-threading with external parallelism
  • For library integration: use num_threads=-1 or explicit count for optimal performance

Memory Usage

  • Each encoder context allocates:
    • GOP buffer: gop_size × width × height × 3 bytes (RGB frames)
    • DWT coefficients: ~width × height × 12 bytes per channel
    • Thread pool: num_threads × (GOP buffer + workspace)
  • Typical 1920×1080 encoder with GOP=24: ~180 MB per context

Encoding Speed

  • Single-threaded: 10-15 fps (1920×1080 on modern CPU)
  • Multi-threaded (4 threads): 30-40 fps
  • GOP size affects latency: larger GOP = higher latency, better compression

Integration with TAD Audio

TAV files typically include TAD-compressed audio. Link with both libraries:

#include "tav_encoder_lib.h"
#include "encoder_tad.h"

// Encode video frame
tav_encoder_encode_frame(video_ctx, rgb_frame, pts, &video_packet);

// Encode audio chunk (32kHz stereo, float samples)
tad32_encode_chunk(audio_ctx, pcm_samples, num_samples, &audio_data, &audio_size);

// Mux both into TAV file (interleave by frame PTS)

Error Handling

All functions return error codes and set error message accessible via tav_encoder_get_error():

if (tav_encoder_encode_frame(ctx, frame, pts, &packet) < 0) {
    fprintf(stderr, "Encoding failed: %s\n", tav_encoder_get_error(ctx));
    // Handle error
}

Limitations

  • Maximum resolution: 8192×8192
  • GOP size: 1-48 frames
  • Single-tile encoding only (no spatial tiling)
  • Requires even width and height

License

Part of the TSVM project.

See Also

  • include/tav_encoder_lib.h - Complete API documentation
  • src/encoder_tav.c - CLI reference implementation
  • lib/libtadenc/ - TAD audio encoder library