mirror of
https://github.com/curioustorvald/tsvm.git
synced 2026-06-06 05:28:31 +09:00
TAV: channel-concatenated coeffs preprocessing
20
CLAUDE.md
@@ -174,7 +174,8 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions,
 - **Perceptual quantization**: HVS-optimized coefficient scaling
 - **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantization (search for "ANISOTROPY_MULT_CHROMA" on the encoder)
 - **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size)
-- **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 15-20% additional compression (2025-09-29 update)
+- **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update)
+- **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced)
 - **Usage Examples**:
 ```bash
 # Different wavelets
@@ -240,10 +241,19 @@ The significance map compression technique implemented on 2025-09-29 provides su
 
 **Technical Approach**:
 ```
-Original: [coeff_array] → [significance_bits + nonzero_values]
+Original: [coeff_array] → [concatenated_significance_maps + nonzero_values]
+
+Concatenated Maps Layout:
+[Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
+
 - Significance map: 1 bit per coefficient (0=zero, 1=non-zero)
-- Value array: Only non-zero coefficients in sequence
-- Result: 15-20% compression improvement on typical video content
+- Value arrays: Only non-zero coefficients in sequence per channel
+- Cross-channel optimization: Zstd finds patterns across similar significance maps
+- Result: 16-18% compression improvement + 1.6% additional from concatenation
 ```
 
-**Performance**: Tested on quantized DWT coefficients with 86.9% sparsity, achieving 16.4% compression improvement before Zstd compression. The technique is particularly effective on high-frequency subbands where sparsity often exceeds 95%.
+**Performance**:
+- **Sparsity exploitation**: Tested on quantized DWT coefficients with 86.9% sparsity (Y), 97.8% (Co), 99.5% (Cg)
+- **Compression improvement**: 16.4% from significance maps + 1.6% from concatenated layout
+- **Real-world impact**: 559 bytes saved per frame (5.59 MB per 10k frames)
+- **Cross-channel benefit**: Concatenated maps allow Zstd to exploit similarity between significance patterns
|
|||||||
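As a minimal sketch of the "Technical Approach" in the hunk above (not code from this commit), the following C helper expands one channel's significance map and packed value array back into a dense coefficient array. The name `sigmap_decode_channel` is hypothetical. The bit order (bit `i % 8` of byte `i / 8`) follows the decoders changed in this commit. The low-byte-first reads mirror the Kotlin decoder and are an assumption about the stored byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the commit): expand one channel's
 * significance map plus packed non-zero values into a dense array.
 * Returns the number of packed values consumed. */
static size_t sigmap_decode_channel(const uint8_t *map, const uint8_t *values,
                                    int coeff_count, int16_t *out) {
    assert(coeff_count >= 0);
    size_t used = 0;

    /* Every position starts as zero; only flagged positions get a value. */
    memset(out, 0, (size_t)coeff_count * sizeof(int16_t));

    for (int i = 0; i < coeff_count; i++) {
        /* Bit i lives in byte i/8 at position i%8 (least significant bit first). */
        if (map[i / 8] & (1u << (i % 8))) {
            /* Assemble the 16-bit value byte by byte, low byte first
             * (assumed byte order), so no aligned int16 load is needed. */
            uint16_t raw = (uint16_t)(values[used * 2] |
                                      (values[used * 2 + 1] << 8));
            out[i] = (int16_t)raw;
            used++;
        }
    }
    return used;
}
```

Reading the values byte by byte keeps the sketch independent of host endianness and of the alignment of the value array, which starts at an odd offset whenever `3 * map_bytes` is odd.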
@@ -934,11 +934,14 @@ transmission capability, and region-of-interest coding.
 0x20: MP2 audio packet
 0x30: Subtitle in "Simple" format
 0x31: Subtitle in "Karaoke" format
+<Standard metadata payloads>
+(it's called "standard" because you're expected to just copy-paste the metadata bytes verbatim)
 0xE0: EXIF packet
 0xE1: ID3v1 packet
 0xE2: ID3v2 packet
 0xE3: Vorbis Comment packet
 0xE4: CD-text packet
+<End of Standard metadata>
 0xFF: sync packet
 
 ## Standard metadata payload packet structure
@@ -946,7 +949,11 @@ transmission capability, and region-of-interest coding.
 uint32 Length of the payload
 * Standard payload
 
-note: metadata packets must precede any non-metadata packets
+Notes:
+- metadata packets must precede any non-metadata packets
+- when multiple metadata packets are present (e.g. ID3v2 and Vorbis Comment both present),
+which gets precedence is implementation-dependent. ONE EXCEPTION is ID3v1 and ID3v2 where ID3v2 gets
+precedence.
 
 ## Video Packet Structure
 uint8 Packet Type
@@ -964,19 +971,37 @@ note: metadata packets must precede any non-metadata packets
 ## Coefficient Storage Format (Significance Map Compression)
 
 Starting with encoder version 2025-09-29, DWT coefficients are stored using
-significance map compression for improved efficiency:
+significance map compression with concatenated maps layout for optimal efficiency:
 
+### Concatenated Maps Format (Current)
+All channels are processed together to maximize Zstd compression:
+
+uint8 Y Significance Map[(coeff_count + 7) / 8] // 1 bit per Y coefficient
+uint8 Co Significance Map[(coeff_count + 7) / 8] // 1 bit per Co coefficient
+uint8 Cg Significance Map[(coeff_count + 7) / 8] // 1 bit per Cg coefficient
+uint8 A Significance Map[(coeff_count + 7) / 8] // 1 bit per A coefficient (if alpha present)
+int16 Y Non-zero Values[variable length] // Only non-zero Y coefficients
+int16 Co Non-zero Values[variable length] // Only non-zero Co coefficients
+int16 Cg Non-zero Values[variable length] // Only non-zero Cg coefficients
+int16 A Non-zero Values[variable length] // Only non-zero A coefficients (if alpha present)
+
+### Significance Map Encoding
+Each significance map uses 1 bit per coefficient position:
+- Bit = 1: coefficient is non-zero, read value from corresponding Non-zero Values array
+- Bit = 0: coefficient is zero
+
+### Compression Benefits
+- **Sparsity exploitation**: Typically 85-95% zeros in quantized DWT coefficients
+- **Cross-channel patterns**: Concatenated maps allow Zstd to find patterns across similar significance maps
+- **Overall improvement**: 16-18% compression improvement before Zstd compression
+
+### Legacy Separate Format (2025-09-29 initial)
+Early significance map implementation processed channels separately:
+
 For each channel (Y, Co, Cg, optional A):
 uint8 Significance Map[(coeff_count + 7) / 8] // 1 bit per coefficient
 int16 Non-zero Values[variable length] // Only non-zero coefficients
 
-The significance map uses 1 bit per coefficient position:
-- Bit = 1: coefficient is non-zero, read value from Non-zero Values array
-- Bit = 0: coefficient is zero
-
-This format exploits the high sparsity of quantized DWT coefficients (typically
-85-95% zeros) to achieve 15-20% compression improvement before Zstd compression.
-
 ## Legacy Format (for reference)
 int16 Y channel DWT coefficients[width * height + 4]
 int16 Co channel DWT coefficients[width * height + 4]
|
|||||||
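The "Concatenated Maps Format" described in the hunk above stores no explicit lengths for the value arrays, so a reader has to derive every section boundary from the set bits in the maps. A minimal sketch of that bookkeeping follows. The names `concat_layout_t`, `count_set_bits` and `concat_layout` are hypothetical and do not appear in the commit. The sketch covers only Y, Co and Cg, matching the three-channel decoders in this commit; the optional alpha channel is left out as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration: byte offsets of each section
 * relative to the start of the tile's coefficient data. */
typedef struct {
    size_t y_map, co_map, cg_map;     /* significance maps */
    size_t y_vals, co_vals, cg_vals;  /* packed int16 value arrays */
    size_t total_size;                /* bytes consumed by the whole block */
} concat_layout_t;

/* Count set bits among the first coeff_count positions of a map.
 * Padding bits in the last byte are ignored. */
static size_t count_set_bits(const uint8_t *map, int coeff_count) {
    size_t n = 0;
    for (int i = 0; i < coeff_count; i++) {
        if (map[i / 8] & (1u << (i % 8))) n++;
    }
    return n;
}

/* Derive all section offsets for [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]. */
static concat_layout_t concat_layout(const uint8_t *buf, int coeff_count) {
    assert(coeff_count >= 0);
    size_t map_bytes = ((size_t)coeff_count + 7) / 8;
    concat_layout_t l;

    /* The three maps have a fixed size and sit back to back. */
    l.y_map  = 0;
    l.co_map = map_bytes;
    l.cg_map = map_bytes * 2;

    /* Each value array holds one int16 per set bit in its channel's map. */
    l.y_vals  = map_bytes * 3;
    l.co_vals = l.y_vals  + count_set_bits(buf + l.y_map,  coeff_count) * sizeof(int16_t);
    l.cg_vals = l.co_vals + count_set_bits(buf + l.co_map, coeff_count) * sizeof(int16_t);
    l.total_size = l.cg_vals + count_set_bits(buf + l.cg_map, coeff_count) * sizeof(int16_t);
    return l;
}
```

For example, with `coeff_count = 10` each map takes 2 bytes, so the Y values start at byte 6; with 3, 1 and 0 non-zero coefficients in Y, Co and Cg, the Co values start at byte 12 and the block ends at byte 14. `total_size` corresponds to the amount by which the decoders in this commit advance their read pointer after a tile.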
@@ -1459,10 +1459,61 @@ class GraphicsJSR223Delegate(private val vm: VM) {
|
|||||||
// Get native resolution
|
// Get native resolution
|
||||||
val nativeWidth = gpu.config.width
|
val nativeWidth = gpu.config.width
|
||||||
val nativeHeight = gpu.config.height
|
val nativeHeight = gpu.config.height
|
||||||
|
val totalNativePixels = (nativeWidth * nativeHeight)
|
||||||
val totalNativePixels = (nativeWidth * nativeHeight).toLong()
|
|
||||||
|
|
||||||
if (resizeToFull && (width / 2 != nativeWidth / 2 || height / 2 != nativeHeight / 2)) {
|
if (width == nativeWidth && height == nativeHeight) {
|
||||||
|
val chunkSize = 32768 // Larger chunks for bulk processing
|
||||||
|
|
||||||
|
var pixelsProcessed = 0
|
||||||
|
|
||||||
|
// Pre-allocate RGB buffer for bulk reads
|
||||||
|
val rgbBulkBuffer = ByteArray(chunkSize * 3)
|
||||||
|
val rgChunk = ByteArray(chunkSize)
|
||||||
|
val baChunk = ByteArray(chunkSize)
|
||||||
|
|
||||||
|
while (pixelsProcessed < totalNativePixels) {
|
||||||
|
val pixelsInChunk = kotlin.math.min(chunkSize, totalNativePixels - pixelsProcessed)
|
||||||
|
val rgbStartAddr = rgbAddr + (pixelsProcessed.toLong() * 3) * rgbAddrIncVec
|
||||||
|
|
||||||
|
// Bulk read RGB data for this chunk
|
||||||
|
bulkPeekRGB(rgbStartAddr, pixelsInChunk, rgbAddrIncVec, rgbBulkBuffer)
|
||||||
|
|
||||||
|
// Process pixels using bulk-read data
|
||||||
|
for (i in 0 until pixelsInChunk) {
|
||||||
|
val pixelIndex = pixelsProcessed + i
|
||||||
|
val videoY = pixelIndex / width
|
||||||
|
val videoX = pixelIndex % width
|
||||||
|
|
||||||
|
// Read RGB values from bulk buffer
|
||||||
|
val r = rgbBulkBuffer[i*3].toUint()
|
||||||
|
val g = rgbBulkBuffer[i*3 + 1].toUint()
|
||||||
|
val b = rgbBulkBuffer[i*3 + 2].toUint()
|
||||||
|
|
||||||
|
// Apply Bayer dithering and convert to 4-bit
|
||||||
|
val r4 = ditherValue(r, videoX, videoY, frameCount)
|
||||||
|
val g4 = ditherValue(g, videoX, videoY, frameCount)
|
||||||
|
val b4 = ditherValue(b, videoX, videoY, frameCount)
|
||||||
|
|
||||||
|
// Pack RGB values and store in chunk arrays for batch processing
|
||||||
|
rgChunk[i] = ((r4 shl 4) or g4).toByte()
|
||||||
|
baChunk[i] = ((b4 shl 4) or 15).toByte()
|
||||||
|
|
||||||
|
// Write directly to framebuffer position
|
||||||
|
val nativePos = videoY * nativeWidth + videoX
|
||||||
|
UnsafeHelper.memcpyRaw(
|
||||||
|
rgChunk, UnsafeHelper.getArrayOffset(rgChunk) + i,
|
||||||
|
null, gpu.framebuffer.ptr + nativePos, 1L
|
||||||
|
)
|
||||||
|
UnsafeHelper.memcpyRaw(
|
||||||
|
baChunk, UnsafeHelper.getArrayOffset(baChunk) + i,
|
||||||
|
null, gpu.framebuffer2!!.ptr + nativePos, 1L
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
pixelsProcessed += pixelsInChunk
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else if (resizeToFull && (width / 2 != nativeWidth / 2 || height / 2 != nativeHeight / 2)) {
|
||||||
// Calculate scaling factors for resize-to-full (source to native mapping)
|
// Calculate scaling factors for resize-to-full (source to native mapping)
|
||||||
val scaleX = width.toFloat() / nativeWidth.toFloat()
|
val scaleX = width.toFloat() / nativeWidth.toFloat()
|
||||||
val scaleY = height.toFloat() / nativeHeight.toFloat()
|
val scaleY = height.toFloat() / nativeHeight.toFloat()
|
||||||
@@ -3865,7 +3916,7 @@ class GraphicsJSR223Delegate(private val vm: VM) {
|
|||||||
// ================= TAV (TSVM Advanced Video) Decoder =================
|
// ================= TAV (TSVM Advanced Video) Decoder =================
|
||||||
// DWT-based video codec with ICtCp colour space support
|
// DWT-based video codec with ICtCp colour space support
|
||||||
|
|
||||||
// Postprocess coefficients from significance map format
|
// Postprocess coefficients from significance map format (legacy - single channel)
|
||||||
private fun postprocessCoefficients(compressedData: ByteArray, compressedOffset: Int, coeffCount: Int, outputCoeffs: ShortArray) {
|
private fun postprocessCoefficients(compressedData: ByteArray, compressedOffset: Int, coeffCount: Int, outputCoeffs: ShortArray) {
|
||||||
val mapBytes = (coeffCount + 7) / 8
|
val mapBytes = (coeffCount + 7) / 8
|
||||||
|
|
||||||
@@ -3891,6 +3942,75 @@ class GraphicsJSR223Delegate(private val vm: VM) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Postprocess coefficients from concatenated significance maps format (current - optimal)
|
||||||
|
private fun postprocessCoefficientsConcatenated(compressedData: ByteArray, compressedOffset: Int, coeffCount: Int,
|
||||||
|
outputY: ShortArray, outputCo: ShortArray, outputCg: ShortArray) {
|
||||||
|
val mapBytes = (coeffCount + 7) / 8
|
||||||
|
|
||||||
|
// Clear output arrays
|
||||||
|
outputY.fill(0)
|
||||||
|
outputCo.fill(0)
|
||||||
|
outputCg.fill(0)
|
||||||
|
|
||||||
|
// Extract significance maps: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
|
val yMapOffset = compressedOffset
|
||||||
|
val coMapOffset = compressedOffset + mapBytes
|
||||||
|
val cgMapOffset = compressedOffset + mapBytes * 2
|
||||||
|
|
||||||
|
// Count non-zeros in each channel to determine value array boundaries
|
||||||
|
var yNonZeros = 0
|
||||||
|
var coNonZeros = 0
|
||||||
|
var cgNonZeros = 0
|
||||||
|
|
||||||
|
for (i in 0 until coeffCount) {
|
||||||
|
val byteIdx = i / 8
|
||||||
|
val bitIdx = i % 8
|
||||||
|
|
||||||
|
if ((compressedData[yMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) yNonZeros++
|
||||||
|
if ((compressedData[coMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) coNonZeros++
|
||||||
|
if ((compressedData[cgMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) cgNonZeros++
|
||||||
|
}
|
||||||
|
|
||||||
|
// Calculate value array offsets
|
||||||
|
val yValuesOffset = compressedOffset + mapBytes * 3
|
||||||
|
val coValuesOffset = yValuesOffset + yNonZeros * 2
|
||||||
|
val cgValuesOffset = coValuesOffset + coNonZeros * 2
|
||||||
|
|
||||||
|
// Extract coefficients using significance maps
|
||||||
|
var yValueIdx = 0
|
||||||
|
var coValueIdx = 0
|
||||||
|
var cgValueIdx = 0
|
||||||
|
|
||||||
|
for (i in 0 until coeffCount) {
|
||||||
|
val byteIdx = i / 8
|
||||||
|
val bitIdx = i % 8
|
||||||
|
|
||||||
|
// Y channel
|
||||||
|
if ((compressedData[yMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) {
|
||||||
|
val valueOffset = yValuesOffset + yValueIdx * 2
|
||||||
|
outputY[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or
|
||||||
|
(compressedData[valueOffset].toInt() and 0xFF)).toShort()
|
||||||
|
yValueIdx++
|
||||||
|
}
|
||||||
|
|
||||||
|
// Co channel
|
||||||
|
if ((compressedData[coMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) {
|
||||||
|
val valueOffset = coValuesOffset + coValueIdx * 2
|
||||||
|
outputCo[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or
|
||||||
|
(compressedData[valueOffset].toInt() and 0xFF)).toShort()
|
||||||
|
coValueIdx++
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cg channel
|
||||||
|
if ((compressedData[cgMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) {
|
||||||
|
val valueOffset = cgValuesOffset + cgValueIdx * 2
|
||||||
|
outputCg[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or
|
||||||
|
(compressedData[valueOffset].toInt() and 0xFF)).toShort()
|
||||||
|
cgValueIdx++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// TAV Simulated overlapping tiles constants (must match encoder)
|
// TAV Simulated overlapping tiles constants (must match encoder)
|
||||||
private val TILE_SIZE_X = 280
|
private val TILE_SIZE_X = 280
|
||||||
private val TILE_SIZE_Y = 224
|
private val TILE_SIZE_Y = 224
|
||||||
@@ -4296,22 +4416,31 @@ class GraphicsJSR223Delegate(private val vm: VM) {
|
|||||||
return count
|
return count
|
||||||
}
|
}
|
||||||
|
|
||||||
// Calculate channel data sizes
|
// Helper function for concatenated maps format
|
||||||
val yNonZeros = countNonZerosInMap(0)
|
fun countNonZerosInMapConcatenated(mapOffset: Int, mapSize: Int): Int {
|
||||||
val yDataSize = mapBytes + yNonZeros * 2
|
var count = 0
|
||||||
|
for (i in 0 until mapSize) {
|
||||||
|
val byte = coeffBuffer[mapOffset + i].toInt() and 0xFF
|
||||||
|
for (bit in 0 until 8) {
|
||||||
|
if (i * 8 + bit < coeffCount && (byte and (1 shl bit)) != 0) {
|
||||||
|
count++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return count
|
||||||
|
}
|
||||||
|
|
||||||
val coOffset = yDataSize
|
// Use concatenated maps format: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
val coNonZeros = countNonZerosInMap(coOffset)
|
postprocessCoefficientsConcatenated(coeffBuffer, 0, coeffCount, quantisedY, quantisedCo, quantisedCg)
|
||||||
val coDataSize = mapBytes + coNonZeros * 2
|
|
||||||
|
|
||||||
val cgOffset = coOffset + coDataSize
|
// Calculate total size for concatenated format
|
||||||
|
val totalMapSize = mapBytes * 3
|
||||||
|
val yNonZeros = countNonZerosInMapConcatenated(0, mapBytes)
|
||||||
|
val coNonZeros = countNonZerosInMapConcatenated(mapBytes, mapBytes)
|
||||||
|
val cgNonZeros = countNonZerosInMapConcatenated(mapBytes * 2, mapBytes)
|
||||||
|
val totalValueSize = (yNonZeros + coNonZeros + cgNonZeros) * 2
|
||||||
|
|
||||||
// Postprocess each channel using significance map
|
ptr += (totalMapSize + totalValueSize)
|
||||||
postprocessCoefficients(coeffBuffer, 0, coeffCount, quantisedY)
|
|
||||||
postprocessCoefficients(coeffBuffer, coOffset, coeffCount, quantisedCo)
|
|
||||||
postprocessCoefficients(coeffBuffer, cgOffset, coeffCount, quantisedCg)
|
|
||||||
|
|
||||||
ptr += (yDataSize + coDataSize + mapBytes + countNonZerosInMap(cgOffset) * 2)
|
|
||||||
|
|
||||||
// Dequantise coefficient data
|
// Dequantise coefficient data
|
||||||
val yTile = FloatArray(coeffCount)
|
val yTile = FloatArray(coeffCount)
|
||||||
@@ -4917,22 +5046,31 @@ class GraphicsJSR223Delegate(private val vm: VM) {
|
|||||||
return count
|
return count
|
||||||
}
|
}
|
||||||
|
|
||||||
// Calculate channel data sizes for deltas
|
// Helper function for concatenated maps format
|
||||||
val yNonZeros = countNonZerosInMap(0)
|
fun countNonZerosInMapConcatenated(mapOffset: Int, mapSize: Int): Int {
|
||||||
val yDataSize = mapBytes + yNonZeros * 2
|
var count = 0
|
||||||
|
for (i in 0 until mapSize) {
|
||||||
|
val byte = coeffBuffer[mapOffset + i].toInt() and 0xFF
|
||||||
|
for (bit in 0 until 8) {
|
||||||
|
if (i * 8 + bit < coeffCount && (byte and (1 shl bit)) != 0) {
|
||||||
|
count++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return count
|
||||||
|
}
|
||||||
|
|
||||||
val coOffset = yDataSize
|
// Use concatenated maps format for deltas: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
val coNonZeros = countNonZerosInMap(coOffset)
|
postprocessCoefficientsConcatenated(coeffBuffer, 0, coeffCount, deltaY, deltaCo, deltaCg)
|
||||||
val coDataSize = mapBytes + coNonZeros * 2
|
|
||||||
|
|
||||||
val cgOffset = coOffset + coDataSize
|
// Calculate total size for concatenated format
|
||||||
|
val totalMapSize = mapBytes * 3
|
||||||
|
val yNonZeros = countNonZerosInMapConcatenated(0, mapBytes)
|
||||||
|
val coNonZeros = countNonZerosInMapConcatenated(mapBytes, mapBytes)
|
||||||
|
val cgNonZeros = countNonZerosInMapConcatenated(mapBytes * 2, mapBytes)
|
||||||
|
val totalValueSize = (yNonZeros + coNonZeros + cgNonZeros) * 2
|
||||||
|
|
||||||
// Postprocess delta coefficients using significance map
|
ptr += (totalMapSize + totalValueSize)
|
||||||
postprocessCoefficients(coeffBuffer, 0, coeffCount, deltaY)
|
|
||||||
postprocessCoefficients(coeffBuffer, coOffset, coeffCount, deltaCo)
|
|
||||||
postprocessCoefficients(coeffBuffer, cgOffset, coeffCount, deltaCg)
|
|
||||||
|
|
||||||
ptr += (yDataSize + coDataSize + mapBytes + countNonZerosInMap(cgOffset) * 2)
|
|
||||||
|
|
||||||
// Get or initialise previous coefficients for this tile
|
// Get or initialise previous coefficients for this tile
|
||||||
val prevY = tavPreviousCoeffsY!![tileIdx] ?: FloatArray(coeffCount)
|
val prevY = tavPreviousCoeffsY!![tileIdx] ?: FloatArray(coeffCount)
|
||||||
|
|||||||
@@ -47,6 +47,57 @@ static void postprocess_coefficients(uint8_t *compressed_data, int coeff_count,
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Decoder: reconstruct coefficients from concatenated significance maps
|
||||||
|
// Layout: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
|
static void postprocess_coefficients_concatenated(uint8_t *compressed_data, int coeff_count,
|
||||||
|
int16_t *output_y, int16_t *output_co, int16_t *output_cg) {
|
||||||
|
int map_bytes = (coeff_count + 7) / 8;
|
||||||
|
|
||||||
|
// Pointers to each section
|
||||||
|
uint8_t *y_map = compressed_data;
|
||||||
|
uint8_t *co_map = compressed_data + map_bytes;
|
||||||
|
uint8_t *cg_map = compressed_data + map_bytes * 2;
|
||||||
|
|
||||||
|
// Count non-zeros for each channel to find value arrays
|
||||||
|
int y_nonzeros = 0, co_nonzeros = 0, cg_nonzeros = 0;
|
||||||
|
|
||||||
|
for (int i = 0; i < coeff_count; i++) {
|
||||||
|
int byte_idx = i / 8;
|
||||||
|
int bit_idx = i % 8;
|
||||||
|
|
||||||
|
if (y_map[byte_idx] & (1 << bit_idx)) y_nonzeros++;
|
||||||
|
if (co_map[byte_idx] & (1 << bit_idx)) co_nonzeros++;
|
||||||
|
if (cg_map[byte_idx] & (1 << bit_idx)) cg_nonzeros++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pointers to value arrays
|
||||||
|
int16_t *y_values = (int16_t *)(compressed_data + map_bytes * 3);
|
||||||
|
int16_t *co_values = y_values + y_nonzeros;
|
||||||
|
int16_t *cg_values = co_values + co_nonzeros;
|
||||||
|
|
||||||
|
// Clear outputs
|
||||||
|
memset(output_y, 0, coeff_count * sizeof(int16_t));
|
||||||
|
memset(output_co, 0, coeff_count * sizeof(int16_t));
|
||||||
|
memset(output_cg, 0, coeff_count * sizeof(int16_t));
|
||||||
|
|
||||||
|
// Reconstruct coefficients for each channel
|
||||||
|
int y_idx = 0, co_idx = 0, cg_idx = 0;
|
||||||
|
for (int i = 0; i < coeff_count; i++) {
|
||||||
|
int byte_idx = i / 8;
|
||||||
|
int bit_idx = i % 8;
|
||||||
|
|
||||||
|
if (y_map[byte_idx] & (1 << bit_idx)) {
|
||||||
|
output_y[i] = y_values[y_idx++];
|
||||||
|
}
|
||||||
|
if (co_map[byte_idx] & (1 << bit_idx)) {
|
||||||
|
output_co[i] = co_values[co_idx++];
|
||||||
|
}
|
||||||
|
if (cg_map[byte_idx] & (1 << bit_idx)) {
|
||||||
|
output_cg[i] = cg_values[cg_idx++];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// TAV header structure (32 bytes)
|
// TAV header structure (32 bytes)
|
||||||
typedef struct {
|
typedef struct {
|
||||||
uint8_t magic[8];
|
uint8_t magic[8];
|
||||||
@@ -588,37 +639,25 @@ static int decode_frame(tav_decoder_t *decoder) {
|
|||||||
int16_t *quantized_co = malloc(coeff_count * sizeof(int16_t));
|
int16_t *quantized_co = malloc(coeff_count * sizeof(int16_t));
|
||||||
int16_t *quantized_cg = malloc(coeff_count * sizeof(int16_t));
|
int16_t *quantized_cg = malloc(coeff_count * sizeof(int16_t));
|
||||||
|
|
||||||
// Postprocess coefficients from significance map format
|
// Use concatenated maps format: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
// First find where each channel's data starts by reading the preprocessing output
|
postprocess_coefficients_concatenated(coeff_ptr, coeff_count, quantized_y, quantized_co, quantized_cg);
|
||||||
size_t y_map_bytes = (coeff_count + 7) / 8;
|
|
||||||
|
|
||||||
// Count non-zeros in Y significance map to find Y data size
|
// Calculate total processed data size for concatenated format
|
||||||
int y_nonzeros = 0;
|
int map_bytes = (coeff_count + 7) / 8;
|
||||||
for (int i = 0; i < y_map_bytes; i++) {
|
int y_nonzeros = 0, co_nonzeros = 0, cg_nonzeros = 0;
|
||||||
uint8_t byte = coeff_ptr[i];
|
|
||||||
for (int bit = 0; bit < 8 && i*8+bit < coeff_count; bit++) {
|
// Count non-zeros in each channel's significance map
|
||||||
if (byte & (1 << bit)) y_nonzeros++;
|
for (int i = 0; i < coeff_count; i++) {
|
||||||
}
|
int byte_idx = i / 8;
|
||||||
|
int bit_idx = i % 8;
|
||||||
|
|
||||||
|
if (coeff_ptr[byte_idx] & (1 << bit_idx)) y_nonzeros++; // Y map
|
||||||
|
if (coeff_ptr[map_bytes + byte_idx] & (1 << bit_idx)) co_nonzeros++; // Co map
|
||||||
|
if (coeff_ptr[map_bytes * 2 + byte_idx] & (1 << bit_idx)) cg_nonzeros++; // Cg map
|
||||||
}
|
}
|
||||||
size_t y_data_size = y_map_bytes + y_nonzeros * sizeof(int16_t);
|
|
||||||
|
|
||||||
// Count non-zeros in Co significance map
|
// Total size consumed: 3 maps + all non-zero values
|
||||||
uint8_t *co_ptr = coeff_ptr + y_data_size;
|
size_t total_processed_size = map_bytes * 3 + (y_nonzeros + co_nonzeros + cg_nonzeros) * sizeof(int16_t);
|
||||||
int co_nonzeros = 0;
|
|
||||||
for (int i = 0; i < y_map_bytes; i++) {
|
|
||||||
uint8_t byte = co_ptr[i];
|
|
||||||
for (int bit = 0; bit < 8 && i*8+bit < coeff_count; bit++) {
|
|
||||||
if (byte & (1 << bit)) co_nonzeros++;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
size_t co_data_size = y_map_bytes + co_nonzeros * sizeof(int16_t);
|
|
||||||
|
|
||||||
uint8_t *cg_ptr = co_ptr + co_data_size;
|
|
||||||
|
|
||||||
// Decompress each channel
|
|
||||||
postprocess_coefficients(coeff_ptr, coeff_count, quantized_y);
|
|
||||||
postprocess_coefficients(co_ptr, coeff_count, quantized_co);
|
|
||||||
postprocess_coefficients(cg_ptr, coeff_count, quantized_cg);
|
|
||||||
|
|
||||||
// Apply dequantization (perceptual for version 5, uniform for earlier versions)
|
// Apply dequantization (perceptual for version 5, uniform for earlier versions)
|
||||||
const int is_perceptual = (decoder->header.version == 5);
|
const int is_perceptual = (decoder->header.version == 5);
|
||||||
|
|||||||
@@ -990,6 +990,57 @@ static size_t preprocess_coefficients(int16_t *coeffs, int coeff_count, uint8_t
|
|||||||
return map_bytes + (nonzero_count * sizeof(int16_t));
|
return map_bytes + (nonzero_count * sizeof(int16_t));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Preprocess coefficients using concatenated significance maps for optimal cross-channel compression
|
||||||
|
static size_t preprocess_coefficients_concatenated(int16_t *coeffs_y, int16_t *coeffs_co, int16_t *coeffs_cg,
|
||||||
|
int coeff_count, uint8_t *output_buffer) {
|
||||||
|
int map_bytes = (coeff_count + 7) / 8;
|
||||||
|
|
||||||
|
// Count non-zeros per channel
|
||||||
|
int nonzero_y = 0, nonzero_co = 0, nonzero_cg = 0;
|
||||||
|
for (int i = 0; i < coeff_count; i++) {
|
||||||
|
if (coeffs_y[i] != 0) nonzero_y++;
|
||||||
|
if (coeffs_co[i] != 0) nonzero_co++;
|
||||||
|
if (coeffs_cg[i] != 0) nonzero_cg++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Layout: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals]
|
||||||
|
uint8_t *y_map = output_buffer;
|
||||||
|
uint8_t *co_map = output_buffer + map_bytes;
|
||||||
|
uint8_t *cg_map = output_buffer + map_bytes * 2;
|
||||||
|
int16_t *y_values = (int16_t *)(output_buffer + map_bytes * 3);
|
||||||
|
int16_t *co_values = y_values + nonzero_y;
|
||||||
|
int16_t *cg_values = co_values + nonzero_co;
|
||||||
|
|
||||||
|
// Clear significance maps
|
||||||
|
memset(y_map, 0, map_bytes);
|
||||||
|
memset(co_map, 0, map_bytes);
|
||||||
|
memset(cg_map, 0, map_bytes);
|
||||||
|
|
||||||
|
// Fill significance maps and extract values
|
||||||
|
int y_idx = 0, co_idx = 0, cg_idx = 0;
|
||||||
|
for (int i = 0; i < coeff_count; i++) {
|
||||||
|
int byte_idx = i / 8;
|
||||||
|
int bit_idx = i % 8;
|
||||||
|
|
||||||
|
if (coeffs_y[i] != 0) {
|
||||||
|
y_map[byte_idx] |= (1 << bit_idx);
|
||||||
|
y_values[y_idx++] = coeffs_y[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
if (coeffs_co[i] != 0) {
|
||||||
|
co_map[byte_idx] |= (1 << bit_idx);
|
||||||
|
co_values[co_idx++] = coeffs_co[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
if (coeffs_cg[i] != 0) {
|
||||||
|
cg_map[byte_idx] |= (1 << bit_idx);
|
||||||
|
cg_values[cg_idx++] = coeffs_cg[i];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return map_bytes * 3 + (nonzero_y + nonzero_co + nonzero_cg) * sizeof(int16_t);
|
||||||
|
}
|
||||||
|
|
||||||
// Quantisation for DWT subbands with rate control
|
// Quantisation for DWT subbands with rate control
|
||||||
static void quantise_dwt_coefficients(float *coeffs, int16_t *quantised, int size, int quantiser) {
|
static void quantise_dwt_coefficients(float *coeffs, int16_t *quantised, int size, int quantiser) {
|
||||||
float effective_q = quantiser;
|
float effective_q = quantiser;
|
||||||
@@ -1311,15 +1362,10 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y,
|
|||||||
printf("\n");
|
printf("\n");
|
||||||
}*/
|
}*/
|
||||||
|
|
||||||
// Preprocess and write quantised coefficients using significance mapping for better compression
|
// Preprocess and write quantised coefficients using concatenated significance maps for optimal compression
|
||||||
size_t y_compressed_size = preprocess_coefficients(quantised_y, tile_size, buffer + offset);
|
size_t total_compressed_size = preprocess_coefficients_concatenated(quantised_y, quantised_co, quantised_cg,
|
||||||
offset += y_compressed_size;
|
tile_size, buffer + offset);
|
||||||
|
offset += total_compressed_size;
|
||||||
size_t co_compressed_size = preprocess_coefficients(quantised_co, tile_size, buffer + offset);
|
|
||||||
offset += co_compressed_size;
|
|
||||||
|
|
||||||
size_t cg_compressed_size = preprocess_coefficients(quantised_cg, tile_size, buffer + offset);
|
|
||||||
offset += cg_compressed_size;
|
|
||||||
|
|
||||||
// DEBUG: Dump raw DWT coefficients for frame ~60 when it's an intra-frame
|
// DEBUG: Dump raw DWT coefficients for frame ~60 when it's an intra-frame
|
||||||
if (!debugDumpMade && enc->frame_count >= makeDebugDump - 1 && enc->frame_count <= makeDebugDump + 2 &&
|
if (!debugDumpMade && enc->frame_count >= makeDebugDump - 1 && enc->frame_count <= makeDebugDump + 2 &&
|
||||||
|
|||||||