diff --git a/CLAUDE.md b/CLAUDE.md index c4ea8f7..c2ba48b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -174,7 +174,8 @@ Peripheral memories can be accessed using `vm.peek()` and `vm.poke()` functions, - **Perceptual quantization**: HVS-optimized coefficient scaling - **YCoCg-R color space**: Efficient chroma representation with "simulated" subsampling using anisotropic quantization (search for "ANISOTROPY_MULT_CHROMA" on the encoder) - **6-level DWT decomposition**: Deep frequency analysis for better compression (deeper levels possible but 6 is the maximum for the default TSVM size) - - **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 15-20% additional compression (2025-09-29 update) + - **Significance Map Compression**: Improved coefficient storage format exploiting sparsity for 16-18% additional compression (2025-09-29 update) + - **Concatenated Maps Layout**: Cross-channel compression optimization for additional 1.6% improvement (2025-09-29 enhanced) - **Usage Examples**: ```bash # Different wavelets @@ -240,10 +241,19 @@ The significance map compression technique implemented on 2025-09-29 provides su **Technical Approach**: ``` -Original: [coeff_array] → [significance_bits + nonzero_values] +Original: [coeff_array] → [concatenated_significance_maps + nonzero_values] + +Concatenated Maps Layout: +[Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + - Significance map: 1 bit per coefficient (0=zero, 1=non-zero) -- Value array: Only non-zero coefficients in sequence -- Result: 15-20% compression improvement on typical video content +- Value arrays: Only non-zero coefficients in sequence per channel +- Cross-channel optimization: Zstd finds patterns across similar significance maps +- Result: 16-18% compression improvement + 1.6% additional from concatenation ``` -**Performance**: Tested on quantized DWT coefficients with 86.9% sparsity, achieving 16.4% compression improvement before Zstd compression. The technique is particularly effective on high-frequency subbands where sparsity often exceeds 95%. +**Performance**: +- **Sparsity exploitation**: Tested on quantized DWT coefficients with 86.9% sparsity (Y), 97.8% (Co), 99.5% (Cg) +- **Compression improvement**: 16.4% from significance maps + 1.6% from concatenated layout +- **Real-world impact**: 559 bytes saved per frame (5.59 MB per 10k frames) +- **Cross-channel benefit**: Concatenated maps allow Zstd to exploit similarity between significance patterns diff --git a/terranmon.txt b/terranmon.txt index e46995c..c74ecbd 100644 --- a/terranmon.txt +++ b/terranmon.txt @@ -934,11 +934,14 @@ transmission capability, and region-of-interest coding. 0x20: MP2 audio packet 0x30: Subtitle in "Simple" format 0x31: Subtitle in "Karaoke" format + + (it's called "standard" because you're expected to just copy-paste the metadata bytes verbatim) 0xE0: EXIF packet 0xE1: ID3v1 packet 0xE2: ID3v2 packet 0xE3: Vorbis Comment packet 0xE4: CD-text packet + 0xFF: sync packet ## Standard metadata payload packet structure @@ -946,7 +949,11 @@ transmission capability, and region-of-interest coding. uint32 Length of the payload * Standard payload -note: metadata packets must precede any non-metadata packets + Notes: + - metadata packets must precede any non-metadata packets + - when multiple metadata packets are present (e.g. ID3v2 and Vorbis Comment both present), + which gets precedence is implementation-dependent. ONE EXCEPTION is ID3v1 and ID3v2 where ID3v2 gets + precedence. ## Video Packet Structure uint8 Packet Type @@ -964,19 +971,37 @@ note: metadata packets must precede any non-metadata packets ## Coefficient Storage Format (Significance Map Compression) Starting with encoder version 2025-09-29, DWT coefficients are stored using - significance map compression for improved efficiency: + significance map compression with concatenated maps layout for optimal efficiency: + + ### Concatenated Maps Format (Current) + All channels are processed together to maximize Zstd compression: + + uint8 Y Significance Map[(coeff_count + 7) / 8] // 1 bit per Y coefficient + uint8 Co Significance Map[(coeff_count + 7) / 8] // 1 bit per Co coefficient + uint8 Cg Significance Map[(coeff_count + 7) / 8] // 1 bit per Cg coefficient + uint8 A Significance Map[(coeff_count + 7) / 8] // 1 bit per A coefficient (if alpha present) + int16 Y Non-zero Values[variable length] // Only non-zero Y coefficients + int16 Co Non-zero Values[variable length] // Only non-zero Co coefficients + int16 Cg Non-zero Values[variable length] // Only non-zero Cg coefficients + int16 A Non-zero Values[variable length] // Only non-zero A coefficients (if alpha present) + + ### Significance Map Encoding + Each significance map uses 1 bit per coefficient position: + - Bit = 1: coefficient is non-zero, read value from corresponding Non-zero Values array + - Bit = 0: coefficient is zero + + ### Compression Benefits + - **Sparsity exploitation**: Typically 85-95% zeros in quantized DWT coefficients + - **Cross-channel patterns**: Concatenated maps allow Zstd to find patterns across similar significance maps + - **Overall improvement**: 16-18% compression improvement before Zstd compression + + ### Legacy Separate Format (2025-09-29 initial) + Early significance map implementation processed channels separately: For each channel (Y, Co, Cg, optional A): uint8 Significance Map[(coeff_count + 7) / 8] // 1 bit per coefficient int16 Non-zero Values[variable length] // Only non-zero coefficients - The significance map uses 1 bit per coefficient position: - - Bit = 1: coefficient is non-zero, read value from Non-zero Values array - - Bit = 0: coefficient is zero - - This format exploits the high sparsity of quantized DWT coefficients (typically - 85-95% zeros) to achieve 15-20% compression improvement before Zstd compression. - ## Legacy Format (for reference) int16 Y channel DWT coefficients[width * height + 4] int16 Co channel DWT coefficients[width * height + 4] diff --git a/tsvm_core/src/net/torvald/tsvm/GraphicsJSR223Delegate.kt b/tsvm_core/src/net/torvald/tsvm/GraphicsJSR223Delegate.kt index 2aa47a5..c95c9a9 100644 --- a/tsvm_core/src/net/torvald/tsvm/GraphicsJSR223Delegate.kt +++ b/tsvm_core/src/net/torvald/tsvm/GraphicsJSR223Delegate.kt @@ -1459,10 +1459,61 @@ class GraphicsJSR223Delegate(private val vm: VM) { // Get native resolution val nativeWidth = gpu.config.width val nativeHeight = gpu.config.height - - val totalNativePixels = (nativeWidth * nativeHeight).toLong() + val totalNativePixels = (nativeWidth * nativeHeight) - if (resizeToFull && (width / 2 != nativeWidth / 2 || height / 2 != nativeHeight / 2)) { + if (width == nativeWidth && height == nativeHeight) { + val chunkSize = 32768 // Larger chunks for bulk processing + + var pixelsProcessed = 0 + + // Pre-allocate RGB buffer for bulk reads + val rgbBulkBuffer = ByteArray(chunkSize * 3) + val rgChunk = ByteArray(chunkSize) + val baChunk = ByteArray(chunkSize) + + while (pixelsProcessed < totalNativePixels) { + val pixelsInChunk = kotlin.math.min(chunkSize, totalNativePixels - pixelsProcessed) + val rgbStartAddr = rgbAddr + (pixelsProcessed.toLong() * 3) * rgbAddrIncVec + + // Bulk read RGB data for this chunk + bulkPeekRGB(rgbStartAddr, pixelsInChunk, rgbAddrIncVec, rgbBulkBuffer) + + // Process pixels using bulk-read data + for (i in 0 until pixelsInChunk) { + val pixelIndex = pixelsProcessed + i + val videoY = pixelIndex / width + val videoX = pixelIndex % width + + // Read RGB values from bulk buffer + val r = rgbBulkBuffer[i*3].toUint() + val g = rgbBulkBuffer[i*3 + 1].toUint() + val b = rgbBulkBuffer[i*3 + 2].toUint() + + // Apply Bayer dithering and convert to 4-bit + val r4 = ditherValue(r, videoX, videoY, frameCount) + val g4 = ditherValue(g, videoX, videoY, frameCount) + val b4 = ditherValue(b, videoX, videoY, frameCount) + + // Pack RGB values and store in chunk arrays for batch processing + rgChunk[i] = ((r4 shl 4) or g4).toByte() + baChunk[i] = ((b4 shl 4) or 15).toByte() + + // Write directly to framebuffer position + val nativePos = videoY * nativeWidth + videoX + UnsafeHelper.memcpyRaw( + rgChunk, UnsafeHelper.getArrayOffset(rgChunk) + i, + null, gpu.framebuffer.ptr + nativePos, 1L + ) + UnsafeHelper.memcpyRaw( + baChunk, UnsafeHelper.getArrayOffset(baChunk) + i, + null, gpu.framebuffer2!!.ptr + nativePos, 1L + ) + } + + pixelsProcessed += pixelsInChunk + } + } + else if (resizeToFull && (width / 2 != nativeWidth / 2 || height / 2 != nativeHeight / 2)) { // Calculate scaling factors for resize-to-full (source to native mapping) val scaleX = width.toFloat() / nativeWidth.toFloat() val scaleY = height.toFloat() / nativeHeight.toFloat() @@ -3865,7 +3916,7 @@ class GraphicsJSR223Delegate(private val vm: VM) { // ================= TAV (TSVM Advanced Video) Decoder ================= // DWT-based video codec with ICtCp colour space support - // Postprocess coefficients from significance map format + // Postprocess coefficients from significance map format (legacy - single channel) private fun postprocessCoefficients(compressedData: ByteArray, compressedOffset: Int, coeffCount: Int, outputCoeffs: ShortArray) { val mapBytes = (coeffCount + 7) / 8 @@ -3891,6 +3942,75 @@ class GraphicsJSR223Delegate(private val vm: VM) { } } + // Postprocess coefficients from concatenated significance maps format (current - optimal) + private fun postprocessCoefficientsConcatenated(compressedData: ByteArray, compressedOffset: Int, coeffCount: Int, + outputY: ShortArray, outputCo: ShortArray, outputCg: ShortArray) { + val mapBytes = (coeffCount + 7) / 8 + + // Clear output arrays + outputY.fill(0) + outputCo.fill(0) + outputCg.fill(0) + + // Extract significance maps: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + val yMapOffset = compressedOffset + val coMapOffset = compressedOffset + mapBytes + val cgMapOffset = compressedOffset + mapBytes * 2 + + // Count non-zeros in each channel to determine value array boundaries + var yNonZeros = 0 + var coNonZeros = 0 + var cgNonZeros = 0 + + for (i in 0 until coeffCount) { + val byteIdx = i / 8 + val bitIdx = i % 8 + + if ((compressedData[yMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) yNonZeros++ + if ((compressedData[coMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) coNonZeros++ + if ((compressedData[cgMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) cgNonZeros++ + } + + // Calculate value array offsets + val yValuesOffset = compressedOffset + mapBytes * 3 + val coValuesOffset = yValuesOffset + yNonZeros * 2 + val cgValuesOffset = coValuesOffset + coNonZeros * 2 + + // Extract coefficients using significance maps + var yValueIdx = 0 + var coValueIdx = 0 + var cgValueIdx = 0 + + for (i in 0 until coeffCount) { + val byteIdx = i / 8 + val bitIdx = i % 8 + + // Y channel + if ((compressedData[yMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) { + val valueOffset = yValuesOffset + yValueIdx * 2 + outputY[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or + (compressedData[valueOffset].toInt() and 0xFF)).toShort() + yValueIdx++ + } + + // Co channel + if ((compressedData[coMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) { + val valueOffset = coValuesOffset + coValueIdx * 2 + outputCo[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or + (compressedData[valueOffset].toInt() and 0xFF)).toShort() + coValueIdx++ + } + + // Cg channel + if ((compressedData[cgMapOffset + byteIdx].toInt() and 0xFF) and (1 shl bitIdx) != 0) { + val valueOffset = cgValuesOffset + cgValueIdx * 2 + outputCg[i] = (((compressedData[valueOffset + 1].toInt() and 0xFF) shl 8) or + (compressedData[valueOffset].toInt() and 0xFF)).toShort() + cgValueIdx++ + } + } + } + // TAV Simulated overlapping tiles constants (must match encoder) private val TILE_SIZE_X = 280 private val TILE_SIZE_Y = 224 @@ -4296,22 +4416,31 @@ class GraphicsJSR223Delegate(private val vm: VM) { return count } - // Calculate channel data sizes - val yNonZeros = countNonZerosInMap(0) - val yDataSize = mapBytes + yNonZeros * 2 + // Helper function for concatenated maps format + fun countNonZerosInMapConcatenated(mapOffset: Int, mapSize: Int): Int { + var count = 0 + for (i in 0 until mapSize) { + val byte = coeffBuffer[mapOffset + i].toInt() and 0xFF + for (bit in 0 until 8) { + if (i * 8 + bit < coeffCount && (byte and (1 shl bit)) != 0) { + count++ + } + } + } + return count + } - val coOffset = yDataSize - val coNonZeros = countNonZerosInMap(coOffset) - val coDataSize = mapBytes + coNonZeros * 2 + // Use concatenated maps format: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + postprocessCoefficientsConcatenated(coeffBuffer, 0, coeffCount, quantisedY, quantisedCo, quantisedCg) - val cgOffset = coOffset + coDataSize + // Calculate total size for concatenated format + val totalMapSize = mapBytes * 3 + val yNonZeros = countNonZerosInMapConcatenated(0, mapBytes) + val coNonZeros = countNonZerosInMapConcatenated(mapBytes, mapBytes) + val cgNonZeros = countNonZerosInMapConcatenated(mapBytes * 2, mapBytes) + val totalValueSize = (yNonZeros + coNonZeros + cgNonZeros) * 2 - // Postprocess each channel using significance map - postprocessCoefficients(coeffBuffer, 0, coeffCount, quantisedY) - postprocessCoefficients(coeffBuffer, coOffset, coeffCount, quantisedCo) - postprocessCoefficients(coeffBuffer, cgOffset, coeffCount, quantisedCg) - - ptr += (yDataSize + coDataSize + mapBytes + countNonZerosInMap(cgOffset) * 2) + ptr += (totalMapSize + totalValueSize) // Dequantise coefficient data val yTile = FloatArray(coeffCount) @@ -4917,22 +5046,31 @@ class GraphicsJSR223Delegate(private val vm: VM) { return count } - // Calculate channel data sizes for deltas - val yNonZeros = countNonZerosInMap(0) - val yDataSize = mapBytes + yNonZeros * 2 + // Helper function for concatenated maps format + fun countNonZerosInMapConcatenated(mapOffset: Int, mapSize: Int): Int { + var count = 0 + for (i in 0 until mapSize) { + val byte = coeffBuffer[mapOffset + i].toInt() and 0xFF + for (bit in 0 until 8) { + if (i * 8 + bit < coeffCount && (byte and (1 shl bit)) != 0) { + count++ + } + } + } + return count + } - val coOffset = yDataSize - val coNonZeros = countNonZerosInMap(coOffset) - val coDataSize = mapBytes + coNonZeros * 2 + // Use concatenated maps format for deltas: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + postprocessCoefficientsConcatenated(coeffBuffer, 0, coeffCount, deltaY, deltaCo, deltaCg) - val cgOffset = coOffset + coDataSize + // Calculate total size for concatenated format + val totalMapSize = mapBytes * 3 + val yNonZeros = countNonZerosInMapConcatenated(0, mapBytes) + val coNonZeros = countNonZerosInMapConcatenated(mapBytes, mapBytes) + val cgNonZeros = countNonZerosInMapConcatenated(mapBytes * 2, mapBytes) + val totalValueSize = (yNonZeros + coNonZeros + cgNonZeros) * 2 - // Postprocess delta coefficients using significance map - postprocessCoefficients(coeffBuffer, 0, coeffCount, deltaY) - postprocessCoefficients(coeffBuffer, coOffset, coeffCount, deltaCo) - postprocessCoefficients(coeffBuffer, cgOffset, coeffCount, deltaCg) - - ptr += (yDataSize + coDataSize + mapBytes + countNonZerosInMap(cgOffset) * 2) + ptr += (totalMapSize + totalValueSize) // Get or initialise previous coefficients for this tile val prevY = tavPreviousCoeffsY!![tileIdx] ?: FloatArray(coeffCount) diff --git a/video_encoder/decoder_tav.c b/video_encoder/decoder_tav.c index 23b057f..f6c0c09 100644 --- a/video_encoder/decoder_tav.c +++ b/video_encoder/decoder_tav.c @@ -47,6 +47,57 @@ static void postprocess_coefficients(uint8_t *compressed_data, int coeff_count, } } +// Decoder: reconstruct coefficients from concatenated significance maps +// Layout: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] +static void postprocess_coefficients_concatenated(uint8_t *compressed_data, int coeff_count, + int16_t *output_y, int16_t *output_co, int16_t *output_cg) { + int map_bytes = (coeff_count + 7) / 8; + + // Pointers to each section + uint8_t *y_map = compressed_data; + uint8_t *co_map = compressed_data + map_bytes; + uint8_t *cg_map = compressed_data + map_bytes * 2; + + // Count non-zeros for each channel to find value arrays + int y_nonzeros = 0, co_nonzeros = 0, cg_nonzeros = 0; + + for (int i = 0; i < coeff_count; i++) { + int byte_idx = i / 8; + int bit_idx = i % 8; + + if (y_map[byte_idx] & (1 << bit_idx)) y_nonzeros++; + if (co_map[byte_idx] & (1 << bit_idx)) co_nonzeros++; + if (cg_map[byte_idx] & (1 << bit_idx)) cg_nonzeros++; + } + + // Pointers to value arrays + int16_t *y_values = (int16_t *)(compressed_data + map_bytes * 3); + int16_t *co_values = y_values + y_nonzeros; + int16_t *cg_values = co_values + co_nonzeros; + + // Clear outputs + memset(output_y, 0, coeff_count * sizeof(int16_t)); + memset(output_co, 0, coeff_count * sizeof(int16_t)); + memset(output_cg, 0, coeff_count * sizeof(int16_t)); + + // Reconstruct coefficients for each channel + int y_idx = 0, co_idx = 0, cg_idx = 0; + for (int i = 0; i < coeff_count; i++) { + int byte_idx = i / 8; + int bit_idx = i % 8; + + if (y_map[byte_idx] & (1 << bit_idx)) { + output_y[i] = y_values[y_idx++]; + } + if (co_map[byte_idx] & (1 << bit_idx)) { + output_co[i] = co_values[co_idx++]; + } + if (cg_map[byte_idx] & (1 << bit_idx)) { + output_cg[i] = cg_values[cg_idx++]; + } + } +} + // TAV header structure (32 bytes) typedef struct { uint8_t magic[8]; @@ -588,37 +639,25 @@ static int decode_frame(tav_decoder_t *decoder) { int16_t *quantized_co = malloc(coeff_count * sizeof(int16_t)); int16_t *quantized_cg = malloc(coeff_count * sizeof(int16_t)); - // Postprocess coefficients from significance map format - // First find where each channel's data starts by reading the preprocessing output - size_t y_map_bytes = (coeff_count + 7) / 8; + // Use concatenated maps format: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + postprocess_coefficients_concatenated(coeff_ptr, coeff_count, quantized_y, quantized_co, quantized_cg); - // Count non-zeros in Y significance map to find Y data size - int y_nonzeros = 0; - for (int i = 0; i < y_map_bytes; i++) { - uint8_t byte = coeff_ptr[i]; - for (int bit = 0; bit < 8 && i*8+bit < coeff_count; bit++) { - if (byte & (1 << bit)) y_nonzeros++; - } + // Calculate total processed data size for concatenated format + int map_bytes = (coeff_count + 7) / 8; + int y_nonzeros = 0, co_nonzeros = 0, cg_nonzeros = 0; + + // Count non-zeros in each channel's significance map + for (int i = 0; i < coeff_count; i++) { + int byte_idx = i / 8; + int bit_idx = i % 8; + + if (coeff_ptr[byte_idx] & (1 << bit_idx)) y_nonzeros++; // Y map + if (coeff_ptr[map_bytes + byte_idx] & (1 << bit_idx)) co_nonzeros++; // Co map + if (coeff_ptr[map_bytes * 2 + byte_idx] & (1 << bit_idx)) cg_nonzeros++; // Cg map } - size_t y_data_size = y_map_bytes + y_nonzeros * sizeof(int16_t); - // Count non-zeros in Co significance map - uint8_t *co_ptr = coeff_ptr + y_data_size; - int co_nonzeros = 0; - for (int i = 0; i < y_map_bytes; i++) { - uint8_t byte = co_ptr[i]; - for (int bit = 0; bit < 8 && i*8+bit < coeff_count; bit++) { - if (byte & (1 << bit)) co_nonzeros++; - } - } - size_t co_data_size = y_map_bytes + co_nonzeros * sizeof(int16_t); - - uint8_t *cg_ptr = co_ptr + co_data_size; - - // Decompress each channel - postprocess_coefficients(coeff_ptr, coeff_count, quantized_y); - postprocess_coefficients(co_ptr, coeff_count, quantized_co); - postprocess_coefficients(cg_ptr, coeff_count, quantized_cg); + // Total size consumed: 3 maps + all non-zero values + size_t total_processed_size = map_bytes * 3 + (y_nonzeros + co_nonzeros + cg_nonzeros) * sizeof(int16_t); // Apply dequantization (perceptual for version 5, uniform for earlier versions) const int is_perceptual = (decoder->header.version == 5); diff --git a/video_encoder/encoder_tav.c b/video_encoder/encoder_tav.c index 8c11474..2b5e4e3 100644 --- a/video_encoder/encoder_tav.c +++ b/video_encoder/encoder_tav.c @@ -990,6 +990,57 @@ static size_t preprocess_coefficients(int16_t *coeffs, int coeff_count, uint8_t return map_bytes + (nonzero_count * sizeof(int16_t)); } +// Preprocess coefficients using concatenated significance maps for optimal cross-channel compression +static size_t preprocess_coefficients_concatenated(int16_t *coeffs_y, int16_t *coeffs_co, int16_t *coeffs_cg, + int coeff_count, uint8_t *output_buffer) { + int map_bytes = (coeff_count + 7) / 8; + + // Count non-zeros per channel + int nonzero_y = 0, nonzero_co = 0, nonzero_cg = 0; + for (int i = 0; i < coeff_count; i++) { + if (coeffs_y[i] != 0) nonzero_y++; + if (coeffs_co[i] != 0) nonzero_co++; + if (coeffs_cg[i] != 0) nonzero_cg++; + } + + // Layout: [Y_map][Co_map][Cg_map][Y_vals][Co_vals][Cg_vals] + uint8_t *y_map = output_buffer; + uint8_t *co_map = output_buffer + map_bytes; + uint8_t *cg_map = output_buffer + map_bytes * 2; + int16_t *y_values = (int16_t *)(output_buffer + map_bytes * 3); + int16_t *co_values = y_values + nonzero_y; + int16_t *cg_values = co_values + nonzero_co; + + // Clear significance maps + memset(y_map, 0, map_bytes); + memset(co_map, 0, map_bytes); + memset(cg_map, 0, map_bytes); + + // Fill significance maps and extract values + int y_idx = 0, co_idx = 0, cg_idx = 0; + for (int i = 0; i < coeff_count; i++) { + int byte_idx = i / 8; + int bit_idx = i % 8; + + if (coeffs_y[i] != 0) { + y_map[byte_idx] |= (1 << bit_idx); + y_values[y_idx++] = coeffs_y[i]; + } + + if (coeffs_co[i] != 0) { + co_map[byte_idx] |= (1 << bit_idx); + co_values[co_idx++] = coeffs_co[i]; + } + + if (coeffs_cg[i] != 0) { + cg_map[byte_idx] |= (1 << bit_idx); + cg_values[cg_idx++] = coeffs_cg[i]; + } + } + + return map_bytes * 3 + (nonzero_y + nonzero_co + nonzero_cg) * sizeof(int16_t); +} + // Quantisation for DWT subbands with rate control static void quantise_dwt_coefficients(float *coeffs, int16_t *quantised, int size, int quantiser) { float effective_q = quantiser; @@ -1311,15 +1362,10 @@ static size_t serialise_tile_data(tav_encoder_t *enc, int tile_x, int tile_y, printf("\n"); }*/ - // Preprocess and write quantised coefficients using significance mapping for better compression - size_t y_compressed_size = preprocess_coefficients(quantised_y, tile_size, buffer + offset); - offset += y_compressed_size; - - size_t co_compressed_size = preprocess_coefficients(quantised_co, tile_size, buffer + offset); - offset += co_compressed_size; - - size_t cg_compressed_size = preprocess_coefficients(quantised_cg, tile_size, buffer + offset); - offset += cg_compressed_size; + // Preprocess and write quantised coefficients using concatenated significance maps for optimal compression + size_t total_compressed_size = preprocess_coefficients_concatenated(quantised_y, quantised_co, quantised_cg, + tile_size, buffer + offset); + offset += total_compressed_size; // DEBUG: Dump raw DWT coefficients for frame ~60 when it's an intra-frame if (!debugDumpMade && enc->frame_count >= makeDebugDump - 1 && enc->frame_count <= makeDebugDump + 2 &&