GH-3344: Adaptive compression for v2 page#3368
GH-3344: Adaptive compression for v2 page#3368ConeyLiu wants to merge 5 commits intoapache:masterfrom
Conversation
| definitionLevels, | ||
| data, | ||
| statistics, | ||
| 10); |
There was a problem hiding this comment.
It does not make sense to have a threshold greater than 1.0, does it?
In any case, the data of a column full of zeros is extremely compressible (you might want to disable dictionary encoding?), so the default value should work IMHO.
| assertEquals( | ||
| "Data should be stored uncompressed when compression ratio exceeds threshold", | ||
| uncompressedSize, | ||
| compressedSize); |
There was a problem hiding this comment.
Perhaps you want to read the data back and check it's as expected?
| * @param threshold the compression ratio threshold, default is {@value #DEFAULT_V2_PAGE_COMPRESS_THRESHOLD} | ||
| * @return this builder for method chaining | ||
| */ | ||
| public Builder withV2PageCompressThreshold(double threshold) { |
There was a problem hiding this comment.
I'm not sure it makes sense to put "V2" in the API name? For users it doesn't seem be relevant (just keep it mentioned in the docstrings).
There was a problem hiding this comment.
Good point, updated
| converter.getEncoding(headerV2.getEncoding()), | ||
| BytesInput.from(pageLoad), | ||
| headerV2.is_compressed, | ||
| compressed, |
There was a problem hiding this comment.
Fixed a potential bug
| public static final boolean DEFAULT_SIZE_STATISTICS_ENABLED = true; | ||
|
|
||
| public static final boolean DEFAULT_PAGE_WRITE_CHECKSUM_ENABLED = true; | ||
| public static final double DEFAULT_PAGE_COMPRESS_THRESHOLD = 0.98; |
There was a problem hiding this comment.
Why is this magic number? Is it better to use a smaller number like 0.9 or 0.85?
| } | ||
| } | ||
|
|
||
| public static Builder build(BytesInputCompressor compressor, MessageType schema, ByteBufferAllocator allocator) { |
There was a problem hiding this comment.
Is it better to use a separate issue to implement this, though I'm fine to keep it as is.
Rationale for this change
Closes #3344
What changes are included in this PR?
Are these changes tested?
Yes, added UT.
Are there any user-facing changes?
Yes, add new configuration. No breaking changes.