Antalya 26.1 - Forward port of export part and partition #1388
base: antalya-26.1
Antalya 25.8 - Forward port of #1009 - Export merge tree part
…port_on_object_storage
Try to fix export mt part test issue
Preserve parquet specific parallel formatting for export part
Export merge tree part docs
Fix crash due to incompatible headers when mutation changed the schema between scheduling and executing part exports
Fix export partition ordering unit test
Fix flakiness in tests by adding status poll instead of random sleeps
Fix export part crash and add docs for export partition
Cleanup export manifests - attempt to prevent crash
split large parquet files on part export, preserve entire settings object in part export
…t_export Allow merge tree materialized / alias columns to be exported through part export
…atch_parts export part - skip or throw on pending mutations and patch parts
Add query id to `system.part_log`, `system_exports` and `system.replicated_partition_exports`
…nation_for_part_export Accept table function as destination for part export
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9eaa05f23d
src/Parsers/ASTAlterQuery.cpp
Outdated
bool ASTAlterQuery::isExportPartAlter() const
{
    return isOneCommandTypeOnly(ASTAlterCommand::EXPORT_PART);
Treat EXPORT PARTITION as export alter in helper
isExportPartAlter() only matches EXPORT_PART, but this helper is now used by replicated-DDL routing (DatabaseReplicated::shouldReplicateQuery and DDLWorker::taskShouldBeExecutedOnLeader) to skip data-moving ALTERs. As a result, ALTER ... EXPORT PARTITION falls through those guards and can be replicated/executed via normal DDL paths, so one command may be re-run on multiple replicas and create duplicate/conflicting export tasks (or fail as already exported) instead of a single logical export operation.
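One way to picture the suggested fix is a helper that treats both export command kinds as export ALTERs. The sketch below is a minimal, self-contained model and not the ClickHouse code: `CommandType`, `AlterQuery`, `isExportAlter` and `shouldReplicate` are hypothetical stand-ins, and the behavior for empty or mixed queries is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-ins for the real parser types.
enum class CommandType { ExportPart, ExportPartition, DropPartition, ModifyColumn };

struct AlterQuery
{
    std::vector<CommandType> commands;

    // True when the query is non-empty and every command is one of the export kinds.
    // Assumption: a query that mixes export and other commands is not an export ALTER.
    bool isExportAlter() const
    {
        return !commands.empty()
            && std::all_of(commands.begin(), commands.end(), [](CommandType type)
               {
                   return type == CommandType::ExportPart || type == CommandType::ExportPartition;
               });
    }
};

// Hypothetical routing guard: export ALTERs are kept out of the replicated DDL path.
bool shouldReplicate(const AlterQuery & query)
{
    return !query.isExportAlter();
}
```

With a helper shaped like this, both `EXPORT PART` and `EXPORT PARTITION` would be skipped by the same guard, so neither is re-run on every replica.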
destination_storage->commitExportPartitionTransaction(manifest.transaction_id, manifest.partition_id, exported_paths, context);

LOG_INFO(log, "ExportPartition: Committed export, mark as completed");
if (Coordination::Error::ZOK == zk->trySet(fs::path(entry_path) / "status", String(magic_enum::enum_name(ExportReplicatedMergeTreePartitionTaskEntry::Status::COMPLETED)).data(), -1))
Guard completion status write against concurrent kill
After committing destination files, the code unconditionally writes COMPLETED with version -1, so a concurrent KILL EXPORT PARTITION that already switched status to KILLED can be overwritten and the export still committed. This creates a race where cancellation is acknowledged at the control plane but the task is finalized as successful anyway, which is especially visible when kill arrives just after the last part export finishes.
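The race can be illustrated with a versioned compare-and-set instead of an unconditional write. The sketch below uses a hypothetical in-memory `VersionedNode` that only mimics the "version -1 means no check" convention mentioned above; it is not the ZooKeeper client API, and `tryMarkCompleted` is an illustrative name. It covers only the status write, not the ordering of the destination commit relative to the kill.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a coordination node with a data version.
class VersionedNode
{
public:
    explicit VersionedNode(std::string value) : value_(std::move(value)) {}

    const std::string & value() const { return value_; }
    std::int32_t version() const { return version_; }

    // Mimics a versioned set: expected_version == -1 skips the check,
    // otherwise the write succeeds only if the version still matches.
    bool trySet(const std::string & new_value, std::int32_t expected_version)
    {
        if (expected_version != -1 && expected_version != version_)
            return false;
        value_ = new_value;
        ++version_;
        return true;
    }

private:
    std::string value_;
    std::int32_t version_ = 0;
};

// Marks the task COMPLETED only if the status has not changed since it was read.
bool tryMarkCompleted(VersionedNode & status, std::int32_t version_seen_before_commit)
{
    return status.trySet("COMPLETED", version_seen_before_commit);
}
```

If a kill bumps the node version after the exporter read it, `tryMarkCompleted` returns `false` and the `KILLED` status is left in place, whereas passing `-1` would overwrite it.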
This is just a port, I don't want to address this right now. Perhaps in #1177 or later work.
configuration_to_initialize.file_path_generator = std::make_shared<ObjectStorageAppendFilePathGenerator>(
    configuration_to_initialize.getRawPath().path,
    configuration_to_initialize.format);
Initialize Hive path generator after resolving auto format
For Hive partition strategy, file_path_generator is constructed from configuration_to_initialize.format before the auto format branch resolves the actual format. In format='auto' configurations this leaves the generator stuck with auto, so generated write/read paths use the wrong extension (for example *.auto) even after format inference picks a concrete format, which can produce unreadable object layouts or missed files.
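The ordering issue can be shown with a small model in which the format is resolved before the generator is built. Everything here is a hypothetical simplification: `Configuration`, `AppendFilePathGenerator`, `makeGenerator`, and the way the extension is appended are assumptions for illustration, not the real `ObjectStorageAppendFilePathGenerator` behavior.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified configuration; the real one has many more fields.
struct Configuration
{
    std::string raw_path;
    std::string format = "auto";
};

// Hypothetical path generator that captures the format at construction time.
class AppendFilePathGenerator
{
public:
    AppendFilePathGenerator(std::string base_path, std::string format)
        : base_path_(std::move(base_path)), format_(std::move(format)) {}

    // Builds "<base>/<file>.<format>"; the extension rule is an assumption.
    std::string pathFor(const std::string & file_name) const
    {
        return base_path_ + "/" + file_name + "." + format_;
    }

private:
    std::string base_path_;
    std::string format_;
};

// Resolve "auto" first, then build the generator from the concrete format.
AppendFilePathGenerator makeGenerator(Configuration & config, const std::string & inferred_format)
{
    if (config.format == "auto")
        config.format = inferred_format;
    return AppendFilePathGenerator(config.raw_path, config.format);
}
```

Because the generator copies the format when it is constructed, building it before the `auto` branch runs would freeze `auto` into every generated path; constructing it afterwards avoids that.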
hmmm ok
Add tags to indicate the script requires S3 storage.
There is currently a crash in exports when dealing with materialized columns. The crash is related to a non-finalized write buffer. It happens when an exception is thrown after the sink has been created but before the pipeline starts running, and it needs to be fixed. But this did not happen in 25.8, and the reason is that So, two things need to be done:
With analyzer off, it works. Analyzer on,
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Forward port of export part and partition #1041, #1083, #1086, #1106, #1124, #1144, #1147, #1150, #1157, #1158, #1161, #1167, #1229, #1294, #1320, #1324 and #1330
Documentation entry for user-facing changes
Export merge tree part and partition (we still need to rebase #1177 afterwards)
CI/CD Options
Exclude tests:
Regression jobs to run: