@arthurpassos arthurpassos commented Feb 10, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Forward port of export part and partition #1041, #1083, #1086, #1106, #1124, #1144, #1147, #1150, #1157, #1158, #1161, #1167, #1229, #1294, #1320, #1324 and #1330

Documentation entry for user-facing changes

Export merge tree part and partition (we still need to rebase #1177 afterwards)

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

Enmk and others added 2 commits February 9, 2026 15:33
Antalya 25.8 - Forward port of # 1009 - Export merge tree part

github-actions bot commented Feb 10, 2026

Workflow [PR], commit [fcbbce8]

Enmk and others added 17 commits February 10, 2026 08:16
Preserve parquet specific parallel formatting for export part
Fix crash due to incompatible headers when mutation changed the schema between scheduling and executing part exports
Fix export partition ordering unit test
Fix flakiness in tests by adding status poll instead of random sleeps
Fix export part crash and add docs for export partition
Cleanup export manifests - attempt to prevent crash
split large parquet files on part export, preserve entire settings object in part export
…t_export

Allow merge tree materialized / alias columns to be exported through part export
…atch_parts

export part - skip or throw on pending mutations and patch parts
Add query id to `system.part_log`, `system_exports` and `system.replicated_partition_exports`
…nation_for_part_export

Accept table function as destination for part export
@arthurpassos arthurpassos marked this pull request as ready for review February 10, 2026 13:57
@arthurpassos arthurpassos added antalya port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Feb 10, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9eaa05f23d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 698 to 700
bool ASTAlterQuery::isExportPartAlter() const
{
    return isOneCommandTypeOnly(ASTAlterCommand::EXPORT_PART);

P1 Badge Treat EXPORT PARTITION as export alter in helper

isExportPartAlter() only matches EXPORT_PART, but this helper is now used by replicated-DDL routing (DatabaseReplicated::shouldReplicateQuery and DDLWorker::taskShouldBeExecutedOnLeader) to skip data-moving ALTERs. As a result, ALTER ... EXPORT PARTITION falls through those guards and can be replicated/executed via normal DDL paths, so one command may be re-run on multiple replicas and create duplicate/conflicting export tasks (or fail as already exported) instead of a single logical export operation.
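One way to read this suggestion is a helper that accepts both export command types. The sketch below is a self-contained toy model, not the ClickHouse code: `CommandType`, `AlterQuery` and `isExportAlter` are hypothetical names, and the enum value `EXPORT_PARTITION` is assumed from the statement name rather than confirmed by the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the ALTER command type enum. EXPORT_PART appears
// in the reviewed code; EXPORT_PARTITION is an assumed name.
enum class CommandType { ADD_COLUMN, EXPORT_PART, EXPORT_PARTITION };

// Toy model of an ALTER query: only the list of its command types.
struct AlterQuery
{
    std::vector<CommandType> commands;

    // True when the query is non-empty and every command is an export command
    // (part or partition), so routing code can treat both the same way.
    bool isExportAlter() const
    {
        return !commands.empty()
            && std::all_of(commands.begin(), commands.end(), [](CommandType type)
               {
                   return type == CommandType::EXPORT_PART
                       || type == CommandType::EXPORT_PARTITION;
               });
    }
};
```

Whether mixed queries (export plus non-export commands) should count is a design choice; this sketch rejects them, mirroring the "one command type only" spirit of the original helper.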

destination_storage->commitExportPartitionTransaction(manifest.transaction_id, manifest.partition_id, exported_paths, context);

LOG_INFO(log, "ExportPartition: Committed export, mark as completed");
if (Coordination::Error::ZOK == zk->trySet(fs::path(entry_path) / "status", String(magic_enum::enum_name(ExportReplicatedMergeTreePartitionTaskEntry::Status::COMPLETED)).data(), -1))

P1 Badge Guard completion status write against concurrent kill

After committing destination files, the code unconditionally writes COMPLETED with version -1, so a concurrent KILL EXPORT PARTITION that already switched status to KILLED can be overwritten and the export still committed. This creates a race where cancellation is acknowledged at the control plane but the task is finalized as successful anyway, which is especially visible when kill arrives just after the last part export finishes.
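The guard described here is a conditional write: the status is set only if the node still has the version that was observed earlier. The sketch below illustrates that idea against a toy in-memory store. `FakeKeeper`, `tryMarkCompleted` and the `PENDING` status are hypothetical; only `COMPLETED`, `KILLED` and the "version -1 means any version" convention come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy versioned key-value store standing in for ZooKeeper/Keeper.
struct FakeKeeper
{
    struct Node
    {
        std::string value;
        int32_t version = 0;
    };

    std::map<std::string, Node> nodes;

    // Conditional set: version -1 matches any version; otherwise the write
    // succeeds only if the stored version equals expected_version.
    bool trySet(const std::string & path, const std::string & value, int32_t expected_version)
    {
        auto it = nodes.find(path);
        if (it == nodes.end())
            return false;
        if (expected_version != -1 && it->second.version != expected_version)
            return false;
        it->second.value = value;
        ++it->second.version;
        return true;
    }
};

// Marks the task COMPLETED only if nobody changed the status since it was
// read at observed_version (for example, a concurrent kill).
bool tryMarkCompleted(FakeKeeper & keeper, const std::string & status_path, int32_t observed_version)
{
    return keeper.trySet(status_path, "COMPLETED", observed_version);
}
```

This only protects the status node. Whether the destination commit itself should also be skipped or rolled back after a kill is a separate question that the sketch does not address.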

Collaborator Author

This is just a port, I don't want to address this right now. Perhaps in #1177 or later work.

Comment on lines 114 to 116
configuration_to_initialize.file_path_generator = std::make_shared<ObjectStorageAppendFilePathGenerator>(
    configuration_to_initialize.getRawPath().path,
    configuration_to_initialize.format);

P2 Badge Initialize Hive path generator after resolving auto format

For Hive partition strategy, file_path_generator is constructed from configuration_to_initialize.format before the auto format branch resolves the actual format. In format='auto' configurations this leaves the generator stuck with auto, so generated write/read paths use the wrong extension (for example *.auto) even after format inference picks a concrete format, which can produce unreadable object layouts or missed files.
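The ordering issue can be shown with a small toy model: resolve `auto` first, then build the generator so it captures the concrete format. Everything below is hypothetical (`FilePathGenerator`, `Configuration`, `inferFormat`, `initialize`); it is a sketch of the ordering, not the actual object storage code, and the `parquet` result of inference is just a placeholder.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical path generator: it remembers the format it was built with
// and uses it as the file extension.
struct FilePathGenerator
{
    std::string root;
    std::string format;

    std::string pathFor(const std::string & name) const
    {
        return root + "/" + name + "." + format;
    }
};

struct Configuration
{
    std::string raw_path;
    std::string format = "auto";
    std::shared_ptr<FilePathGenerator> file_path_generator;
};

// Placeholder for format inference; a real implementation would inspect
// existing files or the schema.
std::string inferFormat()
{
    return "parquet";
}

void initialize(Configuration & config)
{
    // Step 1: resolve 'auto' to a concrete format.
    if (config.format == "auto")
        config.format = inferFormat();

    // Step 2: only now build the generator, so it never sees 'auto'.
    config.file_path_generator
        = std::make_shared<FilePathGenerator>(FilePathGenerator{config.raw_path, config.format});
}
```

An alternative with the same effect would be to keep the current order and rebuild the generator after inference; which one fits the real code better depends on what else reads the generator in between.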

Collaborator Author

hmmm ok

@arthurpassos
Collaborator Author

There is currently a crash in exports involving materialized columns. The crash is caused by a non-finalized write buffer: it occurs when an exception is thrown after the sink has been created but before the pipeline starts running. This needs to be fixed.

This did not happen in 25.8. The difference is that materializeSpecialColumns now throws, because the implementation of evaluateMissingDefaults changed in ClickHouse#87585.

So, two things need to be done:

  1. Adapt either materializeSpecialColumns or evaluateMissingDefaults so that it does not throw; and
  2. Finalize the buffers when an exception occurs.
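The second item can be sketched as a try/catch around the window between sink creation and pipeline completion. This is a toy model under stated assumptions: `ToyWriteBuffer`, `runExport` and the `prepare` callback are hypothetical, and the error path uses a `cancel()` method as a stand-in for whatever call properly releases the buffer in the real code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Toy write buffer that tracks how it was released.
struct ToyWriteBuffer
{
    bool finalized = false;
    bool cancelled = false;

    void finalize() { finalized = true; }
    void cancel() noexcept { cancelled = true; }
};

// Runs `prepare`, which models the work done after the sink exists but
// before the pipeline has finished. If it throws, the buffer is released
// before the exception propagates to the caller.
void runExport(ToyWriteBuffer & buffer, const std::function<void()> & prepare)
{
    try
    {
        prepare();          // may throw, e.g. while materializing columns
        buffer.finalize();  // normal path: everything was written
    }
    catch (...)
    {
        buffer.cancel();    // error path: never leave the buffer unreleased
        throw;              // rethrow so the caller still sees the failure
    }
}
```

A scope guard or RAII wrapper would achieve the same thing with less room for a missed code path; the explicit try/catch is used here only to keep the control flow visible.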

With the analyzer off, it works. With the analyzer on, evaluateMissingDefaults fails. I need to investigate why.
