Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8b92a48e5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| DB::Names RestCatalog::getTables(const std::string & base_namespace, size_t limit) const | ||
| { | ||
| if (!allowed_namespaces.isNamespaceAllowed(base_namespace, /*nested*/ false)) | ||
| throw DB::Exception(DB::ErrorCodes::OUT_OF_SCOPE, | ||
| "Namespace {} is filtered by `namespaces` database parameter", base_namespace); |
There was a problem hiding this comment.
Skip filtered namespaces when probing emptiness
This new OUT_OF_SCOPE throw means RestCatalog::getTables() now fails for any namespace excluded by the namespaces filter. RestCatalog::empty() still iterates all namespaces returned by the catalog and calls getTables() in its stop condition without checking the filter, so any filtered namespace will now throw and bubble up. That makes DatabaseDataLake::empty() (used by DatabaseCatalog::detachDatabase when dropping/detaching DBs) fail whenever the catalog contains namespaces outside the filter (e.g., namespaces='dev' but prod exists). Consider skipping filtered namespaces in empty() or swallowing OUT_OF_SCOPE there.
Useful? React with 👍 / 👎.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| DataLakeStorageSettingsPtr settings_, | ||
| std::optional<std::string> catalog_namespaces_ = std::nullopt) | ||
| : settings(settings_) | ||
| , catalog_namespaces(catalog_namespaces_ ? *catalog_namespaces_ : std::string("*")) {} |
There was a problem hiding this comment.
consider std::optional::value_or()
| { | ||
| return allowed_namespaces.contains("*") || allowed_namespaces.contains(namespace_); | ||
| } | ||
|
|
There was a problem hiding this comment.
Not a big deal, but may be make this method virtual and move this implementation to ICatalog to avoid duplication?
Feel free to ignore if you've considered this option and decided that it is better to keep it simple.
There was a problem hiding this comment.
Implementation in ICatalog requires to make allowed_namespaces a ICatalog member. But allowed_namespaces has a simple type for 'flat' case and more complex type for case with nested namespaces.
There was a problem hiding this comment.
Implementation in
ICatalogrequires to makeallowed_namespacesaICatalogmember. Butallowed_namespaceshas a simple type for 'flat' case and more complex type for case with nested namespaces.
More complex type can live (under a different name) near more complex implementation in a derived class.
Member of a simple type can be std::optional in ICatalog.
Up to you.
| glue_client = std::make_unique<Aws::Glue::GlueClient>(chain, endpoint_provider, client_configuration); | ||
| } | ||
|
|
||
| boost::split(allowed_namespaces, settings.namespaces, [](char c){ return c == ','; }); |
There was a problem hiding this comment.
Consider boost::is_any_of instead of lambda.
There was a problem hiding this comment.
BTW, am I right that 'aaa, bbb' would not work because of space after comma?
Is it ok?
To fix it one can
...split ... is_any_of(", "), boost::token_compress_on)
src/Common/ErrorCodes.cpp
Outdated
| M(754, UDF_EXECUTION_FAILED) \ | ||
| M(755, TOO_LARGE_LIGHTWEIGHT_UPDATES) \ | ||
| M(756, CANNOT_PARSE_PROMQL_QUERY) \ | ||
| M(757, OUT_OF_SCOPE) \ |
There was a problem hiding this comment.
Not sure, though may be it is better to either use existing error code (e.g. DATALAKE_DATABASE_ERROR) or make up a more specific name, e.g. CATALOG_NAMESPACE_DISABLED ?
| | `aws_access_key_id` | AWS access key ID for S3/Glue access (if not using vended credentials) | | ||
| | `aws_secret_access_key` | AWS secret access key for S3/Glue access (if not using vended credentials) | | ||
| | `region` | AWS region for the service (e.g., `us-east-1`) | | ||
| | `namespaces` | Comma-separated list of namespaces, supported types: `rest`, `glue` and `unity` | |
There was a problem hiding this comment.
Initially I thought that rest, glue and unity are supported namespaces. Probably target audience of this feature would not have this problem, but may be '... implemented for catalog types: ..'
| boost::split(list_of_nested_namespaces, ns, [](char c){ return c == '.'; }); | ||
|
|
||
| size_t len = list_of_nested_namespaces.size(); | ||
| if (!len) |
There was a problem hiding this comment.
Not sure that I understand what it actually means.
That 'ns' is an empty string? Is it possible?
There was a problem hiding this comment.
Yes. Some kind of useless value.
You mean. need to throw exception in this case?
There was a problem hiding this comment.
I think that the only possible example of 'useless value'- a string that contains nothing except dots, right?
It can be processed in any way, e.g. as you currently do.
|
Test |
|
Failed stateless tests: |
|
Verified by this test: https://github.com/Altinity/clickhouse-regression/blob/main/iceberg/tests/iceberg_engine/namespace_filtering.py Test description: https://github.com/Altinity/clickhouse-regression/blob/main/iceberg/tests/iceberg_engine/test_descriptions/namespace_filtering_test_cases.md @ianton-ru can you please check this section and tell if I might forgot something important https://github.com/Altinity/clickhouse-regression/blob/main/iceberg/tests/iceberg_engine/test_descriptions/namespace_filtering_test_cases.md#test-setup I will investigate regression and upstream test fails. |
Do you test with Unity catalog, or it is out of our test scope? And just to be clean about combinations - If exists |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
DataLakeCatalog namespace filter
Documentation entry for user-facing changes
New setting
namespacesfor DataLakeCatalog with comma-separated list of namespaces.Supports
rest,glueandunitytypes.namepsaces='foo,bar'resttype can have nested namespaces, supports next rules:foo- tables from namespacefoo, but not from nested namespacesfoo.bar- tables from nested namespace, but not from base namespacefoo.*- tables from al; nested namespaces, but not from base namespaceWhen tables from both namespaces (base and nested) are required, need to use both:
foo,foo.*CI/CD Options
Exclude tests:
Regression jobs to run: