feat: support dynamic conversion of BigQuery schema to Protobuf descriptor#15572
feat: support dynamic conversion of BigQuery schema to Protobuf descriptor#15572Regan-Koopmans wants to merge 2 commits intogoogleapis:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @Regan-Koopmans, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new utility to dynamically convert BigQuery table schemas into Protocol Buffer descriptors. This enhancement streamlines the process of using the BigQuery Storage Write API by allowing users to generate the necessary Protobuf schema on the fly, rather than manually defining and compiling Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable utility for converting BigQuery schemas to Protobuf descriptors, which is a great addition for users of the Write API. The implementation is well-structured and includes a comprehensive set of unit tests. I've identified a couple of areas for improvement. One is a minor code cleanup to remove an unused parameter. The other is a more important change to handle invalid RANGE field specifications more robustly by raising an error instead of silently using a default, which will improve correctness and prevent subtle bugs.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
af778d7 to
2711330
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature for dynamically converting BigQuery schemas to Protobuf descriptors, with the new schema.py module and a comprehensive test suite in test_schema.py. The security review highlighted an unexpected observation: the files packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/__init__.py, packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py, and packages/google-cloud-bigquery-storage/tests/unit/test_schema.py appear to be empty. Please verify if this is intended or if content is missing, as no security vulnerabilities were found. For code quality, consider removing an unused function parameter and adding stricter type validation for RANGE fields in schema.py to enhance robustness and prevent potential runtime errors. Overall, this is a great addition.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature for dynamically converting BigQuery schemas to Protobuf descriptors, which will be very useful for the BigQuery Storage Write API. However, a security audit identified two medium-severity vulnerabilities related to improper input validation. User-controllable names for messages and fields are used to construct the Protobuf descriptor without sanitization, potentially leading to injection attacks (e.g., path traversal) in the downstream BigQuery service. It is recommended to validate all user-provided names against valid Protobuf identifier naming conventions. Additionally, a critical issue was found in schema.py where the code incorrectly uses a non-existent append method on protobuf container objects, which will cause an AttributeError at runtime. Specific suggestions have been provided to replace these with the correct extend method. After addressing these points, the changes will be in excellent condition.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Code Review
This pull request introduces a new utility module for dynamically converting BigQuery table schemas into Protobuf descriptors. This is a valuable feature for users of the BigQuery Storage Write API, as it removes the need to manually define .proto files. The implementation covers a wide range of BigQuery types, including nested STRUCTs and RANGE types, and is accompanied by a comprehensive test suite. My main feedback concerns a potential name collision issue when generating nested message types, which could lead to invalid descriptors in some edge cases. Addressing this will make the new utility more robust.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to dynamically convert BigQuery schemas to Protobuf descriptors, which is a valuable addition for users of the BigQuery Storage Write API. The implementation is well-structured and includes a comprehensive set of unit tests.
My review focuses on improving the robustness of the field name sanitization logic to handle all valid BigQuery field names and ensure the generated Protobuf descriptors are always valid. I've also identified some dead code that can be removed and suggested an enhancement to the test suite to cover the sanitization logic.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to dynamically convert BigQuery schemas to Protobuf descriptors, which is a valuable addition for users of the BigQuery Storage Write API. The implementation is well-structured and includes comprehensive unit tests covering a wide range of scenarios, including complex nested types, various data types, and field name sanitization. I've made a couple of suggestions for minor refactorings to improve code clarity and maintainability. Overall, this is a solid contribution.
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Outdated
Show resolved
Hide resolved
packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
Show resolved
Hide resolved
Add functionality to convert BigQuery table schemas to protocol buffer descriptors. This enables schema conversion for BigQuery Storage Write API operations. - Add table_schema_to_proto_descriptor function - Support basic types, structs, and range fields - Implement field name sanitization and collision avoidance - Add comprehensive test coverage - Update documentation and changelog
Fixes #14277
These changes are comparable to the existing conversion logic already in BQTableSchemaToProtoDescriptor in java-bigquerystorage. For certain use-cases it makes sense to support generating Protobuf descriptors dynamically from BigQuery schemas.