feat: support dynamic conversion of BigQuery schema to Protobuf descriptor #15572

Open
Regan-Koopmans wants to merge 2 commits into googleapis:main from Regan-Koopmans:main

@Regan-Koopmans Regan-Koopmans commented Feb 17, 2026

Fixes #14277

These changes are comparable to the existing conversion logic already in BQTableSchemaToProtoDescriptor in java-bigquerystorage. For certain use-cases it makes sense to support generating Protobuf descriptors dynamically from BigQuery schemas.

google-cla bot commented Feb 17, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello @Regan-Koopmans, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new utility to dynamically convert BigQuery table schemas into Protocol Buffer descriptors. This enhancement streamlines the process of using the BigQuery Storage Write API by allowing users to generate the necessary Protobuf schema on the fly, rather than manually defining and compiling .proto files. It simplifies data ingestion workflows by automating schema translation.

Highlights

  • New Schema Conversion Utility: Introduced a new schema module within google-cloud-bigquery-storage to provide functionality for converting BigQuery TableSchema objects into Protocol Buffer DescriptorProto objects.
  • Dynamic Protobuf Descriptor Generation: Enabled dynamic generation of Protobuf descriptors from BigQuery schemas, eliminating the need for users to manually define and compile .proto files when using the BigQuery Storage Write API.
  • Comprehensive Type and Mode Mapping: Implemented detailed mapping for various BigQuery data types (including primitive types, STRUCT, and RANGE) to their corresponding Protobuf types, and BigQuery field modes (NULLABLE, REQUIRED, REPEATED) to Protobuf field labels.

Changelog

  • packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/__init__.py
    • Imported the new schema module.
    • Exported schema in __all__ to make it publicly accessible.
  • packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py
    • Added a new module schema.py containing the table_schema_to_proto_descriptor function.
    • Implemented helper functions _get_field_label, _create_range_descriptor, and _convert_fields_to_proto for detailed schema conversion logic.
    • Defined a mapping _BQ_TO_PROTO_TYPE_MAP for BigQuery to Protobuf type conversion.
  • packages/google-cloud-bigquery-storage/tests/unit/test_schema.py
    • Added a new test file test_schema.py to validate the table_schema_to_proto_descriptor function.
    • Included tests for basic types, special types (DATE, TIMESTAMP, NUMERIC), field modes (NULLABLE, REQUIRED, REPEATED), STRUCT fields, RANGE fields, deeply nested structures, custom message names, field numbering, and empty/complex schemas.
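
The type and mode mapping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `convert_fields`, `ProtoField`, and both dictionaries are hypothetical names, the type table is an assumed subset, and plain strings stand in for the `FieldDescriptorProto` enum values so the sketch runs without protobuf installed.

```python
from dataclasses import dataclass

# Assumed subset of a BigQuery-to-Protobuf type table. The PR's actual
# _BQ_TO_PROTO_TYPE_MAP is not shown in this conversation.
BQ_TO_PROTO_TYPE = {
    "INT64": "TYPE_INT64",
    "STRING": "TYPE_STRING",
    "BOOL": "TYPE_BOOL",
    "DOUBLE": "TYPE_DOUBLE",
    "BYTES": "TYPE_BYTES",
}

# BigQuery field modes mapped to Protobuf field labels.
BQ_MODE_TO_PROTO_LABEL = {
    "NULLABLE": "LABEL_OPTIONAL",
    "REQUIRED": "LABEL_REQUIRED",
    "REPEATED": "LABEL_REPEATED",
}


@dataclass
class ProtoField:
    """Hypothetical stand-in for a FieldDescriptorProto."""
    name: str
    number: int
    type: str
    label: str


def convert_fields(fields):
    """Convert (name, bq_type, mode) tuples into ProtoField records.

    Field numbers are assigned sequentially starting at 1. An empty mode
    is treated as NULLABLE, which is BigQuery's default.
    """
    result = []
    for number, (name, bq_type, mode) in enumerate(fields, start=1):
        type_key = bq_type.upper()
        mode_key = (mode or "NULLABLE").upper()
        if type_key not in BQ_TO_PROTO_TYPE:
            raise ValueError(f"Unsupported BigQuery type for {name!r}: {bq_type}")
        if mode_key not in BQ_MODE_TO_PROTO_LABEL:
            raise ValueError(f"Unsupported BigQuery mode for {name!r}: {mode}")
        result.append(
            ProtoField(
                name=name,
                number=number,
                type=BQ_TO_PROTO_TYPE[type_key],
                label=BQ_MODE_TO_PROTO_LABEL[mode_key],
            )
        )
    return result
```

Nested STRUCT and RANGE fields would need a recursive step that emits a nested message type; that part is omitted here.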

Activity

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable utility for converting BigQuery schemas to Protobuf descriptors, which is a great addition for users of the Write API. The implementation is well-structured and includes a comprehensive set of unit tests. I've identified a couple of areas for improvement. One is a minor code cleanup to remove an unused parameter. The other is a more important change to handle invalid RANGE field specifications more robustly by raising an error instead of silently using a default, which will improve correctness and prevent subtle bugs.
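
The stricter RANGE handling suggested in this review could look something like the sketch below. It assumes the converter receives the range element type as a string; `range_element_type` is a hypothetical helper name, not a function from the PR. The set of allowed element types reflects the types BigQuery supports for RANGE (DATE, DATETIME, TIMESTAMP).

```python
# Element types BigQuery allows inside a RANGE<T> column.
_VALID_RANGE_ELEMENT_TYPES = frozenset({"DATE", "DATETIME", "TIMESTAMP"})


def range_element_type(field_name, element_type):
    """Return the normalized RANGE element type or raise ValueError.

    Raising here, rather than falling back to a default element type,
    surfaces a malformed schema at conversion time instead of producing
    a descriptor that silently disagrees with the table.
    """
    if not element_type:
        raise ValueError(
            f"RANGE field {field_name!r} is missing its element type"
        )
    normalized = element_type.upper()
    if normalized not in _VALID_RANGE_ELEMENT_TYPES:
        raise ValueError(
            f"RANGE field {field_name!r} has unsupported element type "
            f"{element_type!r}; expected one of "
            f"{sorted(_VALID_RANGE_ELEMENT_TYPES)}"
        )
    return normalized
```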

@Regan-Koopmans Regan-Koopmans force-pushed the main branch 2 times, most recently from af778d7 to 2711330 February 17, 2026 13:53
@Regan-Koopmans Regan-Koopmans changed the title from "Support conversion of BigQuery schema to Protobuf" to "Support dynamic conversion of BigQuery schema to Protobuf descriptor" Feb 17, 2026
@Regan-Koopmans Regan-Koopmans marked this pull request as ready for review February 17, 2026 14:03
@Regan-Koopmans Regan-Koopmans requested review from a team as code owners February 17, 2026 14:03

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for dynamically converting BigQuery schemas to Protobuf descriptors, with the new schema.py module and a comprehensive test suite in test_schema.py. The security review highlighted an unexpected observation: the files packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/__init__.py, packages/google-cloud-bigquery-storage/google/cloud/bigquery_storage_v1/schema.py, and packages/google-cloud-bigquery-storage/tests/unit/test_schema.py appear to be empty. Please verify if this is intended or if content is missing, as no security vulnerabilities were found. For code quality, consider removing an unused function parameter and adding stricter type validation for RANGE fields in schema.py to enhance robustness and prevent potential runtime errors. Overall, this is a great addition.

@Regan-Koopmans Regan-Koopmans changed the title from "Support dynamic conversion of BigQuery schema to Protobuf descriptor" to "feat: support dynamic conversion of BigQuery schema to Protobuf descriptor" Feb 17, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for dynamically converting BigQuery schemas to Protobuf descriptors, which will be very useful for the BigQuery Storage Write API. However, a security audit identified two medium-severity vulnerabilities related to improper input validation. User-controllable names for messages and fields are used to construct the Protobuf descriptor without sanitization, potentially leading to injection attacks (e.g., path traversal) in the downstream BigQuery service. It is recommended to validate all user-provided names against valid Protobuf identifier naming conventions. Additionally, a critical issue was found in schema.py where the code incorrectly uses a non-existent append method on protobuf container objects, which will cause an AttributeError at runtime. Specific suggestions have been provided to replace these with the correct extend method. After addressing these points, the changes will be in excellent condition.
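
A minimal sketch of the name validation recommended above, assuming the goal is to reject any message or field name that is not a plain Protobuf identifier (a letter or underscore followed by letters, digits, or underscores). `validate_proto_identifier` is a hypothetical helper, not part of the PR.

```python
import re

# A Protobuf identifier: a letter or underscore, then letters, digits,
# or underscores.
_PROTO_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_proto_identifier(name, kind="field"):
    """Return `name` unchanged if it is a valid identifier, else raise.

    `kind` is only used to make the error message more specific
    (for example "field" or "message").
    """
    if not isinstance(name, str) or not _PROTO_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Invalid Protobuf {kind} name: {name!r}")
    return name
```

Rejecting outright is the strictest option; a converter that must accept every valid BigQuery column name would need to rewrite names instead of refusing them.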

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new utility module for dynamically converting BigQuery table schemas into Protobuf descriptors. This is a valuable feature for users of the BigQuery Storage Write API, as it removes the need to manually define .proto files. The implementation covers a wide range of BigQuery types, including nested STRUCTs and RANGE types, and is accompanied by a comprehensive test suite. My main feedback concerns a potential name collision issue when generating nested message types, which could lead to invalid descriptors in some edge cases. Addressing this will make the new utility more robust.
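
One way such collisions could arise is when nested message names are derived only from the field name, so two STRUCT fields called `address` at different nesting levels would both yield the same message name. The sketch below shows a simple suffixing scheme under that assumption; `unique_message_name` and the suffix format are hypothetical, and the PR may resolve collisions differently.

```python
def unique_message_name(base, used):
    """Return a message name not already in `used`, and record it.

    The first request for a name gets it unchanged; later requests get
    a numeric suffix (base_1, base_2, ...). `used` is a set that the
    caller shares across the whole conversion.
    """
    candidate = base
    suffix = 1
    while candidate in used:
        candidate = f"{base}_{suffix}"
        suffix += 1
    used.add(candidate)
    return candidate


# Example: two STRUCT fields that would both map to "Address".
used_names = set()
first = unique_message_name("Address", used_names)   # "Address"
second = unique_message_name("Address", used_names)  # "Address_1"
```

An alternative is to build each nested name from the full field path, which makes names deterministic regardless of traversal order.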

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new feature to dynamically convert BigQuery schemas to Protobuf descriptors, which is a valuable addition for users of the BigQuery Storage Write API. The implementation is well-structured and includes a comprehensive set of unit tests.

My review focuses on improving the robustness of the field name sanitization logic to handle all valid BigQuery field names and ensure the generated Protobuf descriptors are always valid. I've also identified some dead code that can be removed and suggested an enhancement to the test suite to cover the sanitization logic.
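
As a rough illustration of what field name sanitization involves, the sketch below rewrites an arbitrary column name into a valid Protobuf identifier. The replacement scheme (underscores for disallowed characters, a leading underscore when the name is empty or starts with a digit) is an assumption made for this example; `sanitize_field_name` is a hypothetical name and the PR's actual rules are not shown here.

```python
import re

# Anything outside ASCII letters, digits, and underscore is replaced.
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_]")


def sanitize_field_name(name):
    """Rewrite `name` into a valid Protobuf identifier.

    Disallowed characters become underscores. If the result is empty or
    starts with a digit, an underscore is prepended so the identifier
    begins with a legal character.
    """
    sanitized = _INVALID_CHARS.sub("_", name)
    if not sanitized or sanitized[0].isdigit():
        sanitized = "_" + sanitized
    return sanitized
```

Sanitization of this kind is lossy: `a-b` and `a b` both become `a_b`, so it needs to be paired with a collision check before the names are used in a descriptor.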

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new feature to dynamically convert BigQuery schemas to Protobuf descriptors, which is a valuable addition for users of the BigQuery Storage Write API. The implementation is well-structured and includes comprehensive unit tests covering a wide range of scenarios, including complex nested types, various data types, and field name sanitization. I've made a couple of suggestions for minor refactorings to improve code clarity and maintainability. Overall, this is a solid contribution.

Add functionality to convert BigQuery table schemas to protocol buffer
descriptors. This enables schema conversion for BigQuery Storage Write API
operations.

- Add table_schema_to_proto_descriptor function
- Support basic types, structs, and range fields
- Implement field name sanitization and collision avoidance
- Add comprehensive test coverage
- Update documentation and changelog

@parthea parthea added the kokoro:force-run and kokoro:run labels Feb 17, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Feb 17, 2026
@parthea parthea added the kokoro:force-run and kokoro:run labels Feb 18, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:run and kokoro:force-run labels Feb 18, 2026

Development

Successfully merging this pull request may close these issues.

Conversion from bq schema to proto

4 participants