Climate Trace GHG Emissions refresh #1867

Open
HarishC727 wants to merge 7 commits into datacommonsorg:master from HarishC727:climate_trace_ref

@HarishC727
Contributor

@HarishC727 commented Feb 5, 2026

@gemini-code-assist
Contributor

Summary of Changes

Hello @HarishC727, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request sets up an automated pipeline for ingesting ClimateTrace's Greenhouse Gas (GHG) emissions data. A new Python script fetches country- and gas-specific emissions data from the ClimateTrace API, processes it, and standardizes it for ingestion into Data Commons, and a scheduled refresh keeps the imported data current.

Highlights

  • New ClimateTrace Data Processor: Introduced a Python-based system to download, process, and segregate Greenhouse Gas (GHG) emissions data from the ClimateTrace API.
  • Parallel Data Acquisition: Implemented efficient parallel downloading of country-specific and gas-specific data zip files.
  • Data Segregation by Gas Type: The system now segregates emissions data into separate CSV files for CO2, CH4, N2O, CO2e_20yr, and CO2e_100yr.
  • Data Commons Integration: Added comprehensive mapping files (climate_pvmap_sectors.csv, climate_pvmap_subsectors.csv, statvar_remap.csv) and configuration for seamless ingestion into Data Commons.
  • Automated Refresh Schedule: A manifest.json file is included to define the import specifications and set up a cron schedule for regular data refreshes.
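The parallel acquisition step can be sketched with the standard library's `concurrent.futures`. This is a minimal illustration, not the PR's code: `download_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever HTTP call the script actually makes to the ClimateTrace API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(
    countries: Iterable[str],
    gases: Iterable[str],
    fetch: Callable[[str, str], str],
    max_workers: int = 8,
) -> Dict[Tuple[str, str], str]:
    """Call fetch(country, gas) for every (country, gas) pair concurrently.

    Returns a dict keyed by (country, gas). future.result() re-raises any
    exception raised in a worker, so one failed download stops the run.
    """
    gases = list(gases)  # materialize so it can be iterated once per country
    results: Dict[Tuple[str, str], str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch, country, gas): (country, gas)
            for country in countries
            for gas in gases
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def fake_fetch(country: str, gas: str) -> str:
    # Hypothetical stand-in for the real download; returns a made-up zip path.
    return f"downloads/{country}_{gas}.zip"


if __name__ == "__main__":
    paths = download_all(["USA", "IND"], ["co2", "ch4"], fake_fetch)
    print(sorted(paths.values()))
```

Threads fit here because the work is I/O-bound; `max_workers` caps how many requests hit the API at once.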
Changelog
  • statvar_imports/climate_trace/README.md
    • Added a detailed guide for the ClimateTrace GHG Emissions Data Processor, covering its features, data sources, setup, usage, and project structure.
  • statvar_imports/climate_trace/check_country.csv
    • Introduced a list of ISO country codes to ensure specific countries are always included in the data acquisition process.
  • statvar_imports/climate_trace/climate_pvmap_sectors.csv
    • Created a mapping file to standardize sector-level ClimateTrace data for Data Commons, including various greenhouse gases and emission categories.
  • statvar_imports/climate_trace/climate_pvmap_subsectors.csv
    • Provided a more granular mapping file for subsector-level ClimateTrace data, enhancing the detail of emission source categorization for Data Commons.
  • statvar_imports/climate_trace/common_metadata.csv
    • Added a configuration file to define essential metadata parameters for the data processing pipeline.
  • statvar_imports/climate_trace/common_output.tmcf
    • Included a template MCF file to specify the output structure for the processed ClimateTrace data, aligning it with Data Commons requirements.
  • statvar_imports/climate_trace/download_and_segregate_by_gas.py
    • Implemented the core Python script for dynamic country list generation, parallel downloading of zipped ClimateTrace data, and segregation of emissions by gas type.
  • statvar_imports/climate_trace/manifest.json
    • Defined the import specifications for ClimateTrace data, including the execution scripts, input/output paths, and a cron schedule for automated updates.
  • statvar_imports/climate_trace/statvar_remap.csv
    • Added a remapping file to convert internal statistical variable names to Data Commons identifiers, particularly for subsector data.
  • statvar_imports/climate_trace/test_data/sample_input.csv
    • Added a sample CSV file to serve as input for testing the data processing logic.
  • statvar_imports/climate_trace/test_data/sample_output.csv
    • Included a sample CSV file demonstrating the expected output format after data processing.
  • statvar_imports/climate_trace/test_data/sample_output.tmcf
    • Provided a sample template MCF file for validating the structure of the processed test output.
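A minimal sketch of the gas segregation step, assuming each downloaded record is a dict with a `gas` column whose values match the five gases listed above (the actual column names in the ClimateTrace files may differ). The `all_countries_<gas>.csv` naming mirrors the `input_files/all_countries_ch4.csv` path used in manifest.json; `segregate_by_gas` and the sample columns are hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Gas labels from the PR summary, lowercased (assumed to match the data's values).
GASES = ["co2", "ch4", "n2o", "co2e_20yr", "co2e_100yr"]


def segregate_by_gas(rows: Iterable[Mapping[str, str]], output_dir: str,
                     gas_column: str = "gas") -> Dict[str, str]:
    """Write one CSV per gas into output_dir and return {gas: path}.

    Rows whose gas is not in GASES are skipped. All rows are assumed
    to share the same columns.
    """
    buckets: Dict[str, List[Mapping[str, str]]] = defaultdict(list)
    for row in rows:
        gas = row[gas_column].strip().lower()
        if gas in GASES:
            buckets[gas].append(row)

    os.makedirs(output_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for gas, gas_rows in buckets.items():
        path = os.path.join(output_dir, f"all_countries_{gas}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(gas_rows[0].keys()))
            writer.writeheader()
            writer.writerows(gas_rows)
        paths[gas] = path
    return paths


if __name__ == "__main__":
    sample = [  # hypothetical records
        {"iso3_country": "USA", "gas": "co2", "emissions_quantity": "10"},
        {"iso3_country": "USA", "gas": "CH4", "emissions_quantity": "2"},
        {"iso3_country": "IND", "gas": "co2", "emissions_quantity": "5"},
    ]
    print(segregate_by_gas(sample, tempfile.mkdtemp()))
```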
Activity
  • This pull request represents the initial implementation of the ClimateTrace GHG Emissions data refresh pipeline. All changes are new file additions, setting up the infrastructure for this data integration.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a new data processing pipeline for Climate Trace GHG emissions data. The changes include a Python script for downloading and segregating data, along with necessary configuration files, documentation, and test data. My review has identified a critical issue in manifest.json due to a typo that would break the import. I've also found a data correctness issue in one of the pvmap files and several opportunities to improve code maintainability in the Python script by using the logging module. Additionally, there are minor issues in the documentation and a configuration file that I've provided suggestions for.
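The review suggests using the `logging` module for the script's progress and error output. A minimal setup could look like the following sketch; the logger name `climate_trace` and the `configure_logging` helper are hypothetical, not part of the PR.

```python
import io
import logging


def configure_logging(stream=None, level=logging.INFO) -> logging.Logger:
    """Return a logger writing timestamped records to `stream` (stderr if None)."""
    logger = logging.getLogger("climate_trace")  # hypothetical logger name
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = configure_logging(buffer)
    # %-style arguments are formatted only if the record is actually emitted.
    log.info("Downloaded %d files for %s", 5, "ch4")
    log.debug("not shown at INFO level")
    print(buffer.getvalue(), end="")
```

Unlike `print`, log levels let a scheduled run keep INFO progress messages while errors remain distinguishable in the job logs.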

@pulkeet-wq
Contributor

/gcbrun

@saanikaaa
Contributor

Pls add PR checklist in description

@HarishC727
Contributor Author

Pls add PR checklist in description

Done

@saanikaaa
Contributor

LGTM

@@ -0,0 +1,6 @@
parameter,value
header_rows,1
output_columns,"observationAbout,observationDate,variableMeasured,value"
Contributor

@ajaits commented Mar 2, 2026

What about unit and observationPeriod?

Contributor Author

Added.

    final_df.to_csv(output_filename, index=False)
    logging.info(f" -> Successfully created {output_filename} with {len(final_df)} rows.\n")
except Exception as e:
    logging.error(f" -> An error occurred during the final processing for {gas}: {e}\n")
Contributor

Please re-raise the exception after the log statement so that the import script stops on error.
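A sketch of the reviewer's request applied to the quoted lines: log the failure, then re-raise so the exception propagates and, if nothing upstream catches it, the script exits with a non-zero status. `write_gas_output` is a hypothetical wrapper, and `logging.exception` is swapped in for `logging.error` because it also records the traceback.

```python
import logging


def write_gas_output(final_df, output_filename: str, gas: str) -> None:
    """Write one gas's output CSV; log and re-raise on failure.

    `final_df` is expected to be a pandas DataFrame, but any object with
    to_csv() and len() works.
    """
    try:
        final_df.to_csv(output_filename, index=False)
        logging.info(" -> Successfully created %s with %d rows.",
                     output_filename, len(final_df))
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        logging.exception(" -> An error occurred during the final processing for %s", gas)
        raise  # bare raise keeps the original exception type and traceback
```

A bare `raise` is preferable to `raise e` here because it re-raises the active exception unchanged.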

"provenance_description": "Climate Trace data provides comprehensive, granular, and up-to-date greenhouse gas emissions data using satellite imagery and other public data sources.",
"scripts": [
"download_and_segregate_by_gas.py",
"../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/all_countries_ch4.csv --pv_map=climate_pvmap_sectors.csv --config_file=common_metadata.csv --output_path=output/sectors_CH4_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf",
Contributor

How large are the input files?

If they are small enough (<100 MB), can we merge the pvmaps and process all CSVs together?

../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=climate_pvmap.csv --config_file=common_metadata.csv --output_path=output/climate_trace_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
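One way to answer the size question and build the single `climate_pvmap.csv` the suggested command expects. Both helpers are hypothetical sketches: the 100 MB threshold is read as 100 MiB per file, and the merge concatenates the sector and subsector pvmaps while dropping duplicate rows.

```python
import csv
import glob
import os
import tempfile
from typing import Dict, List, Tuple

# Reviewer's "<100Mb" threshold, read here as 100 MiB per file (an assumption).
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def inputs_small_enough(pattern: str = "input_files/*.csv",
                        limit: int = SIZE_LIMIT_BYTES) -> Tuple[bool, Dict[str, int]]:
    """Return (every file under limit?, {path: size_in_bytes}).

    Note: if nothing matches the pattern, this returns (True, {}).
    """
    sizes = {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}
    return all(size < limit for size in sizes.values()), sizes


def merge_pvmaps(paths: List[str], output_path: str) -> int:
    """Concatenate pvmap CSVs, skipping rows already seen; return rows written.

    Rows are compared after stripping trailing empty cells, so 'a,b,c,,'
    and 'a,b,c' count as duplicates. Rows with the same key but different
    values are both kept and need to be resolved by hand.
    """
    seen = set()
    merged = []
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                key = list(row)
                while key and key[-1] == "":
                    key.pop()
                if key and tuple(key) not in seen:
                    seen.add(tuple(key))
                    merged.append(row)
    with open(output_path, "w", newline="") as f:
        csv.writer(f).writerows(merged)
    return len(merged)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()  # hypothetical file contents below
    a = os.path.join(workdir, "climate_pvmap_sectors.csv")
    b = os.path.join(workdir, "climate_pvmap_subsectors.csv")
    with open(a, "w", newline="") as f:
        f.write("emissions_quantity,value,{Number}\n")
    with open(b, "w", newline="") as f:
        f.write("emissions_quantity,value,{Number}\n2030,observationDate,2030,,\n")
    print(merge_pvmaps([a, b], os.path.join(workdir, "climate_pvmap.csv")))  # 2
    print(inputs_small_enough(os.path.join(workdir, "*.csv")))
```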

Contributor Author

Done

2028-01-01 00:00:00,observationDate,2028,,,,,,,,,,,,
2029-01-01 00:00:00,observationDate,2029,,,,,,,,,,,,
2030-01-01 00:00:00,observationDate,2030,,,,,,,,,,,,
emissions_quantity,value,{Number},populationType,Emissions,measuredProperty,amount,measurementQualifier,Annual,#Aggregate,sum,observationPeriod,P1Y,unit,MetricTon
Contributor

Can you please add a note in the README on what data is being aggregated?

Contributor Author

Done
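One plausible reading of the `#Aggregate,sum` entry in the pvmap excerpt is that rows mapping to the same place, date and statistical variable have their `emissions_quantity` values summed, for example many source-level rows rolling up into one country total. That interpretation is an assumption the PR does not spell out. The sketch below illustrates the idea using the `output_columns` names from common_metadata.csv; `aggregate_sum` is hypothetical and is not stat_var_processor.py's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

Key = Tuple[str, str, str]


def aggregate_sum(rows: Iterable[Mapping[str, str]]) -> Dict[Key, float]:
    """Sum 'value' over rows sharing (observationAbout, observationDate, variableMeasured)."""
    totals: Dict[Key, float] = defaultdict(float)
    for row in rows:
        key = (row["observationAbout"], row["observationDate"], row["variableMeasured"])
        totals[key] += float(row["value"])
    return dict(totals)


if __name__ == "__main__":
    rows = [
        # Two hypothetical source-level rows for the same country, year and stat var.
        {"observationAbout": "country/USA", "observationDate": "2023",
         "variableMeasured": "hypothetical_statvar", "value": "1.5"},
        {"observationAbout": "country/USA", "observationDate": "2023",
         "variableMeasured": "hypothetical_statvar", "value": "2.5"},
    ]
    print(aggregate_sum(rows))
```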

@@ -0,0 +1,6 @@
parameter,value
header_rows,1
output_columns,"observationAbout,observationDate,variableMeasured,value"
Contributor

Please add measurementMethod as well if data is being aggregated.

Contributor Author

Added unit, measurementMethod and observationPeriod.

Labels

None yet

Projects

None yet

4 participants