the Creative Commons Attribution 4.0 License.
5 Years of GOSAT-2 Retrievals with RemoTeC: XCO2 and XCH4 Data Products with Quality Filtering by Machine Learning
Abstract. Accurately measuring greenhouse gas concentrations to identify regional sources and sinks is essential for effectively monitoring and mitigating their impact on the Earth’s changing climate. In this article we present the scientific data products of XCO2 and XCH4, retrieved with RemoTeC from the Greenhouse Gases Observing Satellite-2 (GOSAT-2), which span a time range of five years. GOSAT-2 can measure total columns of CO2 and CH4 to the requirements set by the Global Climate Observing System (GCOS): accuracy of < 10 ppb for XCH4 and < 0.5 ppm for XCO2, and stability of < 3 ppb yr−1 for XCH4 and < 0.5 ppm yr−1 for XCO2.
Central to the quality of the XCO2 and XCH4 datasets is the post-retrieval quality flagging step. Previous versions of RemoTeC products have relied on threshold filtering, flagging data using boundary conditions from a list of retrieval parameters. We present a novel quality filtering approach utilising a machine learning technique known as Random Forest Classifier (RFC) models. This method is developed under the European Space Agency’s (ESA) Climate Change Initiative+ (CCI+) program and applied to data from GOSAT-2. Data from the Total Carbon Column Observing Network (TCCON) are employed to train the RFC models, where retrievals are categorized as good or bad quality based on the bias between GOSAT-2 and TCCON measurements. TCCON is a global network of Fourier transform spectrometers that measure telluric absorption spectra at infrared wavelengths. It serves as the scientific community’s standard for validating satellite-derived XCO2 and XCH4 data. Our results demonstrate that the machine learning-based quality filtering achieves a significant improvement, with data yield increasing by up to 85 % and RMSE improving by up to 30 %, compared to traditional threshold-based filtering. Furthermore, inter-comparison with the TROPOspheric Monitoring Instrument (TROPOMI) indicates that the quality filtering RFC models generalise well to the full dataset, as the expected behaviour is reproduced on a global scale.
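The filtering idea described above can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestClassifier` and entirely synthetic data; the three retrieval parameters, the bias model, and the 15 ppb labelling threshold are hypothetical choices made for illustration and are not taken from the RemoTeC processing chain.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Hypothetical retrieval parameters for co-located soundings (synthetic values).
features = np.column_stack([
    rng.uniform(0.5, 3.0, n),    # spectral fit chi-squared
    rng.uniform(0.0, 0.6, n),    # retrieved aerosol optical depth
    rng.uniform(0.05, 0.6, n),   # surface albedo
])

# Synthetic satellite-minus-TCCON XCH4 difference in ppb (assumed bias model).
bias_ppb = (8.0 * (features[:, 0] - 1.0)
            + 30.0 * features[:, 1]
            + rng.normal(0.0, 5.0, n))

# Label each sounding good (1) or bad (0) from its bias against the reference.
BIAS_THRESHOLD_PPB = 15.0  # assumed threshold, not the value used in the paper
labels = (np.abs(bias_ppb) < BIAS_THRESHOLD_PPB).astype(int)

# Train on the first 1500 soundings and hold out the remaining 500.
train = np.arange(n) < 1500
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(features[train], labels[train])

# Apply the classifier as a quality filter on the held-out soundings.
predicted_good = model.predict(features[~train]).astype(bool)


def rmse(values):
    """Root-mean-square of an array of differences."""
    return float(np.sqrt(np.mean(values ** 2)))


rmse_all = rmse(bias_ppb[~train])
rmse_filtered = rmse(bias_ppb[~train][predicted_good])
data_yield = float(predicted_good.mean())

print(f"yield={data_yield:.2f} rmse_all={rmse_all:.1f} rmse_filtered={rmse_filtered:.1f}")
```

In this toy setup the classifier only sees retrieval parameters at prediction time, so it can be applied to soundings that have no reference measurement nearby, which is the property that lets a TCCON-trained filter be used on the global dataset.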
Low systematic biases are essential for extracting meaningful fluxes from satellite data products. Through TCCON validation we find that all data products are within the breakthrough bias requirements set, with RMSE for XCH4 <15 ppb and XCO2 <2 ppm. We derive station-to-station biases of 4.2 ppb and 0.5 ppm for XCH4 and XCO2 respectively, and linear drift of 0.6 ppb yr−1 and 0.2 ppm yr−1 for XCH4 and XCO2 respectively.
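As a rough illustration of the validation statistics quoted above, the sketch below computes an RMSE, a station-to-station bias and a linear drift from a few made-up co-location records. The definitions are assumptions: the station-to-station bias is taken here as the standard deviation of the per-station mean differences, and the drift as the least-squares slope of the differences against time. The station names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical records per station: (decimal year, satellite minus TCCON XCH4 in ppb).
colocations = {
    "station_a": [(2019.5, 3.0), (2020.5, 4.0), (2021.5, 5.0)],
    "station_b": [(2019.5, -2.0), (2020.5, -1.0), (2021.5, 0.0)],
    "station_c": [(2019.5, 0.0), (2020.5, 1.0), (2021.5, 2.0)],
}


def rmse(differences):
    """Root-mean-square of the satellite-minus-reference differences."""
    return sqrt(mean(d * d for d in differences))


def linear_drift(points):
    """Least-squares slope of difference against time (ppb per year)."""
    t_mean = mean(t for t, _ in points)
    d_mean = mean(d for _, d in points)
    numerator = sum((t - t_mean) * (d - d_mean) for t, d in points)
    denominator = sum((t - t_mean) ** 2 for t, _ in points)
    return numerator / denominator


all_points = [point for records in colocations.values() for point in records]
station_means = [mean(d for _, d in records) for records in colocations.values()]

overall_rmse = rmse([d for _, d in all_points])
station_to_station_bias = stdev(station_means)  # spread of per-station mean biases
drift_ppb_per_year = linear_drift(all_points)

print(overall_rmse, station_to_station_bias, drift_ppb_per_year)
```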
For XCH4, GOSAT-2 and TROPOMI are highly correlated with standard deviations less than 18 ppb and globally averaged biases close to 0 ppb. The inter-satellite bias between GOSAT and GOSAT-2 is significant, with an average global bias of -15 ppb. This is comparable to that seen between GOSAT and TROPOMI, consistent with our findings that GOSAT-2 and TROPOMI are in close agreement.
Competing interests: Three of the co-authors are members of the editorial board for Atmospheric Measurement Techniques in the subject area of Gases.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 29 Jul 2025)
RC1: 'Comment on egusphere-2024-3990', Robert Parker, 02 Jul 2025
Review of Barr et al., 5 Years of GOSAT-2 Retrievals with RemoTeC: XCO2 and XCH4 Data Products with Quality Filtering by Machine Learning
Here the authors present new GOSAT-2 datasets based on the RemoTeC algorithm for XCO2 and XCH4. They also deploy a new method for post-retrieval quality filtering that substantially improves the data yield.
This is a very nice study that should help advance how we handle the data quality of these sorts of retrievals in the future, with the method generally applicable to future missions. Overall, I recommend this manuscript for publication after addressing the following comments. These are ordered by line number and the majority just seek additional clarification, justification or discussion to be added to the text.
L12 - Do TCCON/GOSAT-2 co-locations sample an adequate range of geophysical parameters (aerosols, albedos, etc) to produce a robust post-filter? Many approaches use multiple “truth metrics” (e.g. TCCON, models, small area approximation, etc). You deliberately include some high-albedo cases into your training data to compensate for this but do I understand correctly that you do not have TCCON co-locations for these? How confident are you that this data is not biased differently?
L24 - Why do GOSAT and GOSAT-2 disagree (with the same algorithm and priors?) compared to TROPOMI?
L31 - Some minor typos should be corrected, e.g. “anomoly”.
L72 - Is the stated TCCON performance relevant for GGG2020 or does it relate to older GGG2014 data?
L83 - Rather than footnotes for the ESA CCI documents (L83/84), can they please be cited fully in the bibliography?
L87 - The instrument line shape for GOSAT-2 potentially has caused some issues with other retrieval algorithms. Could you please elaborate on its usage here (L87) and whether, if at all, you do anything to compensate for these (e.g. fitting a shift/stretch). I would also strongly recommend including a figure of a typical spectral residual for each species so the quality of the fit can be shown.
L147 – Would it be possible to outline your full state vector (or explicitly link back to section/table in previous work where this is fully described)?
Table 1 – Can it please be made explicit in the table which of these do not apply to the Proxy (i.e. 6-8?).
L157 - Can you specify which TCCON version is used for this study (and ideally the temporal extent of the different TCCON datasets)?
L158 – Given that the XCH4 FP and Proxy (post-bias correction) have quite different biases, can you also compare the non-bias corrected data? Is this difference in bias coming from the original data or from the correction?
L201 - Can you elaborate further how the value of XT is decided upon?
L211 - Does taking the different filtering approaches for land vs glint (L211) lead to significant differences in the sampling statistics? Could this lead to ocean/land biases in the final data? It would also be good to see how the data looks for a few single orbits that pass over both ocean/land.
L217 – How true is that assumption and is it robust to issues such as sensor degradation (that we know GOSAT/GOSAT-2 can suffer from) which have a strong temporal component?
L237 – This may not be easily possible but it would be very interesting to know whether the models for the different years were themselves all similar. They all clearly give consistent results but are the models compensating for different things in different years, to varying degrees? Some examination of the potential explainability of these models would be great but may not be possible.
L244 – This seems to be saying that as there’s typically much less “bad” Proxy data, it is harder to identify the bad cases as they stand out less. Could you evaluate the model CO2 used in the proxy in a similar way here to separate the components?
L259 – Can you elaborate on why XCH4 and XCO2 FP (from the same retrieval?) have different yields?
L269 – Lots of mentions of TCCON prior to defining that it’s GGG2020 that is used. I’d mention this sooner.
Figure 3 – Am I correct that this mixes together land and glint data? Can you separate the two out for some sites to better understand any ocean/land bias in the data?
L304 – Is this a fair comparison between GOSAT and GOSAT-2 when you have applied this new post-filtering and hence can define/tune the RMSE for GOSAT-2?
L316 – Minor grammar issue “by in the ratio”
L337 – How are you matching GOSAT to GOSAT-2 data? i.e. What criteria do you use to find co-located soundings?
Figure 7 – Can you show similar maps for the other 2 products? Maybe in an appendix?
L381 – Can you outline how you match GOSAT-2 to TROPOMI?
L387 – I don’t quite understand this point about the bias remaining “effectively constant”. The bias increases with QA value doesn’t it? (i.e. from -4.6 to -6.3 ppb). Can you also comment on why the proxy bias seems to systematically decrease with increasing QA?
L390 – typo (missing word after Northern?)
Citation: https://doi.org/10.5194/egusphere-2024-3990-RC1