5 Years of GOSAT-2 Retrievals with RemoTeC: XCO₂ and XCH₄ Data Products with Quality Filtering by Machine Learning

Barr, Andrew Gerald; Landgraf, Jochen; Martinez-Velarte, Mari; Vrekoussis, Mihalis; Sussmann, Ralf; Morino, Isamu; Strong, Kimberly; Zhou, Minqiang; Velazco, Voltaire A.; Ohyama, Hirofumi; Warneke, Thorsten; Hase, Frank; Borsdorff, Tobias

doi:https://doi.org/10.5194/egusphere-2024-3990

Preprints

24 Apr 2025

Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Abstract. Accurately measuring greenhouse gas concentrations to identify regional sources and sinks is essential for effectively monitoring and mitigating their impact on the Earth’s changing climate. In this article we present the scientific data products of XCO₂ and XCH₄, retrieved with RemoTeC, from the Greenhouse Gases Observing Satellite-2 (GOSAT-2), which span a time range of five years. GOSAT-2 can measure total columns of CO₂ and CH₄ to the requirements set by the Global Climate Observing System (GCOS), which defines them as an accuracy of < 10 ppb and < 0.5 ppm for XCH₄ and XCO₂ respectively, and a stability of < 3 ppb yr⁻¹ and < 0.5 ppm yr⁻¹ for XCH₄ and XCO₂ respectively.

Central to the quality of the XCO₂ and XCH₄ datasets is the post-retrieval quality flagging step. Previous versions of RemoTeC products have relied on threshold filtering, flagging data using boundary conditions from a list of retrieval parameters. We present a novel quality filtering approach utilising a machine learning technique known as Random Forest Classifier (RFC) models. This method is developed under the European Space Agency’s (ESA) Climate Change Initiative+ (CCI+) program and applied to data from GOSAT-2. Data from the Total Carbon Column Observing Network (TCCON) are employed to train the RFC models, where retrievals are categorized as good or bad quality based on the bias between GOSAT-2 and TCCON measurements. TCCON is a global network of Fourier transform spectrometers that measure telluric absorption spectra at infrared wavelengths. It serves as the scientific community’s standard for validating satellite-derived XCO₂ and XCH₄ data. Our results demonstrate that the machine learning-based quality filtering achieves a significant improvement, with data yield increasing by up to 85 % and RMSE improving by up to 30 %, compared to traditional threshold-based filtering. Furthermore, inter-comparison with the TROPOspheric Monitoring Instrument (TROPOMI) indicates that the quality filtering RFC models generalise well to the full dataset, as the expected behaviour is reproduced on a global scale.
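
The labelling and evaluation steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `bias_threshold` value and the toy numbers are all assumptions made for this example, and the Random Forest itself (trained on retrieval parameters) is not reproduced. In the approach described above the good/bad labels would serve as training targets for the classifier; here they are applied directly, only to show how data yield and RMSE are computed for a given filter.

```python
import math

def label_retrievals(sat, tccon, bias_threshold):
    """Label each co-located retrieval good (True) or bad (False).

    A retrieval counts as 'good' when |satellite - TCCON| <= bias_threshold.
    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return [abs(s - t) <= bias_threshold for s, t in zip(sat, tccon)]

def rmse(sat, tccon, keep):
    """RMSE of (satellite - TCCON) over the retrievals that pass the filter."""
    diffs = [s - t for s, t, k in zip(sat, tccon, keep) if k]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def data_yield(keep):
    """Fraction of retrievals that pass the filter."""
    return sum(keep) / len(keep)

# Toy XCH4 values in ppb, made up for illustration only.
sat_xch4 = [1850.0, 1862.0, 1900.0, 1848.0, 1871.0]
tccon_xch4 = [1852.0, 1860.0, 1860.0, 1850.0, 1869.0]

labels = label_retrievals(sat_xch4, tccon_xch4, bias_threshold=15.0)
print(labels, data_yield(labels), rmse(sat_xch4, tccon_xch4, labels))
```

Comparing two filters then amounts to evaluating `data_yield` and `rmse` for each filter's keep-mask on the same co-located sample.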

Low systematic biases are essential for extracting meaningful fluxes from satellite data products. Through TCCON validation we find that all data products are within the breakthrough bias requirements set, with RMSE for XCH₄ <15 ppb and XCO₂ <2 ppm. We derive station-to-station biases of 4.2 ppb and 0.5 ppm for XCH₄ and XCO₂ respectively, and linear drift of 0.6 ppb yr⁻¹ and 0.2 ppm yr⁻¹ for XCH₄ and XCO₂ respectively.
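
The two summary statistics quoted above can be illustrated with a short sketch. It assumes that the station-to-station bias is the standard deviation of the per-station mean biases and that the linear drift is the least-squares slope of the bias against time. Both are common conventions, but the abstract does not state the exact definitions used, so treat them as assumptions. The function names and toy values are hypothetical.

```python
import statistics

def station_to_station_bias(bias_by_station):
    """Sample standard deviation of the per-station mean biases (assumed definition)."""
    station_means = [statistics.mean(b) for b in bias_by_station.values()]
    return statistics.stdev(station_means)

def linear_drift(years, biases):
    """Least-squares slope of bias against time, in bias units per year."""
    t_mean = statistics.mean(years)
    b_mean = statistics.mean(biases)
    num = sum((t - t_mean) * (b - b_mean) for t, b in zip(years, biases))
    den = sum((t - t_mean) ** 2 for t in years)
    return num / den

# Toy satellite-minus-TCCON biases in ppb, grouped by (hypothetical) station.
bias_by_station = {"A": [1.0, 3.0], "B": [5.0, 7.0], "C": [-1.0, -3.0]}
spread = station_to_station_bias(bias_by_station)

# Toy yearly mean biases in ppb.
years = [2019, 2020, 2021, 2022, 2023]
yearly_bias = [0.0, 0.5, 1.0, 1.5, 2.0]
drift = linear_drift(years, yearly_bias)
print(spread, drift)
```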

For XCH₄, GOSAT-2 and TROPOMI are highly correlated with standard deviations less than 18 ppb and globally averaged biases close to 0 ppb. The inter-satellite bias between GOSAT and GOSAT-2 is significant, with an average global bias of -15 ppb. This is comparable to that seen between GOSAT and TROPOMI, consistent with our findings that GOSAT-2 and TROPOMI are in close agreement.
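
The inter-satellite statistics mentioned above (mean bias, standard deviation of the differences, correlation) can be computed as in the sketch below. It assumes the soundings from the two instruments have already been paired; the co-location criteria are not given in the abstract and are not modelled here. The function name and the toy values are hypothetical.

```python
import math
import statistics

def compare_collocated(a, b):
    """Return (mean bias, std of differences, Pearson r) for paired values a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    return statistics.mean(diffs), statistics.stdev(diffs), r

# Toy paired XCH4 values in ppb, made up for illustration only.
gosat2 = [1850.0, 1860.0, 1870.0, 1880.0]
tropomi = [1852.0, 1858.0, 1872.0, 1878.0]

bias, spread, r = compare_collocated(gosat2, tropomi)
print(bias, spread, r)
```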

Received: 18 Dec 2024 – Discussion started: 24 Apr 2025

Competing interests: Three of the co-authors are members of the editorial board for Atmospheric Measurement Techniques in the subject area of Gases.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 29 Jul 2025)

RC1: 'Comment on egusphere-2024-3990', Robert Parker, 02 Jul 2025

Review of Barr et al., 5 Years of GOSAT-2 Retrievals with RemoTeC: XCO₂ and XCH₄ Data Products with Quality Filtering by Machine Learning
Here the authors present new GOSAT-2 datasets based on the RemoTeC algorithm for XCO₂ and XCH₄. They also deploy a new method for post-retrieval quality filtering that substantially improves the data yield.
This is a very nice study that should help advance how we handle the data quality of these sorts of retrievals in the future, with the method generally applicable to future missions. Overall, I recommend this manuscript for publication after addressing the following comments. These are ordered by line number and the majority just seek additional clarification, justification or discussion to be added to the text.
L12 - Do TCCON/GOSAT-2 co-locations sample an adequate range of geophysical parameters (aerosols, albedos, etc) to produce a robust post-filter? Many approaches use multiple “truth metrics” (e.g. TCCON, models, small area approximation, etc). You deliberately include some high-albedo cases into your training data to compensate for this but do I understand correctly that you do not have TCCON co-locations for these? How confident are you that this data is not biased differently?
L24 - Why do GOSAT and GOSAT-2 disagree (with the same algorithm and priors?) compared to TROPOMI?
L31 - Some minor typos should be corrected, e.g. “anomoly”.
L72 - Is the stated TCCON performance (L72) relevant for GGG2020 or does it relate to older GGG2014 data?
L83 - Rather than footnotes for the ESA CCI documents (L83/84), can they please be cited fully in the bibliography?
L87 - The instrument line shape for GOSAT-2 potentially has caused some issues with other retrieval algorithms. Could you please elaborate on its usage here (L87) and whether, if at all, you do anything to compensate for these (e.g. fitting a shift/stretch). I would also strongly recommend including a figure of a typical spectral residual for each species so the quality of the fit can be shown.
L147 – Would it be possible to outline your full state vector (or explicitly link back to section/table in previous work where this is fully described)?
Table 1 – Can it please be made explicit in the table which of these do not apply to the Proxy (i.e. 6-8?).
L157 - Can you specify which TCCON version is used for this study (and ideally the temporal extent of the different TCCON datasets)?
L158 – Given that the XCH₄ FP and Proxy (post-bias correction) have quite different biases, can you also compare the non-bias corrected data? Is this difference in bias coming from the original data or from the correction?
L201 - Can you elaborate further how the value of X_T is decided upon?
L211 - Does taking the different filtering approaches for land vs glint (L211) lead to significant differences in the sampling statistics? Could this lead to ocean/land biases in the final data? It would also be good to see how the data looks for a few single orbits that pass over both ocean/land.
L217 – How true is that assumption, and is it robust to issues such as sensor degradation (that we know GOSAT/GOSAT-2 can suffer from), which have a strong temporal component?
L237 – This may not be easily possible but it would be very interesting to know whether the models for the different years are themselves similar. They all clearly give consistent results, but are the models compensating for different things in different years, to varying degrees? Some examination of the potential explainability of these models would be great but may not be possible.
L244 – This seems to be saying that as there’s typically much less “bad” Proxy data, it is harder to identify the bad cases as they stand out less. Could you evaluate the model CO₂ used in the proxy in a similar way here to separate the components?
L259 – Can you elaborate on why XCH₄ and XCO₂ FP (from the same retrieval?) have different yields?
L269 – Lots of mentions of TCCON prior to defining that it’s GGG2020 that is used. I’d mention this sooner.
Figure 3 – Am I correct that this mixes together land and glint data? Can you separate the two out for some sites to better understand any ocean/land bias in the data?
L304 – Is this a fair comparison between GOSAT and GOSAT-2 when you have applied this new post-filtering and hence can define/tune the RMSE for GOSAT-2?
L316 – Minor grammar issue “by in the ratio”
L337 – How are you matching GOSAT to GOSAT-2 data? i.e. What criteria do you use to find co-located soundings?
Figure 7 – Can you show similar maps for the other 2 products? Maybe in an appendix?
L381 – Can you outline how you match GOSAT-2 to TROPOMI?
L387 – I don’t quite understand this point about the bias remaining “effectively constant”. The bias increases with QA value doesn’t it? (i.e. from -4.6 to -6.3 ppb). Can you also comment on why the proxy bias seems to systematically decrease with increasing QA?
L390 – typo (missing word after Northern?)

Citation: https://doi.org/10.5194/egusphere-2024-3990-RC1

Viewed

Total article views: 279 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
229	39	11	279	13	19

Views and downloads (calculated since 24 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	84	10	3	97
May 2025	66	12	2	80
Jun 2025	58	11	2	71
Jul 2025	21	6	4	31

Cumulative views and downloads (calculated since 24 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	84	10	3	97
May 2025	150	22	5	177
Jun 2025	208	33	7	248
Jul 2025	229	39	11	279

Viewed (geographical distribution)

Total article views: 268 (including HTML, PDF, and XML), of which 268 have a defined geography and 0 are of unknown origin.

Latest update: 07 Jul 2025

Short summary

In 2019 GOSAT-2 was launched, to realise the second in a series of satellites dedicated to measuring concentrations of greenhouse gases from space. The datasets obtained from GOSAT-2 are used in the Copernicus atmospheric services to monitor the climate, in light of the Paris Agreement. Over the five years the increase of CH₄ and CO₂ concentration in the atmosphere is clear. Here we present three robust datasets from GOSAT-2, including a novel machine learning approach to data quality filtering.
