This section collects the atmospheric datasets produced by the Complete Data Fusion (CDF) algorithm — the output products of the fusion process. Each CDF dataset is a new atmospheric product obtained by fusing two or more tested input datasets using the CDF algorithm, inheriting and improving upon the information content of its sources.
A CDF-fused product is itself a full optimal-estimation (OE) product: it comes with its own state vector, averaging kernel (AK) matrix, total error variance-covariance matrix (VCM), and a priori information. It can therefore be subjected to the same quality tests applied to the input datasets (auto-consistency, completeness), and can in principle be used as input to further fusion steps. Unlike the input datasets, however, a CDF dataset must also demonstrate, through independent validation, that the fusion has produced a genuine improvement over the individual inputs and is not merely a formal combination.
Central goal of the EMM project. The production and public release of validated CDF datasets is the primary deliverable of the EMM project. The Tested Datasets section documents what goes into the CDF algorithm and verifies that the inputs are suitable; this section documents what comes out — the fused products, their quality characterization, and their validation against independent measurements. The two sections are designed to be read together, providing a complete, traceable chain from raw satellite products to fused atmospheric data.
What defines a CDF Dataset
Each CDF dataset is uniquely identified by:
- Input combination — the specific set of satellite instruments and retrieval products that were fused (e.g., MIPAS + IASI/FORLI)
- Atmospheric constituent — the target variable of the fused product (e.g., ozone, temperature)
- CDF configuration — the algorithm version (CDF(2022) or CDF(2015)), the reference vertical grid and a priori, and the strategy adopted for interpolation and coincidence errors
- Temporal and spatial coverage — the period, geographic extent, and spatio-temporal resolution of the production
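As an illustration only, the identifying items listed above could be held in a small immutable record. The class name `CDFDatasetId`, its field names, and the `label` helper are hypothetical and not part of any EMM software. The record is deliberately simplified: it omits the reference grid, the a priori, and the error strategies, and the algorithm version shown is an assumed value.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CDFDatasetId:
    """Hypothetical, simplified record of what identifies a CDF dataset."""
    inputs: Tuple[str, ...]   # input combination, e.g. ("MIPAS/IFAC", "IASI/AERIS")
    constituent: str          # target variable, e.g. "O3"
    cdf_version: str          # "CDF(2022)" or "CDF(2015)"
    period: Tuple[int, int]   # first and last year of the production

    def label(self) -> str:
        """Build a short label from instrument names and constituent."""
        instruments = "+".join(name.split("/")[0] for name in self.inputs)
        return f"{instruments} {self.constituent}"


first = CDFDatasetId(
    inputs=("MIPAS/IFAC", "IASI/AERIS"),
    constituent="O3",
    cdf_version="CDF(2022)",  # assumed for illustration only
    period=(2008, 2011),
)
print(first.label())  # MIPAS+IASI O3
```

Because the record is frozen, two datasets with the same inputs, constituent, version, and period compare equal, which makes it usable as a dictionary key in an index.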
For each dataset, the improvement introduced by the fusion is quantified through two complementary approaches:
- Internal characterization — comparison of the fused product’s degrees of freedom (DOFs), averaging kernel diagonal, and total error profiles against those of the individual input datasets. A successful fusion must yield higher DOFs and lower total errors.
- Independent validation — comparison of the fused product against reference measurements not used in the fusion (e.g., ozonesondes, ground-based lidars, independent satellite products). The validation must demonstrate that the improvement predicted by the internal characterization translates into a real reduction in bias and/or variability.
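The internal characterization can be sketched in a few lines, assuming that the averaging kernel and total error covariance matrices of the fused product and of each input are available on the same vertical grid. The DOFs are taken as the trace of the averaging kernel matrix and the total error profile as the square root of the covariance diagonal. The function names, the strict level-by-level comparison, and the toy matrices are assumptions made for illustration.

```python
import numpy as np


def dofs(ak: np.ndarray) -> float:
    """Degrees of freedom: trace of the averaging kernel matrix."""
    return float(np.trace(ak))


def total_error(cov: np.ndarray) -> np.ndarray:
    """Total error profile: square root of the covariance diagonal."""
    return np.sqrt(np.diag(cov))


def fusion_improves(ak_fused, cov_fused, inputs) -> bool:
    """True if the fused product has higher DOFs and lower total error
    at every level than each (ak, cov) pair in `inputs`."""
    for ak_in, cov_in in inputs:
        if dofs(ak_fused) <= dofs(ak_in):
            return False
        if np.any(total_error(cov_fused) >= total_error(cov_in)):
            return False
    return True


# Toy three-level example (made-up numbers, not real instrument data).
ak_a, cov_a = np.diag([0.6, 0.3, 0.1]), np.diag([4.0, 9.0, 16.0])
ak_b, cov_b = np.diag([0.1, 0.4, 0.5]), np.diag([16.0, 9.0, 4.0])
ak_f, cov_f = np.diag([0.65, 0.55, 0.55]), np.diag([3.24, 4.0, 3.24])

print(fusion_improves(ak_f, cov_f, [(ak_a, cov_a), (ak_b, cov_b)]))  # True
```

The comparison is only meaningful when all products are expressed on a common grid and a priori.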
Template for CDF Dataset documentation
Each CDF dataset is documented in a dedicated child page following a standardized template. The template mirrors and extends the structure used for the input datasets, adding the sections specific to fused products (fusion configuration, improvement characterization, validation). The sections are:
| § | Section | Content |
|---|---|---|
| 1 | General information | Constituent, input instruments, observation period, geographic coverage, spatial and temporal resolution |
| 2 | Input datasets | Links to the Tested Datasets pages for each input product, with a summary of their auto-consistency test results. This provides full traceability of the fused product back to its sources. |
| 3 | CDF configuration | Algorithm version (CDF(2022) / CDF(2015)), reference vertical grid and a priori, coincidence criteria (spatial and temporal windows), interpolation error model, coincidence error model, Gram–Schmidt basis expansion (if applicable) |
| 4 | Product characterization | Averaging kernel diagonal, DOFs, total error profiles — shown alongside the corresponding quantities of the input datasets to demonstrate the improvement introduced by the fusion |
| 5 | Auto-consistency test | The fused product is itself a full OE product: the same auto-consistency tests applied to the input datasets are applied to the fused product to verify its internal coherence |
| 6 | Independent validation | Comparison with reference measurements not used in the fusion (ozonesondes, ground-based instruments, independent satellite products). Bias profiles, standard deviation, correlation statistics — shown for both the fused product and the individual inputs, to quantify the added value of the fusion |
| 7 | Data access and citation | FAIR references: persistent identifier (DOI), download URL, data format, licence, recommended citation. The goal is to make all CDF datasets publicly available for the scientific community. |
| 8 | Related work | Links to the pilot study that preceded the production, to the published paper describing the dataset (if available), and to the bibliography |
Connection to the input datasets. The template is deliberately structured to echo the Tested Datasets pages: the same tests that verify the quality of the inputs are also applied to the outputs, ensuring a consistent quality framework throughout the entire CDF processing chain. The key additions in the CDF Dataset template are §3 (fusion configuration, which has no counterpart in the input template) and §6 (independent validation, which goes beyond the auto-consistency tests used for inputs).
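The statistics listed for §6 of the template (bias, standard deviation, correlation) can be computed per vertical level as in the sketch below. It assumes that product and reference profiles have already been matched in space and time and interpolated to a common vertical grid. The function name `validation_stats` and the synthetic data are illustrative assumptions, not real validation results.

```python
import numpy as np


def validation_stats(product: np.ndarray, reference: np.ndarray) -> dict:
    """Per-level bias, standard deviation of the differences, and
    Pearson correlation between product and reference.

    Both arrays have shape (n_coincidences, n_levels).
    """
    diff = product - reference
    p = product - product.mean(axis=0)
    r = reference - reference.mean(axis=0)
    corr = (p * r).sum(axis=0) / np.sqrt((p**2).sum(axis=0) * (r**2).sum(axis=0))
    return {
        "bias": diff.mean(axis=0),
        "std": diff.std(axis=0, ddof=1),
        "corr": corr,
    }


# Synthetic example: 200 coincidences on 4 levels (made-up values).
rng = np.random.default_rng(0)
reference = rng.normal(50.0, 5.0, size=(200, 4))
single = reference + 2.0 + rng.normal(0.0, 3.0, size=(200, 4))  # biased, noisy input
fused = reference + rng.normal(0.0, 1.0, size=(200, 4))         # less noisy product

stats_single = validation_stats(single, reference)
stats_fused = validation_stats(fused, reference)
print(np.all(stats_fused["std"] < stats_single["std"]))
```

Running the same function on the fused product and on each input gives directly comparable profiles.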
Dataset index
| CDF Dataset | Input instruments | Constituent | Period | Reference | Status |
|---|---|---|---|---|---|
| MIPAS+IASI O3 | MIPAS/IFAC (limb) + IASI/AERIS (nadir TIR) | O3 | 2008–2011 | Guidetti et al. (2026) | In production |
| MIPAS+GOME-2 O3 | MIPAS/IFAC (limb) + GOME-2/AC-SAF (nadir UV) | O3 | 2008–2011 | — | Planned |
| MIPAS+IASI+GOME-2 O3 | MIPAS/IFAC + IASI/AERIS + GOME-2/AC-SAF | O3 | 2008–2011 | — | Future |
Legend — status: Published = dataset produced, validated, and publicly available with DOI · In production = dataset being produced, publication in progress · Planned = dedicated tuning and validation study required before production · Future = target combination identified, feasibility demonstrated in pilot studies.
MIPAS+IASI O3 — first CDF dataset
The first CDF dataset produced from real satellite observations combines MIPAS (limb, Envisat) and IASI (nadir TIR, Metop) ozone profiles over the period 2008–2011. The dataset was developed and validated in the framework of L. Guidetti’s PhD project and is described in Guidetti et al. (2026).
The key scientific result is that the contribution of MIPAS improves the quality of the IASI product even in the troposphere, where MIPAS itself does not measure — a direct demonstration of the information propagation mechanism in the CDF framework. The fused product also enables the detection and characterization of stratospheric ozone intrusions that are not resolved by either instrument individually.
Traceability. The input MIPAS dataset is documented in the MIPAS/IFAC tested dataset page; the input IASI dataset in the IASI/AERIS page. Both pass the CDF(2022) auto-consistency test. The exploratory studies that preceded this production are described in the Pilot Studies section.
Planned and future datasets
MIPAS+GOME-2 O3
The exploratory characterization carried out in the pilot studies has shown that the MIPAS+GOME-2 fusion yields DOFs and error reduction comparable to MIPAS+IASI when both are expressed on the same grid and a priori. However, a dedicated tuning study is required before production: the coincidence and interpolation error strategies must be optimized specifically for the MIPAS+GOME-2 combination, and the fused product must be validated against independent reference measurements (ozonesondes) with the same rigour applied to the MIPAS+IASI dataset.
MIPAS+IASI+GOME-2 O3 — three-instrument gridded product
The combination of all three instruments — MIPAS (limb), IASI (nadir TIR), and GOME-2 (nadir UV) — on a regular 1°×1° grid over the full 2008–2011 period would represent the most comprehensive CDF ozone product achievable from these missions. The feasibility of this combination has been demonstrated in the pilot studies, and daily coverage analysis shows that the three instruments together provide dense spatial sampling. This dataset would combine MIPAS’s vertical resolution in the stratosphere with the complementary tropospheric sensitivity of IASI and GOME-2, and the dense horizontal coverage of the nadir instruments. Its production requires both the MIPAS+IASI and MIPAS+GOME-2 fusion chains to be individually validated and optimized.
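A daily coverage check on a regular 1°×1° grid can be approximated by counting the cells that contain at least one sounding. The sketch below is only an illustration of that idea, not the analysis used in the pilot studies. It assumes latitudes in [-90, 90] and longitudes in [-180, 180], and the function name `coverage_fraction` is hypothetical.

```python
import numpy as np


def coverage_fraction(lat: np.ndarray, lon: np.ndarray, step: float = 1.0) -> float:
    """Fraction of cells of a regular lat/lon grid with at least one sounding."""
    n_lat = int(round(180 / step))
    n_lon = int(round(360 / step))
    # Cell indices; clip so that lat = +90 and lon = +180 fall in the last cell.
    i = np.clip(np.floor((lat + 90.0) / step).astype(int), 0, n_lat - 1)
    j = np.clip(np.floor((lon + 180.0) / step).astype(int), 0, n_lon - 1)
    covered = np.zeros((n_lat, n_lon), dtype=bool)
    covered[i, j] = True
    return float(covered.mean())


# Three soundings, two of which fall in the same 1°×1° cell.
lat = np.array([0.5, 0.7, -45.2])
lon = np.array([10.2, 10.9, 100.0])
frac = coverage_fraction(lat, lon)
print(frac * 180 * 360)  # number of covered cells
```

The fraction counts cells equally. An area-weighted version would weight each cell by the cosine of its central latitude.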
Guiding principles
The production and documentation of CDF datasets follow these principles:
- Full traceability — every fused product is linked back to its input datasets, which are independently tested and documented in the Tested Datasets section. The fusion configuration (algorithm version, grid, a priori, error strategies) is recorded in detail.
- Self-consistency — the fused product is treated as a first-class OE product and subjected to the same auto-consistency tests used for input datasets. A fused product that fails its own auto-consistency test signals a problem in the fusion process.
- Independent validation — internal characterization (DOFs, errors) is necessary but not sufficient. Each dataset must be validated against reference measurements that were not part of the fusion, and the validation must cover both the tuning period and an independent temporal segment.
- FAIR data — all validated CDF datasets will be published with persistent identifiers (DOIs), open access, standardized metadata, and recommended citations, following the FAIR principles for scientific data.
- Reusability — because CDF-fused products carry full OE characterization (state vector, AK, VCM, a priori), they can be ingested by data assimilation systems, used as input to further CDF steps, or compared with model output using standard AK-smoothing techniques.
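The AK-smoothing mentioned under reusability is commonly written as x_s = x_a + A (x_m - x_a), where x_m is the model profile, x_a the a priori profile, and A the averaging kernel matrix of the product. A minimal sketch follows, assuming the model profile has already been interpolated to the vertical grid of the fused product. The function name and the numbers are illustrative.

```python
import numpy as np


def ak_smooth(x_model: np.ndarray, x_a: np.ndarray, ak: np.ndarray) -> np.ndarray:
    """Smooth a model profile with the averaging kernel of a retrieved or
    fused product: x_a + A (x_model - x_a)."""
    return x_a + ak @ (x_model - x_a)


# Toy three-level example (made-up numbers).
x_a = np.array([5.0, 5.0, 5.0])
x_model = np.array([4.0, 6.0, 8.0])
ak = 0.5 * np.eye(3)

smoothed = ak_smooth(x_model, x_a, ak)
print(smoothed)  # [4.5 5.5 6.5]
```

With an identity averaging kernel the model profile is returned unchanged, and with a zero averaging kernel the result collapses to the a priori, which gives two quick sanity checks for any implementation.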
Related pages
- Tested Datasets — the input datasets used by the CDF algorithm, with completeness and auto-consistency test results
- Pilot Studies — exploratory fusion experiments on real data that precede systematic production
- CDF Algorithm — mathematical formulation, prerequisites, and test descriptions
- CDF Tests — auto-consistency and mono-type fusion test descriptions
- Bibliography — annotated references for the CDF algorithm and its applications
