Data Assimilation Networks|Pierre Boudier, Anthony Fillion, Serge Gratton et al.|Journal of Advances in Modeling Earth Systems|2023|Data assimilation aims at estimating the posterior conditional probability density functions based on error statistics of the noisy observations and the dynamical system. State‐of‐the‐art methods are sub‐optimal due to the common use of Gaussian error statistics and the linearization of the non‐linear dynamics. To achieve good performance, these methods often require case‐by‐case fine‐tuning using explicit regularization techniques such as inflation and localization. In this paper, we propose a fully data‐driven deep learning framework generalizing recurrent Elman networks and data assimilation algorithms. Our approach approximates a sequence of prior and posterior densities conditioned on noisy observations using a log‐likelihood cost function. By construction, our approach can then be used for general nonlinear dynamics and non‐Gaussian densities. As a first step, we evaluate the performance of the proposed approach on fully and partially observed Lorenz‐95 systems, in which the outputs of the recurrent network are fitted to Gaussian densities. We numerically show that our approach, without using any explicit regularization technique, achieves performance comparable to the state‐of‐the‐art methods, IEnKF‐Q and LETKF, across various ensemble sizes.
Quasi-static ensemble variational data assimilation: a theoretical and numerical study with the iterative ensemble Kalman smoother|Anthony Fillion, Marc Bocquet, Serge Gratton|Nonlinear Processes in Geophysics|2018|The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss–Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations into the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.
An Iterative Ensemble Kalman Smoother in Presence of Additive Model Error|Anthony Fillion, Marc Bocquet, Serge Gratton et al.|SIAM/ASA Journal on Uncertainty Quantification|2020|Ensemble variational methods are being increasingly used in the field of geophysical data assimilation. Their efficiency comes from the combined use of ensembles, which provide statistics estimates, and a variational analysis, which handles nonlinear operators through iterative optimization techniques. Taking model error into account in four-dimensional ensemble variational algorithms is challenging because the state trajectory over the data assimilation window (DAW) is no longer determined by its sole initial condition. In particular, the control variable dimension scales with the DAW length, which yields a high numerical complexity. This is unfortunate since accuracy improvement is expected with longer DAWs. Building upon the work of [P. Sakov and M. Bocquet, Tellus A, 70 (2018), 1414545], this paper discusses how to algorithmically construct and numerically test an iterative ensemble Kalman smoother with additive model error (IEnKS-Q), which is thought to be the natural weak constraint generalization of the IEnKS [M. Bocquet and P. Sakov, Quart. J. Roy. Meteorol. Soc., 140 (2014), pp. 1521–1535], as well as the generalization of IEnKF-Q [P. Sakov, J. Haussaire, and M. Bocquet, Quart. J. Roy. Meteorol. Soc., 144 (2018), pp. 1297–1309] to general DAWs. The number of model evaluations per cycle of the IEnKS-Q is also examined. Solutions based on perturbation decomposition are proposed to dissociate those numerically costly evaluations from the control variable dimension.
Latent space data assimilation by using deep learning|Mathis Peyron, Anthony Fillion, Selime Gürol et al.|Quarterly Journal of the Royal Meteorological Society|2021|Performing data assimilation (DA) at low cost is of prime concern in Earth system modeling, particularly in the era of Big Data, where huge quantities of observations are available. Capitalizing on the ability of neural network techniques to approximate the solution of partial differential equations (PDEs), we incorporate deep learning (DL) methods into a DA framework. More precisely, we exploit the latent structure provided by autoencoders (AEs) to design an ensemble transform Kalman filter with model error (ETKF‐Q) in the latent space. Model dynamics are also propagated within the latent space via a surrogate neural network. This novel ETKF‐Q‐Latent (ETKF‐Q‐L) algorithm is tested on a tailored instructional version of the Lorenz 96 equations, named the augmented Lorenz 96 system, which possesses a latent structure that accurately represents the observed dynamics. Numerical experiments based on this particular system show that the ETKF‐Q‐L approach both reduces the computational cost and provides better accuracy than state‐of‐the‐art algorithms such as the ETKF‐Q.
DAN - An optimal Data Assimilation framework based on machine learning Recurrent Networks.|Pierre Boudier, Anthony Fillion, Serge Gratton et al.|arXiv (Cornell University)|2020|Data assimilation algorithms aim at forecasting the state of a dynamical system by combining a mathematical representation of the system with noisy observations thereof. We propose a fully data-driven deep learning architecture generalizing recurrent Elman networks and data assimilation algorithms, which provably reaches the same prediction goals as the latter. In numerical experiments based on the well-known Lorenz system, and when suitably trained using snapshots of the system trajectory (i.e. batches of state trajectories) and observations, our architecture successfully reconstructs both the analysis and the propagation of probability density functions of the system state at a given time conditioned on past observations.