Cristina Prieto

What Role Does Hydrological Science Play in the Age of Machine Learning?

Grey Nearing, Frederik Kratzert, Alden Keefe Sampson et al.|Water Resources Research|2020

Cited by 768 | Open Access

Abstract This paper is derived from a keynote talk given at Google's 2020 Flood Forecasting Meets Machine Learning Workshop. Recent experiments applying deep learning to rainfall‐runoff simulation indicate that there is significantly more information in large‐scale hydrological data sets than hydrologists have been able to translate into theory or models. While there is a growing interest in machine learning in the hydrological sciences community, in many ways, our community still holds deeply subjective and nonevidence‐based preferences for models based on a certain type of “process understanding” that has historically not translated into accurate theory, models, or predictions. This commentary is a call to action for the hydrology community to focus on developing a quantitative understanding of where and when hydrological process understanding is valuable in a modeling discipline increasingly dominated by machine learning. We offer some potential perspectives and preliminary examples about how this might be accomplished.

A Ranking of Hydrological Signatures Based on Their Predictability in Space

Nans Addor, Grey Nearing, Cristina Prieto et al.|Water Resources Research|2018

Cited by 356 | Open Access

Abstract Hydrological signatures are now used for a wide range of purposes, including catchment classification, process exploration, and hydrological model calibration. The recent boost in the popularity and number of signatures has however not been accompanied by the development of clear guidance on signature selection. Here we propose that exploring the predictability of signatures in space provides important insights into their drivers and their sensitivity to data uncertainties and is hence useful for signature selection. We use three complementary approaches to compare and rank 15 commonly used signatures, which we evaluate in 600+ U.S. catchments from the Catchment Attributes and MEteorology for Large‐sample Studies (CAMELS) data set. First, we employ machine learning (random forests) to explore how attributes characterizing the climatic conditions, topography, land cover, soil, and geology influence (or not) the signatures. Second, we use simulations of the Sacramento Soil Moisture Accounting model to benchmark the random forest predictions. Third, we take advantage of the large sample of CAMELS catchments to characterize the spatial autocorrelation (using Moran's I) of the signature field. These three approaches lead to remarkably similar rankings of the signatures. We show (i) that signatures with the noisiest spatial pattern tend to be poorly captured by hydrological simulations, (ii) that their relationships to catchment attributes are elusive (in particular they are not well explained by climatic indices), and (iii) that they are particularly sensitive to discharge uncertainties. We suggest that a better understanding of the drivers of hydrological signatures and a better characterization of their uncertainties would increase their value in hydrological studies.
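
The spatial autocorrelation statistic named in the abstract above, Moran's I, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the inverse-distance weighting scheme are assumptions chosen for illustration, and the abstract does not say which spatial weights were used.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one value per site and an n x n spatial weight matrix.

    I = (n / W) * sum_ij w_ij * z_i * z_j / sum_i z_i**2, where z = x - mean(x)
    and W is the sum of all weights.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    numerator = (w * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return (len(x) / w.sum()) * numerator / denominator

def inverse_distance_weights(coords):
    """Hypothetical weighting choice: w_ij = 1 / distance, with a zero diagonal."""
    c = np.asarray(coords, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = d > 0                          # skip self-pairs and coincident sites
    w[mask] = 1.0 / d[mask]
    return w
```

Values near +1 indicate a smooth spatial pattern, values near 0 a spatially random one, and negative values a checkerboard-like pattern. A signature with a noisy spatial field would therefore be expected to score closer to 0.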

Flow Prediction in Ungauged Catchments Using Probabilistic Random Forests Regionalization and New Statistical Adequacy Tests

Cristina Prieto, Nataliya Le Vine, Dmitri Kavetski et al.|Water Resources Research|2019

Cited by 118 | Open Access

Abstract Flow prediction in ungauged catchments is a major unresolved challenge in scientific and engineering hydrology. This study attacks the problem of prediction in ungauged catchments by exploiting advances in flow index selection and regionalization in Bayesian inference and by developing new statistical tests of model performance in ungauged catchments. First, an extensive set of available flow indices is reduced using principal component (PC) analysis to a compact orthogonal set of “flow index PCs.” These flow index PCs are regionalized under minimal assumptions using random forests regression augmented with a residual error model and used to condition hydrological model parameters using a Bayesian scheme. Second, “adequacy” tests are proposed to evaluate a priori the hydrological and regionalization model performance in the space of flow index PCs. The proposed regionalization approach is applied to 92 northern Spain catchments, with 16 catchments treated as ungauged. It is shown that (1) a small number of PCs capture approximately 87% of variability in the flow indices and (2) adequacy tests with respect to regionalized information are indicative of (but do not guarantee) the ability of a hydrological model to predict flow time series and are hence proposed as a prerequisite for flow prediction in ungauged catchments. The adequacy tests identify the regionalization of flow index PCs as adequate in 12 of 16 catchments but the hydrological model as adequate in only 1 of 16 catchments. Hence, a focus on improving hydrological model structure and input data (the effects of which are not disaggregated in this work) is recommended.
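
The first step described in the abstract above, reducing a set of flow indices to a few orthogonal PCs, can be sketched as follows. This is an illustrative sketch only, assuming the indices are standardized before the decomposition (the abstract does not state this) and that no index is constant across catchments. The function names are hypothetical, and the random forest regionalization and Bayesian conditioning steps are not shown.

```python
import numpy as np

def flow_index_pcs(indices):
    """PCA of a (catchments x flow indices) matrix via SVD.

    Returns the PC scores, the loadings (one PC per row), and the fraction
    of variance explained by each PC, in descending order.
    """
    X = np.asarray(indices, dtype=float)
    # Standardize each index (assumption: every column has nonzero variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = U * s                        # orthogonal PC scores per catchment
    return scores, Vt, explained

def n_components_for(explained, target):
    """Smallest number of leading PCs whose cumulative variance reaches target."""
    cumulative = np.cumsum(explained)
    n = int(np.searchsorted(cumulative, target)) + 1
    return min(n, len(cumulative))
```

With a real flow index matrix, `n_components_for(explained, 0.87)` would return how many leading PCs are needed to reach the roughly 87% level reported in the abstract.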

Does Information Theory Provide a New Paradigm for Earth Science? Hypothesis Testing

Grey Nearing, Benjamin L. Ruddell, Andrew Bennett et al.|Water Resources Research|2020

Cited by 59 | Open Access

Abstract Model evaluation and hypothesis testing are fundamental to any field of science. We propose here that slightly changing the way we think and communicate about inference (from being fundamentally a problem of uncertainty quantification to being a problem of information quantification) allows us to avoid certain problems related to testing models as hypotheses. We propose that scientists are typically interested in assessing the information provided by models, not the truth value or likelihood of a model. Information theory allows us to formalize this perspective.

Hyperparameter optimization of regional hydrological LSTMs by random search: A case study from Basque Country, Spain

Fateme Hosseini, Cristina Prieto, César Álvarez|Journal of Hydrology|2024

Cited by 35 | Open Access

Random Search optimizes hyperparameters of regional LSTM networks for accurate hourly predictions across 40 flashy catchments in Basque Country, Spain. The study achieves high NSE and KGE scores for two different targets, highlighting significant differences between two optimized network architectures and emphasizing the importance of tailored configurations for regional hydrological modeling.

• A systematic method for optimizing regional hydrological LSTMs.
• An efficient approach to identify optimal network configurations.
• Distinct configurations show statistically significantly different performance metrics.
• Thoughtful configuration selection post-random search in regional hydrology.
• Highly accurate hourly streamflow and water level predictions, up to 0.97 NSE/KGE.

Abstract This paper introduces a novel approach for hyperparameter optimization of long short-term memory networks (LSTMs) to achieve highly accurate hourly streamflow and water level predictions in regional rainfall-runoff modeling. By simultaneously and systematically optimizing 10 distinct hyperparameters with Random Search, the study achieves highly accurate predictions across 40 humid flashy catchments in the Basque Country, northern Spain. By carefully designing the search space and incorporating domain expertise, the approach quickly converges to optimal and highly accurate network configurations with both efficiency and efficacy. The LSTMs ingested precipitation, temperature, and potential evapotranspiration as inputs to predict two targets, streamflow and water level, at an hourly timestep. On the test set, the optimized LSTM networks accurately predicted streamflow and water level with Nash-Sutcliffe (NSE) and Kling-Gupta (KGE) efficiencies as high as 0.97 in one of the catchments. Across all 40 studied catchments, the overall average NSE and KGE values for streamflow were 0.89 and 0.87, respectively; water level exhibited average NSE and KGE scores of 0.91 and 0.92. Moreover, statistical analysis reveals significant differences in the performance of the two distinct optimized network architectures in different hydrological catchments, underscoring the importance of deliberate network configuration selection post-random search. This selection process is vital for achieving higher performance in as many catchments as possible. The findings highlight opportunities for enhancing the “learning maturity” of regional hydrological deep learning LSTM networks. This research provides valuable insights for researchers and practitioners involved in optimizing regional hydrological deep learning models for a variety of applications and on new datasets.
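
For readers unfamiliar with the two scores quoted above, a minimal sketch of the Nash-Sutcliffe and Kling-Gupta efficiencies follows. It assumes the standard NSE definition and the original KGE formulation (correlation, variability ratio, bias ratio); the abstract does not say which KGE variant the study used, and the function names are illustrative.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches the mean of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency (original formulation, assumed here).

    Combines linear correlation r, variability ratio alpha = std(sim) / std(obs),
    and bias ratio beta = mean(sim) / mean(obs); 1 is a perfect fit.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

Both scores equal 1 for a simulation that reproduces the observations exactly. KGE is undefined when the simulated series is constant, since the correlation cannot be computed.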
