Augmenting physical models with deep networks for complex dynamics forecasting*Yuan Yin, Vincent Le Guen, Jérémie Donà et al.|Journal of Statistical Mechanics Theory and Experiment|2021 Abstract Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling-based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists of decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model; no more, no less. This not only provides the existence and uniqueness for this decomposition, but also ensures interpretability and benefit generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction–diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters. The code is available at https://github.com/yuan-yin/APHYNITY .
PDE-Driven Spatiotemporal DisentanglementA recent line of work in the machine learning community addresses the problem of predicting high-dimensional spatiotemporal phenomena by leveraging specific tools from the differential equations theory. Following this direction, we propose in this article a novel and general paradigm for this task based on a resolution method for partial differential equations: the separation of variables. This inspiration allows us to introduce a dynamical interpretation of spatiotemporal disentanglement. It induces a principled model based on learning disentangled spatial and temporal representations of a phenomenon to accurately predict future observations. We experimentally demonstrate the performance and broad applicability of our method against prior state-of-the-art models on physical and synthetic video datasets.
Constrained physical-statistics models for dynamical system identification and predictionJérémie Donà, Marie Déchelle, Marina Lévy et al.|HAL (Le Centre pour la Communication Scientifique Directe)|2022 International audience
Learning the Language of Protein StructureBenoit Gaujac, Jérémie Donà, Liviu Copoiu et al.|arXiv (Cornell University)|2024 Representation learning and \emph{de novo} generation of proteins are pivotal computational biology tasks. Whilst natural language processing (NLP) techniques have proven highly effective for protein sequence modelling, structure modelling presents a complex challenge, primarily due to its continuous and three-dimensional nature. Motivated by this discrepancy, we introduce an approach using a vector-quantized autoencoder that effectively tokenizes protein structures into discrete representations. This method transforms the continuous, complex space of protein structures into a manageable, discrete format with a codebook ranging from 4096 to 64000 tokens, achieving high-fidelity reconstructions with backbone root mean square deviations (RMSD) of approximately 1-5 Å. To demonstrate the efficacy of our learned representations, we show that a simple GPT model trained on our codebooks can generate novel, diverse, and designable protein structures. Our approach not only provides representations of protein structure, but also mitigates the challenges of disparate modal representations and sets a foundation for seamless, multi-modal integration, enhancing the capabilities of computational methods in protein design.
Differentiable Feature Selection, A Reparameterization ApproachJérémie Donà, Patrick Gallinari|Lecture notes in computer science|2021