Deep integrative models for large-scale human genomics

Arnór I. Sigurdsson(Broad Institute), Ioannis Louloudis(University of Copenhagen), Karina Banasik(University of Copenhagen), David Westergaard(University of Copenhagen), Ole Winther(University of Copenhagen), Ole Lund(Danish National Centre for Social Research), Sisse Rye Ostrowski(University of Copenhagen), Christian Erikstrup(Aarhus University), Ole Birger Pedersen(University of Copenhagen), Mette Nyegaard(Aalborg University), Karina Banasik(University of Copenhagen), Jakob Thaning Bay, Jens Kjærgaard Boldsen, Thorsten Brodersen, Søren Brunak(University of Copenhagen), Kristoffer Sølvsten Burgdorf, Mona Ameri Chalmer, Maria Didriksen, Khoa Manh Dinh, Joseph Dowsett, Christian Erikstrup(Aarhus University), Bjarke Feenstra, Frank Geller, Daníel F. Guðbjartsson, Thomas Folkmann Hansen, Lotte Hindhede, Henrik Hjalgrim, Rikke Louise Jacobsen, Gregor B. E. Jemec, Katrine Kaspersen, Bertram Kjerulff, Lisette J. A. Kogelman, Margit Anita Hørup Larsen, Ioannis Louloudis(University of Copenhagen), Agnete Troen Lundgaard, Susan Mikkelsen, Christina Mikkelsen, Kaspar René Nielsen, Ioanna Nissen, Mette Nyegaard(Aalborg University), Sisse Rye Ostrowski(University of Copenhagen), Ole Birger Pedersen(University of Copenhagen), Alexander Pil Henriksen, Palle Duun Rohde, Klaus Rostgaard, Michael Schwinn, Kári Stefánsson(University of Copenhagen), Hreinn Stefánsson, Erik Sørensen(Aarhus University), Unnur Þorsteinsdóttir, Lise Wegner Thørner, Mie Topholm Bruun, Henrik Ullum, Thomas Werge, David Westergaard(University of Copenhagen), Søren Brunak(University of Copenhagen), Bjarni J. Vilhjálmsson(Aarhus University), Simon Rasmussen(Broad Institute)
Nucleic Acids Research
May 24, 2023
Cited by 33
Open Access

Abstract

Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics and, more recently, individual-level data. However, these predictors mainly capture additive relationships and are limited in the data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model performed competitively with established neural network architectures, particularly for certain traits, showcasing its potential in modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for type 1 diabetes (T1D), likely due to modeling non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis for T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.
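
The contrast the abstract draws between additive PRS predictors and non-additive effects can be illustrated with a small sketch. This is not the EIR API or the GLN model; the function names, weights, and genotype dosages below are hypothetical and chosen only to show that a linear score is a weighted sum of allele dosages, whereas an epistatic term depends on a product of dosages that no single per-variant weight can represent.

```python
from typing import Sequence, Tuple


def additive_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score: sum over variants of effect size times allele dosage (0, 1 or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


def prs_with_interaction(
    dosages: Sequence[float],
    weights: Sequence[float],
    pair: Tuple[int, int],
    pair_weight: float,
) -> float:
    """Additive score plus one hypothetical pairwise (epistatic) term."""
    i, j = pair
    return additive_prs(dosages, weights) + pair_weight * dosages[i] * dosages[j]


# Toy values invented for illustration; they are not estimates from the paper.
weights = [0.5, -0.25, 1.0]
person = [2, 1, 0]

score = additive_prs(person, weights)
score_epi = prs_with_interaction(person, weights, pair=(0, 1), pair_weight=0.5)
print(score, score_epi)
```

In this toy setting the interaction term contributes only when both variants in the pair carry at least one copy of the counted allele, which is the kind of dependence a purely additive model cannot express.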

