A community effort to optimize sequence-based deep learning models of gene regulation

Abdul Muntakim Rafi(University of British Columbia Hospital), Daria Nogina(Lomonosov Moscow State University), Dmitry Penzar(Institute of Protein Research), Dohoon Lee(Seoul National University), Danyeong Lee(Seoul National University), N. S. KIM(Seoul National University), Sangyeup Kim(Seoul National University), Dohyeon Kim(Seoul National University), Yeojin Shin(Seoul National University), Il‐Youp Kwak(Chung-Ang University), G. A. Meshcheryakov(Institute of Protein Research), Andrey Lando(Yandex (Russia)), Arsenii Zinkevich(Lomonosov Moscow State University), Byeongchan Kim(Chung-Ang University), Juhyun Lee(Chung-Ang University), Taein Kang(Chung-Ang University), Eeshit Dhaval Vaishnav(Broad Institute), Payman Yadollahpour(Broad Institute), Susanne Bornelöv(University of Cambridge), Fredrik Svensson(University College London), Maria‐Anna Trapotsi(University of Cambridge), Duc Tran(University of Nevada, Reno), Tin Nguyen(University of Nevada, Reno), Xinming Tu(University of Washington), Wuwei Zhang(University of Washington), Wei Qiu(University of Washington), Rohan Ghotra(Cold Spring Harbor Laboratory), Yiyang Yu(Cold Spring Harbor Laboratory), Ethan Labelson(Cold Spring Harbor Laboratory), Aayush Prakash, Ashwin Narayanan, Peter K. Koo(Cold Spring Harbor Laboratory), Xiaoting Chen(Cincinnati Children's Hospital Medical Center), David T. Jones(University College London), Michele Tinti(Wellcome Centre for Anti-Infectives Research), Yuanfang Guan(University of Michigan), Maolin Ding(Sun Yat-sen University), Ken Chen(Sun Yat-sen University), Yuedong Yang(Sun Yat-sen University), Ke Ding(Australian National University), Gunjan Dixit(Australian National University), Jiayu Wen(Australian National University), Zhihan Zhou(Northwestern University), Pratik Dutta(Stony Brook University), Rekha Sathian(Stony Brook University), Pallavi Surana(Stony Brook University), Yanrong Ji(Northwestern University), Han Liu(Northwestern University), Ramana V. Davuluri(Stony Brook University), Yu Hiratsuka(Niigata University), Mao Takatsu(Niigata University), Tsai‐Min Chen(National Taiwan University), Chih-Han Huang, Hsuan-Kai Wang, Edward S.C. Shih(Institute of Biomedical Sciences, Academia Sinica), Sz-Hau Chen(Development Center for Biotechnology), Chih‐Hsun Wu(National Chengchi University), Jhih-Yu Chen(National Taiwan University), Kuei-Lin Huang(China Medical University), Ibrahim Alsaggaf(Birkbeck, University of London), P W Greaves(Birkbeck, University of London), Carl Barton(Birkbeck, University of London), Cen Wan(Birkbeck, University of London), Nicholas Allen Baclig Abad(German Cancer Research Center), Cindy Körner(German Cancer Research Center), Lars Feuerbach(German Cancer Research Center), Benedikt Brors(German Cancer Research Center), Yichao Li(St. Jude Children's Research Hospital), Sebastian Röner(Berlin Institute of Health at Charité - Universitätsmedizin Berlin), Pyaree Mohan Dash(Berlin Institute of Health at Charité - Universitätsmedizin Berlin), Max Schubach(Berlin Institute of Health at Charité - Universitätsmedizin Berlin), Onuralp Söylemez(Global Blood Therapeutics (United States)), Andreas Møller(University of Southern Denmark), Gabija Kavaliauskaite(University of Southern Denmark), Jesper Grud Skat Madsen(University of Southern Denmark), Zhixiu Lu(University of Tennessee at Knoxville), Owen Queen(University of Tennessee at Knoxville), Ashley Babjac(University of Tennessee at Knoxville), Scott Emrich(University of Tennessee at Knoxville), Konstantinos Kardamiliotis(Aristotle University of Thessaloniki), Konstantinos Kyriakidis(Aristotle University of Thessaloniki), Andigoni Malousi(Aristotle University of Thessaloniki), Ashok Palaniappan(SASTRA University), Krishna Kant Gupta(National Centre for Cell Science), Prasanna Kumar Saravanam(SASTRA University), Jake Bradford(Queensland University of Technology), Dimitri Perrin(Queensland University of Technology), Robert Salomone(Queensland University of Technology), Carl Schmitz(Queensland University of Technology), Chen JiaXing(Beijing Normal-Hong Kong Baptist University), Wang JingZhe(Beijing Normal-Hong Kong Baptist University), Yang AiWei(Beijing Normal-Hong Kong Baptist University), Sun Kim(Seoul National University), Jake Albrecht(Sage Bionetworks), Aviv Regev(Broad Institute), Wuming Gong(University of Minnesota), Ivan V. Kulakovskiy(Institute of Protein Research), Pablo Meyer(IBM (United States)), Carl G. de Boer(University of British Columbia Hospital)
Nature Biotechnology
October 11, 2024
Open Access

Abstract

A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.
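
The idea of dividing models into modular building blocks and testing every combination can be sketched in a few lines. This is a minimal illustration and not the Prix Fixe implementation: the stage names, the number of stages, and the placeholder labels for the three top models are all assumptions, since the abstract does not specify them.

```python
from itertools import product

# Hypothetical stages and candidate blocks. The abstract only says that models
# were divided into modular building blocks and that all combinations were
# tested for the top three models; these names are placeholders.
STAGES = {
    "first_block": ["model_1", "model_2", "model_3"],
    "core_block": ["model_1", "model_2", "model_3"],
    "final_block": ["model_1", "model_2", "model_3"],
}


def enumerate_combinations(stages):
    """Yield one dict per combination, mapping each stage to a chosen block."""
    names = list(stages)
    for choice in product(*(stages[name] for name in names)):
        yield dict(zip(names, choice))


# Every way of picking one block per stage (3 * 3 * 3 combinations here).
combos = list(enumerate_combinations(STAGES))
```

In a real comparison, each entry of `combos` would be assembled into a model, trained, and scored on the benchmark suite; that step is omitted here because it depends on details the abstract does not give.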

