Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Simon Höllerer (ETH Zurich), Laetitia Papaxanthos (SIB Swiss Institute of Bioinformatics), Anja Gumpinger (SIB Swiss Institute of Bioinformatics), Katrin Fischer (ETH Zurich), Christian Beisel (ETH Zurich), Karsten Borgwardt (SIB Swiss Institute of Bioinformatics), Yaakov Benenson (ETH Zurich), Markus Jeschek (ETH Zurich)
Nature Communications
July 15, 2020

Abstract

Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE's effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.
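To make the recording idea concrete, the following is a minimal sketch of how a per-variant functional readout might be derived from sequencing reads, assuming each read reports both the RBS sequence and whether the recombinase has modified ("flipped") the adjacent recording site. The function name `flipped_fractions`, the `min_reads` filter, and the toy sequences are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import defaultdict


def flipped_fractions(reads, min_reads=1):
    """Return the fraction of flipped reads per RBS variant.

    reads: iterable of (rbs_sequence, is_flipped) tuples, one per sequencing read.
    min_reads: hypothetical coverage filter; variants with fewer reads are dropped.
    """
    totals = defaultdict(int)
    flipped = defaultdict(int)
    for rbs, is_flipped in reads:
        totals[rbs] += 1
        if is_flipped:
            flipped[rbs] += 1
    # The flipped fraction serves as the quantitative readout for each variant.
    return {rbs: flipped[rbs] / n for rbs, n in totals.items() if n >= min_reads}


# Toy reads (made-up sequences, for illustration only).
example_reads = [
    ("AGGAGG", True), ("AGGAGG", True), ("AGGAGG", False), ("AGGAGG", True),
    ("TTTTTT", False), ("TTTTTT", False),
]
fractions = flipped_fractions(example_reads)
print(fractions)
```

Under this assumption, a variant that drives stronger recombinase expression accumulates a higher flipped fraction, so one sequencing run yields both the sequence and a number describing its function.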

