Domain-adaptive neural networks improve cross-species prediction of transcription factor binding

Kelly Cochran; Divyanshi Srivastava; Avanti Shrikumar; Akshay Balsubramani; Ross C. Hardison; Anshul Kundaje; Shaun Mahony

doi:10.1101/gr.275394.121

Kelly Cochran (Pennsylvania State University), Divyanshi Srivastava (Pennsylvania State University), Avanti Shrikumar (Stanford University), Akshay Balsubramani (Stanford University), Ross C. Hardison (Pennsylvania State University), Anshul Kundaje (Stanford University), Shaun Mahony (Pennsylvania State University)

Genome Research

January 18, 2022

Cited by 43 | Open Access

Abstract

The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
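The abstract does not spell out how the augmented architecture discourages species-specific features. One common way to build such a network is the gradient reversal layer used in domain-adversarial training; the sketch below assumes that mechanism and is not taken from the paper's code. It uses a toy linear feature extractor with two logistic heads (binding and species), and every name in it (`dann_gradients`, `W`, `v_bind`, `v_species`, `lam`) is hypothetical. It also assumes that binding labels are available only for the training (source) species, while species labels are available for all sequences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dann_gradients(x, y_bind, is_target, W, v_bind, v_species, lam=1.0):
    """Gradients for a toy two-headed model with a gradient reversal layer.

    x          : (n, k) input features (stand-in for encoded sequence)
    y_bind     : (n,) binding labels, used only where is_target == 0
    is_target  : (n,) 0 for source-species rows, 1 for target-species rows
    W          : (k, h) shared feature extractor (linear here for brevity)
    v_bind     : (h,) binding head weights
    v_species  : (h,) species-discriminator head weights
    lam        : strength of the gradient reversal (assumed hyperparameter)
    """
    source = (is_target == 0).astype(float)

    # Forward pass: the reversal layer is the identity in this direction.
    f = x @ W
    p_bind = sigmoid(f @ v_bind)
    p_species = sigmoid(f @ v_species)

    # Gradient of mean binary cross-entropy w.r.t. each logit is (p - y) / n.
    # Binding loss is averaged over source-species rows only (assumption).
    g_bind = source * (p_bind - y_bind) / source.sum()
    g_species = (p_species - is_target) / x.shape[0]

    # Both heads are trained normally on their own losses.
    grad_v_bind = f.T @ g_bind
    grad_v_species = f.T @ g_species

    # Backward pass into the shared features: the species gradient is
    # multiplied by -lam, so the shared weights are pushed to make the
    # species harder to predict while still predicting binding.
    g_f = np.outer(g_bind, v_bind) - lam * np.outer(g_species, v_species)
    grad_W = x.T @ g_f
    return grad_W, grad_v_bind, grad_v_species

# Tiny synthetic example: 3 source-species rows and 3 target-species rows.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
y_bind = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
is_target = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
W = rng.normal(size=(4, 3))
v_bind = rng.normal(size=3)
v_species = rng.normal(size=3)

grad_W, grad_v_bind, grad_v_species = dann_gradients(
    x, y_bind, is_target, W, v_bind, v_species, lam=1.0
)
```

Setting `lam=0` recovers a plain binding classifier, and a negative `lam` would turn the species head into an ordinary auxiliary task, so the sign flip is the only thing that makes the shared features adversarial to species identity.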
