GenVarLoader: An accelerated dataloader for applying deep learning to personalized genomics

David Laub (University of California San Diego), A. Ho (Salk Institute for Biological Studies), Jeff Jaureguy (Salk Institute for Biological Studies), Adam Klie (University of California San Diego), Rany M. Salem (University of California San Diego), Graham McVicker (Salk Institute for Biological Studies), Hannah Carter (University of California San Diego)
bioRxiv (Cold Spring Harbor Laboratory)
January 17, 2025
Cited by 2 | Open Access

Abstract

Deep learning sequence models trained on personalized genomic data can improve variant effect prediction; however, applying these models is limited by the computational cost of storing and reading large datasets. We address this with GenVarLoader, which stores personalized genomic data in new memory-mapped formats with optimal data locality, achieving ~1,000x faster throughput and ~2,000x better compression than existing alternatives.
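The abstract credits part of the speedup to memory-mapped storage with good data locality. The sketch below illustrates only that general idea, under assumptions of our own: it is not GenVarLoader's actual file format or API, and the names `write_records` and `read_region` are hypothetical. Variable-length per-sample sequences are concatenated into one flat byte file in region-major order, so all samples for one region occupy a single contiguous byte range, and a separate offsets array locates each record. `numpy.memmap` maps the file lazily instead of reading it all into memory.

```python
import os
import tempfile

import numpy as np


def write_records(path: str, seqs: list[bytes]) -> np.ndarray:
    """Concatenate byte sequences into one flat file and return offsets.

    offsets has len(seqs) + 1 entries; record i occupies bytes
    offsets[i]:offsets[i + 1] of the file. (Hypothetical layout.)
    """
    lengths = np.array([len(s) for s in seqs], dtype=np.int64)
    offsets = np.zeros(len(seqs) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    np.frombuffer(b"".join(seqs), dtype=np.uint8).tofile(path)
    return offsets


def read_region(path: str, offsets: np.ndarray, region: int, n_samples: int) -> list[bytes]:
    """Return every sample's sequence for one region.

    Records are assumed to be stored region-major (all samples of region 0,
    then region 1, ...), so one region is a single contiguous byte range.
    """
    mm = np.memmap(path, dtype=np.uint8, mode="r")  # lazy: pages load on access
    first, last = region * n_samples, (region + 1) * n_samples
    block = mm[offsets[first]:offsets[last]]  # one contiguous slice of the file
    rel = offsets[first:last + 1] - offsets[first]  # offsets relative to block
    return [block[rel[j]:rel[j + 1]].tobytes() for j in range(n_samples)]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "haplotypes.bin")
    # Toy data: 2 regions x 3 samples; lengths differ, as indels would cause.
    seqs = [b"ACGT", b"ACGA", b"ACG", b"TTGCA", b"TTGC", b"TTGCAA"]
    offsets = write_records(path, seqs)
    print(read_region(path, offsets, region=1, n_samples=3))
    # [b'TTGCA', b'TTGC', b'TTGCAA']
```

Choosing region-major order here is what makes a batch of samples at one locus a single sequential read; a sample-major order would instead favor reading many regions for one individual. Which order suits a given training loop is a design choice this sketch does not settle.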

