Scaling Genomics Data Processing with Memory-Driven Computing to Accelerate Computational Biology

Matthias Becker; Umesh Worlikar; Shobhit Agrawal; Hartmut Schultze; Thomas Ulas; Sharad Singhal; Joachim L. Schultze

doi:10.1007/978-3-030-50743-5_17

Matthias Becker (University of Bonn), Umesh Worlikar (University of Bonn), Shobhit Agrawal (West German Broadcasting Cologne), Hartmut Schultze (Hewlett-Packard (Germany)), Thomas Ulas (University of Bonn), Sharad Singhal (Hewlett-Packard (United States)), Joachim L. Schultze (University of Bonn)

Lecture notes in computer science

January 1, 2020

Cited by 8. Open Access.

Abstract

Research is increasingly becoming data-driven, and the natural sciences are no exception. In both biology and medicine, we are observing exponential growth of structured data collections from experiments and population studies, enabling novel insights that would otherwise not be possible. However, these growing data sets pose a challenge for existing compute infrastructures, since data volumes are outgrowing the limits of current compute systems. In this work, we present the application of a novel approach, Memory-Driven Computing (MDC), in the life sciences. MDC is a data-centric approach that has been designed for growing data sizes and provides a composable infrastructure for changing workloads. In particular, we show how a typical pipeline for genomics data processing can be accelerated and which application modifications are required to exploit this novel architecture. Furthermore, we demonstrate how the isolated evaluation of individual tasks misses significant overheads of typical pipelines in genomics data processing.
