GenPipes: an open-source framework for distributed and scalable genomic analyses

Mathieu Bourgey (Ontario Genomics), Rola Dali (Ontario Genomics), Robert Eveleigh (Ontario Genomics), Kuang Chung Chen (Compute Canada), Louis Létourneau (Ontario Genomics), Joël Fillon (McGill University), Marc Michaud (McGill University and Génome Québec Innovation Centre), Maxime Caron (Ontario Genomics), Johanna Sandoval (Université de Montréal), François Lefebvre (Ontario Genomics), Gary Leveque (Ontario Genomics), Eloi Mercier (Ontario Genomics), David Bujold (Ontario Genomics), Pascale Marquis (Ontario Genomics), Patrick Tran Van (University of Lausanne), David Anderson de Lima Morais (Université de Sherbrooke), Julien Tremblay (National Research Council Canada), Xiaojian Shao (Ontario Genomics), Édouard Henrion (Ontario Genomics), Emmanuel González (Ontario Genomics), Pierre-Olivier Quirion (Ontario Genomics), B. Caron (Compute Canada), Guillaume Bourque (Ontario Genomics)
GigaScience
June 1, 2019
Cited by 209
Open Access

Abstract

BACKGROUND: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing.

FINDINGS: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations.

CONCLUSIONS: GenPipes offers genomics researchers a simple method to analyze different types of data, customizable to their needs and resources, as well as the flexibility to create their own workflows.

