M

Marcel Martin

Stockholm University

ORCID: 0000-0002-0680-200X

Publishes on Genomics and Phylogenetic Studies, Genomics and Rare Diseases, RNA modifications and cancer. 73 papers and 40k citations.

73Publications
40kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Cutadapt removes adapter sequences from high-throughput sequencing reads
Marcel Martin|EMBnet journal|2011
Cited by 35.5kOpen Access

When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/

WhatsHap: fast and accurate read-based phasing
Marcel Martin, Murray Patterson, Shilpa Garg et al.|bioRxiv (Cold Spring Harbor Laboratory)|2016
Cited by 502Open Access

Abstract Read-based phasing allows to reconstruct the haplotypes of a sample purely from sequencing reads. While phasing is an important step for answering questions about population genetics, compound heterozygosity, and to aid in clinical decision making, there has been a lack of accurate, usable and standards-based software. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap works also well with second-generation data, is easy to use and will phase not only SNVs, but also indels and other variants. It is unique in its ability to combine read-based with pedigree-based phasing, allowing to further improve accuracy if multiple related samples are provided.

Atropos: specific, sensitive, and speedy trimming of sequencing reads
Cited by 290Open Access

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.

Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants
M. Garcia, Szilveszter Juhos, Malin Larsson et al.|F1000Research|2020
Cited by 261Open Access

Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.