Efficient ancestry and mutation simulation with msprime 1.0

Franz Baumdicker (University of Tübingen), Gertjan Bisschop (University of Edinburgh), Daniel Goldstein (Broad Institute), Graham Gower (University of Copenhagen), Aaron P. Ragsdale (University of Wisconsin–Madison), Georgia Tsambos (The University of Melbourne), Sha Zhu (University of Oxford), Bjarki Eldon (Museum für Naturkunde), E. Castedo Ellerman (Fresh Pond Research Institute), Jared Galloway (University of Oregon), Ariella Gladstein (University of North Carolina at Chapel Hill), Gregor Gorjanc (Roslin Institute), Bing Guo (University of Maryland, Baltimore), Ben Jeffery (University of Oxford), Warren W Kretzschmar (Karolinska Institutet), Konrad Lohse (University of Edinburgh), Michael Matschiner (University of Oslo), Dominic Nelson (McGill University), Nathaniel S. Pope (Pennsylvania State University), Consuelo D. Quinto-Cortés (Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional), Murillo F. Rodrigues (University of Oregon), Kumar Saunack (Indian Institute of Technology Bombay), Thibaut Sellinger (Technical University of Munich), Kevin Thornton (University of California, Irvine), Hugo van Kemenade, Anthony Wilder Wohns (Broad Institute), Yan Wong (University of Oxford), Simon Gravel (McGill University), Andrew D. Kern (University of Oregon), Jere Koskela (University of Warwick), Peter L. Ralph (University of Oregon), Jerome Kelleher (University of Oxford)
Genetics
December 13, 2021
Cited by 498
Open Access

Abstract

Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.

