FlashPCA2: principal component analysis of Biobank-scale genotype datasets

Gad Abraham (The University of Melbourne), Yixuan Qiu (Purdue University, West Lafayette), Michael Inouye (The University of Melbourne)
Bioinformatics
May 4, 2017
Cited by 450
Open Access

Abstract

Motivation: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer feasible. However, when the full decomposition is not required, substantial computational savings can be made.

Results: We present FlashPCA2, a tool that can perform partial PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory.

Availability and implementation: https://github.com/gabraham/flashpca

Supplementary information: Supplementary data are available at Bioinformatics online.
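
The abstract's central point is that genotype PCA usually needs only the leading few principal components, so the full decomposition can be skipped. As an illustration, here is a minimal NumPy sketch of that idea. It assumes a dense n x m matrix of 0/1/2 allele counts held in memory and the common binomial (allele-frequency) standardization. It uses block subspace iteration with a Rayleigh-Ritz step, a textbook method swapped in for illustration, not FlashPCA2's own algorithm or code. The function names `standardize_genotypes` and `partial_pca` are hypothetical.

```python
import numpy as np


def standardize_genotypes(G):
    """Center and scale an n x m matrix of 0/1/2 allele counts, SNP by SNP.

    Assumes binomial scaling (g - 2p) / sqrt(2p(1 - p)), where p is the
    sample allele frequency; monomorphic SNPs (zero variance) are dropped.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                # estimated allele frequency per SNP
    scale = np.sqrt(2.0 * p * (1.0 - p))
    keep = scale > 0                        # avoid dividing by zero
    return (G[:, keep] - 2.0 * p[keep]) / scale[keep]


def partial_pca(X, k, n_iter=100, oversample=5, seed=0):
    """Leading k eigenpairs of X X^T / m by block subspace iteration.

    Only thin products with X are used; the full n x n eigendecomposition
    is never formed.
    """
    n, m = X.shape
    b = min(k + oversample, n, m)           # block size: k plus a few extra vectors
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, b)))
    for _ in range(n_iter):
        # Apply X X^T as two thin products (m x b, then n x b), then re-orthonormalize.
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    # Rayleigh-Ritz: solve the small b x b eigenproblem inside the subspace.
    B = Q.T @ (X @ (X.T @ Q)) / m
    evals, W = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:k]     # largest eigenvalues first
    return evals[order], Q @ W[:, order]
```

For example, `vals, pcs = partial_pca(standardize_genotypes(G), k=10)` returns the 10 largest eigenvalues of X X^T / m. The columns of `pcs` give each individual's coordinates on those components, up to a per-component scaling. The loop touches X only through the products `X.T @ Q` and `X @ (...)`. In principle, those products could therefore be accumulated over blocks of SNPs read from disk instead of holding X in memory, although this sketch does not do that.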
