Quality Control Procedures for Genome‐Wide Association Studies

Stephen Turner (Vanderbilt University), Loren L. Armstrong (Northwestern University), Yuki Bradford (Vanderbilt University), Christopher S. Carlson (Fred Hutch Cancer Center), Dana C. Crawford (Vanderbilt University), Andrew Crenshaw (Broad Institute), Mariza de Andrade (Mayo Clinic), Kimberly F. Doheny (Johns Hopkins University), Jonathan L. Haines (Vanderbilt University), Geoffrey Hayes (Northwestern University), Gail P. Jarvik (University of Washington), Lan Jiang (Vanderbilt University), Iftikhar J. Kullo (Mayo Clinic), Rongling Li (National Institutes of Health), Hua Ling (Johns Hopkins University), Teri A. Manolio (National Institutes of Health), Martha Matsumoto (Mayo Clinic), Catherine A. McCarty (Marshfield Clinic), Andrew McDavid (Fred Hutch Cancer Center), Daniel B. Mirel (Broad Institute), Justin Paschall (National Institutes of Health), Elizabeth Pugh (Johns Hopkins University), Luke V. Rasmussen (Marshfield Clinic), Russell A. Wilke (Vanderbilt University), Rebecca L. Zuvich (Vanderbilt University), Marylyn D. Ritchie (Vanderbilt University)
Current Protocols in Human Genetics
January 1, 2011

Abstract

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.

