Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Mark Kvale (University of California, San Francisco), Stephanie Hesselson (University of California, San Francisco), Thomas J. Hoffmann (University of California, San Francisco), Yang Cao (University of California, San Francisco), David Chan, Sheryl Connell (Kaiser Permanente), Lisa Croen (Kaiser Permanente), Brad Dispensa (University of California, San Francisco), Jasmin L Eshragh (University of California, San Francisco), Andrea Finn, Jeremy Gollub, Carlos Iribarren (Kaiser Permanente), Eric Jorgenson (Kaiser Permanente), Lawrence H. Kushi (Kaiser Permanente), Richard Lao (University of California, San Francisco), Yontao Lu, Dana Ludwig (Kaiser Permanente), Gurpreet K. Mathauda (University of California, San Francisco), William B McGuire (Kaiser Permanente), Gangwu Mei, Sunita Miles (Kaiser Permanente), Michael Mittman, Mohini A. Patil, Charles P. Quesenberry (Kaiser Permanente), Dilrini K. Ranatunga (Kaiser Permanente), Sarah Rowell (Kaiser Permanente), Marianne Sadler (Kaiser Permanente), Lori C. Sakoda (Kaiser Permanente), Michael H. Shapero, Ling Shen (Kaiser Permanente), Tanu Shenoy (University of California, San Francisco), David Smethurst (Kaiser Permanente), Carol P. Somkin (Kaiser Permanente), Stephen K. Van Den Eeden (Kaiser Permanente), Lawrence Walter (Kaiser Permanente), Eunice Wan (University of California, San Francisco), Teresa Webster, Rachel A. Whitmer (Kaiser Permanente), Simon Wong (University of California, San Francisco), Chia Zau (Kaiser Permanente), Yiping Zhan, Catherine Schaefer (Kaiser Permanente), Pui-Yan Kwok (University of California, San Francisco), Neil Risch (Kaiser Permanente)
Genetics
June 19, 2015
Cited by 228
Open Access

Abstract

The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California-San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1-95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.
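The overall sample success rate quoted above can be checked directly from the two counts given in the abstract. The short sketch below does only that arithmetic; the function and variable names are illustrative and are not taken from the authors' pipeline.

```python
# Counts quoted in the abstract.
samples_assayed = 109_837
samples_passed = 103_067


def success_rate(passed: int, assayed: int) -> float:
    """Return the percentage of assayed items that passed QC."""
    if assayed <= 0:
        raise ValueError("assayed must be positive")
    return 100.0 * passed / assayed


overall_rate = success_rate(samples_passed, samples_assayed)
samples_failed = samples_assayed - samples_passed

# Prints: 93.8% passed, 6770 failed
print(f"{overall_rate:.1f}% passed, {samples_failed} failed")
```

The same helper would apply to the per-array sample and SNP success rates, but the abstract reports those only as percentage ranges, so the underlying counts are not reproduced here.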

