University of North Carolina at Chapel Hill
Publishes on Genetic Associations and Epidemiology, Genetic and phenotypic traits in livestock, Genetic Mapping and Diversity in Plants and Animals. 13 papers and 1.4k citations.
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies.
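The effect of trait-dependent sampling on a naive analysis can be seen in a small simulation. The sketch below is not the authors' method. It only illustrates one symptom of the problem: ordinary least squares applied to subjects chosen from the tails of the trait distribution gives a distorted effect estimate. The function names, allele frequency, effect size, and tail fraction are all illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, maf=0.3, beta=0.3, tail=0.1, seed=1):
    """Compare OLS slopes in a full cohort and in an extreme-trait subsample.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Genotype: number of minor alleles (0, 1, or 2) under Hardy-Weinberg.
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Quantitative trait: additive genetic effect plus standard normal noise.
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]

    # Trait-dependent sampling: keep only the lowest and highest `tail` fractions.
    order = sorted(range(n), key=lambda i: y[i])
    k = int(tail * n)
    chosen = order[:k] + order[-k:]

    full = ols_slope(g, y)
    extreme = ols_slope([g[i] for i in chosen], [y[i] for i in chosen])
    return full, extreme


if __name__ == "__main__":
    full, extreme = simulate()
    print(f"slope in full cohort:       {full:.3f}")
    print(f"slope in extreme subsample: {extreme:.3f}")
```

In the full cohort the slope is close to the simulated effect, while in the extreme-trait subsample it is markedly larger, because selection on the trait inflates the trait variance among the sequenced subjects. A valid analysis has to model the sampling scheme rather than treat the selected subjects as a random sample.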
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely function in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
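The generalized least squares step mentioned above can be sketched generically. The example below is not EPIC's implementation. It is a minimal GLS fit under the assumption that `y` holds gene-level association statistics, `X` holds an intercept and one cell type's expression values, and `V` is a known positive-definite correlation matrix between genes. The function name `gls_fit` and all inputs are hypothetical.

```python
import numpy as np


def gls_fit(X, y, V):
    """Generalized least squares: minimize (y - Xb)' V^{-1} (y - Xb).

    X : (n, p) design matrix, y : (n,) response,
    V : (n, n) symmetric positive-definite covariance of the errors.
    Returns the coefficient estimates and (X' V^{-1} X)^{-1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.asarray(V, dtype=float)

    # Whiten with the Cholesky factor V = L L', so the transformed errors
    # are uncorrelated and ordinary least squares applies.
    L = np.linalg.cholesky(V)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    cov = np.linalg.inv(Xw.T @ Xw)  # equals (X' V^{-1} X)^{-1}
    return beta, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    expression = rng.normal(size=n)              # hypothetical cell-type expression
    X = np.column_stack([np.ones(n), expression])
    idx = np.arange(n)
    V = 0.5 ** np.abs(np.subtract.outer(idx, idx))  # toy correlation between genes
    errors = np.linalg.cholesky(V) @ rng.normal(size=n)
    y = 0.2 + 0.5 * expression + errors
    beta, cov = gls_fit(X, y, V)
    print("estimates:", beta)
    print("standard errors:", np.sqrt(np.diag(cov)))
```

Whitening with the Cholesky factor avoids forming the inverse of `V` explicitly, which is usually the numerically safer choice when genes are strongly correlated.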
Anthropometric traits, measuring body size and shape, are highly heritable and significant clinical risk factors for cardiometabolic disorders. These traits have been extensively studied in genome-wide association studies (GWASs), with hundreds of genome-wide significant loci identified. We performed a whole-exome sequence analysis of the genetics of height, body mass index (BMI), and waist/hip ratio (WHR). We meta-analyzed single-variant and gene-based associations of whole-exome sequence variation with height, BMI, and WHR in up to 22,004 individuals, and we assessed replication of our findings in up to 16,418 individuals from 10 independent cohorts from Trans-Omics for Precision Medicine (TOPMed). We identified four trait associations with single-nucleotide variants (SNVs; two for height and two for BMI) and replicated the LECT2 gene association with height. Our expression quantitative trait locus (eQTL) analysis within previously reported GWAS loci implicated CEP63 and RFT1 as potential functional genes for known height loci. We further assessed enrichment of SNVs that were monogenic or syndromic variants within loci associated with our three traits. Enrichment was significant for height, whereas we observed no Bonferroni-corrected significance for all SNVs. With a sample size of ∼20,000 whole-exome sequences in our discovery dataset, our findings demonstrate the importance of genomic sequencing in genetic association studies, yet they also illustrate the challenges in identifying effects of rare genetic variants.