Abhishek Sarkar

Westchester Medical Center

ORCID: 0000-0002-4636-9255

Publishes on Gene expression and cancer classification, Genetic Associations and Epidemiology, Genomics and Chromatin Dynamics. 68 papers and 10.1k citations.

Top publications by citations

Integrative analysis of 111 reference human epigenomes
Cited by 7.1k|Open Access

The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping
Gao Wang, Abhishek Sarkar, Peter Carbonetto et al.|Journal of the Royal Statistical Society Series B (Statistical Methodology)|2020
Cited by 1.1k|Open Access

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the ‘sum of single effects’ model, called ‘SuSiE’, which comes from writing the sparse vector of regression coefficients as a sum of ‘single-effect’ vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, iterative Bayesian stepwise selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods but, instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under SuSiE. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a credible set of variables for each selection. Our methods are particularly well suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and we illustrate their application to fine mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential and challenges for applying these methods to generic variable-selection problems.
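
As an illustration only, the "sum of single effects" construction described in this abstract can be sketched as follows. This is not the authors' implementation and does not perform any fitting (IBSS is not shown); the function name `sum_of_single_effects`, the uniform choice of positions, and the example effect sizes are all hypothetical choices made for this sketch.

```python
import random


def sum_of_single_effects(p, effects, rng):
    """Build a length-p coefficient vector as a sum of single-effect vectors.

    Each single-effect vector has exactly one non-zero entry. Its position is
    drawn uniformly at random here, which is an assumption for illustration.
    """
    b = [0.0] * p
    singles = []
    for effect in effects:
        idx = rng.randrange(p)          # which variable carries this effect
        single = [0.0] * p
        single[idx] = effect            # one non-zero element per vector
        singles.append(single)
        b = [total + s for total, s in zip(b, single)]
    return b, singles


# Hypothetical example: 10 variables, 3 single effects with made-up sizes.
rng = random.Random(0)
b, singles = sum_of_single_effects(p=10, effects=[1.5, -2.0, 0.7], rng=rng)
```

Because each component contributes at most one non-zero coefficient, the combined vector `b` has at most as many non-zero entries as there are single effects, which is how the construction encodes sparsity.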

Genetic analysis of complex traits in the emerging Collaborative Cross
David L. Aylor, William Valdar, Wendy Foulds-Mathes et al.|Genome Research|2011
Cited by 363|Open Access

The Collaborative Cross (CC) is a mouse recombinant inbred strain panel that is being developed as a resource for mammalian systems genetics. Here we describe an experiment that uses partially inbred CC lines to evaluate the genetic properties and utility of this emerging resource. Genome-wide analysis of the incipient strains reveals high genetic diversity, balanced allele frequencies, and dense, evenly distributed recombination sites, all ideal qualities for a systems genetics resource. We map discrete, complex, and biomolecular traits and contrast two quantitative trait locus (QTL) mapping approaches. Analysis based on inferred haplotypes improves power, reduces false discovery, and provides information to identify and prioritize candidate genes that is unique to multifounder crosses like the CC. The number of expression QTLs discovered here exceeds all previous efforts at eQTL mapping in mice, and we map local eQTL at 1-Mb resolution. We demonstrate that the genetic diversity of the CC, which derives from random mixing of eight founder strains, results in high phenotypic diversity and enhances our ability to map causative loci underlying complex disease-related traits.