
Junmin Peng

St. Jude Children's Research Hospital

ORCID: 0000-0003-0472-7648

Publishes on Ubiquitin and proteasome pathways, Advanced Proteomics Techniques and Applications, Alzheimer's disease research and treatments. 441 papers and 37.9k citations.

441 Publications
37.9k Total Citations


Top publications by citations

Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC−MS/MS) for Large-Scale Protein Analysis: The Yeast Proteome
Junmin Peng, Joshua E. Elias, Carson C. Thoreen et al.|Journal of Proteome Research|2002
Cited by 1.7k

Highly complex protein mixtures can be directly analyzed after proteolysis by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). In this paper, we have utilized the combination of strong cation exchange (SCX) and reversed-phase (RP) chromatography to achieve two-dimensional separation prior to MS/MS. One milligram of whole yeast protein was proteolyzed and separated by SCX chromatography (2.1 mm i.d.) with fraction collection every minute during an 80-min elution. Eighty fractions were reduced in volume and then re-injected via an autosampler in an automated fashion using a vented-column (100 µm i.d.) approach for RP-LC-MS/MS analysis. More than 162,000 MS/MS spectra were collected with 26,815 matched to yeast peptides (7,537 unique peptides). A total of 1,504 yeast proteins were unambiguously identified in this single analysis. We present a comparison of this experiment with a previously published yeast proteome analysis by Yates and colleagues (Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-7). In addition, we report an in-depth analysis of the false-positive rates associated with peptide identification using the Sequest algorithm and a reversed yeast protein database. New criteria are proposed to decrease false-positives to less than 1% and to greatly reduce the need for manual interpretation while permitting more proteins to be identified.
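
The abstract above mentions estimating false-positive rates with a reversed protein database but does not give a formula. The following is a minimal sketch of one common way such an estimate is computed, assuming the spectra are searched against forward and reversed sequences together, so that random matches are expected to land on either half equally often. The `PSM` class, its fields, and both function names are hypothetical and chosen only for illustration; they are not taken from the paper or from Sequest.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical structure for illustration)."""
    score: float    # hypothetical match score; higher means a better match
    is_decoy: bool  # True if the matched peptide comes from the reversed database


def estimated_fp_rate(psms, threshold):
    """Estimate the false-positive rate among matches scoring >= threshold.

    Assumption: each reversed-database hit implies roughly one equally
    wrong forward hit, so the estimate is 2 * decoys / accepted matches.
    """
    accepted = [p for p in psms if p.score >= threshold]
    if not accepted:
        return 0.0
    decoys = sum(p.is_decoy for p in accepted)
    return min(1.0, 2 * decoys / len(accepted))


def lowest_threshold_below(psms, max_rate=0.01):
    """Return the lowest observed score whose estimated rate is <= max_rate.

    Returns None if no observed score satisfies the limit.
    """
    for t in sorted({p.score for p in psms}):
        if estimated_fp_rate(psms, t) <= max_rate:
            return t
    return None
```

Under these assumptions, a filter such as `lowest_threshold_below(psms, 0.01)` corresponds to the kind of "less than 1%" criterion the abstract describes, although the paper's actual criteria combine several Sequest parameters rather than a single score.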

Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder
Cited by 1.3k | Open Access

INTRODUCTION: Our understanding of the pathophysiology of psychiatric disorders, including autism spectrum disorder (ASD), schizophrenia (SCZ), and bipolar disorder (BD), lags behind other fields of medicine. The diagnosis and study of these disorders currently depend on behavioral, symptomatic characterization. Defining genetic contributions to disease risk allows for biological, mechanistic understanding but is challenged by genetic complexity, polygenicity, and the lack of a cohesive neurobiological model to interpret findings.

RATIONALE: The transcriptome represents a quantitative phenotype that provides biological context for understanding the molecular pathways disrupted in major psychiatric disorders. RNA sequencing (RNA-seq) in a large cohort of cases and controls can advance our knowledge of the biology disrupted in each disorder and provide a foundational resource for integration with genomic and genetic data.

RESULTS: Analysis across multiple levels of transcriptomic organization (gene expression, local splicing, transcript isoform expression, and coexpression networks for both protein-coding and noncoding genes) provides an in-depth view of ASD, SCZ, and BD molecular pathology. More than 25% of the transcriptome exhibits differential splicing or expression in at least one disorder, including hundreds of noncoding RNAs (ncRNAs), most of which have unexplored functions but collectively exhibit patterns of selective constraint. Changes at the isoform level, as opposed to the gene level, show the largest effect sizes and genetic enrichment and the greatest disease specificity. We identified coexpression modules associated with each disorder, many with enrichment for cell type–specific markers, and several modules significantly dysregulated across all three disorders. These enabled parsing of down-regulated neuronal and synaptic components into a variety of cell type– and disease-specific signals, including multiple excitatory neuron and distinct interneuron modules with differential patterns of disease association, as well as common and rare genetic risk variant enrichment. The glial-immune signal demonstrates shared disruption of the blood-brain barrier and up-regulation of NFkB-associated genes, as well as disease-specific alterations in microglial-, astrocyte-, and interferon-response modules. A coexpression module associated with psychiatric medication exposure in SCZ and BD was enriched for activity-dependent immediate early gene pathways. To identify causal drivers, we integrated polygenic risk scores and performed a transcriptome-wide association study and summary-data–based Mendelian randomization. Candidate risk genes (5 in ASD, 11 in BD, and 64 in SCZ, including shared genes between SCZ and BD) are supported by multiple methods. These analyses begin to define a mechanistic basis for the composite activity of genetic risk variants.

CONCLUSION: Integration of RNA-seq and genetic data from ASD, SCZ, and BD provides a quantitative, genome-wide resource for mechanistic insight and therapeutic development at Resource.PsychENCODE.org. These data inform the molecular pathways and cell types involved, emphasizing the importance of splicing and isoform-level gene regulatory mechanisms in defining cell type and disease specificity, and, when integrated with genome-wide association studies, permit the discovery of candidate risk genes.

The PsychENCODE cross-disorder transcriptomic resource. Human brain RNA-seq was integrated with genotypes across individuals with ASD, SCZ, BD, and controls, identifying pervasive dysregulation, including protein-coding, noncoding, splicing, and isoform-level changes. Systems-level and integrative genomic analyses prioritize previously unknown neurogenetic mechanisms and provide insight into the molecular neuropathology of these disorders.

Comprehensive functional genomic resource and integrative model for the human brain
Cited by 1.1k

INTRODUCTION: Strong genetic associations have been found for a number of psychiatric disorders. However, understanding the underlying molecular mechanisms remains challenging.

RATIONALE: To address this challenge, the PsychENCODE Consortium has developed a comprehensive online resource and integrative models for the functional genomics of the human brain.

RESULTS: The base of the pyramidal resource is the datasets generated by PsychENCODE, including bulk transcriptome, chromatin, genotype, and Hi-C datasets and single-cell transcriptomic data from ~32,000 cells for major brain regions. We have merged these with data from Genotype-Tissue Expression (GTEx), ENCODE, Roadmap Epigenomics, and single-cell analyses. Via uniform processing, we created a harmonized resource, allowing us to survey functional genomics data on the brain over a sample size of 1866 individuals. From this uniformly processed dataset, we created derived data products. These include lists of brain-expressed genes, coexpression modules, and single-cell expression profiles for many brain cell types; ~79,000 brain-active enhancers with associated Hi-C loops and topologically associating domains; and ~2.5 million expression quantitative-trait loci (QTLs) comprising ~238,000 linkage-disequilibrium–independent single-nucleotide polymorphisms and of other types of QTLs associated with splice isoforms, cell fractions, and chromatin activity. By using these, we found that >88% of the cross-population variation in brain gene expression can be accounted for by cell fraction changes. Furthermore, a number of disorders and aging are associated with changes in cell-type proportions. The derived data also enable comparison between the brain and other tissues. In particular, by using spectral analyses, we found that the brain has distinct expression and epigenetic patterns, including a greater extent of noncoding transcription than other tissues.

The top level of the resource consists of integrative networks for regulation and machine-learning models for disease prediction. The networks include a full gene regulatory network (GRN) for the brain, linking transcription factors, enhancers, and target genes from merging of the QTLs, generalized element-activity correlations, and Hi-C data. By using this network, we link disease genes to genome-wide association study (GWAS) variants for psychiatric disorders. For schizophrenia, we linked 321 genes to the 142 reported GWAS loci. We then embedded the regulatory network into a deep-learning model to predict psychiatric phenotypes from genotype and expression. Our model gives a ~6-fold improvement in prediction over additive polygenic risk scores. Moreover, it achieves a ~3-fold improvement over additive models, even when the gene expression data are imputed, highlighting the value of having just a small amount of transcriptome data for disease prediction. Lastly, it highlights key genes and pathways associated with disorder prediction, including immunological, synaptic, and metabolic pathways, recapitulating de novo results from more targeted analyses.

CONCLUSION: Our resource and integrative analyses have uncovered genomic elements and networks in the brain, which in turn have provided insight into the molecular mechanisms underlying psychiatric disorders. Our deep-learning model improves disease risk prediction over traditional approaches and can be extended with additional data types (e.g., microRNA and neuroimaging).

A comprehensive functional genomic resource for the adult human brain. The resource forms a three-layer pyramid. The bottom layer includes sequencing datasets for traits, such as schizophrenia. The middle layer represents derived datasets, including functional genomic elements and QTLs. The top layer contains integrated models, which link genotypes to phenotypes. DSPN, Deep Structured Phenotype Network; PC1 and PC2, principal components 1 and 2; ref, reference; alt, alternate; H3K27ac, histone H3 acetylation at lysine 27.
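
The abstract above compares its deep-learning model against additive polygenic risk scores without defining that baseline. As a minimal sketch, an additive score is conventionally a weighted sum of risk-allele dosages, with one weight per variant. The function name, argument names, and example numbers below are hypothetical, and the paper's own scoring pipeline is not described in this abstract.

```python
def additive_prs(dosages, effect_sizes):
    """Compute an additive polygenic risk score for one individual.

    dosages:      risk-allele counts per variant (typically 0, 1, or 2)
    effect_sizes: per-variant weights, e.g. GWAS effect estimates
                  (hypothetical values in the usage note below)
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    # Each variant contributes independently; there are no interaction terms.
    return sum(d * b for d, b in zip(dosages, effect_sizes))
```

For example, `additive_prs([0, 1, 2], [0.5, 0.25, 0.125])` sums 0 × 0.5, 1 × 0.25, and 2 × 0.125. Because every variant contributes independently, a score of this form cannot capture the regulatory-network structure that the abstract's model embeds.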