Large-scale plasma proteomics comparisons through genetics and disease associations
High-throughput proteomics platforms measuring thousands of proteins in plasma combined with genomic and phenotypic information have the power to bridge the gap between the genome and diseases. Here we performed association studies of Olink Explore 3072 data generated by the UK Biobank Pharma Proteomics Project [1] on plasma samples from more than 50,000 UK Biobank participants with phenotypic and genotypic data, stratifying on British or Irish, African and South Asian ancestries. We compared the results with those of a SomaScan v4 study on plasma from 36,000 Icelandic people [2], for 1,514 of whom Olink data were also available. We found modest correlation between the two platforms. Although cis protein quantitative trait loci were detected for a similar absolute number of assays on the two platforms (2,101 on Olink versus 2,120 on SomaScan), the proportion of assays with such supporting evidence for assay performance was higher on the Olink platform (72% versus 43%). A considerable number of proteins had genomic associations that differed between the platforms. We provide examples where differences between platforms may influence conclusions drawn from the integration of protein levels with the study of diseases. We demonstrate how leveraging the diverse ancestries of participants in the UK Biobank helps to detect novel associations and refine genomic location. Our results show the value of the information provided by the two most commonly used high-throughput proteomics platforms and demonstrate the differences between them, which at times provide useful complementarity.
Evaluation of Large-Scale Proteomics for Prediction of Cardiovascular Events
Importance: Whether protein risk scores derived from a single plasma sample could be useful for risk assessment for atherosclerotic cardiovascular disease (ASCVD), in conjunction with clinical risk factors and polygenic risk scores, is uncertain. Objective: To develop protein risk scores for ASCVD risk prediction and compare them to clinical risk factors and polygenic risk scores in primary and secondary event populations. Design, Setting, and Participants: The primary analysis was a retrospective study of primary events among 13 540 individuals in Iceland (aged 40-75 years) with proteomics data and no history of major ASCVD events at recruitment (study duration, August 23, 2000 until October 26, 2006; follow-up through 2018). We also analyzed a secondary event population from a randomized, double-blind lipid-lowering clinical trial (2013-2016), consisting of 6791 individuals with stable ASCVD receiving statin therapy for whom proteomic data were available. Exposures: Protein risk scores (based on 4963 plasma protein levels and developed in a training set in the primary event population); polygenic risk scores for coronary artery disease and stroke; and clinical risk factors that included age, sex, statin use, hypertension treatment, type 2 diabetes, body mass index, and smoking status at the time of plasma sampling. Main Outcomes and Measures: Outcomes were composites of myocardial infarction, stroke, and coronary heart disease death or cardiovascular death. Performance was evaluated using Cox survival models and measures of discrimination and reclassification that accounted for the competing risk of non-ASCVD death. Results: In the primary event population test set (4018 individuals [59.0% women]; 465 events; median follow-up, 15.8 years), the protein risk score had a hazard ratio (HR) of 1.93 per SD (95% CI, 1.75 to 2.13).
Adding the protein risk score and polygenic risk scores to a clinical risk factor model significantly increased the C index (C index change, 0.022 [95% CI, 0.007 to 0.038]). Adding the protein risk score alone to a clinical risk factor model also significantly increased the C index (difference, 0.014 [95% CI, 0.002 to 0.028]). Among White individuals in the secondary event population (6307 participants; 432 events; median follow-up, 2.2 years), the protein risk score had an HR of 1.62 per SD (95% CI, 1.48 to 1.79) and significantly increased the C index when added to a clinical risk factor model (C index change, 0.026 [95% CI, 0.011 to 0.042]). The protein risk score was significantly associated with major adverse cardiovascular events among individuals of African and Asian ancestries in the secondary event population. Conclusions and Relevance: A protein risk score was significantly associated with ASCVD events in primary and secondary event populations. When added to clinical risk factors, the protein risk score and polygenic risk score both provided statistically significant but modest improvement in discrimination.
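To make the "C index change" reported above concrete, here is a minimal sketch of Harrell's concordance index for right-censored data, compared between two risk scores. It is a simplification: the study used measures that account for the competing risk of non-ASCVD death, which this sketch does not. The function name and the toy data are hypothetical and chosen only for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Plain Harrell's C for right-censored data (no competing-risk handling).

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    subject with the earlier event also has the higher risk score; tied
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): follow-up times in years, event indicators,
# and two hypothetical risk scores for the same five people.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
clinical_score = [0.9, 0.3, 0.5, 0.4, 0.1]
combined_score = [0.9, 0.7, 0.5, 0.4, 0.1]

c_clinical = harrell_c_index(times, events, clinical_score)
c_combined = harrell_c_index(times, events, combined_score)
c_index_change = c_combined - c_clinical
```

On real cohorts the change is far smaller than in this toy case; the abstract reports changes of about 0.01 to 0.03, with confidence intervals that would typically come from resampling rather than from a single point estimate.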
Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions
Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.
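The idea of standardising a link score can be illustrated with a small sketch. The abstract does not spell out the score or the standardisation, so everything below is an assumption for illustration: the raw score is taken to be the number of strains shared by a BGC and a metabolite, and it is standardised against a hypergeometric null in which the metabolite's strains are drawn at random. The function name and the strain sets are hypothetical and are not part of NPLinker's API.

```python
import math


def standardised_overlap(bgc_strains, metabolite_strains, n_strains):
    """Z-score of the strain overlap under a hypergeometric null (illustrative).

    bgc_strains, metabolite_strains: sets of strain identifiers.
    n_strains: total number of strains in the data set.
    """
    k = len(bgc_strains)          # strains carrying the BGC
    n = len(metabolite_strains)   # strains producing the metabolite
    overlap = len(bgc_strains & metabolite_strains)
    p = k / n_strains
    expected = n * p
    variance = n * p * (1 - p) * (n_strains - n) / (n_strains - 1)
    if variance == 0:
        return 0.0  # degenerate case: the overlap is fully determined
    return (overlap - expected) / math.sqrt(variance)


# Two hypothetical BGCs share the same three strains with one metabolite,
# but the second BGC is present in most of the ten strains.
metabolite = {0, 1, 2}
rare_bgc = {0, 1, 2}
common_bgc = {0, 1, 2, 3, 4, 5, 6, 7}

z_rare = standardised_overlap(rare_bgc, metabolite, 10)
z_common = standardised_overlap(common_bgc, metabolite, 10)
```

Both candidate links have the same raw overlap of three strains, yet the standardised value is much higher for the rare BGC, because a widespread BGC overlaps with almost any metabolite by chance. This is one way standardisation can help rank links of different sizes on a common scale.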
Rare variants with large effects provide functional insights into the pathology of migraine subtypes, with and without aura
Migraine is a complex neurovascular disease with a range of severity and symptoms, yet mostly studied as one phenotype in genome-wide association studies (GWAS). Here we combine large GWAS datasets from six European populations to study the main migraine subtypes, migraine with aura (MA) and migraine without aura (MO). We identified four new MA-associated variants (in PRRT2, PALMD, ABO and LRRK2) and classified 13 MO-associated variants. Rare variants with large effects highlight three genes. A rare frameshift variant in brain-expressed PRRT2 confers large risk of MA and epilepsy, but not MO. A burden test of rare loss-of-function variants in SCN11A, encoding a neuron-expressed sodium channel with a key role in pain sensation, shows strong protection against migraine. Finally, a rare variant with cis-regulatory effects on KCNK5 confers large protection against migraine and brain aneurysms. Our findings offer new insights with therapeutic potential into the complex biology of migraine and its subtypes.
Comparative Metabologenomics Analysis of Polar Actinomycetes
Biosynthetic and chemical datasets are the two major pillars for microbial drug discovery in the omics era. Despite the advancement of analysis tools and platforms for multi-strain metabolomics and genomics, linking these information sources remains a considerable bottleneck in strain prioritisation and natural product discovery. In this study, molecular networking of the 100 metabolite extracts derived from applying the OSMAC approach to 25 Polar bacterial strains showed growth media specificity and suggested potential chemical novelty. Moreover, the metabolite extracts were screened for antibacterial activity, and promising selective bioactivity against drug-persistent pathogens such as Klebsiella pneumoniae and Acinetobacter baumannii was observed. Genome sequencing data were combined with metabolomics experiments in the recently developed computational approach, NPLinker, which was used to link BGCs and molecular features to prioritise strains for further investigation based on biosynthetic and chemical information. Herein, we putatively identified the known metabolites ectoine and chloramphenicol which, through NPLinker, were linked to their associated BGCs. The metabologenomics approach followed in this study can potentially be applied to any large microbial datasets for accelerating the discovery of new (bioactive) specialised metabolites.