Plasma proteomic associations with genetics and health in the UK Biobank
Abstract: The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand–receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetic proxied effects of protein targets, such as PCSK9, on additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public–private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics.
Proteomic signatures improve risk prediction for common and rare diseases
For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Here, in 41,931 individuals from the United Kingdom Biobank Pharma Proteomics Project, we integrated measurements of ~3,000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and rare diseases (81-6,038 cases). We then compared prediction models developed using proteomic data with models developed using either basic clinical information alone or clinical information combined with data from 37 clinical assays. The predictive performance of sparse models including as few as 5 to 20 proteins was superior to the performance of models developed using basic clinical information for 67 pathologically diverse diseases (median delta C-index = 0.07; range = 0.02-0.31). Sparse protein models further outperformed models developed using basic information combined with clinical assay data for 52 diseases, including multiple myeloma, non-Hodgkin lymphoma, motor neuron disease, pulmonary fibrosis and dilated cardiomyopathy. For multiple myeloma, single-cell RNA sequencing from bone marrow in newly diagnosed patients showed that four of the five predictor proteins were expressed specifically in plasma cells, consistent with the strong predictive power of these proteins. External replication of sparse protein models in the EPIC-Norfolk study showed good generalizability for prediction of the six diseases tested. These findings show that sparse plasma protein signatures, including both disease-specific proteins and protein predictors shared across several diseases, offer clinically useful prediction of common and rare diseases.
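The "delta C-index" reported above is the difference in concordance between two risk models. As a rough illustration only, the sketch below computes Harrell's C-index for two sets of risk scores and takes their difference. The function name, the simplified handling of tied times, and all toy values are assumptions made here for illustration; they are not the study's code or data.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the individual
    with the earlier observed event also has the higher predicted risk.

    Simplification (assumption): pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        if not events[a] or times[a] == times[b]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (not from the study).
times = [2, 4, 5, 7, 9, 10]          # follow-up time in years
events = [1, 1, 0, 1, 0, 0]          # 1 = incident disease, 0 = censored
clinical_risk = [0.3, 0.5, 0.4, 0.2, 0.1, 0.35]
protein_risk = [0.9, 0.7, 0.4, 0.5, 0.1, 0.2]

c_clinical = concordance_index(times, events, clinical_risk)
c_protein = concordance_index(times, events, protein_risk)
delta_c = c_protein - c_clinical
print(f"delta C-index = {delta_c:.2f}")
```

A positive difference means the second model ranks individuals by time to event more accurately than the first; a C-index of 0.5 corresponds to random ranking.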
Genetic regulation of the human plasma proteome in 54,306 UK Biobank participants
Benjamin B. Sun, Joshua Chiou, Matthew Traylor et al. | bioRxiv (Cold Spring Harbor Laboratory) | 2022
Abstract: The UK Biobank Pharma Proteomics Project (UKB-PPP) is a collaboration between the UK Biobank (UKB) and thirteen biopharmaceutical companies characterising the plasma proteomic profiles of 54,306 UKB participants. Here, we describe results from the first phase of UKB-PPP, including protein quantitative trait loci (pQTL) mapping of 1,463 proteins that identifies 10,248 primary genetic associations, of which 85% are newly discovered. We also identify independent secondary associations in 92% of cis and 29% of trans loci, expanding the catalogue of genetic instruments for downstream analyses. The study provides an updated characterisation of the genetic architecture of the plasma proteome, leveraging population-scale proteomics to provide novel, extensive insights into trans pQTLs across multiple biological domains. We highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement proteins, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug target discovery by extending the genetic proxied effect of PCSK9 levels on lipid concentrations, cardio- and cerebro-vascular diseases, and additionally disentangle specific genes and proteins perturbed at COVID-19 susceptibility loci. This public-private partnership provides the scientific community with an open-access proteomics resource of unprecedented breadth and depth to help elucidate biological mechanisms underlying genetic discoveries and accelerate the development of novel biomarkers and therapeutics.
Variants in tubule epithelial regulatory elements mediate most heritable differences in human kidney function
Proteomic prediction of common and rare diseases
Abstract
Background: For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Whether measuring thousands of proteins offers predictive information across a wide range of diseases is unknown.
Methods: In 41,931 individuals from the UK Biobank Pharma Proteomics Project (UKB-PPP), we integrated ∼3000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and rare diseases (81 – 6038 cases). We compared prediction models based on proteins with models based on (a) basic clinical information alone, (b) basic clinical information + 37 clinical biomarkers, and (c) genome-wide polygenic risk scores.
Results: For 67 pathologically diverse diseases, a model including as few as 5 to 20 proteins was superior to clinical models (median delta C-index = 0.07; range = 0.02 – 0.31), and for 52 diseases it was also superior to clinical models with biomarkers. In multiple myeloma, for example, a set of 5 proteins significantly improved prediction over basic clinical information (delta C-index = 0.25 (95% confidence interval 0.20 – 0.29)). At a 5% false positive rate (FPR), proteomic prediction (5 proteins) identified individuals at high risk of multiple myeloma (detection rate (DR) = 50%), non-Hodgkin lymphoma (DR = 55%) and motor neuron disease (DR = 29%). At a 20% FPR, proteomic prediction identified individuals at high risk of pulmonary fibrosis (DR = 80%) and dilated cardiomyopathy (DR = 75%).
Conclusions: Sparse plasma protein signatures offer novel, clinically useful prediction of common and rare diseases, through disease-specific proteins and protein predictors shared across multiple diseases. (Funded by Medical Research Council, NIHR, Wellcome Trust.)
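The detection rates quoted above are sensitivities read off at a fixed false positive rate. One way such a figure could be computed is sketched below: set the score threshold so that at most the target fraction of non-cases is flagged, then count the cases above it. The function name, the strict-inequality threshold rule, and the toy scores are assumptions for illustration, not the authors' procedure or data.

```python
import math


def detection_rate_at_fpr(scores, labels, fpr=0.05):
    """Fraction of cases flagged when the threshold is set so that
    at most `fpr` of non-cases (controls) are flagged."""
    controls = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")
    # Number of controls that may be flagged (small epsilon guards float rounding).
    k = math.floor(fpr * len(controls) + 1e-9)
    # Flag scores strictly above the (k+1)-th highest control score,
    # so tied controls are not flagged and the realised FPR stays <= target.
    threshold = controls[k] if k < len(controls) else -math.inf
    return sum(s > threshold for s in cases) / len(cases)


# Toy data, invented for illustration: 20 controls and 4 cases.
control_scores = [i / 100 for i in range(1, 21)]   # 0.01 ... 0.20
case_scores = [0.05, 0.17, 0.25, 0.30]
scores = control_scores + case_scores
labels = [0] * len(control_scores) + [1] * len(case_scores)

dr_5 = detection_rate_at_fpr(scores, labels, fpr=0.05)
dr_20 = detection_rate_at_fpr(scores, labels, fpr=0.20)
print(f"DR at 5% FPR = {dr_5:.0%}, DR at 20% FPR = {dr_20:.0%}")
```

Relaxing the permitted FPR lowers the threshold, so the detection rate can only stay the same or rise, which is why the abstract reports higher detection rates at 20% FPR than would be expected at 5%.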