Regularized estimation of large-scale gene association networks using graphical Gaussian models
BACKGROUND: Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. Popular approaches include biased estimates of the covariance matrix and high-dimensional regression schemes, such as the Lasso and Partial Least Squares.
RESULTS: In this article, we investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models. This framework includes various existing methods as well as two new approaches based on ridge regression and adaptive lasso, respectively. These methods are extensively compared both qualitatively and quantitatively within a simulation study and through an application to six diverse real data sets. In addition, all proposed algorithms are implemented in the R package "parcor", available from the R repository CRAN.
CONCLUSION: In our simulation studies, the investigated non-sparse regression methods, i.e. Ridge Regression and Partial Least Squares, exhibit rather conservative behavior when combined with (local) false discovery rate multiple testing in order to decide whether or not an edge is present in the network. For networks with higher densities, the difference in performance of the methods decreases. For sparse networks, we confirm the Lasso's well known tendency towards selecting too many edges, whereas the two-stage adaptive Lasso is an interesting alternative that provides sparser solutions.
In our simulations, both sparse and non-sparse methods are able to reconstruct networks with cluster structures. On six real data sets, we also clearly distinguish the results obtained using the non-sparse methods from those obtained using the sparse methods, where specification of the regularization parameter automatically means model selection. In five out of six data sets, Partial Least Squares selects very dense networks. Furthermore, for data that violate the assumption of uncorrelated observations (due to replications), the Lasso and the adaptive Lasso yield very complex structures, indicating that they might not be suited under these conditions. The shrinkage approach is more stable than the regression-based approaches when using subsampling.
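The link between node-wise regression and partial correlations that underlies this framework can be illustrated with a short sketch. The code below is not the "parcor" implementation. It is a minimal pure-Python illustration that assumes each variable is ridge-regressed on all the others, and that the partial correlation between variables i and j is taken as sign(b_ij) * sqrt(b_ij * b_ji) when the two coefficients agree in sign and as zero otherwise. The function names, the fixed penalty `lam`, and the small Gaussian-elimination helper are hypothetical choices made for this example.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_partial_correlations(X, lam=1.0):
    """Estimate a partial correlation matrix by node-wise ridge regression.

    X is a list of samples (rows), each a list of p variable values.
    lam is the ridge penalty (assumed fixed here, not tuned).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    # Gram matrix of the centered data
    G = [[sum(r[u] * r[v] for r in Xc) for v in range(p)] for u in range(p)]
    B = [[0.0] * p for _ in range(p)]  # B[i][j]: weight of variable j when predicting i
    for i in range(p):
        others = [j for j in range(p) if j != i]
        A = [[G[u][v] + (lam if u == v else 0.0) for v in others] for u in others]
        rhs = [G[u][i] for u in others]
        for j, coef in zip(others, solve(A, rhs)):
            B[i][j] = coef
    P = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            prod = B[i][j] * B[j][i]
            if prod > 0:  # keep the estimate only if both regressions agree in sign
                rho = math.copysign(min(1.0, math.sqrt(prod)), B[i][j])
                P[i][j] = P[j][i] = rho
    return P
```

With a penalty close to zero and more samples than variables, this reduces to the ordinary sample partial correlations; larger penalties shrink the estimates. In a full analysis the penalty would have to be tuned (for example by cross-validation), and edges would then be selected by multiple testing on the estimated partial correlations, as the abstract describes for the non-sparse methods.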
Targeted Quantitative Analysis of Streptococcus pyogenes Virulence Factors by Multiple Reaction Monitoring
Vinzenz Lange, Johan Malmström, John P. Didion et al. | Molecular & Cellular Proteomics | 2008
In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently quantified at high sensitivity reliably in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped out by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest, proteotypic peptides, are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection, and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach we were able to reliably quantify low abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure.
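The second step of the workflow (selecting proteotypic peptides and deriving MRM transitions for them) can be sketched in a few lines. This is not TIQAM and makes no claim about how TIQAM works internally. It is a minimal illustration that assumes the proteome map is available as (peptide, protein) pairs, treats a peptide as proteotypic when it maps to exactly one protein, and builds transitions from a doubly charged precursor to singly charged y-ions using standard monoisotopic residue masses. All function names and the input format are hypothetical.

```python
# Standard monoisotopic amino acid residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton


def proteotypic_peptides(observations):
    """Group peptides by protein, keeping only peptides seen for one protein.

    observations: iterable of (peptide, protein) pairs (hypothetical format).
    """
    owners = {}
    for peptide, protein in observations:
        owners.setdefault(peptide, set()).add(protein)
    result = {}
    for peptide, proteins in owners.items():
        if len(proteins) == 1:  # unique to a single protein
            result.setdefault(next(iter(proteins)), []).append(peptide)
    return result


def precursor_mz(peptide, charge=2):
    """m/z of the unmodified peptide at the given charge state."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y-ion holding the last n residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def transitions(peptide, charge=2):
    """List (precursor m/z, fragment m/z, label) for all y-ions y1..y(len-1)."""
    q1 = precursor_mz(peptide, charge)
    return [
        (round(q1, 4), round(y_ion_mz(peptide, n), 4), f"y{n}")
        for n in range(1, len(peptide))
    ]
```

Candidate transitions generated this way are only theoretical. As the abstract states, they still have to be validated by MS2 spectrum acquisition before being used for quantification.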
A key element of the experimental framework for systems biology is the comprehensive, quantitative measurement of whole biological systems in differentially perturbed states (1. Ideker T. Thorsson V. Ranish J.A. Christmas R. Buhler J. Eng J.K. Bumgarner R. Goodlett D.R. Aebersold R. Hood L. 
Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001; 292: 929-934). Among the different types of measurements possible, protein quantification is particularly informative because proteins catalyze or control the majority of cellular functions. Currently the most widely applied quantitative proteome analysis technologies consist of the labeling of the samples by stable isotopes, the reproducible separation of complex peptide mixtures, usually by capillary LC, and the identification and quantification of selected peptides by tandem mass spectrometry and sequence database searching (2. Ong S.E. Mann M. Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 2005; 1: 252-262; 3. Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207). Relative quantitative values are generated by these methods if two or more samples are being compared, and absolute quantification can be achieved if suitable, calibrated reference samples are available (4. Gerber S.A. Rush J. Stemman O. Kirschner M.W. Gygi S.P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 6940-6945). Using such shotgun methods, in each measurement only a fraction of the analytes present in a complex sample is identified and quantified. Peptide ions are selected by the mass spectrometer automatically based on precursor ion signal intensities. Due to a multitude of factors, including interference between analytes and variations in precursor ion spectra, the selection of peptides is not reproducible in consecutive runs in particular for peptides of lower signal intensities. 
As a critical consequence of this undersampling effect, comprehensive analyses of whole systems are not supported by these technologies, rendering them poorly suited for systems biology and other experiments that depend on the comparison of complete or at least reproducible data sets. To overcome these fundamental technical limitations confronting proteomics, we have suggested in the past a substantially different approach that emulates successful genomics strategies (5. Aebersold R. Constellations in a cellular universe. Nature. 2003; 422: 115-116; 6. Domon B. Aebersold R. Mass spectrometry and protein analysis. Science. 2006; 312: 212-217; 7. Kuster B. Schirle M. Mallick P. Aebersold R. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell Biol. 2005; 6: 577-583). It depends on the generation of deep, ideally complete proteome maps followed by the targeted analysis of peptides that collectively represent the proteins that constitute the system under investigation. We have termed peptides that are typically observed in a mass spectrometer and that uniquely identify a particular protein proteotypic peptides (P. Schirle M. Ranish J. B. R. T. B. Aebersold R. of proteotypic peptides for quantitative). has been achieved with of this First, we have the PeptideAtlas Mallick P. Eng J. S. J. Aebersold R. The PeptideAtlas 2006; a of high peptide and software types of including the identification of present data sets for the Ranish J.A. Mallick P. Eng J. M. B. B. Aebersold R. Analysis of the proteome with Biol. 2006; Mallick P. Eng J.K. A. R. S. Hood L. B. Ranish J.A. B. L. Aebersold R. with the human of peptide by mass Biol. 
6: PubMed Google human Mallick P. Eng J.K. A. R. S. Hood L. B. Ranish J.A. B. L. Aebersold R. with the human of peptide by mass Biol. 6: PubMed Google and human plasma Eng J.K. B. R. Aebersold R. 2005; PubMed Scopus Google have been in the Second, multiple reaction monitoring (MRM) has as the of for the targeted and quantification of peptides and in complex B. Aebersold R. and in proteomics data 2006; PubMed Scopus Google Scholar, A. S. reaction monitoring for quantitative proteomic analysis of cellular Natl. Acad. Sci. U. S. A. PubMed Scopus Google Scholar, J. V. R. Aebersold R. B. sensitivity of plasma proteins by multiple reaction monitoring of 6: PubMed Scopus Google Scholar, J. of for quantification of proteins by mass Chem. 2006; PubMed Scopus Google Scholar). sensitivity and a have been achieved mass in The of first at the of the precursor ions and at the of the the increasing the the of transitions and the of peptides in a has been by the of for the transitions J. V. R. Aebersold R. B. sensitivity of plasma proteins by multiple reaction monitoring of 6: PubMed Scopus Google Scholar). these ideally for the quantification of low abundance proteins in complex samples. these technical limitations at present the of the targeted proteomics described are the of target selection and the of the measurement it is critical that for each protein in the targeted protein set the peptides are and that for these peptides the transitions are identified and To this we a suite of software termed TIQAM (Targeted Identification for Quantitative Analysis by with a protein set in the system the PeptideAtlas and other of to the and a of are for experiments MS2 acquisition. 
TIQAM the data to the selection of the transitions has been by MS2 In this study we TIQAM and for the first the implementation of the targeted proteomics in a First, the proteome in mapped peptide and and a PeptideAtlas with proteome Second, based on this proteome map, were selected, and transitions were validated Third, these transitions were to quantify the protein set in biological samples. We applied the targeted proteomics to study the of virulence factors of the Streptococcus S. pyogenes is a of and the it such as or M.W. of A Rev. PubMed Scopus Google Scholar). can and and such as and A study by the that S. pyogenes of of and of M. The of A 2005; PubMed Scopus Google Scholar). In at least each to S. pyogenes that the pathogen is of and the of S. pyogenes has been extensively and virulence factors that system and have been a comprehensive of the is a systems biology analysis of in A J. 2005; PubMed Scopus Google Scholar). of or the human plasma the of to the S. pyogenes has the to and other present in plasma. To the have the and proteomic upon plasma of S. pyogenes U. T. L. P. The protein of Streptococcus pyogenes is by human 2005; PubMed Scopus Google Scholar). only a of the virulence factors be and quantified on the proteome we applied the targeted proteomics described to S. pyogenes the of virulence proteins exposed to increasing of human plasma. We identified a subset of virulence factors that is clearly upon with plasma and is of particular the of S. The in this study is the cultures of were in of in with with or human plasma at in a The at at of the cultures The cultures were three in and in of the with cell for in a samples were with and and in The proteins were with A for the quantitative of protein in in the of and PubMed Scopus Google Scholar). 
The protein with of of and of and The the the of were and the sample by The samples were in at The proteins were with for at and with for in the the sample with at to a were by with for at least at The peptides were by to the of the peptides were to a of in separation The separation in the of out as described J. S. M. Aebersold R. peptide separation and identification for mass spectrometry based proteomics 2006; PubMed Scopus Google at a sample of The other of the peptides in and The peptides were on with a as described M. M. P. Aebersold R. for tandem mass spectrometry shotgun proteomics data validation of 2005; PubMed Scopus Google a of of at a of and followed by the separation at a of and were the were and by to the The of the system as described J. S. M. Aebersold R. peptide separation and identification for mass spectrometry based proteomics 2006; PubMed Scopus Google Scholar, Aebersold R. Goodlett D.R. A of peptide tandem mass spectrometry at the on ion mass spectrometer with Mass 2003; PubMed Scopus Google Scholar). analyses were out on a sample the samples were by a of in with a of The system to ion mass spectrometer with a from The data set to to followed by three the three most ions from a were to if the of the precursor ion ion The set to and the of the peptide by from for with a of The mass spectrometer to a ion separation of peptides achieved on system with in with a were by a of in with a of were in the ion each that at with of to for ions with at least two and ions with were for MS2 analysis and from MS2 for The mass spectrometer set to ions for more and ions a of for MS2 The from the and and were to by by were J.K. approach to tandem mass data of peptides with in a protein Mass PubMed Scopus Google the proteome from S. pyogenes complete for of proteins as as such as and human the samples exposed to human plasma the S. pyogenes protein database with the protein database. 
The with with mass of for the precursor ions and for the as and as The database were the A. Aebersold R. to the of peptide by and database Chem. PubMed Scopus Google to from peptides the between and observed peptide values J. S. M. Aebersold R. peptide separation and identification for mass spectrometry based proteomics 2006; PubMed Scopus Google Scholar). The protein in the data set by the A. Aebersold R. A for identifying proteins by tandem mass Chem. 2003; PubMed Scopus Google Scholar). The S. pyogenes PeptideAtlas the sequence of were in the if peptides in this the for in the is and the sensitivity is at the identified with of The identified were peptide of which were observed at least The majority of the be the peptides only observed The identified peptides to a of proteins of which were only identified with or more observed peptides The of this analysis were the PeptideAtlas database as described Mallick P. Eng J. S. J. Aebersold R. The PeptideAtlas 2006; PubMed Scopus Google and are available for and identified proteins and peptides can be with the protein precursor and The labeling as described A. J. A for quantitative proteomics protein 2005; PubMed Scopus Google with the of S. pyogenes protein as described A for the quantitative of protein in in the of and PubMed Scopus Google and in of The proteins were and as described with the that with The to by the of and the proteins were with for at least at sample with and a of samples with the samples were at a and by to the samples were on the mass spectrometer as described were the A. Eng J. Aebersold R. A proteomics analysis Biol. 
2005; PubMed Scopus Google with the with two peptide mass of at the and at as and of as The were by and as described The TIQAM software suite consists of three TIQAM is a for from the TIQAM and other data to transition to TIQAM a to MS2 experiments and TIQAM in the software It is available in for and TIQAM for The TIQAM are available for on the PeptideAtlas data we generated transitions for the proteotypic peptides We the peptides to those in the mass of and not each precursor we the transitions with precursor and and the with precursor were to the and experiments were with transitions on a of samples. were and the three transitions with the were selected for quantitative analysis if the spectrum in with the targeted The validated transitions are in quantitative analysis we of samples as to each sample in a each transition we the transition of the This in transitions peptide and in transitions for peptides of virulence and proteins three peptides only transitions each were The proteins were in the analysis for of sample the of each transition to a the of the software enabled the analysis of transitions in a sample on analyses were on a ion mass spectrometer with a of experiments The to a system for peptide separation a from to at A of in with of validation runs we the spectrum on the two were in the with or with low of of and two Quantitative analyses in were with and in quantification with software for each peptide the of with or in the or transitions were from data analysis and from the transition data were We of The of to for and between the from a of the of a particular transition the samples be to To for we transition to the of the transition samples. To for from protein in the we for sample the of peptides from the proteins and the resulting for the The were to the of and to we that to amounts of human plasma virulence in S. We the present data The were out we of the virulence and proteins in with the of plasma. 
The experimental is a with on protein We have the factors plasma with and and biological sample with two each we on the of proteotypic peptides and on the of the transitions that the measurements for each transition were each the of the proteotypic and transition were to be the biological sample a the quantitative of protein abundance as the can be as and are and with and and The and are the of protein abundance upon with different amounts of plasma can be as for at least two To the for the targeted proteomics we first generated a S. pyogenes this we peptide separation of of S. pyogenes protein by and The were out by peptide separation in a J. S. M. Aebersold R. peptide separation and identification for mass spectrometry based proteomics 2006; PubMed Scopus Google or by M. M. P. Aebersold R. for tandem mass spectrometry shotgun proteomics data validation of 2005; PubMed Scopus Google Scholar). The resulting were by to or mass a of more spectra, peptide were identified with a of to a of J. S. M. Aebersold R. peptide separation and identification for mass spectrometry based proteomics 2006; PubMed Scopus Google Scholar, A. Aebersold R. to the of peptide by and database Chem. PubMed Scopus Google Scholar). The identified peptides in the identification of proteins with a of A. Aebersold R. A for identifying proteins by tandem mass Chem. 2003; PubMed Scopus Google Scholar). This of the and of peptides identified protein from to peptides the were identified by peptides at least the of a In the S. pyogenes proteome proteins have a or proteins are as In this data set we present protein data the of proteins as the proteome of S. pyogenes The proteins were identified with peptides to of peptides the proteins with or we identified to identification of The whole data set assembled in a PeptideAtlas in a described Mallick P. Eng J. S. J. Aebersold R. The PeptideAtlas 2006; PubMed Scopus Google Scholar, Eng J.K. B. R. Aebersold R. 
2005; PubMed Scopus Google and is available for or Analysis of of the identified proteins that the cellular were with the majority of identified proteins being as or the of the S. pyogenes proteome in were for the biological process and and the only a proteins that a in on the we assembled a of M.W. of A Rev. PubMed Scopus Google and virulence factors Quantitative and analysis of these virulence factors upon plasma in the process of in a we these proteins as for In to shotgun in measurements complete ion are signal for selected transitions are transition is by a of a and a resulting in a at the peptide and ion The in measurements high for each transition resulting in high is achieved because of the signal the of transitions is critical for the sensitivity and of the targeted peptides under and uniquely the be The selection of and the validation and of transitions are critical in targeted We a suite of software termed to targeted proteomics in as a subset of the peptides of a is observed by mass spectrometry in a proteomics The selection of these uniquely identifying the proteins of is of critical for a targeted proteomics TIQAM to PeptideAtlas to the peptides based on the of that In TIQAM the of from the to of and In experimental data are available it has been that proteotypic peptides be based on P. Schirle M. Ranish J. B. R. T. B. Aebersold R. of proteotypic peptides for quantitative PubMed Scopus Google Scholar, P. P. A approach protein quantification peptide 2006; PubMed Scopus Google Scholar, P. R. Absolute protein the of and PubMed Scopus Google Scholar). such TIQAM the selection of peptides with of from proteins which have by In to the target peptides, precursor ions and ions to be for on the of the TIQAM a of transitions for the targeted analysis by MRM. 
the selection of experiments in a transitions be by MS2 spectrum to identification and quantification to from more peptides in the different this validation the is in a the of transitions in is to the of TIQAM maps the resulting and to the of targeted proteins and In database validated by A. Aebersold R. to the of peptide by and database Chem. PubMed Scopus Google be The of with ion spectra, and to the peptide In to the validated peptides, the and the signal are by The transitions are selected for quantitative in a targeted proteomics with a quantitative shotgun is the of validation In proteins of low abundance transitions are TIQAM the of peptides for proteins not identified in the first validation This validation process a peptides are targeted in the first of validation and only the abundance proteins the of more transitions have been the proteotypic transitions be for quantitative analysis of multiple samples. The of virulence factors has been as a for the of particular S. pyogenes a analysis of the of virulence proteins to not we applied TIQAM to target virulence factors based on the the proteome we identified of these virulence proteins that were under the experimental on this in the PeptideAtlas we TIQAM to and transitions for the ions with the precursor ion for and precursor we experiments to these transitions by and observed TIQAM a and We were able to virulence factors by this approach in the sample In we validated transitions for proteins for validated we selected for each peptide the three transitions with the resulting in a of transitions for peptides three peptides we only two that we to quantify the virulence factors in the biological samples. To quantify the targeted virulence factors and the control each sample with a of samples as The resulting transitions each for and were in a This by software that the of transitions to that at a particular only a fraction of the transitions is J. V. R. Aebersold R. B. 
sensitivity of plasma proteins by multiple reaction monitoring of 6: PubMed Scopus Google Scholar). a of transitions be the to on or To the virulence upon with we S. pyogenes in with different amounts of plasma from to experiments were for the plasma The resulting samples were by MRM. We quantitative data for each targeted peptide from samples the of the approach for quantification of multiple samples In we the peptide of the biological experiments as the available transitions of the The the in the analysis with the for the biological experiments To which proteins in abundance to the in plasma we a to the data for each targeted virulence the for the of for the amounts of human plasma and plasma proteins were protein A and The first three were upon plasma these that three proteins protein and were not and to a upon plasma exposure. The quantitative data different the of two types of and we a in protein with increasing plasma at and to In the protein A in the plasma We applied the analysis to the proteins in the measurements for of these proteins a in protein abundance This the in virulence cell in S. is upon biology experiments in particular the reproducible of quantitative data for sets of proteins from multiple samples. To the of targeted approach with a shotgun we samples of each plasma in on a ion We many of the targeted virulence proteins were reliably identified by database Using the targeted we quantitative data for validated proteins from each of the samples. In only two proteins of the targeted virulence factors were the shotgun approach proteins were identified in a fraction of the samples. 
be in of the samples for the were by these samples on the mass spectrometer not This the of the targeted approach to and subsequently reliably quantify proteins the of a shotgun of protein identification the targeted approach and shotgun proteomics in a This study a proteomics for targeted quantitative the proteome is mapped and subsequently proteins of are targeted for quantification the and To the of this approach we to software suite with We applied this to study the of virulence protein of S. pyogenes upon to plasma. S. pyogenes of proteins that and M.W. of A Rev. PubMed Scopus Google Scholar). To and it is to S. pyogenes protein in the different proteomics methods for the and quantification of proteins have sensitivity and and are by amounts of human we applied to quantify the of virulence factors by S. pyogenes exposed to increasing amounts of human plasma. We in this study that of the targeted virulence factors are regulated in to human the of S. pyogenes as a for and that the to of the the is and as the of The has been identified as a V. of of by Streptococcus pyogenes PubMed Scopus Google Scholar, A. on A as a A J. Google and in this is by the present that S. pyogenes the of virulence factors upon plasma exposure. It to be the of virulence proteins in S. pyogenes different human of targeted quantitative analyses to on the have to overcome the The key for the successful reproducible quantitative analysis of these samples the of mass in have been widely in the field for quantitative It has been that can be to quantify peptides (4Gerber S.A. Rush J. Stemman O. Kirschner M.W. Gygi S.P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS.Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 6940-6945Crossref PubMed Scopus (1537) Google or target A. reaction monitoring to of protein with high 2005; PubMed Scopus Google Scholar, M. T. reaction monitoring as a for identifying protein 2005; Google Scholar). 
In particular the analysis of which is by the of protein from J. V. R. Aebersold R. B. sensitivity of plasma proteins by multiple reaction monitoring of 6: PubMed Scopus Google Scholar, J. B. S. J. R. of mass spectrometry to protein of in the and of with PubMed Scopus Google Scholar, L. Quantitative mass multiple reaction monitoring for plasma 2006; PubMed Scopus Google Scholar). the that and are key factors for successful quantitative proteomics has not been applied widely in for this is the required for protein to a quantitative In the proteomics the samples are by and the data are subsequently by software under and quantitative In the quantitative analysis be in it is to proteotypic peptides, to and and to these this process is not supported by the available software the generation of transition from proteins or peptides A. reaction monitoring to of protein with high 2005; PubMed Scopus Google Scholar). the TIQAM software this not to the from the or TIQAM the required the of for with a of target the generation of the transition is only the first in The analysis of the validation experiments and the of the data the for a of different software to be to and MS2 spectra, database and of targeted the validated peptides and and the TIQAM is on a database to these different types of and of this in a the process of peptide validation and transition selection is in This is the key for the targeted approach to of proteins and Applying a of by and quantification we a of to set validated and transitions for proteins that are available in a database This to comprehensive of proteotypic transitions the proteins in those in a in proteotypic transition that or proteotypic transitions are the reproducible and quantitative analysis by can be with a of samples in each sample at present to transitions can be MRM. 
In we a for quantitative proteomics that we applied to quantify virulence factors in cell of Streptococcus a shotgun proteomics approach to proteotypic peptides with for the and quantification of targeted We TIQAM that and the generation and validation of transitions for the we study to the analysis of virulence TIQAM the to target of proteins by we established for the more and quantification of proteins of biological or We for on the labeling and with the with