A Software Suite for the Generation and Comparison of Peptide Arrays from Sets of Data Collected by Liquid Chromatography-Mass Spectrometry
Abstract
There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms, such as limited reproducibility and low throughput, make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied SpecArray to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. We demonstrate through these two case studies that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.
The identification and quantification of the protein contents of biological samples plays a crucial role in biological and biomedical research (Tyers and Mann, Nature 2003; Aebersold and Mann, Nature 2003; Hanash, Nature 2003; Boguski and McIntosh, Nature 2003). Due to the large dynamic range and the high complexity of most proteomes, it is very challenging to identify and accurately quantify the majority of proteins in such samples. LC-MS/MS-based methods are currently the most efficient for the identification of large numbers of proteins and have been widely applied in biological and biomedical research (Washburn et al., Nat. Biotechnol. 2001; Washburn et al., Anal. Chem. 2002; Gygi et al., Nat. Biotechnol. 1999). Combined with stable isotope labeling, such methods can accurately quantify proteins (Gygi et al., Nat. Biotechnol. 1999; Han et al., Nat. Biotechnol. 2001; Ong et al., Mol. Cell. Proteomics 2002; Blagoev et al., Nat. Biotechnol. 2003).
In an LC-MS/MS-based quantitative proteomic experiment, the samples to be compared are labeled with different stable isotopes and combined. The peptide mixture is separated by multidimensional chromatography and analyzed by tandem mass spectrometry. Peptides are identified by searching the MS/MS spectra against a protein sequence database (Eng et al., J. Am. Soc. Mass Spectrom. 1994), and proteins are quantified by software (Han et al., Nat. Biotechnol. 2001; Li et al., Anal. Chem. 2003) that derives the relative abundance of a protein from the relative abundance of its peptides. Protein identification and quantification are thus inferred from the peptides assigned to each protein (Li et al., Anal. Chem. 2003; Nesvizhskii et al., Anal. Chem. 2003). This approach can identify and quantify up to thousands of proteins from a biological sample, but each sample is demanding to analyze.
There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar samples, for example in the discovery of protein biomarkers from clinical samples (Hanash, Nature 2003; Adkins et al., Mol. Cell. Proteomics 2002; Zhang et al., Mol. Cell. Proteomics 2005) and in the study of cellular responses to perturbations, where many samples must be compared to identify patterns of proteins that change in response (Blagoev et al., Nat. Biotechnol. 2004). These and similar studies require high sample throughput and reproducible analyses, and quantitative LC-MS/MS is of limited use for such large numbers of complex proteomic samples. In LC-MS/MS, only a fraction of the peptides present in a sample is selected for fragmentation (Li et al., Anal. Chem. 2004), so the sets of peptides identified in substantially similar samples differ, and their overlap shrinks as the number of samples grows. Moreover, the selected peptides tend to be high-abundance peptides from proteins that generate many peptides, which makes it very difficult to obtain quantitative information on low-abundance proteins across samples. As an alternative to LC-MS/MS-based peptide identification, peptide profiling by LC-MS has been proposed as a platform for quantitative proteomics (Zhang et al., Mol. Cell. Proteomics 2005; Wang et al., Anal. Chem. 2003; Radulovic et al., Mol. Cell. Proteomics 2004).
The LC-MS approach is based on the observation that, for substantially similar samples analyzed under identical conditions, the signal intensity of a peptide is proportional to its abundance within the dynamic range of the instrument (Wang et al., Anal. Chem. 2003; Radulovic et al., Mol. Cell. Proteomics 2004; Bondarenko et al., Anal. Chem. 2002). The relative abundance of a peptide in different samples can therefore be determined by analyzing the samples under identical LC-MS conditions and comparing the signals of the same peptide across LC-MS runs. The approach proceeds roughly as follows: proteins are extracted from the samples and proteolyzed; the samples are analyzed by LC-MS on a high-resolution, high-mass-accuracy instrument; a limited number of samples can additionally be analyzed by LC-MS/MS to identify a fraction of the detected peptides; and software tools are applied to extract peptide features, determine peptide relative abundance from the LC-MS data, and identify a set of peptides that stratify samples with respect to their genetic, physiological, or environmental origins. Because most detected peptides are shared among substantially similar samples, a peptide can be matched across samples by its m/z, charge, and retention time, and discriminatory peptides can then be targeted for identification by MS/MS (Zhang et al., Mol. Cell. Proteomics 2005). Determining the relative abundance of thousands of peptides and finding the discriminatory ones among them is the main computational task of this approach.
Here we present a new software suite, SpecArray, that addresses the computational needs of the LC-MS approach for quantitative proteomics. The suite takes a set of LC-MS data as input and generates a peptide versus sample array that stores the relative abundance of thousands of peptide features in many samples. The format of a peptide array is identical to that of a gene expression microarray, with peptide features taking the place of genes (Eisen et al., Proc. Natl. Acad. Sci. U.S.A. 1998).
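A minimal Python sketch of the peptide versus sample array described above. It assumes each peptide feature is keyed by an (m/z, charge, retention time) tuple and that a sample in which the feature was not detected holds `None`. The class and method names (`PeptideArray`, `add`, `filter_min_samples`) are hypothetical illustrations, not SpecArray's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key for a peptide feature: (m/z, charge, retention time in min).
FeatureKey = Tuple[float, int, float]


@dataclass
class PeptideArray:
    """Rows are peptide features and columns are samples, as in a microarray."""
    samples: List[str]
    rows: Dict[FeatureKey, List[Optional[float]]] = field(default_factory=dict)

    def add(self, feature: FeatureKey, sample: str, abundance: float) -> None:
        # Create an all-missing row the first time a feature is seen.
        row = self.rows.setdefault(feature, [None] * len(self.samples))
        row[self.samples.index(sample)] = abundance

    def filter_min_samples(self, k: int) -> "PeptideArray":
        # Keep features detected (non-missing) in at least k samples.
        kept = {f: list(r) for f, r in self.rows.items()
                if sum(v is not None for v in r) >= k}
        return PeptideArray(list(self.samples), kept)


arr = PeptideArray(["run1", "run2", "run3"])
arr.add((652.34, 2, 31.5), "run1", 1.2e5)
arr.add((652.34, 2, 31.5), "run2", 1.1e5)
arr.add((847.91, 3, 44.0), "run3", 3.0e4)
print(len(arr.filter_min_samples(2).rows))  # 1
```

Storing undetected entries as `None` rather than zero keeps "not detected" distinct from "detected at low abundance", which matters for filtering and clustering.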
A peptide array can be subjected to unsupervised clustering analyses to stratify samples (Eisen et al., Proc. Natl. Acad. Sci. U.S.A. 1998) or to discriminant analyses to identify peptides that distinguish samples of different origins. The SpecArray software suite consists of five programs. In turn, they visualize the LC-MS data to evaluate data quality (Li et al., Anal. Chem. 2004); extract high-quality data from the raw data; extract peptide features; align the retention times of pairs of samples and match peptides across samples; and generate a peptide array from the peptide features of multiple samples, correcting for sample-to-sample variation in peptide relative abundance (Li et al., Anal. Chem. 2003). We applied the SpecArray software suite to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. These two case studies show that SpecArray is useful for analyzing LC-MS data and can serve as an effective software platform in the LC-MS approach for quantitative proteomics.
Serum samples were collected from five male and five female mice of the same strain and age (Zhang et al., Mol. Cell. Proteomics 2005). A further sample was obtained from a male mouse using a similar procedure and stored until analysis. Glycopeptides were isolated from each sample using the glycopeptide capture method as described (Zhang et al., Mol. Cell. Proteomics 2005; Zhang et al., Nat. Biotechnol. 2003) and analyzed by LC-MS as described (Zhang et al., Mol. Cell. Proteomics 2005), with identical settings for every sample. The data were converted to a common open format for mass spectrometry data (Pedrioli et al., Nat. Biotechnol. 2004).
The software suite processes these LC-MS data as follows. An image of each LC-MS run is generated for visual inspection (Li et al., Anal. Chem. 2004), and the raw data are converted into a reduced, processed format that retains the peptide signals; the m/z range and retention time range of the data to be processed can be restricted.
Peptide features are then extracted from the processed data. Starting from the most intense peak in a spectrum, the software examines the surrounding peaks for a peptide isotope pattern, determines the charge state of the peptide from the spacing of its isotope peaks, and locates the monoisotopic peak, which in some cases is not the most intense peak of the pattern. Features in consecutive spectra that are generated by the same peptide are combined, and the abundance of the peptide is calculated from its elution profile (Li et al., Anal. Chem. 2003). Each peptide feature is described by its m/z, charge, retention time, and abundance. The software extracts peptide features from every sample.
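The charge-state step relies on the isotope peaks of an ion of charge z being spaced by about 1.00336/z in m/z (the 13C/12C mass difference divided by z). The following is a simplified sketch, not SpecArray's algorithm. It assumes a list of centroided (m/z, intensity) peaks from one spectrum, a fixed absolute m/z tolerance, and a maximum charge of 4, all of which are hypothetical choices. It picks the charge whose spacing explains the most isotope peaks above the most intense peak, then walks down in m/z to reach the monoisotopic peak, which need not be the most intense one.

```python
from typing import List, Optional, Tuple

ISOTOPE_SPACING = 1.00336  # 13C - 12C mass difference (Da)
PROTON = 1.007276          # proton mass (Da)
Peak = Tuple[float, float]  # (m/z, intensity), centroided


def _has_peak(peaks: List[Peak], mz: float, tol: float) -> bool:
    return any(abs(p_mz - mz) <= tol for p_mz, _ in peaks)


def assign_charge(peaks: List[Peak], max_charge: int = 4,
                  tol: float = 0.01) -> Optional[Tuple[int, float]]:
    """Return (charge, monoisotopic m/z) for the most intense peak, or None."""
    apex_mz = max(peaks, key=lambda p: p[1])[0]
    best_z, best_n = None, 0
    for z in range(1, max_charge + 1):
        step = ISOTOPE_SPACING / z
        # Count consecutive isotope peaks above the apex at this charge's spacing.
        n = 0
        while _has_peak(peaks, apex_mz + (n + 1) * step, tol):
            n += 1
        if n > best_n:  # ties keep the lower charge
            best_z, best_n = z, n
    if best_z is None:
        return None
    step = ISOTOPE_SPACING / best_z
    # Walk down in m/z: the monoisotopic peak need not be the most intense.
    mono = apex_mz
    while _has_peak(peaks, mono - step, tol):
        mono -= step
    return best_z, mono


def neutral_mass(mono_mz: float, z: int) -> float:
    # Remove the z protons that carry the charge.
    return z * (mono_mz - PROTON)


peaks = [(500.2600, 60.0), (500.7617, 100.0), (501.2634, 50.0)]
z, mono = assign_charge(peaks)
print(z, round(mono, 3), round(neutral_mass(mono, z), 3))  # 2 500.26 998.505
```

A fuller implementation would also compare the observed isotope intensities with those expected for a peptide of that mass, and would guard against the downward walk picking up an unrelated peak. This sketch does neither.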
Peptide features are then matched across samples. Two features from different samples are considered to come from the same peptide when their charges are identical and their m/z values and retention times agree within tolerances. Because retention times shift between LC-MS runs, a peptide in one sample could otherwise be matched to the wrong feature in another sample. The software therefore first aligns the retention times of two LC-MS runs using peptides common to both, fitting the relationship in a plot of retention time in one run versus retention time in the other, and then matches peptide features using the aligned retention times. For more than two samples, features are matched pairwise, and the matched features of all samples are collected into a single list.
The software then generates a peptide versus sample array from the matched peptide features. Array generation is controlled by five user-set parameters, including the minimum number of samples in which a peptide must be detected and the m/z and retention time ranges of the peptide features to include. The columns of the array are the samples, the rows are the peptide features, and each entry is the abundance of a peptide in a sample. If a peptide was not detected in a sample, the software searches the LC-MS data of that sample for the peptide, estimating its retention time from its retention times in the other samples through the retention time alignment; if a signal is found, its abundance is calculated as during feature extraction.
The software normalizes peptide abundances (Li et al., Anal. Chem. 2003) to correct for differences that arise from variation in sample amount or LC-MS performance. Using peptide features detected in all or most samples, it computes a normalization factor for each sample and scales that sample's abundances accordingly; the relative abundance of each peptide across samples is then reported. The output is a peptide array of the relative abundance of peptide features in many samples, in which features are labeled by their m/z, charge, and retention time, and columns by sample name. The array is formatted so that it can be read by clustering software (Eisen et al., Proc. Natl. Acad. Sci. U.S.A. 1998), which was used for unsupervised hierarchical clustering of the samples and peptide features and for display of the results.
SpecArray generates a peptide array from a set of LC-MS data in five steps, each performed by one of the five programs. We describe these steps and their features using data from four repeat LC-MS analyses of one glycopeptide sample prepared from the serum of a male mouse (Zhang et al., Mol. Cell. Proteomics 2005; Zhang et al., Nat. Biotechnol. 2003). The LC-MS data, in the common open format (Pedrioli et al., Nat. Biotechnol. 2004), are first visualized to evaluate data quality (Li et al., Anal. Chem. 2004). Generating high-quality LC-MS data is crucial for the LC-MS approach to peptide quantification, and visual inspection can reveal problems such as low peptide signals or poor LC-MS performance (Li et al., Anal. Chem. 2004) before the data are processed further.
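The normalization step in array generation can be sketched as below. The source states only that a normalization factor is computed for each sample from peptide features detected across samples (following Li et al., Anal. Chem. 2003). The median log-ratio to a reference sample used here is one common choice and an assumption, as are the function name `normalize` and the restriction to features detected in every sample.

```python
import math
import statistics
from typing import Dict, List, Optional

Row = List[Optional[float]]  # abundances per sample; None = not detected


def normalize(rows: Dict[str, Row], ref: int = 0) -> Dict[str, Row]:
    """Rescale each sample so its median log-ratio to sample `ref` is zero.

    Only features detected in every sample are used to estimate the factors.
    """
    complete = [r for r in rows.values() if all(v is not None for v in r)]
    if not complete:
        raise ValueError("no feature is detected in all samples")
    factors = []
    for j in range(len(complete[0])):
        log_ratios = [math.log(r[j] / r[ref]) for r in complete]
        factors.append(math.exp(statistics.median(log_ratios)))
    # Divide every detected abundance by its sample's factor; keep None as None.
    return {f: [None if v is None else v / factors[j] for j, v in enumerate(r)]
            for f, r in rows.items()}


rows = {"p1": [100.0, 200.0, None],
        "p2": [50.0, 100.0, 10.0],
        "p3": [10.0, 20.0, 5.0]}
print(normalize(rows)["p1"])
```

Using the median of log-ratios makes the factor robust to the minority of peptides whose abundance genuinely differs between samples.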
Detecting peptide signals in such data is an essential step in the analysis of LC-MS data, and the software processes the raw data to make peptide signals easier to detect. Comparing an image generated from the raw LC-MS data with one generated from the processed data shows that the processed data retain most peptide signals while removing the majority of the noise; the processed data are also substantially smaller than the raw data. Peptide features are then extracted from the processed data. Because the LC-MS approach compares peptides across samples and identifies them only afterwards, it is crucial to extract peptide features accurately; most extracted features are expected to derive from peptides, which can be checked by matching them by m/z and retention time to peptides identified by MS/MS. The number of peptide features extracted varied among the four repeat LC-MS analyses, indicating that the LC-MS data are not perfectly reproducible. The same peptide can also generate several features, and low-abundance peptides or peptides not resolved by the instrument may not be detected as peptide features (Li et al., Anal. Chem. 2004).
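The reduction of raw profile data to a compact form that keeps peptide signals can be illustrated with a basic centroiding routine. This is a hypothetical sketch, not the SpecArray processing step. It assumes one profile spectrum sorted by m/z, a fixed non-negative intensity threshold `noise`, and that each peak is a contiguous run of points above that threshold.

```python
from typing import List, Tuple


def centroid(profile: List[Tuple[float, float]],
             noise: float) -> List[Tuple[float, float]]:
    """Reduce a profile spectrum (sorted by m/z) to (m/z, summed intensity) peaks.

    Each contiguous run of points above `noise` (assumed >= 0) is one peak; its
    m/z is the intensity-weighted mean of the run.
    """
    peaks, run = [], []
    # A zero-intensity sentinel closes a run that reaches the end of the spectrum.
    for mz, inten in profile + [(float("inf"), 0.0)]:
        if inten > noise:
            run.append((mz, inten))
        elif run:
            total = sum(i for _, i in run)
            peaks.append((sum(m * i for m, i in run) / total, total))
            run = []
    return peaks


profile = [(100.00, 1.0), (100.01, 10.0), (100.02, 30.0), (100.03, 10.0),
           (100.04, 1.0), (101.00, 2.0), (101.01, 20.0), (101.02, 2.0)]
print(centroid(profile, noise=5.0))
```

Each peak collapses to a single (m/z, intensity) pair, which is why processed data are much smaller than profile data. Real spectra would also need adaptive noise estimation and splitting of overlapping peaks.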
Peptide features extracted from the individual LC-MS runs are then matched by the software so that the abundance of the same peptide in different samples can be compared. In most LC-MS analyses the retention time of a peptide varies from run to run, so the retention times of two LC-MS analyses are aligned before peptide features are matched by their m/z and aligned retention time. Comparing the abundances of matched peptide features in two analyses shows that features of high abundance are more reproducible than features of low abundance, and most outliers in this comparison are low-abundance features. The peptide versus sample array generated from the matched features stores the relative abundance of peptide features across all samples and is the final output of the SpecArray software suite. In the peptide array generated from the four repeat LC-MS analyses, only features detected in at least two analyses were included. Features can be missing from some analyses because low-abundance peptides may be missed by the instrument or by the software; in general, features of high abundance are detected in most or all samples, whereas low-abundance features are detected in only one or a few samples.
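Retention-time alignment followed by feature matching between two runs might look like the sketch below. It is an illustration under stated assumptions, not SpecArray's code. Alignment is a straight-line least-squares fit to the retention times of anchor peptide pairs supplied by the caller, refined by repeatedly discarding the worst outlier. Matching requires equal charge, an m/z difference within a ppm tolerance, and an aligned retention time within a fixed window. All names, tolerances, and the linear model are hypothetical.

```python
from typing import List, Optional, Tuple

Feature = Tuple[float, int, float]  # (m/z, charge, retention time in min)


def fit_line(pts: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of t2 = a * t1 + b."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx
    return a, my - a * mx


def align(anchors: List[Tuple[float, float]],
          max_resid: float = 1.0) -> Tuple[float, float]:
    """Refit after dropping the worst outlier until all residuals <= max_resid."""
    pts = list(anchors)
    while True:
        a, b = fit_line(pts)
        resid = [abs(y - (a * x + b)) for x, y in pts]
        worst = max(range(len(pts)), key=resid.__getitem__)
        if resid[worst] <= max_resid or len(pts) <= 3:
            return a, b
        del pts[worst]


def match(run1: List[Feature], run2: List[Feature], a: float, b: float,
          ppm: float = 20.0, rt_tol: float = 1.0) -> List[Tuple[int, int]]:
    """Pair each run1 feature with the closest compatible run2 feature."""
    pairs = []
    for i, (mz1, z1, t1) in enumerate(run1):
        t_pred = a * t1 + b  # run1 retention time mapped onto run2's scale
        best: Optional[Tuple[float, int]] = None
        for j, (mz2, z2, t2) in enumerate(run2):
            if z1 != z2 or abs(mz2 - mz1) / mz1 * 1e6 > ppm:
                continue
            d = abs(t2 - t_pred)
            if d <= rt_tol and (best is None or d < best[0]):
                best = (d, j)
        if best is not None:
            pairs.append((i, best[1]))
    return pairs


a, b = align([(10, 11), (20, 21), (30, 31), (40, 41), (25, 35)])
run1 = [(500.26, 2, 20.0), (700.40, 3, 30.0)]
run2 = [(500.265, 2, 21.2), (700.41, 3, 31.1), (500.26, 3, 21.0)]
print(a, b, match(run1, run2, a, b))  # 1.0 1.0 [(0, 0), (1, 1)]
```

A linear fit corrects only a constant offset and a linear drift; runs with non-linear retention shifts would need a piecewise or smoothed fit. The greedy matching also lets two features of one run pick the same feature of the other, which a fuller implementation would resolve.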
Because one peptide can give rise to more than one feature in the peptide array, for example at two charge states, the number of features exceeds the number of distinct peptides. The reproducibility of peptide relative abundance across repeat analyses is an important measure of the quality of a peptide array, and for the four repeat analyses it was assessed from the variation of the relative abundance of each peptide across the runs. The results indicate that the LC-MS approach can measure peptide relative abundance reproducibly enough to detect discriminatory peptides whose abundance differs between samples.
To demonstrate the LC-MS approach on a larger sample set, we collected serum samples from five male and five female mice of the same strain and age and isolated the glycopeptides of their serum proteins using the glycopeptide capture method (Zhang et al., Mol. Cell. Proteomics 2005; Zhang et al., Nat. Biotechnol. 2003), whose sample preparation has been shown to be reproducible (Zhang et al., Mol. Cell. Proteomics 2005). The glycopeptide samples were analyzed by an identical LC-MS procedure, and we applied the SpecArray software suite to the resulting data. Images generated from these data with identical settings, so that signal intensity reflects peptide abundance, are very similar, indicating that the samples contain similar peptide patterns. The number of peptide features detected varied from sample to sample and differed between male and female mice. It was comparable to the number from the four repeat LC-MS analyses, although the variation among the serum samples was greater, which may reflect biological variation or differences in sample preparation. We generated four peptide arrays from the serum data using different parameter settings. In the first, features detected in at least two of the samples were included.
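The reproducibility across the four repeat analyses can be summarized per peptide. One common measure, assumed here rather than taken from the source, is the coefficient of variation (sample standard deviation divided by mean) of a peptide's relative abundance, computed over peptides detected in every run. `cv_per_feature` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Optional


def cv_per_feature(rows: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Coefficient of variation (sample SD / mean) of features detected in every run."""
    cvs = {}
    for feature, values in rows.items():
        if any(v is None for v in values):
            continue  # skip features missing from any run
        cvs[feature] = statistics.stdev(values) / statistics.mean(values)
    return cvs


repeats = {"p1": [1.0, 1.0, 1.0, 1.0],
           "p2": [0.8, 1.2, 0.8, 1.2],
           "p3": [1.0, None, 1.0, 1.0]}
cvs = cv_per_feature(repeats)
print(statistics.median(list(cvs.values())), statistics.mean(list(cvs.values())))
```

The median and mean of the returned values then give a single reproducibility summary for an array.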
Only a fraction of the features in this first serum array were detected in all samples, and many array entries were missing. This may be due to the low abundance of some peptides in the serum samples, to biological variation, and to limitations of the instrument and software. In a second array, only peptide features detected in all five samples of male mice or in all five samples of female mice were included. This array contained fewer peptide features but a more complete set of relative abundance values. The four arrays, all generated from the same set of LC-MS data, differed in how many of the male and female samples a feature had to be detected in to be included. Compared with the four repeat LC-MS analyses of the same sample, samples from mice of the same sex varied more. This is attributable mainly to biological variation, because variation from sample preparation has been shown to be small (Zhang et al., Mol. Cell. Proteomics 2005). The variation was larger still when male and female samples were compared, suggesting that some differences in peptide abundance are related to the sex of the mice, which requires further study. Despite the large differences in the number of peptide features, the four peptide arrays gave very similar results. We performed an unsupervised clustering analysis of the peptide arrays (Zhang et al., Mol. Cell. Proteomics 2005).
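Unsupervised clustering of samples from a peptide array can be sketched with agglomerative hierarchical clustering. The distance (1 minus the Pearson correlation between two samples' abundance profiles) and average linkage are assumptions: they are common choices in microarray analysis, not the settings used in this work. The sketch also assumes complete profiles with no missing values, which is exactly the difficulty that missing array entries create. The sample names and values are hypothetical.

```python
import statistics
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def pearson_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson correlation; 0 for identical trends, 2 for opposite ones."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / (sxx * syy) ** 0.5


def average_linkage(profiles: Dict[str, List[float]]
                    ) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Agglomerative clustering of samples; returns merges in order."""
    dist = {frozenset((s, t)): pearson_distance(profiles[s], profiles[t])
            for s, t in combinations(profiles, 2)}

    def link(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Average linkage: mean distance over all cross-cluster sample pairs.
        return statistics.mean(dist[frozenset((s, t))] for s in c1 for t in c2)

    clusters = [frozenset([s]) for s in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: link(*p))
        merges.append((c1, c2, link(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical relative abundances of four peptide features in four samples.
profiles = {"A1": [1.0, 2.0, 3.0, 4.0], "A2": [1.1, 2.1, 2.9, 4.2],
            "B1": [4.0, 3.0, 2.0, 1.0], "B2": [4.2, 2.8, 2.1, 0.9]}
for c1, c2, d in average_linkage(profiles):
    print(sorted(c1), sorted(c2), round(d, 3))
```

The merge sequence is what a dendrogram draws. In the toy data, the final merge joins the two groups whose abundance profiles run in opposite directions.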
In the resulting dendrogram, samples were grouped with respect to their sex, with the female mice clustering together and apart from the male mice. Missing values in a peptide array can have two causes: either the peptides are absent from the samples, or limitations of the instrument and software prevent them from being detected. Because the two causes cannot be distinguished, missing values must be handled with care in unsupervised clustering. In another clustering analysis, some female samples were grouped with male samples. This suggests that unsupervised clustering analysis can be affected both by biological variation and by instrument and software limitations, and its results should be interpreted with caution when used to stratify samples.
A discriminant analysis of the peptide features in the peptide arrays was used to select features that distinguish male from female mice, evaluating the discriminatory power of each feature from its relative abundance in the peptide array. For the most discriminatory peptide, an image generated with the same settings confirms the large difference in relative abundance between male and female mice seen in the peptide array. Two of the most discriminatory features have close m/z and retention time values and may derive from the same peptide. Because of the small sample size and the variation in peptide abundance, only a few peptide features showed a consistent difference between male and female mice. Such discriminatory features, located in the peptide array by their m/z and retention time, can be targeted for identification by MS/MS (Zhang et al., Mol. Cell. Proteomics 2005).
We present here a new software suite, SpecArray, that generates peptide arrays from sets of LC-MS data.
We used data from four repeat LC-MS analyses of a glycopeptide sample to illustrate the features of the software, and showed that the SpecArray software suite can accurately determine the relative abundance of thousands of peptide features across samples from LC-MS data. We then used glycopeptide capture and the LC-MS approach to analyze the serum proteins of five male and five female mice, and applied SpecArray to find peptide features that distinguish male from female mice. These two case studies show that the SpecArray software suite facilitates the analysis of LC-MS data and is a useful software platform for applying the LC-MS approach to large-scale protein profiling in quantitative proteomics. Because SpecArray stores its results in an array format identical to that of a gene expression microarray, analysis tools developed for microarray data can be applied to its output, which should help extract biological information from LC-MS data. With this new quantitative proteomics platform, one can compare the protein contents of a large number of substantially similar samples, as in time-course studies, find discriminatory peptides, and then target them for identification by MS/MS. This addresses a limitation that most current proteomic platforms have in analyzing large numbers of biological or clinical samples. The LC-MS approach and the SpecArray software suite are nevertheless at an early stage, and limitations remain in peptide detection, sample preparation, LC-MS analysis, and data analysis. Reproducible sample handling is essential to ensure that observed differences reflect the samples rather than their processing (Boguski and McIntosh, Nature 2003). LC-MS instruments are expected to improve in the reproducibility of peptide retention time and in the accuracy of peptide mass, which will reduce the mismatching of peptide features across samples. Such advances, together with improved LC-MS settings and sample handling, should increase the quality of LC-MS data.
With higher-quality data, low-abundance peptide features will be easier to detect, and more peptide features will be extracted by the software. Methods are still needed to distinguish low-abundance peptide signals from noise. As such methods are incorporated, the accuracy of peptide quantification by the LC-MS approach is expected to improve along with advances in sample preparation, LC-MS analysis, and data analysis. The SpecArray software suite is being developed further, and a new version is planned.